Discover and publish cutting-edge, open research.

Browse 16,388 multi-disciplinary research preprints

Most recent

John M. Brooks

and 5 more

Objective: To assess the ability of an extended Instrumental Variable Causal Forest Algorithm (IV-CFA) to provide personalized evidence of early surgery effects on benefits and detriments for elderly shoulder fracture patients. Data Sources/Study Setting: Population of 72,751 fee-for-service Medicare beneficiaries with proximal humerus fractures (PHFs) in 2011 who survived a 60-day treatment window after an index PHF and were continuously Medicare fee-for-service eligible over the period from 12 months prior to index to the minimum of 12 months after index or death. Study Design: IV-CFA estimated early surgery effects on both beneficial and detrimental outcomes for each patient in the study population. Classification and regression trees (CART) were applied to these estimates to create patient reference classes. Two-stage least squares (2SLS) estimators were applied to patients in each reference class to scrutinize the estimates relative to the known 2SLS properties. Principal Findings: This approach uncovered distinct reference classes of elderly PHF patients with respect to early surgery effects on benefit and detriment. Older, frailer patients with more comorbidities and lower healthcare utilization were less likely to benefit, and more likely to experience detriment, from early surgery. Reference classes were characterized by the appropriateness of early surgery rates with respect to benefit and detriment. Conclusions: Extended IV-CFA provides an illuminating method to uncover reference classes of patients based on treatment effects using observational data with a strong instrumental variable. This study isolated reference classes of new PHF patients in which changes in early surgery rates would improve patient outcomes. The inability to measure fracture complexity in Medicare claims means providers will need to discuss the appropriateness of these estimates with patients within a reference class in the context of this missing information.
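The 2SLS step mentioned in the abstract can be illustrated generically. The sketch below is a minimal two-stage least squares estimator on synthetic data with a single continuous instrument; it is not the authors' IV-CFA pipeline, and all variable names and values are hypothetical.

```python
import numpy as np

def two_stage_least_squares(y, d, z):
    """Minimal 2SLS with one instrument z, treatment d, outcome y.
    Intercepts are included in both stages. Illustrative only."""
    n = len(y)
    # Stage 1: regress treatment on instrument; keep fitted values.
    Z = np.column_stack([np.ones(n), z])
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    # Stage 2: regress outcome on fitted treatment.
    X = np.column_stack([np.ones(n), d_hat])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1]  # estimated treatment effect

# Synthetic check: true effect 2.0; confounder u biases naive OLS upward.
rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)      # treatment intensity
y = 2.0 * d + 3.0 * u + rng.normal(size=n)
estimate = two_stage_least_squares(y, d, z)
```

Because the fitted values from stage 1 depend only on the instrument, the stage-2 regression recovers an effect estimate near 2.0 despite the confounding.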

Sophia Tsabouri

and 18 more

Background: Although well described in adults, there are scarce and heterogeneous data on the diagnosis and management of chronic urticaria (CU) in children (0-18 years) throughout Europe. Our aim was to explore country differences and identify the extent to which the EAACI/GA²LEN/EDF/WAO guideline recommendations for paediatric urticaria are implemented. Methods: The EAACI Taskforce for paediatric CU disseminated an online clinical survey among EAACI paediatric section members. Members were asked to answer 35 multiple choice questions on current practices in their respective centres. Results: The survey was sent to 2,773 physicians, of whom 358 (13.8%) responded, mainly paediatric allergists (80%) and paediatricians (49.7%), working in 69 countries. For diagnosis, Southern European countries used significantly more routine tests (e.g., autoimmune testing, allergological tests, and parasitic investigation) than Northern European countries. Most respondents (60.3%) used a 2nd generation antihistamine as first-line treatment, of whom 64.8% up-dosed as second-line treatment. Omalizumab was used as a second-line treatment by 1.7% and as third-line by 20.7% of respondents. Most clinicians (65%) follow EAACI/WAO/GA²LEN/EDF guidelines when diagnosing CU, and only 7.3% follow no specific guidelines. Some clinicians prefer to follow national guidelines (18.4%, mainly Northern European) or the AAAAI practice parameter (1.7%). Conclusions: Even though most members of the Paediatric Section of EAACI are familiar with the EAACI/WAO/GA²LEN/EDF guidelines, a significant number do not follow them. Also, the large variation in diagnosis and treatment strengthens the need to re-evaluate, update and standardize guidelines on the diagnosis and management of CU in children.

Linfeng Xie

and 7 more

Background: Hepatic dysfunction (HD) is a serious complication after cardiovascular surgery. However, the risk factors for developing hepatic dysfunction after acute type A aortic dissection (AAAD) repair are largely unclear. Methods: The clinical data of 227 patients with AAAD repaired by modified triple-branched stent graft implantation from January 2018 to January 2020 were collected retrospectively, including preoperative, surgical and postoperative information. Logistic regression was used to explore the potential risk factors for HD. Results: In the early postoperative period, 57 patients (25.11%) developed HD. The in-hospital mortality rate in patients with HD was 19.30%, while the rate in patients without HD was only 6.5%. We found that preoperative body mass index (BMI) >30 kg/m² (OR: 7.054, 95% CI: 1.798-27.678, P=0.005), preoperative renal insufficiency (OR: 7.575, 95% CI: 2.923-19.629, P<0.001), preoperative moderate/severe pericardial effusion (OR: 16.409, 95% CI: 2.81-93.444, P=0.002) and cardiopulmonary bypass time >180 min (OR: 7.190, 95% CI: 3.113-16.608, P<0.001) were independent risk factors for HD after AAAD repair by modified triple-branched stent graft implantation. Conclusions: Preoperative BMI >30 kg/m², preoperative renal insufficiency, preoperative moderate/severe pericardial effusion and cardiopulmonary bypass time >180 min are independent risk factors for HD after total arch repair with modified triple-branched stent graft implantation in AAAD patients. The occurrence of postoperative HD prolonged mechanical ventilation and ICU stay, and significantly increased in-hospital mortality. Keywords: risk factors, acute type A aortic dissection, hepatic dysfunction, modified triple-branched stent graft implantation, total arch repair
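The logistic-regression step used to obtain odds ratios like those above can be sketched generically. The example below fits a logistic model by gradient ascent on synthetic data with two hypothetical binary risk factors; the data, coefficients, and factor names are illustrative only and are not the study's cohort or model.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=5000):
    """Plain gradient-ascent logistic regression (intercept added).
    Returns the coefficient vector [intercept, b1, b2, ...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))     # predicted probabilities
        beta += lr * X1.T @ (y - p) / len(y)     # average log-likelihood gradient
    return beta

# Synthetic cohort: two binary exposures with true log-odds 1.2 and 0.8.
rng = np.random.default_rng(1)
n = 20_000
factor_a = rng.integers(0, 2, n)   # hypothetical binary risk factor
factor_b = rng.integers(0, 2, n)   # hypothetical binary risk factor
logit = -2.0 + 1.2 * factor_a + 0.8 * factor_b
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
beta = fit_logistic(np.column_stack([factor_a, factor_b]), y)
odds_ratios = np.exp(beta[1:])     # exponentiated coefficients ~ ORs
```

Exponentiating each fitted coefficient yields the odds ratio reported for the corresponding risk factor.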

Derya Suluhan

and 6 more

John Jones

and 5 more

Hongwei Wang

and 9 more

Vegetation plays important roles in the development and protection of permafrost; it is one of the main local and ecosystemic factors that affect the thermal stability of the underlying soil strata. Multi-period land use and cover change (LUCC) data and long time series of air temperature were chosen. Based on these data, spatiotemporal changes in mean annual air temperature (MAAT) from the 1980s to the 2010s in Northeast China were simulated by the Ordinary Least Squares (OLS) method and the Ordinary Kriging (OK) model. The influences of LUCC on MAAT in Northeast China and on the distribution of the Xing’an permafrost were analyzed, and the results showed that: (1) The decadal average of MAAT increased from 4.60°C (1980s) to 5.38°C (2010s) in Northeast China, with an upward trend of 0.25°C/10a. (2) From the 1980s to the 2010s, the total permafrost area showed a decreasing trend (3.668×10⁴ km²/10a). (3) In permafrost regions, LUCC underwent significant structural changes: forested land showed a consistent decreasing trend and other land types showed an overall increasing trend. (4) The effects of different LUCC types on MAAT in the permafrost region varied substantially. The mean MAAT of forested land was the lowest (2.33°C) and that of unused land the highest (0.37°C). The change rate in MAAT of cultivated land was the highest (0.37°C/10a) and that of unused land the lowest (0.28°C/10a). (5) The degradation rates of permafrost in forested land (1.822×10⁴ km²/10a) and grassland (1.397×10⁴ km²/10a) were the largest from the 1980s to the 2010s.
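The OLS trend estimation referenced above amounts to fitting a straight line through decadal MAAT values. In the sketch below, only the 1980s (4.60°C) and 2010s (5.38°C) values come from the abstract; the 1990s and 2000s values are hypothetical interpolations inserted for illustration.

```python
import numpy as np

# Decadal mean annual air temperature (MAAT, °C) for Northeast China.
# 1980s and 2010s values are from the abstract; the two intermediate
# decades are hypothetical interpolations.
decades = np.array([1985.0, 1995.0, 2005.0, 2015.0])  # decade midpoints
maat = np.array([4.60, 4.88, 5.12, 5.38])

# OLS linear trend; centering the x values improves conditioning.
x = decades - decades.mean()
slope_per_year, intercept = np.polyfit(x, maat, 1)
trend_per_decade = slope_per_year * 10.0  # °C per decade
```

With these values the fitted trend comes out near the abstract's reported 0.25°C/10a warming rate.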

Dang Tinh Pham

and 7 more

OBJECTIVE The aim of this study was to assess the influence of active warming after epidural anesthesia (EDA) and before general anesthesia on the prevention of perioperative hypothermia. METHODS This randomized controlled trial was conducted in the department of anesthesiology of the university medical center of Ho Chi Minh City, Vietnam, from December 2019 to April 2020. The trial included 60 adult patients scheduled for major abdominal surgery with a duration of at least 120 minutes under combined general anesthesia and EDA. Patients were excluded if they were below 18 years of age, had an American Society of Anesthesiologists physical status classification of IV or higher, or refused EDA. Written informed consent was obtained from all patients. Patients were randomly divided into two groups. The first group received 10 minutes of active forced-air warming after EDA and before the induction of general anesthesia. The second group was covered with a blanket for 10 minutes after EDA and before general anesthesia. Core temperatures were recorded throughout the study. The primary outcome measures were the incidence of perioperative hypothermia and the degree of hypothermia. The secondary outcome measures were the rate of and time for body temperature to return to normal and the incidence of postoperative shivering. RESULTS Without active warming, 70% of patients (n = 21) became hypothermic (<36°C) postoperatively. Active forced-air warming for 10 minutes after EDA and before induction of general anesthesia decreased the incidence of postoperative hypothermia to 26.7% (n = 8). CONCLUSION Active forced-air warming for 10 minutes after EDA and before induction of general anesthesia is effective in reducing the incidence of perioperative hypothermia.

Run-xin Gan

and 6 more

Objective: To investigate the efficacies of three cycle regimens in women with a history of caesarean section (CS) receiving frozen embryo transfer (FET): natural cycle (NC) treatment, hormone replacement therapy (HRT), and gonadotropin-releasing hormone agonist plus HRT (GnRH-a + HRT). Design: Retrospective cohort study. Setting: University-affiliated center. Population: Patients (N = 6,159) with a history of CS who fulfilled the inclusion criteria were enrolled in the study from January 2014 to December 2019. Methods: Reproductive outcomes of patients in the NC (n = 4,306), HRT (n = 1,007) and GnRH-a + HRT (n = 846) groups were compared. Main Outcome Measure: The main outcome measure was the live birth rate per embryo transfer (ET). Results: The unadjusted odds of miscarriage in singleton pregnancies were significantly higher in the HRT group than in the NC group (25.5% versus 20.4%, respectively). After adjusting for possible confounding factors, the early miscarriage rate and the miscarriage rate of singleton pregnancies remained significantly higher in the HRT group than in the NC group. The clinical pregnancy rates in the NC, HRT and GnRH-a + HRT groups of women with a history of CS were 48.8%, 48% and 47.1%, respectively, and the live birth rates were 37%, 34.1% and 35.7%, respectively. Conclusion(s): In women with a history of CS undergoing FET, HRT for endometrial preparation was associated with a higher early miscarriage rate, an association that persisted after statistical adjustment for confounding factors. Funding: The National Science Foundation of China (81501328). Key Words: caesarean section, endometrial preparation, frozen embryo transfer, miscarriage


Recently published in scholarly journals

Sangjun Yoo

and 7 more

Introduction: We assessed the effects of preoperative bladder compliance on long-term functional outcomes, especially postoperative storage symptom changes, after laser prostatectomy. Materials and Methods: From January 2008 to March 2014, 1608 men who underwent laser prostatectomy, including holmium laser enucleation or photo-vaporization of the prostate, were included in the analysis. We divided patients into 3 groups according to bladder compliance on a baseline urodynamic study: <12.5, 12.5-25.0, and ≥25 mL/cm H2O. A multivariable analysis was performed to determine the impact of bladder compliance on long-term functional outcomes after laser prostatectomy. Results: Bladder compliance was less than 12.5 mL/cm H2O in 50 (3.1%) patients and 12.5-25 mL/cm H2O in 232 (14.4%) patients. As bladder compliance decreased, the baseline International Prostate Symptom Score (IPSS) total score and storage sub-score increased; the voiding sub-score remained unchanged. At postoperative 36 months, improvements in the IPSS total score and storage sub-score were significantly greater in the <12.5 mL/cm H2O group than in the other groups, although they were equivalent at postoperative 1 and 12 months. On multivariable analysis, decreased bladder compliance (<12.5 mL/cm H2O) was significantly associated with superior improvement in the storage sub-score at postoperative 36 months, although it was not associated with the voiding sub-score. Conclusion: In patients with preoperative bladder compliance <12.5 mL/cm H2O, storage symptoms could be further improved at 36 months after laser prostatectomy compared to other patients. Thus, laser prostatectomy could be a considerable treatment option for patients with severely decreased bladder compliance.

Colum Keohane

and 6 more

Abstract Objective To determine whether the introduction in July 2016 of a one-stop see-and-treat clinic offering early reflux ablation for venous leg ulcer (VLU) patients has affected rates of unplanned inpatient admissions due to venous ulceration. Design Review of inpatient admission data and analysis of related costs. Materials The Hospital Inpatient Enquiry collects data from acute public hospitals in Ireland on admissions and discharges, coded by diagnosis and acuity. This was the primary source of all data relating to admissions and length of stay. Costs were calculated from data published by the Health Service Executive in Ireland on average costs per inpatient stay for given diagnosis codes. Methods Data were collected on admission rates, length of stay, overall bed-day usage, and costs across a four-year period: the two years since the introduction of the rapid-access clinic, and the two years immediately prior as a control. Results 218 patients admitted with VLUs accounted for a total of 2,529 inpatient bed-days, with a median of 4.5 (2-6) unplanned admissions and a median hospital stay of 7 (4-13) days per month. Median unplanned admissions per month decreased from 6 (2.5-8.5) in the control period to 3.5 (2-5) after introduction of the clinic (p = .040). Bed-day usage was significantly reduced from a median of 62.5 (27-92.5) to 36.5 (21-44) bed-days per month (p = .035), though length of stay remained unchanged (p = .57). The cost of unplanned inpatient admissions fell from a median of €33,336.25 (€14,401.26-€49,337.65) per month to €19,468.37 (€11,200.98-€22,401.96) (p = .03). Conclusions Admissions for inpatient management of VLUs have fallen since beginning aggressive endovenous treatment of venous reflux in a dedicated one-stop see-and-treat clinic for these patients. As a result, bed-day usage has also fallen, leading to cost savings.

Mohammed Al-Sadawi

and 7 more

Abstract: Background: This meta-analysis assessed the relationship between obstructive sleep apnea (OSA) and echocardiographic parameters of diastolic dysfunction (DD), which are used in the assessment of heart failure with preserved ejection fraction (HFpEF). Methods: We searched databases including Ovid MEDLINE, Ovid Embase, Scopus, Web of Science, Google Scholar, and EBSCO CINAHL from inception up to December 26th, 2020. The search was not restricted by time, publication status or language. Comparisons were made between patients with OSA, diagnosed by in-laboratory polysomnography (PSG) or home sleep apnea testing (HSAT), and patients without OSA in relation to established markers of diastolic dysfunction. Results: The primary search identified 2512 studies. A total of 18 studies including 2509 participants were included. The two groups were free of conventional cardiovascular risk factors. Significant structural changes were observed between the two groups. Patients with OSA exhibited greater LAVI (3.94, CI [0.8, 7.07]; p<0.001) and left ventricular mass index (11.10, CI [2.56, 19.65]; p<0.001) compared to the control group. The presence of OSA was also associated with more prolonged DT (10.44 ms, CI [0.71, 20.16]; p=0.04), longer IVRT (7.85 ms, CI [4.48, 11.22]; p<0.001), and a lower E/A ratio (-0.62, CI [-1, -0.24]; p=0.001), suggestive of early DD. The E/e’ ratio (0.94, CI [0.44, 1.45]; p<0.001) was also increased. Conclusion: An association between OSA and echocardiographic parameters of DD was detected that was independent of conventional cardiovascular risk factors. OSA may be independently associated with DD, perhaps due to higher LV mass. Investigating the role of CPAP therapy in reversing or ameliorating diastolic dysfunction is recommended.
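Pooling effect sizes across studies, as done in this meta-analysis, is typically handled by inverse-variance weighting. The sketch below shows generic fixed-effect pooling of mean differences; the per-study effects and standard errors are hypothetical, not the reviewed data.

```python
import numpy as np

def pool_fixed_effect(effects, ses):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.
    Returns the pooled estimate and its standard error."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2   # weight = 1 / variance
    pooled = np.sum(w * effects) / np.sum(w)
    pooled_se = 1.0 / np.sqrt(np.sum(w))
    return pooled, pooled_se

# Hypothetical per-study mean differences (e.g., IVRT in ms) with SEs.
effects = [6.0, 9.5, 7.0]
ses = [2.0, 3.0, 1.5]
pooled, se = pool_fixed_effect(effects, ses)
ci95 = (pooled - 1.96 * se, pooled + 1.96 * se)  # 95% confidence interval
```

Studies with smaller standard errors receive larger weights, so the pooled estimate sits closest to the most precise study.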

Hans Fangohr

and 2 more

Guest Editors’ Introduction

Notebook interfaces – documents combining executable code with output and notes – first became popular as part of computational mathematics software such as Mathematica and Maple. The Jupyter Notebook, which began as part of the IPython project in 2012, is an open source notebook that can be used with a wide range of general-purpose programming languages.

Before notebooks, a scientist working with Python code, for instance, might have used a mixture of script files and code typed into an interactive shell. The shell is good for rapid experimentation, but the code and results are typically transient, and a linear record of everything that was tried would be long and not very clear. The notebook interface combines the convenience of the shell with some of the benefits of saving and editing code in a file, while also incorporating results, including rich output such as plots, in a document that can be shared with others.

The Jupyter Notebook is used through a web browser. Although it is often run locally, on a desktop or a laptop, this design means that it can also be used remotely, so that the computation occurs, and the notebook files are saved, on an institutional server, a high-performance computing facility or in the cloud. This simplifies access to data and computational power, while also allowing researchers to work without installing any special software on their own computer: specialized research software environments can be provided on the server, and the researcher can access them with a standard web browser.

These advantages have led to the rapid uptake of Jupyter notebooks in many kinds of research. The articles in this special issue highlight this breadth, with the authors representing various scientific fields. More importantly, they describe different aspects of using notebooks in practice, in ways that are applicable beyond a single field.

We open this special issue with an invited article by Brian Granger and Fernando Perez – two of the co-founders and leaders of Project Jupyter. Starting from the origins of the project, they introduce the main ideas behind Jupyter notebooks and explore the question of why Jupyter notebooks have been so useful to such a wide range of users. They have three key messages. The first is that notebooks are centered around the humans using them and building knowledge with them. Next, notebooks provide a write-eval-think loop that lets the user have a conversation with the computer and the system under study, which can be turned into a persistent narrative of computational exploration. The third is that Project Jupyter is more than software: it is a community that is nourished deliberately by its members and leaders.

The following five articles in this special issue illustrate the key features of Project Jupyter effectively. They show us a small sample of where researchers can go when empowered by the tool, and represent a range of scientific domains.

Stephanie Juneau et al. describe how Jupyter has been used to ‘bring the compute to the data’ in astrophysics, allowing geographically distributed teams to work efficiently on large datasets. Their platform is also used for education and training, including giving school students a realistic taste of modern science.

Ryan Abernathey et al., of the Pangeo project, present a similar scenario with a focus on data from the geosciences. They have enabled analysis of big datasets on public cloud platforms, facilitating a more widely accessible ‘pay as you go’ style of analysis without the high fixed costs of buying and setting up powerful computing and storage hardware. Their discussion of best practices includes details of the different data formats required for efficient access to data in cloud object stores rather than local filesystems.

Marijan Beg et al. describe features of Jupyter notebooks and Project Jupyter that help scientists make their research reproducible. In particular, the work focuses on the use of computer simulation and mathematical experiments for research. The self-documenting quality of the notebook—where the response to a code cell can be archived in the notebook—is an important aspect. The paper addresses wider questions, including the use of legacy computational tools, the exploitation of HPC resources, and the creation of executable notebooks to accompany publications.

Blaine Mooers describes the use of a snippet library in the context of molecular structure visualization. Using a Python interface, the PyMOL visualization application can be driven through commands to visualize molecular structures such as proteins and nucleic acids. By using those commands from the Jupyter notebook, a reproducible record of analyses and visualizations can be created. The paper focuses on making this process more user-friendly and efficient by developing a snippet library, provided as a JupyterLab extension, which offers a wide selection of pre-composed and commonly used PyMOL commands. These commands can be selected via hierarchical pull-down menus rather than having to be typed from memory. The article discusses the benefits of this approach more generally.

Aaron Watters describes a widget that can display 3D objects using WebGL, while the back-end processes the scene using a data visualization pipeline. In this case, the front-end takes advantage of the client GPU for visualization of the widget, while the back-end takes advantage of whatever computing resources are accessible to Python.

The articles for this special issue were all invited submissions, in most cases from selected presentations given at JupyterCon in October 2020. Each article was reviewed by three independent reviewers. The guest editors are grateful to Ryan Abernathey, Luca de Alfaro, Hannah Bruce MacDonald, Christopher Cave-Ayland, Mike Croucher, Marco Della Vedova, Michael Donahue, Vidar Fauske, Jeremy Frey, Konrad Hinsen, Alistair Miles, Arik Mitschang, Blaine Mooers, Samual Munday, Chelsea Parlett, Prabhu Ramachandran, John Readey, Petr Škoda and James Tocknell for their work as reviewers, along with other reviewers who preferred not to be named. The article by Brian Granger and Fernando Perez was invited by the editor in chief, and reviewed by the editors of this special issue.

Hans Fangohr is currently heading the Computational Science group at the Max Planck Institute for the Structure and Dynamics of Matter in Hamburg, Germany, and is a Professor of Computational Modelling at the University of Southampton, UK. A physicist by training, he received his PhD in Computer Science in 2002. He has authored more than 150 scientific articles in computational science and materials modelling, several open source software projects, and a textbook on Python for Computational Science and Engineering. Contact him at hans.fangohr@mpsd.mpg.de

Thomas Kluyver is currently a software engineer at European XFEL. Since gaining a PhD in plant sciences from the University of Sheffield in 2013, he has been involved in various parts of the open source and scientific computing ecosystems, including the Jupyter and IPython projects. Contact him at thomas.kluyver@xfel.eu

Massimo Di Pierro is a Professor of Computer Science at DePaul University. He has a PhD in Theoretical Physics from the University of Southampton and is an expert in numerical algorithms, high-performance computing, and machine learning. Massimo is the lead developer of many open source projects including web2py, py4web, and pydal. He has authored more than 70 articles in Physics, Computer Science, and Finance and has published three books. Contact him at

Jumpei Ogura

and 9 more

Introduction: Methicillin-resistant Staphylococcus aureus (MRSA) infection has a significant clinical impact on both pregnant women and neonates. The aim of this study was to accurately assess the vertical transmission rate of MRSA and its clinical impact on both pregnant mothers and neonates. Material and Methods: We conducted a prospective observational cohort study of 898 pregnant women who were admitted to our department and 905 neonates from August 2016 to December 2017. MRSA was cultured from nasal and vaginal samples taken from the mothers at enrollment and from nasal and umbilical surface swabs taken from neonates at the time of delivery. We examined the vertical transmission rate of MRSA in mother-neonate pairs. We used multivariable logistic regression to identify risk factors for maternal MRSA colonization and maternal/neonatal adverse outcomes associated with maternal MRSA colonization. Results: The prevalence of maternal MRSA colonization was 6.1% (55 out of 898) at enrollment. The independent risk factors were multiparity and occupation as a healthcare provider (OR: 2.35, 95% CI: 1.25-4.42 and OR: 2.58, 95% CI: 1.39-4.79, respectively). The prevalence of neonatal MRSA colonization at birth was 12.7% (7 out of 55 mother-neonate pairs) in the maternal MRSA-positive group, whereas it was only 0.12% (one out of 843 pairs) in the maternal MRSA-negative group (OR: 121, 95% CI: 14.6-1000). When maternal vaginal samples were MRSA positive, vertical transmission was observed in four out of nine cases (44.4%). Skin and soft tissue infections (SSTIs) developed more frequently in neonates in the maternal MRSA-positive group than in the MRSA-negative group (OR: 7.47, 95% CI: 2.50-22.3). Conclusions: The prevalence of MRSA in pregnant women was approximately 6%. Vertical transmission caused by maternal vaginal MRSA colonization was observed in four out of nine cases (44.4%). Although our study includes a limited number of maternal MRSA-positive cases, vertical transmission of MRSA may occur in up to 44% of neonates of mothers with vaginal MRSA colonization. Maternal MRSA colonization may be associated with increased development of SSTIs in neonates via vertical transmission.
Many societal opportunities and challenges, both current and future, are either inter- or transdisciplinary in nature. Efforts to cut across traditional academic boundaries have increased in research and, to a lesser extent, in teaching. One successful collaboration has been the augmentation of fields within the Humanities, Social Sciences, and Arts by integrating complementary tools and methods originating from STEM. This trend is gradually materializing in formal undergraduate and secondary education.

The proven effectiveness of Jupyter notebooks for teaching and learning STEM practices gives rise to a nascent case for education that seeks to replicate this interdisciplinary design to adopt notebook technology as the best pedagogical tool for the job. This article presents two sets of data to help argue this case.

The first set of data demonstrates the art of the possible. A sample of undergraduate- and secondary-level courses showcases existing or recent work of educational stakeholders in the US and UK who are already pioneering instruction where computational and data practices are integrated into the study of the Humanities, Social Sciences, and Arts, with Jupyter notebooks chosen as a central pedagogical tool. Supplementary data providing an overview of the types of technical material covered by each course syllabus further evidences what interdisciplinary education is perceived to be, or is already feasible, using Jupyter technology with student audiences at these levels.

The second set of data provides more granular, concrete insight derived from user experiences of a handful of the courses from the sample. Four instructors and one student describe a range of pedagogical benefits and value they attribute to the use of Jupyter notebooks in their course(s).

In presenting this nascent case, the article aims to stimulate the development of Jupyter notebook-enabled, computational data-driven interdisciplinary education within undergraduate and secondary school programs.
Many high-performance computing applications are of high consequence to society. Global climate modeling is a historic example of this. In 2020, the societal issue of greatest concern, the still-raging COVID-19 pandemic, saw a legion of computational scientists turning their endeavors to new research projects in this direction. Applications of such high consequence highlight the need for building trustworthy computational models. Emphasizing transparency and reproducibility has helped us build more trust in computational findings. In the context of supercomputing, however, we may ask: how do we trust results from computations that cannot be repeated? Access to supercomputers is limited, computing allocations are finite (and competitive), and machines are decommissioned after a few years. In this context, we might ask how reproducibility can be ensured, certified even, without exercising the original digital artifacts used to obtain new scientific results. This is often the situation in HPC. It is compounded now with greater adoption of machine learning techniques, which can be opaque. The ACM in 2017 issued a Statement on Algorithmic Transparency and Accountability, targeting algorithmic decision-making using data models \cite{council2017}. Among its seven principles, it calls for data provenance, auditability, validation and testing. These principles can be applied not only to data models, but to HPC in general. I want to discuss the next steps for reproducibility: how we may adapt our practice to achieve what I call unimpeachable provenance, and full auditability and accountability of scientific evidence produced via computation.An invited talk at SC20I was invited to speak at SC20 about my work and insights on transparency and reproducibility in the context of HPC. The session's theme was Responsible Application of HPC, and the title of my talk was "Trustworthy computational evidence through transparency and reproducibility." 
At the previous SC, I had the distinction to serve as Reproducibility Chair, leading an expansion of the initiative, which was placed under the Technical Program that year. We moved to make Artifact Description appendices required for all SC papers, created a template and an author kit for the preparation of the appendices, and introduced three new Technical Program tracks in support of the initiative. These are: the Artifact Description & Evaluation Appendices track—with an innovative double-open constructive review process—, the Reproducibility Challenge track, and the Journal Special Issue track, for managing the publication of select papers on the reproducibility benchmarks of the Student Cluster Competition. This year, the initiative was augmented to address issues of transparency, in addition to reproducibility, and a community sentiment study was launched to assess the impact of the effort, six-years in, and canvas the community's outlook on various aspects of it.Allow me to thank here Mike Heroux, Reproducibility Chair for SC in 2017 and 2018, Michela Taufer, SC19 General Chair—who put her trust in me to inherit the role from Mike—, and Beth Plale, the SC20 Transparency and Reproducibility Chair. I had countless inspiring and supportive conversations with Mike and Michela about the topic during the many months of planning for SC19, and more productive conversations with Beth during the transition to her leadership. Mike, Michela and I have served on other committees and working groups together, in particular, the group that met in July 2017 at the National Science Foundation (convened by Almadena Chtchelkanova) for the Workshop on Reproducibility Taxonomies for Computing and Computational Science. My presentation at that event condensed an inventory of uses of various terms like reproducibility and replication, across many fields of science \cite{barba2017}. 
I then wrote the review article "Terminologies for Reproducible Research" and posted it on arXiv \cite{barba2018}. It informed our workshop's report, which came out a few months later as a Sandia technical report \cite{taufer2018}. In it, we highlighted that the fields of computational and computing sciences used two opposing definitions of the terms reproducible and replicable, representing an obstacle to progress in this sphere.

The Association for Computing Machinery (ACM), representing computer science and industry professionals, had recently established a reproducibility initiative, and adopted definitions diametrically opposite to those used in the computational sciences for more than two decades. In addition to raising awareness about the contradiction, we proposed a path to a compatible taxonomy. Compatibility is needed here because the computational sciences—astronomy, physics, epidemiology, biochemistry, and others that use computing as a tool for discovery—and the computing sciences (where algorithms, systems, software, and computers are the objects of study) have community overlap and often intersect in venues of publication; the SC conference series is one example. Given the historical precedence and wider adoption of the definitions of reproducibility and replicability used in the computational sciences, our Sandia report recommended that the ACM definitions be reversed. Several ACM-affiliated conferences were already using the artifact review and badging system (approved in 2016), so this was no modest suggestion. The report, however, was successful in raising awareness of the incompatible definitions and the desirability of addressing them.

A direct outcome of the Sandia report was a proposal to the National Information Standards Organization (NISO) for a Recommended Practice Toward a Compatible Taxonomy, Definitions, and Recognition Badging Scheme for Reproducibility in the Computational and Computing Sciences.
NISO is accredited by the American National Standards Institute (ANSI) to develop, maintain, and publish consensus-based standards for information management. The organization has more than 70 members; publishers, information aggregators, libraries, and other content providers use its standards. I co-chaired this particular working group with Gerry Grenier from IEEE and Wayne Graves from ACM; Mike Heroux was also a member. The goal of the NISO Reproducibility Badging and Definitions Working Group was to develop a Recommended Practice document, a step before development of a standard. As part of our joint work, we prepared a letter addressed to the ACM Publications Board, delivered in July 2019. It described the context and need for compatible reproducibility definitions and made the concrete request that ACM consider a change. By that time, not only did we have the Sandia report as justification, but the National Academies of Sciences, Engineering, and Medicine (NASEM) had just released the report Reproducibility and Replicability in Science \cite{medicine2019}. It was the product of a long consensus study conducted by 15 experts, including myself, sponsored by the National Science Foundation in response to a Congressional mandate. The NASEM report put forth its definitions as follows:

Reproducibility is obtaining consistent results using the same input data, computational steps, methods and code, and conditions of analysis.

Replicability is obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

The key contradiction with the ACM badging system resides in which term entails using the author-created digital artifacts (e.g., data and code).
We stated in the NISO working-group letter that if the ACM definitions of reproducible and replicable could be interchanged, the working group could move forward toward its goal of drafting recommended practices for badging that would lead to wider adoption by other technical societies and publishers. The ACM Publications Board responded positively, and began working through the details of how to make changes to items already published in the Digital Library with the "Results Replicated" badge—about 188 affected items existed at that time. Over the summer of 2020, the ACM applied changes to the published Artifact Review and Badging web pages and added a version number. From version 1.0 onward, a note states that, as a result of discussions with NISO, the ACM was harmonizing its terminology with that used in the broader scientific research community.

All this background serves to draw our attention to the prolonged, thoughtful, and sometimes arduous efforts that have been directed at charting paths for adoption and giving structure to reproducibility and replicability in our research communities. Let us move now to why and how the HPC community might move forward.

Insights on transparent, reproducible HPC research

Deployed barely over a year ago, the NSF-funded Frontera system at the Texas Advanced Computing Center (TACC) came in as the 8th most powerful supercomputer in the world, and the fastest on a university campus. Up to 80% of the available time on the system is allocated through the NSF Petascale Computing Resource Allocation program. The latest round of Frontera allocations (as of this writing) was announced on October 25, 2020. I read through the fact sheet on the 15 newly announced allocations to get a sense of the types of projects in this portfolio. Four projects are machine-learning or AI-focused, the same number as those in astronomy and astrophysics, and one more than those in weather or climate modeling.
Other projects are single instances spanning volcanology/mantle mechanics, molecular dynamics simulations of ion channels, quantum physics in materials science, and one engineering project in fluid-structure interactions. One could gather these HPC projects into four groups:

1. Astronomy and astrophysics are mature fields that in general have high community expectations of openness and reproducibility. As I'll highlight below, however, even communities with mature practices benefit from checks of reproducibility that uncover areas for improvement.

2. The projects tackling weather and climate modeling are candidates for being considered of high consequence to society. One example from the Frontera allocations concerns the interaction of clouds with aerosols produced by industrial activity: the affected clouds can end up composed of smaller droplets and become more reflective, resulting in a cooling effect on climate. Global climate models tend to overestimate this radiative forcing, potentially underestimating global warming: why? This is a question of great consequence for science-informed policy, in a subject already under elevated scrutiny from the public. Another project in this cluster deals with real-time, high-resolution ensemble forecasts of high-impact winter weather events. I submit that high standards of transparency, meticulous provenance capture, and investments of time and effort in reproducibility and quality assurance are justified in these projects.

3. Four of the winning projects apply techniques from machine learning to various areas of science. In one case, the researchers seek to bridge the trade-off between accuracy of prediction and model interpretability, to make ML more applicable in clinical and public-health settings. This is clearly also an application of high consequence, but in addition all the projects in this subset face the particular transparency challenges of ML techniques, requiring new approaches to provenance capture and transparent reporting.

4. The rest of the projects are classic high-performance computational science applications, in areas such as materials science, geophysics, and fluid mechanics. Reproducible-research practices vary broadly in these settings, but I feel confident saying that all or nearly all of these efforts would benefit from prospective data management, better software engineering, and more automated workflows. And their communities would grow stronger with more open sharing.

The question I have is: how could the merit review of these projects nudge researchers toward greater transparency and reproducibility? Maybe that is a question for later; a question to start with is how support teams at cyberinfrastructure facilities could work with researchers to facilitate their adoption of better practices in this vein. I'll revisit these questions later.

I also looked at the 2019 Blue Waters Annual Report, released on September 15, 2020, with highlights from a multitude of research projects that benefited from computing allocations on the system. Blue Waters went into full service in 2013 and has provided over 35 billion core-hour equivalents to researchers across the nation. The highlighted research projects fall into seven disciplinary categories, and include 32 projects in space science, 20 in geoscience, 45 in physics and engineering, and many more. I want to highlight just one of the many dozens of projects featured in the Blue Waters Annual Report, for the following reason: I did a word search on the PDF for "Zenodo," and that project was the only one listing Zenodo entries in the "Publications & Data Sets" section that ends each project feature.
One other project (in the domain of astrophysics) mentions that data is available through the project website and in Zenodo, but doesn't list any data sets in the report. Zenodo is an open-access repository funded by the European Union's Framework Programmes for Research and operated by CERN. Some of the world's top experts in running large-scale research data infrastructure are at CERN, and Zenodo is hosted on infrastructure built in service of the largest high-energy physics laboratory in the world. Zenodo hosts any kind of data, under any license type (including closed access). It has become one of the most used archives for open sharing of research objects, including software.

The project I want to highlight is "Molten-salt reactors and their fuel cycles," led by Prof. Kathryn Huff at UIUC. I've known Katy since 2014, and she and I share many perspectives on computational science, including a strong commitment to open-source software. This project deals with modeling and simulation of nuclear reactors and fuel cycles, combining multiple physics and multiple scales, with the goal of improving the design of nuclear reactors in terms of performance and safety. As part of the research enabled by Blue Waters, the team developed two software packages: Moltres, described as a first-of-its-kind finite-element code for simulating the transient neutronics and thermal hydraulics in a liquid-fueled molten-salt reactor design; and SaltProc, a Python tool for fuel-salt reprocessing simulation. The references listed in the project highlight include research articles in the Annals of Nuclear Energy, as well as the Zenodo deposits for both codes, and a publication about Moltres in the Journal of Open Source Software (JOSS). As one of the founding editors of JOSS, I'm very pleased.
It is possible, of course, that other projects in the Blue Waters portfolio have also archived software in Zenodo or published their software in JOSS, but they did not mention it in this report and did not cite the artifacts. Clearly, the research context of the project I highlighted, nuclear reactor design, is of high consequence. The practices of this research group show a high standard of transparency that should be the norm in such fields. Beyond transparency, the publication of the software in JOSS ensures that it was subject to peer review and that it satisfies standards of quality: JOSS reviewers install the software, run tests, and comment on usability and documentation, leading to quality improvements.

Next, I want to highlight the work of a group that includes CiSE editors Michela Taufer and Ewa Deelman, posted last month on arXiv \cite{e2020}. The work sought to directly reproduce the analysis that led to the 2016 discovery of gravitational waves, using the data and codes that the LIGO collaboration had made available to the scientific community. The data had previously been re-analyzed by independent teams using different codes, leading to replication of the findings, but no attempt had yet been made at reproducing the original results. In this paper, the authors report on the challenges they faced during the reproduction effort, even with the availability of data and code supplementing the original publication. A first challenge was the lack of a single public repository with all the information needed to reproduce the result. The team had the cooperation of one of the original LIGO team members, who had access to unpublished notes that proved necessary in the process of iteratively filling in the gaps in the public information.
Other highlights of the reproduction exercise include: the original publication did not document the precise version of the code used in the analysis; the script used to make the final figure was not released publicly (though one co-author gave access to it privately); and the original documented workflow queried proprietary servers to access data, and had to be modified to run with the public data instead. In the end, the result—the statistical significance of the gravitational-wave detection from a black-hole merger—was reproduced, but not independently of the original team, as one researcher is a co-author of both publications. The message here is that even a field that is mature in its standards of transparency and reproducibility needs checks to ensure that these practices are sufficient, or to show where they can be improved.

Science policy trends

The National Academies study on Reproducibility and Replicability in Science was commissioned by the National Science Foundation under Congressional mandate, with the charge coming from the Chair of the Science, Space, and Technology Committee. NASEM reports and convening activities have a range of impacts on policy and practice, and often guide the direction of federal programs. NSF is in the process of developing its agency response to the report, and we can certainly expect to hear more in the future about requirements and guidance for researchers seeking funding.

The recommendations in the NASEM report are directed at all the various stakeholders: researchers, journals and conferences, professional societies, academic institutions and national laboratories, and funding agencies. Recommendation 6-9, in particular, prompts funders to ask that grant applications discuss how the researchers will assess and report uncertainties, and how the proposed work will address reproducibility and/or replicability issues. It also recommends that funders incorporate reproducibility and replicability into the merit-review criteria of grant proposals.
Combined with related trends urging more transparency and public access to the fruits of government-funded research, these recommendations mean we need to be aware of the shifting science-policy environment.

One more time, I have a reason to thank Mike Heroux, who took time for a video call with me as I prepared my SC20 invited talk. In his position as Senior Scientist at Sandia, one-fifth of his time is spent in service to the lab's activities, including serving on the review committee for the internal Laboratory Directed Research & Development (LDRD) grants. As it is an internal program, the calls for proposals are not available publicly, but Mike told me that they now contain specific language asking proposers to include statements on how the project will address transparency and reproducibility. These aspects are discussed in the proposal review and are a factor in the decision-making. As community expectations grow, it could happen that, between two proposals ranked equally on the science, the tie-break comes from one of them better addressing reproducibility. Already some teams at Sandia are performing at a high level: for example, they produce an Artifact Description appendix for every publication they submit, regardless of the conference or journal requirements.

We don't know if or when NSF might add similar stipulations to its general grant proposal guidelines, asking researchers to describe transparency and reproducibility in the project narrative. One place where we see the agency start responding to shifting expectations about open sharing of research objects is the section on results from prior funding. NSF currently requires here a listing of publications from prior awards, and "evidence of research products and their availability, including …data [and] software."

I want to again thank Beth Plale, who took time to meet with me over video and sent me follow-up materials to use in preparing my SC20 talk.
In March 2020, NSF issued a "Dear Colleague Letter" (DCL) on Open Science for Research Data, with Beth then serving as the public access program director. The DCL says that NSF is expanding its Public Access Repository (NSF PAR) to accept metadata records, enabling data discovery and access. It requires research data to be deposited in an archival service and assigned a Digital Object Identifier (DOI), a global and persistent link to the object on the web. A grant proposal's Data Management Plan should state the anticipated archive to be used, and include any associated cost in the budget. Notice this line: "Data reporting will initially be voluntary." This implies that it will later be mandatory! The DCL also invited proposals aimed at growing community readiness to advance open science. At the same time, the Office of Science and Technology Policy (OSTP) issued a Request for Information early this year asking what Federal agencies could do to make the results of the research they fund publicly accessible. The OSTP sub-committee on open science is very active. An interesting and comprehensive response to the OSTP RFI comes from the MIT Libraries. It recommends, among other things:

- "Policies that default to open sharing for data and code, with opt-out exceptions available [for special cases]…"
- "Providing incentives for sharing of data and code, including supporting credentialing and peer-review; and encouraging open licensing."
- Recognizing data and code as "legitimate, citable products of research" and providing incentives and support for systems of data sharing and citation…

The MIT Libraries response addresses various other themes, like responsible business models for open-access journals, and federal support for the vital infrastructure needed to make open access to research results more efficient and widespread.
It also recommends that Federal agencies provide incentives for documenting and raising the quality of data and code, and that they "promote, support, and require effective data practices, such as persistent identifiers for data, and efficient means for creating auditable and machine readable data management plans."

To boot, the National Institutes of Health (NIH) just announced, on October 29, a new policy on data management and sharing. It requires researchers to plan prospectively for managing and sharing scientific data openly, saying: "we aim to shift the culture of research to make data sharing commonplace and unexceptional."

Another setting where we could imagine expectations to discuss reproducibility and open research objects is proposals for allocation of computing time. For this section, I need to thank John West, Director of Strategic Initiatives at the Texas Advanced Computing Center (and CiSE Associate EiC), who took time for a video call with me on this topic. We bounced around ideas about how cyberinfrastructure providers might play a role in growing the adoption of reproducibility practices. Currently, the NSF science proposal and the computing allocation proposal are awarded separately. The Allocation Submission Guidelines discuss review criteria, which include: intellectual merit (demonstrated by the NSF science award); methodology (models, software, analysis methods); the research plan and resource request; and efficient use of the computational resources. For the most part, researchers have to show that their application scales to the size of the system they are requesting time on. Interestingly, the allocation award is not tied to performance: researchers are not asked to show that their codes are optimized, only that they scale and that the research question can feasibly be answered in the allocated time.
The responsible stewardship of the supercomputing system is provided for via a close collaboration between the researchers and the members of the supercomputing facility. Codes are instrumented under the hood with low-overhead collection of system-wide performance data (in the UT facility, with TACC-Stats) and a web interface for reports.

I see three opportunities here: 1) workflow-management and/or system-monitoring tools could be extended to also supply automated provenance capture; 2) the expert staff at the facility could broaden their support to researchers to include advice and training in transparency and reproducibility matters; and 3) cyberinfrastructure facilities could expand their training initiatives to include essential skills for reproducible research. John floated other ideas, like the possibility that some projects be offered a bump in their allocations (say, 5% or 10%) to engage in reproducibility and replicability activities; or, more drastic perhaps, that projects not be awarded allocations over a certain threshold unless they show commitment and a level of maturity in reproducibility.

Next steps for HPC

The SC Transparency and Reproducibility Initiative is one of the innovative, early efforts to gradually raise expectations and educate a large community about how to address reproducibility and why it matters. Over six years, we have built community awareness and buy-in. This year's community sentiment study shows clear progress: 90% of respondents are aware of the issues around reproducibility, and only 15% think the concerns are exaggerated. Importantly, researchers report that they consult the artifact appendices of technical papers, signaling impact.
As a community, we are better prepared to adapt to rising expectations from funders, publishers, and readers.

The pandemic crisis has unleashed a tide of actions to increase access and share results: the COVID-19 Open Research Dataset (CORD-19) is one example \cite{al2020}; the COVID-19 Molecular Structure and Therapeutics Hub at MolSSI is another. Facing a global challenge, we as a society are strengthened by facilitating immediate public access to data, code, and published results. This point has been made by many in recent months, but perhaps most eloquently by Rommie Amaro and Adrian Mulholland in their Community Letter Regarding Sharing Biomolecular Simulation Data for COVID-19, signed by more than a hundred researchers from around the world \cite{j2020}. It says: "There is an urgent need to share our methods, models, and results openly and quickly to test findings, ensure reproducibility, test significance, eliminate dead-ends, and accelerate discovery." It then follows with several commitments: to make results available quickly via preprints; to make available the input files, model-building and analysis scripts (e.g., Jupyter notebooks), and data necessary to reproduce the results; to use open data-sharing platforms to make results available as quickly as possible; to share algorithms and methods in order to accelerate reuse and innovation; and to apply permissive open-source licensing strategies. Interestingly, these commitments are reminiscent of the pledges I made in my Reproducibility PI Manifesto \cite{barba2012} eight years ago!

One thing the pandemic instantly provided is a strong incentive to participate in open science and attend to reproducibility. The question is how much newly adopted practices will persist once the incentive of a world crisis is removed.

I've examined here several issues of incentives for transparent and reproducible research.
But social epistemologists of science know that the so-called Mertonian norms (for sharing widely the results of research) are supported by both economic and ethical factors—incentives and norms—in close interrelation. Social norms require a predominant normative expectation (for example, the sharing of food in a given situation and culture). In the case of open sharing of research results, those expectations do not yet predominate, due to researchers' sensitivity to credit incentives. Heesen \cite{heesen2017} concludes: "Give sufficient credit for whatever one would like to see shared ... and scientists will indeed start sharing it."

In HPC settings, where we can hardly ever reproduce results (due to machine access, cost, and effort), a vigorous alignment with the goals of transparency and reproducibility will develop a blend of incentives and norms, will especially consider the applications of high consequence to society, and will support researchers with infrastructure (human and cyber). Over time, we will arrive at a level of maturity where trustworthy computational evidence is achieved not by actually exercising the open research objects (artifacts) shared by authors (data and code), but by a research process that ensures unimpeachable provenance.
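What might automated provenance capture look like in practice? The following is a minimal sketch only, not an existing tool: the manifest fields and file names are my own illustrative choices, standing in for what a workflow manager or facility service could record automatically for every job.

```python
"""Illustrative sketch of a provenance manifest for one computational run.

Records the code version, the computing environment, and content hashes
of input and output artifacts, so a result can be traced to its origins.
"""
import hashlib
import json
import platform
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def sha256(path):
    """Content hash that uniquely identifies an artifact's exact bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def git_commit():
    """Exact code version, if the run happens inside a git checkout."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        return None  # not in a git repository, or git not installed


def make_manifest(inputs, outputs):
    """Assemble a JSON-serializable record tying results to their origins."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "python": platform.python_version(),
        "code_version": git_commit(),
        "inputs": {p: sha256(p) for p in inputs},
        "outputs": {p: sha256(p) for p in outputs},
    }


# Example with a hypothetical input file created on the spot.
Path("input.dat").write_text("example input data\n")
manifest = make_manifest(inputs=["input.dat"], outputs=[])
print(json.dumps(manifest, indent=2))
```

A facility-side workflow or monitoring tool emitting such a manifest for every job, archived alongside the results, would give reviewers a machine-readable trail from a published figure back to the exact inputs, environment, and code version, which is the substance of auditability even when the run itself cannot be repeated.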
