Hans Fangohr, Thomas Kluyver, and Massimo Di Pierro
Guest Editors' Introduction

Notebook interfaces – documents combining executable code with output and notes – first became popular as part of computational mathematics software such as Mathematica and Maple. The Jupyter Notebook, which began as part of the IPython project in 2012, is an open source notebook that can be used with a wide range of general-purpose programming languages.

Before notebooks, a scientist working with Python code, for instance, might have used a mixture of script files and code typed into an interactive shell. The shell is good for rapid experimentation, but the code and results are typically transient, and a linear record of everything that was tried would be long and hard to follow. The notebook interface combines the convenience of the shell with some of the benefits of saving and editing code in a file, while also incorporating results, including rich output such as plots, in a document that can be shared with others.
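To make this concrete, the sketch below shows the kind of content a single notebook cell might hold. It is an illustrative example of ours, not taken from any of the articles, and it assumes the numpy and matplotlib packages are available. Run in a notebook, the figure appears directly beneath the cell and is saved as part of the document:

    # An illustrative notebook cell: the code and its rich output
    # (here, a plot) are stored together in the notebook document.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 200)  # 200 sample points on [0, 2*pi]
    plt.plot(x, np.sin(x))              # rendered inline below the cell
    plt.title("sin(x)")
    plt.show()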
The Jupyter Notebook is used through a web browser. Although it is often run locally, on a desktop or a laptop, this design means that it can also be used remotely, so that the computation occurs, and the notebook files are saved, on an institutional server, a high-performance computing facility, or in the cloud. This simplifies access to data and computational power, while also allowing researchers to work without installing any special software on their own computer: specialized research software environments can be provided on the server, and the researcher can access them with a standard web browser.

These advantages have led to the rapid uptake of Jupyter notebooks in many kinds of research. The articles in this special issue highlight this breadth, with the authors representing various scientific fields. More importantly, they describe different aspects of using notebooks in practice, in ways that are applicable beyond a single field.

We open this special issue with an invited article by Brian Granger and Fernando Perez – two of the co-founders and leaders of Project Jupyter. Starting from the origins of the project, they introduce the main ideas behind Jupyter notebooks and explore the question of why notebooks have been so useful to such a wide range of users. They have three key messages. The first is that notebooks are centered on the humans using them and building knowledge with them. The second is that notebooks provide a write-eval-think loop that lets the user hold a conversation with the computer and the system under study, which can be turned into a persistent narrative of computational exploration. The third is that Project Jupyter is more than software: it is a community, nourished deliberately by its members and leaders.

The following five articles in this special issue illustrate the key features of Project Jupyter. They show us a small sample of where researchers can go when empowered by the tool, and represent a range of scientific domains.

Stephanie Juneau et al. describe how Jupyter has been used to ‘bring the compute to the data’ in astrophysics, allowing geographically distributed teams to work efficiently on large datasets. Their platform is also used for education and training, including giving school students a realistic taste of modern science.

Ryan Abernathey et al., of the Pangeo project, present a similar scenario with a focus on data from the geosciences. They have enabled analysis of big datasets on public cloud platforms, facilitating a more widely accessible ‘pay as you go’ style of analysis without the high fixed costs of buying and setting up powerful computing and storage hardware. Their discussion of best practices includes details of the different data formats required for efficient access to data in cloud object stores rather than local filesystems.
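To illustrate the access pattern they describe, the sketch below opens a chunked Zarr dataset directly from a cloud object store. It is a minimal, hypothetical example of ours rather than code from the article: the bucket URL and the variable name sst are placeholders, and running it would require the xarray, zarr, dask, and gcsfs packages:

    import xarray as xr

    # Open a (hypothetical) Zarr store in a cloud bucket. Only the
    # metadata is read at this point, not the data itself.
    ds = xr.open_zarr("gs://hypothetical-bucket/ocean-data.zarr")

    # Build a lazy computation on the (hypothetical) 'sst' variable;
    # nothing has been downloaded yet.
    monthly_mean = ds["sst"].groupby("time.month").mean()

    # Trigger execution: only the chunks this computation touches are
    # fetched from object storage, in parallel.
    result = monthly_mean.compute()

Because the data are stored in many independently addressable chunks, access over the network can be parallelized, which is what makes object storage competitive with a local filesystem for this style of analysis.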
Marijan Beg et al. describe features of Jupyter notebooks and Project Jupyter that help scientists make their research reproducible. In particular, the work focuses on the use of computer simulation and mathematical experiments for research. The self-documenting quality of the notebook, where the output of each code cell is archived in the notebook itself, is an important aspect. The paper also addresses wider questions, including the use of legacy computational tools, the exploitation of HPC resources, and the creation of executable notebooks to accompany publications.

Blaine Mooers describes the use of a snippet library in the context of molecular structure visualization. Through its Python interface, the PyMOL visualization application can be driven by commands to visualize molecular structures such as proteins and nucleic acids. Issuing those commands from a Jupyter notebook creates a reproducible record of the analysis and visualizations. The paper focuses on making this process more user-friendly and efficient by developing a snippet library, delivered as a JupyterLab extension, which provides a wide selection of pre-composed, commonly used PyMOL commands. These commands can be selected via hierarchical pull-down menus rather than typed from memory. The article also discusses the benefits of this approach more generally.

Aaron Watters describes a widget that can display 3D objects using WebGL, while the back-end processes the scene using a data visualization pipeline. In this case, the front-end takes advantage of the client GPU to render the widget, while the back-end takes advantage of whatever computing resources are accessible to Python.

The articles for this special issue were all invited submissions, in most cases from selected presentations given at JupyterCon in October 2020. Each article was reviewed by three independent reviewers. The guest editors are grateful to Ryan Abernathey, Luca de Alfaro, Hannah Bruce MacDonald, Christopher Cave-Ayland, Mike Croucher, Marco Della Vedova, Michael Donahue, Vidar Fauske, Jeremy Frey, Konrad Hinsen, Alistair Miles, Arik Mitschang, Blaine Mooers, Samual Munday, Chelsea Parlett, Prabhu Ramachandran, John Readey, Petr Škoda and James Tocknell for their work as reviewers, along with other reviewers who preferred not to be named. The article by Brian Granger and Fernando Perez was invited by the editor in chief and reviewed by the editors of this special issue.

Hans Fangohr heads the Computational Science group at the Max Planck Institute for the Structure and Dynamics of Matter in Hamburg, Germany, and is a Professor of Computational Modelling at the University of Southampton, UK. A physicist by training, he received his PhD in Computer Science in 2002. He has authored more than 150 scientific articles in computational science and materials modelling, several open source software projects, and a textbook on Python for Computational Science and Engineering. Contact him at [email protected].

Thomas Kluyver is a software engineer at European XFEL. Since gaining a PhD in plant sciences from the University of Sheffield in 2013, he has been involved in various parts of the open source and scientific computing ecosystems, including the Jupyter and IPython projects. Contact him at [email protected].

Massimo Di Pierro is a Professor of Computer Science at DePaul University. He has a PhD in Theoretical Physics from the University of Southampton and is an expert in numerical algorithms, high performance computing, and machine learning. He is the lead developer of many open source projects, including web2py, py4web, and pydal. He has authored more than 70 articles in physics, computer science, and finance, and has published three books. Contact him at [email protected].