What is in the Astronomer Software Tool Stack?

In this section we consider the most common software tools for professional astronomers. We refer to the full set of software tools an astronomer uses as their “stack”. In the survey form we suggested 19 software tools and allowed participants to add any options we missed. The input was edited to standardize the spelling and capitalization of tool names. In total, participants added 64 custom options. Ten respondents did not provide an answer to this question. While “C” was an option, “C++” was not part of our suggestions. Some participants noted in the comments that they chose “C” even though they actually use “C++”. For this reason we consider C and C++ together in our analysis. Within the top 20 most used software tools there are four items that were not on our original list: C++, Mathematica, gnuplot and awk.
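The clean-up described above (standardizing spelling and capitalization, and counting C and C++ together) can be sketched as follows. This is a minimal illustration and not the procedure actually used for the survey: the alias table, the function names `normalize` and `count_users`, and the toy responses are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical alias table: lower-cased raw answer -> canonical tool name.
# "c" and "c++" map to the same label so they are analysed together.
ALIASES = {
    "python": "Python",
    "idl": "IDL",
    "c": "C/C++",
    "c++": "C/C++",
    "gnuplot": "gnuplot",
    "awk": "awk",
}


def normalize(raw):
    """Return the canonical name for one free-text answer."""
    key = raw.strip().lower()
    # Unknown entries are kept (whitespace-stripped) rather than dropped.
    return ALIASES.get(key, raw.strip())


def count_users(responses):
    """Count respondents per tool; a respondent counts once per tool."""
    counts = Counter()
    for answers in responses:
        # A set removes duplicates, e.g. someone listing both "C" and "c++".
        counts.update({normalize(a) for a in answers})
    return counts


# Toy input for illustration only, not survey data.
responses = [["python", "C"], ["Python ", "c++", "C"], ["IDL", "Gnuplot"]]
counts = count_users(responses)
print(counts)
```

Deduplicating per respondent matters once C and C++ are merged: a participant who listed both should still count as a single C/C++ user.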

The overall astronomer stack is rather narrow (Figure \ref{fig:stack1}, first panel). Only ten of the software tools are used by more than 10% of the survey participants. These are (from most popular to least popular): Python, shell scripting, IDL, C/C++, Fortran, IRAF, spreadsheets, HTML/CSS, SQL and Supermongo. Across all participants the most common programming language is Python (\(67\pm2\%\)), followed by IDL (\(44\pm2\%\)), C/C++ (\(37\pm2\%\)) and Fortran (\(28\pm2\%\)). Shell scripting is the second most popular tool for astronomers (\(47\pm2\%\)). The IRAF (Image Reduction and Analysis Facility) environment is used by \(24\pm1\%\) of the survey participants.
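This section does not state how the quoted uncertainties were derived. As a rough sketch, assuming they are simple binomial standard errors on the fraction of users, a usage fraction and its uncertainty could be computed as below. The function name `fraction_with_error` and the counts are hypothetical and are not taken from the survey.

```python
import math


def fraction_with_error(n_users, n_total):
    """Return (fraction, standard error) in percent under a binomial model."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_users / n_total
    # Binomial standard error of a proportion (an assumption, see lead-in).
    se = math.sqrt(p * (1.0 - p) / n_total)
    return 100.0 * p, 100.0 * se


# Toy counts for illustration only, not survey data.
toy_counts = {"Tool A": 400, "Tool B": 120, "Tool C": 30}
n_total = 600

for tool, n in sorted(toy_counts.items(), key=lambda kv: -kv[1]):
    pct, err = fraction_with_error(n, n_total)
    flag = "above" if pct > 10.0 else "below"
    print(f"{tool}: {pct:.0f} +/- {err:.0f}% ({flag} the 10% threshold)")
```

The simple binomial formula becomes unreliable for fractions close to 0% or 100% and for small subsamples, where an interval such as Wilson's would be a safer choice.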

Across the different career stages, we notice that senior astronomers have a broader tool stack, i.e. they utilize a wider variety of tools in their research. Only eight tools are used by more than 10% of graduate students, nine tools are used by more than 10% of postdocs and 11 tools are used by more than 10% of faculty and scientists. Python is the most popular tool at all career levels, and it is most popular among junior researchers. Four out of five graduate students use Python (\(80\pm5\%\)), as do \(70\pm5\%\) of postdocs and half of faculty and scientists (\(53\pm4\%\)). IDL, IRAF and compiled languages have a more uniform user base across all career levels. Some tools are concentrated in certain demographics. Graduate students have the highest fraction of Matlab users (\(11\%\)), while faculty and research scientists have the highest fractions of HTML/CSS (\(21\%\)), Supermongo (\(16\%\)) and Perl (\(16\%\)) users.

Unsurprisingly, software tools depend strongly on the research area (Figure \ref{fig:stack2}). Without attempting to be exhaustive, we note some interesting differences between fields. Observational astronomers have the highest fractions of IDL (\(48\pm2\%\)) and IRAF (\(31\pm2\%\)) users. Theoretical researchers have the highest fractions of compiled language users: C/C++ with \(56\pm4\%\) and Fortran with \(50\pm4\%\). Researchers in instrumentation have a high fraction of C/C++ (\(52\pm6\%\)) and spreadsheet (\(28\pm5\%\)) users. Other tools, however, show little field-to-field variation. Python use is consistently high across all fields at 60 to 70%, as is shell scripting at \(\sim50\%\).

Finally, in Figure \ref{fig:stack3} we consider the software stack for researchers in different countries. Researchers in the USA have the highest fractions of IDL (\(49\pm3\%\)) and IRAF (\(25\pm2\%\)) users, while Australia has the lowest fractions of users of these tools (\(32\pm7\%\) and \(12\pm4\%\) for IDL and IRAF, respectively). The UK has the highest fraction of SQL users (\(21\pm5\%\)); Germany has the highest fraction of C/C++ users (\(48\pm5\%\)); and Australia has the highest fraction of Matlab users (\(13\pm4\%\)). However, these results can be strongly influenced by the research areas represented for each country within our sample, so we caution against drawing far-reaching conclusions.

We can also compare the USA and non-USA survey respondents, since those two samples are comparable in size (Figure \ref{fig:stack3}, second and sixth panels). Overall, the rankings and fractions of users of different tools are very similar, as can be expected given the global mobility of many astronomers. The only notable exceptions are IDL and R. The fraction of IDL users in the USA is 10 percentage points higher than among non-USA participants. For the statistical package R the situation is reversed: \(8\pm1\%\) of non-USA researchers chose this option vs. only \(3\pm1\%\) of USA researchers. Considering the widespread use of R in other scientific fields, its popularity among astronomers is strikingly low.
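To gauge how significant the difference in R usage is, the two quoted fractions can be compared directly. The sketch below assumes that the two samples are independent and that the quoted uncertainties are Gaussian one-sigma errors; neither assumption is stated in the text, and the helper name `difference_significance` is introduced here for illustration.

```python
import math


def difference_significance(a, err_a, b, err_b):
    """Difference of two independent measurements and its significance."""
    diff = a - b
    # Independent errors add in quadrature (an assumption, see lead-in).
    err = math.sqrt(err_a**2 + err_b**2)
    return diff, err, diff / err


# R users, in percent: non-USA (8 +/- 1) vs. USA (3 +/- 1), as quoted above.
diff, err, nsigma = difference_significance(8.0, 1.0, 3.0, 1.0)
print(f"difference = {diff:.0f} +/- {err:.1f} percentage points "
      f"({nsigma:.1f} sigma)")
```

Under these assumptions the difference is \(5\pm1.4\) percentage points, or roughly \(3.5\sigma\). Because the quoted errors are rounded to whole percent, this figure is only indicative.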