Open Source Molecular Modeling
The success of molecular modeling and computational chemistry efforts is, by definition, dependent on quality software applications. Open source software development provides many advantages to users of modeling applications, not the least of which is that the software is free and completely extendable. In this review we categorize, enumerate, and describe available open source software packages for molecular modeling and computational chemistry.
Free and open source software (FOSS) is software that is considered both “free software,” as defined by the Free Software Foundation (http://fsf.org), and “open source,” as defined by the Open Source Initiative (http://opensource.org). The distinctions between free and open source software are largely philosophical: the free software movement is primarily motivated by user freedoms (“free as in speech, not free as in beer”), while the open source movement is more concerned with promoting an open development model to enhance software quality. However, especially with regard to scientific software, such distinctions have little practical consequence, as the most popular software licenses are both free and open source.
The unifying theme of open source software licenses is that they allow users to use, modify, and distribute software without significant restrictions. This is achieved by making the full source code of the software available to users. Broadly speaking, open source licenses fall into two categories: permissive and copyleft. Permissive licenses, such as the Apache, BSD, MIT, and Python licenses, place minimal restrictions on how modified code may be distributed, such as requiring attribution and limiting liability. They specifically do not require that redistributions of modified source code be licensed under the same license as the original source code. This enables source code licensed under a permissive license to be incorporated into commercial, proprietary programs that are not open source. In contrast, copyleft licenses, such as the different versions of the GNU General Public License (GPL), require that public redistributions of licensed software remain licensed under the GPL. That is, the source code must remain publicly and freely available. The GNU Lesser General Public License (LGPL) is a slightly less restrictive version of the GPL used primarily for libraries, as it does not require software that uses LGPL-licensed software as a library to be licensed under the LGPL. Copyleft licenses do not prohibit selling software, but because the full source code must remain freely available, in practice vendors of copylefted software must commercialize support for the product rather than the product itself. Finally, we note that there are other software licenses that make source code available but are not open source licenses. These licenses typically prohibit the redistribution of the source code. Such licenses, which we will refer to as “source-available” licenses, have some popularity in academia, as they allow source code to be distributed to other researchers in non-profit institutions while still allowing the code to be sold to commercial entities.
The value of open source software in cheminformatics and molecular modeling is somewhat controversial. Unsurprisingly, those affiliated with commercial scientific software argue that traditional commercial development, with its associated support and continuous development, provides a superior value (Krylov 2015), while open source advocates feel the benefits outweigh the burdens (Gezelter 2015, Jacob 2016). Our goal is not to revisit these arguments. Instead, we assert that open source scientific software is a de facto part of the scientific community, and so in this review we catalog those open source packages that fall within the domain of cheminformatics and molecular modeling.
There are a few aspects of the open source software debate that we find particularly relevant. First, opponents are right to point out that free software is not free: users of open source software generally take on a much greater burden in supporting the software than users of commercial software do. This is one reason why it is important, when possible, to seek open source software that is under active development and supported by a broad community. Therefore, in this review we attempt to quantify the current level of development and usage of each package as an indirect measure of quality and usability. Second, the primary advantage of open source software is the ability to redistribute code without significant restriction. This inherently enables reproducibility and lets scientists “stand on the shoulders of giants” instead of reinventing the wheel. Consequently, in this review we limit ourselves to a survey of true open source software and exclude source-available software that may place restrictions on the publication of reproducible research results.