Images were developed for ChemML\cite{Haghighatlari_2019,Hachmann_2018} and ANI\cite{Smith2017}, which represent two very different classes of machine learning models. The illustrated model, developed with the ChemML package, takes in a SMILES string and predicts the corresponding properties, which are output as key-value pairs. ANI behaves more like a typical quantum chemistry code: it takes in a 3D chemical structure, can perform an energy calculation, geometry optimization, or frequency calculation, and outputs a (potentially updated) 3D chemical structure. It does not require as much input as is typical for other codes, and it does not output electronic structure. Both required further generalization of our treatment of tasks to accommodate their unique requirements, and both added examples of quite distinct types of model, paving the way to add more using similar patterns.
ChemML is a general-purpose machine learning package developed in Python for the analysis and modeling of chemical and materials data. Although ChemML provides essential tools to develop and train machine learning models in general, we aim to deploy trained models and broaden their utilization in the chemistry community. Our long-term goal is to compile a collection of predictive models for certain properties that can be used as alternatives to corresponding physics-based modeling or simulation approaches. To start, we developed a deep learning model for prediction of the refractive index values of organic compounds and made it available as a Docker container.
The model takes as input the SMILES string of an organic molecule and returns a few quantities of interest on which it has been trained. The predictions are available almost instantaneously, unlike those of a physics-based model that uses the Lorentz-Lorenz equation parametrized by inputs from quantum chemistry and molecular dynamics calculations. The results of the model are comparable with those of other data-derived prediction models in terms of the diversity of molecular candidates and the accuracy of predictions. The current implementation enables the user to retrain each model for better or more general predictive power. Moreover, a trained model can leverage other relevant ML models through transfer learning. Our near-term plans for future development include the addition of models trained on other data sets and for other material properties.
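For context, the physics-based route that the data-derived model replaces can be sketched in a few lines. The example below is only an illustration of the Lorentz-Lorenz relation in Gaussian units, $(n^2-1)/(n^2+2) = (4\pi/3) N \alpha$, solved for the refractive index $n$; the function name \texttt{refractive\_index} and its argument names are our own and are not part of ChemML. In the physics-based workflow, the polarizability $\alpha$ would come from a quantum chemistry calculation and the number density $N$ from a molecular dynamics simulation.

```python
import math


def refractive_index(polarizability, number_density):
    """Solve the Lorentz-Lorenz relation for the refractive index n.

    Gaussian units are assumed: polarizability as a volume (e.g. in A^3)
    and number density in the matching inverse volume (e.g. in A^-3).
    The name and signature are illustrative, not a ChemML API.
    """
    # Dimensionless Lorentz-Lorenz term: (n^2 - 1) / (n^2 + 2) = phi
    phi = 4.0 * math.pi / 3.0 * number_density * polarizability
    if not 0.0 <= phi < 1.0:
        raise ValueError("Lorentz-Lorenz term must lie in [0, 1)")
    # Rearranged for n: n^2 = (1 + 2 * phi) / (1 - phi)
    return math.sqrt((1.0 + 2.0 * phi) / (1.0 - phi))
```

As a rough sanity check, approximate textbook values for liquid water ($\alpha \approx 1.45$~\AA$^3$, $N \approx 0.0334$~\AA$^{-3}$) give a refractive index close to 1.33. The cost of this route lies entirely in obtaining $\alpha$ and $N$, which is what the trained model avoids.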
The ANI image uses the PyTorch\cite{library} implementation of the ANI potentials, TorchANI\cite{Gao2020}. We make use of the algorithms in the Atomic Simulation Environment (ASE) to drive tasks such as geometry optimization and normal mode calculations. The TorchANI image features the ANI-1x and ANI-1ccx optimized potentials, which can be used to generate single-point energies, perform geometry optimizations, and compute Hessians for frequency calculations. This makes the ANI image look quite similar to the quantum chemistry codes, except that no basis set selection is necessary and only a limited set of chemical elements is supported.
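The way ASE can drive a TorchANI potential is sketched below. This is a minimal illustration and not the code shipped in the image: the function name \texttt{optimize\_with\_ani}, the choice of the BFGS optimizer, and the default force threshold are our assumptions. The third-party packages are imported inside the function so that the model name can be validated without TorchANI or ASE being installed.

```python
# Potentials named in the text, as exposed in torchani.models
ANI_MODELS = ("ANI1x", "ANI1ccx")


def optimize_with_ani(atoms, model_name="ANI1ccx", fmax=0.05):
    """Relax an ase.Atoms object in place and return its energy in eV.

    Illustrative helper only; optimizer and threshold are assumptions.
    """
    if model_name not in ANI_MODELS:
        raise ValueError(f"unknown ANI model: {model_name!r}")

    # Imported lazily: both packages are optional, heavy dependencies
    import torchani
    from ase.optimize import BFGS

    # Build the potential and wrap it as an ASE calculator
    model = getattr(torchani.models, model_name)()
    atoms.calc = model.ase()

    # Let ASE drive the geometry optimization until forces fall below fmax
    BFGS(atoms, logfile=None).run(fmax=fmax)
    return atoms.get_potential_energy()


# Example usage (requires torchani and ase to be installed):
#   from ase.build import molecule
#   energy = optimize_with_ani(molecule("CH4"))
```

No basis set or method keyword appears anywhere in the call, which reflects the reduced input requirements compared with a conventional quantum chemistry code.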
For the supported elements, ANI offers superior performance compared with geometry optimizations performed using conventional quantum chemistry codes, and its potentials are often trained at a higher level of theory than would typically be used. It also offers performance comparable with molecular mechanics codes without the need for atom typing and other parameterization. This often makes an ANI geometry optimization a desirable first step before using other codes, provided the chemical structure contains only supported elements. It is also possible to run other techniques and compare their output with that of ANI in order to validate its use for particular chemical systems.
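Because the element restriction decides whether such a pre-optimization is possible at all, a simple check can be run before dispatching a structure to the ANI image. The sketch below assumes the H, C, N, O coverage of the ANI-1x and ANI-1ccx potentials; the constant and function names are hypothetical and do not correspond to any TorchANI API.

```python
# Elements covered by ANI-1x and ANI-1ccx (assumed here: H, C, N, O)
ANI1_ELEMENTS = frozenset({"H", "C", "N", "O"})


def unsupported_elements(symbols, supported=ANI1_ELEMENTS):
    """Return the sorted element symbols that the potential cannot handle.

    An empty list means the structure can be passed to the ANI potential.
    """
    return sorted(set(symbols) - supported)
```

For example, \texttt{unsupported\_elements(["C", "H", "H", "H", "Cl"])} returns \texttt{["Cl"]}, signalling that the structure should be routed to a different code.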