ChemML is a machine learning package developed in Python for the analysis and modeling of chemical and materials data. Beyond adding simple property prediction from SMILES strings, we aim to make these techniques more accessible and to broaden their dissemination in the chemistry community. The focus of the current work has been to compile a collection of trained models for selected materials properties that can serve as alternatives to the corresponding physics-based modeling or simulation approaches. As a proof of principle, we designed and implemented a deep learning model that predicts the refractive index of organic compounds and made it available as a Docker container.
The model takes the SMILES string of an organic molecule as input and returns the quantities of interest on which it was trained. Predictions are available almost instantaneously, unlike those of a physics-based model that evaluates the Lorentz-Lorenz equation with inputs from quantum chemistry and molecular dynamics calculations. The results are comparable with those of other data-derived prediction models in terms of the diversity of molecular candidates and the accuracy of predictions. The current implementation allows the user to retrain each model for better or more generalizable predictive power. Moreover, a trained model can leverage other relevant ML models through transfer learning. Our near-term plans include adding models trained on other data sets and for other materials properties.
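For context, the physics-based route mentioned above can be sketched in a few lines. The Lorentz-Lorenz relation, in Gaussian units, reads (n^2 - 1)/(n^2 + 2) = (4*pi/3) N alpha, where alpha is the molecular polarizability volume and N the number density. The sketch below assumes that alpha would come from a quantum chemistry calculation and N from a molecular dynamics simulation; the function name and the numerical inputs are illustrative and not part of ChemML.

```python
import math


def lorentz_lorenz_refractive_index(alpha_A3: float, number_density_A3: float) -> float:
    """Solve (n^2 - 1)/(n^2 + 2) = (4*pi/3) * N * alpha for n.

    alpha_A3          -- polarizability volume in cubic angstroms (hypothetical QC output)
    number_density_A3 -- molecules per cubic angstrom (hypothetical MD output)
    """
    x = 4.0 * math.pi / 3.0 * number_density_A3 * alpha_A3
    if not 0.0 <= x < 1.0:
        raise ValueError("N * alpha is outside the range where the relation can be solved")
    # Rearranging the relation gives n^2 = (1 + 2x) / (1 - x).
    return math.sqrt((1.0 + 2.0 * x) / (1.0 - x))


# Illustrative inputs only: approximate values for liquid water,
# alpha of about 1.45 cubic angstroms and N of about 0.0334 per cubic angstrom.
n_water = lorentz_lorenz_refractive_index(1.45, 0.0334)
```

The expensive part of this route is obtaining alpha and N, not evaluating the formula, which is why a trained model that maps a SMILES string directly to the refractive index can return results almost instantaneously.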
The ANI image uses TorchANI, the PyTorch implementation of the ANI potentials. We make use of the algorithms in ASE to drive tasks such as geometry optimization and normal mode calculations. The TorchANI image features the ani-1x and ani-1ccx potentials, which can be used to compute single-point energies, perform geometry optimizations, and compute Hessians for frequency calculations. This makes the ANI image look quite similar to the quantum chemistry codes, except that no basis set selection is necessary and only a limited set of chemical elements is supported.
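As an illustration of how TorchANI and ASE fit together, the following is a minimal sketch of a geometry optimization driven by ASE with the ani-1ccx potential as the calculator. It is not the code shipped in the image; the function name `optimize_with_ani` and the convergence threshold are assumptions made for this example, and the first call to `ANI1ccx()` may need to download the model parameters.

```python
def optimize_with_ani(atoms, fmax=0.05):
    """Relax an ASE Atoms object with ANI-1ccx and return its energy in eV.

    atoms -- an ase.Atoms object containing only elements the potential supports
    fmax  -- force convergence threshold in eV/angstrom (illustrative default)
    """
    # Imported inside the function so that the heavy dependencies are only
    # required when an optimization is actually requested.
    import torchani
    from ase.optimize import BFGS

    # TorchANI exposes its built-in models as ASE calculators via .ase().
    atoms.calc = torchani.models.ANI1ccx().ase()

    # ASE's BFGS optimizer updates the atomic positions in place.
    BFGS(atoms, logfile=None).run(fmax=fmax)
    return atoms.get_potential_energy()
```

With both packages installed, calling `optimize_with_ani(ase.build.molecule("CH4"))` would relax methane. No basis set or level of theory is chosen anywhere; selecting the model class takes the place of both.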
For the supported elements, ANI offers superior performance to geometry optimizations performed with conventional quantum chemistry codes, often at higher levels of theory. It also offers performance comparable with molecular mechanics codes, without the need for atom typing or other parameterization. This often makes an ANI geometry optimization a desirable step before using other codes, provided the chemical structure contains only supported elements. It is also possible to run other techniques and compare their output in order to validate the use of ANI for particular chemical systems.
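Because the element restriction decides whether ANI can be used at all, a workflow might check a structure before handing it to the image. The sketch below assumes the element set of the ani-1x and ani-1ccx potentials (H, C, N, and O); the helper name is hypothetical and not part of TorchANI or the image.

```python
# Elements covered by the ani-1x and ani-1ccx potentials.
ANI_SUPPORTED_ELEMENTS = frozenset({"H", "C", "N", "O"})


def unsupported_elements(symbols):
    """Return the sorted element symbols in `symbols` that ANI cannot handle."""
    return sorted(set(symbols) - ANI_SUPPORTED_ELEMENTS)


# Methane uses only supported elements, so the result is empty.
methane_missing = unsupported_elements(["C", "H", "H", "H", "H"])

# A structure containing chlorine would need a different method.
chloromethane_missing = unsupported_elements(["C", "Cl", "H", "H", "H"])
```

An empty result means the structure is a candidate for an ANI pre-optimization; anything else signals that a conventional quantum chemistry or molecular mechanics code is needed instead.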