Jan Jensen edited section_Introduction_Semiempirical_electronic_structure__.tex  over 8 years ago

Commit id: 87aafab240edcf12648393dfa757db5538137f6d


\section{Introduction}

Semiempirical electronic structure methods are increasingly parameterized and benchmarked against data obtained from DFT or wavefunction-based calculations rather than against experimental data (PM6, PM6-DH+, DFTB3, von Lilienfeld). Calculated data have the advantage that they represent precisely the quantity being parameterized (usually the electronic energy), contain little random noise, and can provide good coverage of chemical space, including molecules that are difficult to synthesize or to measure. Carefully curated benchmark sets such as GMTKN30 \cite{Goerigk_2011} are therefore an invaluable and heavily used resource for the scientific community.

For example, Korth and Thiel \cite{Korth_2011} used the GMTKN24-hcno data set (21 subsets of the GMTKN24 data set \cite{Goerigk_2010}, an earlier version of GMTKN30) to show that modern semiempirical methods are approaching the accuracy of PBE/TZVP and B3LYP/TZVP calculations. While this is encouraging, one concern is whether results obtained for the small systems that make up these data sets are representative of those obtained for large systems. For example, Yilmazer and Korth \cite{Yilmazer_2013} performed a benchmark study of hundreds of protein-ligand complexes that included protein atoms within 10 Å of the ligand, and showed that the mean absolute deviation (MAD) between interaction energies computed using PM6-DH+ and BP86-D2/TZVP was 14 kcal/mol.

In part because of such data sets, newer semiempirical methods such as PM6, OMD3 and DFTB3 are starting to rival the accuracy of standard DFT calculations for some properties, especially when combined with dispersion and hydrogen bond corrections. For example, Korth and Thiel have shown that .. However, there are comparatively few data sets on reactivity, and most of them consist of reactions that may not be relevant to enzymatic catalysis.
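For reference, the MAD quoted above for the protein-ligand benchmark of Yilmazer and Korth is conventionally defined as in the equation below. Here $N$ is the number of complexes, and $\Delta E_i^{\mathrm{SQM}}$ and $\Delta E_i^{\mathrm{ref}}$ are the interaction energies of complex $i$ computed with the semiempirical method (e.g.\ PM6-DH+) and with the reference method (e.g.\ BP86-D2/TZVP). These symbols are introduced here for illustration only and are not taken from that study.
\begin{equation}
\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N}\left|\Delta E_i^{\mathrm{SQM}} - \Delta E_i^{\mathrm{ref}}\right|
\end{equation}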