Jan Jensen edited section_Introduction_Semiempirical_electronic_structure__.tex  over 8 years ago

Commit id: 98fb1e45a45a7b8b8517f4daafd0915c7bb971ae


\section{Introduction}
Semiempirical electronic structure methods are increasingly parameterized and benchmarked against data obtained by DFT or wavefunction-based calculations rather than experimental data \cite{Stewart_2007, OM3, Gaus_2013}. Calculated data have the advantage that they represent precisely the quantity being parameterized (usually the electronic energy), with little random noise and good coverage of chemical space, including molecules that are difficult to synthesize or to perform measurements on. Carefully curated benchmark sets, such as GMTKN30 \cite{Goerigk_2011}, are therefore heavily used and an invaluable resource to the scientific community. For example, Korth and Thiel \cite{Korth_2011} used the GMTKN24-hcno data set (21 subsets of the GMTKN24 data set \cite{Goerigk_2010}, an earlier version of GMTKN30) to show that modern semiempirical methods are approaching the accuracy of PBE/TZVP and B3LYP/TZVP calculations. While this is encouraging, one concern is whether the results obtained for the small systems that make up these data sets are representative of those one would obtain for large systems. For example, Yilmazer and Korth \cite{Yilmazer_2013} performed a benchmark study of hundreds of protein-ligand complexes that included protein atoms within up to 10 Å of the ligand and showed that the mean absolute deviation (MAD) between interaction energies computed using PM6-DH+ and BP86-D2/TZVP was 14 kcal/mol. In comparison, the MADs for the S22 interaction energy subset of GMTKN24 are $<2$ kcal/mol for both dispersion-corrected PM6 and DFT/TZVP calculations in the Korth and Thiel study \cite{Korth_2011}. One likely explanation is that the systems in the S22 subset are too small to exhibit the many-body polarization contributions to the binding energy that semiempirical methods fail to capture.
An alternative or additional explanation is that the S22 subset does not include ionic groups, which are quite common in proteins and ligands.