Jan Jensen edited section_Introduction_A_large_proportion__.tex over 8 years ago

Commit id: e2f8864e8a6fc72ace81a06a25bbe2cbc5bf43af

\section{Introduction}
A large proportion of organic molecules relevant to medicine and biotechnology contain one or more ionizable groups, which means that fundamental physical and chemical properties (e.g. the charge of the molecule) depend on the pH of the surroundings via the corresponding pKa values of the molecules. As drug and material design is increasingly done through high-throughput screens, fast yet accurate computational pKa prediction methods are becoming crucial to the design process. There are several empirical pKa prediction tools (e.g. ACD pKa DB, Chemaxon, and Epik) that offer predictions in less than a second and can be used by non-experts. These methods are generally quite accurate but often fail for classes of molecules that are not found in the underlying database. \citet{Settimo_2013} have recently shown that the empirical methods are particularly prone to failure for amines (see Figure 1 for an example), which represent a large fraction of drugs currently on the market or in development. The underlying databases are not public, and it is therefore difficult to anticipate when empirical methods will fail. Furthermore, the user is generally not able to augment the databases for cases where they are found to fail, rendering the empirical methods essentially useless for certain molecular design projects. pKa values can be predicted with significantly less empiricism using electronic structure theory (QM) \cite{Ho_2014}. The accuracy of these QM-based predictions appears to rival that of the empirical approaches, but a direct comparison on a common set of molecules has not appeared in the literature, and most QM-based pKa prediction studies have focused on relatively small sets of simple benchmark molecules.
One notable exception to the latter statement is the study by \citet{Eckert_2005}, who computed the pKa by
\begin{equation} \mathrm{pK_a}=\mathrm{pK_a^{ref}} + \frac{\Delta G^\circ}{RT\ln (10)} + 1.5(N_C - 1) \end{equation}
where $\Delta G^\circ$ denotes the change in standard free energy for the proton-transfer reaction between $\mathrm{BH^+}$ and $\mathrm{B_{ref}}$ and is approximated as the sum of the electronic and solvation free energy. The term in $N_C - 1$ is an empirical correction accounting for the observation that the method systematically underestimated the pKa of secondary ($N_C = 2$) and tertiary ($N_C = 3$) amines by ca. 1 and 2 pH units, respectively. Using this approach, the pKa values of 58 drug-like molecules containing one or more ionizable N atoms could be reproduced with a root mean square deviation (RMSD) of 0.7. However, the method relies on a conformer search at the BP/TZVP level of theory, which is computationally too expensive for routine use in screening and design.

Semiempirical QM (SQM) methods are many orders of magnitude faster than conventional QM, but their application to small-molecule pKa prediction has been very limited and has focused mainly on indirect prediction using atomic charges.\textsuperscript{4,5} The most likely reason for this is that SQM methods give significantly worse pKa predictions if used with an arbitrary reference molecule such as H$_2$O (Rxn 1). However, we\textsuperscript{6} and others\textsuperscript{7,8} have shown that a judicious choice of reference molecule is a very effective way of reducing the error in pKa predictions. Here we show that this approach is the key to predicting accurate pKa values using our PM6-D3H+ SQM method\textsuperscript{9} combined with the SMD solvation method.
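
As a rough numerical illustration of the free-energy term in the pKa expression given earlier in this section, and assuming a temperature of $T = 298.15$ K (an assumption made here for illustration; the temperature is not stated in the text), the conversion factor between free energy and pKa units is approximately
\[
RT\ln(10) \approx \left(1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)\left(298.15\ \mathrm{K}\right)\left(2.303\right) \approx 1.36\ \mathrm{kcal\,mol^{-1}} .
\]
Under that assumption, an error of roughly 1.4 kcal/mol in $\Delta G^\circ$ shifts the predicted pKa by about one unit, which gives a sense of how sensitive the prediction is to errors in the computed electronic and solvation free energies.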