Compound Libraries

Unique to Pharmit is the ability to select from a number of provided compound libraries or to submit a custom library for screening. The library to screen is selected through a pull-down menu in the search button (see Figure~\ref{pharmfig}).

Provided Libraries

Large libraries corresponding to compound catalogs from a variety of sources are provided and periodically updated to ensure continued relevance, especially with regard to compound availability from commercial sources. Currently, Pharmit has pre-built libraries generated from ChEMBL21 \cite{Gaulton_2011}, with \(>1.4\) million compounds; ChemDiv (www.chemdiv.com), with \(>1.4\) million compounds; MolPort (www.molport.com), with \(>6.5\) million compounds; the NCI Open Chemical Repository (dtp.cancer.gov), with \(>108{,}000\) compounds; and PubChem \cite{Kim_2015}, with \(>66\) million compounds. Although a search is limited to the compounds of the selected library, all compounds within these provided libraries are cross-annotated. For example, a compound found by searching the commercial MolPort library can be looked up in PubChem to check for known bioactivities.
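One way to picture cross-annotation is as an index from a shared structure key to the identifiers a compound carries in each library. The sketch below is only an illustration of that idea and is not a description of how Pharmit stores its annotations: the function name `build_cross_index`, the use of a SMILES string as the structure key, and all compound identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def build_cross_index(libraries):
    """Map each structure key to a {library name: compound ID} dictionary.

    `libraries` maps a library name to a list of (compound_id, structure_key)
    pairs. The structure key is assumed to be a canonical SMILES string.
    """
    index = defaultdict(dict)
    for library_name, records in libraries.items():
        for compound_id, structure_key in records:
            index[structure_key][library_name] = compound_id
    return index


# Hypothetical placeholder records; these are not real catalog identifiers.
libraries = {
    "MolPort": [("molport-placeholder-1", "Oc1ccccc1")],
    "PubChem": [
        ("pubchem-placeholder-1", "Oc1ccccc1"),
        ("pubchem-placeholder-2", "CCO"),
    ],
}

index = build_cross_index(libraries)

# A hit from the MolPort library can be followed to its PubChem record.
hit_key = "Oc1ccccc1"
pubchem_id = index[hit_key].get("PubChem")
```

A compound that appears in only one library simply has a single entry under its key, so `index["CCO"].get("MolPort")` yields `None` in this example.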

Library Creation

Users may submit their own libraries for screening. In keeping with the open-access, open-source nature of Pharmit, users are encouraged to make their submitted libraries publicly accessible, in which case they are available to all users for screening as user-contributed libraries. Registered users may instead create a private library, and may also remove or update previously submitted libraries.

To create a library, compounds may be provided in either the two-dimensional SMILES format or the three-dimensional SDF format. If the user uploads compounds in the SMILES format, duplicated canonical SMILES are removed, the molecules are protonated with OpenBabel \cite{O_Boyle_2011} using its default settings, and only the largest component of each molecule is retained (e.g., salts are removed). RDKit (rdkit.org) and the UFF force field \cite{Rappe_1992} are then used to generate up to 10 3D conformers for each compound resulting from this procedure. This approach has been shown to generate high-quality conformations \cite{Ebejer_2012}. Alternatively, if the user provides compounds in the SDF format, the provided structures are assumed to be valid conformers and are used directly, with protonation states determined by OpenBabel.
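The string-level part of the SMILES preparation (duplicate removal and retention of the largest component) can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not Pharmit's implementation: the input SMILES are assumed to be already canonical, component size is approximated by counting atom tokens with a regular expression, and the names `heavy_atom_count`, `largest_component`, and `prepare_library` are hypothetical. Protonation with OpenBabel and conformer generation with RDKit are not shown.

```python
import re

# Approximate SMILES atom tokenizer: bracket atoms, the two-letter halogens,
# then single-letter organic-subset atoms (aliphatic and aromatic).
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(fragment):
    """Approximate number of atoms in one SMILES fragment (a heuristic)."""
    return len(_ATOM.findall(fragment))


def largest_component(smiles):
    """Return the dot-separated fragment with the most atoms.

    This drops counterions and other small components such as salts.
    On a tie, the first fragment is kept.
    """
    return max(smiles.split("."), key=heavy_atom_count)


def prepare_library(smiles_list):
    """Remove duplicate input SMILES, then keep each largest component.

    Assumes the input strings are already canonical, so exact string
    comparison is enough to detect duplicates. Order of first appearance
    is preserved.
    """
    seen = set()
    prepared = []
    for smiles in smiles_list:
        smiles = smiles.strip()
        if not smiles or smiles in seen:
            continue
        seen.add(smiles)
        prepared.append(largest_component(smiles))
    return prepared


submitted = ["CC(=O)[O-].[Na+]", "c1ccccc1", "c1ccccc1", "CCO.Cl"]
library = prepare_library(submitted)
```

Here the sodium and chloride components are discarded and the repeated benzene entry is kept only once. A production pipeline would use a cheminformatics toolkit for canonicalization and fragment handling rather than string heuristics.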