Authorea

David Koes edited subsection_Shape_Indexing_and_Similarity__.tex over 8 years ago

Commit id: 527d160e80f22cb6792e3734f3a78079730c2bc4

deletions | additions

As with VAMS\cite{VAMS}, we use a matching and packing\cite{Koes_2014} bulk-loading algorithm to initialize an efficient data structure for volumetric shape constraint searches. Briefly, shapes are stored in a GSS-tree\cite{keim1999} where each leaf of the tree is a single molecular shape and each internal node includes a maximum included volume (MIV) and minimum surrounding volume (MSV). The MSV is the union of all the molecular shapes beneath the node in the tree while the MIV is the intersection. By appropriately applying minimum and maximum shape constraints to the MIV and MSV, it can be determined if any of the shapes lower in the tree have the potential to match the constraints. If not, the entire subtree of molecular shapes can be eliminated from consideration, resulting in a sub-linear running time. Shape %Shape constraints combined with shape indexing provide a rapid way to filter a virtual library. Alternatively, %Alternatively, instead of serving as hard constraints, they can also be used to rank molecular shapes by similarity to the shape constraint query. We use the shape Tanimoto\cite{RushIII2005} to compute the similarity of two shapes: $$\delta(A,B) %$$\delta(A,B) = \frac{A \cap B}{A \cup B}$$ where %where a larger score indicates a greater degree of similarity. The %The similarity of a shape, $A$, with the minimum, $MIN$, and maximum, $MAX$, shape constraints is computed by combining their shape Tanimotos: $$\delta(A,MIN,MAX) %$$\delta(A,MIN,MAX) = \delta(A,MIN) + \delta(A,MAX) = \frac{A \cap I}{A MIN}{A \cup I} MIN} + \frac{A \cap \overline{E}}{A MAX}{A \cup \overline{E} MAX }$$The closer a shape is to meeting the included constraint, the larger the value of $\delta(A,I)$, while the more a shape violates the excluded constraint, the smaller the value of $\delta(A,\overline{E})$. The more an shape exceeds the included constraint, the more it is penalized by the $\delta(A,I)$ term, but, to the extent that its volume avoid the conflicting with the excluded shape constraint, it is rewarded by the $\delta(A,\overline{E})$ term.