Automatic Scanning of the PDB Databank
As the number of high-throughput computational methods grows, and
because PyFREC provides a means for rapid screening of excited-state
resonances, electronic couplings, and quantum dynamics simulations, a
tool for automatic extraction of structural information is useful. The
Protein Data Bank (PDB)19 provides a convenient
interface for such operations. Employing urllib2,35 PyFREC automatically downloads and
parses PDB files based on a user-provided PDB ID list. PyFREC then
analyzes the downloaded PDB structures (e.g., identifies chlorophyll
pigments inside PDB files) in order to compute electronic couplings
between the selected fragments. Currently, the identification of
pigments is based on chemical structure and topology of chemical bonds
(e.g., the central Mg atom surrounded by nitrogen and oxygen atoms at
particular distances).2 In the future,
machine-learning algorithms (see below) will be used for this analysis.
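
The download-and-identify workflow described above can be sketched as follows. This is a minimal illustration, not PyFREC's actual code: the function names, the RCSB download URL, the 2.5 Å cutoff, and the requirement of four nitrogen neighbors are all assumptions made for the example, and the sketch uses `urllib.request`, the Python 3 successor of urllib2. Only the Mg-N part of the criterion is shown; oxygen neighbors are ignored for brevity.

```python
import math
import urllib.request

# Assumed download endpoint (RCSB file server); the text does not name one.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def fetch_pdb(pdb_id, timeout=30):
    """Download one PDB entry and return its contents as text."""
    url = RCSB_URL.format(pdb_id=pdb_id.upper())
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append({
            "serial": int(line[6:11]),
            "name": line[12:16].strip(),
            "resname": line[17:20].strip(),
            "chain": line[21],
            "xyz": (float(line[30:38]), float(line[38:46]),
                    float(line[46:54])),
            # Element symbol column; empty string if the column is absent.
            "element": line[76:78].strip().upper(),
        })
    return atoms


def find_pigment_centers(atoms, cutoff=2.5, min_nitrogens=4):
    """Return Mg atoms with at least `min_nitrogens` N atoms within `cutoff` (in Å).

    Both thresholds are illustrative placeholders, not PyFREC's values.
    """
    nitrogens = [a for a in atoms if a["element"] == "N"]
    centers = []
    for mg in (a for a in atoms if a["element"] == "MG"):
        close = [n for n in nitrogens
                 if math.dist(mg["xyz"], n["xyz"]) <= cutoff]
        if len(close) >= min_nitrogens:
            centers.append(mg)
    return centers
```

With network access, `find_pigment_centers(parse_atoms(fetch_pdb(pdb_id)))` would be called for each entry of the user-provided PDB ID list. The brute-force distance search is adequate for a sketch; a spatial index would be preferable for large structures.
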
Processing of multiple molecular structures (e.g., proteins from the PDB
databank) produces datasets that can be interpreted and analyzed using
network (graph) theory. For example, electronic couplings or
orientation factors that characterize interactions between pigments and
affect the exciton energy transfer can be rationalized in terms of
network theory. PyFREC employs the NetworkX library36 to generate and analyze networks. Various
properties of the network are computed, including average shortest path
length, average clustering coefficient, and current-flow closeness
centrality.
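
The network properties listed above can be computed with standard NetworkX calls, as in the following sketch. It is an illustration under stated assumptions rather than PyFREC's implementation: the helper names, the dictionary of pairwise couplings, and the coupling threshold used to decide which pigment pairs are connected are all hypothetical.

```python
import networkx as nx


def build_coupling_network(couplings, threshold=0.0):
    """Build an undirected graph from pairwise couplings.

    `couplings` maps (pigment_i, pigment_j) to a coupling value; a pair is
    connected if |coupling| >= threshold (a hypothetical criterion).
    """
    graph = nx.Graph()
    for (i, j), value in couplings.items():
        if abs(value) >= threshold:
            graph.add_edge(i, j, weight=abs(value))
    return graph


def path_and_clustering(graph):
    """Return (average shortest path length, average clustering coefficient).

    Both are computed on the unweighted topology. The path length is only
    defined for a connected graph; NetworkX raises an error otherwise.
    """
    return (nx.average_shortest_path_length(graph),
            nx.average_clustering(graph))


def current_flow_closeness(graph):
    """Current-flow closeness centrality, with couplings as edge weights.

    Requires a connected graph and the NumPy/SciPy stack.
    """
    return nx.current_flow_closeness_centrality(graph, weight="weight")
```

For example, with made-up couplings `{("a", "b"): 100.0, ("b", "c"): 50.0, ("a", "c"): 25.0, ("c", "d"): 10.0}` and a threshold of 5.0, the graph has four nodes and four edges, and `path_and_clustering` operates on that topology. Keeping the current-flow calculation in its own function separates the metric that needs a linear solver from the purely combinatorial ones.
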