Figure 1 Flow chart of ACGC.
ACGC is composed of three parts: the atomic adjacent group (AAG), the shape factor (SF) and the atomic connectivity factor (ACF). The workflow of ACGC is shown in Figure 1. AAG is a systematic group-definition approach that explicitly decomposes each molecular structure into a set of non-overlapping functional groups based on the relationship between core atoms and their adjacent atoms. SF accounts for the effect of molecular shape. ACF computes topological position factors from atomic properties to describe the positions of the groups within the molecule. The model is analyzed and evaluated by external validation, internal validation and the Y-randomization test³¹.
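The Y-randomization test can be illustrated generically. The response values are shuffled, the model is refit on the scrambled data, and a model that captures a real structure-property relationship should lose most of its fit. The sketch below assumes a one-descriptor least-squares model purely for brevity (the actual ACGC model combines AAG, SF and ACF terms), and the function names, number of rounds and seed are illustrative choices, not the authors' settings.

```python
import random


def fit_r2(x, y):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the R^2 of the real model and the R^2 values of models refit on shuffled y."""
    rng = random.Random(seed)
    r2_true = fit_r2(x, y)
    r2_shuffled = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between descriptor and property
        r2_shuffled.append(fit_r2(x, y_perm))
    return r2_true, r2_shuffled


if __name__ == "__main__":
    # Toy data: a nearly linear property with a small alternating perturbation.
    x = [float(i) for i in range(20)]
    y = [2.0 * xi + 1.0 + 0.1 * (-1) ** i for i, xi in enumerate(x)]
    r2, r2_rand = y_randomization(x, y)
    print(f"real R^2 = {r2:.3f}, mean shuffled R^2 = {sum(r2_rand) / len(r2_rand):.3f}")
```

If the shuffled models reached an R² close to that of the real model, the original fit would be suspected of arising by chance.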
Atomic adjacent group (AAG)
Traditional group contribution methods require higher-order groups to achieve an accurate division, and their division rules are complicated. AAG is instead defined from atomic adjacency relationships. Atoms are classified into two types: endpoint atoms and connection atoms. An atom bonded to only one non-hydrogen atom is an endpoint atom; an atom bonded to two or more non-hydrogen atoms is a connection atom.

A group consists of a core atom, its adjacent atoms and the bond types between them, as shown in Figure 2. The core atom is a connection atom, i.e., it neighbors two or more non-hydrogen atoms; core atoms include carbon, oxygen, nitrogen, silicon, sulfur, phosphorus and so on. The atoms bonded to the core atom may be endpoint atoms or connection atoms: an endpoint atom is written in parentheses, "()", and a connection atom in square brackets, "[]". The bond type between the core atom and each adjacent atom is written before that adjacent atom. Single, double and triple bonds in linear (non-ring) structures are denoted by -, = and ≡, respectively; single, double and triple bonds in cyclic structures and aromatic bonds are denoted by ~, ≈, ≋ and ∷, respectively. Four examples illustrating the group definition rules are given in Table S1 of the Supporting Information (contribution-coefficient.docx).
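The adjacency rules above can be sketched in code: heavy atoms are classified by their number of non-hydrogen neighbors, and one label is built per core (connection) atom, with endpoint neighbors in parentheses, connection neighbors in square brackets, and the bond symbol in front of each neighbor. This is a minimal illustration, not the authors' implementation. It assumes a hydrogen-suppressed graph given as an element dictionary plus a bond list with bond order and ring flag; the function names are hypothetical, and the ordering of neighbors inside a label (a plain string sort) is our own assumption, so the labels may not match the exact format of Table S1.

```python
from collections import defaultdict


def bond_symbol(order, in_ring):
    """Map a bond to the AAG symbol listed in the text."""
    if order == "aromatic":
        return "∷"
    chain = {"single": "-", "double": "=", "triple": "≡"}
    ring = {"single": "~", "double": "≈", "triple": "≋"}
    return (ring if in_ring else chain)[order]


def build_neighbors(bonds):
    """Adjacency list of heavy atoms: atom index -> [(neighbor index, bond symbol)]."""
    nbrs = defaultdict(list)
    for i, j, order, in_ring in bonds:
        sym = bond_symbol(order, in_ring)
        nbrs[i].append((j, sym))
        nbrs[j].append((i, sym))
    return nbrs


def classify_atoms(atoms, nbrs):
    """Endpoint: one non-hydrogen neighbor; connection: two or more."""
    kinds = {}
    for idx in atoms:
        degree = len(nbrs.get(idx, []))
        if degree == 1:
            kinds[idx] = "endpoint"
        elif degree >= 2:
            kinds[idx] = "connection"
        else:
            # No heavy neighbors (e.g. the carbon of methane); not covered by the rules.
            kinds[idx] = "isolated"
    return kinds


def aag_groups(atoms, bonds):
    """Return one hypothetical AAG label per core (connection) atom."""
    nbrs = build_neighbors(bonds)
    kinds = classify_atoms(atoms, nbrs)
    groups = []
    for idx in sorted(atoms):
        if kinds[idx] != "connection":
            continue
        parts = []
        for j, sym in nbrs[idx]:
            # Endpoint neighbors go in (), connection neighbors in []; bond symbol first.
            if kinds[j] == "endpoint":
                parts.append(f"({sym}{atoms[j]})")
            else:
                parts.append(f"[{sym}{atoms[j]}]")
        parts.sort()  # assumed canonical order, so equivalent groups get identical labels
        groups.append(atoms[idx] + "".join(parts))
    return groups


if __name__ == "__main__":
    # Propan-2-ol, heavy atoms only: C0-C1(-O2)-C3
    atoms = {0: "C", 1: "C", 2: "O", 3: "C"}
    bonds = [(0, 1, "single", False), (1, 2, "single", False), (1, 3, "single", False)]
    print(aag_groups(atoms, bonds))  # ['C(-C)(-C)(-O)']
```

For benzene, every carbon is a connection atom with two aromatic neighbors, so the sketch yields six identical labels "C[∷C][∷C]". Sorting the neighbor strings is only one way to make labels order-independent; a real implementation would follow whatever canonical ordering the group table uses.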