[insert Figure 5 here], [insert Figure 6 here], and
[insert Table 6 here]
Model substitution
The above result from GDP (92.4 in Table 6) is slightly larger than the
maximal sensorial rating (89.7) of the original 761 data samples.
Although a locally optimal solution has been obtained, the GDP
problem remains challenging to solve. In fact, the choice of
initial values greatly affects both whether a feasible solution can be
obtained and the quality of the local solution. The major
computational difficulties come from the rigorous mechanistic models for
perfume evaporation (Eq. 36) and diffusion (Eq. 41), which require the
handling of many highly nonlinear equations. For instance, the
vapor-liquid equilibrium and UNIFAC equations (Eq. 38-40) must be
evaluated at every time point. Thus, to solve the
formulation problem more efficiently and find better solutions, model
substitution is employed here.
Whether the top note of a perfume is dominated by a lemon-like or
non-lemon-like scent is a binary outcome. Thus, the prediction of the
odor type can be transformed into a classification problem. In other
words, the complex mechanistic models (Eq. 36-45) for predicting the
odor type in the top note are substituted by a classification-based
surrogate model. To do so, random sampling is applied to generate 15000
artificial perfume recipes that account for the heuristic rules in Eq.
46-52. Among them, 7500 recipes contain 0.25-0.75% limonene
(lemon-like), 5000 recipes contain 0.75-1.25%, and 2500 recipes contain
1.25-1.75%. These recipes are used as the input data. For each recipe,
the odor intensities in the top note are calculated using Eq. 36-45.
If a lemon-like odor has the highest intensity, the output is set
to 1; otherwise, it is set to 0. Then, a support vector classification
(SVC) model with a linear kernel is trained. Through 10-fold
cross-validation, the hyperparameter C, which controls the
regularization strength, is tuned to 10. Figure S3 presents the
classification error distribution. For the 7500 data samples containing
0.25-0.75% limonene, the classification accuracy is 93.3%. For the
remaining 7500 samples, the accuracy is 98.9%. The overall accuracy,
the mean over these two equally sized halves, is 96.1%. These statistics
indicate that the SVC model can serve as a relatively simple surrogate
for the original complex mechanistic models. The SVC model consists of
2126 support vectors and is expressed as
\(OTTN=\sum_{c=1}^{2126}\alpha_{c}\cdot K_{c}+bs\) (55)
\(K_{c}=\sum_{i=1}^{48}SV_{c,i}\cdot VN_{i}\) (56)
\(VN_{i}=\frac{V_{i}-V_{i,min}}{V_{i,max}-V_{i,min}}\) (57)
where \(\alpha_{c}\) is the weight of support vector c and bs is a
constant. \(SV_{c,i}\) is the i-th component of support vector c.
\(V_{i,max}\) and \(V_{i,min}\) are normalization coefficients. These
parameters are determined automatically during the training process and
are provided on the GitHub platform mentioned above.
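As a rough illustration of how a surrogate of this form could be trained and then evaluated through Eq. 55-57, the following Python sketch uses scikit-learn. It is not the authors' implementation. The sample count is scaled down to 600 while keeping the 3:2:1 limonene split, the other 47 inputs are drawn from an assumed uniform range instead of following Eq. 46-52, and the labelling rule is a toy stand-in for the mechanistic models (Eq. 36-45). The 48 inputs \(V_{i}\) are assumed to be ingredient volume fractions. Note that scikit-learn stores signed weights (label times multiplier) in `dual_coef_`, which play the role of \(\alpha_{c}\) here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Limonene bands (vol%) in the paper's 3:2:1 ratio, scaled down from
# 7500/5000/2500 recipes so that the sketch runs quickly.
bands = [((0.25, 0.75), 300), ((0.75, 1.25), 200), ((1.25, 1.75), 100)]
limonene = np.concatenate([rng.uniform(lo, hi, n) for (lo, hi), n in bands])

# ASSUMPTION: the other 47 inputs are uniform in 0-2 vol%; the paper
# instead samples recipes that satisfy the heuristic rules (Eq. 46-52).
others = rng.uniform(0.0, 2.0, size=(limonene.size, 47))
V = np.column_stack([limonene, others])  # 48 inputs V_i per recipe

# ASSUMPTION: toy labelling rule standing in for Eq. 36-45
# (1 = lemon-like odor dominates the top note, 0 = otherwise).
y = (3.0 * V[:, 0] > V[:, 1:4].max(axis=1)).astype(int)

# Linear-kernel SVC on min-max scaled inputs; C tuned by 10-fold CV.
search = GridSearchCV(
    make_pipeline(MinMaxScaler(), SVC(kernel="linear")),
    param_grid={"svc__C": [0.1, 1.0, 10.0, 100.0]},
    cv=10,
)
search.fit(V, y)
model = search.best_estimator_
scaler = model.named_steps["minmaxscaler"]
svc = model.named_steps["svc"]

# Parameters appearing in Eq. 55-57.
alpha = svc.dual_coef_[0]       # signed weight per support vector
SV = svc.support_vectors_       # shape (n_support_vectors, 48)
bs = svc.intercept_[0]          # constant term
V_min, V_max = scaler.data_min_, scaler.data_max_


def ottn(V_rows):
    """Evaluate the surrogate explicitly, following Eq. 55-57."""
    VN = (V_rows - V_min) / (V_max - V_min)  # Eq. 57
    K = VN @ SV.T                            # Eq. 56, one column per SV
    return K @ alpha + bs                    # Eq. 55


scores = ottn(V)
# scikit-learn convention: a positive score maps to class 1.
labels = (scores > 0).astype(int)
print("best C:", search.best_params_["svc__C"])
print("support vectors:", SV.shape[0])
print("training accuracy:", (labels == y).mean())
```

The explicit form in `ottn` should reproduce `model.decision_function(V)` up to rounding, which is what makes it possible to embed the trained classifier in an optimization model as plain algebraic equations.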
By substituting Eq. 36-45 with Eq. 55-57, the resulting perfume
formulation problem (MINLP-SVC) is solved using the global solver BARON.
Table 5 shows the computational statistics. The problem consists of 2860
single variables, 2920 equations, and 2928 nonlinear matrix entries.
Clearly, both the problem size and the degree of nonlinearity are much
smaller than those of the GDP problem. It takes 143 seconds to obtain the
global solution given in the last column of Table 6. The maximum
sensorial rating is 98.3, which is better than the GDP result (92.4).
The new perfume formula consists of 13 fragrances in different volume
fractions. The total volume fraction of fragrances is 20%. Moreover, the
design targets on \(LD_{50}\) and flash point are fulfilled. As listed in
Table S4, the volume fractions of all ingredients are lower than their
volume solubilities in the ethanol-water solvent. In addition, the major
odor type in the
top note is classified as 1 (i.e., lemon-like) by the SVC model. To
validate this result with the original mechanistic models (Eq. 36-45),
Figure 5b shows the odor intensity in the first 350 seconds. Again, only
the top note fragrances are plotted. The lemon-like fragrance limonene
clearly has the highest odor intensity (around 3.5) among the plotted
fragrances, which confirms the SVC classification.
Furthermore, Figure 6b shows the diffusion profiles of 4 top note
fragrances at 5 minutes, simulated using the original
mechanistic models. Figures S4a and S4b present the simulated diffusion
of 5 middle note fragrances at 1 hour and 4 base note fragrances at 5
hours, respectively.