Figure 1. Process flow diagram of ethylene oxide production. (b) Separation section.
Design of experiments (DoE)
To reduce the number of simulations or experiments, a DoE was developed.
DoE is a method for deciding which candidates should be tested first from
a pool of experimental candidates. Here, we adopt the D-optimal
design13 in the DoE, which selects a small number of
informative X values from a large set of candidates. It is therefore
possible, using only the dataset of X, to select the candidates that are
most likely to yield a good regression model.
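As a minimal sketch of the idea (assuming numpy and a simple greedy search; this is illustrative and not necessarily the exact algorithm of the cited reference), D-optimal selection repeatedly adds the candidate that maximizes the determinant of the information matrix \(\mathbf{X}^{T}\mathbf{X}\):

```python
import numpy as np

def d_optimal_select(X_pool, n_select, seed=0):
    """Greedily pick n_select rows of X_pool maximizing det(X^T X)."""
    rng = np.random.default_rng(seed)
    p = X_pool.shape[1]
    selected = [int(rng.integers(len(X_pool)))]   # random starting candidate
    while len(selected) < n_select:
        best_score, best_idx = -np.inf, None
        for i in range(len(X_pool)):
            if i in selected:
                continue
            Xs = X_pool[selected + [i]]
            # log-determinant of the information matrix; a small ridge
            # keeps it nonsingular while the design is still small
            _, logdet = np.linalg.slogdet(Xs.T @ Xs + 1e-9 * np.eye(p))
            if logdet > best_score:
                best_score, best_idx = logdet, i
        selected.append(best_idx)
    return selected

X_pool = np.random.default_rng(1).normal(size=(50, 3))  # candidate X values
idx = d_optimal_select(X_pool, n_select=8)
```

The greedy search is only one (suboptimal but simple) way to approach the combinatorial selection; exchange-type algorithms are also common.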
Adaptive DoE
An adaptive DoE was developed that selects experimental candidates
sequentially, on the basis of previous experimental results. Response
surface methodology15 and Bayesian optimization12 are often used in
adaptive DoE. Here, we use Bayesian optimization, which can make
estimates from a small number of samples, including in extrapolation
regions. Bayesian optimization has been used to search for optimal
experimental conditions16, 17, 18, 19 and process
conditions20, 21, 22, and to optimize hyperparameters23. Here, we apply
it to process design.
Gaussian process regression (GPR)
Bayesian optimization uses Gaussian process regression
(GPR)24. Given an input x, the output \(y(\mathbf{x})\) is represented
as a probabilistic model that follows a normal distribution. Assuming a
linear GPR model, the i-th sample is given by:
\(y^{(i)}=\mathbf{x}^{(i)}\mathbf{b}\) (3)
where b is a vector of regression coefficients. The prior distribution
of b is assumed to be a normal distribution with zero mean and variance
\({\sigma_{b}}^{2}\). Then, the mean \(m_{i}\) of \(y^{(i)}\) and the
covariance \({\sigma_{yi,j}}^{2}\) of \(y^{(i)}\) and \(y^{(j)}\) are
calculated from:
\(m_{i}=E\left[y^{(i)}\right]=0\) (4)
\({\sigma_{yi,j}}^{2}=cov\left[y^{\left(i\right)},y^{\left(j\right)}\right]={\mathbf{x}^{(i)}\mathbf{x}^{(j)}}^{T}{\sigma_{b}}^{2}\)(5)
The input x is transformed by a nonlinear function \(\varphi\), and
\({\sigma_{yi,j}}^{2}\) is then calculated from:
\({\sigma_{yi,j}}^{2}=\varphi\left(\mathbf{x}^{(i)}\right)\varphi\left(\mathbf{x}^{(j)}\right)^{T}{\sigma_{b}}^{2}\)(6)
The actual output y includes a measurement error. The i-th sample
including the measurement error is \({y_{\text{obs}}}^{(i)}\) and its
measurement error is \(e^{(i)}\). Thus, \({y_{\text{obs}}}^{(i)}\) is
given by:
\({y_{\text{obs}}}^{(i)}=y^{(i)}+e^{(i)}\) (7)
\(e^{(i)}\) is assumed to follow a normal distribution with zero mean
and variance \({\sigma_{e}}^{2}\), and \(e^{(i)}\) is independent for
each sample. Then, the covariance \({\sigma_{\text{yobs},i,j}}^{2}\)
between \({y_{\text{obs}}}^{(i)}\) and \({y_{\text{obs}}}^{(j)}\) is
calculated as follows:
\({\sigma_{\text{yobs\ }i,j}}^{2}=\varphi\left(\mathbf{x}^{(i)}\right)\varphi\left(\mathbf{x}^{(j)}\right)^{T}{\sigma_{b}}^{2}+\delta_{i,j}{\sigma_{e}}^{2}=K\left(\mathbf{x}^{(i)},\mathbf{x}^{(j)}\right)\)(8)
where K is a kernel function and \(\delta_{i,j}\) is the Kronecker delta.
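As a concrete sketch of Eq. (8), assuming a Gaussian (RBF) kernel as the choice corresponding to the nonlinear transform \(\varphi\) (the kernel form and hyperparameter values here are illustrative, not those of the original study):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, sigma_b2=1.0):
    """K(x, x') = sigma_b^2 * exp(-||x - x'||^2 / (2 * length_scale^2))."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return sigma_b2 * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def obs_covariance(X, sigma_e2=0.1, **kernel_kw):
    # Eq. (8): kernel term plus delta_{i,j} * sigma_e^2 on the diagonal
    return rbf_kernel(X, X, **kernel_kw) + sigma_e2 * np.eye(len(X))

X = np.random.default_rng(0).normal(size=(5, 2))
C = obs_covariance(X)   # 5 x 5 covariance matrix of the observed outputs
```

Note that the measurement-noise term \({\sigma_{e}}^{2}\) appears only on the diagonal, because \(e^{(i)}\) is independent for each sample.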
In the GPR method, if the outputs \(\mathbf{y}_{\text{obs}}=\left({y_{\text{obs}}}^{(1)}\cdots{y_{\text{obs}}}^{(n)}\right)^{T}\) corresponding to the past input vectors \(\mathbf{x}^{(1)}\cdots\mathbf{x}^{\left(n\right)}\) are used as
training data, the output for a new input vector \(\mathbf{x}^{\left(n+1\right)}\) can be estimated as a normal
distribution with mean \(m\left(\mathbf{x}^{\left(n+1\right)}\right)\) and variance \(\sigma^{2}\left(\mathbf{x}^{\left(n+1\right)}\right)\), given by:
\(m\left(\mathbf{x}^{(n+1)}\right)=\mathbf{k}{\boldsymbol{\Sigma}_{n}}^{-1}\mathbf{y}_{\text{obs}}\)(9)
\(\sigma^{2}\left(\mathbf{x}^{(n+1)}\right)=K\left(\mathbf{x}^{(n+1)},\mathbf{x}^{(n+1)}\right)-\mathbf{k}{\boldsymbol{\Sigma}_{n}}^{-1}\mathbf{k}^{T}\)(10)
subject to:
\(\mathbf{k}=\left[K\left(\mathbf{x}^{(1)},\mathbf{x}^{(n+1)}\right)\ \cdots\ K\left(\mathbf{x}^{\left(i\right)},\mathbf{x}^{\left(n+1\right)}\right)\ \cdots\ K\left(\mathbf{x}^{(n)},\mathbf{x}^{(n+1)}\right)\right]\)(11)
where \(\boldsymbol{\Sigma}_{n}\) is the \(n\times n\) covariance matrix whose \((i,j)\) element is \(K\left(\mathbf{x}^{(i)},\mathbf{x}^{(j)}\right)\).
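Equations (9)-(11) can be implemented directly with numpy. The sketch below again assumes an RBF kernel with illustrative hyperparameters; a practical implementation would use a Cholesky solve rather than an explicit inverse:

```python
import numpy as np

def rbf(X1, X2, length_scale=1.0):
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_predict(X_train, y_obs, x_new, sigma_e2=1e-4):
    """GPR predictive mean and variance at a single new input x_new."""
    n = len(X_train)
    Sigma_n = rbf(X_train, X_train) + sigma_e2 * np.eye(n)  # Eq. (8)
    k = rbf(X_train, x_new[None, :]).ravel()                # Eq. (11)
    Sigma_inv = np.linalg.inv(Sigma_n)
    m = k @ Sigma_inv @ y_obs                               # Eq. (9)
    # Eq. (10); K(x_new, x_new) includes the noise variance per Eq. (8)
    var = rbf(x_new[None, :], x_new[None, :])[0, 0] + sigma_e2 - k @ Sigma_inv @ k
    return m, var

X_train = np.linspace(0.0, 1.0, 10)[:, None]
y_obs = X_train.ravel()                     # toy linear target
m_near, v_near = gpr_predict(X_train, y_obs, np.array([0.35]))
m_far, v_far = gpr_predict(X_train, y_obs, np.array([5.0]))
```

Near the training data the predictive variance is small; far from it (an extrapolation region) the variance approaches the prior variance, which is exactly what the adaptive DoE exploits.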
Probability in target range
For the evaluation of each candidate in Bayesian optimization, the
probability P that each y is within the target range \(\left(y_{\min}\leq y\leq y_{\max}\right)\) is
calculated25. It is assumed that the output \(y^{(i)}\) follows a normal distribution with mean \(m\left({\mathbf{x}_{\text{new}}}^{(i)}\right)\) and variance \(\sigma^{2}\left({\mathbf{x}_{\text{new}}}^{(i)}\right)\) when a new
sample \({\mathbf{x}_{\text{new}}}^{(i)}\) is input into the GPR model. P is thus calculated by
integrating the corresponding area of the normal distribution, as given by:
\(P\left({\mathbf{x}_{\text{new}}}^{(i)}\right)=\int_{y_{\min}}^{y_{\max}}\frac{1}{\sqrt{2\pi\sigma^{2}\left({\mathbf{x}_{\text{new}}}^{(i)}\right)}}\exp\left\{-\frac{1}{2\sigma^{2}\left({\mathbf{x}_{\text{new}}}^{(i)}\right)}\left(y-m\left({\mathbf{x}_{\text{new}}}^{(i)}\right)\right)^{2}\right\}dy\)(12)
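Because Eq. (12) integrates a normal density, it reduces to a difference of standard normal CDF values, \(P=\Phi\left(\frac{y_{\max}-m}{\sigma}\right)-\Phi\left(\frac{y_{\min}-m}{\sigma}\right)\), so no numerical integration is needed:

```python
import math

def prob_in_range(m, var, y_min, y_max):
    """Eq. (12) in closed form via the standard normal CDF Phi."""
    sd = math.sqrt(var)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi((y_max - m) / sd) - phi((y_min - m) / sd)

p = prob_in_range(m=0.0, var=1.0, y_min=-1.0, y_max=1.0)  # one-sigma band
```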
Figure 2 shows the basic concept of P. When the predicted values are
close to the target range of y, the probability is large for candidates
like \(c_{1}\), whose mean \(m\left(\mathbf{x}\right)\) lies close to
the target range; when the predicted values are far from the target
range, the probability is large for candidates like \(c_{2}\), which
have a high variance \(\sigma^{2}\left(\mathbf{x}\right)\).
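Putting the pieces together, the adaptive DoE can be sketched as a loop that fits the GPR model, scores every remaining candidate by P, and runs the candidate with the largest P. This is a hedged, self-contained toy (the "experiment" function, target range, kernel, and hyperparameters are all hypothetical stand-ins for the process simulator and settings of the actual study):

```python
import math
import numpy as np

def rbf(X1, X2, length_scale=0.3):
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def predict(X, y, Xc, sigma_e2=1e-4):
    # GPR mean and variance (Eqs. 9-10) for every candidate row of Xc
    Sinv = np.linalg.inv(rbf(X, X) + sigma_e2 * np.eye(len(X)))
    Kc = rbf(X, Xc)
    m = Kc.T @ Sinv @ y
    var = 1.0 + sigma_e2 - np.sum(Kc * (Sinv @ Kc), axis=0)
    return m, np.maximum(var, 1e-12)      # clip tiny negative round-off

def prob_in_range(m, var, lo, hi):
    # Eq. (12) in closed form via the standard normal CDF
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    sd = math.sqrt(var)
    return phi((hi - m) / sd) - phi((lo - m) / sd)

experiment = lambda x: math.sin(3.0 * x[0])   # stand-in for a simulation run

rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 2.0, size=(100, 1))
X = candidates[:3].copy()                     # small initial design
y = np.array([experiment(x) for x in X])
for _ in range(10):                           # adaptive DoE iterations
    m, var = predict(X, y, candidates)
    P = [prob_in_range(m[i], var[i], 0.9, 1.0) for i in range(len(candidates))]
    best = int(np.argmax(P))                  # most likely to hit the target range
    X = np.vstack([X, candidates[best]])
    y = np.append(y, experiment(candidates[best]))
```

The acquisition naturally balances the two cases of Figure 2: early on, high-variance candidates win; as the model sharpens, candidates whose mean sits inside the target range win.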