Predicting Fe and Mn concentrations from optical measurements using PLSR
We used PLSR to predict total and soluble Fe and Mn concentrations from the relationship between absorbance spectra and sampling data. Data analysis and QA/QC were performed in the R programming environment (R v4.2.1; R Core Team 2022). Models were built with the pls package (Mevik et al. 2020), as described in Supplementary Information 1.1.
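The following Python sketch illustrates what a single-response PLSR model does with absorbance spectra. It is a minimal, from-scratch implementation of the NIPALS PLS1 algorithm. It is not the authors' code, which used the R pls package and is archived on Zenodo. The function names `fit_pls1` and `predict_pls1` are hypothetical. Each row of `X` is assumed to be one spectrum, with one absorbance per wavelength, and `y` holds the matching lab-measured concentrations.

```python
def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_pls1(X, y, ncomp):
    """Fit a single-response PLS (PLS1) model with NIPALS (illustrative sketch).

    X: list of spectra (one list of absorbances per sample)
    y: list of observed concentrations, in the same order as X
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Center predictors and response; E and f are the deflated residuals
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [yi - y_mean for yi in y]
    W, P, Q = [], [], []
    for _ in range(ncomp):
        # Weight vector: direction of maximal covariance between E and f
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # response already fully explained; stop early
            break
        w = [wj / norm for wj in w]
        t = [_dot(E[i], w) for i in range(n)]  # component scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt  # y loading
        # Deflate: remove the part of E and f explained by this component
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict_pls1(model, x):
    """Predict one concentration from one spectrum by replaying the deflation."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        e = [ej - t * pj for ej, pj in zip(e, p)]
        y_hat += q * t
    return y_hat
```

A fitted model stores only means, weights, and loadings. Predicting a new spectrum therefore repeats the same deflation steps, and no matrix inversion is needed. With as many components as the rank of the centered spectra, PLS1 reduces to ordinary least squares. Using fewer components regularizes the fit.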
Separate PLSR models were developed for each variable (total Fe, soluble Fe, total Mn, and soluble Mn), each deployment (the Turnover Deployment and the Oxygen On Deployment), and each reservoir layer. In stratified reservoirs such as FCR, Fe and Mn concentrations are much higher in the hypolimnion than in the epilimnion. Because the chemical and biological characteristics of the two layers differ markedly, we obtained the best fits by modeling the layers separately. The epilimnion models included data from 0.1, 1.6, and 3.8 m, and the hypolimnion models included data from 6.2, 8.0, and 9.0 m (Table 1). We also collected data at 5.0 m but excluded them from our analyses. This depth lies at the transition between the two layers (the metalimnion; see McClure et al. 2018) and is therefore not representative of either layer. Crossing the four variables with two layers and two deployments yielded four models per variable and 16 models in total.
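The short Python sketch below lays out the model design. It assigns each sampling depth to a layer model and enumerates one model per combination of variable, layer, and deployment. The labels and the names `LAYER_DEPTHS_M`, `layer_for_depth`, and `MODEL_KEYS` are hypothetical bookkeeping, not identifiers from the authors' code.

```python
from itertools import product

# Response variables, deployments, and depth pooling as described in the text
VARIABLES = ("total Fe", "soluble Fe", "total Mn", "soluble Mn")
DEPLOYMENTS = ("Turnover", "Oxygen On")
LAYER_DEPTHS_M = {
    "epilimnion": (0.1, 1.6, 3.8),
    "hypolimnion": (6.2, 8.0, 9.0),
}


def layer_for_depth(depth_m):
    """Return the layer model a sampling depth belongs to, or None if excluded."""
    for layer, depths in LAYER_DEPTHS_M.items():
        if depth_m in depths:
            return layer
    return None  # 5.0 m (metalimnion) is not assigned to either layer


# One PLSR model per (variable, layer, deployment) combination: 4 x 2 x 2 = 16
MODEL_KEYS = list(product(VARIABLES, LAYER_DEPTHS_M, DEPLOYMENTS))
```

Keying each fitted model by its (variable, layer, deployment) tuple lets all 16 models be stored and looked up in a single dictionary.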
To assess the uncertainty of the PLSR predictions, we calculated nonparametric bootstrap prediction intervals following the methods of Denham (1997), as described in Supplementary Information 1.2. Model skill was assessed using two metrics: the coefficient of determination (R²) of the linear regression between predicted and observed values, and the root mean squared error of prediction (RMSEP) of each model (following Wold et al. 2001 and Mevik et al. 2020).
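The following Python sketch computes the two skill metrics as defined above. It also gives a generic, case-resampling bootstrap prediction interval for one new spectrum. The bootstrap is a simplified stand-in and not necessarily the exact Denham (1997) procedure given in Supplementary Information 1.2. Each replicate refits the model on resampled calibration pairs and adds a randomly drawn residual from that refit to the prediction. The `fit` and `predict` arguments are hypothetical placeholders, for example a PLSR fit with a fixed number of components.

```python
import random
from math import sqrt


def r2_pred_obs(pred, obs):
    """R^2 of the simple linear regression between predicted and observed values.

    For a one-predictor regression this equals the squared Pearson correlation.
    """
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((o - mo) ** 2 for o in obs)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either series is constant
    return sxy * sxy / (sxx * syy)


def rmsep(pred, obs):
    """Root mean squared error of prediction, in the units of the data (mg/L)."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def bootstrap_prediction_interval(X, y, x_new, fit, predict,
                                  n_boot=1000, alpha=0.05, seed=1):
    """Percentile interval from a generic case-resampling bootstrap (sketch)."""
    rng = random.Random(seed)
    n = len(y)
    sims = []
    for _ in range(n_boot):
        # Resample calibration pairs with replacement and refit
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        model = fit(Xb, yb)
        # Add a residual drawn from this refit to represent observation noise
        resid = [yi - predict(model, xi) for xi, yi in zip(Xb, yb)]
        sims.append(predict(model, x_new) + rng.choice(resid))
    sims.sort()
    lo = sims[int(n_boot * alpha / 2)]
    hi = sims[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

Note that R² from a predicted-versus-observed regression does not penalize systematic bias, whereas RMSEP does. Reporting both metrics therefore captures complementary aspects of skill.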
All observational data, including the spectrophotometer data, are published in the Environmental Data Initiative repository (Carey et al. 2022a, 2022b, 2022c; Schreiber et al. 2022; Hammond et al. 2022). All code used to analyze the spectrophotometer data with PLSR and to generate the figures is available in the Zenodo repository (Hammond 2022).
3. Results
3.1 Routine Fe and Mn sampling trends
Weekly sampling at FCR showed Fe and Mn concentrations exceeding EPA standards during the 2020 and 2021 stratified periods, with maximum total Fe and Mn concentrations of 18.5 mg/L and 2.2 mg/L, respectively (Figure 2). Hypolimnetic concentrations of both metals generally increased throughout the summer stratified period of each year until fall turnover (Figure 2). After turnover, concentrations of both metals remained low (< 1 mg/L) until the following spring. HOx activation from 11 June to 2 December 2020 resulted in substantially lower hypolimnetic total Fe concentrations but did not lower total Mn concentrations (Figure 2).
3.2 PLSR Model Performance
A comparison of skill metrics among the 16 models showed that PLSR performed best for models calibrated with higher Fe and Mn concentrations and larger standard deviations (Tables 1, S1; Figure S10). Model skill was also sensitive to the number of components included in each model. The Turnover Deployment models used 3 to 5 components (9-14% of n, the calibration sample size). All Oxygen On Deployment models used 4 components (8-9% of n; Table 1). Sample size was negatively correlated with R² but positively correlated with RMSEP (Figure S10).
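Because skill depended on the number of components, the following Python sketch shows one common way to choose that number. Components are added until leave-one-out cross-validated RMSEP is minimized. This is not necessarily the criterion used in Supplementary Information 1.1. In R, the pls package supports this kind of validation through the `validation = "LOO"` argument of `plsr()` and the `RMSEP()` function. The `fit(X, y, ncomp)` and `predict(model, x)` arguments here are hypothetical placeholders for any PLSR implementation.

```python
from math import sqrt


def loo_rmsep(X, y, fit, predict, ncomp):
    """Leave-one-out cross-validated RMSEP for a model with `ncomp` components."""
    sq_err = 0.0
    for i in range(len(y)):
        # Refit without observation i, then predict the held-out observation
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], ncomp)
        sq_err += (predict(model, X[i]) - y[i]) ** 2
    return sqrt(sq_err / len(y))


def choose_ncomp(X, y, fit, predict, max_ncomp):
    """Pick the component count with the lowest LOO RMSEP (ties go to fewer components)."""
    scores = {a: loo_rmsep(X, y, fit, predict, a) for a in range(1, max_ncomp + 1)}
    best = min(scores, key=scores.get)
    # Also report the choice as a fraction of the calibration sample size n
    return best, best / len(y), scores
```

Resolving ties toward fewer components favors simpler models. This matters when n is small, because each added component consumes a noticeable share of the calibration data.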
The Turnover Deployment models explained a high proportion of the variability in total and soluble Fe and Mn concentrations. The exception was hypolimnetic soluble Fe, which had a poor fit (R² = 0.06) because concentrations were extremely low (median = 0.02 mg/L) during this period (Table 1; Figure 3). In comparison, the Oxygen On Deployment models explained a lower proportion of the variability in total and soluble Fe and Mn concentrations, despite having larger calibration sample sizes (Table 1). In particular, PLSR performance for total and soluble Mn was notably lower for the Oxygen On Deployment than for the Turnover Deployment (Tables 1, S1). PLSR performance also differed between layers. For most variables and deployments, the epilimnetic model had a higher R² than the corresponding hypolimnetic model (Table 1).
In most cases, PLSR predictions fell within the range of concentrations in the calibration dataset (Figures 3, S11-12), but they did not capture some of the high-magnitude fluctuations in the sampling data. Analysis of the Fe and Mn time series (Figures 4D-E and 5D-E) and of the calibration results (Figures S11-12) suggests that model inaccuracy was largely attributable to high calibration error for observations far from the mean concentration of the calibration data (i.e., outliers). Additionally, some predicted concentrations were negative for variables with relatively low concentrations (< 1 mg/L), especially with the epilimnion models (Figures 4D-E, 5D-E).
3.3 Reservoir Turnover Deployment