Figure 5. Results of a bootstrapping analysis applied to the B3LYP contact density. Regression lines corresponding to bootstrap samples (1000 in each case) are coloured blue; their means are marked as black lines. The red bands represent 95% confidence intervals of prediction (assuming a normal distribution). (A) Result for all data points (i.e. excluding 9 and 10, not shown). The histogram in the inset shows the (non-normal) distribution of function values for small values of the contact density. For the histogram, 10000 bootstrap samples were drawn. (B) Result for a data set that excludes data points 5, 6, 7, and 8 (coloured gray) from the data set shown in (A).
For small values of the contact density, the ensemble of regression lines (blue) is skewed toward larger values of the isomer shift. This non-normal distribution of function values is illustrated in the inset (Figure 5A) and can be explained by the cluster of four data points at the far left of the calibration plot. Omitting these four data points, which correspond to complexes 5, 6, 7, and 8, significantly changes the distribution of regression lines and their mean (Figure 5B). Both intercept and slope (which are strongly correlated due to the large values of the contact density) decrease, which also reduces the ability to discriminate between predictions. In the absence of additional data points at intermediate to low contact densities, it is difficult to say conclusively which regression line is more reliable. Nevertheless, we consider the cluster of four data points a valuable addition to the data set for two reasons: (i) the complexes associated with this cluster (5, 6, 7, and 8) have different structural motifs, providing an argument against a systematic bias; (ii) coefficients of linear regression models (intercept and slope) tend to be biased toward small absolute values (“regression toward the mean”130).
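As an illustration of the resampling procedure discussed above, the following Python sketch draws bootstrap samples from a calibration set, refits a straight line to each, and compares the ensembles obtained with and without a low-density cluster. It is a minimal sketch under stated assumptions, not the implementation used for Figure 5: the data are synthetic, the function names (`fit_line`, `bootstrap_lines`, `percentile_interval`, `pearson`) are hypothetical, and the interval is an empirical percentile interval rather than the normal-theory band shown in the figure.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def bootstrap_lines(xs, ys, n_samples=1000, seed=0):
    """Refit the line on n_samples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    lines = []
    while len(lines) < n_samples:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if min(bx) == max(bx):
            continue  # a resample with a single x value has no defined slope
        lines.append(fit_line(bx, [ys[i] for i in idx]))
    return lines

def percentile_interval(values, level=0.95):
    """Empirical interval from sorted values; no normality assumption."""
    s = sorted(values)
    lo = int((1 - level) / 2 * (len(s) - 1))
    hi = int((1 + level) / 2 * (len(s) - 1))
    return s[lo], s[hi]

def pearson(us, vs):
    """Pearson correlation coefficient of two equally long sequences."""
    mu, mv = statistics.fmean(us), statistics.fmean(vs)
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5

# Synthetic, hypothetical calibration data (NOT the values of this work):
# large contact densities, a cluster of four points at the low end, and
# isomer shifts that decrease linearly with density plus Gaussian noise.
noise = random.Random(1)
x_cluster = [11810.1, 11810.2, 11810.3, 11810.4]
x_rest = [11814.0 + 0.5 * k for k in range(10)]
xs = x_cluster + x_rest
ys = [0.5 - 0.25 * (x - 11814.0) + noise.gauss(0.0, 0.05) for x in xs]

lines_all = bootstrap_lines(xs, ys)               # with the cluster
lines_wo = bootstrap_lines(xs[4:], ys[4:])        # cluster omitted

x0 = 11810.0  # a small contact density, where the ensembles differ most
for label, lines in (("all points", lines_all), ("without cluster", lines_wo)):
    preds = [a + b * x0 for a, b in lines]
    lo, hi = percentile_interval(preds)
    print(f"{label}: mean prediction {statistics.fmean(preds):.3f}, "
          f"95% percentile interval [{lo:.3f}, {hi:.3f}]")

# Intercept and slope are strongly correlated when x values are large.
intercepts = [a for a, _ in lines_all]
slopes = [b for _, b in lines_all]
print("intercept-slope correlation:", round(pearson(intercepts, slopes), 4))
```

In this sketch, the spread of predictions at `x0` is expected to be wider when the cluster is omitted, because the fit must then be extrapolated beyond the remaining data. The percentile interval is used because, as noted above, the distribution of function values at small contact densities need not be normal.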
To future-proof the calibration and statistical analysis presented here, we built a tool that allows further data points to be included, so that firmer statistical conclusions can be drawn than the data reported here permit. To this end, an online database has been set up (tinyurl.com/mbs-notebook), which is publicly accessible and open to submissions from other researchers. This database can be used in at least three ways:
  1. Obtaining a predicted isomer shift or quadrupole splitting, including the associated uncertainty estimates, simply by typing in the computed contact density or quadrupole splitting.
  2. Submitting reference data points for additional complexes to obtain more rigorous statistics; the data points will be reviewed by the authors.
  3. Obtaining complete statistical analyses by submitting new data sets computed with a different computational setup, e.g. different basis sets, solvation models, relativistic corrections, etc.; the data sets will be validated by the authors.
With this database, the authors provide a tool for the prediction and rigorous statistical analysis of computed Mössbauer parameters that will hopefully be of value for all researchers interested in the analysis of electronic structures with 57Fe Mössbauer spectroscopy.