Significance

So why is the number of samples used in the calculation such a concern? It comes down to statistical significance: the question of how confident we can be that the associations we observe are not coincidental. If you have a sour stomach after a glass of milk, can you be sure the milk was off, or have you just discovered that you’re lactose intolerant? If you smoke five cigarettes with no notable ill effects, does that prove that they’re harmless to you?

We saw [\ref{fig:comp_examples_err2null}] that certain regions see, on average, less rainfall during El Niño periods. But what if we had just pulled the values of a few random months from the full data set and averaged those? Even though the mean anomaly of the full data set is, by definition, zero, there is a reasonable chance that we would happen to select only high or only low values, giving an average well away from zero. The odds of this decrease, however, as the sample size increases. To illustrate, we’ll consider just four grid cells, which contain large cities in south-east Africa.
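
Before turning to those grid cells, the effect of sample size can be sketched with made-up numbers. The snippet below is only an illustration: the anomalies are synthetic Gaussian values standing in for the real rainfall data, and the function name `sample_mean_spread`, the 480-month record length, and the sample sizes are all assumptions chosen for the example. It repeatedly draws random months, averages them, and measures how widely those averages scatter around zero.

```python
import random
import statistics


def sample_mean_spread(anomalies, sample_size, n_trials=2000, seed=0):
    """Standard deviation of the means of many random samples of one size.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    means = [
        # Draw `sample_size` distinct months at random and average them.
        statistics.fmean(rng.sample(anomalies, sample_size))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)


# Synthetic stand-in for 40 years of monthly anomalies (NOT the real data).
rng = random.Random(42)
raw = [rng.gauss(0.0, 1.0) for _ in range(480)]
mean_raw = statistics.fmean(raw)
# Subtracting the mean makes the full-record anomaly average zero,
# mirroring the "by definition, zero" property described in the text.
anomalies = [x - mean_raw for x in raw]

spreads = {n: sample_mean_spread(anomalies, n) for n in (3, 12, 48)}
for n, spread in spreads.items():
    print(f"sample size {n:2d}: spread of sample means = {spread:.3f}")
```

With only a handful of months, the random averages scatter widely, so a non-zero mean is unremarkable. As the sample grows, the averages cluster ever more tightly around zero, and a large mean anomaly becomes harder to attribute to chance.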