Statistically Modelling Observed Scatter and Bias


While we were investigating the \(\sim 7\%\) bias Nick was seeing in his velocity dispersion plot, it became obvious that this bias is a result of binning a sample with scatter in the observables and inherent functions in mass. Originally we thought these relationships would not manifest themselves in the median or mean values of the true observables (table values) that we compare with, but in this walk-through we will show otherwise.

There are several key probabilities we need to know in order to accurately predict the expectation value of some observable at a given mass. In more statistical language, we would like to know \(P(\hat{\theta}|M)\). That is, given a cluster of mass \(M\), what is the probability that the observable is measured as \(\hat{\theta}\)? For example, the observable \(\hat{\theta}\) can be velocity dispersion, richness, or total luminosity.

Let’s take velocity dispersion as an example here. There are two relevant velocity dispersions in our studies. The first is the true 3D/2D velocity dispersion \(\sigma\), which we cannot measure in the real universe. Evrard et al. (2008) measured \(\sigma\) for halos in N-body simulations and showed that it follows a very tight relation with the mass \(M_{200}\) of the host halo, defined relative to the critical density: \[\langle \sigma | M \rangle = 1093 \left(\frac{h(z) M}{10^{15}\,M_\odot}\right)^{0.34}\] This relationship has a very small lognormal scatter of \(S_{\ln\sigma | \ln M}\sim 4\%\). So, written as a density in \(\ln\sigma\): \[P(\sigma | M) = \frac{1}{\sqrt{2\pi}S_{\ln\sigma | \ln M}} \exp\left(-\frac{(\ln\sigma - \ln\langle \sigma | M \rangle)^{2}}{2 S_{\ln\sigma | \ln M}^{2}}\right)\]
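As a minimal sketch of this first step, the snippet below evaluates the mean relation and draws \(\sigma\) at fixed \(M\) from the lognormal distribution. It assumes masses in solar masses, velocities in km/s, and \(h(z) = 1\) by default; the function names `mean_sigma` and `draw_sigma` are made up for this illustration. It also treats \(\langle \sigma | M \rangle\) as the centre of the Gaussian in \(\ln\sigma\), as in the expression for \(P(\sigma | M)\).

```python
import numpy as np

SIGMA_15 = 1093.0   # normalisation quoted in the text (km/s assumed)
ALPHA = 0.34        # slope quoted in the text
S_LN_SIGMA = 0.04   # lognormal scatter in sigma at fixed M (~4%)


def mean_sigma(mass, hz=1.0):
    """Mean-relation velocity dispersion; mass in Msun (assumed), hz is h(z)."""
    return SIGMA_15 * (hz * np.asarray(mass, dtype=float) / 1e15) ** ALPHA


def draw_sigma(mass, rng, hz=1.0):
    """Draw sigma | M: Gaussian in ln(sigma), centred on ln<sigma|M>."""
    ln_sigma = rng.normal(np.log(mean_sigma(mass, hz)), S_LN_SIGMA)
    return np.exp(ln_sigma)


rng = np.random.default_rng(0)
sigma = draw_sigma(np.full(100_000, 1e14), rng)

print("mean relation at 1e14:", mean_sigma(1e14))
print("median of draws      :", np.median(sigma))
print("scatter in ln(sigma) :", np.std(np.log(sigma)))
```

The recovered scatter in \(\ln\sigma\) should sit close to the input 4%, and the median of the draws close to the mean relation.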

The second velocity dispersion is the observed velocity dispersion \(\hat{\sigma}\). In Gifford et al. (2013), we define this as the line-of-sight (l.o.s.) velocity dispersion. It has all kinds of ugly things in it, including cluster shape effects, cluster environment contamination, substructure, redshift-space interlopers, and non-Gaussianity, not to mention the low-number statistics at low mass. This is a messy observable, but most are, and it is what we need to predict for a given mass \(M\). So here is the generative model for the observable: \[P(\hat{\sigma} | M) = \sum_{\sigma} P(\hat{\sigma} | \sigma) P(\sigma | M)\] Really there are completeness and purity terms in there as well, but let’s ignore those for a second. So that is our expected distribution of \(\hat{\sigma}\) for a given \(M\).

The other term, \(P(\hat{\sigma} | \sigma)\), is equally important. It represents the probability of observing a velocity dispersion \(\hat{\sigma}\) given \(\sigma\). Why is this important? When we observe clusters in the real universe, we don’t measure the “Evrard” velocity dispersion \(\sigma\). We are randomly drawing from a distribution whose central value is \(\sigma\). This is what Gifford et al. (2013) mean by l.o.s. effects. So what is that distribution? It’s approximately lognormal with \(S_{\ln\hat{\sigma} | \ln\sigma}\sim 15\%\). So: \[P(\hat{\sigma} | \sigma) = \frac{1}{\sqrt{2\pi}S_{\ln\hat{\sigma} | \ln\sigma}} \exp\left(-\frac{(\ln\hat{\sigma} - \ln\sigma)^{2}}{2 S_{\ln\hat{\sigma} | \ln\sigma}^2}\right)\]
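The two-step generative model can be sketched as a Monte Carlo draw: first \(\sigma\) given \(M\), then \(\hat{\sigma}\) given \(\sigma\). The snippet assumes both scatters are Gaussian in the natural log and independent, and uses a hypothetical central value of 500 km/s for a single mass. Under those assumptions the two scatters add in quadrature, and the mean of \(\hat{\sigma}\) exceeds its median by a factor \(\exp(S^2/2)\), which is a standard property of the lognormal distribution.

```python
import numpy as np

S_INTRINSIC = 0.04  # scatter in ln(sigma) at fixed M
S_LOS = 0.15        # scatter in ln(sigma_hat) at fixed sigma

rng = np.random.default_rng(1)
n = 200_000
ln_sigma_centre = np.log(500.0)  # hypothetical ln<sigma|M> for one mass

# Step 1: true dispersion given mass. Step 2: observed dispersion given true.
ln_sigma = rng.normal(ln_sigma_centre, S_INTRINSIC, size=n)
ln_sigma_hat = rng.normal(ln_sigma, S_LOS)

total_scatter = np.std(ln_sigma_hat)
expected_scatter = np.hypot(S_INTRINSIC, S_LOS)  # quadrature sum

# Ratio of the mean of sigma_hat to the central (median) value.
mean_over_median = np.mean(np.exp(ln_sigma_hat)) / np.exp(ln_sigma_centre)
expected_ratio = np.exp(0.5 * expected_scatter**2)

print("total scatter:", total_scatter, "expected:", expected_scatter)
print("mean/median  :", mean_over_median, "expected:", expected_ratio)
```

Even before any binning, the mean and the median of \(\hat{\sigma}\) at fixed mass differ, so it matters which of the two is compared against the table values.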

But we are binning! That means that we have a distribution of masses in our bin that we must integrate over. Writing \(\langle \hat{\sigma} | M \rangle = \int \hat{\sigma}\, P(\hat{\sigma} | M)\, d\hat{\sigma}\) for the expectation value at fixed mass, this integral takes the form: \[\langle \hat{\sigma} \rangle = \frac{\int dM \frac{d \langle n \rangle}{dM} \langle \hat{\sigma} | M \rangle}{\int dM \frac{d \langle n \rangle}{dM}}\] So this is what we want to compare the measured observable with: a scatter-convolved, mass-function-weighted expectation value.
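To make the bin average concrete, here is a rough numerical sketch. It weights the fixed-mass expectation value \(\langle \hat{\sigma} | M \rangle\) (the mean of \(\hat{\sigma}\) over \(P(\hat{\sigma} | M)\)) by the mass function across a mass bin. The mass function here is a pure power law with a hypothetical slope `GAMMA`, chosen only for illustration and not taken from any fit, and the lognormal mean is taken as the central relation times \(\exp(S^2/2)\), with \(h(z) = 1\) assumed.

```python
import numpy as np

SIGMA_15, ALPHA = 1093.0, 0.34   # mean relation quoted in the text
S_TOT = np.hypot(0.04, 0.15)     # intrinsic and l.o.s. scatter in quadrature
GAMMA = 3.0                      # hypothetical slope, dn/dM ~ M**-GAMMA


def mean_sigma_hat_given_mass(mass, hz=1.0):
    """<sigma_hat | M>: lognormal mean about the central relation."""
    central = SIGMA_15 * (hz * mass / 1e15) ** ALPHA
    return central * np.exp(0.5 * S_TOT**2)


def mass_function(mass):
    """Illustrative power-law dn/dM (arbitrary normalisation)."""
    return (mass / 1e14) ** (-GAMMA)


def trapezoid(y, x):
    """Plain trapezoid rule, written out to avoid NumPy version differences."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def bin_average(m_lo, m_hi, n_grid=2001):
    """Mass-function-weighted <sigma_hat> over the bin [m_lo, m_hi]."""
    mass = np.geomspace(m_lo, m_hi, n_grid)
    weight = mass_function(mass)
    numerator = trapezoid(weight * mean_sigma_hat_given_mass(mass), mass)
    return numerator / trapezoid(weight, mass)


m_lo, m_hi = 1e14, 2e14
print("bin average       :", bin_average(m_lo, m_hi))
print("value at bin centre:", mean_sigma_hat_given_mass(np.sqrt(m_lo * m_hi)))
```

Because the assumed mass function falls steeply, the bin average sits below the value at the logarithmic bin centre: the low-mass end of the bin dominates the weighting.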

That said, this is not what we are ultimately interested in. Really, we are interested in the average mass per bin. This is easy if we are binning on mass. Simply: \[\langle M \rangle = \frac{\int M \frac{d \langle n \rangle}{dM} dM}{\int \frac{d \langle n \rangle}{dM} dM}\] But unfortunately, we are not binning on mass in the real universe. Instead, we are binning on some observable \(\hat{\theta}\) and asking what the probability of observing \(M\) is. This switches some things around in the integrals, so let’s take a look.

Before, when we needed to know \(\langle M \rangle\), we simply averaged the mass, weighted by the mass function, within some bin. We no longer have this luxury. Instead, let’s start with what we have and what we want. We have \(\hat{\theta}\) and want \(P(M | \hat{\theta})\). If we knew the scatter between these two things, we could make some basic assumptions. We could say that \(P(M | \hat{\theta})\) is lognormal with some scatter about a power-law relationship. However, to get a value for the bin we would need \(\frac{d\langle n \rangle}{d \hat{\theta}}\). This requires turning \(\frac{d\langle n \rangle}{d M}\) into \(\frac{d\langle n \rangle}{d \hat{\theta}}\), although I suppose that can be achieved by knowing the average power-law relationship between \(M\) and \(\hat{\theta}\). Then, ignoring the scatter about that relationship, \[\frac{d\langle n \rangle}{d \hat{\theta}} = \frac{d\langle n \rangle}{d M} \frac{d M}{d \hat{\theta}}\]
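A quick mock catalogue shows why binning on the observable is different from binning on mass. The sketch below draws masses from a power-law mass function, scatters them into \(\hat{\sigma}\), selects a bin in \(\hat{\sigma}\), and compares the mean \(\ln M\) in that bin with the mass obtained by naively inverting the mean relation. The mass-function slope `GAMMA`, the mass range, and the bin edges are all hypothetical, \(h(z) = 1\) is assumed, and the scatter is taken to be Gaussian in \(\ln\hat{\sigma}\). For that idealised setup (pure power law, constant lognormal scatter), the offset in \(\ln M\) is expected to be \(-(\gamma - 1)(S/\alpha)^2\), where \(\gamma\) is the mass-function slope, \(S\) the scatter in \(\ln\hat{\sigma}\), and \(\alpha\) the slope of the mean relation.

```python
import numpy as np

SIGMA_15, ALPHA = 1093.0, 0.34   # mean relation quoted in the text
S_TOT = np.hypot(0.04, 0.15)     # total scatter in ln(sigma_hat) at fixed M
GAMMA = 3.0                      # hypothetical slope, dn/dM ~ M**-GAMMA
M_MIN, M_MAX = 3e13, 1e16        # hypothetical mass range of the mock

rng = np.random.default_rng(2)
n = 500_000

# Inverse-CDF draw of masses from the truncated power law.
p = 1.0 - GAMMA
u = rng.random(n)
mass = (M_MIN**p + u * (M_MAX**p - M_MIN**p)) ** (1.0 / p)

# Scatter masses into the observable.
ln_central = np.log(SIGMA_15) + ALPHA * np.log(mass / 1e15)
ln_sigma_hat = rng.normal(ln_central, S_TOT)
sigma_hat = np.exp(ln_sigma_hat)

# Bin on the observable, not on mass.
in_bin = (sigma_hat >= 700.0) & (sigma_hat < 750.0)
mean_ln_sigma_hat = ln_sigma_hat[in_bin].mean()

# Mass from naively inverting the mean relation vs. actual mean ln(M) in the bin.
naive_ln_mass = np.log(1e15) + (mean_ln_sigma_hat - np.log(SIGMA_15)) / ALPHA
true_ln_mass = np.log(mass[in_bin]).mean()

offset = true_ln_mass - naive_ln_mass
expected_offset = -(GAMMA - 1.0) * (S_TOT / ALPHA) ** 2

print("clusters in bin   :", in_bin.sum())
print("offset in ln(M)   :", offset)
print("power-law estimate:", expected_offset)
```

The clusters in the bin are on average less massive than the naive inversion suggests, because many more low-mass clusters are available to scatter up into the bin than high-mass clusters to scatter down.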

So what does this mean in terms of an observable?


  1. A. E. Evrard, J. Bialek, M. Busha, M. White, S. Habib, K. Heitmann, M. Warren, E. Rasia, G. Tormen, L. Moscardini, C. Power, A. R. Jenkins, L. Gao, C. S. Frenk, V. Springel, S. D. M. White, J. Diemand. Virial Scaling of Massive Dark Matter Halos: Why Clusters Prefer a High Normalization Cosmology. The Astrophysical Journal 672, 122-137 (2008).

  2. D. Gifford, C. Miller, N. Kern. A Systematic Analysis of Caustic Methods for Galaxy Cluster Masses. The Astrophysical Journal 773, 116 (2013).