\subsection{Auxiliary measurements}\label{S:AuxMeas}

Auxiliary measurements or control regions can be used to estimate or reduce the effect of systematic uncertainties. The signal region and control region are not fundamentally different: in the language that we are using here, they are just two different channels.

A common example is a simple counting experiment with an uncertain background. In the frequentist way of thinking, the true, unknown background in the signal region is a nuisance parameter, which I will denote $\nu_B$.\footnote{Note that one can think of a counting experiment in the context of Eq.~\ref{Eq:markedPoisson} with $f(x)=1$, so that the model reduces to just the Poisson term.} If we call the true, unknown signal rate $\nu_S$ and the number of events in the signal region $n_{\rm SR}$, then we can write the model $\Pois(n_{\rm SR} | \nu_S + \nu_B)$. As long as $\nu_B$ is a free parameter, no useful inference about $\nu_S$ is possible. Often we have some estimate of the background, which may come from a control sample with $n_{\rm CR}$ events. If the control sample has no signal contamination and is populated by the same background processes as the signal region, then we can write $\Pois(n_{\rm CR}|\tau \nu_B)$, where $\tau$ is a factor that extrapolates the background rate from the signal region to the control region. Thus the total probability model can be written $\F_{\rm sim}(n_{\rm SR},n_{\rm CR} | \nu_S, \nu_B) = \Pois(n_{\rm SR} | \nu_S + \nu_B)\cdot \Pois(n_{\rm CR}|\tau\nu_B)$. This is a special case of Eq.~\ref{Eq:simultaneous} and is often referred to as the ``on/off'' problem~\cite{Cousins:2008zz}.

Based on the control region alone, one would estimate (or `measure') $\nu_B = n_{\rm CR}/\tau$. Intuitively, this estimate comes with an `uncertainty' of $\sqrt{n_{\rm CR}}/\tau$. We will make these points more precise in Sec.~\ref{S:estimation}, but the important lesson here is that we can use auxiliary measurements (i.e., $n_{\rm CR}$) to describe our uncertainty on the nuisance parameter $\nu_B$ statistically. Furthermore, we have formed a statistical model that can be treated in a frequentist formalism -- meaning that if we repeat the experiment many times, $n_{\rm CR}$ will vary and so will the estimate of $\nu_B$. It is common to say that auxiliary measurements `constrain' the nuisance parameters. In principle the auxiliary measurements can be every bit as complex as the main signal region, and there is no formal distinction between the various channels.

The use of auxiliary measurements is not restricted to estimating rates as in the on/off problem above. One can also use auxiliary measurements to constrain other parameters of the model. To do so, one must relate the effect of some common parameter $\alpha_p$ in multiple channels (i.e., the signal region and one or more control regions). This is implicit in Eq.~\ref{Eq:simultaneous}.
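Before moving on, here is a minimal numerical sketch of the on/off model above in Python; the counts $n_{\rm SR}=15$, $n_{\rm CR}=100$ and the factor $\tau=10$ are hypothetical values chosen only to show the mechanics:
\begin{verbatim}
# Sketch of the on/off problem:
#   L(nu_S, nu_B) = Pois(n_SR | nu_S + nu_B) * Pois(n_CR | tau * nu_B)
# All numbers are hypothetical illustration values.
from scipy.stats import poisson
from scipy.optimize import minimize

n_sr, n_cr, tau = 15, 100, 10.0   # observed counts and extrapolation factor

def nll(params):
    """Negative log-likelihood of the joint SR x CR model."""
    nu_s, nu_b = params
    return -(poisson.logpmf(n_sr, nu_s + nu_b)
             + poisson.logpmf(n_cr, tau * nu_b))

# Maximize the likelihood; the fitted nu_B reproduces the
# intuitive estimate n_CR / tau.
fit = minimize(nll, x0=[1.0, n_cr / tau],
               bounds=[(0.0, None), (1e-6, None)])
nu_s_hat, nu_b_hat = fit.x
print(nu_s_hat, nu_b_hat, n_cr / tau)   # ~5.0, ~10.0, 10.0
\end{verbatim}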
\subsection{Frequentist and Bayesian reasoning}

The intuitive interpretation of the measurement of $\nu_B$ as $n_{\rm CR}/\tau \pm \sqrt{n_{\rm CR}}/\tau$ is that the parameter $\nu_B$ has a distribution centered around $n_{\rm CR}/\tau$ with a width of $\sqrt{n_{\rm CR}}/\tau$. With some practice you will be able to immediately identify this type of reasoning as Bayesian. It is manifestly Bayesian because we are referring to the probability distribution of a parameter.

The frequentist notion of the probability of an event is defined as the limit of its relative frequency in a large number of trials. The large number of trials is referred to as an ensemble. In particle physics the ensemble is formed conceptually by repeating the experiment many times. The true values of the parameters, on the other hand, are states of nature, not the outcome of an experiment. The true mass of the $Z$ boson has no frequentist probability distribution. The existence or non-existence of the Higgs boson has no frequentist probability associated with it. There is a sense in which one can talk about the probability of parameters, which follows from Bayes's theorem:
\begin{equation}
\label{Eq:Bayes}
P(A|B) = \frac{P(B|A) P(A)}{P(B)} \; .
\end{equation}
Bayes's theorem is a theorem, so there is no debating it. It is not the case that Frequentists dispute whether Bayes's theorem is true; the debate is whether the necessary probabilities exist in the first place. If one can define the joint probability $P(A,B)$ in a frequentist way, then a Frequentist is perfectly happy using Bayes's theorem. Thus, the debate starts at the very definition of probability.

The Bayesian definition of probability clearly cannot be based on relative frequency. Instead, it is based on degree of belief. Formally, the probability needs to satisfy Kolmogorov's axioms for probability, which both the frequentist and Bayesian definitions of probability do. One can quantify degree of belief through betting odds, thus Bayesian probabilities can be assigned to hypotheses on states of nature. In practice, humans' bets are generally not `coherent' (see `Dutch book'), so this way of quantifying probabilities may not satisfy the Kolmogorov axioms.

Moving past the philosophy and accepting the Bayesian procedure at face value, the practical consequence is that one must supply prior probabilities for various parameter values and/or hypotheses. In particular, to interpret our example measurement of $n_{\rm CR}$ as implying a probability distribution for $\nu_B$ we would write
\begin{equation}
\pi(\nu_B | n_{\rm CR}) \propto f(n_{\rm CR} | \nu_B) \, \eta(\nu_B) \; ,
\end{equation}
where $\pi(\nu_B | n_{\rm CR})$ is called the \textit{posterior} probability density, $f(n_{\rm CR} | \nu_B)$ is the likelihood function, and $\eta(\nu_B)$ is the \textit{prior} probability density. Here I have suppressed the somewhat curious term $P(n_{\rm CR})$, which can be thought of as a normalization constant and is also referred to as the \textit{evidence}. The main point here is that one can only invert `the probability of $n_{\rm CR}$ given $\nu_B$' into `the probability of $\nu_B$ given $n_{\rm CR}$' if one supplies a prior. Humans are very susceptible to performing this logical inversion accidentally, typically with an implicit uniform prior on $\nu_B$. Furthermore, the prior degree of belief cannot be derived in an objective way. There are several formal rules for constructing a prior (see Jeffreys's prior and reference priors), though these are not accurately described as representing a degree of belief. Thus, that style of Bayesian analysis is often referred to as objective Bayesian analysis.
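To make the role of the prior concrete, the following sketch computes $\pi(\nu_B | n_{\rm CR})$ numerically for the on/off example above, assuming (purely as an illustration) a uniform prior $\eta(\nu_B)$; with that choice the posterior coincides with a conjugate Gamma density:
\begin{verbatim}
# Numerical sketch of pi(nu_B | n_CR) ~ f(n_CR | nu_B) * eta(nu_B),
# assuming a uniform prior eta(nu_B).  With that (illustrative) prior
# the posterior is the conjugate Gamma(n_CR + 1, rate = tau) density.
import numpy as np
from scipy.stats import poisson, gamma

n_cr, tau = 100, 10.0                 # hypothetical count, extrapolation factor
nu_b = np.linspace(1e-3, 20.0, 2000)  # grid of nu_B values

posterior = poisson.pmf(n_cr, tau * nu_b)            # likelihood x flat prior
posterior /= posterior.sum() * (nu_b[1] - nu_b[0])   # normalize (the 'evidence')

# Cross-check against the analytic conjugate result.
analytic = gamma.pdf(nu_b, a=n_cr + 1, scale=1.0 / tau)
print(np.abs(posterior - analytic).max())  # small discretization error
\end{verbatim}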
{\flushleft{Some useful and amusing quotes on Bayesian and Frequentist reasoning:}}
\begin{quote}
{\em ``Using Bayes's theorem doesn't make you a Bayesian, \textbf{always} using Bayes's theorem makes you a Bayesian.''} -- unknown
\end{quote}
\begin{quote}
{\em ``Bayesians address the questions everyone is interested in by using assumptions that no one believes. Frequentists use impeccable logic to deal with an issue that is of no interest to anyone.''} -- Louis Lyons
\end{quote}

\subsection{Consistent Bayesian and Frequentist modeling of constraint terms}\label{S:Constraint}

Often a detailed probability model for an auxiliary measurement is not included directly in the model. If a model for the auxiliary measurement were available, it could and should be included as an additional channel as described in Sec.~\ref{S:AuxMeas}. The more common situation for background and systematic uncertainties is that one has only an estimate, ``central value'', or best guess for a parameter $\alpha_p$ and some notion of uncertainty on this estimate. In this case one typically resorts to including idealized terms in the likelihood function, here referred to as ``constraint terms'', as surrogates for a more detailed model of the auxiliary measurement. I will denote the estimate for the parameter as $a_p$, to make it manifestly frequentist in nature. In this case there is a single measurement of $a_p$ per experiment, thus it is referred to as a ``global observable'' in \roostats. The treatment of constraint terms is somewhat \emph{ad hoc} and is discussed in more detail in Section~\ref{S:ConstraintExamples}. I make it a point to write constraint terms in a manifestly frequentist form $f(a_p | \alpha_p)$.

Probabilities on parameters are legitimate constructs in a Bayesian setting, though they will always rely on a prior. In order to distinguish Bayesian pdfs from frequentist ones, Greek letters will be used for their distributions. For instance, a generic Bayesian pdf might be written $\pi(\alpha)$. In the context of a main measurement, one might have a prior for $\alpha_p$ based on some estimate $a_p$. In this case, the prior $\pi(\alpha_p)$ is really a posterior from some previous measurement. It is desirable to write it, with the help of Bayes's theorem, as
\begin{equation}
\label{eq:urprior}
\pi(\alpha_p | a_p) \propto L(\alpha_p) \, \eta(\alpha_p) = f(a_p|\alpha_p) \, \eta(\alpha_p) \; ,
\end{equation}
where $\eta(\alpha_p)$ is some more fundamental prior.\footnote{Glen Cowan has referred to this more fundamental prior as an `urprior', which is based on the German use of `ur' for forming words with the sense of `proto-, primitive, original'.} By taking the time to undo the Bayesian reasoning into an objective pdf or likelihood and a prior, we are able to write a model that can be used in a frequentist context. Within \roostats, care is taken to separately track the frequentist component and the prior; this is achieved with the \texttt{ModelConfig} class.
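As a simple worked example of undoing the Bayesian reasoning, suppose the prior $\pi(\alpha_p | a_p)$ inherited from a previous measurement is Gaussian with mean $a_p$ and standard deviation $\sigma_p$. If one assumes the urprior $\eta(\alpha_p)$ was uniform, then Eq.~\ref{eq:urprior} implies the effective frequentist constraint term
\begin{equation}
f(a_p | \alpha_p) = \frac{1}{\sqrt{2\pi}\,\sigma_p} \, \exp\left[-\frac{(a_p-\alpha_p)^2}{2\sigma_p^2}\right] \; ,
\end{equation}
in which the roles of the parameter $\alpha_p$ and the global observable $a_p$ are exchanged relative to the posterior.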
If one can identify what auxiliary measurements were performed to provide the estimate of $\alpha_p$ and its uncertainty, then approximating them with a constraint term is not a logical fallacy, merely a convenience. However, not all uncertainties that we deal with result from auxiliary measurements; in particular, some theoretical uncertainties are not statistical in nature. For example, the uncertainty associated with the choice of renormalization and factorization scales and with missing higher-order corrections in a theoretical calculation is not statistical. Uncertainties from parton density functions are a bit of a hybrid, as they are derived from data but require theoretical inputs and various modeling assumptions. In a Bayesian setting there is no problem with including a prior on the parameters associated with theoretical uncertainties. In contrast, in a formal frequentist setting one should not include constraint terms on theoretical uncertainties that lack a frequentist interpretation. That leads to a very cumbersome presentation of results, since formally the results should be shown as a function of the uncertain parameter. In practice, the experimental groups often read Eq.~\ref{eq:urprior} in reverse to arrive at an effective frequentist constraint term.

I will denote the set of parameters with constraint terms as $\mathbb{S}$ and the global observables as $\mathcal{G}=\{a_p\}$ with $p\in\mathbb{S}$. By including the constraint terms explicitly (instead of implicitly as an additional channel) we arrive at the total probability model, which we will not need to generalize any further:
\begin{equation}
\label{Eq:ftot}
\F_{\textrm{tot}}(\datasim, \mathcal{G}|\alpha) = \prod_{c\in\rm channels} \left[ \Pois(n_c|\nu_c(\alpha)) \prod_{e=1}^{n_c} f_c(x_{ce}|\alpha) \right] \cdot \prod_{p \in \mathbb{S}} f_p(a_p | \alpha_p)\; .
\end{equation}
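To fix ideas, here is a minimal numerical sketch of Eq.~\ref{Eq:ftot} for a single counting channel with one constrained nuisance parameter; the linear response $b(\alpha)$, the Gaussian constraint, and all numbers are hypothetical choices for illustration:
\begin{verbatim}
# Sketch of F_tot for one counting channel and one constraint term:
#   F_tot(n, a | mu, alpha) = Pois(n | mu*s + b(alpha)) * Gauss(a | alpha, 1)
# The response b(alpha) and all numbers are hypothetical illustrations.
from scipy.stats import poisson, norm

n_obs, a_obs = 12, 0.0           # channel count and global observable
s, b0, sigma_b = 5.0, 8.0, 0.2   # nominal signal, background, frac. uncert.

def log_f_tot(mu, alpha):
    """log F_tot = log Pois(n | mu*s + b(alpha)) + log Gauss(a | alpha, 1)."""
    b = b0 * (1.0 + sigma_b * alpha)   # background response to the nuisance
    return (poisson.logpmf(n_obs, mu * s + b)
            + norm.logpdf(a_obs, loc=alpha, scale=1.0))

print(log_f_tot(mu=1.0, alpha=0.0))
\end{verbatim}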