However, this method rests on several hidden assumptions. First, each probability is computed independently, considering only language studies versus all non-language studies. By lumping all non-language studies together, the probability that any particular non-language function activates the inferior frontal gyrus is diluted. For example, of the 199 non-language studies that report activity at the inferior frontal gyrus, let us assume that 90 are studying cognitive function A. Let us also assume that, of the roughly 2154 non-language studies that do not report activity at the inferior frontal gyrus, 10 are studying cognitive function A. Under these assumptions, the probability of cognitive function A activating the inferior frontal gyrus is 0.9 (90 of the 100 studies of function A). In fact, inferior frontal gyrus activation would then indicate the presence of cognitive function A, rather than language processing; the inferior frontal gyrus is specific for function A. If language studies recruit function A 19% of the time, this would explain why language studies activate the inferior frontal gyrus roughly 17% of the time (a probability of 0.19 times a probability of 0.9). Second, the probabilities are computed for each voxel or brain region independently. But observing a single region may not be very informative, for two reasons. First, cognitive functions potentially arise from interacting and overlapping brain networks. We assume this model here, instead of the inaccurate, or at least incomplete, model in which each region executes a distinct cognitive function. Second, the activation of a single region, relative to that of a collection of regions, provides less information when inferring which of two cognitive functions is engaged.
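The arithmetic of the worked example above can be sketched in a few lines. Note that the counts (90, 10) and the recruitment probability 0.19 are the hypothetical numbers assumed in the text, not estimates from the BrainMap database.

```python
# Hypothetical counts from the worked example: among non-language studies
# of cognitive function A, 90 activate the inferior frontal gyrus (IFG)
# and 10 do not.
a_active, a_inactive = 90, 10

# Pr(IFG active | function A engaged)
p_ifg_given_a = a_active / (a_active + a_inactive)  # 0.9

# If language studies recruit function A 19% of the time, the fraction of
# language studies activating the IFG via function A is the product:
p_a_given_language = 0.19
p_ifg_given_language = p_a_given_language * p_ifg_given_a  # ~0.171

print(p_ifg_given_a, round(p_ifg_given_language, 3))
```

The point of the sketch is that a high region-to-function specificity (0.9) combined with a modest recruitment rate (0.19) already accounts for the observed activation rate, without language itself being localized to the region.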
Thus, reverse inference must utilize a model in which cognitive tasks engage common and distinct cognitive functions, which are in turn supported by multiple, possibly overlapping, brain regions. However, this model requires an ontology of cognitive functions. In the above example, we need to know what cognitive function A is and where it is most likely to activate, along with all the other cognitive functions and where they are most likely to activate. To achieve this, we recently instantiated a mathematical model that encodes this notion \cite{25249407}, allowing us to estimate a nested cognitive ontology based on the BrainMap database. The ontology consists of a collection of cognitive components, the probability that each task recruits each component, and the probability that each component activates each voxel.
Here we consider how our mathematical framework and resulting ontology provide conceptual and computational advances over the original reverse inference framework described above. Our mathematical framework explicitly models how interactions between cognitive functions give rise to brain activation. Therefore, it is easy to perform reverse inference across the whole brain within our framework. Here, we perform whole brain reverse inference during task states on individual frames (i.e., single timepoints during a scan). We characterize the accuracy of the reverse inference, both in terms of how well the model fits the data and how well the model can be used to decode what task the subject is performing. We also provide two novel applications. First, we extend a previous meta-analysis finding that activity at connector nodes in the fronto-parietal and dorsal attention networks increases when more cognitive functions are engaged during a task. While the original analysis looked across tasks in the BrainMap database, we look across timepoints in a single scanning session during multiple task states. Our new method also allows us to test the extent to which each voxel is dependent on the number of cognitive functions engaged. Second, we cluster individual timepoints into “states” and find that each state corresponds remarkably well to the activation of one or two traditional resting-state networks, and that connector node regions are activated in the most states. Thus, we argue that reverse inference, when computed jointly and probabilistically across the brain with a principled model, is valid and can be applied to deepen our understanding of global brain function.

Method

Joint Probabilistic Reverse Inference

To estimate cognitive functions, we used an author-topic model. This application has been described previously \cite{25249407}, but we describe it here in the context of reverse inference. An author-topic model defines an exact probabilistic model relating documents, authors, topics, and words \cite{Rosen_Zvi_2010}. Consider a collection of scientific documents or papers. Across this collection, we observe authors and their words. The author-topic model lets us discover abstract topics, which are made concrete by their association with certain authors and certain words. An important point is that the topics themselves are never observed, because topics are abstract. Instead, the model estimates them: each topic has a probability of being written about by a particular author, and each word has a probability of appearing given a certain topic. We applied the author-topic model to the BrainMap database, which contains 10,000 imaging experiments. Each experiment is tagged with one of 83 tasks, as well as with the activated voxels in MNI space. If we think of documents as experiments, authors as behavioral tasks, topics as cognitive components, and voxels as words, there is a one-to-one mapping between the author-topic model in text mining and our problem. We do not assume functions like “emotion”, “language” or “vision”; the model estimates them. One parameter we have to set is the number of cognitive components. Here, we consider the 12-component model (see \cite{25249407} for a complete discussion of choosing the number of components).
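The generative story behind this mapping can be sketched as follows. All dimensions and parameter values below are hypothetical placeholders (random Dirichlet draws), chosen only to illustrate the sampling scheme, not the estimates reported in \cite{25249407}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T tasks, K cognitive components, V voxels.
T, K, V = 3, 12, 100

# theta[t, k] = Pr(component k | task t); each row sums to 1.
theta = rng.dirichlet(np.ones(K), size=T)
# beta[k, v] = Pr(voxel v | component k); each row sums to 1.
beta = rng.dirichlet(np.ones(V), size=K)

def generate_activations(task, n_foci):
    """Generate activation foci for one experiment tagged with `task`:
    for each focus, draw a component from Pr(component | task), then
    draw a voxel from Pr(voxel | component)."""
    components = rng.choice(K, size=n_foci, p=theta[task])
    return np.array([rng.choice(V, p=beta[k]) for k in components])

foci = generate_activations(task=0, n_foci=20)
print(foci.shape)  # (20,)
```

This mirrors the author-topic analogy directly: the task plays the role of the author, the component the topic, and the voxel the word.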
Essentially, we formulate an optimization problem in which we seek the parameters \(\theta\) and \(\beta\) that maximize the posterior probability \(\Pr\left(\theta,\ \beta\ |\ BrainMap\ Data\right)\), where \(\theta\) is \(\Pr(Component\ |\ Task)\) and \(\beta\) is \(\Pr(Voxel\ |\ Component)\). This is achieved with an expectation-maximization algorithm. Given the estimate of \(\beta\), it is straightforward to perform reverse inference, since we have the probability of a voxel being active given a component. More specifically, for reverse inference, we seek the new \(\theta^\ast\) that maximizes the posterior probability of \(\theta\) given the BrainMap estimate of \(\beta\), which was estimated with the author-topic model, and the whole-brain activation data from a new “experiment”; here, the experiment is a single time point (i.e., frame or volume) of an fMRI time series. Essentially, we find the \(\theta^\ast\) that maximizes \(\Pr\left(\theta\ |\ \beta,\ Whole\ Brain\ Data\right)\). This is also done via expectation maximization, and gives us, for each component, \(\Pr(Component\ |\ Whole\ Brain\ Data)\). Thus, for each time point in an fMRI time series, we obtain a probability of each component being engaged. This distribution sums to 1.
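The second estimation step, finding \(\theta^\ast\) with \(\beta\) held fixed, can be sketched with a minimal expectation-maximization loop. This is a simplified illustration assuming binary activation data (a list of suprathreshold voxel indices) and a random placeholder \(\beta\); the actual implementation operates on the \(\beta\) estimated from BrainMap and is described in \cite{25249407}.

```python
import numpy as np

def reverse_inference(beta, active_voxels, n_iter=100):
    """Estimate theta* = Pr(component | whole-brain data) by EM, holding
    beta = Pr(voxel | component) fixed. `active_voxels` is a list of the
    indices of active voxels at one fMRI time point (frame)."""
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)       # uniform initialization
    likes = beta[:, active_voxels]    # K x N: Pr(voxel_n | component_k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each active voxel
        r = theta[:, None] * likes    # K x N
        r /= r.sum(axis=0, keepdims=True)
        # M-step: update the component distribution from the responsibilities
        theta = r.mean(axis=1)
    return theta                      # a distribution over components; sums to 1

# Hypothetical example: 12 components over 50 voxels, random beta.
rng = np.random.default_rng(1)
beta = rng.dirichlet(np.ones(50), size=12)
theta_star = reverse_inference(beta, active_voxels=[3, 7, 19, 42])
print(theta_star.sum())  # ~1.0
```

Running this once per frame yields, for each time point, the per-component probabilities described above, which are the quantities used in the decoding and state-clustering analyses.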