It is a thing of no great difficulty to raise objections against another man's oration,—nay, it is a very easy matter; but to produce a better in its place is a work extremely troublesome. -Plutarch

Introduction

Although many twin studies have been conducted (an understatement: there are almost 9,000 hits for "twin study" on PubMed), there have long been critics who argue that they are scientifically worthless. Unsurprisingly, the behavior geneticists who conduct these studies with the aim of separating the influence of genes from that of environment are none too happy about people calling one of their favorite research designs fatally flawed. So how do they respond, and are their responses more compelling than the original criticisms? I will dig into these questions in this article (which will be long, so be warned).

First, I should define what a twin study is in this context. The phrase "twin study" almost always refers to a study using the so-called "classical twin design" (abbreviated CTD), so for the rest of this paper I will use "twin study" to refer only to studies using the CTD. This type of study compares monozygotic (MZ) twins and dizygotic (DZ) twins with the purpose of estimating how much of the variation in a given trait (expressed as a percentage) is due to genes, a.k.a. "heritability". In the CTD, this is done by calculating the degree of similarity between the twins in each pair on the trait in question (a correlation for a continuous trait, or a concordance rate if a disease is being studied), separately for the monozygotic and the dizygotic twins, and then comparing the two values. (Important note: monozygotic twins, aka "identical twins", are assumed to share 100% of their genes, while dizygotic twins, aka "fraternal twins", are assumed to share only 50% of theirs on average. More on the first of these assumptions later.)
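As a rough illustration of the comparison just described, the following Python sketch computes within-pair similarity separately for MZ and DZ pairs and then compares the two values. The twin scores are invented toy numbers, and the names `pearson_r`, `mz_pairs`, and `dz_pairs` are hypothetical. Pearson's correlation is used here only for simplicity; twin studies often use an intraclass correlation instead.

```python
import math

def pearson_r(pairs):
    """Pearson correlation between twin 1 and twin 2 scores across pairs."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented toy data (an assumption for illustration only):
# each tuple is (twin 1 score, twin 2 score) for one pair.
mz_pairs = [(10, 11), (14, 13), (8, 9), (12, 12), (15, 16)]
dz_pairs = [(10, 13), (14, 11), (8, 10), (12, 9), (15, 14)]

r_mz = pearson_r(mz_pairs)
r_dz = pearson_r(dz_pairs)
print(f"r_MZ = {r_mz:.2f}, r_DZ = {r_dz:.2f}, difference = {r_mz - r_dz:.2f}")
```

In this toy data set the MZ pairs are more alike than the DZ pairs, which is the pattern the CTD interprets as evidence of genetic influence.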
These concordances or correlations are then converted into an estimate of heritability (in this case, narrow-sense heritability, $h^2$, which only includes additive genetic effects) using this formula (known as Falconer's formula):

\[ h^2 = 2(r_{MZ} - r_{DZ}) \]

where $r_{MZ}$ and $r_{DZ}$ are, respectively, the phenotypic correlation (for traits) or concordance (for diseases) between MZ twins, and that between DZ twins, on the trait/disease of interest.\cite{Mayhew_2017}

Thus, when this formula is used (which it typically is in twin studies), if the concordance is higher among monozygotic twins than among dizygotic twins, this is taken to be due to the greater genetic similarity of MZ twins, and will yield a nonzero heritability estimate. In contrast, if the concordance rates were the same between MZ and DZ twins on the trait, it would indicate an estimated narrow-sense heritability of zero for the trait, under the CTD model\cite{Guo2001}. This is obvious from the formula above, where $r_{MZ}$ and $r_{DZ}$ being equal makes the value of $h^2$ equal to 0. Note that the extent to which $h^2 > 0$ is often conflated with the extent to which a trait is under "genetic influence" in the twin study literature.

Final note: there are two other components to this model, shared and nonshared environment. The three components of the model, to recap, are: additive genetic variance (A), shared/common environment (C), and nonshared environment (E). Thus it is often called the "ACE model".\cite{hh2005} The basic idea behind this model is that total phenotypic variance is exactly equal to the sum of these three components: A + C + E.

But there have long been many criticisms of the specific way that twin studies are conducted, and critics claim that such studies can't really separate genetic and environmental contributions to any trait. But Beatty et al. 
(2002) inform us that such critics are mistaken: "..., satisfactory responses to critics (see for example, Bouchard, 1993, 1994; Goldsmith, 1983; Lykken, 1995; Martin, Boomsma, & Machin, 1997; Scarr & Carter-Saltzman, 1979; Segal, 1997, 1999) have led contemporary behavior geneticists to describe the twins design as “the perfect natural experiment” (Martin et al., 1997. p. 387)."\cite{Beatty_2002} (p. 2)

Oh, so that's good to know: apparently criticisms of twin studies have already been conclusively refuted. I'll go back to these sources later in this article to see whether they actually show that critics of the twin method are wrong about its purported uselessness.

Criticism 1: The equal environments assumption

For decades, critics of twin studies have argued that they don't really separate the influence of genes and environment on a specific trait, because the validity of their heritability estimates depends on the so-called "equal environments assumption" (abbreviated EEA). This is the assumption that the within-pair similarity of environmental exposures relevant to the causation of the trait being studied is equal for MZ and DZ twins. In other words, it assumes that MZ twins experience environments that are exactly as similar as those of DZ twins, at least with regard to the environmental factors that can cause the trait being studied. If this assumption is wrong, it will lead to heritability estimates that are also wrong: if MZ twins experience more similar environments than DZ twins, this can produce higher MZ concordance rates without any genetic influence on the trait whatsoever.\cite{Guo2001}

As you may have guessed, there is a lot of evidence that MZ twins experience more similar environments than do DZ twins (for a summary, see \cite{Simons2012}; see also Joseph 2002, Horwitz et al. 2003, Rende et al. 2005). 
In fact, even an article by behavior geneticists aiming to defend the validity of twin studies acknowledged that "There is overwhelming evidence that MZ twins are treated more similarly than their DZ counterparts" (Evans & Martin 2000). But to be fair, that's not necessarily the same as the kind of fundamental invalidating flaw critics often portray violations of the EEA to be. Typically, behavior geneticists respond to this evidence by making one or more of the following arguments (hereafter referred to as "argument #", where "#" is the number of the argument in the list immediately below):

1) Greater environmental similarity between twins in a pair is not significantly associated with greater phenotypic similarity. This argument is based on the assumption that the EEA does not have to be entirely valid for all environmental factors, but instead just for those that are "trait-relevant", i.e. that have a causal effect on whatever trait is being studied. The practice of defining the EEA as applying only to "trait-relevant" environmental factors is common in behavior genetic research (e.g. \cite{Mitchell2007}, \cite{Bouchard_2002},\cite{Kendler_1993}). This argument further claims that the hypothesis that the EEA is invalid for trait-relevant environmental factors must be tested rather than assumed to always be true, after which studies are cited and/or performed that supposedly test this definition of the EEA. These "EEA-test" studies come in many forms:

a) "Perceived zygosity" or "misclassified twin" studies, a specific type of twin study designed to test the EEA. Such studies take advantage of the fact that some MZ twins are perceived as being DZ and vice versa. The idea is that if greater genetic similarity between MZ twins is what makes them more similar, then actual zygosity should have a greater impact on twin similarity, i.e. MZ twins should be more phenotypically similar to each other than DZ twins even if the MZ twins are incorrectly perceived as DZ. By contrast, if greater environmental similarity (such as is, presumably, experienced by twins perceived to be MZ, whether they are actually MZ or not) explains the greater phenotypic similarity of MZ twins relative to DZ twins, then MZ twins perceived as DZ should be no more phenotypically similar to each other than are DZ twins correctly perceived as DZ. Indeed, actual zygosity has been found to be more strongly associated with twin concordance in multiple such studies.\cite{Conley_2013}

b) Studies that correlate physical similarity with phenotypic similarity between twins in a pair. If no statistically significant correlation is found, it is concluded that the EEA is "valid" in the sense that violations of it do not significantly affect heritability estimates for whatever trait is being studied.\cite{Kendler_1993}

c) Studies that measure perceived zygosity as well as other indicators of environmental similarity, control for these measures, and then compare the "corrected" MZ and DZ correlations to see whether the difference between the two correlations remains significant. Two such studies have found that it does.\cite{LaBuda_1997}\cite{CRONK_2002}

2) Greater environmental similarity between members of an MZ twin pair than a DZ twin pair, rather than causing the greater phenotypic similarity between MZ than DZ twins, is actually due to the MZ twins' greater genetic similarity. This argument posits that because MZ twins are more genetically similar than DZ twins, they are treated more similarly and have more similar behavior, which in turn leads to the environments of MZ twins being more similar than those of DZ twins. Many twin researchers have made this argument since at least the 1950s. It has been widely criticized as circular because it assumes that the greater genetic similarity between MZ than DZ twins causes their greater phenotypic similarity, which is also what it purports to demonstrate.\cite{Joseph_2012}

3) Non-CTD studies that do not rely on the EEA generate similar heritability estimates to CTD studies for the same trait. Ostensibly, the similar results generated by these non-CTD methods "validate" the EEA on which CTD studies depend. The word "validate" or derivatives thereof is often used in this literature (for two recent examples, see \cite{Kendler_2014} and \cite{BARNES_2014}). The methodologies of these studies include twins reared apart, adoption studies, and full- vs. half-sibling studies.\cite{BARNES_2014}

Response by critics of behavior genetics

Criticism 1: Response to Argument 1

Joseph commonly rebuts this argument by pointing out that it reveals a double standard of logical inference: twin researchers assess twin studies differently from family studies. Specifically, according to twin researchers, the greater similarity of environments among MZ compared to DZ twins does not necessarily invalidate the classical twin method as a means of separating genetic and environmental influences, but the greater similarity of environments shared by members of the same family relative to members of the general population does necessarily invalidate the ability of family studies to separate such influences. For this reason, Joseph argues that twin researchers are guilty of special pleading by arguing that standards that apply to family studies should not apply to classical twin studies. As an example, he wrote in 2002 that: "genetic researchers acknowledge that family studies do not prove the existence of genetic factors—since the clustering of a condition among family members could be caused by purely environmental factors. 
However, no one to my knowledge has argued for a “trait-relevant EEA” for family studies; that is, the claim that family studies prove the existence of genetic influences unless specific environmental factors shared by family members are demonstrated to have a causal relationship to the condition in question. Quite the contrary; family studies are acknowledged to be confounded by the simple fact that family members share a common environment. In the same way, the twin method is confounded by the greater environmental similarity of MZ twins, regardless of whether the specific trait-relevant environmental factors are known."\cite{Joseph_2002} (p. 76)

Joseph also notes that by using this argument and defining the EEA as needing to be "trait-relevant", twin researchers improperly shift the burden of proof onto critics of the classical twin method, who are then expected to prove that MZ environments are more similar than those of DZ twins specifically with regard to those environmental factors that affect a given trait. He also points out some other logical problems with this argument in some of his papers.\cite{jay2013}\cite{Joseph_2002}

In addition, Joseph and other researchers have been highly critical of the EEA-test studies cited as part of this argument.\cite{jay2013,Joseph_2012}\cite{Richardson_2005}\cite{Beckwith_2008}

Criticism 1: Response to Argument 2

Joseph has noted the following about this argument: "...this is a circular argument, because twin researchers' conclusion that MZ-DZ differences are explained by genetics is based on assuming the very same thing."\cite{jay2013}, (p. 
5) Furthermore, he points out some other logical issues with it, such as that it portrays "...children (twins) as behaving according to a genetic behavioral blueprint, yet somehow parents and other adults have themselves tossed aside the blueprint and are able to flexibly change their behavior and treatment of others on the basis of the twins' behavior and personalities," which he wittily sums up with the phrase "Genetically-programmed human children, meet your ever-so-flexible human parents." He has also pointed out another problem with this argument, namely that "...even if twins do indeed create more similar environments for themselves because of their greater genetic similarity, MZ pairs could still show much higher concordance for psychiatric disorders than DZ pairs for purely environmental reasons."\cite{Joseph_2012}
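
Joseph's last point, and the broader worry behind Criticism 1, can be made concrete with Falconer's formula. The Python sketch below is a hypothetical illustration, not a model taken from any of the cited papers. It assumes the expected twin correlations of a simple ACE-style model, $r_{MZ} = a^2 + c^2_{MZ}$ and $r_{DZ} = \frac{1}{2}a^2 + c^2_{DZ}$, but lets the shared-environment share differ by zygosity. The function and parameter names are invented for this example, as are the numerical values.

```python
def expected_correlations(a2, c2_mz, c2_dz):
    """Expected twin correlations under an ACE-style model (an assumption).

    a2: share of variance due to additive genes.
    c2_mz, c2_dz: share due to environment shared within MZ and DZ pairs.
    The EEA corresponds to c2_mz == c2_dz.
    """
    r_mz = a2 + c2_mz          # MZ pairs share all additive genetic effects
    r_dz = 0.5 * a2 + c2_dz    # DZ pairs share half of them on average
    return r_mz, r_dz

def falconer_h2(r_mz, r_dz):
    """Falconer's formula: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2 * (r_mz - r_dz)

# EEA holds: the formula recovers the additive share that was put in.
h2_eea = falconer_h2(*expected_correlations(a2=0.4, c2_mz=0.2, c2_dz=0.2))

# EEA violated and no genetic influence at all: the estimate is still positive.
h2_violated = falconer_h2(*expected_correlations(a2=0.0, c2_mz=0.5, c2_dz=0.25))

print(f"EEA holds:    h2 = {h2_eea:.2f}")
print(f"EEA violated: h2 = {h2_violated:.2f}")
```

Under these assumptions, $2(r_{MZ} - r_{DZ}) = a^2 + 2(c^2_{MZ} - c^2_{DZ})$, so any excess environmental similarity of MZ pairs is doubled and attributed to genes.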
Abstract

The first aim of this article is to shed light on the two distinct definitions of the assumption of additivity in quantitative behavior genetics and its associated methodologies, such as heritability estimation. In addition, this article aims to assess the validity of this assumption, based on both of the ways in which it has been defined. There appear to be two related but distinct definitions of the additivity assumption: 1) the assumption that the total phenotypic variance of any quantitative trait can be expressed as the sum of the genetic variance and environmental variance components for that trait, i.e. that the whole is equal to the sum of its separable parts, and 2) the assumption that most, if not all, genetic variance in a trait is additive in nature, i.e. that the total genetic effect is mostly or entirely equal to the sum of the effects of multiple genes acting independently and additively. It is hoped that highlighting these distinct meanings will help to clarify future discussions surrounding research in quantitative behavior genetics and the assumptions on which such research depends.

Introduction

The assumption of additivity underlying much research in the quantitative genetics of human behavioral traits (e.g. twin studies, heritability analysis, etc.) has long been criticized as untenable. But what exactly does this assumption mean? Here I outline the confusing fact that there appear to be two distinct answers to this question, and attempt to distinguish between these two distinct assumptions that are often confusingly referred to in similar or even identical terms. 
I also try to assess the extent to which each of these assumptions is supported by the available empirical evidence, and call for greater semantic clarity in future discussions of the existence of genetic additivity.

Defining the assumption of additivity

First, I will attempt to describe and distinguish between the two definitions that have been used for the assumption of additivity in the context of quantitative behavior genetics and heritability analyses. My goal in doing so is influenced by Moore, who recently described the different, often-confused meanings of the widely used phrase "gene-environment interaction".\cite{Moore_2018} Next, I attempt to situate each definition of this assumption in the context of heritability estimation and related human genetics research, and to critically assess the validity of each of the assumption's definitions in light of the available scientific evidence.

Definition A

Elsewhere, Moore has defined the additivity assumption as the assumption "that genetic and environmental influences on phenotypic variation are additive".\cite{Moore_2006} This is what I will call "definition A" of the assumption of additivity: that phenotypic variance ($V$) can be accurately expressed as the sum of two separate components, one for genes and one for the environment. (Note that I will henceforth use the words "variance" and "variation" interchangeably.) This may be written as the following equation (e.g., \cite{Tabery_2008}):

\[ V = V_G + V_E \qquad \mathrm{(Eq.\ 1)} \]

Here, $V$ (also sometimes written $V_P$, with the P standing for "phenotype") is the total phenotypic variation in a given trait in a given population, $V_G$ is the variation in that trait that is due to genetic factors, and $V_E$ is the variation due to environmental factors. This definition of the assumption of additivity is only valid if equation 1 is accurate. 
It would not be accurate if, for instance, there is gene-environment interaction (G x E), in which case you would not be able to get the total variance just by adding together the genetic and environmental sources thereof.

Another way of understanding definition A can be gleaned from an examination of the ACE model, according to which total variation in a phenotype is equal to the sum of Additive genetic factors (A), Common environmental factors (C), and nonshared (unique) Environmental factors (E). The validity of this model, at least in its simplest form, also depends on a form of definition A of the assumption of additivity (hereafter simply "definition A"). This is because the ACE model assumes that the A, C, and E components do not significantly interact with each other.\cite{Benchek2013} Therefore, the ACE model, like the idea that total variance can be described as the sum of that due to genetic and environmental factors, relies on the validity of definition A. Fundamentally, then, the equation underlying the ACE model is the same as Eq. 1, except that $V_G$, which normally represents all genetic variance, is replaced by "A", which only represents additive genetic variance, and the term for environmental variance in Eq. 1 ($V_E$) is replaced with C and E, which, added together, are supposed to equal the total environmental variance. In some cases, the A, C, and E terms in the ACE model can be represented with $h$, $c$, and $e$; because we are talking about variance, each of these terms should be squared. Thus, the equation underlying the ACE model can be denoted as follows:\cite{Partridge_2011}

\[ P = h^2 + c^2 + e^2 \qquad \mathrm{(Eq.\ 2)} \]

What follows are some more examples, from various publications, of the assumption of additivity being referred to in keeping with definition A:

"[It] is asserted that the measured score (or phenotype) of an individual on a psychological test ($Y_i$) is the sum of only two components, $G_i$ determined by the genes and $E_i$ specified by the person's environment; that is, G and E must not interact."\cite{Wahlsten_1994} (p. 245)

"The traditional way to model the impact of genetic and environmental factors on risk for disease has been to assume additivity. That is, we assume the final risk is a result of the addition of genetic to environmental vulnerability."\cite{Kendler_2010}

"...the assumption of additive gene and environmental effects is not only shown to be invalid by the ubiquity of G-E effects. Even more, however, the additive assumption leads developmental science down a fruitless path."\cite{Lerner_2006}

"...standard behavioral genetic models assume that genetic, shared environmental, and nonshared environmental influences are additive and separable – the additivity assumption."\cite{Daw_2015}

"In twin and adoption studies, estimates of the power of environmental factors are derived by adopting the additive assumption, i.e. by assuming that the sources of variation in a trait can be separated into independent genetic (G) and environmental (E) components that together (along with error variance) add to 100% of the variance to be accounted for."\cite{Maccoby_2000}

Definition A is sometimes framed as a conceptual one: the assumption that it makes fundamental scientific sense to try to determine the relative importance of "nature vs. nurture" to variation in any trait. It can also be viewed as the assumption that it makes sense to conceptualize the total variation in a trait as the sum of genetic variation and environmental variation, however many additional interaction terms one may add, based on the underlying additive framework. 
Relational developmental systems is one framework that, in contrast to quantitative behavior genetics, views traits as the result of genes and environments interacting in a complex and inseparable way that renders attempts to determine the relative importance of genetics vs. the environment futile (e.g. \cite{Overton_2011}). In short, definition A is the assumption that the "nature-nurture debate" is a legitimate debate with two opposing sides, each with its own distinct quantitative level of importance, and that the relative importance of these two factors on variation in a phenotype can be determined through statistical techniques. If this assumption is true, then the total variation is equal to the sum of the genetic variance and the environmental variance.

There is little question that it is very important for quantitative BG researchers that definition A of the assumption of additivity is true: it is essential to being able to interpret heritability estimates causally, as Lewontin noted in 1974. If this assumption is invalid, no functional conclusions can be drawn regarding the trait being analyzed.\cite{LEWONTIN_2006} As Oftedal explains, critics argue that this is because of the inherently local nature of heritability estimates if the additivity assumption is not met: "In situations of additivity, heritability estimates are no longer just local. The result from one environment can be extrapolated to other environments."\cite{Oftedal_2005} (p. 702) Similarly, Lynch (2016) has recently noted that the heritability statistic's validity "...relies on the assumption that VG and VE act additively, so that there is no interaction or correlation between the two terms."\cite{Lynch_2016}

Here's another way to phrase the points summarized in the preceding paragraph: if definition A of the assumption is true, then heritability estimates are not just local. 
If heritability estimates are not just local, they can be applied to other environments, and if they can be applied to other environments, they can be interpreted causally. But returning to where we started, if definition A is false, then heritability estimates cannot be interpreted causally. In this case they acquire the status of being meaningless for any research looking for causes of traits.

Can non-additivity be adequately corrected for?

There are a number of methodological approaches that have been suggested and employed to correct for violations of definition A. One of these is log-transformation of a specific type of non-additive relationship between genes (G) and environment (E), namely a multiplicative relationship, so that it becomes additive. This can be used to dismiss the "rectangle" analogy, which compares the relative importance of genes/environment to that of length/width of a rectangle's area. Advocates of this approach include Neven Sesardic.\cite{sesardic2005}(p. 53) The idea is that you start with a multiplicative relationship between G and E that, when multiplied together, produce the phenotype (Y):

\[ Y = G \times E \]

This is supposed to be analogous to the fact that the area of a rectangle is equal to the product of its length and its width. Then you take the log of both sides of the above equation, producing:

\[ \log(Y) = \log(G) + \log(E) \]

And voila! What was once a multiplicative relationship is now additive, and definition A is no longer false. Or is it? Wahlsten (1990) notes that this practice, which is focused on removing non-additive relationships from data before it is analyzed, can distort the actual relationship between variables, rather than actually solving the non-additivity problem:

"The log transform alters the relations among the variables; consequently, transforming the scale of measurement may conceal the relations among heredity and environment, as it might conceal the essence of gravitation."\cite{Wahlsten_1990} (p. 
118)

And on the very next page of this paper:

"If H and E really are multiplicative in a particular situation, a calculated "heritability" is nonsensical and taking the log of the observations may compound this."\cite{Wahlsten_1990} (p. 119)

In addition, Partridge (2011) has pointed out that other approaches for dealing with violations of definition A include: "...extensions to multivariate and latent variable models (ACE models, in which A = additive genetic variance, C = common environmental variance, and E = unique environmental variance) (Martin & Eves, 1977; Michel & Moore, 1995), as well as use of multilevel models, to better address the nonindependence inherent in twin data (see Medlend & Neale, 2010)..." Partridge observes that these modifications "...are notable but still follow the same basic structure of Fisher’s original model", by which Partridge means Fisher's infinitesimal model, in which the total genetic effect on a quantitative trait is simply the sum of the effects of a very large number of individual loci, each of which has a very small effect on the trait.

Definition B

In addition to definition A, there is a second definition of the additivity assumption with regard to behavior genetics (BG): namely, the assumption that the effects of genetic loci on variation in a complex trait are independent of each other and of the environment, so that you can add their individual effects together to get the total genetic effect. In other words, this definition, which I will call definition B, is that most genetic variance for complex traits in general is additive. As Wahlsten (1994) wrote in describing the method of heritability analysis, "the effect of all polymorphic loci affecting a behaviour are combined by adding them to yield the total $G_i$ for an individual, which assumes genotype at one locus does not influence the action of genes at other loci." \cite{Wahlsten_1994} (p. 
245) Thus, I will refer to the assumption that all genetic variance in complex traits, behavioral or otherwise, is additive in nature as "definition B" of the additivity assumption. Even if some variance is due to non-additive genetic effects (dominance, epistasis, and/or GxE), definition B is still tenable so long as most of the genetic variance ($V_G$) is due to additive effects. Therefore, this definition pertains to the percentage of $V_G$ that is due to additive genetic effects ($V_A$), i.e. effects that are exactly equal to the sum of their parts. This excludes all "genetic effects" due to dominance, epistasis, or gene-environment interactions. In other words, definition B is strictly true if $V_A/V_G = 100\%$ and is generally true (using a more lax definition) if $V_A/V_G > 50\%$. Hill et al. (2008) concluded that $V_A/V_G$ is in fact typically above 50% and often at or near 100%.\cite{Hill_2008} Nevertheless, many aspects of variance components analyses aimed at determining the relative importance of additive and non-additive genetic variation have been criticized.\cite{Huang_2016}

It should be noted that these two definitions overlap to some extent: if definition B is true, it seems to at least make definition A more plausible than if definition B is false. This is because if $V_G$ can be viewed as consisting entirely of additive genetic effects, we do not need to include terms for dominance, epistasis, or GxE in variance components equations to get a reasonably accurate result. There is also a sort of semantic confusion involved in the logic here: if genetic effects are mostly/entirely additive (i.e. definition B is true), then it follows that it makes sense to view phenotypic variation as the additive combination of genetic and environmental variation (i.e. definition A is true), because you will not need to include GxE or other non-additive terms in Eq. 1.

Definition B seems to be somewhat more commonly used in the literature than definition A. 
This is likely because it focuses on obvious, tangible things: specifically, genes and how they interact with each other and with the environment. Here are some examples of definition B being used to refer to the assumption of additivity:

"...most GWA studies of heritability rely, in part or entirely, on an assumption of ‘additivity.’ For example, the claim that 8000 SNPs account for half the heritability of schizophrenia depends upon the assumption that each SNP contributes 1/8000 to half the heritability."\cite{Charney_2016} (p. 5)

"[It] is common to assume additive-only genetics — that is, where the effect of each SNP’s minor allele is strictly additive in relation to its count."\cite{Sabourin2015}

"There are two types of genetic nonadditivity. The first is caused by genetic dominance."\cite{Rodgers_2001}

"In most human genetic studies, the "solution" has been simply to make the (usually unstated) assumption that there is no genetic interaction... Typically, the studies assume a strictly additive model."\cite{Zuk_2012} (p. 1194)

Some of the confusion between the definitions stems from papers that use both definitions A and B without pointing out that genetic "additivity" may refer to either definition (e.g. \cite{Wahlsten_1994}).

Defining heritability

Next, it should be noted that there are two types of heritability: narrow-sense heritability ($h^2$) and broad-sense heritability ($H^2$). The difference between the two is that $h^2$ is based only on additive genetic variance, whereas $H^2$ is based on both additive and non-additive (i.e. total) genetic variance.\cite{edition} $h^2$, rather than $H^2$, is the value that agricultural breeders care about, because it is used to predict what the fastest way will be to maximize a desired trait in the organism of interest through a specific selective breeding strategy.\cite{Feldman1975} It also needs to be explained just what heritability estimation actually is: as Oftedal has noted, it is "a statistical method based on a linear analysis of variance".\cite{Oftedal_2005} (p. 700)

So first we will consider the first definition ("definition A") of the additivity assumption: that variation in a phenotype = genetically caused + environmentally caused + maybe a small interaction term. Lynch (2016) has recently highlighted the two ways that this assumption can be violated: gene-environment interaction (G x E) and gene-environment correlation (the latter also called gene-environment covariance, abbreviated G-E covariance).\cite{Lynch_2016}

In addition, Wahlsten noted in a 1990 paper that "Additivity is often tested by examining the interaction effect in a two-way analysis of variance (ANOVA) or its equivalent multiple regression model. If this effect is not statistically significant at the α = 0.05 level, it is common practice in certain fields (e.g., human behavior genetics) to conclude that the two factors really are additive and then to use linear models, which assume additivity." But he reported in the same paper that "...ANOVA often fails to detect nonadditivity because it has much less power in tests of interaction than in tests of main effects. Likewise, the sample sizes needed to detect real interactions are substantially greater than those needed to detect main effects."\cite{Wahlsten_1990} In a subsequent paper, Wahlsten argues that there are fundamental biological reasons to believe that the assumption of additivity will almost always be false: "The additive model is not biologically realistic. 
There are so many instances where the response of an organism to a change in environment depends on its genotype or where the consequences of a genetic defect depend strongly upon the environment, that genuine additivity of the two factors is very likely the rare exception."\cite{Wahlsten_1994} (p. 249)

The three quotes from Wahlsten cited above make it clear that he is discussing definition A of the assumption of additivity.

And elsewhere, he contends that human BG faces unique obstacles in controlling for non-additivity that animal researchers (such as himself) do not have as much of a problem with: "To test interaction between genotype and environment, there must be many individuals with the same genotype who are reared in different environments. This is easily achieved with standard laboratory strains but not with humans. For our species, there is no valid test of gene x environment interaction, no matter what the sample size, unless distinct alleles of a specific gene in question can be identified...Because the additivity assumption cannot be tested empirically, the whole edifice of path models must be accepted on faith, if it is to be accepted at all."\cite{Wahlsten_2000} (p. 50)

Many critics of BG have argued that definition A of the additivity assumption is untenable, and that the way in which genes and environments actually interact to produce phenotypes is just that: interactive, not additive. Thus this criticism alleges that heritability calculations are uninterpretable (at least in terms of the relative roles of genes vs. environment in causing phenotypic variation), because definition A is simply false. This criticism is well summed up in a paper by Vreeke: "The core of the critique of behavior genetics, as far as it relies on the analysis of variance, is thus that it conceptualises the relation between genes and the environment as (mainly) additive, whereas in fact development is interactive."\cite{Vreeke_2000} (p. 
37) The same paper notes, "Experimental animal research shows that interaction between genotype and the environment occurs often. And if genes and the environment interact, it is not possible to separately weigh the effect of one of those factors: they depend on each other. There is no reason to expect that humans are different in this respect. An analysis of variance ignores those effects, so cannot provide a true account of the causes of behavior."\cite{Vreeke_2000} (p. 37)

Locality and causality

As noted above, critics of heritability analysis argue that the additivity assumption is false, and that heritability estimates are really just local. But which assumption is it that the critics claim is false? To some extent it is both, but definition A seems to be a more common target of such criticisms. Lewontin makes it clear that he considers the locality of heritability estimates to prevent them from allowing causal conclusions to be drawn: "There is one circumstance in which the analysis of variance can, in fact, estimate functional relationships...It is not surprising that the assumption of additivity is so often made, since this assumption is necessary to make the analysis of variance anything more than a local description."\cite{LEWONTIN_2006} This criticism is referred to by Oftedal as the "locality objection".\cite{Oftedal_2005} (p. 702)

So what we have here are two BG responses to argument B: 1) Actually, most (if not all) genetic variance is additive, so this assumption is going to be at least mostly correct, and 2) to the extent that the assumption of additivity is false, there are plenty of ways that we can successfully account for it already, thank you very much! I will now focus a bit more on the first of these responses: that most variation in the traits behavior geneticists are studying is actually additive, meaning that the assumption Charney criticizes so harshly is actually pretty accurate. 
One frequently cited study by those making this claim is that of Hill et al. (2008), entitled "Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits". I noted above that Neiderhiser et al. (2017) cited this paper to justify their claim that genetic variation in complex traits is mostly additive. But is this conclusion really justified by the evidence presented by Hill et al. (2008)? Zuk et al. (2012) don't seem to think so: they argue that "...mistakenly assuming that a trait is additive can seriously distort inferences about missing heritability. From a biological standpoint, there is no a priori reason to expect that traits should be additive. Biology is filled with nonlinearity: The saturation of enzymes with substrate concentration and receptors with ligand concentration yields sigmoid response curves; cooperative binding of proteins gives rise to sharp transitions; the outputs of pathways are constrained by rate-limiting inputs; and genetic networks exhibit bistable states."\cite{Zuk_2012} In their supplementary information (p. 45), Zuk et al. go into more detail about why they consider Hill et al.'s claims not to stand up to scrutiny. First, Zuk et al. explain two key arguments made by Hill et al.: "(a) most variants in a large population will have extremely low minor allele frequency and (b) traits caused by low-frequency alleles will not have substantial variance due to interactions." But Zuk et al. don't find these arguments the least bit convincing: "Their claim is wrong, because the LP [linear pathway] models (a) can have substantial variance due to interactions (indeed, the majority) and yet (b) can involve any class of allele frequencies. (Specifically, LP models are defined as the minimum value of a set of traits, each of which is additive and normally distributed. 
There is no constraint on the allele frequencies of the variants that sum to yield these additive and normally distributed traits.)" And on the next page: "In effect, Hill et al.’s theory thus actually describes what happens for rare traits caused by a few rare variants. Not surprisingly, interactions account for a small proportion of the variance for such traits. Hill et al.’s model, however, is not pertinent to common traits. The interesting complex traits are those that have significant genetic variance in the population: these traits necessarily have higher allele frequencies (assuming they depend only on a few, e.g. two loci) and thus, under Hill et al.’s analysis, can involve larger interaction variance and a higher ratio VAA/VG." (Note: VAA = interaction variance and VG = total genetic variance.)

Behavior geneticists respond

To the extent that BG heritability-estimation researchers have defended their practice against the charge that it inaccurately assumes additivity, they have made such arguments as this one, made by Michael Rutter in 2003: "Critics of behavior genetics are fond of attacking it on the grounds of the unwarranted presumption of additivity. However, behavior geneticists are well aware of this issue, and it is commonplace nowadays to make explicit tests for dominance or epistatic effects. Moreover, it is perfectly straightforward to include these in any overall model. There is a need to consider such effects, but their likely existence for some traits is not a justifiable reason for doubting behavior genetics."\cite{michael2003} This quotation is clearly referring to non-additive genetic effects (dominance and epistasis).

So how exactly do behavior geneticists take the (non)existence of additivity/presence of non-additive effects into account? Rutter makes it sound really easy, but just how do they do it, and are their procedures for doing so adequate? 
It is important to keep in mind that many critics of BG argue that the techniques researchers in the field use to try to test and account for genetic non-additivity are woefully inadequate. In fact, such arguments have been made since at least 1973(!), when Willis Overton wrote, "[It] does not change the situation any to maintain that this position does consider interactions by introducing an interaction term into the analysis of variance...As discussed by Overton and Reese [1972], such interaction effects, ‘are themselves linear, since they are defined as population cell means minus the sum of main effects (plus the population base rate)’ (p. 84). In fact, the very use of the term ‘interaction’ within this paradigm indicates that definitions of terms are not model independent".\cite{Overton_1973}

Just fix the model!

More recently, Partridge has argued that "Although these advances in GxE transactional models represent a substantial step forward for quantitative behavioral genetics models, there are inherent structural limitations to their analytic foundations...the nature of GxE transactions go much deeper than statistical interactionist models can accommodate. If structural sequences in the genome were isomorphic to genetic function and, more important, to protein function, then the inferred genetic variability assumed by behavioral genetic models might be more instrumental. However, genes, rather than being static structural entities, are dynamic processes."\cite{Partridge_2011}
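
The definition that Overton and Reese give in the passage quoted above (interaction effects as cell means minus the sum of main effects and the base rate) can be written out directly. The following is only a minimal sketch of that arithmetic, not a reconstruction of any analysis in the cited papers: the function name `two_way_effects` and both 2 x 2 tables of genotype-by-environment cell means are hypothetical and chosen purely for illustration.

```python
import numpy as np


def two_way_effects(cell_means):
    """Split a genotype-by-environment table of cell means into the
    grand mean, the two sets of main effects, and the interaction terms."""
    m = np.asarray(cell_means, dtype=float)
    grand = m.mean()
    g_eff = m.mean(axis=1) - grand  # row (genotype) main effects
    e_eff = m.mean(axis=0) - grand  # column (environment) main effects
    # Interaction term: cell mean minus (grand mean + both main effects).
    inter = m - (grand + g_eff[:, None] + e_eff[None, :])
    return grand, g_eff, e_eff, inter


# Hypothetical tables: rows are genotypes, columns are environments.
additive = [[10.0, 14.0], [12.0, 16.0]]   # parallel reaction norms
crossover = [[10.0, 16.0], [16.0, 10.0]]  # reaction norms that cross

_, g_add, e_add, inter_add = two_way_effects(additive)
_, g_cross, e_cross, inter_cross = two_way_effects(crossover)
```

In the first table the interaction terms are all zero, so the main effects describe the cell means completely. In the second table both sets of main effects are zero and every difference between cells sits in the interaction term, even though genotype plainly matters within each environment. This is one way to see why critics regard the interaction term as a residual defined by the linear model rather than as an independent measure of how genes and environments combine.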
The concept of a general factor of intelligence, typically abbreviated g, was originally proposed by Spearman in a now-famous 1904 article in the American Journal of Psychology.\cite{Spearman_1904} Consequently, g is often referred to as "Spearman's g"; it is also often called "psychometric g", the "g factor", or "the general factor of intelligence". The argument that g exists in any remotely meaningful sense depends on the reality of the following premises, each of which I will critically examine in this paper: 1) The results of individuals' scores on many cognitive tests, even those that seem totally different, tend to be highly positively correlated with one another. 2) The existence of these positive correlations, and their magnitude, is highly consistent. 3) The existence of these correlations (collectively often referred to as the "positive manifold") supports the existence of g over the existence of multiple equally important factors of intelligence. 4) The underlying model of a single g factor, like any good scientific hypothesis, is falsifiable by actual data. 5) The g factor is not an artifact of a specific form of statistical analysis, nor of a specific method of factor analysis. Instead, it "is a stable, replicable phenomenon that—unlike the IQ score—is independent of the “vehicles” (tests) for measuring it" (\cite{Gottfredson_2002}, p. 27). 6) The g factor is not merely a set of numbers in a correlation matrix, but is a valid construct: it may not be "real", but it is just as valid a scientific construct as things like mass, evolution and gravity (actual argument: see \cite{Jensen_1994}, p. 232). 7) The validity of g is definitively established by its correlations with other things that can be directly measured, and that are independent of the statistical methods or development of IQ tests. 
At times, reading g theorists describe the putative robustness of the existence of g bears less resemblance to a sober scientific analysis and more to listening to a rapper brag about his extreme wealth, haughtily dissing the credibility and authenticity of his rivals all the while. Arden & Plomin (2007), for example, write that "Spearman’s g is the most well-documented construct in the human behavioral sciences. The reliability of g is greater than the reliability of height and weight measured in a doctor’s office, [and] its predictive power leaves rival psychometric constructs in the dust... [citations omitted]"\cite{Arden_2007} A similar claim is found in the first "key point" of the abstract of a 2010 review article by Deary, Penke, & Johnson: "More than 100 years of empirical research provide conclusive evidence that a general factor of intelligence (also known as g, general cognitive ability, mental ability and IQ (intelligence quotient)) exists, despite some claims to the contrary."\cite{Deary_2010} But how strong is the evidence to support these sorts of claims? I will try to explore this topic here.

The positive manifold and its (in)significance

The g factor, according to those who consider it to be an empirical reality rather than a statistical illusion, refers to the ostensibly-almost-universal presence of a positive correlation (sometimes also called a "positive manifold") between how well one performs on one mental test and how well one performs on all other such tests. A catchy way the presence of this correlation is often described by those who believe in g (sometimes called g theorists) is as follows: "...people who do well on one kind of test tend to do well on the others, and people who do poorly generally do so across the board."\cite{linda1998} (page 24) One may also phrase it a bit differently, as Colom et al. 
(2006) did when they wrote, "g results from the empirical phenomenon, first discovered by Spearman (1904), that all cognitive tests are positively correlated with one another, irrespective of the cognitive domain sampled."\cite{Colom_2006} How compelling is the evidence that a positive correlation exists between performance on all types of mental ability tests? Johnson et al. (2004) provided an exceptionally blatant answer to this question in the very first sentence of their paper's introduction section: "That performances of individuals on tests of different mental abilities are positively intercorrelated is a well-established fact." [Emphasis mine.] Their study also found strong evidence of such intercorrelations, which they interpreted as "...further evidence for the existence of a higher-level g factor".\cite{Johnson_2004} Similarly, the introduction section of Colom et al. (2002) begins, "One of the most strongly established empirical facts in psychological science is the positive correlation of all human cognitive abilities".\cite{Colom_2002}

But of course, if these positive correlations are very weak, this would not be very meaningful evidence that there is some underlying, unitary factor that affects performance on all these different tests. In contrast, if the correlations were consistently both positive and strong, this would provide a much clearer indication that there at least may be a single g factor underlying performance on such tests. Several summaries of the relevant literature by g theorists have, unsurprisingly, clearly come down on the side of these correlations being very large: according to Colom et al. (2010), "...the g factor constitutes more than half of the total common factor variance in a cognitive test or task in samples representative of the population".\cite{Colom2010} [Emphasis mine.] Karama et al. 
(2011) similarly state in their introduction that g is "...typically the major source of variance in test scores, accounting for 40% or more of total variance in performance on mental test batteries in representative samples [citations omitted]."\cite{Karama_2011} [Emphasis also mine.]

This brings me to the crux of the debate over whether g really exists or not: claims that it does exist rest on the presence of high correlations between performances on one mental test and performances on any other such test, as explained above. The question, then, is whether these correlations (which, following previous g theorists, I will refer to as intercorrelations) actually provide strong evidence, much less conclusive evidence, for the existence of a single g factor. As noted above, g theorists argue that the existence of high positive intercorrelations is conclusive evidence of the reality of g, but this conclusion has been hotly disputed by other researchers in psychology and related fields, as I will discuss below. First, however, I will try to see how strong the evidence of the existence of g, based on the existence of the positive manifold, actually is. I intend to do so by looking in detail at writings by both g theorists and their critics (see section "Correlation, causation, and g"). Then I explore other issues, aside from that of the difference between correlation and causation, that must be addressed before confidently concluding that g actually exists (see "Correlations = g? Not so fast").

Correlation, causation, and g

Hardly anyone with even the most rudimentary knowledge of statistics disputes that a correlation between two or more variables does not mean that one of those variables causes the other, nor does it necessarily mean that some other variable must be causing both (or all) of the intercorrelated variables. 
The fact that g theorists regularly cite the existence of positive correlations as evidence for their theory (as noted in the above section) opens the door to allow g theory's critics to accuse them of conflating correlation and causation. As Bowman et al. (2002) note, for example, "...a large number of noncognitive variables (e.g., athleticism, openness, absence of neuroticism, and psychosis) each correlate positively with intelligence tests yet do not represent a functional unity".\cite{Bowman_2002} This is similar to the point made by Richardson (2002) when he noted, "A correlation between test scores does not necessarily mean that they are measuring the same thing"\cite{Richardson_2002} (p. 300). There are other examples of critics of g theory making similar claims about how correlation does not mean causation: e.g. Schlinger (2003, p. 18) states, "...the positive correlations need to be explained some other way as do the performances of subjects on individual tests. As scientists know, correlation does not mean causation, although we apparently need to be reminded of this dictum fairly regularly." The issue of correlation not implying causation is clearly an important one in any discussion of g theory, as the entire theory appears to be based on assuming that correlations between test scores represent an underlying causal, unitary, g factor supposedly causing these correlations. Yet few g theorists have addressed this criticism. Instead, from what I have read, it seems that g theorists simply identify g as a construct that accounts for nothing more or less than the fact that results of individuals' scores on widely different mental tests tend to be positively correlated. Yet there is another important issue here: the difference between positive correlations and the actual degree of performance an individual exhibits on a test. 

The issue of reification

The fallacy of reification, also known as the "fallacy of misplaced concreteness", is the "treatment of an analytic or abstract relationship as though it were a concrete entity."\cite{reification} In the past, prominent critics of the concept of g have accused its proponents of this fallacy. Ostensibly, this is because they simply regurgitate the conclusions drawn by Spearman (1904), the father of g theory. This is said to be a problem because, according to this criticism, Spearman "...took an abstract mathematical correlation and reified it as the general intelligence that someone possesses"\cite{henry2003} (p. 17), thereby committing the fallacy of reification. Needless to say, g theorists have responded to this argument multiple times in the past. Jensen & Weng (1994, p. 232), for instance, claim that "... the consensus of experts is that g need not be a “thing”-a “single,” “hard,” “object”-for it to be considered a reality in the scientific sense. The g factor is a construct. Its status as such is comparable to other constructs in science: mass, force, gravitation, potential energy, magnetic field, Mendelian genes, and evolution, to name a few. But none of these constructs is a “thing.”"\cite{Jensen1994} The point being made here is that while the fallacy of reification involves treating something that only exists in one's mind or on paper as though it were a real thing, g theory does not treat g as a real thing, but as a "construct". This, presumably, is supposed to mean that the g factor is not a product of the fallacy of reification.

But there is another important component to this argument: Jensen & Weng are trying to appropriate the credibility of "hard" sciences (especially physics) and their conclusively-established concepts (like mass, gravity, magnetism, etc.), in the hopes that this credibility will "rub off on" their g theory. 
They are trying to portray g as a construct, and thus an equally valid one to other scientific/mathematical constructs developed to explain observed phenomena. In the case of g, the phenomena to be explained are the positive intercorrelations between mental test scores. Naturally, this seemingly robust argument glosses over some important differences between these constructs: for one thing, the ease with which it is (or isn't) possible to accurately measure the construct using the instruments designed for that purpose. But that is outside the scope of this article, which is limited to assessing the validity of g theory itself rather than that of IQ tests.

Spearman's hypothesis

As defined by Jensen in a famous 1985 paper, "Spearman's hypothesis" (no reward for guessing who it's named after) holds that there is a strong positive relationship between the g-loading of an IQ test and the black-white mean difference in the scores on that test.\cite{Jensen_1985} But what is a "g-loading" (sometimes written without a hyphen)? I think Colom et al. (2006) explained the answer to this question quite well in their paper's "Introduction" section, so I will quote them verbatim below:

Factor analysis was the statistical method expressly designed by Spearman (1904) to precisely quantify the g-loadings of several diverse cognitive tests. It is important to note that the computation of a robust and stable g-loading requires the simultaneous consideration of several diverse intelligence tests \cite{Jensen_1994}. The g-loading for test X derives from its average correlation with all the remaining tests in a comprehensive test battery: the higher its average correlation, the larger its g-loading. From a theoretical standpoint, a high g-loading for test X can only result from the empirical fact that it shares a large amount of mental processes with the other tests in the battery. 
Therefore, a test with a perfect g-loading should comprise most of the mental processes germane to the general factor of intelligence (g).

Back to Spearman's hypothesis. Peter Schönemann argued that it is a mere statistical artifact, and that support for this hypothesis proves nothing with respect to the nature of intelligence (or black-white differences therein) as being the result of g.\cite{Sch_nemann_1989} This argument, however, has been disputed by other researchers.\cite{Braden_1989}\cite{Dolan1997} Specifically, Braden (1989) analyzed data on differences in IQ scores between children with and without deafness, aiming to compare the g-loadings of different IQ tests for deaf children with the magnitude of the hearing-deaf difference in mean scores on that test. In other words, Braden wanted to test Spearman's hypothesis using totally different data, to see whether the more broadly defined hypothesis of a positive correlation between group differences and g loadings would be supported no matter what the groups were. But that is not what Braden found: instead, he reported that "The magnitude of differences between deaf and normal-hearing children are negatively correlated with g loadings (r = −.14, NS [not significant]), and are significantly different from positive correlations reported for black-white differences (zs > 2.51, ps < .01)."

In a critique of Braden (1989) and two other papers, Isham & Kamin (1993) wrote, "They assumed that the deaf have suffered environmental (linguistic) deprivation, and argued that the different correlations indicate that black-white differences (unlike hearing-deaf differences) are not likely the result of environmental deprivation. We demonstrate that the deaf data they employed have been defectively and inconsistently reported, and can provide no support to the claim that black-white differences are not environmetal [sic] in origin." 
(My emphasis.)\cite{Isham_1993}

But is it (an) art(ifact)?

Prominent g theorist Gottfredson (1998, p. 1) responds to some of g theory's supposed critics thus: "A general factor suffusing all tests is not, as is sometimes argued, a necessary outcome of factor analysis. No general factor has been found in the analysis of personality tests, for example; instead the method usually yields at least five dimensions (neuroticism, extraversion, conscientiousness, agreeableness and openness to ideas), each relating to different subsets of tests. But, as Spearman observed, a general factor does emerge from analysis of mental ability tests". Similarly, Deary & Johnson (2010, p. 204) claim, "Prominent accounts arguing that g is a necessary artefact of the statistical analyses — such as principal components analysis — are incorrect". And one of the most influential g theorists of them all, Arthur Jensen, wrote the following: "The g factor, far more than any other linearly independent sources of variance in psychometric tests, is correlated with various phenomena that are wholly independent of both psychometrics and factor analysis...This evidence of biological correlates of g supports the theory that g is not a methodological artifact but is, indeed, a fact of nature."\cite{Jensen_1986} From these quotes, it is clear that many g theorists want to refute critics who are arguing that g is a mere statistical artifact. However, this represents a straw man argument in the sense that those arguing against the existence of g typically do not simply say that it is an inevitable consequence or artifact of factor analysis. Instead, they argue that it is unwarranted to assume that positive intercorrelations are evidence for g and against a multifactorial interpretation of human intelligence. These arguments are discussed in the section directly below this one.

Correlations = g? Not so fast

It is common for researchers to claim, explicitly or implicitly, that the presence of positive intercorrelations is conclusive evidence of the existence of g. One explicit example of such a claim by g theorists is as follows: "Tests of intelligence reflect different cognitive abilities. Although the test correlations range from .20 to .80, they are all positive and greater than zero. This empirical phenomenon means that all kinds of ability tests measure something in common: the g factor." [Emphasis mine.]\cite{COLOM_2004} But not all researchers believe that these positive intercorrelations are the slam-dunk evidence of g that its adherents would have you believe they are. One reason is that, as noted above, correlation does not imply causation; another is a set of fundamental criticisms of the very model of a single g factor, as discussed below.

Furthermore, Detterman (1982) argued that g may be more of a consequence of complex components interacting with each other than a legitimate concept. He also argued that "neither parsimony, the pervasiveness of such constructs, nor biological reductionism provide adequate justification for invoking higher-order constructs [like g] as explanations of intellectual functioning."\cite{Detterman_1982} In addition, the late psychologist Peter Schönemann was especially critical of g theory in general, and the work of Jensen on this topic in particular. He argued in 1997 that "... Spearman's g is beset with numerous problems, not the least of which is universal rejection of Spearman's model by the data".\cite{Sch_nemann_1997} More recently, van der Maas et al. 
(2006) proposed a model of human intelligence that explains the positive intercorrelations on which g theory is based, but without the single g factor the existence of which this theory presumes\cite{van2006} (though it should be noted that this theory has been tested and found not to fit the evidence)\cite{Gignac_2014}.

But perhaps the clearest evidence that positive correlations between seemingly unrelated (and even objectively unrelated) mental tests do not provide evidence for a single intelligence factor was provided by McFarland (2012). He showed through simulation analyses that "...the presence of a positive manifold in the correlations between diverse cognitive tests does not provide differential support for either single factor or multiple factor models of general abilities."\cite{McFarland2012} Researchers have also recently reported that g "[is] accounted for by cognitive tasks corecruiting multiple networks", and argued that intelligence represents multiple distinct abilities, not a single factor.\cite{Hampshire_2012} Recent research further indicates that a multi-factor model of intelligence fits the data better than does a single g factor-based model.\cite{Sa__2017} There have also recently been suggestions that the contemporary form of g theory is unfalsifiable and thus, not a proper scientific theory at all. Specifically, it has been suggested that Spearman's g, as he first conceived of it in 1904, was falsifiable in that it required g to be the only common factor among the results of different, unrelated mental tests. It also, crucially, posited that there was "no covariance among any specific factors in measures used to indicate different kinds of mental effort", but this supposition was later refuted by empirical evidence. 
Then, the argument goes, g theorists concocted a "weaker" and unfalsifiable form of g after the "hard" version Spearman originally thought of was refuted.\cite{McArdle_2014}

Evidence of the reality of g

In 2014, researchers used multivariate analysis to find support for the g factor as a valid construct, based on data from a twin study. In fact, these researchers themselves stated that "By directly testing multiple competing models of the relationships among cognitive tests in a genetically-informative design, we are able to provide stronger support than in prior studies for g being a valid latent construct."\cite{Panizzon_2014}

Bouchard (2009) defended the reality of g as follows: "The previously widespread belief that different intelligence tests yield estimates of quite different intelligences is simply false. Virtually any well-developed multi-scale intelligence test measures the same underlying g factor [citations omitted]."\cite{Bouchard_2009}

Factor analysis and its methodological assumptions

This section describes the way in which factor analytic techniques are used to support the existence of g by analyzing test score results. In doing so, this section also addresses arguments regarding the extent to which factor analysis itself is responsible for creating the mere appearance of g, as some critics have claimed. For instance, Onwuegbuzie & Daley (2001, p. 211) write, "L. L. Thurstone, an American psychologist, demonstrated in the late 1930s that by changing the type of factor rotation, g disappears (Thurstone, 1938)".\cite{Onwuegbuzie_2001} The citation to Thurstone (1938) is, specifically, a reference to his 1938 work Primary mental abilities, which was published in book form. The argument by Onwuegbuzie & Daley appears to be the same as that made by Gould (1981) in The Mismeasure of Man, in which he argued that Thurstone was "...the exterminating angel of Spearman's g" (p. 296). 
Gould further claimed that as a result of Thurstone's work, "...Spearman's g had been rotated away, and general mental worth evaporated with it" (Gould 1981, p. 304). Jensen (1982), in a predictably negative review of The Mismeasure of Man, was none too happy with this argument. He argued that it was wrong for the following reasons: "Actually, the g variance was not at all "exterminated" by Thurstone's method, but merely dispersed among the primary factors. Later, Thurstone himself realized that he could obtain a closer fit to his criterion of simple structure by allowing the factor axes to be obliquely rotated (i.e., correlated). Thurstone also came to realize that subsequent factor analysis of the intercorrelations among the oblique primary factors would recover the g factor, essentially the same g as arrived at by the Spearman and Burt methods of g extraction!"\cite{arthur1982}

So Jensen's three points in his supposed rebuttal of Gould (and by extension of Onwuegbuzie & Daley) are as follows: 1) Thurstone (1938) did not eliminate g simply by changing the type of factor rotation, but rather "dispersed" its variance among multiple different factors; 2) Thurstone himself later changed his mind and decided that allowing the different factor axes (remember, there are multiple factors in his method) to be correlated with each other allows a better fit to his "simple structure" criterion; and 3) Thurstone himself also realized that if factor analysis was used on intercorrelations between his own multiple primary factors, "essentially the same" g factor as Spearman's and Burt's would be obtained. Before addressing these points, I need to explain what Thurstone's "simple structure" criterion is. This criterion "...seeks a solution in which tests either load high or near zero on a factor."\cite{k2013}
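
To make the "simple structure" criterion and Jensen's second and third points more concrete, here is a minimal numerical sketch. Everything in it is hypothetical: the six tests, their loadings, and the factor correlation of 0.5 are invented for illustration. It also swaps in a principal component of the test correlation matrix for the hierarchical factor analysis Jensen describes, so it shows only the general idea, not Thurstone's, Spearman's or Burt's actual procedures.

```python
import numpy as np

# Hypothetical loadings with "simple structure": each of six tests loads
# high on one primary factor and zero on the other.
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])

# Oblique solution: the two primary factors are allowed to correlate
# (0.5 is an arbitrary illustrative value).
phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

# Test correlation matrix implied by the model (unit diagonal).
R = loadings @ phi @ loadings.T
np.fill_diagonal(R, 1.0)

# First principal component of R, used here as a stand-in for a general factor.
eigvals, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
first = eigvecs[:, -1]
first = first * np.sign(first.sum())         # fix the arbitrary sign
pc1_loadings = first * np.sqrt(eigvals[-1])  # loadings on the first component
share = eigvals[-1] / eigvals.sum()          # share of total variance
```

Because the two primary factors correlate, every test correlates positively with every other test, and the first component loads positively on all six. Setting the off-diagonal entry of `phi` to zero removes the cross-factor correlations, and with them any component that spans both groups of tests. The sketch is therefore consistent with Jensen's claim that a general factor reappears once primary factors are allowed to correlate, but it equally shows that a positive manifold can be generated by a model containing no single underlying factor, which is the point McFarland makes above.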
Histogram p values

Abstract

There has long been controversy concerning whether genes influence human intelligence, and if so, which genes influence it, and to what extent. Many genome-wide association studies (GWASs) have attempted to find single nucleotide polymorphisms (SNPs) that are significantly associated with intelligence to shed light on this issue. My aim is to examine whether these studies have generated replicable findings concerning associations between one or more SNPs and variation in intelligence as it has generally been defined. Data on 17 published intelligence GWASs (3 of which reported no associations whatsoever) were downloaded from the GWAS Catalog and analyzed in the current study. Results show a generally low rate of replication: over 87% of the 2,335 included SNPs were reported only once, and only 4 of the 17 studies included follow-up testing in a replication sample. Of these 4, none found any replicable genome-wide significant hits. The literature was also found to be severely lacking in diversity: all study samples in which ancestry was described were of European and/or British ancestry. Additionally, evidence of a "winner's curse" was found: among the minority of SNPs reported more than once, the associated p-value tended to be higher in subsequent studies than in the study in which it was originally reported, corresponding to weaker SNP-intelligence associations in later studies.

Introduction

Many studies have previously reported evidence that the trait of human intelligence is under significant genetic influence. Specifically, it has long been estimated that intelligence has a high heritability (i.e. 
proportion of variation associated with genetic variation), with twin and family studies typically reporting narrow-sense heritability estimates of between 50% and 80%.\cite{Hill_2018} This conclusion, combined with the prevailing interpretation of heritability estimates as reflecting a genetic basis for a given trait, implies that between 50% and 80% of variation in intelligence is due to additive genetic factors specifically. This is because narrow-sense heritability estimates only include additive genetic effects.\cite{Hill_2018}\cite{Plomin_2014}
However, genome-wide association studies (GWASs) have generally failed to reveal genes that account for all, or even most, of the reported heritability of intelligence. Instead, genetic variants reported to be associated with intelligence in GWASs invariably have very small effects, typically explaining no more than 1% of trait variance individually.\cite{Plomin_2014} Even when taken together, only about 20% of the heritability of intelligence can be accounted for by known DNA differences.\cite{Plomin_2018} There are two schools of thought as to why the heritability of intelligence remains "missing", that is, unable to be accounted for by even the combination of all SNPs known to be associated with intelligence. The first is that the heritability of this trait, like that of most complex traits, is due to many genes of very small effect (e.g. \cite{Chabris_2015}, \cite{Plomin_2012}). 
The second is that the original estimates of heritability have been inaccurate, thereby misleading researchers into searching for genes to explain the heritability of IQ, when in fact such heritability estimates may be fatally confounded by environmental factors.\cite{Feldman2018} Distinguishing between these possibilities is difficult, but if the first explanation is true, then we should see a relatively high rate of replication of associations, especially with such higher-powered studies as are needed to detect the expected small effects.\cite{Chabris_2015}
In this paper I aim to comprehensively assess whether GWASs of intelligence have generated replicable findings. In doing so, I hope to shed light on the extent to which genetic associations with intelligence reported in GWASs are false positives, as is known to be the case with many candidate gene studies,\cite{Chabris_2012} or whether the identified SNPs in GWASs actually contribute to differences in human intelligence. I will also examine other patterns in the GWAS literature on intelligence, such as whether these studies are done on populations of diverse ancestries, and whether there is a "winner's curse" in this literature with earlier studies reporting stronger associations than later ones.\cite{Xiao2009}
Methods
On April 7, 2019, I downloaded all data on the GWAS Catalog\cite{Buniello2019} for the trait of "intelligence" (URL: This was done simply by clicking the button on the page linked in the previous sentence labeled "Download Catalog Data". I then assessed whether these associations had been replicable across different studies, as well as whether there was significant ancestral diversity in the populations on which the studies were conducted.
Functional annotation
Conveniently, the GWAS Catalog database, and data downloaded from it, includes information about the functional status of each SNP (in the "CONTEXT" column). 
Specifically, it categorizes each SNP (with exceptions detailed in the "inconsistent reporting" section below) into exactly one of the following categories:
Stephens-Davidowitz (2014) presents evidence that racial prejudice against blacks cost Obama a significant proportion of the vote in the 2008 and 2012 presidential elections. Using Google Trends search data on the n-word and several variants thereof, he finds a statistically significant, negative relationship between search rates for racial slurs and improvements in Obama's performance in 2008, relative to that of John Kerry in 2004. He examined this relationship at the media market level, at which Google Trends data is typically available, except when he had to use state-level data (i.e. when not all data he was analyzing was available at the media market level).
More specifically, Stephens-Davidowitz (hereafter SD) reported a negative relationship ($r^2 = 0.24$, $t = -7.36$) between the racially charged search rate (measured from 2004 to 2007) and the change from Kerry's share of the two-party vote in 2004 to Obama's in 2008. This was based on a media market-level analysis with no control variables.\cite{Stephens_Davidowitz_2014} Here I will elaborate on the way that SD measured changes in the Democratic candidate's performance from 2004 to 2008, and then on the way in which I have done so. I will then compare the results obtained by the two methods to see how similar they are.
First, SD calculated the % of the vote Kerry got in a given region in 2004, then the % Obama got there in 2008, then subtracted the former from the latter. Thus the result of this subtraction would be positive in an area if Obama did better there than Kerry, and negative if he did worse. But when I say "% of the vote", I need to make clear that I mean "% of the two-party vote", i.e. the value for geographic region $j$ for Obama in 2008 would be $O_j/(O_j+M_j)$, where $O_j$ and $M_j$ are the percentages of the total vote received in region $j$ in 2008 by Obama and McCain, respectively. Thus the equivalent value for 2004 would be $K_j/(K_j+B_j)$, where $j$ is the same geographic region, $K_j$ is the % of the total vote received by Kerry in $j$ in 2004, and $B_j$ is the % of the total vote received by Bush in $j$ in 2004. 
Thus the final value of the election result change - or swing - calculated in this way for area $j$ would be:
\[ \frac{O_j}{O_j+M_j} - \frac{K_j}{K_j+B_j} \]
This is the measure of electoral swing used by SD. By contrast, the measure of swing I calculated is the difference in the victory margin in a given area between 2004 and 2008. E.g. if the Democratic candidate had won state A in 2004 by 6 points but won the same state in 2008 by 16 points, the swing I would report for that state would be 16 - 6 = 10 points. This metric is therefore based only on the difference between the Democratic and Republican vote percentages in each geographic area in 2004 and 2008. It can be described as $(O_j-M_j)-(K_j-B_j)$. Fundamentally, these two measures seem pretty similar, as they are both based on the same two elections and are both aimed at calculating the difference in Democratic performance between them. Furthermore, both measures are based on both Democratic and Republican performance. The similarity (or lack thereof) between these two metrics can be gleaned by examining their correlation. The two measures turn out to be virtually perfectly correlated ($r^2 = 0.9997$), thereby demonstrating that they are interchangeable for all practical purposes.
What happens if one uses my metric of 2004 to 2008 swing instead of the two-party-based metric SD used, and analyzes the relationship between this metric and racist search rates at the state level rather than the media-market level? I used the exact racist search rate values SD reported in his Appendix A, combined with the shifts for each state calculated using my approach as noted above, which I obtained from Dave Leip's election atlas (so to be fair I didn't calculate them myself). A negative relationship was seen between these two variables. I obtain a similar $r^2$ to that reported by SD: 0.23. 
It is unsurprising that my result is very similar to SD's given that the data sources are very similar (or, in the case of the racist search measure, identical).
Furthermore, I obtained data on the % of the vote Obama got in each state during the 2008 primaries from Leip's Atlas. The idea here is that Obama was a black candidate but all of his Democratic opponents were white, so Obama's percent of the vote should indicate how much he was preferred over white candidates of the same party. I then regressed this value on the racist search measure for each state. The results showed that there was a moderate negative relationship between racist searches and Obama's performance in the primaries ($r = -0.43$, $r^2 = 0.19$, $t = -3.31$). This provides evidence that higher levels of racial prejudice in a state, as measured by the Google proxy, may have led to Obama performing relatively poorly there. This hypothesis is further supported by the presence of a moderate positive relationship ($r = 0.46$, $r^2 = 0.21$, $t = 3.55$) between Obama's % of the vote in the 2008 primary for a given state and the general election swing in that state relative to 2004.
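To make the comparison between the two swing measures concrete, the following is a minimal Python sketch. The vote percentages are invented for illustration and are not taken from SD's data or from Leip's Atlas; the function names are likewise my own.

```python
from math import sqrt

def two_party_swing(o, m, k, b):
    """SD-style swing: change in the Democratic share of the two-party vote."""
    return o / (o + m) - k / (k + b)

def margin_swing(o, m, k, b):
    """Margin-based swing: change in the Democratic victory margin, in points."""
    return (o - m) - (k - b)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical percentages of the total vote: (Obama, McCain, Kerry, Bush).
regions = {
    "A": (58.0, 41.0, 53.0, 46.0),
    "B": (45.0, 54.0, 44.0, 55.0),
    "C": (50.0, 48.0, 42.0, 57.0),
}

sd_style = [two_party_swing(*v) for v in regions.values()]
margin_style = [margin_swing(*v) for v in regions.values()]
r_squared = pearson_r(sd_style, margin_style) ** 2
print("r^2 between the two swing measures:", round(r_squared, 4))
```

One reason the two measures track each other so closely: if the combined major-party share $T$ is the same in both years, then $M_j - B_j = -(O_j - K_j)$, so the margin-based swing equals $2(O_j - K_j)$ while the two-party swing equals $(O_j - K_j)/T$. The two are then exact multiples of one another, and only changes in the third-party vote pull them apart.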
Figure 1
This study examines the association between gun ownership rates and homicide rates in the United States from 1973 to 2016. The results show a very strong positive correlation (Spearman's rho = 0.79) between gun ownership and homicide rates, which supports the hypothesis that higher rates of gun ownership increase the likelihood of homicide. On average, a 1% increase in gun ownership is found to be associated with 2.6 additional homicides per million people.
Methods
Gun ownership data was obtained from the 26 waves of the General Social Survey in which respondents were asked if they had a gun in their home.\cite{chicago} The rate of gun ownership for a given year was calculated by dividing the number of respondents in that year who answered "Yes" to the aforementioned gun question by the number of respondents who answered "Yes" or "No" in the same year; this value was then converted to a percentage. Homicide data was obtained almost entirely from the Uniform Crime Reports online data tool.\cite{statistics} The exception was the homicide rate for 2016, which was not available through this tool; I obtained this rate from a separate FBI report.\cite{11} Hereafter, I refer to the yes/no-based gun ownership estimates simply as "gun ownership" and the UCR-based homicide estimates simply as "homicide".
I used the chi-square goodness-of-fit test in Microsoft Excel to determine whether the homicide and gun ownership data were normally distributed, following a procedure previously described elsewhere.\cite{goodness-of-fit} This produced p-values for both datasets (p = 0.999869577 for gun ownership, 0.985909364 for homicide) that were far above the level of significance ($\alpha = 0.05$), indicating that there is insufficient evidence to reject the null hypothesis that both datasets are normally distributed.
In order to examine the two remaining assumptions necessary for a Pearson correlation to be valid, namely linearity and homoscedasticity, I did the following. 
First, as noted above, I calculated the Pearson correlation between gun ownership and homicide, which was 0.90; this indicates the presence of a very strong linear correlation, meaning that the assumption of linearity is justified in these data. Second, I examined the scatterplot (figure 1 below) to see if the data met the assumption of homoscedasticity. If this assumption is valid, the data should cluster mostly around the middle of the regression line, and then taper off as one approaches either end of the line.\cite{herschel2017} However, this was not the pattern observed in my data: instead, as figure 1 shows, the data was clearly clustered much more around both the high and low ends than in the middle, with a particular density of data points at the low end of the regression line. Therefore, the assumption of homoscedasticity is not met here, so a Pearson correlation is not appropriate. Therefore, I calculated a Spearman's rank correlation on the same data and obtained a Spearman's rho of 0.79. 
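The calculations described above (the yearly ownership percentage, the Pearson correlation, and the Spearman rank correlation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey counts and homicide rates below are made up rather than taken from the GSS or UCR, and the helper names are hypothetical.

```python
from math import sqrt

def ownership_rate(yes, no):
    """Percentage answering 'Yes' among respondents who answered 'Yes' or 'No'."""
    return 100.0 * yes / (yes + no)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical (yes, no) survey counts and homicide rates for five years.
survey_counts = [(470, 530), (450, 550), (400, 600), (350, 650), (320, 680)]
gun_ownership = [ownership_rate(yes, no) for yes, no in survey_counts]
homicide = [9.4, 9.8, 8.2, 5.5, 5.0]

print("Pearson r:", round(pearson_r(gun_ownership, homicide), 3))
print("Spearman rho:", round(spearman_rho(gun_ownership, homicide), 3))
```

Because Spearman's rho depends only on the ordering of the observations, it does not rely on the linearity and homoscedasticity assumptions that the Pearson coefficient requires.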
This paper describes and develops a hierarchy of framing as a framework that can be applied to scientific discourse, which it then applies to a set of peer-reviewed articles. In this framework, there are multiple levels of certainty according to which various statements can be classified. These levels range from complete certainty to complete uncertainty, and each level represents a different way to frame a given claim: the same claim may be presented as scientific fact, or as possessing some degree of ambiguity, or as being entirely speculative. All of these levels relate to whether, and the extent to which, the writer wants to portray the claim as being scientific fact on the one hand, or as a falsehood contradicted by conclusive scientific evidence on the other. I describe the ways in which scientists, intentionally or not, can frame certain conclusions that have been reached by other scientists (or indeed by anyone) as being more or less in doubt. It is further suggested that the discretion scientists have regarding such framing represents another example of what have previously been called "researcher degrees of freedom".
Hierarchy
The levels in the hierarchy are described below in descending order, from most to least certain. Before explaining each level, I should note that the words in bold are identified as words that, if they are present in a phrase, will always make that phrase an example of the corresponding level of framing. Additionally, if the bolded word is a verb, it does not matter what tense the word is; it will still be in the same level (e.g. "shows" and "showed" both qualify as level 5).
Level 6: Absolute certainty
At this level, the claim may simply be stated as fact, e.g. "The Earth is round", or "It is a well-established fact that the Earth is round" (omission of "well-established" would still qualify the latter example as level 6). 
To the writer, no qualifications are necessary to accurately convey such information, because it is supported by overwhelming and conclusive empirical evidence. Thus, you can either explicitly call it a fact or state it as such.
Level 5: Probably true
Here, the author(s) create(s) the impression that there is a strong body of evidence supporting the claim, but it has not yet reached the level of extreme, total certainty necessary for promotion to level 6. Thus, while the claim is being presented in a very sympathetic manner, so as to portray it as being almost certainly true, at least a few additional qualifiers are still needed. An example of this framing would be, "A recent study showed that the Earth is round." or "Evidence strongly suggests that the Earth is round." (The word "strongly" is crucial to distinguishing the latter example from level 3 framing.) Any phrase using the words "strong" and "evidence" (unless it explicitly states that the evidence for a claim is not strong) should count as level 5.
Level 4: "A recent study found that the Earth is round."
Level 3: "A recent study indicated/indicates/suggested/suggests/implies that the Earth is round." Another example of such framing would be "A recent study supports (or finds support for) the hypothesis that the Earth is round."
Level 2: "A recent study concluded/reported that the earth is round."
Level 1: "A recent study claimed/alleged that the earth is round." Other examples: "A recent study claims to show/find that the earth is round." Also falling into this category are statements using words like "weak", e.g. "There is weak evidence supporting the hypothesis that the earth is round", or phrases like "There is little evidence that the earth is round". 
Fundamentally, a sentence only counts as level 1 phrasing if it inherently frames the scientific claim at its core as being questionable or in doubt (see Note 1).
Level 0: "It has been frequently suggested that the earth is round, but there is currently no evidence as to whether this claim is true or not." Level 0 claims, as the number implies, are those which frame the claim as being neither supported nor contradicted by scientific evidence, ostensibly because no evidence at all exists bearing on the question of whether the claim is true or not.
Lastly, I wish to note that any statement that seems to fit in a category above, but includes a negative word like "not" that reverses its meaning, may be assigned the negative of the level it would otherwise receive. Thus a level 6 statement with a negative conclusion (e.g. "The earth is not flat.") would be considered not level 6 but rather level -6. Further potential modifications of this scale are undoubtedly possible, and many of them could well be useful in future research.
Validation
Here I attempt to validate the hierarchy I described above by analyzing the content of several recently published journal articles. 
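As a rough illustration of how the hierarchy could be applied mechanically, here is a minimal Python sketch of a cue-word classifier. The cue lists are my reading of the example sentences above and should be treated as assumptions rather than a definitive list of the bolded words; the order in which cues are checked is also a choice made for this sketch (for instance, level 1 cues are checked before level 5 and level 4 cues so that "claims to show" is not scored as "show"). It does not handle the exception for statements that the evidence is not strong.

```python
import re

# (level, pattern) pairs, checked in order; the first match wins.
# These cue words are assumptions inferred from the example sentences.
CUES = [
    (0, r"\bno evidence\b"),
    (1, r"\b(claim(s|ed)?|alleg(e|es|ed)|weak|little evidence)\b"),
    (5, r"\bshow(s|ed|n)?\b|(?=.*\bstrong(ly)?\b)(?=.*\bevidence\b)"),
    (3, r"\b(indicat(e|es|ed)|suggest(s|ed)?|impl(y|ies|ied)|support(s|ed)?)\b"),
    (4, r"\b(find(s)?|found)\b"),
    (2, r"\b(conclud(e|es|ed)|report(s|ed)?)\b"),
]

def framing_level(sentence):
    """Return the framing level of a single sentence (6 if no cue is found)."""
    level = 6  # unqualified statements are treated as level 6
    for candidate, pattern in CUES:
        if re.search(pattern, sentence, re.IGNORECASE):
            level = candidate
            break
    # Simplified negation rule: a sentence containing "not" gets its sign flipped.
    if re.search(r"\bnot\b", sentence, re.IGNORECASE):
        level = -level
    return level

examples = [
    "The Earth is round.",
    "A recent study showed that the Earth is round.",
    "A recent study found that the Earth is round.",
    "A recent study suggests that the Earth is round.",
    "A recent study reported that the earth is round.",
    "A recent study claims to show that the earth is round.",
    "The earth is not flat.",
]
for text in examples:
    print(framing_level(text), text)
```

A keyword approach like this cannot tell whether a "not" actually reverses the meaning of the claim, so in practice its output would need to be checked by hand.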
Note: all bolded text in the quotes below was bolded solely by me, not by the author(s) of the papers quoted.In order to attempt to validate this hierarchy, I arbitrarily chose to visit the homepage of the respected peer-reviewed journal Personality and Individual Differences, whereupon I chose the first published article I could find; this resulted in me clicking on the first paper listed at the top of the "latest articles" section of the journal's website.\cite{sciencedirectcom} The paper in question, at the time of writing (9/2/2018), was a recently published paper entitled "Personality and the social experience of body weight".\cite{Sutin_2019} When I applied the hierarchy of framing to the text of the paper, I found two examples of level 3 framing: the use of the word "indicates" in the abstract and in the final sentence of the paper. The first use of the word is in the final sentence of the abstract: "The present research indicates that in addition to measured weight and body image, personality traits are associated with the social experience of body weight." The second use is in the final sentence of the entire paper, in which the conclusions are restated: "...the present research indicates that personality traits are associated with social experiences with body weight." A more detailed reading of the paper's introduction section reveals many examples of level 6 framing, i.e. simply stating research findings without qualification, as if they were scientific facts. This sentence is one example of such framing: "Individuals higher in Conscientiousness, for example, tend to weigh less (Sutin, Ferrucci, Zonderman, & Terracciano, 2011) and are at lower risk of obesity over time (Jokela et al., 2013)." Later in the same section, one finds another example of level 6 framing: "Individuals who believe that obesity is under one's control hold stronger negative attitudes toward it (Hansson & Rasmussen, 2014)." 
I then attempted to apply the scale to the paper listed just below this one on the journal's homepage on 9/2/18, which was entitled, "A preliminary investigation into the relationship between empathy, autistic like traits and emotion recognition"\cite{Martin_2019}. Using the scale to assess the text of this paper revealed several examples of level 3 framing (e.g. "There is research indicating that processing style can influence the ER of TD individuals, adults with an intellectual disability and children with ASD..."; "Overall, the study indicated that higher ALT and the absence of situational cues were significantly related to emotion recognition", "While research with people with ASD is limited, it suggests they utilise situational cues less when matching emotions to their correct context"). There is also at least one example of level 4 framing: "While recent research found no significant relationship between ER and processing style in individuals with and without ASD (McKenzie et al., 2018), it may be worth including in future studies of ER." DiscussionWhen writing peer-reviewed papers, and when conducting any experiments or analyses that result in the studies described in such papers, scientists necessarily have a lot of discretion regarding exactly what to do. This has previously been dubbed the concept of "researcher degrees of freedom". In the 2011 paper in which they coined this phrase, Simmons et al. described it as follows: "In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?" 
Further, they noted that researchers typically do not decide the answers to such questions before their research is conducted, meaning that instead, researchers typically choose the analytic method(s) that give them a positive result, ignoring other (potentially methodologically superior) methods that might not produce such a result. They noted that these actions did not result from malicious or fraudulent motives: "This exploratory behavior is not the by-product of malicious intent, but rather the result of two factors: (a) ambiguity in how best to make these decisions and (b) the researcher’s desire to find a statistically significant result."\cite{Simmons_2011}Consider the following example: the now-retracted paper by Seralini et al. (2012)\cite{S_ralini_2012} claiming to find a link between GM maize, Roundup, and tumors in rats was heavily criticized shortly after it was published, and within a month, the European Food Safety Authority had issued a statement concluding that the study's "design, reporting, and analysis" were "inadequate".\cite{Butler_2012} Yet in a 2013 paper, published before Seralini et al. (2012) had been retracted, Samsel & Seneff (2013) cited the Seralini paper to support their argument about the alleged toxicity of Roundup.  Samsel & Seneff wrote, "while short-term studies in rodents have shown no apparent toxicity [8], studies involving life-long exposure in rodents have demonstrated liver and kidney dysfunction and a greatly increased risk of cancer, with shortened lifespan [9]" (Emphasis mine). Later in the same paper, one finds this statement: "...the fact that female rats are highly susceptible to mammary tumors following chronic exposure to glyphosate [9] suggests that there may be something else going on" (Emphasis also mine).\cite{Samsel_2013} Reference [9] is Seralini et al. 
(2012), and it is clearly being framed in a level 5 manner.
Conclusions
I have described the development of a hierarchical scale to classify scientific statements according to the level of certainty with which they are framed. The scale ranges from level 0 (entirely speculative) to level 6 (absolute certainty). Furthermore, in order to illustrate its real-world applicability, I have briefly highlighted real examples of sentences in real academic journal articles corresponding to different tiers on the scale. I conclude by suggesting that this scale be further refined and studied by those interested in the sociology of science and the ambiguity inherent in the process of writing scientific papers. It could, for instance, be used to build on the points made by \cite{Katz_2013}, who recently noted that "The alternative to storytelling [as a means of describing scientific results] is the usual language of evidence and arguments that are used—with varying degrees of certainty—to support models and theories." This statement was made in a commentary criticizing "storytelling" as a way of framing results in science.
I also suggest that the hierarchy of framing that I have described here can be considered an example of "researcher degrees of freedom", in line with the work of Simmons et al. Researchers are not bound to cite any paper at all in describing either their own research results or those of previous investigators. But if they do decide to cite a given paper, they are not bound to cite it favorably; instead, they could cite it in a negative way if they wanted to cast doubt on its findings. Conversely, even if a study was widely criticized by other researchers soon after its publication, it could still be cited favorably by those ideologically sympathetic to its conclusions.
Notes
1. Level 1 is distinct from all other levels in the hierarchy in that it attempts to implicitly cast doubt on the validity of the findings by the words used. 
In contrast, for example, level 0 does not attempt to portray the claim as suspect, but merely as entirely speculative, and in need of (further) research. By framing the conclusion of the research as merely a "claim" or "allegation", the writer can implant in the mind of the reader that the results of the study are merely speculative, or perhaps are not even supported by evidence at all. Words like "claim" and "allege" thus convey a similar sense of uncertainty and doubt as is associated with newspaper reporting of crimes for which no conviction has been obtained, or sexual assault accusations when there remains considerable uncertainty about whether the accusers are telling the truth (at least from the journalists' perspectives). 
I argue that, in line with Halford Fairchild, scientists promoting racist ideas as though they were legitimate science hide such ideas behind what he dubbed a "cloak of objectivity".\cite{Fairchild_1991} But I also argue that such scientists hide behind another cloak: namely, that of being persecuted and treated unfairly by a supposed conspiracy of academics in pursuit of "political correctness". First, I demonstrate that scientific racism has undergone a resurgence in recent literature, with numerous examples of articles in respected peer-reviewed journals espousing such beliefs in the last few years alone. I demonstrate that when scientific racism is published, it does indeed attract criticism for its potential racist implications and fueling negative stereotypes, but that this is not the only such criticism to appear in response to this type of research. I further show that scientific racists themselves use the existence of criticism of their research as racist to justify their continuing promotion and conducting of such research, without having to address the methodological flaws inherent therein. I thus conclude that scientific racists hide their ideas behind not just the "cloak of objectivity", but also the cloak of the persecution complex, which allows them to paint themselves as Galileos who are being attacked only for challenging the "politically correct" "orthodoxy".IntroductionScientific racism--the promotion of pseudoscientific claims that some races are superior to others, presented under the guise of legitimate science--has a long history in the United States (e.g. \cite{racism}) and in other countries, such as South Africa \cite{history}. The works produced by scientific racists have often been used to justify the enactment of racially biased policies.\cite{Dennis_1995}\cite{Bhopal1998} Fairchild (1991) demonstrated the assumptions underlying this work in a critique of the research of prominent race scientist J. 
Philippe Rushton, noting that Rushton's research on race failed to pass objective standards for what constitutes rigorous science. Thus, it is clear that this work fails to live up to its author's claim that it is objective. Similar points regarding the importance of underlying Euro-centric ideologies have been made elsewhere (e.g. \cite{writingrdactologie},\cite{Garrod2006}).The biological and genetic invalidity of the race concept is now well-established.\cite{Witzig_1996}\cite{Keita_2001}\cite{Maglo_2016} Claims that genetics are responsible for racial differences in health, for instance, are sometimes defended based on accusations of political correctness against those criticizing them, when the most logical explanation is that proponents of genetic explanations of these differences are motivated by their conservative ideology.\cite{Krieger_2005} In order to maintain their own legitimacy, then, "race realists" must resort to logical strategies other than directly addressing their critics with empirical evidence, since such evidence to support their arguments is lacking. I discuss some of these strategies below. What do these arguments by the "race realist" scientists I discussed above mean? I argue that they mean that simply pretending to be objective is not their sole modus operandi: they also pretend, and have long liked to pretend, that their critics only dispute their views because they are politically incorrect or might be perceived as offensive by some. Thus, they adopt two cloaks behind which they hide their ideas: that of objectivity, and that of persecution.The "cloak of objectivity"--whereby a scientist or group of scientists will claim that their research on racial-group differences in a trait or collection of traits is merely legitimate scientific inquiry, conducted without regard for ideological values such as political correctness--is fairly easily understood; moreover, it has already been discussed in detail by Fairchild (1991). 
But what about the second cloak I identified--the "cloak of persecution"? I define this "cloak" as a tactic whereby scientists who publish and/or describe controversial conclusions of their research on racial differences, and are then criticized for encouraging or legitimizing racism, use these criticisms to further their own arguments. This is done by taking the criticisms of one's ostensibly scientific research on ideological/political grounds, and then using them to paint oneself as being unfairly persecuted for challenging "political correctness" or for daring to bravely study "taboo" topics.
Previous sociological scholarship on the responses of controversial race scientists to critics
I am not the first to note that many proponents of controversial claims in fields relating to race, psychology, and human genetics often try to cast themselves as struggling against a politically correct "dogma". St Louis (2003), for example, argued that "...the notion of the racial basis of athletic ability strategically employs genetic science in order to support erroneous understandings of racial physicality and dismiss the irrational ‘politically correct’ dogmas of social constructionism" (\cite{ST_LOUIS_2003}, p. 2). Similarly, in his recent book Misbehaving Science, sociologist Aaron Panofsky notes that human behavior genetics researchers (whose work often, but not always, had to do with race) often avoided addressing the substance of critics' arguments by resorting to the ad hominem tactic of portraying their critics as politically motivated. Panofsky described this as the "hitting them over the head" style, and identified it as a way of dodging specific criticisms by attacking one's critics as believing that genes had absolutely nothing to do with human behavior, and that everything was determined completely by the environment. 
Ostensibly, this was because of their liberal ideology that compelled the critics to believe that all human behavioral problems and inequalities had to be environmental in origin, and thus solvable by environmental interventions. As Panofsky (2014, p. 144) put it, "‘Hitting them over the head’ was a strategy for building scientific capital that involved constructing one’s intellectual interlocutors as mortal enemies and attacking them in spectacular, polemical fashion." Even more recently, David Gillborn has built on work such as that of Howard Gardner by bringing back the concept of "scholarly brinkmanship". Gardner coined this phrase to describe someone who writes so as to very strongly imply an extreme form of an argument, but with little caveats sprinkled here and there that they can fall back on if their strongest (and least tenable) argument is challenged. Or, as Gardner himself put it in reviewing The Bell Curve, "Scholarly brinkmanship encourages the reader to draw the strongest conclusions, while allowing the authors to disavow this intention." Gillborn (2016) identified numerous examples of scholarly brinkmanship in the 2013 book G is for Genes. The multiple steps involved in the production of the "cloak of persecution"In this section, I will outline a hypothesis for how scientists studying race and intelligence from a hereditarian, biological determinist perspective benefit from making provocative and controversial claims regarding this subject. This hypothesis consists of the following steps:A (group of) researcher(s) claim(s) that human "races" differ in average intelligence, that these differences are partly or entirely due to genetic factors, and that there are important real-world consequences of such differences with respect to educational and economic attainment of members of these groups. The most famous example of a researcher who did this, of course, was undoubtedly Arthur Jensen in 1969, but other examples abound. 
Rushton, for instance, was thrust into the public eye when he presented a paper at a January 1989 conference of the American Association for the Advancement of Science. In it, he argued not only that humans could be divided into three "races" (Mongoloid, Caucasoid, and Negroid), but also that these "races" ranked consistently, and in the same order, on a wide variety of behavioral and physical traits.\cite{star}

2. The claim described in step 1 provokes angry responses, and perhaps even threats of violence, from people who consider it "racist" or offensive in some other way. For instance, Jensen was widely criticized as a racist after his 1969 article was published, and the anthropologist Margaret Mead attempted unsuccessfully to block his nomination as an AAAS fellow in 1977.\cite{furor} Some college students also burned effigies of Jensen, and he received enough death threats that he had to be accompanied by bodyguards on campus.\cite{dies} In the case of Rushton, the then-premier of Ontario, David Peterson, responded to the aforementioned 1989 presentation by calling for him to be fired;\cite{star} his students also protested his classes to such an extent that his university (the University of Western Ontario) ordered that his classes be shown to students only via videotape.\cite{studies}

3. The same scientists who made the claims described in step 1, and their defenders within academia, cite the responses in step 2 to portray themselves as brave "Galileos" who are standing up for the scientific method, even at the expense of their own careers or safety. This often involves claiming to be challenging an "orthodoxy" or daring to address "taboo" subjects; indeed, one particularly blatant example of this is Jon Entine's 2000 book on the supposed biological reasons for race differences in athletic ability, which was literally entitled Taboo. 
But Rushton may have been the most obvious practitioner of step 3, as when he described his research on racial differences and then wrote, "This challenge to the social science orthodoxy brought political correctness out in force against me, some examples of which I document here."\cite{Rushton_1996} Defenders of prominent race scientists also use such language, with the obvious goal of painting a picture of an evil, oppressive academic establishment that severely punishes anyone who questions a social "orthodoxy" or "taboo". Linda Gottfredson, for instance, in a 1998 pro-Jensen puff piece, writes that "Arthur Jensen is a masterful scientist whose work broke a social taboo."\cite{Gottfredson_1998} She argues elsewhere that there is a peculiar gap between what intelligence experts (such as herself, of course) believe about intelligence and what they are willing to say publicly for fear of punishment or censorship: after describing racial differences in mean intelligence as "real", "stubborn", and "of great practical importance", she argues, "Although these facts may seem surprising, most experts on intelligence believe them to be true but few will acknowledge their truth publicly."\cite{Gottfredson_1994} This is in line with the argument of Charles Murray, a political scientist and co-author of The Bell Curve, who claimed in 2005 that "The Orwellian disinformation about innate group differences is not wholly the media's fault. Many academics who are familiar with the state of knowledge are afraid to go on the record. 
Talking publicly can dry up research funding for senior professors and can cost assistant professors their jobs."\cite{murray} Murray has continued to make such claims of persecution as recently as 2017.\cite{iq}

4. The narrative that the mainstream establishment in both academia and the popular media is unfairly biased against hereditarian researchers on race and intelligence, and unwilling even to consider that their research might be valid, continues to be built by these researchers themselves. Generally insulated from the mainstream criticism that might weaken it, this narrative becomes stronger and stronger as time goes on. Thus, the narrative that serves as received wisdom among the hereditarians becomes the following: \textbf{that their views are politically incorrect and unpopular, but scientifically rigorous and well-supported}. This narrative can be strengthened even more when its scientific merits are dismissed in a cursory manner by the mainstream "gatekeepers" of the popular media and the most influential academic journals. From time to time, when someone argues in a high-profile outlet like Nature that this subject should not be researched at all,\cite{Rose2009} or that additional restrictions should be imposed on it because of the well-established harm that has resulted from such work in the past,\cite{Kourany_2016} it opens the door for hereditarians to twist these arguments around. As in step 3, hereditarians can use calls to impose restrictions on research on this subject to argue that "censorship" of their ideas is at work in academia, further bolstering the bolded "narrative" defined in step 4. The narrative is periodically reinforced whenever the topic of race and IQ is once again thrust into the public sphere and the mainstream media have no choice but to examine it and try to cover it accurately. 
The most prominent example of this in the current century so far is probably that of Nobel laureate James Watson, who said in 2007 that "All our social policies are based on the fact that their [blacks'] intelligence is the same as ours--whereas all the testing says not really", and that though it is generally assumed that different groups are equal, "people who have to deal with black employees find this not true."\cite{comments} Also in 2007, William Saletan published a credulous article in Slate in which he relied largely on the work of Rushton and Jensen to conclude, "Tests do show an IQ deficit, not just for Africans relative to Europeans, but for Europeans relative to Asians. Economic and cultural theories have failed to explain most of the pattern, and there's strong preliminary evidence that part of it is genetic".

Conclusion

I conclude that complaints of persecution by the "politically correct" establishment in academia and the media are exaggerated, and that claims that race scientists are bravely challenging a vigorously held and defended, but fundamentally false, dogma are unfounded. My conclusion that political correctness is used as an excuse to shield scientists from criticism within academia builds on the work of Winston, who documented harassment of Rushton that, nevertheless, did not rise to the level of an effort by the academic establishment to suppress or censor his work \cite{Winston_1996}.  
Invariably, such complaints ignore the fact that both Rushton and Jensen were allowed to remain on their respective faculties until they retired, as well as the fact that the then-president of the AAAS, Walter Massey, defended Rushton's right to speak at the 1989 AAAS conference mentioned above (he told The Scientist, "I'm skeptical of the conclusions he's drawn, but the best way to treat this is to have it exposed.")\cite{aaas} On the basis of such evidence, Winston (1996) notes that, contrary to Rushton's claim of being persecuted because of his unorthodox ideas, "...while Rushton has been publicly harassed, he has had continuous opportunities to present his findings in diverse, widely available, respectable journals, and no general suppression within academic psychology is evident."\cite{Winston_1996} I also conclude that proponents of ideas now generally agreed to constitute scientific racism are influenced in large part by their ideological beliefs, a conclusion previously reached by other scholars who have examined such scientists. For instance, Brattain (2007) noted that, when Nazi Germany was at its peak in the 1930s, many scientists were remarkably hesitant to denounce the Nazis' ideology. More to the point, she also showed that racism had infiltrated American education to such an extent that a 1939 study found that 20% of textbooks taught what was tantamount to Nazi beliefs about racial superiority and inferiority. In addition to the widespread belief, documented by Brattain, that blacks were inferior to whites,\cite{Brattain_2007} another major contributor to past work promoting scientific racism (and to work opposing it) was the desire to take a side in the segregation debate.\cite{SCHAFFER_2007}