Discovering Stars with Altmetrics

0. abstract

1. introduction

1.1 importance of metrics for personnel evaluation

Arguably the most important use of bibliometric techniques is in aiding the evaluation of individual scholars. If the capability of individuals to contribute to an organization or discipline follows a Zipf- or Lotka-like power law distribution, it is crucial for organizations to choose the most capable or talented individuals for hiring, advancement, and rewards. Simply applying the 80-20 rule implies that a well chosen faculty can be ten times as productive as a randomly chosen one.

At early career stages personnel decisions are essentially predictions about an individual’s future achievements. Bibliometric methods, such as article and/or citation counts (Garfield, Sher and Torpie, 1964), have long been used to help evaluate individuals and predict their future success; famously, Garfield and Malin (1968) used citation counts to predict that Gell-Mann would win the Nobel Prize in Physics, which he did the next year. Recently, improvements to raw citation counts and non-citation based measures (alternative or alt, e.g. Priem 2013) have been suggested by many authors, article downloads (Kurtz and Bollen 2010) being the most common.

How well do these measures predict the future performance of individuals? In this paper we cross-compare three substantially different indicators/predictors for an observational cohort of all 905 persons who received a PhD in Astronomy from a U.S. university in the five year span 1972-1976.
The three indicators are the citation based TORI statistic (Pepe and Kurtz, 2012), which normalizes citation counts by both the number of authors on a cited paper and the number of references in the citing paper; the downloads based Read10 statistic (Kurtz, et al 2005b), which counts downloads of papers published within ten years of the download date, normalized by number of authors; and the combined results of elections and prize committee deliberations of the American Astronomical Society and the National Academy of Sciences.

This paper is not intended to compare different citation measures. The (near) full career retrospective data used here, of individuals from a single age cohort in a single discipline, are much less susceptible to the large variance in the per article number of authors and references than a modern, cross-disciplinary data set would be. The TORI was designed to address the more modern issue; for the data in this paper we expect the results using TORI to be very similar to those from any other citation based indicator, such as total citations (Garfield, Sher and Torpie 1964), co-author number normalized citations (Kurtz, et al 2005b), h (Hirsch 2006), or the CWTS crown indicator (Moed, De Bruin and Van Leeuwen 1995).

1.2 Garfield 1964, Nobel web site

1.3 alt metrics, reads, K05 read-cite diagram

2. data

2.1 1972-1976 USA PhD Astro

2.1.1 hand work

2.1.2 papers, AAS, gender

We searched the Astronomy database of the Smithsonian/NASA Astrophysics Data System (Kurtz, et al 2005a: ADS) for PhD theses from U.S. universities granted in the five years between 1972 and 1976. We found 905, corresponding to 905 individuals. This work was performed in 2002 by B. Ellwell. Individuals in this cohort are now (2013) typically between 64 and 74 years old. We used the ADS astronomy database to obtain a listing of all of each individual’s astronomy papers. By hand curation, we eliminated papers by persons with similar names and added papers missed due to name changes (marriage).
This process required a substantial number of telephone calls. Additionally, we noted whether each individual was a member of the American Astronomical Society (AAS) according to the 2002 member directory. These article lists do not include papers from the ADS Physics database; this very substantially reduced the scale of the homonym problem, at the cost of underestimating the impact of individuals whose careers had a substantial physics component.

2.2 2000-2001 ADS logs

To compute the Read10 statistic we counted downloads of papers published in 1991 or later, using download information from the ADS logs for the years 2000 and 2001. At that time the ADS was heavily used by professional astronomers worldwide (K05a) but had not yet been discovered by others; thus the ADS logs from that period represent a reasonably fair and pure sample of what was being read by professional astronomers, without additional filtering. In the year 2000 (1991) members of our PhD cohort were typically between 51 (42) and 61 (52) years of age.

For this study a download was considered to be any access of data on a paper, whether full text, abstract, citations, references, data link, or one of several other less used options. The Read10 statistic used here is exactly the same as that used in K05b, which allows the plots in this paper to be directly compared with the plots in that paper. This definition differs in several details from that currently in use by the ADS (ADS 2013); the results of this paper are not affected by these differences.

2.3 ADS citation index

For the citation analysis we use the current ADS citation database. Abt (2006) has compared the completeness of the ADS citations with that of the Web of Science; both have a high level of completeness, but ADS is more complete in covering the astronomy conference literature, and WoS is (much) more complete in covering references originating outside physics and astronomy.
Because the citation data contain the publication date of the citing article, it is a simple matter to derive retrospectively the citation measure for any article at any date; this was done to compute the TORI statistic five (TORI5) and twenty-five (TORI25) years past the PhD.

2.4 AAS directory, 2002 + current

2.5 NAS website

Membership in the American Astronomical Society (AAS) as of 1 Jan 2002 was taken from the 2002 AAS Membership Directory; 339 members of the cohort were in the AAS at that time. Lists of AAS prize recipients and officers were taken from the AAS’ current web site (AAS 2013). Membership in the U.S. National Academy of Sciences (NAS) was taken from its website, along with the date elected; 14 members of the cohort have been elected to the NAS.

In addition to simple membership we have created three additional subsets of individuals. Honored are persons who (as of 2012) have been elected to the NAS, or to any office in the AAS, or who have received any prize from the AAS: 29 from the cohort. Highly honored are individuals who have been elected to the NAS, or to the AAS presidency or vice presidency, or who have received a mid or late career prize from the AAS: 21 from the cohort. Young award winners are individuals who have received one of the three prizes given by the AAS between 1970 and 1990 to individuals within five years of receiving the PhD; this comprises 49 individuals, most not in the cohort.

3. results

All bibliometric investigations are limited by the nature of the samples taken; this is especially true for comparisons between individual researchers. Some research disciplines have citation rates which differ from others by a factor of five, and citation measures for individuals increase approximately quadratically with the scientific age of the person. Thus comparing individuals or groups of individuals at different ages and from different fields is problematic.
By choosing a sample of individuals from a single discipline and with very similar scientific ages we ameliorate these problems.

Additionally, substantial difficulties can arise in determining the extent of the low productivity end of the distribution; this is sometimes known as survivor bias. Most sample selection strategies systematically undersample low achievers. By taking all (U.S.) astronomy PhDs from our five year cohort, our sample approximates everyone at the relevant age who could be an author on an astronomy research paper.

Figures XXX-XXX show the distributions of TORI5, the citation based measure taken five years past the PhD; TORI25, the citation based measure taken 25 years past the PhD; and Read10, the measure of downloads of recently published articles (as of 2002), for the entire sample and two subsets. The distributions are shown as histograms binned in equal logarithmic steps (a multiplicative factor of 1.585). The outer, clear histogram shows the full sample, the cross-hatched histogram shows the 2002 AAS member sample, and the black histogram shows the honored sample. One member of the honored sample was not a member of the AAS in 2002 (this person is currently an AAS member); otherwise these are nested proper subsets.

3.1 tori25

3.2 read10

3.3 tori5

4. discussion

4.1 equivalence of t25 & r10

4.2 university bias
all3
a. 16 top 50 tori25+Read10+ hiAAS; b. 15 top50 tori25+Read10 not hiAAS
a 9 nas; 10/8 univ; 6/1 not univ
b 13/1 nas univ; 2/0 not univ
odds of a random <1.4

4.3 t5 vs aas junior
tori5>5.5 11/55 in nas
aas5 10/49 in nas 1971-1990

4.4 t5 v r10
mjg at 24ptile, lowest t5 of nas, tenure track before PhD+5

Citation: Mazloumian A (2012) Predicting Scholars’ Scientific Impact. PLoS ONE 7(11): e49246. doi:10.1371/journal.pone.0049246
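
The equal logarithmic steps used for the histograms in section 3 (a multiplicative factor of 1.585) correspond to five bins per decade, since 10^(1/5) ≈ 1.585. The following is a minimal sketch of such a binning, not the code used to produce the figures. The function names, the choice to count non-positive values separately, and the sample values are all assumptions made for illustration.

```python
import math
from collections import Counter

# Five bins per decade gives the multiplicative step quoted in the text:
# 10 ** (1 / 5) is approximately 1.585.
BINS_PER_DECADE = 5
STEP = 10 ** (1 / BINS_PER_DECADE)


def log_bin_index(value):
    """Return k such that value lies in the bin [STEP**k, STEP**(k + 1))."""
    if value <= 0:
        raise ValueError("logarithmic bins are only defined for positive values")
    # The tiny offset guards against rounding error for values that sit
    # exactly on a bin edge (an illustrative choice, not from the paper).
    return math.floor(math.log10(value) * BINS_PER_DECADE + 1e-9)


def log_histogram(values):
    """Bin positive values in equal logarithmic steps.

    Returns (rows, skipped), where rows is a list of
    (lower_edge, upper_edge, count) tuples sorted by bin, and skipped is
    the number of non-positive values that cannot be placed on a log axis.
    How such values were treated in the paper is not stated; counting them
    separately is an assumption of this sketch.
    """
    positive = [v for v in values if v > 0]
    counts = Counter(log_bin_index(v) for v in positive)
    rows = [(STEP ** k, STEP ** (k + 1), counts[k]) for k in sorted(counts)]
    return rows, len(values) - len(positive)


if __name__ == "__main__":
    # Made-up indicator values, for illustration only.
    sample = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 12.0, 100.0]
    rows, skipped = log_histogram(sample)
    for lower, upper, count in rows:
        print(f"[{lower:8.3f}, {upper:8.3f})  {count}")
    print("non-positive values skipped:", skipped)
```

The same function could be applied to the full sample, the AAS member subset, and the honored subset in turn, so that the three nested histograms share identical bin edges.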