Methods

Search Criteria Using the h5-index from Google Scholar Metrics, we selected the 6 oncology journals with the highest index scores. We searched PubMed using the following search string: ((((((“Journal of clinical oncology : official journal of the American Society of Clinical Oncology”[Journal] OR “Nature reviews. Cancer”[Journal]) OR “Cancer research”[Journal]) OR “The Lancet. Oncology”[Journal]) OR “Clinical cancer research : an official journal of the American Association for Cancer Research”[Journal]) OR “Cancer cell”[Journal]) AND (“2007/01/01”[PDAT] : “2015/12/31”[PDAT]) AND “humans”[MeSH Terms]) AND (((meta-analysis[Title/Abstract] OR meta-analysis[Publication Type]) OR systematic review[Title/Abstract]) AND (“2007/01/01”[PDAT] : “2015/12/31”[PDAT]) AND “humans”[MeSH Terms]) AND ((“2007/01/01”[PDAT] : “2015/12/31”[PDAT]) AND “humans”[MeSH Terms]). This search strategy was adapted from a previously established method that is sensitive for identifying systematic reviews and meta-analyses (Montori 2005). Searches were conducted on 18 May and 26 May 2015.
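The search string above can be assembled programmatically, which makes the journal list and date filters easier to audit and reuse. The sketch below is illustrative only: the helper name `build_query` is ours, not part of the study's workflow, and the clause nesting is flattened relative to the exact string submitted to PubMed.

```python
# Hypothetical sketch: assembling the PubMed query from its parts.
# Journal names and filters are copied from the search string in the text;
# build_query is an illustrative helper, not from the original study.

JOURNALS = [
    "Journal of clinical oncology : official journal of the American Society of Clinical Oncology",
    "Nature reviews. Cancer",
    "Cancer research",
    "The Lancet. Oncology",
    "Clinical cancer research : an official journal of the American Association for Cancer Research",
    "Cancer cell",
]

def build_query(journals, start="2007/01/01", end="2015/12/31"):
    """Combine journal filters, review/meta-analysis terms, and date limits."""
    journal_clause = " OR ".join(f'"{j}"[Journal]' for j in journals)
    review_clause = (
        "(meta-analysis[Title/Abstract] OR meta-analysis[Publication Type] "
        "OR systematic review[Title/Abstract])"
    )
    date_clause = f'("{start}"[PDAT] : "{end}"[PDAT])'
    return f'({journal_clause}) AND {review_clause} AND {date_clause} AND "humans"[MeSH Terms]'

query = build_query(JOURNALS)
```

A string built this way can be pasted into the PubMed search box or passed to an automated search tool.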

Screening and data extraction We used Covidence (covidence.org) to screen articles initially by title and abstract. To qualify as a systematic review, a study had to summarize evidence across multiple studies and provide information on the search strategy, such as search terms, databases, or inclusion/exclusion criteria. Meta-analyses were classified as quantitative syntheses of results across multiple studies (Onishi 2014). Two screeners independently reviewed the title and abstract of each citation and decided on its suitability for inclusion based on the definitions above. The screeners then met to revisit citations in conflict and reach a final consensus. Following screening, full-text versions of included articles were obtained via EndNote.
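The dual-screening logic described above (independent decisions, then a consensus meeting on conflicts) can be summarized in a few lines. This is a minimal sketch of the bookkeeping that Covidence handles in practice; the citation IDs and decisions below are hypothetical.

```python
# Hypothetical dual-screening data: citation_id -> (screener_1, screener_2).
# In the study this bookkeeping was done in Covidence, not by hand.
decisions = {
    101: ("include", "include"),
    102: ("include", "exclude"),   # conflict: resolved at the consensus meeting
    103: ("exclude", "exclude"),
}

# Citations where the two screeners disagree go to the consensus meeting.
conflicts = [cid for cid, (a, b) in decisions.items() if a != b]

# Citations both screeners marked "include" proceed to full-text retrieval.
included = [cid for cid, (a, b) in decisions.items() if a == b == "include"]
```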

To standardize the coding process, an abstraction manual was developed and pilot tested. After completing this process, a training session was conducted to familiarize coders with abstracting the data elements. A subset of studies was jointly coded as a group. After the training exercise, each coder was given 3 new articles to code independently. Inter-rater agreement on these data was calculated using Cohen’s kappa. Because inter-rater agreement was high (κ = 0.86; agreement = 91%), each coder was assigned an equal subset of articles for data abstraction. We coded the following elements: a) statistical test used to evaluate heterogeneity; b) a priori threshold for statistical significance; c) type of model (random, fixed, mixed, or both); d) whether authors selected a random effects model based on significance of the heterogeneity test; e) whether authors used a random effects model without explanation; f) what type of plot was used to evaluate heterogeneity, if any; g) whether the plot was published as a figure in the manuscript; h) whether follow-up analysis was conducted, and if so, the type of analysis (subgroup, meta-regression, and/or sensitivity analysis); i) whether heterogeneity was mentioned in writing only; and j) whether authors concluded there was too much heterogeneity to perform a meta-analysis. After the initial coding process, validation checks were conducted such that each coded element was verified by the other coder. After these checks, coders met to discuss disagreements and settle them by consensus. Analysis of the final data was conducted using Stata 13.1. Data from this study are publicly available on Figshare (http://dx.doi.org/10.6084/m9.figshare.1496574).
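For readers unfamiliar with the statistic, Cohen’s kappa compares observed agreement p_o against the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e). The sketch below shows the standard computation for two raters over nominal categories; the ratings in the usage example are invented and do not reproduce the κ = 0.86 reported above.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters coding the same items (nominal categories)."""
    n = len(ratings_a)
    # Observed agreement: fraction of items on which the raters match.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented example: 3/4 observed agreement, 0.5 chance agreement -> kappa = 0.5
kappa = cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"])
```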

Evidence-Based Mapping We modified an approach by Althuis, Weed, and Frankenfield to perform the evidence mapping process \cite{Althuis_2014}. Their approach focused on observational studies and included a step to evaluate the covariates adjusted for across the primary studies. In this study, we instead included an evaluation of particular risk-of-bias components pertinent to the selected articles. We performed the following steps during the evidence mapping exercise:

1. We selected a systematic review that compared interventions and measured a specific outcome.

2. We formulated a research question based on the PICOS (population, intervention, control, outcome, study design) method.

3. We reviewed the primary studies from the selected systematic review to find a natural division with which to begin mapping. We examined the methods section of each primary study in detail for relevant design features and considered all aspects of the PICOS question as we compared studies. Our initial goal was to categorize studies into two groups, and we created a diagram to visually depict this sorting of primary studies into the relevant categories. After the initial diagram was constructed, we identified additional groupings that could further differentiate the primary studies.

4. We developed a second table informed by the Cochrane Risk of Bias tool and the CONSORT guidelines. Each primary study was examined to determine whether it might be susceptible to bias.

5. Last, we compiled from the trials all other defining characteristics that could be sources of heterogeneity. These included patient population characteristics, intervention characteristics, outcome evaluation characteristics, and study design features. This additional information was placed in the final table to summarize the heterogeneity mapping exercise.
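The grouping step in the mapping exercise (step 3 above) amounts to sorting primary studies by one PICOS feature at a time. A minimal sketch, with entirely hypothetical study names and attributes (none are from the review mapped in this study):

```python
# Hypothetical primary-study records; attributes stand in for PICOS features.
from collections import defaultdict

studies = [
    {"id": "Trial A", "design": "RCT", "control": "placebo"},
    {"id": "Trial B", "design": "RCT", "control": "active comparator"},
    {"id": "Trial C", "design": "RCT", "control": "placebo"},
]

def group_by(studies, feature):
    """Sort primary studies into categories on a single PICOS feature."""
    groups = defaultdict(list)
    for s in studies:
        groups[s[feature]].append(s["id"])
    return dict(groups)

grouping = group_by(studies, "control")
# e.g. {"placebo": ["Trial A", "Trial C"], "active comparator": ["Trial B"]}
```

Repeating the call with a different feature (e.g. outcome definition) yields the further subdivisions added to the diagram.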