Automated Citation Searching in Systematic Review Production: A Simulation Study Protocol and Framework

Darren Rajit; Lan Du; Helena Teede; Emily Callander; Joanne Enticott

doi:10.22541/au.169028985.56828301/v1


Abstract

Citation searching (also called citation mining or snowball searching) has been recommended as a supplementary search method for evidence retrieval in systematic review production. However, manual citation searching is costly and time-consuming, empirical evidence for its utility is limited, and there is little guidance on how best to incorporate it into systematic review production. Encouragingly, programmatic access to bibliographic databases now makes it possible to explore automated citation searching as a potentially scalable and replicable approach. This study therefore aims to simulate and evaluate exclusively automated citation searching for evidence retrieval against reference standard Boolean logic-based search strategies, and to explore the factors that influence its performance.

Methods: A total of 30 systematic reviews will be retrieved from the Cochrane Database of Systematic Reviews, Campbell Systematic Reviews and the Collaboration for Environmental Evidence (CEE). Baseline characteristics will be extracted for each sample review, including the recall, precision and F(1-3)-score of its reference standard Boolean search strategy. Seed articles will then be extracted from the background and methods sections of each sample review, together with their baseline characteristics, and automated citation searching will be conducted for different combinations of seed article and database (Semantic Scholar, OpenAlex). Seed article candidates will be ranked by recall, and the top 10 will be evaluated in every possible combination. The final performance of automated citation searching will then be compared against the reference standard Boolean strategy for each sample review. The association between performance and (i) automated citation search parameters, (ii) characteristics of the review question, and (iii) characteristics of the initial set of seed articles will be evaluated. Empirical guidance on the use of automated citation searching will then be generated.
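
The evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the citation graph is a small hand-made dictionary rather than responses from Semantic Scholar or OpenAlex, that "F(1-3)-score" denotes the F-beta score for beta = 1, 2 and 3, and that "all possible combinations" means every non-empty subset of the top-ranked seeds. All function names and paper IDs are hypothetical.

```python
from itertools import combinations


def precision_recall_fbeta(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F-beta) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    denominator = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0
    return precision, recall, fbeta


def citation_search(seeds, references):
    """One round of backward and forward citation searching on a toy graph.

    `references` maps each paper ID to the set of paper IDs it cites.
    """
    found = set()
    for seed in seeds:
        # Backward search: papers the seed cites.
        found |= references.get(seed, set())
        # Forward search: papers that cite the seed.
        found |= {paper for paper, refs in references.items() if seed in refs}
    return found


def evaluate_seed_combinations(seeds, references, relevant, top_k=10, beta=1.0):
    """Rank seeds by individual recall, then score every non-empty combination of the top_k."""
    ranked = sorted(
        seeds,
        key=lambda s: precision_recall_fbeta(citation_search([s], references), relevant)[1],
        reverse=True,
    )[:top_k]
    results = []
    for size in range(1, len(ranked) + 1):
        for combo in combinations(ranked, size):
            scores = precision_recall_fbeta(citation_search(combo, references), relevant, beta)
            results.append((combo, *scores))
    # Best-scoring combinations (by F-beta) first.
    return sorted(results, key=lambda row: row[3], reverse=True)


# Hypothetical toy data: "R*" stand for the included studies of one sample review.
references = {
    "A": {"R1", "R2"},
    "B": {"R2", "X1"},
    "R3": {"A"},
    "X2": {"B"},
}
relevant = {"R1", "R2", "R3"}

for combo, precision, recall, f1 in evaluate_seed_combinations(["B", "A"], references, relevant):
    print(combo, round(precision, 2), round(recall, 2), round(f1, 2))
```

Under this reading, 10 seed articles yield 2^10 - 1 = 1023 combinations per review and database, which is small enough to enumerate exhaustively. Passing `beta=2` or `beta=3` weights recall more heavily than precision.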