
Standardised and reproducible phenotyping using distributed analytics and tools in the Data Analysis and Real World Interrogation Network (DARWIN EU®)
  • Daniel Prieto Alhambra,
  • Francesco Dernie,
  • George Corby,
  • Abigail Robinson,
  • James Bezer,
  • Rowan Parry,
  • Annika M. Jödicke,
  • Talita Duarte-Salles,
  • Peter Rijnbeek,
  • Katia Verhamme,
  • Alexandra Pacurariu,
  • Daniel Morales,
  • Luís Pinheiro,
  • Albert Prats-Uribe
Daniel Prieto Alhambra
University of Oxford Nuffield Department of Orthopaedics Rheumatology and Musculoskeletal Sciences

Corresponding Author:[email protected]

Francesco Dernie
University of Oxford Medical Sciences Division
George Corby
University of Oxford Medical Sciences Division
Abigail Robinson
University of Oxford Medical Sciences Division
James Bezer
University of Oxford Medical Sciences Division
Rowan Parry
Erasmus University Medical Centre
Annika M. Jödicke
University of Oxford Nuffield Department of Orthopaedics Rheumatology and Musculoskeletal Sciences
Talita Duarte-Salles
Erasmus University Medical Centre
Peter Rijnbeek
Erasmus University Medical Centre
Katia Verhamme
Erasmus University Medical Centre
Alexandra Pacurariu
European Medicines Agency
Daniel Morales
European Medicines Agency
Luís Pinheiro
European Medicines Agency
Albert Prats-Uribe
University of Oxford Nuffield Department of Orthopaedics Rheumatology and Musculoskeletal Sciences

Abstract

Purpose: Generating representative disease phenotypes is important for ensuring the reliability of observational study findings. This manuscript outlines a reproducible framework for reliable and traceable phenotype generation from real-world data, for use in the Data Analysis and Real World Interrogation Network (DARWIN EU®). We illustrate the framework by generating phenotypes for two complex diseases: pancreatic cancer and systemic lupus erythematosus (SLE).

Methods: Phenotyping follows a 14-step process based on a standard operating procedure co-created by the DARWIN EU® Coordination Centre in collaboration with the European Medicines Agency. Several bespoke R packages were used to generate and review codelists for the two phenotypes, using real-world data mapped to the OMOP Common Data Model.

Results: Phenotypes were generated for both pancreatic cancer and SLE, and the corresponding cohorts were created in the Clinical Practice Research Datalink (UK primary care records) and Pharmetrics (US health claims data). Diagnostic checks showed that incidence and prevalence in these cohorts were broadly similar to figures in previously published literature. Additionally, co-occurring symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge.

Conclusions: Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the generated cohorts. Wider use of structured phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.