A noninvasive prenatal test pipeline with a well-generalized machine-learning approach for accurate fetal trisomy detection using low-depth short sequence data
  • Bin Tu,
  • Qiongrong Huang,
  • Jianjiang Zhu,
  • Jianbo Lu,
  • Hong Qi,
  • Qiaojun Fang
Bin Tu
National Center for Nanoscience and Technology

Corresponding Author: [email protected]

Qiongrong Huang
National Center for Nanoscience and Technology
Jianjiang Zhu
Haidian District Maternal and Child Health Care Hospital
Jianbo Lu
National Research Institute for Family Planning
Hong Qi
Haidian District Maternal and Child Health Care Hospital
Qiaojun Fang
National Center for Nanoscience and Technology

Abstract

Objective: To determine whether a machine-learning prediction model can match the accuracy of current state-of-the-art trisomy detection methods on extremely low-depth sequencing data, and to verify its practical feasibility for clinical auxiliary screening of fetal trisomy.

Design: A public dataset of 144 samples was divided into training, validation, and test (testA) sets. A separate dataset of 270 sequenced samples was used for independent testing.

Setting: Samples came from Hong Kong, China; London, England; Amsterdam, the Netherlands; and Beijing, China.

Population: 414 maternal blood samples were analyzed in this study.

Methods: A machine-learning method applied to low-depth short-sequence data from maternal blood.

Main Outcome Measures: Fetal karyotype was determined by interventional prenatal diagnosis or from cord blood obtained after birth.

Results: We demonstrate the predictive ability of our method by testing it on data from different sources. The final best model achieved an AUC of 99.85% in predicting T21 using chr21 features, which are DNA fragment concentrations. The AUC was 99.50% and 97.70% in predicting T18 and T13, respectively, using all features from 24 chromosomes. PPV was 91.67%, 93.33%, and 83.33% in predicting T21, T18, and T13, respectively. NPV was 100%, 99.33%, and 98.70% for T21, T18, and T13, respectively. Our approach does not need to calculate fetal fraction (FF) and can handle samples from a wide range of gestational ages (GA), twin pregnancies, and fetal mosaicism. We achieved high PPV with low-depth sequencing and robust performance on an independent dataset.

Conclusion: Our approach achieves accuracy comparable to the current best methods. Our pipeline can be an important aid for the detection of fetal trisomy in clinical NIPT.
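
For readers less familiar with the metrics reported above, the following is a minimal sketch of how PPV, NPV, and AUC are conventionally defined. It is not the authors' pipeline code. The function names, the confusion-matrix counts, and the classifier scores are all hypothetical values chosen for illustration and are not taken from the study.

```python
from itertools import product


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of positive calls that are true trisomies."""
    return tp / (tp + fp)


def npv(tn: int, fn: int) -> float:
    """Negative predictive value: share of negative calls that are truly euploid."""
    return tn / (tn + fn)


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Area under the ROC curve as a pairwise ranking probability.

    Equals the chance that a randomly chosen affected sample scores higher
    than a randomly chosen unaffected sample; ties count as one half.
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(scores_pos, scores_neg)
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts (NOT the study's data): 100 samples, 10 affected.
tp, fp, tn, fn = 9, 1, 89, 1
example_ppv = ppv(tp, fp)
example_npv = npv(tn, fn)

# Hypothetical classifier scores for affected and unaffected samples.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])

print(f"PPV={example_ppv:.4f} NPV={example_npv:.4f} AUC={example_auc:.4f}")
```

Because PPV and NPV depend on how common the condition is in the tested cohort, values computed on one dataset do not transfer directly to a population with a different trisomy prevalence. AUC does not depend on prevalence in this way.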