Discovering potential key features of genome wide profiling data using Decision Variable Analysis

Jie Xie; Feng Xie; Cheng Li; Weike Lu; Zhen Yang; Hanling Zhang

doi:10.22541/au.171023413.39007505/v1

Abstract

Identifying key features related to the phenotype of interest (POI) from high-dimensional data is one of the most important problems in studies of omics data, such as transcriptome or DNA methylome data. However, these data are commonly contaminated by unwanted variation arising from platforms, batches or other types of biological factors. The data can therefore be regarded as a combination of variation derived from the POI and variation derived from confounding factors. Failing to take these factors into account can lead to spurious associations and missed signals. Based on this idea, we propose a novel feature selection method called Decision Variable Analysis (DVA) to extract the important features related to the POI from data containing potential confounding factors. Applying this method to simulated data and to real data, we found that DVA performed better in identifying confounding factors than other methods, including linear regression and surrogate variable analysis. In particular, our method is more efficient for data in which the number of features is much larger than the number of samples. We show improvements of DVA on such high-dimensional datasets across different platforms. The results indicate that DVA is an effective method for dissecting sources of variation in omics data with potential confounding factors. DVA is freely available at https://github.com/xvon1/DVA.
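
The confounding problem described in the abstract can be illustrated with a small simulation. The sketch below is not DVA and does not use the DVA package; it only shows the linear regression baseline on toy data, under the assumption of a single known batch variable that is correlated with the POI. All names (`simulate`, `poi_coefficients`), the effect sizes and the sample layout are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n_features=200, n_poi=10, n_batch=20, seed=0):
    """Toy data (features x samples) with fewer samples than features."""
    rng = np.random.default_rng(seed)
    # 40 samples: 20 controls (0) followed by 20 cases (1).
    poi = np.repeat([0.0, 1.0], 20)
    # Batch mostly follows the POI; four samples per group are swapped.
    batch = poi.copy()
    batch[:4] = 1.0
    batch[20:24] = 0.0
    data = rng.normal(size=(n_features, poi.size))
    data[:n_poi] += 2.0 * poi                      # features driven by the POI
    data[n_poi:n_poi + n_batch] += 2.0 * batch     # features driven by batch only
    return data, poi, batch


def poi_coefficients(data, poi, covariates=None):
    """Per-feature least-squares coefficient of the POI term."""
    columns = [np.ones(poi.size), poi]
    if covariates is not None:
        columns.extend(np.atleast_2d(covariates))
    design = np.column_stack(columns)
    # One least-squares fit per feature, solved jointly.
    coef, *_ = np.linalg.lstsq(design, data.T, rcond=None)
    return coef[1]


data, poi, batch = simulate()
naive = poi_coefficients(data, poi)             # batch ignored
adjusted = poi_coefficients(data, poi, batch)   # batch included as a covariate

batch_only = slice(10, 30)
print("mean POI coefficient on batch-only features")
print("  ignoring batch:     ", naive[batch_only].mean())
print("  adjusting for batch:", adjusted[batch_only].mean())
```

In this toy setting, features that depend only on the batch appear associated with the POI when the batch is ignored, and the association largely disappears once the batch is included in the model. The adjustment is possible here only because the confounder is known and measured, which is generally not the case for the unwanted variation the abstract refers to.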