Metabolomics Data Processing
All raw files were first converted to CDF format with Xcalibur
(Thermo Fisher Scientific, San Jose, CA, USA) and then imported into
MATLAB 2020a (MathWorks, Natick, MA, USA) for batch data preprocessing
using an in-house script. Each sample’s metabolomic profile was
represented by averaging the mass spectra over 10 consecutive scans in the
corresponding time window. In total, 1518 peaks were initially extracted to
characterize the metabolomic profile. A data matrix was constructed with
each row representing one case and each column representing one peak
variable. To reduce the size of the data matrix, peaks with
more than 50% missing values among the first cohort of 254 samples
were discarded. No missing value imputation was performed, to avoid
introducing statistical artifacts in the univariate analysis.31
Then, the matrix underwent internal standard (IS) normalization, natural log
transformation, zero-centering, and unit variance scaling before univariate
analysis, multivariate analysis, and machine learning modelling were
applied. The data processing was performed at Fudan University and Stanford
University.
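
The preprocessing steps described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the in-house MATLAB script. Several details are assumptions made for illustration: missing values are encoded as None, all observed intensities are positive, the internal standard intensity of each sample is supplied as a separate list, the missing-value filter is applied to whatever matrix is passed in (the study applied it to the first cohort of 254 samples), and unit variance scaling uses the sample standard deviation (n - 1). All function names are hypothetical.

```python
import math
import statistics

MISSING = None  # assumption: missing peak intensities are encoded as None


def average_scans(scans):
    """Average several equally long spectra (lists of intensities) point by point."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def filter_peaks(matrix, max_missing=0.5):
    """Drop columns (peaks) whose fraction of missing values exceeds max_missing."""
    n_rows = len(matrix)
    kept = [
        j for j in range(len(matrix[0]))
        if sum(row[j] is MISSING for row in matrix) / n_rows <= max_missing
    ]
    return [[row[j] for j in kept] for row in matrix], kept


def is_normalize_and_log(matrix, is_intensity):
    """Divide each row by its internal standard intensity, then take the natural log.

    Assumes every observed intensity and every reference intensity is positive.
    """
    return [
        [MISSING if x is MISSING else math.log(x / ref) for x in row]
        for row, ref in zip(matrix, is_intensity)
    ]


def autoscale(matrix):
    """Zero-center each column and scale it to unit variance, skipping missing values."""
    scaled = [list(row) for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not MISSING]
        if len(observed) < 2:
            continue  # too few values to estimate a standard deviation
        mean = statistics.mean(observed)
        sd = statistics.stdev(observed)  # assumption: sample SD (n - 1)
        for row in scaled:
            if row[j] is not MISSING:
                row[j] = (row[j] - mean) / sd if sd > 0 else 0.0
    return scaled


def preprocess(matrix, is_intensity):
    """Run filtering, IS normalization, log transform, and autoscaling in order."""
    filtered, kept = filter_peaks(matrix)
    return autoscale(is_normalize_and_log(filtered, is_intensity)), kept
```

Here each row of the matrix is one sample and each column one peak, matching the layout described above. Missing entries are left in place at every step rather than imputed, so downstream univariate tests see only observed values.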