Metabolomics Data Processing
All raw files were first converted to CDF format using Xcalibur (Thermo Fisher Scientific, San Jose, CA, USA) and then imported into MATLAB 2020a (MathWorks, Natick, MA, USA) for batch preprocessing with an in-house script. Each sample's metabolomic profile was obtained by averaging the mass spectra over 10 consecutive scans in the corresponding time window. A total of 1518 peaks were initially extracted to characterize the metabolomic profile. A data matrix was constructed with each row representing one case and each column representing one peak variable. To reduce the size of the data matrix, peaks with more than 50% missing values among the first cohort of 254 samples were discarded. No missing-value imputation was performed, to avoid artifactual statistical results in univariate analysis.31 The matrix then underwent IS normalization, natural log transformation, zero-centering, and unit-variance scaling before univariate analysis, multivariate analysis, and machine learning modelling were applied. Data processing was carried out at Fudan University and Stanford University.
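The filtering and scaling steps described above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the MATLAB script used in the study. Several details are assumptions: IS is taken to mean an internal standard whose intensity divides every peak in the same sample, missing values are encoded as None, scaling uses the sample standard deviation (n - 1), and the function names drop_sparse_peaks and preprocess are hypothetical.

```python
import math


def drop_sparse_peaks(matrix, max_missing_frac=0.5):
    """Return indices of peak columns to keep.

    A column is discarded when MORE than max_missing_frac of its
    entries are missing (encoded here as None, an assumption).
    """
    n_rows = len(matrix)
    keep = []
    for j in range(len(matrix[0])):
        n_missing = sum(1 for row in matrix if row[j] is None)
        if n_missing / n_rows <= max_missing_frac:
            keep.append(j)
    return keep


def preprocess(matrix, is_values):
    """IS-normalize, ln-transform, zero-center, and unit-variance scale.

    is_values[i] is the assumed internal standard intensity of sample i.
    Missing entries stay None (no imputation). Intensities are assumed
    to be strictly positive so that the logarithm is defined.
    """
    # IS normalization followed by the natural log, row by row.
    logged = [
        [None if v is None else math.log(v / is_val) for v in row]
        for row, is_val in zip(matrix, is_values)
    ]
    # Zero-centering and unit-variance scaling, column by column,
    # using only the observed (non-missing) values.
    for j in range(len(logged[0])):
        observed = [row[j] for row in logged if row[j] is not None]
        mean = sum(observed) / len(observed)
        if len(observed) > 1:
            var = sum((x - mean) ** 2 for x in observed) / (len(observed) - 1)
        else:
            var = 0.0
        sd = math.sqrt(var)
        for row in logged:
            if row[j] is not None:
                # A constant column is mapped to 0.0 to avoid dividing by zero.
                row[j] = (row[j] - mean) / sd if sd > 0 else 0.0
    return logged


# Toy matrix: 3 samples (rows) x 3 peaks (columns).
raw = [
    [1.0, None, 2.0],
    [math.e, None, 2.0],
    [math.e ** 2, 5.0, None],
]
kept = drop_sparse_peaks(raw)                      # column 1 is 2/3 missing
filtered = [[row[j] for j in kept] for row in raw]
scaled = preprocess(filtered, is_values=[1.0, 1.0, 1.0])
```

In this toy example the second peak is dropped because two of its three values are missing, and the remaining missing entry is carried through unchanged rather than imputed.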