Material and Methods

Subjects

All subjects in whole-exome sequencing, subsequent validation, and whole blood transcriptome sequencing were obtained from two large school-based cohorts, the Beijing Child and Adolescent Metabolic Syndrome Study (BCAMS), and the China Child and Adolescent Cardiovascular Health Study (CCACH)(Liu et al., 2017; Shan et al., 2010). Adipose tissues were obtained from Plastic Surgery Hospital, Chinese Academy of medical Sciences. Individuals with hormone treatment, secondary obesity due to endocrinopathy and serious intercurrent illness were excluded. All cases and controls were unrelated Han Chinese in Beijing, China. Replication and validation for promising variants were performed on 6,334 (2,480 obese and 3,854 normal weight) children. Written informed consent were obtained from the parents or guardians of all subjects. The study was approved by the ethics committee and institutional review board of the Capital Institute of Paediatrics (IEC-C-008-A08-V.05.1).

Phenotype

Normal weight and common obesity were defined by age- and sex-specific BMI cutoff points recommended by the International Obesity Task Force (IOTF)(Cole, Flegal, Nicholls, & Jackson, 2007). Adiposity was defined as body fat mass percentage greater than or equal to 20% for boys and 25% for girls aged ≤ 14 years and 30% for girls aged > 14 years(Chen, Lu, & Department of Disease Control Ministry of Health, 2004). Weight-to-Height ratio (WHtR) was calculated to define abdominal obesity using a boundary value of 0.5(Browning, Hsieh, & Ashwell, 2010). Dyslipidemia was defined by serum total cholesterol (TC) ≥ 5.20 mmol/L, or triacylglycerol (TG) ≥ 1.70 mmol/L, or high-density lipoprotein cholesterol (HDL-C) ≤ 1.04 mmol/L, or low-density lipoprotein cholesterol (LDL-C) ≥ 3.37 mmol/L.

Whole-exome sequencing

Genomic DNA was isolated from peripheral blood leukocytes using the QIAamp DNA blood kit (Qiagen, Germany). Each DNA library was prepared from 5μg of qualified genomic DNA which was sheared to 180-200 base pairs. The SureSelect Human All Exon V4+UTR Kit (Agilent Technologies, USA) was used to capture 71 Mbps of exons and untranslated regions (UTRs) according to the manufacturer’s protocol. Paired-end sequencing (2×100 bp) was carried out with the HiSeq 2500 Sequencing System (Illumina Inc., USA) at Berry Genomics, Co., Ltd.

Quality control and variant calling

Trimmed reads were aligned to the human reference genome (hg19) using the Burrows-Wheeler Alignment tool (v0.7.17)(H. Li & Durbin, 2009), with mapping rates above 99.4%, and > 90% of the target regions were completely covered with at least 10 × depth. Samples were required to reach at least 20 × coverage over 70% of the exome target. Duplicates were removed by Picard (v2.0.1).
Sequence variation, including SNVs and insertion/deletions, were detected using Genome Analysis Toolkit (v4.1.3.0)(McKenna et al., 2010). We excluded variants with a call rate of < 95% or Hardy-Weinberg equilibrium (HWE) P < 1.0E-6. Samples with a call rate lower than 95% or a heterozygosity rate more than three standard deviations away from the mean were removed. Samples were clear of gender-mismatches or relativeness.
Variants for validation were achieved through several filtering and comparison steps (Figure 1 ). InterVar (v2.0.1) was used for annotation(Q. Li & Wang, 2017). The highest impact effect was taken for variants that have different annotations due to multiple transcripts. SIFT, PolyPhen-2, MutationTaster, and FATHMM were used to predict the possible impact of coding variants on protein function(Yang & Wang, 2015).

Association analysis

Association test was performed with PLINK (v1.90)(Chang et al., 2015). Subjects were characterized according to their BMI and related anthropometrics. Association analysis was performed using linear or logistic regression assuming an additive model with sex and age as covariates. To weaken the effect of population stratification, we also included geographic information as a covariate because the maximal known stratification for Chinese is northern and southern ancestry. To prove the associated SNVs were independent, we performed a conditional analysis on each one by adjusting other significant associated SNVs or previous reported SNVs. Correction for multiple tests was performed using the PLINK adjust function, with genomic control-corrected Pvalues being calculated based on the genotypes of all variants in the final analysis. The threshold for genome-wide significance was set at aP value < 5.0 × 10-8.

Genotyping and validation

Candidate SNVs showing nominal association (P < 0.05) with obesity in the first stage were genotyped in an additional large case-control cohort consisting of 2,480 obese children and 3,854 controls. Genotyping was performed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (SEQUENOM) at Beijing Compass Biotechnology Co., Ltd. Multiplexed assays were designed using MassARRAY Assay Design v3.1. Allele-specific base extension was performed according to recommendations from Agena Bioscience. Sanger sequencing was performed to validate the candidate variants identified by WES and replication analysis.

Transcriptomic sequencing

Whole blood were obtained from 58 cases and 43 controls. Whole blood RNA were extracted using PAXgene® Blood RNA Kit according to its handbook protocols. Frozen subcutaneous adipose tissue were obtained from 11 children during plastic surgeries. RNA from adipose tissues were extracted using mirVana™ miRNA Isolation Kit. Total cDNA library was constructed using NEB Next® Ultra™ RNA Library Prep Kit and paired-end sequencing was performed on Illumina novaseq6000 platform. Reads alignment, transcripts including novel splice variants assembly, abundance computation of these transcripts, and differentially expressed genes and transcripts compare were analyzed with a comprehensive software package of HISAT(Kim, Langmead, & Salzberg, 2015), StringTie(Pertea et al., 2015), and DESeq2(Love, Huber, & Anders, 2014). We focus on the expression pattern of genes, such asSULT1A2 (NC_000016.10), MAP3K21 (NC_000001.11), etc. which potentially affected by rs1059491 and rs189326455.

Functional analysis

We used MEME suite to predict the effect of SNPs on the binding affinity of transcription related factors. To evaluate the structural and conformational change of rs1059491 (SULT1A2: p.N235T), we used PSIPRED for secondary structure prediction. The differential expression data of eQTL rs1059491 in adipose tissues and blood were obtained from both transcriptome sequencing and GTEx website (https://gtexportal.org/). Visualization was achieved by R (www.r-project.org/).