Deep coverage whole genome sequencing samples and
collection of B-allele frequencies from X chromosome:
All high-coverage whole genome sequencing samples (10 males and 15
females) from the 1000 Genomes Project phase 3 were used to collect B-allele
frequencies (BAF) (Auton et al., 2015). GATK UnifiedGenotyper and
HaplotypeCaller were used to collect BAF data from high-quality SNPs
from the X chromosome non-pseudoautosomal region (Van der Auwera et al.,
2013). The initial set of SNPs was filtered based on depth, strand
balance and position bias using GATK FilterVariants. The final sets of
variants were evaluated based on their BAF values and grouped into
homozygous reference (BAF values 0.0-0.2), heterozygous (BAF values
0.2-0.8) and homozygous alternative (BAF values 0.8-1.0) categories.
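
The grouping step above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names are hypothetical, and the BAF is assumed here to be the fraction of reads supporting the alternate allele, which the text does not state explicitly. Because the stated ranges share the boundary values 0.2 and 0.8, the sketch assigns both boundaries to the heterozygous group; that choice is also an assumption.

```python
def baf_from_depths(ref_depth: int, alt_depth: int) -> float:
    """Return the B-allele frequency for one SNP.

    Assumption: BAF = alt reads / (ref reads + alt reads).
    """
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_depth / total


def classify_baf(baf: float) -> str:
    """Assign a BAF value to one of the three genotype groups.

    Assumption: the shared boundary values 0.2 and 0.8 are treated
    as heterozygous, since the stated ranges overlap at those points.
    """
    if not 0.0 <= baf <= 1.0:
        raise ValueError("BAF must lie between 0 and 1")
    if baf < 0.2:
        return "homozygous reference"
    if baf <= 0.8:
        return "heterozygous"
    return "homozygous alternative"


if __name__ == "__main__":
    # Illustrative read depths only, not data from the study.
    for ref, alt in [(30, 0), (16, 14), (1, 29)]:
        baf = baf_from_depths(ref, alt)
        print(ref, alt, round(baf, 3), classify_baf(baf))
```

In the non-pseudoautosomal region of the X chromosome, male samples carry a single copy, so SNPs that fall into the heterozygous group in males are of particular interest when evaluating the variant set.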