Deep coverage whole genome sequencing samples and collection of B-allele frequencies from the X chromosome:
All high-coverage whole genome sequencing samples (10 males and 15 females) from phase 3 of the 1000 Genomes Project were used to collect B-allele frequencies (BAF) (Auton et al., 2015). GATK UnifiedGenotyper and HaplotypeCaller were used to collect BAF data from high-quality SNPs in the non-pseudoautosomal region of the X chromosome (Van der Auwera et al., 2013). The initial set of SNPs was filtered on depth, strand balance and position bias using GATK FilterVariants. The final sets of variants were evaluated on their BAF values and grouped into homozygous reference (BAF values 0.0-0.2), heterozygous (BAF values 0.2-0.8) and homozygous alternative (BAF values 0.8-1.0) categories.
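The BAF grouping described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names (`baf`, `classify_baf`), the category labels and the example values are hypothetical. Two assumptions are made: BAF is taken as the fraction of reads supporting the alternative allele, and, because the stated ranges share the values 0.2 and 0.8, both boundary values are assigned to the heterozygous category.

```python
from collections import Counter

# Thresholds taken from the text; the handling of the exact boundary
# values 0.2 and 0.8 is an assumption (both are treated as heterozygous).
HOM_REF_MAX = 0.2
HOM_ALT_MIN = 0.8


def baf(ref_depth: int, alt_depth: int) -> float:
    """B-allele frequency, assumed here to be alt reads / (ref + alt reads)."""
    total = ref_depth + alt_depth
    if ref_depth < 0 or alt_depth < 0 or total == 0:
        raise ValueError("read depths must be non-negative with a non-zero total")
    return alt_depth / total


def classify_baf(value: float) -> str:
    """Assign a BAF value to one of the three genotype categories."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("BAF must lie in [0, 1]")
    if value < HOM_REF_MAX:
        return "hom_ref"
    if value <= HOM_ALT_MIN:
        return "het"
    return "hom_alt"


def summarize(values):
    """Count how many BAF values fall into each category."""
    return Counter(classify_baf(v) for v in values)


if __name__ == "__main__":
    # Made-up (ref_depth, alt_depth) pairs for illustration only.
    example_depths = [(30, 0), (16, 14), (1, 29), (15, 15)]
    example_bafs = [baf(r, a) for r, a in example_depths]
    print(summarize(example_bafs))
```

A summary of this kind gives a quick check on each sample: for a male, who carries a single X chromosome, non-pseudoautosomal SNPs would be expected to fall almost entirely into the two homozygous categories.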