Graham McVicker edited Testing against N masked.tex over 9 years ago

Commit id: 56ec9af28369cd18b94e8e789d24ffffc1dafe45

\subsection{Comparing WASP mapping to N-masked and personal genome mapping}

To evaluate the accuracy of allelic mapping using WASP, we simulated 100 bp reads from a lymphoblastoid cell line (GM12878) that has been genotyped by the 1000 Genomes and HapMap projects. We additionally imputed and phased genotypes for this cell line with IMPUTE2 \cite{Howie_Donnelly_Marchini_2009} using the 1000 Genomes Phase1 integrated version 3 reference panel.

To simulate reads, we identified each base where a read starting at that base would overlap a heterozygous site. We then generated reads from each haplotype, introducing the same sequencing errors into the reads from both haplotypes at a predefined rate. We considered the mapping of a read to be biased if the read from one haplotype mapped to the correct location but the read from the other haplotype did not.

We evaluated the performance of WASP compared to mapping to a personal or N-masked genome. To create an N-masked genome, we made a copy of the hg19 genome with Ns in place of known variants from the GM12878 cell line. We similarly created maternal and paternal copies of the GM12878 genome using the phased genotypes. We mapped the simulated reads to the original, N-masked, and personal versions of the hg19 genome with BWA \cite{Li_2009}, allowing up to 2 mismatches per read ($\verb|-n 2|$) and excluding gapped alignments ($\verb|-o 0|$). We then ran the reads mapped to the original genome through the WASP pipeline. Finally, we calculated the rate of biased mapping for the WASP, N-masked, and personal genome mapping approaches at several different sequencing error rates.
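
The simulation steps and the bias criterion described in this subsection can be illustrated with a minimal Python sketch. It is not the code used for the analysis. All function names are hypothetical, positions are assumed to be 0-based, and haplotypes are assumed to be plain strings of equal length. How an error is applied at a heterozygous site is also an assumption: here the same replacement base, chosen to differ from both haplotype bases, is written into both reads.

```python
import random


def overlapping_starts(het_positions, read_len, genome_len):
    """Start positions (0-based) of reads that overlap at least one het site."""
    starts = set()
    for pos in het_positions:
        lo = max(0, pos - read_len + 1)       # earliest start still covering pos
        hi = min(pos, genome_len - read_len)  # latest start that fits in the genome
        starts.update(range(lo, hi + 1))
    return sorted(starts)


def simulate_read_pair(hap1, hap2, start, read_len=100, error_rate=0.01, rng=None):
    """Generate one read per haplotype with identical errors at identical offsets.

    Assumption: an error replaces the base in BOTH reads with the same base,
    chosen to differ from both haplotype bases at that offset.
    """
    rng = rng or random.Random()
    read1 = list(hap1[start:start + read_len])
    read2 = list(hap2[start:start + read_len])
    for i in range(len(read1)):
        if rng.random() < error_rate:
            choices = [b for b in "ACGT" if b not in (read1[i], read2[i])]
            base = rng.choice(choices)
            read1[i] = base
            read2[i] = base
    return "".join(read1), "".join(read2)


def n_mask(seq, variant_positions):
    """Copy of seq with 'N' written at every known variant position."""
    masked = list(seq)
    for pos in variant_positions:
        masked[pos] = "N"
    return "".join(masked)


def is_biased(mapped_pos1, mapped_pos2, true_pos):
    """True if exactly one of the two haplotype reads mapped to the true location.

    Use None for a read that did not map.
    """
    return (mapped_pos1 == true_pos) != (mapped_pos2 == true_pos)
```

In this sketch the rate of biased mapping at a given error rate would be the fraction of simulated read pairs for which \verb|is_biased| returns true, computed separately for each mapping approach.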