Graham McVicker edited Testing against N masked.tex
over 9 years ago
Commit id: 56ec9af28369cd18b94e8e789d24ffffc1dafe45
\subsection{Comparing WASP mapping to N-masked and personal genome mapping}
To evaluate the accuracy of allelic mapping with WASP, we simulated 100 bp reads from the genome of a lymphoblastoid cell line (GM12878) that has been genotyped by the 1000 Genomes and HapMap
projects. We additionally imputed and phased
genotypes for this cell line with IMPUTE2
\cite{Howie_Donnelly_Marchini_2009} using the 1000 Genomes Phase 1 integrated version 3 reference panel.
To simulate reads, we identified each base where a read
starting at that base would overlap a heterozygous site. We
then generated a read from each haplotype at these positions, introducing identical sequencing errors into both reads at a predefined rate, so that any difference in how the two reads map reflects the alleles they carry rather than sequencing errors.
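The read simulation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this analysis: the function names \texttt{overlapping\_starts} and \texttt{simulate\_pairs} are hypothetical, the two haplotypes are assumed to be equal-length strings that differ only at single-nucleotide heterozygous sites (no indels), and coordinates are 0-based. Each sequencing error is drawn once and written into both haplotype reads.

```python
import random
from typing import Iterator, List, Sequence, Tuple

BASES = "ACGT"


def overlapping_starts(het_sites: Sequence[int], read_len: int, genome_len: int) -> List[int]:
    """Return every 0-based start whose read of length read_len covers at
    least one heterozygous site and fits inside the genome."""
    starts = set()
    for h in het_sites:
        # A read starting at s covers h when s <= h < s + read_len.
        lo = max(0, h - read_len + 1)
        hi = min(h, genome_len - read_len)
        starts.update(range(lo, hi + 1))
    return sorted(starts)


def simulate_pairs(
    hap1: str,
    hap2: str,
    het_sites: Sequence[int],
    read_len: int,
    error_rate: float,
    seed: int = 0,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (start, read_from_hap1, read_from_hap2) for every start position
    that overlaps a heterozygous site, with identical errors in both reads.

    Assumption (hypothetical helper): haplotypes differ only by SNVs."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have equal length (SNVs only)")
    rng = random.Random(seed)  # seeded for reproducible simulations
    for s in overlapping_starts(het_sites, read_len, len(hap1)):
        r1 = list(hap1[s:s + read_len])
        r2 = list(hap2[s:s + read_len])
        for i in range(read_len):
            if rng.random() < error_rate:
                # Draw one wrong base and use it in both reads. Note: an error
                # that lands on a heterozygous site overwrites both alleles.
                wrong = rng.choice([b for b in BASES if b != r1[i]])
                r1[i] = wrong
                r2[i] = wrong
        yield s, "".join(r1), "".join(r2)
```

For example, \texttt{simulate\_pairs("AAAAACAAAA", "AAAAAGAAAA", [5], 4, 0.01)} yields one read pair for each of the four start positions whose 4 bp read covers position 5.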
We considered the mapping of a read to be biased if the read from one haplotype mapped
to the correct location but the read from the other haplotype did
not.
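Stated as code, this bias criterion is an exclusive-or over the two haplotype reads. In the hypothetical helper \texttt{is\_biased} below, a mapped location is assumed to be a (chromosome, 0-based start) pair, strand is ignored, and \texttt{None} marks an unmapped read. A pair in which both reads map to the wrong place is not counted as biased, which follows from the definition.

```python
from typing import Optional, Tuple

# A mapped location as (chromosome, 0-based start); None means unmapped.
Location = Optional[Tuple[str, int]]


def is_biased(true_loc: Tuple[str, int], loc_hap1: Location, loc_hap2: Location) -> bool:
    """Return True when exactly one of the two haplotype reads maps to the
    location it was simulated from."""
    correct1 = loc_hap1 == true_loc
    correct2 = loc_hap2 == true_loc
    # Both correct or both incorrect is not allele-specific, so not biased.
    return correct1 != correct2
```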
We compared the performance of WASP with that of mapping to a personal or an N-masked genome. To create the N-masked genome, we made a copy of the hg19 genome with Ns in place of the known variants from the GM12878 cell line. Similarly, we created maternal and paternal versions of the hg19 genome for GM12878 using the phased genotypes.
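A minimal sketch of the two genome constructions, assuming a single chromosome held as a Python string, 0-based positions, and SNVs only (indels would shift coordinates and are not handled here). The helpers \texttt{n\_mask} and \texttt{personal\_haplotypes} are hypothetical; a real pipeline would read the hg19 FASTA and the phased genotypes from files rather than from in-memory objects.

```python
from typing import Dict, Iterable, Tuple


def n_mask(reference: str, variant_positions: Iterable[int]) -> str:
    """Copy the reference with 'N' at every known variant position (0-based)."""
    seq = list(reference)
    for pos in variant_positions:
        seq[pos] = "N"
    return "".join(seq)


def personal_haplotypes(
    reference: str, phased: Dict[int, Tuple[str, str]]
) -> Tuple[str, str]:
    """Build maternal and paternal copies of the reference from phased SNVs.

    phased maps a 0-based position to (maternal_allele, paternal_allele).
    Only single-base alleles are handled, so coordinates are unchanged.
    """
    maternal = list(reference)
    paternal = list(reference)
    for pos, (mat_allele, pat_allele) in phased.items():
        if len(mat_allele) != 1 or len(pat_allele) != 1:
            raise ValueError(f"non-SNV allele at position {pos}")
        maternal[pos] = mat_allele
        paternal[pos] = pat_allele
    return "".join(maternal), "".join(paternal)
```

Passing the keys of \texttt{phased} to \texttt{n\_mask} masks the same sites that the personal genomes modify, which keeps the two approaches directly comparable.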
We mapped the simulated reads to the original,
N-masked, and personal versions of the hg19 genome with BWA \cite{Li_2009}, allowing up to 2 mismatches per read (\texttt{-n 2}) and excluding gapped alignments (\texttt{-o 0}). We then ran the reads mapped to the original genome through the WASP pipeline. Finally, we calculated the rate of
biased mapping for the WASP, N-masked, and personal genome approaches at several different sequencing error rates.
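The final comparison reduces to a fraction of biased read pairs for each approach and sequencing error rate. The sketch below assumes one record per simulated read pair carrying a boolean flag set according to the bias definition above, and it uses every recorded pair as the denominator; the function name \texttt{bias\_rates} is hypothetical, and how reads removed by the WASP pipeline should be counted is left to the caller.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bias_rates(
    records: Iterable[Tuple[str, float, bool]]
) -> Dict[Tuple[str, float], float]:
    """Return the fraction of biased read pairs for each (method, error_rate).

    Each record is (method, sequencing_error_rate, biased) for one simulated
    read pair; the denominator is every pair seen for that key.
    """
    totals: Dict[Tuple[str, float], int] = defaultdict(int)
    biased: Dict[Tuple[str, float], int] = defaultdict(int)
    for method, error_rate, is_biased in records:
        key = (method, error_rate)
        totals[key] += 1
        biased[key] += int(is_biased)
    return {key: biased[key] / totals[key] for key in totals}
```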