David Andrew Eccles added subsection_Linkage_Refinement_label_sec__.tex
almost 9 years ago
Commit id: 8d4e95079e56e8769affe781e4c2f0bf4f244d15
diff --git a/subsection_Linkage_Refinement_label_sec__.tex b/subsection_Linkage_Refinement_label_sec__.tex
new file mode 100644
index 0000000..175e171
--- /dev/null
+++ b/subsection_Linkage_Refinement_label_sec__.tex
...
\subsection{Linkage Refinement}
\label{sec:meth-summ-refinement}
Linked SNPs were removed from the bootstrap-consistent SNP set to
reduce redundancy in the association signal carried by the set.
Markers were ordered by mean rank, and any SNP that was linked
($r^2 > 0.1$) with a higher-ranked SNP was removed from the set,
leaving an \emph{unlinked set} of 34 SNPs.
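
For reference, a sketch of the usual definition of $r^2$ for two
biallelic SNPs is given here. The allele frequencies $p_A$ and $p_B$
and the haplotype frequency $p_{AB}$ are notation introduced for
illustration, and the estimator actually applied to the genotype data
is not specified in this section:
\[
  D = p_{AB} - p_A p_B, \qquad
  r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}.
\]
Under this definition $r^2$ ranges from 0 (linkage equilibrium) to 1
(each marker perfectly predicts the other), and $r^2 = 0.1$
corresponds to a correlation of about 0.32 between the alleles at the
two markers.
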
Markers within a signature marker set should be unlinked, so it is
advisable to calculate a linkage statistic such as $D^\prime$ or $r^2$
during the discovery phase of the analysis, and to remove the less
informative marker of each linked high-association pair. This step is
carried out after the bootstrap sub-sampling process to reduce the
number of pairwise calculations required for linkage analysis:
pairwise calculations for 500 markers require 124,750 linkage
comparisons,\footnote{$124{,}750 = (500^2 - 500) / 2$} while pairwise
calculations on 500,000 markers would require around
$1.25\times10^{11}$ comparisons.
% $(x^2 - x) / 2$
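
As a worked check of these counts (the symbol $n$ for the number of
markers is introduced here for illustration), the number of unordered
marker pairs is
\[
  \frac{n(n-1)}{2} = \frac{n^2 - n}{2},
\]
so $n = 500$ gives $124{,}750$ comparisons, and $n = 500{,}000$ gives
$124{,}999{,}750{,}000 \approx 1.25\times10^{11}$ comparisons. A
thousand-fold increase in markers therefore produces roughly a
million-fold increase in linkage comparisons.
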
\subsection{Set Size Refinement}
\label{sec:sig-thy-eval-effect-mark}
\begin{figure}
\centering
\includegraphics[width=0.95\textwidth]{figures/AUC_T1D_1-34_r2filtered.pdf}
\caption[Marker Refinement Plot]{A marker refinement plot, showing
the effectiveness score (AUC) for increasing numbers of SNPs in
the discovery group. The highest AUC value (0.835 for 5 SNPs) is
circled in red.}
\label{fig:sig-thy-marker-refinement}
\end{figure}
The optimal marker set size was identified using an Area Under the
Curve (AUC) test on the Q values generated by \textsl{structure} (10,000
bootstraps, and 100,000 total runs), looking for marker sets with large
differences in mean Q value between the two groups (see
Figure~\ref{fig:sig-thy-marker-refinement}). Increasing numbers of
markers were selected from the unlinked SNP set in order of the mean
rank identified during the previous (bootstrap sub-sampling) stage.
Each marker set was evaluated by running \textsl{structure} and then
calculating an AUC from the Q values reported by the program.
The \textsl{structure} program attempts to cluster pooled individuals
into two ``populations'',\footnote{The \textsl{structure} program is
  designed for \emph{population} analysis, but is used here for
  \emph{group} analysis.} and outputs values (Q values) that represent
how genetically similar an individual is to a particular group. The
Q values produced by \textsl{structure} are continuous in the range
from 0 to 1 inclusive, and are treated here as an estimate of the
probability that an individual has a particular trait.
Analysis of Q values was used to determine false positive and true
positive rates for given Q-value cutoffs (see
Figure~\ref{fig:t1d-validation-top5-ROC-analysis}). The true positive
rate was calculated as the proportion of T1D cases with Q below the
cutoff value, and the false positive rate was calculated in the same
way for NBS controls. Plotting the true positive rate against the
false positive rate over all cutoffs gives a curve, and the area under
this curve can be used as an indication of the effectiveness of a
quantitative test. An AUC of 1 indicates a perfect test (no
misclassification), while an AUC of 0.5 indicates a test that cannot
distinguish between groups.
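
The rates described above can be written out as a sketch. The cutoff
$c$, the per-individual values $Q_i$, and the group sizes
$N_{\mathrm{T1D}}$ and $N_{\mathrm{NBS}}$ are notation introduced here
for illustration rather than symbols used in the analysis itself:
\[
  \mathrm{TPR}(c) = \frac{\#\{i \in \mathrm{T1D} : Q_i < c\}}{N_{\mathrm{T1D}}},
  \qquad
  \mathrm{FPR}(c) = \frac{\#\{j \in \mathrm{NBS} : Q_j < c\}}{N_{\mathrm{NBS}}},
\]
\[
  \mathrm{AUC} = \int_0^1 \mathrm{TPR} \; d(\mathrm{FPR}).
\]
Under the usual empirical construction of this curve, the AUC is equal
to the probability that a randomly chosen T1D case has a lower Q value
than a randomly chosen NBS control, with ties counted as one half.
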
The greatest difference between cases and controls was observed when
the top 5 SNPs were selected, producing an AUC of 0.8449. This
\emph{signature set} of 5 SNPs was considered to be the most
appropriate T1D-informative set.
\subsection{Validation of Final 5 SNP Set}
\label{sec:meth-summ-validation}
The signature set of 5 SNPs (see Table~\ref{tab:top5-snps-t1d}) was
finally tested on the validation group (982 T1D cases, 729 NBS
controls) using \textsl{structure}, followed by an AUC analysis of the Q
values. There is a small overlap between some T1D cases and some NBS
controls (Figure~\ref{fig:t1d-validation-structure-top5}), but most
T1D cases cluster together, and are separate from the cluster of NBS
controls.
\begin{table}
\centering
\begin{tabular}{ccccc}
\textbf{Marker} & \textbf{Chromosome} &
\textbf{Location (bp)} & \textbf{$\chi^2$} &\textbf{Mean Rank}\\\hline
rs9273363 & 6 & 32734250 & 485 & 1\\
rs3957146 & 6 & 32789508 & 317 & 2.2\\
rs3135377 & 6 & 32493377 & 264 & 4.3\\
rs7431934 & 3 & 40268801 & 199 & 13.7\\
rs1046089 & 6 & 31710946 & 108 & 37.9\\
\hline
\end{tabular}
\caption[T1D SNP Location table]{Location
information for the top 5 SNPs discovered in a bootstrap
sub-sampled GWAS for T1D associations, after removing linked SNPs,
and choosing the set with the highest AUC value. Mean rank
reported in this table is based on the marker rank for 100
bootstrap sub-samples. Four of the five markers lie within a
2\,Mb region of chromosome 6.}
\label{tab:top5-snps-t1d}
\end{table}