To meet the challenges of increasing food demand, environmental sustainability, and diverse breeding targets in plants and animals, molecular markers have been widely used to reveal polymorphism at the DNA level. These DNA markers have been applied to numerous uses, including identification of marker-trait associations, marker-assisted selection, genomic selection, and germplasm characterization. Over the past four decades, markers have evolved from interrogating hundreds of loci on hundreds of lines (e.g., restriction fragment length polymorphisms (RFLPs)~\cite{Botstein1980} or simple sequence repeats (SSRs)~\cite{Tautz1989}) to interrogating tens of thousands of loci on an almost unlimited number of lines (e.g., fluorescence hybridization-based microarrays or next-generation sequencing (NGS)-based genotyping). However, the most commonly used NGS-based methods, namely restriction-site associated DNA (RAD) sequencing~\cite{Miller2007}, genotyping-by-sequencing (GBS)~\cite{Elshire2011}, and specific-locus amplified fragment sequencing (SLAF-seq)~\cite{Sun2013}, suffer from high missing-data rates and under-calling of heterozygous sites in highly diverse and heterozygous species. In a previous case study in grapevine, amplicon sequencing (AmpSeq) solved the problems of high missing-data rates and heterozygote under-calling, but those AmpSeq markers interrogated hundreds, not thousands, of loci. A remaining problem for highly diverse and heterozygous species is marker transferability. For example, \textit{Eucalyptus} hardwood breeding involves species that diverged 2 to 5 million years ago (Mya)~\cite{Grattapaglia2011}, and grape breeding often involves species that diverged around 20~Mya~\cite{Wan_2013}. Therefore, universal trans-species marker panels that span the breeding diversity of each of these taxa are needed. In addition, such panels could make genotyping more efficient in non-model organisms that have well-studied close relatives.
The transferability problem arises mainly because most genetic variants are rare. Consequently, variants are usually specific to individuals, populations, or species, and markers targeting them work well only in the rare cases where the trait of interest is tightly linked to the variant and present only in those same individuals. Common variants have been used in marker development, but their transferability remains unsatisfactory. The transferability problem can be broken down further into three categories. (1) Genotyping failure: the marker fails to return data when applied to different germplasm, for example, when PCR primers fail to bind to their target sites or restriction digestion fails because of sequence variation at sites other than the targeted polymorphism. In a study that used three SNP chips (BovineSNP50, OvineSNP50, and EquineSNP50) to genotype species whose divergence from the species each chip was designed for ranged from less than 1 to 50~Mya, genotyping failure increased by an average of 1.5\% per million years of divergence~\cite{Miller2012}. (2) Lack of polymorphism in different germplasm: for example, of the 49,585 high-quality markers on the most widely used SNP genotyping array in maize, the Illumina MaizeSNP50 BeadChip, which were designed mostly for temperate germplasm, only 17\% to 33\% are polymorphic in populations derived from European maize inbred lines~\cite{Bauer2013}. This problem is more pronounced in highly diverse and heterozygous species: transferability drops to 2.3\% when markers are transferred between species within the genus \textit{Vitis} (Vezzulli 2008). The same issue has been reported in cattle~\cite{Michelizzi2010,wu2013genome}, where only 2\% of the markers in a panel designed for cattle are polymorphic in water buffalo (diverged $\sim$12~Mya). Using a series of species that share a common ancestor, Miller et al.\ found that the retention of polymorphism decays exponentially as divergence time increases, falling to only 5\% in a species that diverged 5~Mya~\cite{Miller2012}. (3) Variable genetic or physical position: in species or at loci where linkage disequilibrium decays rapidly and large structural variants occur, different markers may be linked to the same causal polymorphism in different germplasm. For example, flower sex in grapevine consistently maps to the same locus, but the most significant markers in QTL analyses vary, and the SSR, AmpSeq, and other markers used to track the locus are not transferable across \textit{Vitis} species (Yang 2016).
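As a purely illustrative reading of the exponential-decay pattern reported by Miller et al.~\cite{Miller2012}, suppose the fraction of polymorphisms retained after a divergence time $t$ (in millions of years, Myr) follows the single-parameter model $R(t) = e^{-\lambda t}$ with $R(0) = 1$. This functional form and the symbols $R$, $\lambda$, and $t_{1/2}$ are simplifying assumptions introduced here for illustration, not the model fitted in the original study. Anchoring the curve at the reported 5\% retention for a divergence of 5~Myr gives
\begin{equation}
\lambda = \frac{-\ln R(5)}{5\,\mathrm{Myr}} = \frac{\ln 20}{5\,\mathrm{Myr}} \approx 0.60\,\mathrm{Myr}^{-1},
\qquad
t_{1/2} = \frac{\ln 2}{\lambda} \approx 1.2\,\mathrm{Myr},
\end{equation}
so, under this assumption, about half of the polymorphisms in a marker panel would already be lost after roughly 1.2~Myr of divergence. Because the other transfer rates cited above come from different panels and taxa, this one-parameter sketch should not be used to extrapolate them.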
The aims of this study were (1) to develop a pipeline for designing transferable markers, using the core genome of the genus \textit{Vitis} as a case study, and (2) to test the transferability of these markers across a broad diversity of \textit{Vitis} breeding populations using a high-fidelity RNase H2-dependent amplicon sequencing (rhAmpSeq) platform that allows deeper multiplexing~\cite{Dobosy2011}. rhAmpSeq is a recent technology that offers the throughput of the original AmpSeq with improved specificity. To accomplish these objectives, we assembled six \textit{Vitis} genomes \textit{de novo}, obtained three more genomes from public databases, and constructed a core genome from syntenic genome alignments. Two thousand markers spanning all 19 chromosomes at an average spacing of 200 kilobases (kb) were synthesized and tested in four populations that represent the greatest genetic diversity in US grape breeding. Results from the four populations indicate that combining the core genome with the rhAmpSeq platform yields highly polymorphic data with a very low missing-data rate. This pipeline provides a trans-species genotyping method for highly diverse and heterozygous species.