A total of 47 accessions with 2-93× paired-end Illumina sequencing data were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). If an accession had more than 20× coverage, we randomly sampled reads down to 20× for variant calling. We also sequenced 8 accessions at 5-10× coverage with paired-end reads on the Illumina HiSeq 2500 platform. The insert size was 400 bp(?) and the read length was 2 × 150 bp. The raw sequencing data have been deposited in the SRA at NCBI under BioProject ID PRJNAXXXXXX. All SRA information, including project number, total base pairs, and accession name, is listed in supplemental table X. Variants were called using the Sentieon DNA Software Package (version, Golden Helix)\cite{Kendig396325} with default settings. The Sentieon package is an accelerated reimplementation of the Genome Analysis Toolkit (GATK) HaplotypeCaller and returns the same results as GATK 3.3. Principal component analysis (PCA) was conducted using the R/Bioconductor package SNPRelate. To avoid a strong influence of SNP clusters on the PCA, SNPs were filtered using the LD-based pruning algorithm implemented in SNPRelate with an LD threshold of 0.2. Based on the eigenvalues from the PCA, we randomly removed samples with very close relationships and kept 20 V. vinifera samples and 20 non-vinifera samples.
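
The idea behind LD-based pruning can be illustrated with a minimal Python sketch. This is a simplified greedy variant written for illustration only; it is not the SNPRelate implementation and was not part of the analysis. It assumes that SNP positions are sorted along a single chromosome, that genotypes are coded as alternate-allele counts (0, 1, 2, or `None` for missing), and that LD is measured as the absolute Pearson correlation |r|. The function names and the 500 kb window are hypothetical choices; only the 0.2 threshold comes from the text.

```python
from math import sqrt

LD_THRESHOLD = 0.2   # LD threshold stated in the text
WINDOW_BP = 500_000  # hypothetical window size, chosen for this sketch only


def pearson_r(x, y):
    """Pearson correlation of two genotype vectors, skipping missing calls (None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy / sqrt(sxx * syy)


def ld_prune(positions, genotypes, threshold=LD_THRESHOLD, window_bp=WINDOW_BP):
    """Greedy left-to-right pruning.

    A SNP is kept only if |r| <= threshold against every previously kept
    SNP that lies within window_bp upstream. Returns indices of kept SNPs.
    """
    kept = []
    for i, pos in enumerate(positions):
        keep = True
        # Walk back through kept SNPs, nearest first; stop once outside the window.
        for j in reversed(kept):
            if pos - positions[j] > window_bp:
                break
            if abs(pearson_r(genotypes[i], genotypes[j])) > threshold:
                keep = False
                break
        if keep:
            kept.append(i)
    return kept
```

In this sketch, a SNP that duplicates a nearby kept SNP is dropped, whereas the same genotype pattern located beyond the window is retained because it is never compared.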

Marker design pipeline

The VCF file generated from the genus-wide variant calling was loaded into R. For each non-gap alignable region in the core genome, we checked its length, diversity, and missing rate. Regions that were shorter than 200 bp, had diversity greater than 7% or less than 2%, or had an average missing rate greater than 50% were dropped. These steps were conducted in R using the Bioconductor (version 3.8) package VariantAnnotation \cite{Obenchain2014}. Candidate regions were then picked to ensure one marker per 200 kb. If no qualified candidate region could be found in a 1 Mbp window, we included the region with the highest coverage in the core-genome construction. For each 1 Mbp sliding window, we randomly included additional candidate regions where gene density was high. A total of 2500 candidate regions were sent to IDT for primer design and pooling compatibility testing. Primers could be designed for 99.6% of the regions, and 98.1% of these were pooling-compatible in a one-tube PCR. A total of 2000 rhPCR markers were synthesized by IDT.
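
The filtering and spacing rules above can be sketched in Python as follows. This is an illustrative sketch, not the R code used in the pipeline. The `Region` class, its field names, and both function names are hypothetical. The sketch further assumes that, within a 200 kb bin, the passing region with the highest core-genome coverage is chosen (the text does not specify the selection rule) and that the 1 Mbp windows are non-overlapping rather than sliding. The random enrichment of gene-dense windows is not covered.

```python
from dataclasses import dataclass

# Thresholds taken from the text
MIN_LENGTH_BP = 200
MIN_DIVERSITY = 0.02
MAX_DIVERSITY = 0.07
MAX_MISSING_RATE = 0.50
BIN_BP = 200_000       # one marker per 200 kb
WINDOW_BP = 1_000_000  # fallback window of 1 Mbp


@dataclass
class Region:
    """Hypothetical record for one non-gap alignable core-genome region."""
    chrom: str
    start: int           # 0-based start
    end: int             # exclusive end
    diversity: float     # fraction, e.g. 0.03 for 3%
    missing_rate: float  # average missing rate across samples
    coverage: float      # coverage in the core-genome construction

    @property
    def length(self):
        return self.end - self.start


def passes_filters(region):
    """True if the region survives the length, diversity, and missing-rate cuts."""
    return (region.length >= MIN_LENGTH_BP
            and MIN_DIVERSITY <= region.diversity <= MAX_DIVERSITY
            and region.missing_rate <= MAX_MISSING_RATE)


def pick_one_per_bin(regions, bin_bp=BIN_BP):
    """Keep at most one passing region per bin (assumption: highest coverage wins)."""
    best = {}
    for region in regions:
        if not passes_filters(region):
            continue
        key = (region.chrom, region.start // bin_bp)
        if key not in best or region.coverage > best[key].coverage:
            best[key] = region
    return [best[key] for key in sorted(best)]


def fallback_for_empty_windows(regions, picked, window_bp=WINDOW_BP):
    """For each window without a picked region, return its highest-coverage region."""
    covered = {(r.chrom, r.start // window_bp) for r in picked}
    best = {}
    for region in regions:
        key = (region.chrom, region.start // window_bp)
        if key in covered:
            continue
        if key not in best or region.coverage > best[key].coverage:
            best[key] = region
    return [best[key] for key in sorted(best)]
```

A candidate set would then be the union of `pick_one_per_bin(regions)` and `fallback_for_empty_windows(regions, picked)`, before any enrichment of gene-dense windows.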