Lynch syndrome (LS) is an autosomal dominant inherited disease with a prevalence of 1–3% among unselected colorectal or endometrial cancer patients (de la Chapelle, 2005). It is characterized by increased risks of early-onset tumor development, especially colorectal cancer (CRC), endometrial cancer, ovarian cancer, and other extracolonic tumors such as hepatobiliary, urothelial, and brain or central nervous system tumors, as well as sebaceous tumors (Cohen & Leininger, 2014). Lynch syndrome is caused by germline mutations in one of the mismatch repair (MMR) genes or in EPCAM (Da Silva, Wernhoff, Dominguez-Barrera, & Dominguez-Valentin, 2016). Tumors from LS patients normally exhibit high microsatellite instability (MSI-H) and loss of expression of one or more MMR proteins (Boland, Koi, Chang, & Carethers, 2008). Substitutions, small insertions/deletions, large deletions/duplications, and inversions (Liu et al., 2016; Mork et al., 2017; Rhees, Arnold, & Boland, 2014), as well as retrotransposon insertions, have been reported in the MMR genes as causes of LS (Peltomaki & Vasen, 2004; van der Klift, Tops, Hes, Devilee, & Wijnen, 2012).
Retrotransposons are DNA sequences that proliferate in the genome using an RNA intermediate and a ‘copy-and-paste’ retrotransposition mechanism. Retrotransposons can be subdivided into two groups distinguished by the presence or absence of long terminal repeats (LTRs). Retrotransposons without LTRs include Long Interspersed Elements 1 (LINE-1, L1), Alu elements (Short Interspersed Elements, SINE) and SVA (SINE-VNTR-Alu) elements (Cordaux & Batzer, 2009; Rebollo, Romanish, & Mager, 2012). Approximately 124 retrotransposon insertions associated with human disease have been previously reported (Hancks & Kazazian, 2016).
To date, ten gross insertions larger than 20 base pairs (bp) have been recorded in MMR genes in the Human Gene Mutation Database (HGMD Professional 2019.4). Five of these large insertions involved retrotransposons: four were Alu insertions, with two in each of the MLH1 (Leclerc et al., 2018; Solassol et al., 2019) and MSH2 genes (Kloor et al., 2004; Marshall, Isidro, & Boavida, 1996), and one was an SVA insertion in PMS2 (van der Klift et al., 2012). Up to now, no SVA insertion has been reported in MSH2. In this study, we report an insertion of an SVA element at c.1972 in exon 12 of MSH2 as a novel cause of Lynch syndrome.
Our proband is a 49-year-old man who was diagnosed with colon cancer at age 43. A four-generation pedigree (Fig. 1) indicated that other family members were affected with early-onset colorectal cancer (CRC) under age 50. The proband’s mother was diagnosed with metachronous endometrial cancer and CRC, and one maternal aunt was diagnosed with CRC at age 50. One of the proband’s brothers had colon polyps and was subsequently diagnosed with a proximal colon cancer at age 54, which was MSH2 and MSH6 deficient on immunohistochemical (IHC) staining. Another brother was diagnosed with a screen-detected colon cancer at age 38, which also demonstrated loss of expression of the MSH2 and MSH6 proteins by IHC. However, no mutation was identified through next-generation sequencing (NGS), and the 10 Mb inversion in MSH2 was not detected (Rhees et al., 2014).
Another maternal aunt of the index case was diagnosed with CRC at age 35. Her daughter was diagnosed with endometrial cancer at age 38, which demonstrated MSI-H and loss of expression of MSH2 and MSH6 proteins by IHC. This family member initially underwent MLH1 and MSH2 sequencing and large rearrangement analysis in 2007 at a reference laboratory, which identified an MSH2 intron 12 rearrangement that was classified as a variant of uncertain significance (VUS). Multiple family members affected with colon or endometrial cancer were tested and no mutation was identified, although tumor tissues of several individuals showed loss of MSH2 and MSH6 proteins on IHC. A weak aberrant larger transcript was identified, but not further characterized, in lymphocyte RNA isolated from one of these family members who was affected with colorectal cancer (age 38) that showed loss of MSH2 and MSH6 expression (Fig. 2a). Additional Southern blot analysis on genomic DNA of the same patient indicated the presence of a 3 kb insertion, possibly a large LINE-1 or SVA insertion (Fig. 2b); restriction fragment analysis narrowed the location of the insertion to a 1.45 kb region around MSH2 exon 12 (Fig. 2c). The same rearrangement was shown by Southern blot analysis in the genomic DNA of another, more distantly related family member; this individual presented with endometrial cancer at age 34 and showed loss of MSH2 and MSH6 in tumor tissue. However, the type of retrotransposon and the exact genomic location of the insertion were not determined.
The proband was seen at the Clinical Genetics Service (CGS) at Memorial Sloan Kettering Cancer Center (MSKCC). Immunohistochemistry (IHC) analysis indicated loss of MSH2 and MSH6 proteins in the tumor. No MSH2 inversion was detected. Given the strong family history of colon cancer, a colorectal multi-gene panel test (sequencing and large rearrangement analysis of APC, EPCAM (large rearrangement only), MLH1, MSH2, MSH6, MUTYH, PMS2, POLD1, and POLE, with add-on genes PTEN, BRCA1, and BRCA2) was performed at the Diagnostic Molecular Genetics laboratory at MSKCC. Testing identified aberrant sequences before position c.1954 and after position c.1972 in exon 12 of MSH2 (Fig. 3a). No other mutations or VUSs were identified in the remaining eleven genes analyzed. The copy number of MSH2 exon 12 was normal based on our next-generation sequencing (NGS) analysis (Fig. 3b), which was confirmed by multiplex ligation-dependent probe amplification (MLPA) (Fig. 3c), indicating that the aberrant sequence was probably not due to a genomic deletion or duplication of the coding region of MSH2.
To investigate the nature and origin of the abnormal sequence, long-range (LR) PCR was performed on genomic DNA from the patient using the TaKaRa LA PCR Kit according to the manufacturer’s protocol, with an M13-tagged forward primer located in intron 11 (5’-GTA AAA CGA CGG CCA GT GGGTTTTGAATTCCCAAATG-3’) and an M13-tagged reverse primer located in intron 12 (5’-CAG GAA ACA GCT ATG AC AAAACGTTACCCCCACAAAG-3’). One band about 400 bp in length was present in negative controls (Fig. 4a). Another, larger band (~3–4 kb) was observed in the patient but was absent in the negative controls (Fig. 4a). The larger aberrant fragment was extracted and sequenced with M13 forward and reverse primers. Sequence analysis of the extracted fragment revealed a target site duplication of 19 bp, located the insertion at c.1972, and recovered part of the inserted sequence (660 bp), which lies in an antisense orientation with respect to the direction of MSH2 transcription (Fig. 4b, 4c). RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker) indicated that the inserted sequence belongs to an SVA element. To map the inserted sequence, we performed a BLAST search (https://blast.ncbi.nlm.nih.gov/Blast.cgi, human genome GRCh38.p12 reference, Annotation Release 109). A total of 27 hits covered almost all chromosomes, except chromosomes 16, 18, and Y. The first alignment showed close to 100% homology between 559 bp of the inserted sequence (all but one nucleotide) and a region on chromosome 3 (Chr3: 48,210,600–48,211,258). RepeatMasker indicated that an SVA repeat was present at this location.
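As a small illustration of how the tagged primers above are composed, the following sketch separates the M13 tag from the gene-specific portion of each primer and reports the length and GC content of the gene-specific part. The primer strings are copied from the text; the function names (`normalize`, `split_tagged_primer`, `gc_fraction`) are hypothetical helpers introduced only for this example and are not part of any published pipeline.

```python
# M13 universal tags, exactly as they appear at the 5' end of the primers above.
M13_FORWARD_TAG = "GTAAAACGACGGCCAGT"
M13_REVERSE_TAG = "CAGGAAACAGCTATGAC"


def normalize(seq: str) -> str:
    """Remove the spacing used for readability and upper-case the sequence."""
    return "".join(seq.split()).upper()


def split_tagged_primer(primer: str, tag: str) -> tuple[str, str]:
    """Return (tag, gene-specific part); raise if the primer lacks the tag."""
    primer = normalize(primer)
    if not primer.startswith(tag):
        raise ValueError("primer does not begin with the expected M13 tag")
    return tag, primer[len(tag):]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = normalize(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer sequences as printed in the text (5' to 3').
forward_primer = "GTA AAA CGA CGG CCA GT GGGTTTTGAATTCCCAAATG"
reverse_primer = "CAG GAA ACA GCT ATG AC AAAACGTTACCCCCACAAAG"

_, forward_specific = split_tagged_primer(forward_primer, M13_FORWARD_TAG)
_, reverse_specific = split_tagged_primer(reverse_primer, M13_REVERSE_TAG)

for name, seq in (
    ("intron 11 forward", forward_specific),
    ("intron 12 reverse", reverse_specific),
):
    print(f"{name}: {seq} ({len(seq)} nt, GC {gc_fraction(seq):.0%})")
```

Because both amplification primers carry a universal tag, any product they generate, including the larger aberrant fragment, can be sequenced from either end with the standard M13 primers without knowing the inserted sequence in advance.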
The sequence from 48,210,600 to 48,220,000 on chromosome 3 was retrieved from the NCBI database, and sequencing primers were designed based on the retrieved sequence. Apart from a short gap in the VNTR region that could not be sequenced because of its repetitive structure, we were able to decipher a total of 2,937 bp of inserted sequence, not including the polyA tail, in the antisense strand (Fig. 5a). The inserted sequence identified in our proband starts with a guanine nucleotide followed by a 295 bp exon 1 of the MAST2 gene, an Alu-like element (37 bp), an approximately 2.2 kb VNTR region with tandem repeats ranging from 37 to 54 bp in length, a SINE-R region (492 bp), the putative polyadenylation signal AATAAA, and a long polyA tail. The inserted sequence is followed by a target site duplication (TSD) of 19 bp (MSH2 c.1954_1972) at the 3’ end of the insertion (Fig. 4c, 5a). Interestingly, the inserted sequence is very close to the sequence from 48,210,600 to 48,213,111 on chromosome 3, except that the insertion has a longer VNTR (Fig. 5a, 5b). Further analysis characterized it as a member of a human-specific SVA subfamily of retrotransposons termed SVA_F1, which contains a MAST2 5’ transduction group and is a fusion of MAST2 exon 1, containing a CpG island, and a 5’-truncated SVA (Bantysh & Buzdin, 2009; Damert et al., 2009; Hancks, Ewing, Chen, Tokunaga, & Kazazian, 2009) (Fig. 5c, 5d). Seventy-six members of the SVA_F1 subfamily have been identified in the human genome. In 96% of SVA_F1 members, the SVA element insert starts with a guanine residue (Bantysh & Buzdin, 2009), and the SVA_F1 insertion in this case also starts with a ‘G’ (Fig. 4c, 5a).
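The sizes quoted in this paragraph can be cross-checked with simple interval arithmetic. The sketch below is only a bookkeeping aid under stated assumptions: coordinates are treated as inclusive, the component sizes are those given in the text, and the variable names are hypothetical. The "remainder" is whatever part of the 2,937 bp is not accounted for by the explicitly sized components; attributing it to the sequenced portion of the VNTR (plus any unsized bases such as the polyadenylation signal) is an assumption of this example, not a statement from the analysis.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of an interval whose start and end positions are both included."""
    return end - start + 1


# Target site duplication, MSH2 c.1954_1972.
tsd_length = inclusive_length(1954, 1972)

# Reference SVA locus on chromosome 3 that the insert closely resembles.
reference_sva_length = inclusive_length(48_210_600, 48_213_111)

# Inserted sequence deciphered in the proband, excluding the polyA tail.
INSERT_SEQUENCED_BP = 2937

# Components whose sizes are stated explicitly in the text.
SIZED_COMPONENTS_BP = {
    "leading guanine": 1,
    "MAST2 exon 1": 295,
    "Alu-like element": 37,
    "SINE-R region": 492,
}

sized_total = sum(SIZED_COMPONENTS_BP.values())

# Assumption: everything else is VNTR sequence plus any unsized bases.
remainder = INSERT_SEQUENCED_BP - sized_total

# How much longer the sequenced insert is than the chromosome 3 reference span.
excess_over_reference = INSERT_SEQUENCED_BP - reference_sva_length

print(f"TSD length: {tsd_length} bp")
print(f"Reference span on chr3: {reference_sva_length} bp")
print(f"Explicitly sized components: {sized_total} bp")
print(f"Remainder (assumed mostly VNTR): {remainder} bp")
print(f"Insert exceeds reference span by: {excess_over_reference} bp")
```

Under these assumptions the TSD span reproduces the 19 bp given in the text, and the sequenced insert is longer than the chromosome 3 reference span, which is in line with the observation that the insertion carries a longer VNTR.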
The SVA insertion in MSH2 exon 12 likely occurred through LINE-1-mediated retrotransposition, as it exhibits several classical features of this process (Hancks et al., 2009; Raiz et al., 2012), as shown in Fig. 4c and 5a: (1) insertion at the consensus LINE-1 endonuclease cleavage site 5’-TTTT/AA-3’ (where “/” denotes the cleavage site); (2) the presence of a direct-repeat TSD of 19 bp, within the size range of 4–20 bp that is typical for LINE-1-mediated retrotransposition; (3) a long polyA tail preceded by the putative polyadenylation signal AATAAA; and (4) the presence of 5’ transduction and truncation, a structural variation found in more than 8% of all SVA elements in the human genome (Damert et al., 2009; Raiz et al., 2012; Wang et al., 2005).
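The four features listed above can be written down as an explicit checklist. The following is a minimal sketch, not a validated classifier: the `InsertionFeatures` class and `line1_hallmarks` function are hypothetical names introduced here, the thresholds are taken directly from the paragraph, and the cleavage site is compared by exact string match, whereas a real analysis would likely allow for degenerate matches to the consensus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertionFeatures:
    """Observed features of a candidate retrotransposon insertion."""
    cleavage_site: str          # written 5'->3' with "/" at the cleavage point
    tsd_length_bp: int          # length of the target site duplication
    has_polya_tail: bool
    polya_signal: str           # signal found immediately upstream of the tail
    has_5prime_transduction: bool


# Values taken from the text; exact matching is a simplification.
L1_CONSENSUS_SITE = "TTTT/AA"
TSD_MIN_BP, TSD_MAX_BP = 4, 20
POLYA_SIGNAL = "AATAAA"


def line1_hallmarks(features: InsertionFeatures) -> dict[str, bool]:
    """Evaluate each of the four hallmarks separately."""
    return {
        "consensus cleavage site": features.cleavage_site == L1_CONSENSUS_SITE,
        "TSD within 4-20 bp": TSD_MIN_BP <= features.tsd_length_bp <= TSD_MAX_BP,
        "polyA tail with signal": (
            features.has_polya_tail and features.polya_signal == POLYA_SIGNAL
        ),
        "5' transduction": features.has_5prime_transduction,
    }


# Features reported for the MSH2 exon 12 SVA insertion.
msh2_sva = InsertionFeatures(
    cleavage_site="TTTT/AA",
    tsd_length_bp=19,
    has_polya_tail=True,
    polya_signal="AATAAA",
    has_5prime_transduction=True,
)

hallmarks = line1_hallmarks(msh2_sva)
for name, present in hallmarks.items():
    print(f"{name}: {'yes' if present else 'no'}")
```

Keeping the four criteria as separate boolean results, rather than a single score, mirrors how the evidence is presented in the text: each hallmark is reported individually, and the inference of LINE-1 mediation rests on their co-occurrence.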
In summary, we describe here for the first time an SVA insertion into the coding sequence of MSH2 mediated by the LINE-1 protein machinery. Determining the precise location of the SVA insertion and the specific SVA sequence in the MSH2 gene is important for cancer management, as it guides genetic testing of family members and, potentially, preimplantation genetic testing. Furthermore, cancer-affected family members identified as having Lynch syndrome may benefit from immune checkpoint inhibitors, which are FDA-approved for MMR-deficient and MSI-H tumors, the hallmark of Lynch syndrome-associated tumors. Therefore, identification and characterization of SVA elements and their roles in cancer predisposition genes pave the way for genomic precision medicine and for cancer prevention and therapy.