WGD-2 causes duplication of SrCYP76AK6–8 genes and site-specific mutations in molecular docking sites
Carnosic acid and carnosol are the primary diterpenes in S. rosmarinus leaves, and their biosynthesis has been elucidated. These compounds are derived from the precursors IPP and DMAPP through the MEP pathway in the plastids, and their formation is catalyzed by downstream enzymes, including diterpene synthases and cytochrome P450s. In the S. rosmarinus genome, we identified three genes encoding SrCYP76AK6, two encoding SrCYP76AK7, and two encoding SrCYP76AK5 on pseudochromosome 11. All of these genes were clustered within a 0.33 Mb region (Figure 6d), and one, four, and one homologous genes were identified in the syntenic positions in S. miltiorrhiza, S. splendens, and S. baicalensis, respectively (Figure 5e). These findings suggest that substantial duplication of SrCYP76AK5, SrCYP76AK6 and SrCYP76AK7 occurred on pseudochromosome 11 after the speciation of S. rosmarinus. Moreover, SrCYP76AK5-2, SrCYP76AK6-1, and SrCYP76AK6-2 are highly expressed in rosemary leaves, with expression levels 6.59-fold, 5.64-fold, and 6.25-fold higher than in roots, respectively (Figure 7d). Therefore, the clustering, expansion, and high expression of the genes encoding SrCYP76AK5, SrCYP76AK6 and SrCYP76AK7 might have contributed to the accumulation of carnosol in S. rosmarinus.
In addition to the SrCYP76AK genes on pseudochromosome 11, we also identified one SrCYP76AK5 gene and one SrCYP76AK8 gene on pseudochromosome 3. Our analysis of the evolutionary trajectory of the S. rosmarinus chromosomes suggested that the duplication of CYP76AK8 occurred as a result of WGD-2, and that the duplicates subsequently underwent chromosomal rearrangements and fusions onto pseudochromosomes 3 and 11, respectively (Figure S29 b). The Ks values between homologous gene pairs (SrCYP76AK5-1 vs SrCYP76AK5-2 and SrCYP76AK5-1 vs SrCYP76AK6-2) were all close to the Ks value of WGD-2, indicating that their duplication occurred during this event, after which SrCYP76AK6-2 was duplicated to give SrCYP76AK7-2. The duplications of SrCYP76AK7-1 to SrCYP76AK7-2 and of SrCYP76AK6-1 to SrCYP76AK6-2 occurred close to the present (Table S31). We hypothesize that SrCYP76AK5-1 and SrCYP76AK8-1 on Chr3 were copied to Chr11 during WGD-2, followed by a recent tandem duplication on Chr11. This was followed by duplications of SrCYP76AK6-2 and SrCYP76AK6-3 to form the cluster of six SrCYP76AK copies. Moreover, further duplications of chromosome fragments led to the clustering of six SrCYP76AK6–8 genes on pseudochromosome 11 within a 0.33 Mb region (Figure 7e). This is supported by our phylogenetic analysis of the proteins encoded by homologous gene pairs (Figure S30) and by Ks calculations (Table S29).
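The reasoning used above, in which a gene pair is assigned to WGD-2 when its Ks lies close to the WGD-2 Ks value and to a recent duplication when its Ks is near zero, can be sketched as follows. This is only an illustration, not the procedure used in this study: the pair names, Ks values, WGD peak, tolerance, and cutoff are all hypothetical placeholders rather than values from Table S29 or Table S31.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenePair:
    """A homologous gene pair with its synonymous substitution rate (Ks)."""
    gene_a: str
    gene_b: str
    ks: float


def classify_pair(pair, wgd_peak, tolerance, recent_cutoff):
    """Label a pair as 'recent', 'wgd', or 'other' from its Ks value.

    All three thresholds are assumptions supplied by the caller.
    """
    if pair.ks <= recent_cutoff:
        return "recent"  # Ks near zero: duplication close to the present
    if abs(pair.ks - wgd_peak) <= tolerance:
        return "wgd"     # Ks near the WGD peak: consistent with that event
    return "other"


# Placeholder data for illustration only (not measured values).
pairs = [
    GenePair("pair1_a", "pair1_b", 0.02),
    GenePair("pair2_a", "pair2_b", 0.31),
    GenePair("pair3_a", "pair3_b", 0.90),
]

labels = {
    (p.gene_a, p.gene_b): classify_pair(
        p, wgd_peak=0.30, tolerance=0.05, recent_cutoff=0.05
    )
    for p in pairs
}

for names, label in labels.items():
    print(names, label)
```

In practice the peak position and tolerance would be taken from the genome-wide Ks distribution rather than fixed by hand.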
To gain a comprehensive understanding of the evolution of the CYP76AK subfamily, we examined the proteins encoded by CYP76AK1 and CYP76AK6–8 in 24 different species and extracted a total of 18 protein sequences, mainly from Salvia species. Using an Ocimum basilicum CYP76 gene as an outgroup, a maximum likelihood (ML) tree of CYP76 genes was reconstructed (Figure S29 a). The phylogenetic relationships revealed that the proteins encoded by CYP76AK1s, CYP76AK2s, CYP76AK3s, CYP76AK5s, CYP76AK6s, CYP76AK7s and CYP76AK8s fall into four distinct groups. The evolutionary tree of the CYP76AK subfamily showed two clades: the clade containing CYP76AK3 and CYP76AK7 was sister to the clade containing CYP76AK5, CYP76AK6, CYP76AK8, CYP76AK1 and CYP76AK2. CYP76AK6, CYP76AK1 and CYP76AK2 did not form independent clades, which indicated that CYP76AK6, CYP76AK1 and CYP76AK2 evolved from CYP76AK8, and that CYP76AK1 and CYP76AK2 evolved from CYP76AK6.
We observed that SrCYP76AK6–8 catalyzed the conversion of 11-hydroxy-ferruginol into capraldehyde in S. rosmarinus, while SmCYP76AK1 catalyzed the production of 11,20-dihydroxy-ferruginol in S. miltiorrhiza (Figure 7a). To further investigate the catalytic mechanism of the CYP76AK subfamily, we performed homology modeling and molecular docking to infer the key amino acid sites in SmCYP76AK1 and in the SrCYP76AKs that were highly expressed in rosemary leaves. Using the SmCYP76AH1 structure (PDB ID: 5YM3) as a template, we generated 3D models of SmCYP76AK1 and the SrCYP76AKs, and docked the substrate 11-hydroxy-ferruginol into them. Our results showed that position C-20 of 11-hydroxy-ferruginol was closer to the heme iron when docked with SrCYP76AK5-2, SrCYP76AK6-1, and SrCYP76AK6-2 than when docked with SmCYP76AK1 (Figure 6b). This closer proximity may have led to a sequential oxidation reaction at C-20 that resulted in the accumulation of carnosol precursors. We hypothesized that mutations in essential amino acids could result in functional differentiation of CYP76AKs, leading to the accumulation of carnosol in the leaves of S. rosmarinus and of tanshinone in the roots of S. miltiorrhiza.
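The comparison above rests on the distance between the C-20 atom of the docked ligand and the heme iron in each model. A minimal sketch of that measurement follows, assuming the two atoms' Cartesian coordinates (in Å) have already been extracted from each docked complex. The model names and coordinates are hypothetical placeholders, not values from this study.

```python
import math


def rank_by_iron_distance(c20_coords, iron_coords):
    """Return (model, distance) tuples sorted from closest to farthest.

    c20_coords and iron_coords map a model name to an (x, y, z) tuple in
    angstroms for the ligand C-20 atom and the heme iron, respectively.
    """
    distances = {
        model: math.dist(c20_coords[model], iron_coords[model])
        for model in c20_coords
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Placeholder coordinates for illustration only.
c20 = {"model_a": (0.0, 0.0, 0.0), "model_b": (1.0, 2.0, 2.0)}
iron = {"model_a": (3.0, 4.0, 0.0), "model_b": (0.0, 0.0, 0.0)}

ranked = rank_by_iron_distance(c20, iron)
for model, distance in ranked:
    print(f"{model}: C-20 to Fe = {distance:.2f} A")
```

With these placeholder inputs, model_b is reported first because its C-20 atom lies 3.00 Å from the iron, compared with 5.00 Å for model_a.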
Furthermore, we investigated amino acid differences within 8 Å of the active pocket to understand their potential influence on the proximity of the ligand to the heme iron (Figure 7c). To identify key residues at the docking sites, we compared the amino acid residues within this range between the SrCYP76AKs and SmCYP76AK1, and found nine candidate residues that differed. We then conducted remodeling and docking experiments by replacing the corresponding residues of the SrCYP76AKs with those of SmCYP76AK1, and vice versa. Specifically, we mutated S445 and I449 of SrCYP76AK5-2, SrCYP76AK6-1, and SrCYP76AK6-2 to I445 and M449, respectively, to mimic SmCYP76AK1. Conversely, we mutated I445 and M449 of SmCYP76AK1 to S445 and I449, respectively, to mimic SrCYP76AK5-2, SrCYP76AK6-1, and SrCYP76AK6-2. We then docked these remodeled proteins with 11-hydroxy-ferruginol. The results showed that the co-mutation of S445I and I449M in SrCYP76AK5-2, SrCYP76AK6-1, and SrCYP76AK6-2 led to the ligand docking farther from the heme iron, while the co-mutation of I445S and M449I in SmCYP76AK1 resulted in docking closer to the heme iron (Figure S31). Therefore, we hypothesized that the S445I and I449M substitutions play a significant role in determining the distance of the ligand from the heme iron, and may have contributed to the functional divergence of SmCYP76AK1 from SrCYP76AK6–8. Our findings suggest that these residues are critical for ligand binding and may have important implications for understanding the functional differences between these enzymes.
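The candidate-residue screen described above can be sketched in two steps: collect the residue positions lying within a distance cutoff, then keep those at which the two proteins differ. The sketch below makes several assumptions that are not from this study. The cutoff is measured from the docked ligand atoms rather than from a pocket definition, both proteins share one alignment-based numbering, and all coordinates are placeholders. Only the residue identities at positions 445 and 449 (S and I in the SrCYP76AKs, I and M in SmCYP76AK1) are taken from the text.

```python
import math


def residues_within(residue_atoms, ligand_atoms, cutoff=8.0):
    """Return positions having any atom within `cutoff` angstroms of the ligand.

    residue_atoms maps a residue position to a list of (x, y, z) tuples;
    ligand_atoms is a list of (x, y, z) tuples.
    """
    return {
        position
        for position, atoms in residue_atoms.items()
        if any(
            math.dist(atom, lig) <= cutoff
            for atom in atoms
            for lig in ligand_atoms
        )
    }


def differing_positions(seq_a, seq_b, positions):
    """Return {position: (residue_a, residue_b)} where the residues differ.

    seq_a and seq_b map a shared position number to a one-letter residue.
    """
    return {
        p: (seq_a[p], seq_b[p])
        for p in sorted(positions)
        if p in seq_a and p in seq_b and seq_a[p] != seq_b[p]
    }


# Placeholder geometry: one ligand atom at the origin, one atom per residue.
ligand = [(0.0, 0.0, 0.0)]
atoms = {
    300: [(1.0, 1.0, 1.0)],   # near the ligand, identical in both proteins
    445: [(3.0, 0.0, 0.0)],   # near the ligand, differs
    449: [(0.0, 5.0, 0.0)],   # near the ligand, differs
    500: [(20.0, 0.0, 0.0)],  # differs but lies outside the cutoff
}
sr_like = {300: "G", 445: "S", 449: "I", 500: "A"}
sm_like = {300: "G", 445: "I", 449: "M", 500: "V"}

nearby = residues_within(atoms, ligand, cutoff=8.0)
candidates = differing_positions(sr_like, sm_like, nearby)
print(candidates)
```

Position 500 is excluded despite differing between the two sequences, because it falls outside the 8 Å cutoff. Position 300 is excluded because the residue is identical in both proteins.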