Ethics
The human brain samples sourced from the human brain tissue repository were collected post-mortem following written, informed consent from next of kin of the deceased, and their use for research was approved by the NIMHANS Institutional Ethics Committee. All canine brain tissues used in this study were collected at necropsy from dogs that died of suspected rabies, through the routine rabies surveillance programme conducted by Mission Rabies, a non-governmental organization and the Department of Animal Husbandry and Veterinary Services, Government of Goa and sent to our rabies referral laboratory for diagnostic confirmation. Therefore, approval from Institutional Animal Ethics Committee was not required. The study protocol was approved by the NIMHANS Institutional Ethics Committee (NIMH/DO/ETHICS SUB-COMMITTEE MEETING/2017, dated June 19, 2017).
RNA extraction
Approximately, 300 mg of each brain tissue was cut into small pieces with a sterile blade, resuspended in 1 ml of nuclease free water and homogenized. The homogenate was used for nucleic acid purification using QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany #52904). Briefly, on column DNase digestion (NEB #M0303L) was performed for 30 min at room temperature following lysis and flow through on Qiagen column. This was followed by the wash steps as recommended by the manufacturer. RNA was eluted in 50 µl of elution buffer and stored at -80◦C until use. 8µl of RNA from each sample was taken for cDNA preparation using LunaScript RT SuperMix Kit (NEB#E3010L).
Amplicon primer design and sequencing
All the samples used in the study were positive for RABV by real time reverse transcriptase PCR (Ct values 14-25). We used an amplicon-based sequencing method using tiling primer-based approach for whole genome sequencing using Primal Scheme 29 to recover complete RABV genomes from the samples. A set of 5 reference sequences belonging to the arctic lineage from datasets curated and filtered in RABV-GLUE were chosen to represent the known circulating RABV diversity in India. The primer sets included 38 pairs with amplicons of about 400 base pairs (bp) spanning the whole genome except the 5’ and the 3’UTR (Supplementary Table S1). The primer sets used in this study can detect the Indian subcontinent as well as all other Arctic lineages. The primer coverage for other RABV lineages reported in India is provided in Supplementary Figure-1. PCR was performed by pooling overlapping primers into different pools to prevent preferential amplification of short fragments between adjacent primer pairs. The final concentration of 10 mM of primers was used for PCR. The resulting PCR amplicons were used for preparing libraries for nanopore sequencing using the native barcoding (NBD 104/114) approach combined with the ligation sequencing kit (SQK-LSK109). About 12-24 samples were barcoded and included in a single run. The resulting DNA was cleaned up, quantified on a Qubit 1.0 fluorimeter (Thermo Fisher Scientific), followed by loading onto MinION flow cell (FLO-MIN106) for sequencing (ONT, UK). Acquisition was done using the MinKNOW Software v.1.4.2 and the sequencing run was stopped after 24 hours. The detailed step by step protocol is available atdx.doi.org/10.17504/protocols.io.3byl4jzb8lo5/v1 .
Data analysis
Sequences were basecalled and demultiplexed using guppy (v3.6), and read lengths between 100–600bp were considered for further analysis. Sequencing reads were trimmed by 25 bp on both ends and the specific removal of primer sequences was performed using BBDuk (v38.37). Resulting reads were mapped to LT909541 as this genome was found to have higher similarity to the consensus genomes based on percentage identity, using Minimap2 (v2.17) within Geneious Prime (Geneious Prime 2020.0.3).
For each sample, a consensus genome was generated with a minimum coverage of 10× for each base and by calling the most frequently occurring base at each position. Consensus sequences were manually examined, edited, and aligned to the reference genome to ensure the correct reading frame, followed by transfer of annotations from the reference sequence.
Phylogenetic analysis
Phylogenetic trees were constructed using Maximum likelihood method with the complete coding sequences. The available RABV whole genome rabies sequences from all around the globe were retrieved from the NIAID Virus Pathogen Database and Analysis Resource (ViPR) through the web site at http://www.viprbrc.org/and used for the analysis. Sequences were uploaded to RABV-GLUE for the lineage determination.
All the sequences were aligned using MUSCLE algorithm (Edgar 2004) in Aliview program (V.2018). Only the sequences adequately covering the regions of interest (n=55) were taken for further analysis. A maximum likelihood tree was constructed with 1000 bootstraps, with GTR+I+G4 method based on Bayesian information criteria for whole genome sequences using the iqtree (v1.3.11.1) software. The tree was visualized using Figtree (v1.4.2) 30.
Phylogeny and divergence estimation
To infer the phylogenetic relationship and divergence time between Indian genomes generated in this study and other known rabies genomes, we created a dataset with 40 whole genomes from neighboring and other south Asian countries; Nepal, Pakistan, Bangladesh, Afghanistan, China and Sri Lanka. Sequences were aligned using MUSCLE 31in MEGA 7.0.26 32 and missing positions from both end of the alignment were trimmed to 11700 bp. This alignment was used to infer phylogenetic relationships and divergence estimation using BEAST 1.8.4 33. Sequence isolation dates were used to perform tip dating and calibrate the phylogenies. The General Time-Reversible (GTR) substitution with gamma (G) heterogeneity model with a strict molecular clock were used for the analysis. Each phylogeny was run for 3x108 Markov Chain Monte Carlo (MCMC) cycles. Effective sample sizes (ESS) and parameter convergence of MCMC cycles were assessed using Tracer 1.6 34. A maximum clade credibility consensus tree was generated after 25% burnin. The resultant phylogeny was visualized and annotated using FigTree 1.4.330. The same analysis was separately performed for the south Indian (Karnataka, Tamil Nadu, Andhra Pradesh, and Goa) genomes generated in this study to understand their phylogenetic relationship and divergence. Internal nodes of the phylogeny were marked and colored based on the Posterior Probability support (PP).
Results
Complete RABV genomes were recovered from all the 20 archived human and canine brain tissue samples (100%) sequenced using an amplicon-based approach in this study. All sequences have been deposited in GenBank; Accession numbers are given in Table 3. The depth and breadth of the sequences and the pairwise identity with the reference genome is given in the Table 2 and Supplementary Figure S2. A median of 0.08 million reads per sample was obtained. Phylogenetic analysis of the complete concatenated coding sequences of 75 RABV from across the world (including 20 sequences from this study) showed that all the RABV sequences obtained from human and canine brain tissues in the study belonged to the Artic-like lineage, sub-lineage AL1a (Figure 1).
Our analysis revealed the phylogenetic position of south Indian sequences to other Arctic-like lineages (AL1a) from India, Nepal, Bangladesh and Pakistan (Figure 2A). All the 20 sequences generated in this study were part of this AL1a lineage which is sister to other Arctic-like lineage-AL1b from Afghanistan (PP =1). These two lineages were estimated to have diverged from an ancestral sequence around 1825 (95% HPD 1696-1895). Within the south Indian states (Figure 2B), rabies sequences showed two strong monophyletic lineages (PP=1, Figure 2C), separated around 1956 (95% HPD 1930-1978). Clustering was not associated with state. The first monophyletic lineage (Figure 2C top) primarily included human sequences from Karnataka (n=5) and Tamil Nadu (n=2) along with sequence EF437215 reported earlier from Karnataka. Canine sequences, CR1 and CR2, collected from Goa were also part of the lineage. All the other canine samples were part of the second monophyletic lineage (Figure 2C bottom) which clustered with three human samples (one from Andhra Pradesh and two from Karnataka). However, we observed CR3, a canine sequence from this lineage, clustering with the first lineage in the larger phylogeny (Figure 2A). Since the posterior probability support of the CR3 divergence in the second phylogeny is low (PP=0.65), this minor topological variation will not affect our result interpretation of divergence estimation.