3 Material and methods
Insect strain
The R. pedestris bean bugs sequenced in this study were
originally collected in a
soybeanZhengqing field in Suzhou, China in 2019. The insects were reared
on soybean plants (strain: Wandou 27) at 26 ± 0.5 °C with humidity of 50
± 5% under a 16/8 h (light/dark) photoperiod. Prior to DNA/RNA
extraction, the colony was inbred following a sib-sib mating protocol
for three generations as previously described (Xue et al. 2015).
Genome sequencing and assembly
Genomic DNA was isolated from three virgin female adults using the
DNeasy Blood & Tissue Extraction Kit (Qiagen Inc., Valencia, CA, USA)
based on the manufacturer’s protocol. After the determination of the DNA
quality and quantity, a paired-end sequencing library (150 bp in length)
was constructed and sequenced using the Illumina NovaSeq6000. In
addition, a 20 kb library was constructed and sequenced on 17 cells
using the SequelTM Sequencing Plate 1.2 on the Pacific
Biosciences Sequel platform at Berry Genomic Corporation in Beijing,
China.
De novo genome assembly was performed using Canu-SMARTdenovo
methods (Schmidt et al. 2017). Correction was performed by Canu with the
default parameters (Koren et al. 2017). The resulting reads were
assembled using SMARTdenovo (-k 21 -J 3000 -t 20) (Hu et al. 2020).
Nextpolish was used for genome polishing (Hu et al. 2020).
Hi-C sequencing and chromosome-length genome assembly
Hi-C libraries were constructed from a pool of fifty
2nd instar nymphs according to the method previously
described (Lieberman-Aiden et al. 2009). Briefly, the samples were fixed
with 1% formaldehyde for 10 min at room temperature. Then, a 2.5
M-glycine solution was added to obtain a final concentration of 0.2 M.
The sample was incubated for 5 min at room temperature and then on ice
for 15 min to stop cross-linking completely. After centrifugation, the
cross-linked sample was sent to Annoroad Gene Technology Co. Ltd,
Beijing, China for sequencing.
The genomic contigs were scaffolded using the 3D de novo assembly
(3D-DNA) pipeline (Dudchenko et al. 2017). Briefly, the Hi-C reads were
aligned to the draft genome assembly using Juicer; a 3D-DNA analysis was
run to generate a candidate assembly; the candidate assembly was
reviewed using Juicebox Assembly Tools (JBAT); and a high-quality
chromosome-length genome assembly was generated after JBAT review.
Chromosomic staining
The salivary glands were dissected from male and female R.
pedestris in a phosphate-buffered saline (PBS) solution (137 mM NaCl,
2.68 mM KCl, 8.1 mM Na2HPO4 and 1.47 mM
KH2PO4 at pH 7.4) under a
stereomicroscope (COIC, Chongqing, China). The samples were fixed in 4%
paraformaldehyde (Sangon Biotech, Shanghai, China) for 1 h. After
washing in PBS 3 times, the samples were triturated mechanically, and 20
μl DAPI solution (Abcam, Cambridge, USA) was added to stain the
chromosomes. Fluorescence images were examined using a Leica confocal
laser-scanning microscope (Leica, Heidelberg, Germany).
Transcriptomic sequencing
Third generation transcriptomic sequencing was performed by Oxford
Nanopore technology (ONT). Four samples, including eggs, nymphs, male,
and female R. pedestris were collected. Oxford PromethION 2D
amplicon libraries were generated according to the Nanopore community
protocol using library preparation kit SQK-LSK109 (Oxford Nanopore,
Oxford, UK) and sequenced on R9 flow cells to generate fast5 files. All
generated fast5 reads were then basecalled in Guppy v3.2.10 with the
default options to yield fastq files (Bolognini et al. 2019). The fastq
reads for each sample were filtered using Nanofilt v2.5.0 with options
-l 100 -q 7 (De Coster et al. 2018). The full length reads were detected
using Pychopper with arguments -b primer.fa -i raw.fq -t 200 -s 98
(https://github.com/nanoporetech/pychopper). Pinfish was run with
default parameters, and finally yielded polished consensus reads. Second
generation Illumina sequencing was performed as we previously described
(Huang et al. 2020). Insect samples from 6 different tissues and 37
different development stages were analyzed.
Genome annotation
Genome annotation was performed using the LoReAn annotation pipeline
(Cook et al., 2019). Briefly, the transcriptomic data generated above
were aligned to the genome using the Program to Assemble Spliced
Alignments (PASA) (Haas et al. 2008) and the Genomic Mapping and
Alignment Program (GMAP) (Wu and Watanabe 2005). For the protein
sequence map, GeneWise (Birney et al. 2004) was employed. For ab initio
gene prediction, SNAP (Korf 2004), Augustus (Stanke et al. 2006), MAKER2
(Holt and Yandell 2011), GeneMark-ET (Lomsadze et al. 2014), and BRAKER1
(Hoff et al. 2016) were employed separately. The Evidence Modeler (EVM)
(Haas et al. 2008) was used to combine the outputs of previous tools to
generate the combined annotation model. PASA was used to update the gene
models by identifying untranslated regions (UTRs) to generate a final
annotation.
Sex chromosome identification
To identify the potential sex chromosome, paired-end sequencing
libraries were constructed for male and female adult R.
pedestris . The resulting DNA reads were separately mapped to the
genomic scaffolds using Bowtie2
(Longmead and Salzberg 2012). Then, Sequence Alignment/Map tolls
(SAMtolls) was used to remove the low-quality mapped reads, and the
mapped reads per million (MRPM) of each chromosome in males and females
were calculated (Li et al. 2009). The sex chromosomes were determined
according to the female/male ratio of MRPM, as previously described (Ma
et al. 2020).
Weighted gene co-expression network analysis (WGCNA)
The WGCNA was performed based on the WGCNA package in R (3.2.2.)
(Langfelder and Horvath 2008). The parameters were set as follows: min
module size = 50 and ME miss thread = 0.15. Only the enrichment index
>0.85 was considered to be significantly clustered.
Phylogenetic analysis and divergence time estimation
Protein data from 12 hemipteran insects (Bemisia tabaci ,Acyrthosiphon pisum , Diaphorina citri , Homalodisca
vitripennis , Laodelphax striatellus , Nilaparvata lugens ,Gerris buenoi , Halyomorpha halys , Oncopeltus
fasciatus , Cimex lectularius , Rhodnius prolixus , andTriatoma rubrofasciata ) and Drosophila melanogaster were
retrieved from the NCBI database, GigaDB database, VectorBase (Table
S1). After filtering redundant alternative splicing events, the protein
dataset containing non-redundant transcripts was used to find the
homologous pairs of sequences by the all-vs-all BLASTp algorithm with an
E-value of <1e-5. The BLASTp result was then converted into a
normalized similarity matrix and processed using OrthoMCL v2.0.9 (Li et
al., 2003). Protein families were identified by Markov chain clustering
(MCL-14-137).
A Phylogenetic tree was constructed using single copy orthologs in each
species (1:1:1 genes identified by OrthoMCL analysis), and D.
melanogaster was used to root the tree. Sequence alignment was
performed by MAFFT (v7.310) (Katoh and Standley 2013), and conserved
amino-acid sites were identified by TrimAl (v1.2rev59)
(Capella-Gutiérrez et al. 2009). We used Model founder in IQ-TREE
(v1.6.10) to determine the best model, and the phylogenetic tree was
constructed using the maximum likelihood method with 1000 bootstraps
(Nguyen et al. 2015). The divergence time was estimated based on fossil
collection records from the Paleobiology Database (www.paleobiology.org)
using the MCMCtree program in PAML v4.9e (Yang 1997).