Insight into hard clam immune function as revealed by RNA-Seq


Sample Collection
Hard clam (Mercencaria mercenaria) seed from two different broodstock sources were planted in Scudder’s Lane, Massachusetts in Fall 2008. One broodstock was obtained from Barnstable Harbor in Massachusettes and had previously been exposed to a severe outbreak of QPX in . The second broodstock was obtained from Mashpee, Massachusettes where no outbreaks of QPX had been reported. The second broodstock cohort was obtained from Mashpee, Massachusetts where there were no reported incidences of QPX. Seed clams from both broodstock cohorts were planted together in 4 separate plots. The cohorts will be and will be referred to as BARN and MASH, respectively. Shell size and mortality was assessed by sampling clams (n=X) on 5 sampling dates over a 16 month period. In June and August of 2010, 40 clams were harvested from each cohort for histological analysis. In August 2010, gill tissue was removed from a subset of clams (n=16) using sterile procedures and stored in RNAlater. RNA was extracted from the gill tissue using TriReagent (Molecular Research Center) following manufacturers recommended protocol and stored at -80 for RNA-seq analysis.

Histological Analysis
In order to evaluate disease status, clams were histologically examined. [Bushek et al.]

RNA-Seq Sample Preparation and Analysis
Total RNA samples from eight individuals from each cohort (BARN and MASH) were pooled in equal quantity. Samples were enriched for mRNA using the MicroPoly(A) Purist Kit (Ambion). Library preparation and sequencing was conducted by the University of Washington High Throughput Genomics Unit (HTGU) on the SOLiD 4 System (Applied Biosystems).

All sequence analysis was performed with CLC Genomics Workbench version 4.0 (CLC Bio). Initially, sequences were trimmed based on a quality scores of 0.05 (Phred; Ewing, Green, 1998; Ewing et al., 1998) and the number of ambiguous nucleotides (>2 on ends). Sequences smaller than 20 bp were also removed. Quality trimmed reads from both libraries (BARN and MASH) were de novo assembled using following parameters: limit=8, mismatch cost=2, and minimum contig size of 200 bp. Consensus sequences were compared to the UniProtKB/Swiss-Prot database ( in order to determine putative description. Comparisons were made using the BLASTx algorithm (Altschul et al 1997). Associated Gene Ontology (GO) terms were used to classify sequences based on biological process as well as categorize genes into parent categories (GO slim).

RNA-seq analysis was performed to determine differential gene expression patterns between the BARN and MASH libraries using the de novo assembly as a reference (CLC Genomic Workbench v4.0, CLC Bio). Expression values were measured in RPKM (reads per kilobase of exon model per million mapped reads, see [Mortazavi et al., 2008]). Parameters for RNA-seq analysis included; unspecific match limit = 10, maximum number of mismatches = 2, minimum number of reads = 10. Differentially expressed genes were identified as having > 2 fold change in RPKM expression values and a significance of p<0.05. Statistical comparison of RPKM values between the BARN and MASH libraries was carried out using Baggerley’s test (Baggerly et al., 2003).

Significantly enriched GO terms were identified using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 ( (Huang et al., 2009a; Huang et al., 2009b). The UniProt accession numbers for differentially expressed genes were uploaded as a gene list while all UniProt accession numbers for annotated contigs were used as a background. Only contigs with a corresponding e-value <1.00E-02 were used in this analysis. Significantly enriched GO terms were identified as those with a p-value of p<0.05. UniProt accession numbers for significantly enriched GO terms were extracted for further analysis. Additional visualization of enriched GO terms was carried out using Revigo (Supek et al. 2011).

Restriction site associated DNA (RAD) marker library preparation
Restriction site associated DNA (RAD) marker libraries were constructed to identify diagnostic markers among cohorts. Genomic DNA was isolated separately from the gill tissue BARN (n=4) and MASH (n=4) clams using DNAzol (Molecular Research Center) as per manufacturers recommendations. Libraries were prepared as described by Miller et al 2007. Briefly samples (n=8) were digested Sbf-1 (New England Biolabs), then each hybridized with a unique barcode, and RAD adapters (PI and P2) were ligated on DNA fragments. Size selection of DNA fragments was achieved by running PCR on a 1% EZ gel (Invitrogen) with E-gel 1 kb Plus DNA ladder followed by purification using the MiniElute gel purification protocol. Subsequent library construction and sequencing was carried out by the University of Washington High Throughput Genomics Unit (HTGU) using the Illumina HiSeq2000 system.

Restriction site associated DNA (RAD) marker library analysis
Initial sequence read processing of RAD tags was carried out as previously described by Miller et (2012). Quality scores were used to remove raw sequencing reads with a probability of sequencing error greater than 10%. Using custom perl scripts (Miller et al. 2012) we then grouped raw sequences reads by individual and removed barcodes and restriction sites for a total sequence read length of 24 base pairs.

Two types SNP analyses were performed including population specific SNP variation characterization and the identification of SNPs that could potentially distinguish populations. In order to examine population specific SNP variation quality trimmed reads from each cohort (BARN and MASH) were assembled independently using the following parameters; limit = 8, and mismatch cost = 2 (Genomics Workbench 4.0; CLC Bio). SNP detection was carried out using the following parameters: maximum gap and mismatch count = 2, minimum average quality = 15, minimum central quality = 20, minimum coverage = 10, minimum variant frequency = 35% (Genomics Workbench 4.0; CLC Bio).

For the second form of SNP analysis Novoindex and Novoalign (Novocraft Technologies) were used to aseemble RAD-tags to identify RAD-tags within a cohort that were identical (lacked any polymorphisms. These “isotigs” from each cohort were then compared by assembling reads and carrying out SNP detection as described above. Any SNP that was identified indicated that the locus is fixed for the individuals in each cohort examined.

Clam Mortality and Disease
Field sampling indicated that over the course of the trials, hard clam size did not significantly differ (Figure 1). QPX was first detected in MASH clams in June 2010 (Table 1) and the prevalence increased from 12.5% in June to 45% in August (Table 2). Likewise, increased mortality was observed starting in June 2010 (Figure 2). No QPX was observed in BARN clams.