Materials and Methods

BSC sampling and incubation conditions

BSC samples were taken from the Green Butte site near Moab, Utah as previously described (site “CP3”; latitude N 38\(^{\circ}\)44’55.1“, longitude W 109\(^{\circ}\)44’37.1”; \citealt{BERALDI_CAMPESI_2009}). All samples were from early successional ’light’ crusts as described by \citep{15643930}. Early successional BSC samples (37.5 cm\(^{2}\), average mass 35 g) were incubated in sealed chambers under controlled atmosphere and in 16 h light / 8 h dark for 4 days. Crusts were sampled and transported while dry and wetted at initiation of the experiment. Water was added to each sample to fully saturate the soil, but avoid visible ponding. The samples were then placed in air-tight sealed incubation containers for the rest of the experiment, so that soil and atmosphere remained saturated through the incubation period. The water was amended with calcium bicarbonate to yield a final concentration of 3 mM, so that autotrophy could proceed unimpeded. The control treatment received a headspace of air and the experimental treatment received a headspace containing \(^{15}\)N\(_{2}\) (\(>\)98% atom \(^{15}\)N\(_{2}\)). \(^{15}\)N\(_{2}\) (100%) gas was purchased from Sigma-Aldrich (St. Louis, MO). We used a composition of 75% \(^{15}\)N\(_{2}\) in helium for the initial incubation headspace. Four crust samples were treated and incubated (two control and two experimental). One control/experimental crust pair was collected at day 2 and the other at day 4. Acetylene reduction rates were measured daily. Acetylene reduction rates increased over the course of the experiment (0.8, 4.8, 8.8, and 14.5 \(\mu\)moles m\(^{-2}\) hr\(^{-1}\) ethylene for days 1 through 4, respectively).

DNA extraction

DNA was extracted for DNA-SIP at 2 and 4 days. DNA was extracted from 1 g of BSC. DNA from each sample was extracted using a MoBio (Carlsbad, CA) UltraClean Mega Soil DNA Isolation Kit (following manufacturer’s protocol, but lysis was done as previously described \citep{Strauss2011}), and then gel purified to select high molecular weight DNA (\(>\)4 kb) using a 1% low melt agarose gel and \(\beta\)-agarase I for digestion (manufacturer’s protocol, New England Biolabs, M0392S). Extracts were quantified using PicoGreen nucleic acid quantification dyes (Molecular Probes).

Formation of CsCl equilibrium density gradients

CsCl gradient fractionation was used to separate the DNA into 36 gradient fractions on the basis of buoyant density. CsCl density gradients were formed in 4.7 mL polyallomer centrifuge tubes filled with gradient buffer (15 mM Tris-HCl, pH 8; 15 mM EDTA; 15 mM KCl) which contained 1.725 g mL\(^{-1}\) CsCl. CsCl density was checked with a digital refractometer as described below. A total of 2.5-5.0 \(\mu\)g of DNA was added to each tube, and the tubes mixed, prior to centrifugation. Centrifugation was performed in a TLA-110 fixed angle rotor (Beckman Coulter) at 20\(^{\circ}\)C for 67 hours at 55,000 rpm. \citep{17369332}. Centrifuged gradients were fractionated from bottom to top in 36 equal fractions of 100 \(\mu\)L, using a syringe pump as described previously \citep{17369332}. The density of each fraction was determined using an AR200 refractometer modified to accommodate 5 \(\mu\)L samples as described previously \citep{17369332}. DNA in each fraction was desalted on a filter plate (PALL, AcroPrep Advance 96 Filter Plate, Product Number 8035), using four washes with 300 \(\mu\)L TE per fraction. After each wash, the filter plate was centrifuged at 500 x g for 10 minutes, with a final spin of 20 minutes. Purified DNA from each fraction was resuspended in 50 \(\mu\)L of TE buffer.

PCR, library normalization and DNA sequencing

To characterize the distribution of SSU rRNA genes across density gradients, SSU rRNA gene amplicons were generated from 20 gradient fractions per gradient for both unlabeled controls and \(^{15}\)N\(_{2}\) labeled samples. The 20 fractions analyzed are those expected to contain DNA (both labeled and unlabeled) having buoyant density in the range of 1.66 g mL\(^{-1}\) to 1.77 g mL\(^{-1}\). Barcoded PCR of bacterial and archaeal SSU rRNA genes was carried out using primer set 515F/806R \citep{21349862} (primers purchased from Integrated DNA Technologies). The primer 806R contained an 8 bp barcode sequence, a “TC” linker, and a Roche 454 B sequencing adapter, while the primer 515F contained the Roche 454 A sequencing adapter. Each 25 \(\mu\)L reaction contained 1x PCR Gold Buffer (Roche), 2.5 mM MgCl\(_{2}\), 200 \(\mu\)M of each of the four dNTPs (Promega), 0.5 mg mL\(^{-1}\) BSA (New England Biolabs), 0.3 \(\mu\)M of each primers, 1.25 U of Amplitaq Gold (Roche), and 8 \(\mu\)L of template. Each sample was amplified in triplicate. Thermal cycling occurred with an initial denaturation step of 5 minutes at 95\(^{\circ}\)C, followed by 40 cycles of amplification (20 s at 95\(^{\circ}\), 20 s at 53\(^{\circ}\), 30 s at 72\(^{\circ}\)), and a final extension step of 5 min at 72°C. Triplicate amplicons were pooled and purified using Agencourt AMPure PCR purification beads, following manufacturer’s protocol. Once purified, amplicons were quantified using PicoGreen nucleic acid quantification dyes (Molecular Probes) and pooled together in equimolar amounts. Samples were sent to the Environmental Genomics Core Facility at the University of South Carolina (now Selah Genomics) where they were run on a Roche FLX 454 pyrosequencing machine (FLX-Titanium platform).

Data analysis

Sequence quality control

Sequences were initially screened by maximum expected errors at a specific read length threshold \citep{23955772} and this has been shown to be as effective as denoising with respect to removing pyrosequencing errors. Specifically, reads were first truncated to 230 nucleotides (nt) (all reads shorter than 230 nt were discarded) and any read that exceeded a maximum expected error threshold of 1.0 was removed. After truncation and max expected error trimming, 91% of original reads remained. Forward primer and barcode were then removed from the high quality, truncated reads. Remaining reads were taxonomically annotated using the “UClust” taxonomic annotation framework in the QIIME software package \citep{20383131, 20709691} with cluster seeds from Silva SSU rRNA database \citep{17947321} 97% sequence identity OTUs as reference (release SSU Ref 111). Reads annotated as “Chloroplast”, “Eukaryota”, “Archaea”, “Unassigned” or “mitochondria” were removed from the dataset. Finally, reads were aligned to the Silva reference alignment provided by the Mothur software package \citep{19801464} using the Mothur NAST aligner \citep{16845035}. All reads that did not align to the expected amplicon region of the SSU rRNA gene were discarded. Quality control parameters removed 34,716 of 258,763 raw reads. Raw sequences have been uploaded to MG-RAST (MG-RAST ID 4603397.3).

Sequence clustering

Sequences were distributed into OTUs using the UPARSE methodology \citep{23955772}. Specifically, OTU centroids (i.e. seeds) were identified using USEARCH on non-redundant reads sorted by count. The sequence identity threshold for establishing a new OTU centroid was 97%. After initial OTU centroid selection, select SSU rRNA gene sequences from \citet{Yeager} were added to the centroid collection. Specifically, \citet{Yeager} Colorado Plateau or Moab, Utah sequences were added which included the SSU rRNA gene sequences for Calothrix MCC-3A (accession DQ531700.1), Nostoc commune MCT-1 (accession DQ531903), Nostoc commune MFG-1 (accession DQ531699.1), Scytonema hyalinum DC-A (accession DQ531701.1), Scytonema hyalinum FGP-7A (accession DQ531697.1), Spirirestis rafaelensis LQ-10 (accession DQ531696.1). Original centroid sequences that matched selected \citet{Yeager} (above) sequences with greater than to 97% sequence identity were subsequently removed from the centroid collection. With USEARCH/UPARSE, potential chimeras are identified during OTU centroid selection and are not allowed to become cluster centroids effectively removing chimeras from the read pool. All quality controlled reads were then mapped to cluster centroids at an identity threshold of 97% again using USEARCH. A total of 95.6% of quality controlled reads could be mapped to centroids. Unmapped reads do not count towards sample counts and were removed from downstream analyses. The USEARCH software version for cluster generation was 7.0.1090. \citet{Garcia_Pichel_2013} and \citet{Steven_2013} sequences were quality screened by alignment coordinates (described above) and included as input to USEARCH for OTU centroid selection and subsequent mapping to OTU centroids.

Phylogenetic analysis

Alignment of SSU rRNA genes was done with SSU-Align which is based on Infernal \citep{Nawrocki_2013, 19307242}. Columns in the alignment that were not included in the SSU-Align covariance models or were aligned with poor confidence (less than 95% of characters in a position had posterior probability alignment scores of at least 95%) were masked for phylogenetic reconstruction. Additionally, the alignment was trimmed to coordinates such that all sequences in the alignment began and ended at the same positions. FastTree \citep{20224823} was used to build the tree.

Identifying OTUs that incorporated \(^{15}\)N into their DNA

DNA-SIP is a culture-independent approach towards defining identity-function connections in microbial communities . Microbes are identified on the basis of isotope assimilation into DNA. As the buoyant density of a macromolecule is dependent on many factors in addition to stable isotope incorporation (e.g. G+C-content in nucleic acids \citep{25139123}), labeled nucleic acids from one microbial population may have the same buoyant density as unlabeled nucleic acids from another. Therefore, it is imperative to compare results of isotopic labelling to results obtained with unlabeled controls where everything mimics the experimental conditions except that unlabeled substrates are used. By contrasting heavy gradient fractions from isotopically labeled samples relative to corresponding fractions from controls, the identities of microbes with labeled nucleic acids can be determined

We used an RNA-Seq differential expression statistical framework \citep{Love_2014} to find OTUs enriched in heavy fractions of labeled gradients relative to corresponding density fractions in control gradients (for review of RNA-Seq differential expression statistics applied to microbiome OTU count data see \citealt{24699258}). We use the term “differential abundance” (coined by \citealt{24699258}) to denote OTUs that have different proportion means across sample classes (in this case the only sample class is labeled:control). CsCl gradient fractions were categorized as “heavy” or “light”. The heavy category denotes fractions with density values above 1.725 g mL\(^{-1}\). Since we are only interested in enriched OTUs (labeled versus control), we used a one-sided Wald-test to test the statistical significance of regression coefficients (the null hypothesis is that the labeled:control fold enrichment for an OTU is less than a selected threshold). We independently filtered out sparse OTUs prior to P-value correction for multiple comparisons. The sparsity threshold was set to the value which maximized the number of p-values under a false discovery rate (FDR) the specific sparsity threshold was 0.3 meaning that an OTU not found in at least 30% of heavy fractions (control and labeled gradients) in a given day were not considered further and not included in P-value adjustment for multiple comparisons. P-values were corrected with the Benjamini-Hochberg method \citep{citeulike:1042553} and a FDR of 0.10 was applied (this rate is the typical FDR threshold adopted during RNASeq analysis). We selected a log\(_{2}\) fold change null threshold of 0.25 (or a labeled:control fold enrichment of 1.19). DESeq2 was used to calculate the moderated log\(_{2}\) fold change of labeled:control proportion means and corresponding standard errors for the Wald-test (above). Fold change moderation allows for reliable ranking such that high variance and likely statistically insignificant fold changes are appropriately shrunk and subsequently ranked lower than they would be as unmoderated values. Those OTUs that exhibit a statistically significant increase in proportion in heavy fractions from \(^{15}\)N\(_{2}\)-labeled samples relative to corresponding controls have increased significantly in buoyant density in response to \(^{15}\)N\(_{2}\) treatment; a response that is expected for N\(_{2}\)-fixing organisms.

We also assessed the consistency of enrichment between time points by including the interaction of day and label:control in a DESeq2 generalized linear model. The interpretation of the interaction coefficient is the change in OTU enrichment per unit time. P-values for the interaction coefficient were adjusted for all OTUs that passed the sparsity threshold in the label versus control comparison (above) and we used the default null model that the coefficient equaled zero. Additionally, we assessed fold change between labeled and control gradient heavy fractions after pooling day 2 and day 4 data when treating the different time points as replicates. The same null model as the label versus control comparison (above) was used in this replicate analysis (log\(_{2}\) fold change in abundance between label and control is less than or equal to 0.25). We included all OTUs that passed sparsity based independent filtering at either day (above) for p-value adjustment in the replicate analysis.

Community and Sequence Analysis

BLAST searches were done with the “blastn” program from BLAST+ toolkit \citep{20003500} version 2.2.29+. Default parameters were always employed and the BioPython \citep{19304878} BLAST+ wrapper was used to invoke the blastn program. Pandas \citep{citeulike:11241428} and dplyr \citep{dplyr} were used to parse and manipulate BLAST output tables.

Principal coordinate ordinations depict the relationship between samples at each time point (day 2 and 4). Bray-Curtis distances were used as the sample distance metric for ordination. The Phyloseq \citep{24699258} wrapper for Vegan \citep{vegan} (both R packages) was used to compute sample values along principal coordinate axes. GGplot2 \citep{ggplot2} was used to display sample points along the first and second principal axes. Adonis tests \citep{Anderson_2001} were done with default number of permutations (1000).

Rarefaction curves were created using bioinformatics modules in the PyCogent Python package \citep{Knight_2007}. Parametric richness estimates were made with CatchAll using only the best model for total OTU estimates \citep{BUNGE_2010}.

All code to take raw sequencing data through the presented figures (including download and processing of literature environmental datasets) can be found at:

https://github.com/chuckpr/NSIP_data_analysis