Materials and Methods
Plant strains used
Plant growth conditions
RNA-seq: High resolution time series
Quant-seq: Low resolution time series
Bioinformatics analysis of RNA-seq data
The Illumina RNA-seq data were analysed using the same bioinformatics pipeline as in \cite{Ezer2017}: Trimmomatic-0.32 to trim reads \cite{Bolger2014}, TopHat to map reads to the TAIR10 annotated genome \cite{Trapnell2009}, HTseq-count to obtain raw counts after duplicates were removed \cite{Anders2015}, and Cufflinks to calculate Fragments Per Kilobase of transcript per Million mapped reads (FPKM), which were then converted into Transcripts Per Million (TPM) \cite{Trapnell2013}. The time points for the Lexogen Quant-seq experiment were selected using NITPicker \cite{Ezer2019}.
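The FPKM-to-TPM conversion can be illustrated with a minimal sketch. It assumes the standard relation $\mathrm{TPM}_i = \mathrm{FPKM}_i / \sum_k \mathrm{FPKM}_k \times 10^6$ within a single sample; the exact implementation used in the pipeline is not stated here, and the function name and example values are hypothetical.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    Assumes the standard relation TPM_i = FPKM_i / sum(FPKM) * 1e6,
    so that TPM values within a sample sum to one million.
    """
    total = sum(fpkm.values())
    if total == 0:
        raise ValueError("all FPKM values are zero; TPM is undefined")
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Hypothetical FPKM values for three genes in a single sample.
example_fpkm = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
example_tpm = fpkm_to_tpm(example_fpkm)
```

Because the rescaling is applied per sample, TPM values are comparable as proportions of the transcript pool within each sample.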
The Lexogen Quant-seq data were analysed using the Integrated Data Analysis Pipeline on the Bluebee® platform, which maps the reads to the genome using the STAR aligner \cite{Dobin2013} and counts reads using HTseq-count \cite{Anders2015}. Quant-seq expression values are reported as Reads Per Million (RPM), because Quant-seq does not require normalisation by gene length.
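As an illustration of the RPM scaling, the sketch below divides each gene's count by the sample total and multiplies by one million. It assumes the denominator is the sum of counted reads in the sample; the pipeline may instead use the total number of mapped reads. The function name and example counts are hypothetical.

```python
def counts_to_rpm(counts):
    """Scale raw read counts of one sample to Reads Per Million (RPM).

    Assumption: the library size is the sum of the counts supplied here.
    No gene-length term is applied.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads; RPM is undefined")
    return {gene: count / total * 1e6 for gene, count in counts.items()}


# Hypothetical raw counts for three genes in a single sample.
example_counts = {"geneA": 50, "geneB": 150, "geneC": 800}
example_rpm = counts_to_rpm(example_counts)
```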
Clustering
The 24-hour time series (Figure 1) was clustered using hierarchical clustering (hclust in R with default parameters).
Clustering of the high-resolution gene expression time series data (Figure 2) was performed using the CLUST algorithm \cite{Abu-Jamous2018}, using the settings recommended for RNA-seq TPM data in the reference manual (i.e. log2, Z-score and quantile normalisation of TPM values).
The clustering in Figure 2A was performed only on the 22\,$^{\circ}$C high-resolution time series data. The other gene expression time series in Figure 2A and 2B were drawn using the same gene order. The z-scores were calculated across all samples within each row of Figure 2A and Figure 2B.
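The row-wise z-score calculation can be sketched as follows. This is an illustration only: it assumes the sample standard deviation (denominator $n-1$, as in R's sd) and returns zeros for a constant row; neither choice is stated in the text, and the function name is hypothetical.

```python
from statistics import mean, stdev


def row_zscores(values):
    """Z-score one gene (one heatmap row) across all of its samples.

    Assumption: sample standard deviation (n - 1 denominator).
    A constant row has no spread, so it is mapped to zeros.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical expression values of one gene across three samples.
example_row = [2.0, 4.0, 6.0]
example_z = row_zscores(example_row)
```

Because the mean and standard deviation are computed over every sample in the row, values in different panels of the same row share one scale.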
Gene list curation for network inference
A number of criteria were used to generate a gene list for performing network inference.
Firstly, genes with low or no expression were excluded from the network inference analysis. The criteria for removal in each sample were: [1] $\sum_{t} \mathrm{RPM}_{jt} < 7 \times n^{T}_{j} \times 1.05$, where $\mathrm{RPM}_{jt}$ is the RPM value of gene $j$ at time point $t$ and $n^{T}_{j}$ is the total number of time points for gene $j$, and [2] genes that had $<5$ time points where $\mathrm{TPM}_{j} < 7$. Genes with TPM $= 0$ at one or more time points were also removed (a requirement of the dynGENIE3 package).
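A minimal sketch of two of these filters, criterion [1] and the zero-value filter, is given below for a single sample. Criterion [2] is not illustrated. The function names, the dictionary layout and the example values are hypothetical and are not taken from the original analysis code.

```python
def fails_total_expression(series, threshold=7.0, margin=1.05):
    """Criterion [1]: summed expression < threshold * n_timepoints * margin."""
    return sum(series) < threshold * len(series) * margin


def has_zero(series):
    """Zero-value filter: at least one time point with expression equal to 0."""
    return any(value == 0 for value in series)


def keep_genes(expression):
    """Return the genes that pass both filters, in input order."""
    return [
        gene
        for gene, series in expression.items()
        if not fails_total_expression(series) and not has_zero(series)
    ]


# Hypothetical expression series (four time points per gene).
example_expression = {
    "low": [1.0, 2.0, 1.5, 0.5],          # sum 5 < 7 * 4 * 1.05 = 29.4, removed
    "has_zero": [20.0, 0.0, 30.0, 25.0],  # contains a zero, removed
    "kept": [10.0, 12.0, 9.0, 11.0],      # sum 42 and no zeros, kept
}
kept = keep_genes(example_expression)
```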
The final gene list used for network inference was obtained from the following sources: (1) GO categories, (2) consensus cluster and (3) DE analysis. A gene only needed to meet one of these three criteria in order to be included in the analysis.
First, all genes annotated with GO categories of biological interest were included. Specifically, these were genes with Biological Process GO terms containing the words 'stress', 'light', 'auxin', 'abscisic', 'ethylene' or 'circadian', as well as all genes with the Molecular Function 'DNA binding'.
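The GO-based selection can be sketched as a keyword match. The sketch assumes case-insensitive substring matching on Biological Process term names and an exact match on the Molecular Function term; the matching rules actually used are not specified, and the annotation dictionary and gene names are hypothetical.

```python
BP_KEYWORDS = ("stress", "light", "auxin", "abscisic", "ethylene", "circadian")
MF_TERM = "DNA binding"


def selected_by_go(bp_terms, mf_terms):
    """True if any BP term contains a keyword or any MF term is 'DNA binding'.

    Assumptions: substring matching for BP keywords, exact matching for
    the MF term, both case-insensitive.
    """
    bp_hit = any(
        keyword in term.lower() for term in bp_terms for keyword in BP_KEYWORDS
    )
    mf_hit = any(term.lower() == MF_TERM.lower() for term in mf_terms)
    return bp_hit or mf_hit


# Hypothetical annotations: gene -> GO term names by ontology.
example_annotations = {
    "gene1": {"BP": ["response to oxidative stress"], "MF": []},
    "gene2": {"BP": ["photosynthesis"], "MF": ["DNA binding"]},
    "gene3": {"BP": ["cell wall organization"], "MF": ["kinase activity"]},
}
go_selected = [
    gene
    for gene, terms in example_annotations.items()
    if selected_by_go(terms["BP"], terms["MF"])
]
```

Substring matching is permissive: 'light' also matches terms such as 'response to red light', which may or may not be the intended behaviour.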
Secondly, we selected genes whose expression patterns were similar to those of other genes in the data set. We chose this criterion to avoid including many genes with extremely noisy expression patterns. Clustering of the time-course gene expression was performed for each sample (Col-0, Ler, prr579, phyAB, HsfQk) at 22\,$^{\circ}$C using the CLUST algorithm with recommended settings. Recall that the CLUST algorithm filters out genes that do not fit well into any cluster. Any gene that appeared in any of the clusters detected by the CLUST algorithm was included in the analysis.
Thirdly, differential expression analysis of genes was performed on the time-course data using the odp method from the edge package (R/Bioconductor) \cite{Storey2005}. Significant genes were those with an adjusted p-value (q-value) $< 0.05$. The following WT and mutant pairs at 22\,$^{\circ}$C were compared: Col-0 vs prr579, Col-0 vs hsfQk and Ler vs phyAphyB.
This analysis produced a list of 6795 genes, which was too large for dynGENIE3. The filtered gene list was therefore ranked by decreasing coefficient of variation (CV = sd/mean), and the top 1500 genes were chosen for network inference.
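The CV ranking can be sketched as below. This is an illustration under stated assumptions: the sample standard deviation is used (as R's sd does), the CV is computed over all expression values supplied for a gene, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sd / mean, using the sample standard deviation (assumption)."""
    return stdev(values) / mean(values)


def top_by_cv(expression, n_top=1500):
    """Rank genes by decreasing CV and return the first n_top gene names."""
    ranked = sorted(
        expression,
        key=lambda gene: coefficient_of_variation(expression[gene]),
        reverse=True,
    )
    return ranked[:n_top]


# Hypothetical expression values; all are positive, so the mean is non-zero.
example_expression = {
    "flat": [10.0, 10.0, 10.0],     # CV = 0
    "moderate": [8.0, 10.0, 12.0],  # CV = 2 / 10 = 0.2
    "variable": [2.0, 10.0, 18.0],  # CV = 8 / 10 = 0.8
}
top_two = top_by_cv(example_expression, n_top=2)
```

Ranking by CV rather than by standard deviation favours genes whose variation is large relative to their mean expression level.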
Network inference and analysis