Protein Quantification
Mass spectra were processed using “MassPike”, a SEQUEST-based software
pipeline for quantitative proteomics developed by Professor Steven Gygi
and colleagues at Harvard Medical School. Within MassPike, MS spectra
were converted to mzXML format using an extractor built upon Thermo
Fisher’s RAW File Reader library (version 4.0.26). During extraction and
conversion, the standard mzXML format was augmented with additional
customisations that are specific to ion trap and Orbitrap mass
spectrometry and essential for TMT quantitation. These customisations
capture the ion injection time of each scan, the Fourier
transform-derived baseline and noise values calculated for every
Orbitrap scan, the isolation width of each scan type, scan event
numbers, and elapsed scan times.
Acquired mass spectra were searched against a combined protein sequence
database comprising human proteins, HCMV proteins, and protein
contaminants that might be introduced into samples. The human Uniprot
protein database was downloaded on 26 January 2017. The HCMV protein
database was assembled from the HCMV strain Merlin Uniprot database,
non-canonical human cytomegalovirus ORFs described by Stern-Ginossar et
al. (PMID 23180859), and a six-frame translation of the HCMV strain
Merlin genome filtered to include all potential ORFs of ≥8 residues
(delimited by stop-stop rather than requiring ATG-stop).
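The six-frame, stop-to-stop ORF filter can be illustrated with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1) and a plain nucleotide string for the genome. The function names are hypothetical, and discarding partial ORFs at the sequence ends (stretches not flanked by a stop codon on both sides) is an assumption rather than something the text specifies.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna: str, min_len: int = 8) -> list[tuple[str, int, str]]:
    """Return (strand, frame, peptide) for stop-to-stop ORFs >= min_len residues.

    No ATG start is required. Only stretches with a stop codon on both
    sides are kept, so partial ORFs at either end of the sequence are dropped.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            fragments = translate(seq[frame:]).split("*")
            # fragments[0] and fragments[-1] lack a stop codon on one side
            for peptide in fragments[1:-1]:
                if len(peptide) >= min_len:
                    orfs.append((strand, frame, peptide))
    return orfs
```

For example, `six_frame_orfs("TAA" + "GCT" * 8 + "TAA")` returns `[("+", 0, "AAAAAAAA")]`: eight alanine codons between two stops give an ORF at exactly the ≥8-residue minimum, and the other five frames contribute nothing.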
The database also included common contaminants, such as bovine serum
albumin and porcine trypsin, and annotated human protein contaminants
such as keratins. Searches were performed using a 20 ppm precursor ion
tolerance and a 1.0 Th fragment ion tolerance.
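The two tolerances behave differently: the precursor window is relative, so 20 ppm corresponds to ±0.02 Th at m/z 1000 but ±0.01 Th at m/z 500, whereas the 1.0 Th fragment window has the same width at every m/z. A minimal Python sketch of both checks follows; the function names are hypothetical and are not part of MassPike or SEQUEST.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz: float, theoretical_mz: float,
                      tol_ppm: float = 20.0) -> bool:
    # Relative tolerance: the absolute window widens as m/z increases.
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_matches(observed_mz: float, theoretical_mz: float,
                     tol_th: float = 1.0) -> bool:
    # Absolute tolerance in thomsons (m/z units), independent of m/z.
    return abs(observed_mz - theoretical_mz) <= tol_th
```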
TMT tags on lysine residues and peptide N termini (229.162932 Da) and
carbamidomethylation of cysteine residues (57.02146 Da) were set as
static modifications, while oxidation of methionine residues (15.99492
Da) was set as a variable modification.
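As a worked illustration of how these static and variable modifications enter a theoretical peptide mass, the sketch below adds one TMT tag per lysine plus one for the peptide N-terminus, one carbamidomethyl group per cysteine, and a caller-chosen number of oxidised methionines. The monoisotopic residue, water, and proton masses are standard values supplied here rather than taken from the text, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da); not given in the source text.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
TMT = 229.162932            # static: every K and the peptide N-terminus
CARBAMIDOMETHYL = 57.02146  # static: every C
OXIDATION = 15.99492        # variable: 0 up to the number of M residues


def peptide_mass(sequence: str, oxidized_met: int = 0) -> float:
    """Neutral monoisotopic mass including the modifications above."""
    if not 0 <= oxidized_met <= sequence.count("M"):
        raise ValueError("oxidized_met exceeds the number of methionines")
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    mass += TMT * (1 + sequence.count("K"))  # N-terminus plus each lysine
    mass += CARBAMIDOMETHYL * sequence.count("C")
    mass += OXIDATION * oxidized_met
    return mass


def precursor_mz(sequence: str, charge: int, oxidized_met: int = 0) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence, oxidized_met) + charge * PROTON) / charge
```

For the dipeptide GK, two TMT tags are added (N-terminus and lysine), giving a neutral mass of about 661.4528 Da and a 2+ precursor m/z of about 331.7337.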
Peptide identifications were accepted in rank order of cross-correlation
score (XCorr), because the proportion of correct peptide-spectral
matches (PSMs) decreases further down the ranking. A target-decoy
strategy was employed to control the quality of peptide identification
(Elias and Gygi, 2007). A decoy database was generated by reversing the
sequences of the composite protein database detailed above. Peptides
assigned from this decoy database were counted as false discoveries,
and peptide identification was terminated before the false discovery
rate reached 1%. Correct and incorrect spectral matches were
distinguished using linear discriminant analysis based on several
parameters, including XCorr, the normalized XCorr difference between the
top-ranked and second-ranked candidate peptides (ΔCn), precursor mass
error, and charge state.
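A minimal sketch of the target-decoy procedure: decoys are made by reversing each protein sequence, PSMs are ranked by score, and target PSMs are accepted down to the deepest rank at which the estimated FDR is still below the 1% limit. It assumes the simple decoys/targets FDR estimate, which is one common choice and not necessarily the exact estimator used in MassPike. The `score` field could hold XCorr or a combined linear discriminant score; the `REV_` prefix and the function names are hypothetical.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    peptide: str
    score: float    # XCorr, or a combined discriminant score
    is_decoy: bool  # True if the match came from the reversed database


def reversed_decoys(proteins: dict[str, str]) -> dict[str, str]:
    """Decoy database: every protein sequence reversed (hypothetical REV_ prefix)."""
    return {f"REV_{name}": seq[::-1] for name, seq in proteins.items()}


def filter_at_fdr(psms: list[PSM], max_fdr: float = 0.01) -> list[PSM]:
    """Return target PSMs down to the deepest rank whose estimated FDR
    (decoys / targets at or above that rank) is still below max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets < max_fdr:
            cutoff = rank
    return [p for p in ranked[:cutoff] if not p.is_decoy]
```

Walking the whole ranked list, rather than stopping at the first rank where the running FDR exceeds the limit, avoids discarding everything because of a single decoy near the top, where the running estimate is noisy.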
Proteins were assembled following the principle of parsimony to produce
the smallest set of proteins necessary to account for all observed
peptides; where peptides were shared between proteins, they were
assigned to the protein sequence with the greatest number of matching
unique peptides.
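Finding the smallest protein set that explains every peptide is a set-cover problem. The sketch below uses a greedy approximation (repeatedly choosing the protein that explains the most still-unexplained peptides) and then assigns each shared peptide to the selected protein with the most unique peptides. The greedy strategy, counting uniqueness within the selected set, and alphabetical tie-breaking are all assumptions for illustration; the text does not say how MassPike resolves these details.

```python
def assemble_proteins(protein_peptides: dict[str, set[str]]) -> dict[str, set[str]]:
    """Greedy parsimony, then shared-peptide assignment (illustrative only)."""
    uncovered = set().union(*protein_peptides.values())
    selected: list[str] = []
    while uncovered:
        # Greedy set cover: protein explaining the most unexplained peptides;
        # max() keeps the first of equal candidates, i.e. alphabetical order.
        best = max(sorted(protein_peptides),
                   key=lambda p: len(protein_peptides[p] & uncovered))
        selected.append(best)
        uncovered -= protein_peptides[best]

    # Map each peptide to the selected proteins that contain it.
    owners: dict[str, list[str]] = {}
    for prot in sorted(selected):
        for pep in protein_peptides[prot]:
            owners.setdefault(pep, []).append(prot)

    # Unique peptides: matched by exactly one selected protein.
    unique_count = dict.fromkeys(selected, 0)
    for prots in owners.values():
        if len(prots) == 1:
            unique_count[prots[0]] += 1

    # Shared peptides go to the protein with the most unique peptides.
    assignment: dict[str, set[str]] = {p: set() for p in selected}
    for pep, prots in owners.items():
        winner = max(prots, key=lambda p: unique_count[p])
        assignment[winner].add(pep)
    return {p: peps for p, peps in assignment.items() if peps}
```

With proteins A = {p1, p2, p3}, B = {p3, p4} and C = {p3}, the sketch keeps A and B, gives the shared peptide p3 to A (two unique peptides versus one for B), and drops C because its only peptide is already explained.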
Following fragmentation, each TMT tag releases a reporter ion of
characteristic mass, and these reporter ions were surveyed in the low
m/z region of the MS3 spectrum.
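Surveying the reporter region of an MS3 spectrum amounts to a windowed peak lookup around each reporter's theoretical m/z. The sketch below reads the rule given in this paragraph (the maximum intensity nearest the theoretical m/z) as taking the most intense centroid inside a small window. The ±0.003 Th window, the function name, and the sorted centroid-list input are assumptions, and the caller must supply the theoretical reporter m/z values for the TMT reagent in use.

```python
from bisect import bisect_left, bisect_right


def reporter_intensities(mzs: list[float], intensities: list[float],
                         reporter_mzs: dict[str, float],
                         tol: float = 0.003) -> dict[str, float]:
    """Most intense centroid within +/- tol Th of each theoretical reporter m/z.

    mzs must be sorted ascending and aligned with intensities; a channel with
    no peak in its window is reported as 0.0 (i.e. missing).
    """
    picked = {}
    for channel, target in reporter_mzs.items():
        lo = bisect_left(mzs, target - tol)
        hi = bisect_right(mzs, target + tol)
        picked[channel] = max(intensities[lo:hi], default=0.0)
    return picked
```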
For each reporter ion, the maximum intensity nearest to its theoretical
m/z was used. Proteins were quantified by summing TMT reporter ion
counts across all matching PSMs. For a TMT experiment using n TMT tags,
MS3 spectra with more than n-1 missing TMT channels, or with a combined
signal-to-noise ratio of less than 25n across all TMT reporter ions,
were considered to be of poor quality. PSMs with poor-quality or absent
MS3 spectra were excluded from quantitation. Protein quantitation values
were exported to Excel for further analysis. The Significance A method,
as implemented in Perseus version 1.5.1.6, was used to estimate the
p-value that each ratio was significantly different from 1.
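Significance A (Cox and Mann, 2008) scores each log ratio against robust, asymmetric spreads: the 15.87th, 50th and 84.13th percentiles of all log ratios stand in for the median and ±1 standard deviation, a ratio's distance from the median is divided by the spread on its own side to give z, and p = ½·erfc(z/√2). The Benjamini-Hochberg step-up adjustment applied next is sketched alongside it. This is a plain Python illustration, not Perseus code; in particular, the linear percentile interpolation is an assumption and may differ from Perseus 1.5.1.6.

```python
import math


def percentile(sorted_vals: list[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 1]) of pre-sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def significance_a(log_ratios: list[float]) -> list[float]:
    """One-sided outlier p-value for each log ratio (Significance A style)."""
    s = sorted(log_ratios)
    r_minus, r0, r_plus = (percentile(s, q) for q in (0.1587, 0.5, 0.8413))
    if r_plus == r0 or r0 == r_minus:
        raise ValueError("log ratios have no spread on one side of the median")
    pvals = []
    for r in log_ratios:
        # Scale the distance from the median by the spread on the same side.
        z = (r - r0) / (r_plus - r0) if r >= r0 else (r0 - r) / (r0 - r_minus)
        pvals.append(0.5 * math.erfc(z / math.sqrt(2)))
    return pvals


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A ratio sitting exactly at the median gets z = 0 and p = 0.5, and p shrinks as a ratio moves further out on either side.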
P-values were adjusted for multiple hypothesis testing using the
Benjamini-Hochberg method (Cox and Mann, 2008).
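The MS3 quality filter and protein-level summing described earlier in this paragraph can be sketched as follows for an n-channel experiment. Per-channel reporter values (for example, signal-to-noise ratios) are passed as a list, a channel with a value of zero counts as missing, and a PSM with no MS3 spectrum is passed as `None`; these representations and the function names are assumptions. The thresholds follow the text: more than n-1 missing channels, or a combined signal-to-noise ratio below 25n, marks a spectrum as poor quality.

```python
def is_good_ms3(reporter_sn: list[float], sn_per_channel: float = 25.0) -> bool:
    """Quality rule for an n-channel TMT spectrum, as stated in the text:
    poor if more than n-1 channels are missing (value 0) or if the combined
    signal-to-noise across all channels is below 25n."""
    n = len(reporter_sn)
    missing = sum(1 for v in reporter_sn if v <= 0)
    return missing <= n - 1 and sum(reporter_sn) >= sn_per_channel * n


def quantify_proteins(psms, sn_per_channel: float = 25.0) -> dict[str, list[float]]:
    """Sum per-channel reporter values over each protein's passing PSMs.

    psms yields (protein, reporter_sn); reporter_sn is None when the PSM
    has no MS3 spectrum.
    """
    totals: dict[str, list[float]] = {}
    for protein, reporter_sn in psms:
        if reporter_sn is None or not is_good_ms3(reporter_sn, sn_per_channel):
            continue  # poor or absent MS3 spectrum: excluded from quantitation
        current = totals.get(protein, [0.0] * len(reporter_sn))
        totals[protein] = [a + b for a, b in zip(current, reporter_sn)]
    return totals
```

With two channels the combined threshold is 50, so `[30.0, 40.0]` passes, `[10.0, 20.0]` fails on signal-to-noise, and `[0.0, 0.0]` fails on the missing-channel rule.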