Materials and Methods
Data
A publicly available LC-MS/MS dataset acquired with the UPS2-kit was used.
The Universal Proteomics Standard 2 contains 48 human proteins with a
molecular weight ranging from 6,000 to 83,000 Daltons. The proteins span
a dynamic range of concentrations from 0.5 to 50,000 femtomoles. The
data is publicly available on the PRIDE repository with identifier
PXD000331. The dataset contains raw data from the UPS2-kit alone, as well
as from the UPS2-kit in combination with other organisms such as
Mycoplasma pneumoniae, Drosophila melanogaster and Leptospira
interrogans. For the purpose of the manuscript, only
the raw data on the UPS2-kit was selected. In the experiment, the
proteins in the UPS2-kit were enzymatically cleaved into peptides using
trypsin. The peptide mixture was separated using LC for 120 minutes
prior to performing MS/MS with the LTQ Orbitrap Velos. The UPS2-kit was
measured in duplicate (files A11-12042.raw and A11-12043.raw). For more
specific information about the experiment, we refer to the original
article of Ahrné et al.
Database search
Both duplicates were analyzed with the FragPipe graphical user interface
(version 19.1), using the Thermo Fisher .RAW files as input. FragPipe
incorporates the MSFragger database search engine (version 3.7). The
default workflow was used to process the data, except for the following
adjustments. A precursor mass tolerance of ±10 parts per million (ppm)
and a fragment mass tolerance of ±5 ppm were specified.
Carbamidomethylation of cysteine was set as a fixed modification and
oxidation of methionine as a variable modification. Trypsin was
specified as the digestion enzyme with up to 2 missed cleavages.
MSBooster and Percolator were used to rescore the peptide-spectrum
matches (PSMs), which were filtered at a false discovery rate (FDR) of
0.01 using a reverse target-decoy approach. An FDR of 0.01 was selected
to ensure a high-quality benchmark dataset created from the MSFragger
identifications. The results were further investigated using R (version
4.3.0) and RStudio (version 2023.3.0.386).
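
To illustrate what filtering at an FDR of 0.01 with a target-decoy approach means, the following Python sketch estimates the FDR at each score cutoff as the ratio of decoy to target PSMs scoring at or above it. It is a simplified illustration under stated assumptions, not the procedure implemented in Percolator; the function name and the toy scores are hypothetical.

```python
def score_threshold_at_fdr(scores, is_decoy, fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= fdr.

    The FDR at a cutoff is estimated as (#decoys) / (#targets) among the
    PSMs scoring at or above that cutoff. Scores are assumed to be distinct
    and "higher is better". Returns None if no cutoff qualifies.
    """
    # Rank PSMs from best to worst score.
    ranked = sorted(zip(scores, is_decoy), key=lambda pair: pair[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, decoy in ranked:
        if decoy:
            decoys += 1
        else:
            targets += 1
        # Keep lowering the cutoff as long as the estimate stays within budget.
        if targets > 0 and decoys / targets <= fdr:
            cutoff = score
    return cutoff


# Hypothetical toy data: five PSMs, one of which matches a decoy sequence.
toy_scores = [10.0, 9.0, 8.0, 7.0, 6.0]
toy_is_decoy = [False, False, False, True, False]
print(score_threshold_at_fdr(toy_scores, toy_is_decoy, fdr=0.01))
```

A stricter FDR moves the cutoff towards higher scores, so fewer but more reliable PSMs are retained.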
Benchmark dataset construction
The general workflow is shown in Figure 1. The Thermo Fisher .RAW files
were converted into mzML format using MSConvert (version 3.0.23051).
Vendor-specific peak picking was enabled in MSConvert to centroid the
spectra. The data was processed further using a custom-written Python
(version 3.9) script. The Python bindings of OpenMS (version 2.7.0)
were used to process the mzML files, for example to select the MS1
spectra and to retrieve peak information and retention times. All PSMs from
MSFragger were used to construct the dataset. For each PSM, the
monoisotopic peak followed by up to 5 isotopic peaks was considered.
It should be noted that this was an arbitrary choice. To
construct the extracted ion chromatogram (XIC), the error margin on the
observed m/z for the PSM was set to 5 ppm, and we opted for a window of 5
seconds before the retention time of the PSM and 30 seconds after. The 5
second window before the retention time of the PSM was selected because 5
seconds was twice the maximum time between two MS1 spectra. A window of
30 seconds after the retention time of the PSM was
selected because Ahrné et al. enabled a dynamic exclusion of 30 seconds after
sampling a precursor ion. Hence, the peptide could
still be present in the subsequent MS1 spectra for 30
seconds without being sampled again. The extracted isotope distributions
with at least 2 peaks were compared with the theoretical isotope
distributions obtained using BRAIN (version 1.44.0) by computing the
spectral angle. The MS1 isotope distribution dataset
with additional metadata was stored as an Excel file. The algorithms and
code are available on
https://github.com/VilenneFrederique/MS1IsotopeDistributionsDatasetWorkflow.
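
The extraction windows and the comparison step described above can be sketched in Python as follows. This is a minimal illustration and not the code from the repository. The function names are hypothetical. The isotopic peak spacing (the 13C-12C mass difference divided by the charge) and the definition of the spectral angle as the arccosine of the cosine similarity are assumptions, and the intensities in the usage example are toy values, not BRAIN output.

```python
import math

# Assumed spacing between consecutive isotopic peaks (13C - 12C, in Da).
ISOTOPE_SPACING_DA = 1.0033548


def isotope_mz_targets(mono_mz, charge, n_isotopes=5):
    """m/z of the monoisotopic peak followed by up to n_isotopes isotopic peaks."""
    return [mono_mz + k * ISOTOPE_SPACING_DA / charge for k in range(n_isotopes + 1)]


def mz_window(mz, ppm=5.0):
    """Lower and upper m/z bounds for a symmetric tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return (mz - delta, mz + delta)


def rt_window(psm_rt, before=5.0, after=30.0):
    """Retention time bounds (seconds) around the retention time of the PSM."""
    return (psm_rt - before, psm_rt + after)


def spectral_angle(observed, theoretical):
    """Angle in radians between two intensity vectors (0 means identical shape)."""
    if len(observed) != len(theoretical):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm_obs = math.sqrt(sum(o * o for o in observed))
    norm_theo = math.sqrt(sum(t * t for t in theoretical))
    if norm_obs == 0 or norm_theo == 0:
        raise ValueError("intensity vectors must not be all zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_obs * norm_theo)))
    return math.acos(cosine)


# Hypothetical PSM: doubly charged precursor at m/z 500.0, identified at 1200 s.
targets = isotope_mz_targets(mono_mz=500.0, charge=2)
print([mz_window(mz) for mz in targets[:2]])
print(rt_window(psm_rt=1200.0))
print(spectral_angle([0.55, 0.30, 0.15], [0.50, 0.35, 0.15]))
```

Because the angle depends only on the shape of the two vectors, the observed and theoretical distributions do not need to be scaled to the same total intensity before comparison.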