Abstract
Continuous advances in LC-MS/MS proteomics over the past decades
have paved the way for transformative changes in medicine,
particularly in preventive and personalized healthcare.
Many new algorithms are evaluated on proteomes of unknown composition
and on databases of annotated MS2 spectra. For research focused on MS1
spectra, however, no comparable databases are available yet. To address
this gap, we propose a comprehensive workflow to
extract MS1 isotope distributions from spectra, and we validated this
workflow using a proteomics standard kit comprising known proteins at
varying concentrations, measured in duplicate. Our workflow incorporated a
database search using a state-of-the-art algorithm at a 1% false discovery
rate (FDR).
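The extraction step described above can be illustrated with a minimal Python sketch. It is not the workflow used in this study; it assumes a centroided MS1 spectrum given as sorted m/z and intensity lists, a monoisotopic precursor m/z and charge taken from a confidently identified PSM, an isotope spacing equal to the 13C-12C mass difference divided by the charge, a 10 ppm matching tolerance, and a rule that stops at the first missing isotope peak. The function name `extract_isotope_envelope` and all parameter values are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Assumed isotope spacing: mass difference between 13C and 12C, in daltons.
C13_C12_DELTA = 1.0033548

def extract_isotope_envelope(mzs, intensities, mono_mz, charge,
                             ppm_tol=10.0, max_peaks=6):
    """Collect intensities of the M, M+1, M+2, ... peaks of one precursor.

    Hypothetical helper: `mzs` must be sorted ascending and aligned with
    `intensities`; `mono_mz` and `charge` would come from an identified PSM.
    Extraction stops at the first isotope peak not found within `ppm_tol`.
    """
    envelope = []
    for k in range(max_peaks):
        # Expected m/z of the k-th isotope peak for this charge state.
        target = mono_mz + k * C13_C12_DELTA / charge
        tol = target * ppm_tol * 1e-6
        lo = bisect_left(mzs, target - tol)
        hi = bisect_right(mzs, target + tol)
        if lo == hi:
            break  # no peak in the tolerance window: end of the envelope
        # If several centroids fall in the window, keep the most intense one.
        envelope.append(max(intensities[lo:hi]))
    return envelope

if __name__ == "__main__":
    mzs = [500.0, 500.5017, 501.0034, 502.0]
    intensities = [100.0, 60.0, 20.0, 5.0]
    # prints [100.0, 60.0, 20.0]
    print(extract_isotope_envelope(mzs, intensities, mono_mz=500.0, charge=2))
```

Stopping at the first gap is one simple way to avoid picking up peaks from co-eluting species; a real workflow might instead tolerate gaps or score the envelope shape.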
Through this approach, we investigated the impact of protein
concentration on the probability of protein identification. Confidently
identified peptide-spectrum matches (PSMs) were then used to extract the
MS1 isotope distributions through the proposed workflow. A total of
138,111 MS1 isotope distributions were extracted. Isotope distributions
with two or more peaks were compared with their theoretical isotope
distributions using the spectral angle. Median spectral angles of 0.101
and 0.0992 were observed for the two samples; since a smaller angle
indicates closer agreement, this reflects high similarity.
The findings from this study were compiled into a dataset
which can potentially facilitate the development of novel tools with a
focus on MS1 data.
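The comparison of observed and theoretical isotope distributions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: theoretical distributions are computed by convolving approximate natural isotope abundances of C, H, N, O and S binned by nominal mass offset, only vectors with at least two peaks are scored (mirroring the two-peak criterion above), and the spectral angle is returned in radians. Some tools normalize it to [0, 1] by dividing by π/2; which convention the study used is not stated here. The function names and the elemental composition in the usage example are hypothetical.

```python
import math

# Approximate natural isotope abundances, indexed by nominal mass offset
# from the lightest isotope (offsets with no stable isotope are 0.0).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}

def convolve(a, b, n):
    """Convolve two abundance vectors, keeping only the first n offsets."""
    out = [0.0] * min(n, len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def theoretical_distribution(composition, n_peaks):
    """Abundances of M, M+1, ... for an elemental composition (hypothetical helper)."""
    dist = [1.0]
    for element, count in composition.items():
        for _ in range(count):  # one convolution per atom
            dist = convolve(dist, ISOTOPES[element], n_peaks)
    return dist

def spectral_angle(observed, theoretical):
    """Angle in radians between two intensity vectors; 0 means identical shape."""
    n = min(len(observed), len(theoretical))
    if n < 2:
        # A single peak always yields an angle of 0, so it carries no information.
        raise ValueError("need at least two isotope peaks")
    a, b = observed[:n], theoretical[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos_sim = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos_sim)

if __name__ == "__main__":
    observed = [100.0, 60.0, 20.0]  # e.g. an extracted three-peak envelope
    theo = theoretical_distribution({"C": 50, "H": 80, "N": 14, "O": 15}, len(observed))
    print(spectral_angle(observed, theo))
```

Because the spectral angle depends only on the direction of the intensity vectors, neither the observed nor the theoretical distribution needs to be normalized before comparison.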