This pilot experiment examines whether a loss of muscle proteostasis occurs in people with obesity and whether endurance exercise positively influences either the abundance profile or turnover rate of proteins in this population. Men with (n = 3) or without (n = 4) obesity were recruited and underwent a 14-d measurement protocol of daily deuterium oxide (D2O) consumption and serial biopsies of vastus lateralis muscle. Men with obesity then completed 10 weeks of high-intensity interval training (HIIT), comprising 3 sessions per week of cycle ergometer exercise with 1 min intervals at 100% maximum aerobic power interspersed with 1 min recovery periods. The number of intervals per session progressed from 4 to 8, and during weeks 8-10 the 14-d measurement protocol was repeated. Proteomic analysis detected 352 differences (p < 0.05, false discovery rate < 5%) in protein abundance and 19 differences (p < 0.05) in protein turnover between men with and without obesity, including components of the ubiquitin-proteasome system. HIIT altered the abundance of 53 proteins and increased the turnover rate of 22 proteins (p < 0.05) and tended to benefit proteostasis by increasing muscle protein turnover rates. Obesity and insulin resistance are associated with compromised muscle proteostasis, which may be partially restored by endurance exercise.
A historic challenge for shotgun proteomics has been the requirement for high-quality, simple, and nonredundant curated protein sequences in small .fasta text files. Due to the intrinsic informatic challenges and time required to assemble these files, proteomics has struggled to expand beyond the confines of a few model organisms. When considering post-translational modifications that may or may not be present on a specific peptide sequence, these factors inevitably compound. A study on how mangos continue to ripen on the shelf may not be the first thing you'd think of as proof of a scientific discipline shedding historic limitations. However, the study by Bautiste-Valle et al. may be just that. These authors present a quantitative comparison of both peptide and glycopeptide alterations through the complexity of the fruit ripening process, and in this we see the present state of a field that no longer needs to wait on genomics to obtain deep mechanistic insights.
In proteomics, fast, efficient and highly reproducible sample preparation is of utmost importance, particularly in view of fast scanning mass spectrometers enabling analyses of large sample series. To address this need, we have developed the web application MassSpecPreppy that operates on the open science OT-2 liquid handling robot from Opentrons. This platform can prepare up to 96 samples at once, performing tasks like BCA protein concentration determination, sample digestion with normalization, reduction/alkylation and peptide elution into vials or loading specified peptide amounts onto Evotips in an automated and flexible manner. The performance of the developed workflows using MassSpecPreppy was compared with standard manual sample preparation workflows. The BCA assay experiments revealed an average recovery of 101.3% (SD: ±7.82%) for the MassSpecPreppy workflow, while the manual workflow had a recovery of 96.3% (SD: ±9.73%). The species mix used in the evaluation experiments showed that 94.5% of protein groups for OT-2 digestion and 95% for manual digestion passed the significance thresholds, with comparable peptide-level coefficients of variation. These results demonstrate that MassSpecPreppy is a versatile and scalable platform for automated sample preparation, producing injection-ready samples for proteomics research.
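The recovery and coefficient-of-variation figures quoted above are simple summary statistics. As a minimal sketch of how such values can be computed, assuming hypothetical concentration readings and helper names (`recovery_percent`, `coefficient_of_variation`) that are not part of MassSpecPreppy:

```python
import statistics

def recovery_percent(measured, expected):
    """Measured concentration as a percentage of the expected concentration."""
    return 100.0 * measured / expected

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical BCA readings (mg/mL) for a standard expected at 1.00 mg/mL.
measured = [0.97, 1.02, 1.04]
expected = 1.00

recoveries = [recovery_percent(m, expected) for m in measured]
mean_recovery = statistics.mean(recoveries)  # average recovery in percent
sd_recovery = statistics.stdev(recoveries)   # spread of the recoveries in percent

# Hypothetical peptide intensities across three replicate digests.
peptide_cv = coefficient_of_variation([9.0, 10.0, 11.0])
```

The sample standard deviation (n - 1 denominator) is used here; whether the published values use the sample or population form is not stated in the abstract.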
Contaminants derived from consumables, reagents, and sample handling often negatively affect LC-MS data acquisition. In proteomics experiments, they can markedly reduce identification performance, reproducibility, and quantitative robustness. Here, we introduce a data analysis workflow combining MS1 feature extraction in Skyline with HowDirty, an R-markdown-based tool that automatically generates an interactive report on the molecular contaminant level in LC-MS data sets. To facilitate the interpretation of the results, the HTML report is self-contained and self-explanatory, including plots that can be easily interpreted. The R package HowDirty is available from https://github.com/DavidGZ1/HowDirty. To showcase an application of HowDirty, we assessed the impact of ultrafiltration units from different providers on sample purity after filter-assisted sample preparation (FASP) digestion. This allowed us to select the filter units with the lowest contamination risk. Notably, the filter units with the lowest contaminant levels showed higher reproducibility regarding the number of peptides and proteins identified. Overall, HowDirty enables the efficient evaluation of sample quality, covering a wide range of common contaminant groups that typically impair LC-MS analyses, and thereby facilitates corrective or preventive actions to minimize instrument downtime.
For decades, molecular biologists have been uncovering the mechanics of biological systems. Efforts to bring their findings together have led to the development of multiple databases and information systems that capture and present pathway information in a computable network format. Concurrently, the advent of modern omics technologies has empowered researchers to systematically profile cellular processes across different modalities. Numerous algorithms, methodologies, and tools have been developed to use prior knowledge networks in the analysis of omics datasets. Interestingly, it has been repeatedly demonstrated that the source of prior knowledge can greatly impact the results of a given analysis. For these methods to be successful it is paramount that their selection of prior knowledge networks is amenable to the data type and the computational task they aim to accomplish. Here we present a five-level framework that broadly describes network models in terms of their scope, level of detail, and ability to inform causal predictions. To contextualize this framework, we review a handful of network-based omics analysis methods at each level, while also describing the computational tasks they aim to accomplish.
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional tools are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Structural characterization of protein interactions is essential for our ability to understand and modulate physiological processes. Computational approaches to modeling of protein complexes provide structural information that far exceeds capabilities of the existing experimental techniques. Protein structure prediction in general, and prediction of protein interactions in particular, has been revolutionized by the rapid progress in Deep Learning techniques. The work of Schweke et al. presents a community-wide study of an important problem of distinguishing physiological protein-protein complexes/interfaces (experimentally determined or modeled) from non-physiological ones. The authors designed and generated a large benchmark set of physiological and non-physiological homodimeric complexes, and evaluated a large set of scoring functions, as well as AlphaFold predictions, on their ability to discriminate the non-physiological interfaces. The problem of separating physiological interfaces from non-physiological ones is very difficult, largely due to the lack of a clear distinction between the two categories in a crowded environment inside a living cell. Still, the ability to identify key physiologically significant interfaces in the variety of possible configurations of a protein-protein complex is important. The study presents a major data resource and methodological development in this important direction for molecular and cellular biology.
Cancer-associated cachexia is a wasting syndrome that results in dramatic loss of whole-body weight, predominantly due to loss of skeletal muscle mass. It has been established that cachexia-inducing cancer cells secrete proteins and extracellular vesicles (EVs) that can induce muscle atrophy. Though several studies have examined these cancer cell-derived factors, targeting some of these components has shown little or no clinical benefit. To develop new therapies, understanding of the dysregulated proteins and signalling pathways that regulate catabolic gene expression during muscle wasting is essential. Here, we sought to examine the effect of conditioned media (CM) that contain secreted factors and EVs from cachexia-inducing C26 colon cancer cells on C2C12 myotubes using mass spectrometry-based label-free quantitative proteomics. We identified significant changes in the protein profile of C2C12 cells upon exposure to C26-derived CM. Functional enrichment analysis revealed enrichment of proteins associated with inflammation, mitochondrial dysfunction, muscle catabolism, ROS production, and ER stress in CM-treated myotubes. Furthermore, strong downregulation of muscle structural integrity and development and/or regenerative pathways was observed. Together, these enriched proteins in atrophied muscle could be utilized as potential muscle wasting markers and the dysregulated biological processes could be employed for therapeutic benefit in cancer-induced muscle wasting.
Due to their oftentimes ambiguous nature, phosphopeptide positional isomers can present challenges in bottom-up mass spectrometry-based workflows as search engine scores alone are often not enough to confidently distinguish them. Additional scoring algorithms can remedy this by providing confidence metrics in addition to these search results, reducing ambiguity. Here we describe challenges to interpreting phosphoproteomics data and review several different approaches to determine sites of phosphorylation for both data-dependent and data-independent acquisition-based workflows. Finally, we discuss open questions regarding neutral losses, gas-phase rearrangement, and false localization rate estimation experienced by both types of acquisition workflows and best practices for managing ambiguity in phosphosite determination.
CCOC is a relatively rare subtype of ovarian cancer with a high degree of resistance to standard chemotherapy. Little is known about the underlying molecular mechanisms, and it remains a challenge to predict its prognosis after chemotherapy. We analyzed the proteome of CCOC tissue samples from two independent cohorts using DIA-MS. A total of 8697 proteins were characterized in the first cohort (H1 cohort, 32 patients, 35 FFPE samples) and 9409 proteins in the second cohort (H2 cohort, 24 patients, 28 FF samples). After bioinformatics analysis, we narrowed our focus to 15 proteins significantly correlated with RFS in both cohorts. These proteins are mainly involved in DNA damage response, extracellular matrix, and mitochondrial metabolism. We further developed a 13-protein model to predict the prognosis of patients with CCOC in the H2 cohort, and validated the model in the H1 cohort using both DIA and PRM data. Finally, we verified the modulated pathways from our CCOC proteomic dataset in several published CCOC transcriptome and proteome datasets. Taken together, this study presents a CCOC proteomic data resource and a promising 13-protein panel which could potentially predict the recurrence and survival of CCOC.
Although top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, numerous improvements are possible in the area of open science, including the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include, for example, increased data sharing practices and the availability of tailored open data standards. Additionally, the field would benefit from the development of open analysis workflows that can enable, for example, reuse of public datasets, something that is increasingly common in other proteomics fields. We present an open and modular platform for the analysis and visualisation of TD proteomics data called TopDownApp. It can be used as a flexible analysis platform, through the use of a common workflow engine, common data formats for input/output, and software containerisation. It can also serve as a tool for visual inspection through its simple setup. As a key point, it can also be used as a development platform for new tools through the use of Python, a modular design, software containerisation and common data formats. TopDownApp is open source and freely available at: https://github.com/mwalzer/TopDownApp.
We propose an updated approach for approximating the isotope distribution of average peptides given their monoisotopic mass. Our methodology involves in-silico cleavage of the entire UNIPROT database of Human reviewed proteins using Trypsin, generating a theoretical peptide dataset. The isotope distribution is computed using BRAIN. We apply a compositional data modelling strategy that utilizes an additive log-ratio transformation for the isotope probabilities followed by a penalized spline regression. Furthermore, due to the impact of the number of Sulphur atoms on the course of the isotope distribution, we develop separate models for peptides containing zero up to five Sulphur atoms. Additionally, we propose three methods to estimate the number of Sulphur atoms based on an observed isotope distribution. The performance of the spline models and the Sulphur prediction approaches is evaluated using a mean squared error and a modified Pearson’s χ² goodness-of-fit measure on an experimental UPS2 data set. Our analysis reveals that the variability in spectral accuracy contributes more to the errors than the approximation of the theoretical isotope distribution by our proposed average peptide model. Moreover, we find that the accuracy of predicting the number of Sulphur atoms based on the observed isotope distribution is limited by measurement accuracy.
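The additive log-ratio (ALR) step mentioned above maps a vector of isotope probabilities, which must sum to one, onto unconstrained coordinates that a regression model can fit freely. A minimal sketch follows, assuming the last isotope peak as the reference component and hypothetical probabilities; the function names `alr` and `alr_inverse` are illustrative and this is not the authors' implementation, which additionally fits penalized splines on these coordinates.

```python
import math

def alr(probs):
    """Additive log-ratio transform: log of each component relative to the last one."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]

def alr_inverse(coords):
    """Map ALR coordinates back to probabilities that sum to 1."""
    expd = [math.exp(c) for c in coords] + [1.0]  # reference component maps to exp(0) = 1
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical isotope probabilities for one peptide (monoisotopic peak first).
isotopes = [0.55, 0.30, 0.11, 0.04]

coords = alr(isotopes)           # three unconstrained coordinates
recovered = alr_inverse(coords)  # back-transformed probabilities
```

Because the inverse transform always returns positive values that sum to one, any prediction made on the ALR scale corresponds to a valid isotope distribution.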
Previous studies have established the association of sex with gene and protein expression. This study investigated the association of sex with the abundance of endogenous urinary peptides, using capillary electrophoresis coupled to mass spectrometry datasets from 2008 healthy individuals and patients with type II diabetes, divided into one discovery and two validation cohorts. Statistical analysis using the Mann-Whitney test, adjusted for multiple testing, revealed 143 sex-associated peptides in the discovery cohort. Of these, 90 peptides were associated with sex in at least one of the validation cohorts and showed agreement in their regulation trends across all cohorts. The 90 sex-associated peptides were fragments of 29 parental proteins. Comparison with previously published transcriptomics data demonstrated that the genes encoding 16 of these parental proteins had sex-biased expression. The 143 sex-associated peptides were combined into a support vector machine-based classifier that could discriminate males from females in two independent sets of healthy individuals and patients with type II diabetes, with an AUC of 89% and 81%, respectively. Collectively, the urinary peptidome contains multiple sex-associated differences, which may enable a better understanding of sex-biased molecular mechanisms and the development of more accurate diagnostic, prognostic or predictive classifiers for each individual sex.
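The abstract does not state which multiple-testing adjustment was applied to the per-peptide Mann-Whitney p-values. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure and uses hypothetical p-values; the function name `benjamini_hochberg` is likewise an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per peptide.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

With these hypothetical inputs only the first peptide stays below 0.05 after adjustment, although three raw p-values were below that threshold.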
Trans-activation response DNA binding protein of 43 kDa (TDP-43) regulates a great variety of cellular processes in the nucleus and cytosol. In addition, a defined subset of neurodegenerative diseases is characterized by nuclear depletion of TDP-43 as well as cytosolic mislocalization and aggregation. To perform its diverse functions, TDP-43 can associate with different ribonucleoprotein complexes. Combined with transcriptomics, MS interactome studies have unveiled associations between TDP-43 and the spliceosome machinery, polysomes and RNA granules. Moreover, the highly dynamic, low-valency interactions regulated by its low-complexity domain call for innovative proximity labeling methodologies. In addition to protein partners, the analysis of post-translational modifications showed that they may play a role in the nucleocytoplasmic shuttling, RNA binding, liquid-liquid phase separation and protein aggregation of TDP-43. Here we review the various TDP-43 ribonucleoprotein complexes characterized so far, how they contribute to the diverse functions of TDP-43, and the roles of post-translational modifications. Further understanding of the fluid dynamic properties of TDP-43 in ribonucleoprotein complexes, RNA granules, and self-assemblies will advance the understanding of RNA processing in cells and perhaps help to develop novel therapeutic approaches for TDPopathies.
Most proteins function by forming complexes within a dynamic interconnected network that underlies various biological mechanisms. To systematically investigate such interactomes, high-throughput techniques including CF-MS have been developed to capture, identify, and quantify protein-protein interactions (PPIs) at large scale. Compared to other techniques, CF-MS allows the global identification and quantification of native protein complexes in one setting, without genetic manipulation and overexpression. Furthermore, quantitative CF-MS can potentially elucidate the distribution of a protein in multiple co-elution features, informing the stoichiometries and dynamics of a target protein complex. In this issue, Youssef et al. (Proteomics 2023, XX, XXXX-XXXX) combined multiplex CF-MS and an in-house algorithm to study the dynamics of the PPI network for Escherichia coli grown under ten different conditions. While the results demonstrated that most proteins remained stable, the authors were able to detect disrupted interactions that were growth condition-specific. Further bioinformatics analyses also revealed biophysical properties and structural patterns that govern such a response.
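Co-elution in CF-MS is commonly assessed by comparing the abundance profiles of two proteins across fractions. The sketch below shows one simple similarity measure, the Pearson correlation, on hypothetical profiles; it is an illustration only and is not the in-house algorithm used by Youssef et al., and the variable and function names are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical abundances across six chromatographic fractions.
protein_a = [0.0, 2.0, 8.0, 3.0, 0.0, 0.0]
protein_b = [0.0, 1.0, 4.0, 1.5, 0.0, 0.0]  # same peak as protein_a, half the abundance
protein_c = [0.0, 0.0, 0.0, 1.0, 6.0, 2.0]  # elutes in later fractions

score_ab = pearson(protein_a, protein_b)  # co-eluting pair
score_ac = pearson(protein_a, protein_c)  # pair with separated peaks
```

A high score between two profiles is consistent with, but does not prove, membership in the same complex; published CF-MS pipelines typically combine several such features.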
Over the past two decades, there has been increasing research into the molecular composition and function of small extracellular vesicles in the central nervous system. This is due in part to the recognition that small extracellular vesicles likely contribute to the pathogenesis of neurological diseases such as Alzheimer's disease, but also to an understanding that small extracellular vesicles are a source of potential biomarkers. Small extracellular vesicles carry specific cargo that reflects their biogenesis and cellular origins, including protein, RNA and lipid. While the protein and RNA content of small extracellular vesicles in central nervous system diseases has been studied extensively, our understanding of the lipidome of small extracellular vesicles in the central nervous system is still in its infancy. Lipids play a significant role in maintaining central nervous system structure and function, and the dysregulation of lipid metabolism is known to occur in many neurological disorders, including Alzheimer's disease. Here we review what is currently known about lipid dyshomeostasis in Alzheimer's disease. We propose that small extracellular vesicle lipids may provide insight into the pathophysiology and progression of Alzheimer's disease and other neurological disorders and, perhaps in the future, aid in disease monitoring and detection.
Cell-derived extracellular vesicles (EVs) are evolutionarily conserved secretory organelles that, based on their molecular composition, are important intercellular signaling regulators. At least three classes of circulating EVs are known based on mechanism of biogenesis: exosomes (sEVs/Exos), microparticles (lEVs/MPs) and shed midbody remnants (sMB-Rs). sEVs/Exos are of endosomal pathway origin, lEVs/MPs arise from plasma membrane blebbing, and sMB-Rs arise from symmetric cytokinetic abscission. Here, we isolate sEVs/Exos, lEVs/MPs and sMB-Rs secreted from human isogenic primary (SW480) and metastatic (SW620) colorectal cancer (CRC) cell lines in milligram quantities for label-free MS/MS-based proteomic profiling. Purified EVs revealed selective packaging of exosomal protein markers in SW480/SW620-sEVs/Exos and metabolic enzymes in SW480/SW620-lEVs/MPs, while centralspindlin complex proteins, nucleoproteins, splicing factors, RNA granule proteins, translation-initiation factors, and mitochondrial proteins selectively traffic to SW480/SW620-sMB-Rs. Collectively, we identify 39 human cancer-associated genes in EVs: 17 associated with SW480-EVs and 22 with SW620-EVs. We highlight oncogenic receptors/transporters selectively enriched in sEVs/Exos (EGFR/FAS in SW480-sEVs/Exos and MET, TGFBR2, ABCB1 in SW620-sEVs/Exos). Interestingly, MDK, STAT1, and TGM2 are selectively enriched in SW480-sMB-Rs, and ADAM15 in SW620-sMB-Rs. Our study reveals that sEVs/Exos, lEVs/MPs and sMB-Rs have distinct protein signatures that open potential diagnostic avenues for distinct types of EVs in clinical use.