A historic challenge for shotgun proteomics has been the requirement for high-quality, simple, and nonredundant curated protein sequences in small .fasta text files. Due to the intrinsic informatic challenges and time required to assemble these files, proteomics has struggled to expand beyond the confines of a few model organisms. When considering post-translational modifications that may or may not be present on a specific peptide sequence, these factors inevitably compound. A study on how mangos continue to ripen on the shelf may not be the first thing you'd think of as proof of a scientific discipline shedding historic limitations. However, the study by Bautiste-Valle et al. may be just that. These authors present a quantitative comparison of both peptide and glycopeptide alterations through the complexity of the fruit ripening process, and in this we see the present state of a field that no longer needs to wait on genomics to obtain deep mechanistic insights.
In proteomics, fast, efficient and highly reproducible sample preparation is of utmost importance, particularly in view of fast scanning mass spectrometers enabling analyses of large sample series. To address this need, we have developed the web application MassSpecPreppy that operates on the open science OT-2 liquid handling robot from Opentrons. This platform can prepare up to 96 samples at once, performing tasks like BCA protein concentration determination, sample digestion with normalization, reduction/alkylation and peptide elution into vials or loading specified peptide amounts onto Evotips in an automated and flexible manner. The performance of the developed workflows using MassSpecPreppy was compared with standard manual sample preparation workflows. The BCA assay experiments revealed an average recovery of 101.3% (SD: ±7.82%) for the MassSpecPreppy workflow, while the manual workflow had a recovery of 96.3% (SD: ±9.73%). The species mix used in the evaluation experiments showed that 94.5% of protein groups for OT-2 digestion and 95% for manual digestion passed the significance thresholds, with comparable peptide-level coefficients of variation. These results demonstrate that MassSpecPreppy is a versatile and scalable platform for automated sample preparation, producing injection-ready samples for proteomics research.
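The recovery and coefficient-of-variation figures quoted above can be illustrated with a minimal sketch. The function names and the example concentrations below are hypothetical and are not taken from MassSpecPreppy; the sketch only shows the standard definitions (recovery as measured over expected, CV as sample standard deviation over mean).

```python
from statistics import mean, stdev


def percent_recovery(measured, expected):
    """Recovery of each measurement relative to its expected value, in percent."""
    return [100.0 * m / e for m, e in zip(measured, expected)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical BCA readouts (mg/mL) against known standard concentrations.
expected = [0.5, 1.0, 2.0]
measured = [0.52, 0.98, 2.04]

recoveries = percent_recovery(measured, expected)
mean_recovery = mean(recoveries)   # average recovery across standards
sd_recovery = stdev(recoveries)    # spread of the recovery values
```

Reporting the mean recovery together with its standard deviation, as the abstract does, separates systematic bias (mean far from 100%) from imprecision (large SD).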
Contaminants derived from consumables, reagents, and sample handling often negatively affect LC-MS data acquisition. In proteomics experiments, they can markedly reduce identification performance, reproducibility, and quantitative robustness. Here, we introduce a data analysis workflow combining MS1 feature extraction in Skyline with HowDirty, an R-markdown-based tool, that automatically generates an interactive report on the molecular contaminant level in LC-MS data sets. To facilitate the interpretation of the results, the HTML report is self-contained and self-explanatory, including plots that can be easily interpreted. The R package HowDirty is available from https://github.com/DavidGZ1/HowDirty. To demonstrate a showcase scenario for the application of HowDirty, we assessed the impact of ultrafiltration units from different providers on sample purity after filter-assisted sample preparation (FASP) digestion. This allowed us to select the filter units with the lowest contamination risk. Notably, the filter units with the lowest contaminant levels showed higher reproducibility regarding the number of peptides and proteins identified. Overall, HowDirty enables the efficient evaluation of sample quality covering a wide range of common contaminant groups that typically impair LC-MS analyses, facilitating taking corrective or preventive actions to minimize instrument downtime.
For decades, molecular biologists have been uncovering the mechanics of biological systems. Efforts to bring their findings together have led to the development of multiple databases and information systems that capture and present pathway information in a computable network format. Concurrently, the advent of modern omics technologies has empowered researchers to systematically profile cellular processes across different modalities. Numerous algorithms, methodologies, and tools have been developed to use prior knowledge networks in the analysis of omics datasets. Interestingly, it has been repeatedly demonstrated that the source of prior knowledge can greatly impact the results of a given analysis. For these methods to be successful it is paramount that their selection of prior knowledge networks is amenable to the data type and the computational task they aim to accomplish. Here we present a five-level framework that broadly describes network models in terms of their scope, level of detail, and ability to inform causal predictions. To contextualize this framework, we review a handful of network-based omics analysis methods at each level, while also describing the computational tasks they aim to accomplish.
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional tools are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Structural characterization of protein interactions is essential for our ability to understand and modulate physiological processes. Computational approaches to modeling of protein complexes provide structural information that far exceeds capabilities of the existing experimental techniques. Protein structure prediction in general, and prediction of protein interactions in particular, has been revolutionized by the rapid progress in Deep Learning techniques. The work of Schweke et al. presents a community-wide study of an important problem of distinguishing physiological protein-protein complexes/interfaces (experimentally determined or modeled) from non-physiological ones. The authors designed and generated a large benchmark set of physiological and non-physiological homodimeric complexes, and evaluated a large set of scoring functions, as well as AlphaFold predictions, on their ability to discriminate the non-physiological interfaces. The problem of separating physiological interfaces from non-physiological ones is very difficult, largely due to the lack of a clear distinction between the two categories in a crowded environment inside a living cell. Still, the ability to identify key physiologically significant interfaces in the variety of possible configurations of a protein-protein complex is important. The study presents a major data resource and methodological development in this important direction for molecular and cellular biology.
Although top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, numerous improvements are possible in the area of open science, including the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include, e.g., increased data sharing practices and availability of tailored open data standards. Additionally, the field would benefit from the development of open analysis workflows that can enable, e.g., reuse of public datasets, something that is increasingly common in other proteomics fields. We present an open and modular platform for the analysis and visualisation of TD proteomics data called TopDownApp. It can be used as a flexible analysis platform, through the use of a common workflow engine, common data formats for input/output, and software containerisation. It can also serve as a tool for visual inspection through its simple setup. As a key point, it can also be used as a development platform for new tools through the use of Python, a modular design, software containerisation and common data formats. TopDownApp is open source and freely available at: https://github.com/mwalzer/TopDownApp.
Previous studies have established the association of sex with gene and protein expression. This study investigated the association of sex with the abundance of endogenous urinary peptides, using capillary electrophoresis coupled to mass spectrometry datasets from 2008 healthy individuals and patients with type II diabetes, divided into one discovery and two validation cohorts. Statistical analysis using the Mann-Whitney test, adjusted for multiple testing, revealed 143 sex-associated peptides in the discovery cohort. Of these, 90 peptides were associated with sex in at least one of the validation cohorts and showed agreement in their regulation trends across all cohorts. The 90 sex-associated peptides were fragments of 29 parental proteins. Comparison with previously published transcriptomics data demonstrated that the genes encoding 16 of these parental proteins had sex-biased expression. The 143 sex-associated peptides were combined into a support vector machine-based classifier that could discriminate males from females in two independent sets of healthy individuals and patients with type II diabetes, with an AUC of 89% and 81%, respectively. Collectively, the urinary peptidome contains multiple sex-associated differences, which may enable a better understanding of sex-biased molecular mechanisms and the development of more accurate diagnostic, prognostic or predictive classifiers for each individual sex.
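The Mann-Whitney test and the AUC mentioned above are closely related: for a single feature, the U statistic divided by the number of cross-group pairs equals the AUC obtained when that feature is used directly as a score. The sketch below illustrates this for one peptide with hypothetical abundance values. It is not the study's support vector machine classifier, which combines 143 peptides, and it omits p-values and the multiple-testing adjustment.

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y: count of pairs with x > y, ties count as 0.5."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u


def auc_from_u(x, y):
    """AUC of a single feature: U divided by the number of cross-group pairs."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical abundances of one urinary peptide in two groups.
group_a = [5.1, 6.3, 7.0, 6.8]
group_b = [4.2, 5.5, 4.9, 6.0]

u_stat = mann_whitney_u(group_a, group_b)
auc = auc_from_u(group_a, group_b)
```

An AUC of 0.5 corresponds to no separation between the groups, and values near 0 or 1 indicate strong separation in one direction or the other.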
Trans-activation response DNA binding protein of 43 kDa (TDP-43) regulates a great variety of cellular processes in the nucleus and cytosol. In addition, a defined subset of neurodegenerative diseases is characterized by nuclear depletion of TDP-43 as well as cytosolic mislocalization and aggregation. To perform its diverse functions, TDP-43 can associate with different ribonucleoprotein complexes. Combined with transcriptomics, MS interactome studies have unveiled associations between TDP-43 and the spliceosome machinery, polysomes and RNA granules. Moreover, the highly dynamic, low-valency interactions regulated by its low-complexity domain call for innovative proximity labeling methodologies. In addition to protein partners, the analysis of post-translational modifications showed that they may play a role in the nucleocytoplasmic shuttling, RNA binding, liquid-liquid phase separation and protein aggregation of TDP-43. Here we review the various TDP-43 ribonucleoprotein complexes characterized so far, how they contribute to the diverse functions of TDP-43, and the roles of post-translational modifications. Further understanding of the fluid dynamic properties of TDP-43 in ribonucleoprotein complexes, RNA granules, and self-assemblies will advance the understanding of RNA processing in cells and perhaps help to develop novel therapeutic approaches for TDPopathies.
Most proteins function by forming complexes within a dynamic interconnected network that underlies various biological mechanisms. To systematically investigate such interactomes, high-throughput techniques including CF-MS have been developed to capture, identify, and quantify protein-protein interactions (PPIs) at large scale. Compared to other techniques, CF-MS allows the global identification and quantification of native protein complexes in one setting, without genetic manipulation and overexpression. Furthermore, quantitative CF-MS can potentially elucidate the distribution of a protein in multiple co-elution features, informing the stoichiometries and dynamics of a target protein complex. In this issue, Youssef et al. (Proteomics 2023, XX, XXXX-XXXX) combined multiplex CF-MS and an in-house algorithm to study the dynamics of the PPI network for Escherichia coli grown under ten different conditions. While the results demonstrated that most proteins remained stable, the authors were able to detect disrupted interactions that were growth condition-specific. Further bioinformatics analyses also revealed biophysical properties and structural patterns that govern such a response.
Over the past two decades, there has been increasing research into the molecular composition and function of small extracellular vesicles in the central nervous system. This is due in part to the recognition that small extracellular vesicles likely contribute to the pathogenesis of neurological diseases such as Alzheimer's disease, but also to an understanding that small extracellular vesicles are a source of potential biomarkers. Small extracellular vesicles carry specific cargo that reflects their biogenesis and cellular origins, including protein, RNA and lipid. While the protein and RNA content of small extracellular vesicles in central nervous system diseases has been studied extensively, our understanding of the lipidome of small extracellular vesicles in the central nervous system is still in its infancy. Lipids play a significant role in maintaining central nervous system structure and function, and the dysregulation of lipid metabolism is known to occur in many neurological disorders, including Alzheimer's disease. Here we review what is currently known about lipid dyshomeostasis in Alzheimer's disease. We propose that small extracellular vesicle lipids may provide insight into the pathophysiology and progression of Alzheimer's disease and other neurological disorders and, perhaps in the future, aid in disease monitoring and detection.
Cell-derived extracellular vesicles (EVs) are evolutionarily conserved secretory organelles that, based on their molecular composition, are important intercellular signaling regulators. At least three classes of circulating EVs are known based on mechanism of biogenesis: exosomes (sEVs/Exos), microparticles (lEVs/MPs) and shed midbody remnants (sMB-Rs). sEVs/Exos are of endosomal pathway origin, lEVs/MPs arise from plasma membrane blebbing, and sMB-Rs arise from symmetric cytokinetic abscission. Here, we isolate sEVs/Exos, lEVs/MPs and sMB-Rs secreted from human isogenic primary (SW480) and metastatic (SW620) colorectal cancer (CRC) cell lines in milligram quantities for label-free MS/MS-based proteomic profiling. Purified EVs revealed selective packaging of exosomal protein markers in SW480/SW620-sEVs/Exos and metabolic enzymes in SW480/SW620-lEVs/MPs, while centralspindlin complex proteins, nucleoproteins, splicing factors, RNA granule proteins, translation-initiation factors, and mitochondrial proteins selectively traffic to SW480/SW620-sMB-Rs. Collectively, we identify 39 human cancer-associated genes in EVs: 17 associated with SW480-EVs and 22 with SW620-EVs. We highlight oncogenic receptors/transporters selectively enriched in sEVs/Exos (EGFR/FAS in SW480-sEVs/Exos and MET, TGFBR2, ABCB1 in SW620-sEVs/Exos). Interestingly, MDK, STAT1, and TGM2 are selectively enriched in SW480-sMB-Rs, and ADAM15 in SW620-sMB-Rs. Our study reveals that sEVs/Exos, lEVs/MPs and sMB-Rs have distinct protein signatures that open potential diagnostic avenues of distinct types of EVs for clinical utility.
Relative and absolute intensity-based protein quantification across cell lines, tissue atlases, and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity, and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, growing numbers of isobaric tandem mass tags (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking to an analogous method using reporter ion proteome abundance ranking with data from an experiment where LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values to TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation, and quantitation workflows.
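The iBAQ framework referred to above divides a protein's summed peptide intensity by its number of theoretically observable peptides. The sketch below is an illustration under stated assumptions, not the workflow used in the study: the cleavage rule (after K or R, not before P, no missed cleavages) and the 6 to 30 residue length window are assumed conventions, and the function names, sequence and intensities are hypothetical. For the reporter ion variant described in the abstract, the same function would be called with reporter ion intensities in place of MS1 feature intensities.

```python
import re


def tryptic_peptides(sequence):
    """In silico digest: cleave after K or R unless followed by P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def observable_peptide_count(sequence, min_len=6, max_len=30):
    """Number of fully tryptic peptides within the assumed length window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def ibaq(intensities, sequence):
    """Summed intensity divided by the count of theoretically observable peptides."""
    n = observable_peptide_count(sequence)
    if n == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return sum(intensities) / n


# Hypothetical protein sequence and peptide-level intensities.
sequence = "ACDEFGKLMNPQRPSTVWYK"
peptides = tryptic_peptides(sequence)
value = ibaq([1000.0, 3000.0], sequence)
```

Dividing by the observable peptide count corrects for the fact that longer proteins yield more peptides, which is what makes the values comparable across proteins for abundance ranking.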
Proteins play an essential role in the vital biological processes governing cellular functions. Most proteins function as members of macromolecular machines, with the network of interacting proteins revealing the molecular mechanisms driving the formation of these complexes. Profiling the physiology-driven remodeling of these interactions within different contexts constitutes a crucial component to achieving a comprehensive systems-level understanding of interactome dynamics. Here, we apply co-fractionation mass spectrometry and computational modeling to quantify and profile the interactions of ~2,000 proteins in the bacterium Escherichia coli cultured under ten distinct culture conditions. The resulting quantitative co-elution patterns revealed large-scale condition-dependent interaction remodeling among protein complexes involved in diverse biochemical pathways in response to the unique environmental challenges. Network-level analysis highlighted interactome-wide biophysical properties and structural patterns governing interaction remodeling. Our results provide evidence of the local and global plasticity of the E. coli interactome along with a rigorous generalizable framework to define protein interaction specificity. We provide an accompanying interactive web application to facilitate exploration of these rewired networks.
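In co-fractionation experiments, proteins that belong to the same complex are expected to show similar elution profiles across fractions. One common way to score that similarity is the Pearson correlation between two profiles, sketched below with hypothetical intensities. This is an assumption for illustration only and is not the computational model used by the authors.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length elution profiles."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("profiles must have the same length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no co-elution information
    return cov / sqrt(var_a * var_b)


# Hypothetical intensities of three proteins across seven fractions.
protein_a = [0.0, 1.0, 5.0, 9.0, 4.0, 1.0, 0.0]
protein_b = [0.0, 2.0, 10.0, 18.0, 8.0, 2.0, 0.0]  # same peak, twice the signal
protein_c = [9.0, 4.0, 1.0, 0.0, 0.0, 1.0, 5.0]    # elutes in other fractions

score_ab = pearson(protein_a, protein_b)
score_ac = pearson(protein_a, protein_c)
```

Comparing such scores for the same protein pair across culture conditions is one simple way to flag candidate interactions that are gained or lost, although a real analysis would need replicate handling and statistical calibration.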
Native mass spectrometry is a rapidly emerging technique for fast and sensitive structural analysis of protein constructs, maintaining the protein higher order structure. The coupling with electromigrative separation techniques under native conditions enables the characterization of proteoforms and highly complex protein mixtures. In this review, we present an overview of current native CE-MS technology. First, the status of native separation conditions is described for capillary zone electrophoresis (CZE), affinity capillary electrophoresis (ACE), and capillary isoelectric focusing (CIEF), as well as their chip-based formats, including essential parameters such as electrolyte composition and capillary coatings. Further, conditions required for native ESI-MS of (large) protein constructs, including instrumental parameters of QTOF and Orbitrap systems, as well as requirements for native CE-MS interfacing are presented. On this basis, methods and applications of the different modes of native CE-MS are summarized and discussed in the context of biological, medical, and biopharmaceutical questions. Finally, key achievements are highlighted and concluded, while remaining challenges are pointed out.
Fractionation of proteoforms is currently the most challenging topic in the field of protein purification. Accounting for the existence of proteoforms in experimental approaches is important not only in life science research in general but especially in the manufacturing of therapeutic proteins (TPs) such as recombinant therapeutic antibodies (mAbs). Some of the proteoforms of TPs have significantly decreased actions or even cause side effects. The identification and removal of proteoforms that differ from the main species, which has the desired action, is challenging because the difference in atomic composition is often very small and their concentration relative to the main proteoform can be low. In this study we demonstrate that sample displacement batch chromatography (SDBC) is an easy-to-handle, economical and efficient method for fractionating proteoforms. As a model sample, a commercial ovalbumin fraction containing many ovalbumin proteoforms was used. The most promising parameters for the SDBC were determined by a screening approach and applied for a 10-segment fractionation of the ovalbumin with cation exchange chromatography resin. Mass spectrometry of intact proteoforms was used for characterizing the SDBC fractionation process. By SDBC, a significant separation of different proteoforms was obtained.
Acclimations of Oreochromis mossambicus to hypersalinity were conducted with multiple rates of salinity increase and durations of exposure to determine the rate-independent maximum salinity limit and the incipient lethal salinity. Simultaneous quantitative proteomics of over 3000 gill proteins was performed to analyze molecular phenotypes associated with hypersalinity. For this purpose, a species- and tissue-specific data-independent acquisition (DIA) assay library of MS/MS spectra was created. From these DIA data, protein networks representing complex molecular phenotypes associated with salinity acclimation were generated. O. mossambicus was determined to have a wide “zone of resistance” from approximately 75 g/kg to 120 g/kg salinity, which is tolerated for a limited period with eventual loss of organismal function. Crossing the critical threshold salinity into the zone of resistance corresponds with blood osmolality increasing beyond 400 mOsm/kg, significantly reduced body condition factor, and cessation of feeding. Gill protein networks impacted by hypersalinity include increased energy metabolism, especially upregulation of electron transport chain proteins, and regulation of specific osmoregulatory proteins. Cytoskeletal, cell adhesion, and extracellular matrix proteins are enriched in networks that are sensitive to the critical salinity threshold. Network analysis of these patterns provides deep insight into specific mechanisms of energy homeostasis during salinity stress.