Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics, which relies on digested peptides and protein inference. While significant advances have been made in TDP sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis remains one of the major bottlenecks in TDP. A key step toward robust data analysis is establishing an objective estimate of the proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which was established primarily for bottom-up proteomics. We present evidence that TDA-based FDR estimation may not work at the proteoform level because of an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimates. We argue that the conventional TDA-based FDR in proteoform identification is in fact a protein-level FDR rather than a proteoform-level FDR unless the precursor deconvolution error rate is taken into account. To address this issue, we propose a formula that corrects the proteoform-level FDR bias by combining the TDA-based FDR with the precursor deconvolution error rate.
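The abstract does not state the correction formula itself, so the Python sketch below is illustrative only. It computes the standard target-decoy FDR estimate (decoy hits divided by target hits) and then combines it with a precursor deconvolution error rate under the assumption that the two error sources are independent. The function names and the independence-based combination are assumptions made for this example and should not be read as the authors' formula.

```python
def tda_fdr(n_target: int, n_decoy: int) -> float:
    """Standard target-decoy FDR estimate: decoy hits / target hits."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if n_decoy < 0:
        raise ValueError("n_decoy must be non-negative")
    return min(1.0, n_decoy / n_target)


def combined_proteoform_fdr(fdr_tda: float, deconv_error_rate: float) -> float:
    """Hypothetical combination (assumption, not the published formula).

    An identification counts as a correct proteoform only if the protein
    match is correct AND the precursor mass was deconvolved correctly.
    Treating the two events as independent gives:
        FDR_proteoform = 1 - (1 - FDR_TDA) * (1 - error_rate)
    """
    for name, value in (("fdr_tda", fdr_tda),
                        ("deconv_error_rate", deconv_error_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return 1.0 - (1.0 - fdr_tda) * (1.0 - deconv_error_rate)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    fdr = tda_fdr(n_target=1000, n_decoy=10)
    print(f"TDA-based FDR: {fdr:.4f}")
    print(f"Combined estimate: {combined_proteoform_fdr(fdr, 0.05):.4f}")
```

Under this assumption, any non-zero deconvolution error rate pushes the combined estimate above the TDA-based value, which matches the abstract's argument that the TDA-based FDR alone understates proteoform-level error.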
Relative and absolute intensity-based protein quantification across cell lines, tissue atlases, and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity, and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, growing numbers of isobaric tandem mass tags (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking to an analogous method using reporter ion proteome abundance ranking with data from an experiment where LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values to TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation, and quantitation workflows.
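The substitution described above can be sketched in Python. iBAQ is commonly defined as the summed peptide intensity of a protein divided by its number of theoretically observable peptides; the sketch applies the same calculation to either MS1 feature intensities (LFQ) or reporter ion intensities (TMT) and then ranks proteins by abundance. The protein names, intensity values, and peptide counts are hypothetical, and the normalisation steps of a real workflow are omitted.

```python
from typing import Dict, Sequence


def ibaq(intensities: Sequence[float], n_theoretical_peptides: int) -> float:
    """Summed peptide intensity divided by the number of theoretically
    observable peptides for the protein."""
    if n_theoretical_peptides <= 0:
        raise ValueError("n_theoretical_peptides must be positive")
    return sum(intensities) / n_theoretical_peptides


def abundance_rank(values: Dict[str, float]) -> Dict[str, int]:
    """Rank proteins from most abundant (rank 1) to least abundant."""
    ordered = sorted(values, key=lambda protein: values[protein], reverse=True)
    return {protein: rank for rank, protein in enumerate(ordered, start=1)}


if __name__ == "__main__":
    # Hypothetical inputs: protein -> (peptide intensities, theoretical peptides)
    lfq_ms1 = {"PROT_A": ([100.0, 200.0, 300.0], 3),
               "PROT_B": ([50.0, 50.0], 5)}
    tmt_reporter = {"PROT_A": ([10.0, 20.0, 30.0], 3),
                    "PROT_B": ([8.0, 2.0], 5)}

    lfq_ibaq = {p: ibaq(i, n) for p, (i, n) in lfq_ms1.items()}
    tmt_ibaq = {p: ibaq(i, n) for p, (i, n) in tmt_reporter.items()}

    print(abundance_rank(lfq_ibaq))
    print(abundance_rank(tmt_ibaq))
```

Comparing the two rank dictionaries is one simple way to check whether reporter ion intensities reproduce the MS1-based abundance ordering.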
Proteins play an essential role in the vital biological processes governing cellular functions. Most proteins function as members of macromolecular machines, with the network of interacting proteins revealing the molecular mechanisms driving the formation of these complexes. Profiling the physiology-driven remodeling of these interactions within different contexts constitutes a crucial component to achieving a comprehensive systems-level understanding of interactome dynamics. Here, we apply co-fractionation mass spectrometry and computational modeling to quantify and profile the interactions of ~2,000 proteins in the bacterium Escherichia coli cultured under ten distinct culture conditions. The resulting quantitative co-elution patterns revealed large-scale condition-dependent interaction remodeling among protein complexes involved in diverse biochemical pathways in response to the unique environmental challenges. Network-level analysis highlighted interactome-wide biophysical properties and structural patterns governing interaction remodeling. Our results provide evidence of the local and global plasticity of the E. coli interactome along with a rigorous generalizable framework to define protein interaction specificity. We provide an accompanying interactive web application to facilitate exploration of these rewired networks.
Native mass spectrometry is a rapidly emerging technique for fast and sensitive structural analysis of protein constructs that maintains the higher-order structure of the protein. Coupling it with electromigrative separation techniques under native conditions enables the characterization of proteoforms and highly complex protein mixtures. In this review, we present an overview of current native CE-MS technology. First, the status of native separation conditions is described for capillary zone electrophoresis (CZE), affinity capillary electrophoresis (ACE), and capillary isoelectric focusing (CIEF), as well as their chip-based formats, including essential parameters such as electrolyte composition and capillary coatings. Further, the conditions required for native ESI-MS of (large) protein constructs, including instrumental parameters of QTOF and Orbitrap systems, and the requirements for native CE-MS interfacing are presented. On this basis, methods and applications of the different modes of native CE-MS are summarized and discussed in the context of biological, medical, and biopharmaceutical questions. Finally, key achievements are highlighted and remaining challenges are pointed out.
Fractionation of proteoforms is currently the most challenging topic in the field of protein purification. Taking the existence of proteoforms into account in experimental approaches is important not only in life science research in general but especially in the manufacturing of therapeutic proteins (TPs) such as recombinant therapeutic antibodies (mAbs). Some proteoforms of TPs have significantly decreased activity or even cause side effects. Identifying and removing proteoforms that differ from the main species, which has the desired activity, is challenging because the difference in atomic composition is often very small and their concentration relative to the main proteoform can be low. In this study, we demonstrate that sample displacement batch chromatography (SDBC) is an easy-to-handle, economical, and efficient method for fractionating proteoforms. A commercial ovalbumin fraction containing many ovalbumin proteoforms was used as a model sample. The most promising parameters for SDBC were determined by a screening approach and applied to a 10-segment fractionation of the ovalbumin with a cation exchange chromatography resin. Mass spectrometry of intact proteoforms was used to characterize the SDBC fractionation process. SDBC achieved a significant separation of different proteoforms.
Oreochromis mossambicus was acclimated to hypersalinity using multiple rates of salinity increase and durations of exposure to determine the rate-independent maximum salinity limit and the incipient lethal salinity. More than 3000 gill proteins were quantified simultaneously by quantitative proteomics to analyze molecular phenotypes associated with hypersalinity. For this purpose, a species- and tissue-specific data-independent acquisition (DIA) assay library of MS/MS spectra was created. From these DIA data, protein networks representing complex molecular phenotypes associated with salinity acclimation were generated. O. mossambicus was determined to have a wide “zone of resistance” from approximately 75 g/kg to 120 g/kg salinity, which is tolerated for a limited period with eventual loss of organismal function. Crossing the critical threshold salinity into the zone of resistance corresponds with blood osmolality increasing beyond 400 mOsm/kg, significantly reduced body condition factor, and cessation of feeding. Gill protein networks impacted by hypersalinity include increased energy metabolism, especially upregulation of electron transport chain proteins, and regulation of specific osmoregulatory proteins. Cytoskeletal, cell adhesion, and extracellular matrix proteins are enriched in networks that are sensitive to the critical salinity threshold. Network analysis of these patterns provides deep insight into specific mechanisms of energy homeostasis during salinity stress.
Multiomics approaches to studying systems biology are powerful tools that can elucidate changes at the genomic, transcriptomic, proteomic, and metabolomic levels within a particular cell type in response to an infection. These approaches are valuable for understanding the mechanisms behind disease pathogenesis, and specifically how the immune system responds to being challenged. With the emergence of the COVID-19 pandemic, the importance and utility of these tools have become more evident than ever, both for gaining a better understanding of the systems biology of the innate and adaptive immune response and for developing treatments and preventative measures for new and emerging pathogens that pose a threat to human health. In this review, we focus on the various state-of-the-art “omics” technologies used within the scope of innate immunity.
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces, previously developed by 13 groups, were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. Two combined predictors were then created: a simple consensus score generated using the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines were shown to recall the physiological dimers with significantly higher accuracy than the non-physiological set, lending support for the pertinence of our benchmark dataset. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
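The evaluation described above can be illustrated with a short Python sketch. The abstract does not specify how the consensus score was built, so the version here, a mean of min-max normalised scores, is an assumption chosen for simplicity. The ROC AUC is computed directly from its pairwise definition: the probability that a randomly chosen physiological complex scores higher than a randomly chosen non-physiological one, with ties counted as one half. All scores and labels are made up.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives
    (label 1) against negatives (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical consensus: mean of min-max normalised scores.
    Each inner sequence holds one scoring function's values for all
    complexes, with higher meaning 'more likely physiological'."""
    def normalise(xs: Sequence[float]) -> List[float]:
        lo, hi = min(xs), max(xs)
        if hi == lo:
            return [0.5] * len(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    normalised = [normalise(xs) for xs in score_lists]
    return [sum(column) / len(column) for column in zip(*normalised)]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = physiological, 0 = non-physiological
    score_a = [0.9, 0.2, 0.8, 0.1]
    score_b = [5.0, 4.0, 1.0, 2.0]
    print("AUC score_a:", roc_auc(score_a, labels))
    print("AUC score_b:", roc_auc(score_b, labels))
    print("AUC consensus:", roc_auc(consensus([score_a, score_b]), labels))
```

The pairwise AUC is quadratic in the number of complexes, which is acceptable for a benchmark of this size; a rank-based formulation would scale better for larger sets.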
HNF4α is a master regulator gene belonging to the nuclear receptor superfamily and is involved in regulating a wide range of critical biological processes in different organs. Structurally, the HNF4A locus is organized with two independent promoters and is subject to alternative splicing, producing twelve distinct isoforms. Little is known about the mechanisms each isoform uses to regulate transcription or about their biological impact, although some reports address these aspects. Proteomic analyses have identified proteins that interact with specific HNF4α isoforms. Identifying and validating these interactions and their role in co-regulating target gene expression are essential to better understand the role of this transcription factor in different biological processes and pathologies. This review addresses the historical origin of HNF4α isoforms and some of the main functions of the P1 and P2 isoform subgroups, and provides information on the most recent research into the nature and function of the proteins associated with each isoform in some biological contexts.