Endometrial cancer is the most prevalent gynaecological cancer globally. Its association with obesity and metabolic diseases is a key aetiology, increasingly among younger females. Early diagnosis and improved treatment decisions are crucial for these women whose outcomes could be improved by discovering new biomarkers. We took a new approach to extracellular vesicle (EV) biomarker discovery - profiling the proteome of enriched EVs isolated directly from frozen biobanked endometrial cancers. Nine tissue pools, each generating collagenase-digested tissue and matched small EVs, were analysed using label-free proteomics. Three clinical subgroups: Endometrioid low BMI (body mass index), Endometrioid high BMI, and Serous, irrespective of BMI, were compared to identify shared secreted proteins, proteins associated with histological subtype, and proteins related to BMI. EVs were enriched for common EV markers and large secreted proteins. Cell lysates were enriched in mitochondrial and blood proteins. EV protein profiles were most different between the high BMI subgroup and the others, highlighting a significant influence of comorbidities on the intra-tumoural EV secretome. Proteins differentially abundant between subgroups in tissues were strikingly not also differential in the matched EVs. This work has identified secreted proteins implicated in the complex pathophysiology of endometrial cancer and pinpointed candidate biomarkers for diagnosis.
The information on the microbiome’s human pathways and active members that can affect SARS-CoV-2 susceptibility and pathogenesis in the salivary proteome is very scarce. Here, we studied samples collected from April to June 2020 from unvaccinated patients. We compared 10 infected and hospitalized patients with severe (n=5) and moderate (n=5) Coronavirus Disease (COVID-19) with 10 uninfected individuals, including Non-COVID but susceptible individuals (n=5) and Non-COVID and non-susceptible healthcare workers with repeated high-risk exposures (n=5). By performing high-throughput proteomic profiling in saliva samples, we detected 226 unique differentially expressed (DE) human proteins between groups (q-value ≤0.05) out of 2721 unambiguously identified proteins (false discovery rate ≤1%). Major differences were observed between the Non-COVID vs the non-susceptible groups. Bioinformatics analysis of DE proteins revealed human proteomic signatures related to inflammatory responses, central cellular processes, and antiviral activity associated with saliva of SARS-CoV-2 infected patients (p-value ≤0.0004). Discriminatory biomarker signatures from human saliva include cystatins, protective molecules present in the oral cavity, calprotectins, involved in cell cycle progression, and histones, related to nucleosome functions. The expression level of two human proteins related to protein transport in the cytoplasm, named DYNC1 (p-value, 0.0021) and MAPRE1 (p-value, 0.047), correlated with angiotensin-converting enzyme 2 (ACE2) plasma activity. Finally, the proteomes of microorganisms present in the saliva samples showed 4 main microbial functional features related to ribosome functioning that are overrepresented in the infected group. Our study explores potential candidates involved in pathways implicated in SARS-CoV-2 susceptibility although further studies in larger cohorts will be necessary.
The attachment of trophectodermal cells (outer layer of the embryo) to endometrial cells and their subsequent invasion of the underlying matrix are critical stages of embryo implantation during successful pregnancy establishment. Extracellular vesicles (EVs) have been implicated in embryo-maternal crosstalk, capable of reprogramming endometrial cells towards a pro-implantation signature and phenotype. However, challenges associated with EV yield and direct loading of biomolecules limit their therapeutic potential. We have previously established generation of cell-derived nanovesicles (NVs) from human trophectodermal cells (hTSCs) and their capacity to reprogram endometrial cells to enhance adhesion and blastocyst outgrowth. Here, we employed a rapid NV loading strategy to encapsulate potent implantation molecules such as HB-EGF (NVHBEGF). We show these loaded NVs elicit EGFR-mediated effects in recipient endometrial cells, activating kinase phosphorylation sites that modulate their activity (AKT S124/129, MAPK1 T185/Y187), and downstream signalling pathways and processes (AKT signal transduction, GTPase activity). Importantly, they enhanced target cell attachment and invasion. The phosphoproteomics and proteomics approach highlights NVHBEGF-mediated short-term signalling patterns and long-term reprogramming capabilities on endometrial cells, which functionally enhance trophectodermal-endometrial interactions. This proof-of-concept study demonstrates the feasibility of enhancing the potency of NVs in the context of embryo attachment and establishment.
For the ex-situ conservation of giant pandas, both collecting and preserving semen are important methods. The seminal plasma is rich in nutrients and bioactive substances, such as proteins, carbohydrates, lipids, amino acids, and hormones, which play an important role in the reproduction and reproductive health of the species. This is the first study to analyze the seminal plasma proteins of giant pandas through proteomics, identifying 1125 proteins. These proteins are related to protein turnover, translation, and metabolism. The seminal plasma proteins of giant pandas were then compared to those of humans, pigs, and sheep, with many unique proteins found in giant panda samples. Among these proteins, WD40 repeat-containing proteins were identified and have been implicated in sperm function and fertility. Understanding the composition and function of proteins in the giant panda seminal plasma proteome can provide valuable insights into their reproductive biology and help develop strategies to improve their reproductive success in captivity, which is essential for giant panda conservation.
Extracellular vesicles (EVs) in urine are a promising source for developing non-invasive biomarkers. However, urine concentration and content are highly variable and dynamic, and actual urine collection and handling are often nonideal, which presents an enormous challenge for urine-based biomarker studies. Furthermore, patients such as those with prostate diseases have challenges in sample collection due to difficulties in holding urine until designated time points. Here, we simulated the actual situation of clinical sample collection to examine the stability of EVs in urine under different circumstances, including urine collection time and temporary storage temperature, as well as the stability of EVs in daily urine sampling under different diet conditions. EVs were isolated by functionalized EVTRAP magnetic beads, and analyzed by nanoparticle tracking analysis (NTA), Western blotting, electron microscopy, and mass spectrometry (MS). EVs in urine remained relatively stable during temporary storage for 6 hours at room temperature and for 12 hours at 4°C, while significant fluctuations were observed in EV amounts from urine samples collected at different time points from the same individuals, especially under certain diets. Sample normalization with creatinine reduced the coefficient of variation (CV) among EV samples from 17% to approximately 6%, and facilitated downstream MS analyses. Finally, we applied these findings to screen for potential biomarker panels in prostate cancer by data-independent acquisition (DIA) MS, and present recommendations that can facilitate biomarker discovery under nonideal handling conditions.
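The effect of creatinine normalization on the coefficient of variation can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not data from the study, and the study's exact normalization procedure may differ.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def normalize_by_creatinine(ev_signal, creatinine):
    """Divide each EV measurement by the creatinine level of the same sample."""
    return [s / c for s, c in zip(ev_signal, creatinine)]


# Hypothetical values for illustration only (not data from the study).
ev_signal = [120.0, 80.0, 100.0, 150.0]   # EV amount per sample, arbitrary units
creatinine = [12.0, 8.5, 10.0, 14.5]      # creatinine in the same urine samples

raw_cv = coefficient_of_variation(ev_signal)
norm_cv = coefficient_of_variation(normalize_by_creatinine(ev_signal, creatinine))
print(f"CV before normalization: {raw_cv:.1f}%")
print(f"CV after normalization:  {norm_cv:.1f}%")
```

When EV amount tracks urine concentration, dividing by creatinine removes most of the dilution-driven spread, which is why the CV drops in this toy example.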
Cancer remains one of the most complex and challenging diseases facing mankind. To address the need for a personalized treatment approach for particularly complex tumor cases, molecular tumor boards (MTBs) have been initiated. MTBs are interdisciplinary teams that perform in-depth molecular diagnostics to cooperatively advise on the best therapeutic strategy. Current molecular diagnostics are routinely performed on the transcriptomic and genomic levels, aiming for the identification of tumor-driving mutations. However, these approaches can only partially capture the actual phenotype as well as the molecular key players of tumor growth and progression. Thus, direct investigation of the expressed proteins and activated signaling pathways provides complementary information on the tumor-driving molecular characteristics of the tissue. Technological advancements in mass-spectrometry-based proteomics enable the robust, rapid, and sensitive detection of thousands of proteins in minimal sample amounts, paving the way for clinical proteomics and the probing of oncogenic signaling activity. Therefore, proteomics is currently being integrated into molecular diagnostics within MTBs and holds promising potential in aiding tumor classification and identifying personalized treatment strategies. This review introduces MTBs, describes current state-of-the-art clinical proteomics and its potential in precision oncology, and highlights the benefits of multi-omic data integration.
High-throughput proteomics is an effective methodology for identifying a variety of virulence factors of pathogens. Proteomic data are commonly evaluated against annotated sequences present in publicly available database repositories. A proteogenomic approach can be used if annotated sequences are not available or to identify novel proteins/peptides. However, a single genome is commonly utilized in proteomic and proteogenomic analyses. We pose the question of whether utilizing a number of different genome assemblies of a bacterial pathogen would be beneficial. Here, we used previously obtained shot-gun label-free nano-LC‒MS/MS data of the exoprotein fraction of four reference ERIC I–IV genotypes of Paenibacillus larvae and evaluated them against publicly available annotated sequences (from NCBI-protein, RefSeq, UniProt) together with an array of protein sequences generated using a six-frame direct translation of 15 genomic assemblies available in GenBank. The wide search through 18 database components reliably identified 453 protein hits. UpSet analysis categorized the hits into 50 groups based on which databases successfully identified each protein. The relatively high variability in successful identification among the genome assemblies facilitated the mining of markers based on uniqueness and contrasting results prior to considering proteome differences. Data evaluation provided novel and interesting markers that can be studied further.
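The grouping that underlies an UpSet-style analysis can be sketched as follows: each protein hit is assigned to the exact combination of databases that identified it. This is a minimal illustration under stated assumptions; the function name, protein labels, and database labels are hypothetical, and the sketch does not reproduce the authors' pipeline or plotting.

```python
from collections import defaultdict


def group_by_database_combination(hits):
    """Group protein hits by the exact set of databases that identified them.

    hits maps a protein label to an iterable of database labels.
    Returns a dict mapping frozenset(databases) -> list of protein labels.
    """
    groups = defaultdict(list)
    for protein, databases in hits.items():
        groups[frozenset(databases)].append(protein)
    return dict(groups)


# Hypothetical hits for illustration only.
hits = {
    "protA": ["UniProt", "RefSeq"],
    "protB": ["RefSeq", "UniProt"],
    "protC": ["assembly_1"],
    "protD": ["UniProt", "RefSeq", "assembly_1"],
}

groups = group_by_database_combination(hits)
for combination, proteins in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(sorted(combination), proteins)
```

Hits that fall into small groups tied to only one or a few assemblies are the kind of candidates that a uniqueness-based marker search would examine first.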
High-grade gliomas (HGGs) are the most malignant and difficult-to-treat brain tumors. Despite several studies on glioma pathobiology, there is no comparative proteomics study of high-grade and low-grade gliomas that uncovers the mechanism behind the aggressive mesenchymal behaviour of HGGs. In this study, tissue samples of high-grade and low-grade gliomas were processed for label-free quantification (LFQ) using HR-LC MS/MS. The analysis identified 140 differentially expressed proteins; GSEA and protein-protein interaction analysis showed over-expression of pathways such as ECM remodelling, focal adhesion, EMT, and glycan biosynthesis in HGG. The key proteins were validated using a multiple reaction monitoring experiment. ECM glycoproteins including fibronectin, fibrinogens, collagens, and vitronectin, along with mesenchymal markers such as vimentin and TGF-β, were over-expressed in HGGs. The over-expression of oligosaccharyltransferase (OST) in HGG indicates its role in enhanced expression of glycoproteins. In-silico molecular docking with catalytic subunits of OST identified two small-molecule inhibitors, Irinotecan and Entrectinib, as potential candidates to target OST. We propose that OST plays a major role in tumor metastasis by promoting EMT and could be used as a potential target to suppress glioma metastasis. Finally, the proteins identified in this study need further clinical research to validate their prognostic value as protein markers.
Enzymatic catalysis is one of the fundamental processes that drive the dynamic landscape of post-translational modifications (PTMs), expanding the structural and functional diversity of proteins. Here, we assessed enzyme specificity using a top-down ion mobility spectrometry (IMS) and tandem mass spectrometry (MS/MS) workflow. We successfully applied trapped IMS (TIMS) to investigate site-specific N-ε-acetylation of lysine residues of full-length histone H4 catalyzed by histone lysine acetyltransferase KAT8. We demonstrate that KAT8 exhibits a preference for N-ε-acetylation of residue K16, while also installing N-ε-acetyl groups on residues K5 and K8 as the first degree of acetylation. Achieving TIMS resolving power values of up to 300, we fully separated mono-acetylated regioisomers (H4K5ac, H4K8ac, and H4K16ac). Each of these regioisomers produces unique MS/MS fragment ions, enabling estimation of their individual mobility distributions and the exact localization of the N-ε-acetylation sites. This study highlights the potential of top-down TIMS-MS/MS for conducting enzymatic assays at the intact protein level and, more generally, for separation and identification of isomeric proteoforms and precise PTM localization.
Cancer-associated cachexia is a wasting syndrome that results in dramatic loss of whole-body weight, predominantly due to loss of skeletal muscle mass. It has been established that cachexia-inducing cancer cells secrete proteins and extracellular vesicles (EVs) that can induce muscle atrophy. Though several studies have examined these cancer-cell-derived factors, targeting some of these components has shown little or no clinical benefit. To develop new therapies, understanding of the dysregulated proteins and signalling pathways that regulate catabolic gene expression during muscle wasting is essential. Here, we sought to examine the effect of conditioned media (CM) that contain secreted factors and EVs from cachexia-inducing C26 colon cancer cells on C2C12 myotubes using mass spectrometry-based label-free quantitative proteomics. We identified significant changes in the protein profile of C2C12 cells upon exposure to C26-derived CM. Functional enrichment analysis revealed enrichment of proteins associated with inflammation, mitochondrial dysfunction, muscle catabolism, ROS production, and ER stress in CM-treated myotubes. Furthermore, strong downregulation of muscle structural integrity and development and/or regenerative pathways was observed. Together, these enriched proteins in atrophied muscle could be utilized as potential muscle wasting markers, and the dysregulated biological processes could be exploited for therapeutic benefit in cancer-induced muscle wasting.
Due to their oftentimes ambiguous nature, phosphopeptide positional isomers can present challenges in bottom-up mass spectrometry-based workflows as search engine scores alone are often not enough to confidently distinguish them. Additional scoring algorithms can remedy this by providing confidence metrics in addition to these search results, reducing ambiguity. Here we describe challenges to interpreting phosphoproteomics data and review several different approaches to determine sites of phosphorylation for both data-dependent and data-independent acquisition-based workflows. Finally, we discuss open questions regarding neutral losses, gas-phase rearrangement, and false localization rate estimation experienced by both types of acquisition workflows and best practices for managing ambiguity in phosphosite determination.
MALDI mass spectrometry imaging (MALDI imaging) is uniquely suited to advance cancer research by measuring the spatial distribution of endogenous and exogenous molecules directly from thin tissue sections. These molecular maps provide valuable insights into various aspects of basic and translational cancer research, including spatial tumor and tumor microenvironment biology, pharmacological interventions, and patient stratification. However, despite these advantages, the utilization of MALDI imaging in studying rare cancers, which comprise approximately 20% of all cancers, remains limited. Rare cancers pose unique challenges in medical research, resulting in understudied entities with suboptimal management and outcomes. In this review, we explore the value of MALDI imaging in sarcoma, as an example of a highly heterogeneous and challenging rare cancer. We summarize existing MALDI imaging studies in sarcoma and outline potential future applications. In addition, we address the specific challenges encountered when applying MALDI imaging to rare cancers, and propose solutions, including the utilization of formalin-fixed paraffin-embedded tissues, multi-site studies, implementation of multiplexed experiments, and considerations for data sharing practices. Through this review, we aim to inspire collaboration between MALDI imaging researchers and clinical colleagues, to deploy the unique capabilities of MALDI imaging in rare cancer research, particularly in the context of sarcoma.
CCOC is a relatively rare subtype of ovarian cancer with a high degree of resistance to standard chemotherapy. Little is known about the underlying molecular mechanisms, and it remains a challenge to predict its prognosis after chemotherapy. We analyzed the proteome of CCOC tissue samples from two independent cohorts using DIA-MS. A total of 8697 proteins were characterized in the first cohort (H1 cohort, 32 patients, 35 FFPE samples) and 9409 proteins in the second cohort (H2 cohort, 24 patients, 28 FF samples). After bioinformatics analysis, we narrowed our focus to 15 proteins significantly correlated with RFS in both cohorts. These proteins are mainly involved in DNA damage response, extracellular matrix, and mitochondrial metabolism. We further developed a 13-protein model to predict the prognosis of patients with CCOC in the H2 cohort, and validated the model in the H1 cohort using both DIA and PRM data. Finally, we verified the modulated pathways from our CCOC proteomic dataset in several published CCOC transcriptome and proteome datasets. Taken together, this study presents a CCOC proteomic data resource and a promising 13-protein panel which could potentially predict the recurrence and survival of CCOC.
Changes in the structure of biological macromolecules, such as RNA and protein, have an important impact on biological functions, and are even important determinants of disease pathogenesis and treatment. Some genetic variations, including copy number variation, single nucleotide variation, and so on, can lead to changes in biological function and increased susceptibility to certain diseases by changing the structure of biological macromolecules. Here, we reviewed the progress of research about the effects of genetic variation on the structure of macromolecules including RNAs and proteins, several typical methods and common tools, and the effect on several diseases. An online resource (http://www.onethird-lab.com/gems/) to support convenient retrieval of common tools is also built. Finally, the challenges and future development of effect prediction were discussed.
The continuous advancements in LC-MS/MS proteomics over the past decades have paved the way for transformative changes in the field of medicine, particularly in the realms of preventive and personalized healthcare. Many new algorithms are evaluated on unknown proteomes and using databases with annotated MS2 spectra. When the research is focused on MS1 spectra, such databases are not yet available. To address this gap, we propose a comprehensive workflow to extract MS1 isotope distributions from spectra, which we validated using a proteomics standard kit comprising known proteins at varying concentrations in duplicate. Our workflow incorporated a database search utilizing a state-of-the-art algorithm at 1% FDR. Through this approach, we investigated the impact of protein concentration on the probability of protein identification. Confidently identified PSMs were used to extract the MS1 isotope distributions through the proposed workflow. A total of 138,111 MS1 isotope distributions were extracted. Isotope distributions with 2 or more peaks were compared with their theoretical isotope distributions using the spectral angle. Median spectral angles of 0.101 and 0.0992 were observed in the two samples, indicating a high similarity. The findings from this study were compiled into a dataset which can potentially facilitate the development of novel tools with a focus on MS1 data.
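The comparison of an observed isotope distribution with its theoretical counterpart can be sketched with the spectral angle. The sketch assumes the angle is the arccosine of the normalized dot product, in radians, so that smaller values mean higher similarity; this is consistent with the interpretation above, but the abstract does not state the exact definition used. Truncating both vectors to their common length is also an assumption made here for illustration.

```python
import math


def spectral_angle(observed, theoretical):
    """Angle in radians between two intensity vectors (0 = identical shape).

    Assumption: both vectors are truncated to their common length, and at
    least two peaks are required, mirroring the "2 or more peaks" criterion.
    """
    n = min(len(observed), len(theoretical))
    if n < 2:
        raise ValueError("at least two peaks are required")
    a, b = observed[:n], theoretical[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("intensity vectors must not be all zero")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


# Hypothetical intensities for illustration only.
observed = [1000.0, 620.0, 240.0, 70.0]
theoretical = [0.52, 0.31, 0.12, 0.05]
print(f"spectral angle: {spectral_angle(observed, theoretical):.4f} rad")
```

Because the measure depends only on the direction of the vectors, raw intensities can be compared directly with theoretical probabilities without rescaling.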
We propose an updated approach for approximating the isotope distribution of average peptides given their monoisotopic mass. Our methodology involves in-silico cleavage of the entire UNIPROT database of Human reviewed proteins using Trypsin, generating a theoretical peptide dataset. The isotope distribution is computed using BRAIN. We apply a compositional data modelling strategy that utilizes an additive log-ratio transformation for the isotope probabilities followed by a penalized spline regression. Furthermore, due to the impact of the number of Sulphur atoms on the course of the isotope distribution, we develop separate models for peptides containing zero up to five Sulphur atoms. Additionally, we propose three methods to estimate the number of Sulphur atoms based on an observed isotope distribution. The performance of the spline models and the Sulphur prediction approaches is evaluated using a mean squared error and a modified Pearson’s χ² goodness-of-fit measure on an experimental UPS2 data set. Our analysis reveals that the variability in spectral accuracy contributes more to the errors than the approximation of the theoretical isotope distribution by our proposed average peptide model. Moreover, we find that the accuracy of predicting the number of Sulphur atoms based on the observed isotope distribution is limited by measurement accuracy.
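The additive log-ratio (ALR) transformation mentioned above can be sketched as follows. It maps a vector of isotope probabilities, which are positive and sum to one, onto unconstrained real values that a regression model can handle, and the inverse maps predictions back to a valid distribution. Using the last component as the reference is an assumption made here; the abstract does not specify which component serves as the denominator, and the probabilities below are hypothetical.

```python
import math


def alr(probabilities):
    """Additive log-ratio transform, with the last component as reference."""
    reference = probabilities[-1]
    return [math.log(p / reference) for p in probabilities[:-1]]


def alr_inverse(coordinates):
    """Map ALR coordinates back to probabilities that sum to one."""
    exponentials = [math.exp(z) for z in coordinates] + [1.0]
    total = sum(exponentials)
    return [e / total for e in exponentials]


# Hypothetical isotope probabilities for illustration only.
isotope_probabilities = [0.55, 0.30, 0.11, 0.04]
coordinates = alr(isotope_probabilities)
recovered = alr_inverse(coordinates)
print(coordinates)
print(recovered)
```

Fitting the spline regression on the transformed coordinates, rather than on the raw probabilities, guarantees that back-transformed predictions are positive and sum to one.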
The group 2 σ factor for RNA polymerase SigE plays an important role in regulating central carbon metabolism in cyanobacteria. However, the regulation of these pathways by SigE at the proteome level remains unknown. Using a sigE-deficient strain (ΔsigE) of Synechocystis sp. PCC 6803 and quantitative proteomics, we found that SigE depletion induces differential protein expression in sugar catabolic pathways including glycolysis, the oxidative pentose phosphate (OPP) pathway, and glycogen catabolism. Two glycogen debranching enzyme homologues, Slr1857 and Slr0237, were found to be differentially expressed in ΔsigE. Glycogen determination indicated that Δslr0237 accumulated glycogen under photomixotrophic conditions but was unable to utilize these reserves in the dark, whereas Δslr1857 accumulated and utilized glycogen in a similar way to the WT strain under the same conditions. These results suggest that Slr0237 plays the major role as the glycogen debranching enzyme in Synechocystis. To our knowledge, this is the first study to report the functional difference between two glycogen debranching enzymes in Synechocystis, and the research highlights the intricate regulation of glycogen breakdown.
Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which has primarily been established for bottom-up proteomics. We present evidence that the TDA-based FDR estimation may not work at the proteoform-level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact protein-level FDR rather than proteoform-level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform-level FDR bias by combining TDA-based FDR and precursor deconvolution error rate.
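The target-decoy approach referred to above can be sketched in its simplest form: the FDR at a score threshold is estimated as the number of decoy identifications divided by the number of target identifications passing that threshold. This is one common estimator, shown under the assumption of separate target and decoy score lists; the scores are hypothetical. It illustrates only the conventional TDA estimate and does not implement the authors' correction for precursor deconvolution error, which is not given here.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as (#decoys >= threshold) / (#targets >= threshold).

    Returns 0.0 when no target passes the threshold; the estimate is
    capped at 1.0.
    """
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        return 0.0
    return min(1.0, n_decoys / n_targets)


# Hypothetical identification scores for illustration only.
target_scores = [10.0, 9.0, 8.0, 7.0, 3.0]
decoy_scores = [6.0, 2.0, 1.0]
print(target_decoy_fdr(target_scores, decoy_scores, threshold=5.0))
```

The estimate counts an identification as correct whenever the matched sequence is a target, so an error in the deconvoluted precursor mass is not reflected in the decoy count.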