PROTEOMICS - Authorea

https://analyticalsciencejournals.onlinelibrary.wiley.com/journal/16159861

Proteomics is advancing the understanding of stallion sperm biology.

FERNANDO J PEÑA

and 6 more

April 12, 2024

The mammalian ejaculate is very well suited to proteomic studies, and investigations of the sperm proteome are providing a large amount of new information on the biology of spermatozoa. Among domestic animals, horses are a species of special interest, in which reproductive technologies and an important market for genetic material have grown exponentially in the last decade. Investigations using proteomic approaches have been conducted in recent years, showing that proteomics is a powerful tool for probing the biology of stallion spermatozoa. The aim of this review is to present an overview of the research conducted and of how these studies have improved our knowledge of stallion sperm biology. The main outcomes of the research conducted so far are an improved knowledge of sperm metabolism and its importance for sperm function, the impact of different technologies on the sperm proteome, and the identification of potential biomarkers. Moreover, proteomics of the seminal plasma and phosphoproteomics are identified as areas of major interest.

Benefits and limits of decellularization on mass-spectrometry-based extracellular mat...

Teresa Frattini

and 8 more

March 25, 2024

Extracellular matrix (ECM) proteins, including collagens, ECM glycoproteins, and proteoglycans, are critical components of tissue structure and function. In addition to the core matrisome, there are matrisome-associated proteins that balance ECM production and degradation. The identification and quantification of ECM proteins using mass spectrometry is often hindered by their low abundance and their tendency to aggregate, forming insoluble macromolecules in aqueous solutions. In this study, we aimed to investigate the effectiveness of a decellularization strategy that combined freeze-thaw cycles and sodium dodecyl sulphate treatment in identifying and quantifying ECM proteins in mouse kidney using mass spectrometry. This decellularization strategy preserved 95% of the core matrisome proteins detected in non-decellularized kidney and revealed additional ones. Decellularization also increased the abundance of 96% of the core matrisome ECM proteins, by 59-fold on average, owing to the successful removal of cellular and matrisome-associated proteins. However, the enrichment varied greatly among ECM proteins, resulting in a misrepresentation of the native ECM protein composition of the kidney. This should be brought to the attention of the matrisome research community, as it highlights the need for caution when interpreting proteomic data obtained following a decellularization procedure.

Deep Learning Methods for Protein Function Prediction

Frimpong Boadu

and 2 more

March 12, 2024

Predicting protein function from protein sequence, structure, interaction, and other relevant information is important for generating hypotheses for biological experiments and studying biological systems, and has therefore been a major challenge in protein bioinformatics. Numerous computational methods have been developed over the last two decades to gradually advance protein function prediction. In recent years in particular, leveraging the revolutionary advances in artificial intelligence (AI), more and more deep learning methods have been developed to improve protein function prediction at a faster pace. Here, we provide an in-depth review of the recent developments in deep learning methods for protein function prediction. We summarize the significant advances in the field, identify several remaining major challenges to be tackled, and suggest some potential directions to explore. The data sources and evaluation metrics widely used in protein function prediction are also discussed to help the machine learning, AI, and bioinformatics communities develop more cutting-edge methods to advance protein function prediction.

Open Source Large Language Models in Action: A Bioinformatics Chatbot for PRIDE datab...

Jingwen Bai

and 5 more

March 12, 2024

Here we present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database, the most popular proteomics data repository. Our system utilizes two advanced Large Language Models (LLMs), llama2-13b and chatglm2-6b, and includes a web service API (Application Programming Interface), a web interface, and sophisticated algorithms. We have developed a novel approach to construct vector-based representations for enabling the LLM responses, featuring a curated version and a comprehensive database of relevant links and paragraphs for each generated response. An important part of the framework is a benchmark component based on an Elo-ranking system, providing a scalable method for evaluating the performance not only of llama2-13b and chatglm2-6b but also of any other available and future open-source LLMs. Throughout the benchmarking process, the PRIDE documentation for external users was refined to enhance its clarity and efficacy in addressing user queries. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. Together, the integration of advanced LLMs, the innovative vector-based construction, the benchmarking framework, and the optimized documentation form a robust and transferable chatbot assistant infrastructure.
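The Elo-ranking idea mentioned in the abstract above can be illustrated with a minimal sketch of the standard Elo update. This is not the authors' implementation: the function names, the starting rating of 1000, and the K-factor of 32 are assumptions chosen for illustration, and the pairwise judgement (which model gave the better answer) is assumed to come from an external evaluator.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, model_a: str, model_b: str,
               score_a: float, k: float = 32.0) -> dict:
    """Update ratings in place after one pairwise comparison.

    score_a is 1.0 if model_a's answer was judged better,
    0.0 if model_b's was, and 0.5 for a tie.
    k (the K-factor) is an assumed value, not taken from the paper.
    """
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    delta = k * (score_a - exp_a)
    ratings[model_a] += delta
    ratings[model_b] -= delta  # zero-sum: total rating is conserved
    return ratings


# Hypothetical usage: both models start at an assumed rating of 1000.
ratings = {"llama2-13b": 1000.0, "chatglm2-6b": 1000.0}
update_elo(ratings, "llama2-13b", "chatglm2-6b", score_a=1.0)
```

Because each update only needs the two current ratings and one judgement, a new model can be added to the dictionary and ranked against existing ones without re-scoring earlier comparisons, which is what makes this style of benchmark scale.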

Combining FAIMS based Glycoproteomics and DIA Proteomics reveals widespread proteome...

Chris Hadjineophytou

and 3 more

November 09, 2023

Protein glycosylation is increasingly recognized as a common protein modification across bacterial species. Within the Neisseria genus, O-linked protein glycosylation is conserved, yet closely related Neisseria species express O-oligosaccharyltransferases (PglOs) with distinct targeting activities. In this work, we explore the targeting capacity of different PglOs using Field Asymmetric Waveform Ion Mobility Spectrometry (FAIMS) fractionation and Data-Independent Acquisition (DIA) to characterize the impact of changes in glycosylation on the proteome of N. gonorrhoeae. We demonstrate that FAIMS expands the known glycoproteome of wild type N. gonorrhoeae MS11 and enables differences in glycosylation to be assessed across strains expressing different pglO allelic chimeras with unique substrate targeting activities. Combining glycoproteomic insights with DIA proteomics, we demonstrate that alterations within pglO alleles have widespread impacts on the proteome of N. gonorrhoeae. Examination of peptides known to be targeted by glycosylation using DIA analysis supports the conclusions that alterations in glycosylation occupancy occur independently of changes in protein levels and that glycosylation occupancy is generally low on most glycoproteins. This work thus expands our understanding of the N. gonorrhoeae glycoproteome and the roles that pglO allelic variation may play in governing genus-level protein glycosylation.

Fragterminomics: extracting information on proteolytic processing from shotgun proteo...

Miguel Cosenza-Contreras

and 14 more

November 04, 2023

State-of-the-art mass spectrometers combined with modern bioinformatics algorithms for peptide-to-spectrum matching (PSM) with robust statistical scoring allow more variable features (i.e., post-translational modifications) to be reliably identified from (tandem-) mass spectrometry data, often without the need for biochemical enrichment. Semi-specific proteome searches, which enforce theoretical enzymatic digestion at only the N- or C-terminal end, allow the identification of native protein termini or those arising from endogenous proteolytic activity (also referred to as ‘neo-N-termini’ analysis or ‘N-terminomics’). Nevertheless, deriving biological meaning from these search outputs can be challenging in terms of data mining and analysis. Thus, we introduce Fragterminomics, a data analysis approach for the (1) annotation of peptides according to their enzymatic cleavage specificity, (2) differential abundance and enrichment analysis of N-terminal sequence patterns, (3) visualization of neo-N-termini location, and (4) mapping of neo-N-termini to known protein processing features. We illustrate the use of Fragterminomics by applying it to tandem mass tag (TMT)-based proteomics data of a mouse model of polycystic kidney disease and assess the semi-specific searches for biological interpretation of cleavage events and the variable contribution of proteolytic products to general protein abundance. The Fragterminomics approach and example data are available as an R package at https://github.com/MiguelCos/Fragterminomics.
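To make step (1) of the abstract above concrete, the following is a minimal sketch of how a peptide might be annotated by cleavage specificity. It is not the Fragterminomics R package or its API: the function name, the labels, and the simplified trypsin rule (cleavage after K or R, ignoring the proline exception and initiator-methionine removal) are all assumptions made for illustration.

```python
def annotate_specificity(protein: str, peptide: str,
                         cleave_after: str = "KR") -> str:
    """Classify a peptide by whether its termini match the enzyme rule.

    Hypothetical helper; assumes a simplified trypsin-like rule in which
    the enzyme cleaves C-terminal to any residue in `cleave_after`.
    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)

    # N-terminus is explained by the enzyme if the peptide begins the
    # protein or the preceding residue is a cleavage residue.
    n_ok = start == 0 or protein[start - 1] in cleave_after
    # C-terminus is explained if the peptide ends the protein or its
    # last residue is a cleavage residue.
    c_ok = end == len(protein) or peptide[-1] in cleave_after

    if n_ok and c_ok:
        return "specific"
    if c_ok:
        return "semi_specific_N"  # candidate neo-N-terminus
    if n_ok:
        return "semi_specific_C"
    return "unspecific"


# Hypothetical toy sequence, not taken from the paper's data.
protein = "MKTAYIAKQRQISFVK"
labels = {p: annotate_specificity(protein, p)
          for p in ("TAYIAK", "AYIAK", "TAYIA")}
```

Peptides labelled `semi_specific_N` are the ones whose N-terminus the digestion enzyme cannot explain, which is why they are the natural candidates for endogenous proteolytic processing in this kind of analysis.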

Comparative proteomics of round and wrinkled pea (Pisum sativum L.) during seed devel...

Sintayehu Daba

and 5 more

September 21, 2023

Seeds are an important part of plants, ensuring the continuation of plants’ life and providing nutrient reserves for humans and animals. Seed development is controlled by the interplay of several physiological processes. We applied label-free proteomics to round and wrinkled peas using seeds sampled at five growth stages (4 days after anthesis (DAA), 7DAA, 12DAA, 15DAA, and maturity). Phenotypic results indicated that wrinkled peas had a lower starch concentration than round peas (29.5% vs. 46.6–55.1%). A total of 4,126 high-confidence proteins were detected, with 22–26% shared across all sampling times within an entry. Early seed growth stages were characterized by more unique proteins than maturity. Two-way ANOVA revealed 1,685 proteins that differed significantly among samples, of which 722 proteins were classified into 29 functional classes. The four major classes (each comprising over 50 proteins) were protein biosynthesis, protein homeostasis, enzymes, and carbohydrate metabolism. Of the two types of comparisons (time-point and entry-wise), time-point comparisons yielded more differentially abundant proteins (596 proteins in total). Different protein classes exhibited different patterns of change during seed development. For example, cell division-related proteins were abundant early in seed development, whereas storage proteins were abundant later (especially after 12DAA). Compared to the round pea entries, the wrinkled entry had a significantly lower abundance of starch branching enzymes, proteins involved in the biosynthesis of amylopectin in starch. In conclusion, the results of this study provide valuable information to improve our understanding of seed development and form the basis for further studies.

The tumour-derived extracellular vesicle proteome varies by endometrial cancer histol...

Anastasiia Artuyants

and 11 more

September 15, 2023

Endometrial cancer is the most prevalent gynaecological cancer globally. Its association with obesity and metabolic diseases is a key aetiology, increasingly among younger females. Early diagnosis and improved treatment decisions are crucial for these women whose outcomes could be improved by discovering new biomarkers. We took a new approach to extracellular vesicle (EV) biomarker discovery - profiling the proteome of enriched EVs isolated directly from frozen biobanked endometrial cancers. Nine tissue pools, each generating collagenase-digested tissue and matched small EVs, were analysed using label-free proteomics. Three clinical subgroups: Endometrioid low BMI (body mass index), Endometrioid high BMI, and Serous, irrespective of BMI, were compared to identify shared secreted proteins, proteins associated with histological subtype, and proteins related to BMI. EVs were enriched for common EV markers and large secreted proteins. Cell lysates were enriched in mitochondrial and blood proteins. EV protein profiles were most different between the high BMI subgroup and the others, highlighting a significant influence of comorbidities on the intra-tumoural EV secretome. Proteins differentially abundant between subgroups in tissues were strikingly not also differential in the matched EVs. This work has identified secreted proteins implicated in the complex pathophysiology of endometrial cancer and pinpointed candidate biomarkers for diagnosis.

HB-EGF-loaded nanovesicles enhance trophectodermal spheroid attachment and invasion

Qi Hui Poh

and 3 more

August 24, 2023

The attachment of trophectodermal cells (the outer layer of the embryo) to endometrial cells and their subsequent invasion of the underlying matrix are critical stages of embryo implantation during successful pregnancy establishment. Extracellular vesicles (EVs) have been implicated in embryo-maternal crosstalk and are capable of reprogramming endometrial cells towards a pro-implantation signature and phenotype. However, challenges associated with EV yield and direct loading of biomolecules limit their therapeutic potential. We have previously established the generation of cell-derived nanovesicles (NVs) from human trophectodermal cells (hTSCs) and their capacity to reprogram endometrial cells to enhance adhesion and blastocyst outgrowth. Here, we employed a rapid NV loading strategy to encapsulate potent implantation molecules such as HB-EGF (NVHBEGF). We show that these loaded NVs elicit EGFR-mediated effects in recipient endometrial cells, activating kinase phosphorylation sites that modulate their activity (AKT S124/129, MAPK1 T185/Y187), and downstream signalling pathways and processes (AKT signal transduction, GTPase activity). Importantly, they enhanced target cell attachment and invasion. The phosphoproteomics and proteomics approach highlights NVHBEGF-mediated short-term signalling patterns and long-term reprogramming capabilities in endometrial cells, which functionally enhance trophectodermal-endometrial interactions. This proof-of-concept study demonstrates the feasibility of enhancing the potency of NVs in the context of embryo attachment and establishment.

The seminal plasma proteome of the giant panda

Kailai Cai

and 14 more

August 08, 2023

For the ex-situ conservation of giant pandas, both collecting and preserving semen are important methods. The seminal plasma is rich in nutrients and bioactive substances, such as proteins, carbohydrates, lipids, amino acids, and hormones, which play an important role in the reproduction and reproductive health of the species. This is the first study to analyze the seminal plasma proteins of giant pandas through proteomics, and it identified 1125 proteins. These proteins are related to protein turnover, translation, and metabolism. The seminal plasma proteins of giant pandas were then compared to those of humans, pigs, and sheep, with many unique proteins found in giant panda samples. Among these proteins, WD40 repeat-containing proteins were identified and have been implicated in sperm function and fertility. Understanding the composition and function of proteins in the giant panda seminal plasma proteome can provide valuable insights into the species' reproductive biology and help develop strategies to improve its reproductive success in captivity, which is essential for giant panda conservation.

Integration of proteomics in the molecular tumor board

Johanna Thiery

and 1 more

July 25, 2023

Cancer remains one of the most complex and challenging diseases of mankind. To address the need for a personalized treatment approach for particularly complex tumor cases, molecular tumor boards (MTBs) have been initiated. MTBs are interdisciplinary teams that perform in-depth molecular diagnostics to jointly advise on the best therapeutic strategy. Current molecular diagnostics are routinely performed at the transcriptomic and genomic levels, aiming to identify tumor-driving mutations. However, these approaches can only partially capture the actual phenotype as well as the molecular key players of tumor growth and progression. Thus, direct investigation of the expressed proteins and activated signaling pathways provides complementary information on the tumor-driving molecular characteristics of the tissue. Technological advancements in mass-spectrometry-based proteomics enable the robust, rapid, and sensitive detection of thousands of proteins in minimal sample amounts, paving the way for clinical proteomics and the probing of oncogenic signaling activity. Therefore, proteomics is currently being integrated into molecular diagnostics within MTBs and holds promising potential in aiding tumor classification and identifying personalized treatment strategies. This review introduces MTBs, describes current state-of-the-art clinical proteomics and its potential in precision oncology, and highlights the benefits of multi-omic data integration.

A proteogenomic analysis and use of an array of genome assemblies to scout out new vi...

Tomas Erban

and 1 more

July 14, 2023

High-throughput proteomics is an effective methodology for identifying a variety of virulence factors of pathogens. Proteomic data are commonly evaluated against annotated sequences present in publicly available database repositories. A proteogenomic approach can be used if annotated sequences are not available or to identify novel proteins/peptides. However, a single genome is commonly utilized in proteomic and proteogenomic analyses. We pose the question of whether utilizing a number of different genome assemblies of a bacterial pathogen would be beneficial. Here, we used previously obtained shotgun label-free nano-LC‒MS/MS data of the exoprotein fraction of four reference ERIC I–IV genotypes of Paenibacillus larvae and evaluated them against publicly available annotated sequences (from NCBI-protein, RefSeq, UniProt) together with an array of protein sequences generated using a six-frame direct translation of 15 genomic assemblies available in GenBank. The wide search through 18 database components reliably identified 453 protein hits. UpSet analysis categorized the hits into 50 groups based on the success of protein identification by the individual databases. The relatively high variability in successful identification among the genome assemblies facilitated the mining of markers based on uniqueness and contrasting results prior to considering proteome differences. Data evaluation provided novel and interesting markers that can be studied further.
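The six-frame translation mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the function names are hypothetical, codon assignments follow the standard genetic code (which the bacterial code shares apart from alternative start codons), stop codons are written as `*`, and any codon containing an ambiguous base is translated as `X`.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translate all three offsets on both strands.

    Keys are frame labels such as '+1' or '-3' (an assumed naming scheme).
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical toy input, not a P. larvae sequence.
frames = six_frame_translation("ATGGCCTAA")
```

In a search-database setting, each translated frame would typically be split at the stop symbols and filtered by a minimum length before being written to FASTA; those steps are omitted here to keep the sketch short.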

Comprehensive proteomic investigation of high-grade and low-grade gliomas reveals pat...

Ayushi Verma

and 4 more

July 13, 2023

Top-down ion mobility/mass spectrometry reveals enzyme specificity: Separation and se...

Francis Berthias

and 3 more

July 11, 2023

Enzymatic catalysis is one of the fundamental processes that drive the dynamic landscape of post-translational modifications (PTMs), expanding the structural and functional diversity of proteins. Here, we assessed enzyme specificity using a top-down ion mobility spectrometry (IMS) and tandem mass spectrometry (MS/MS) workflow. We successfully applied trapped IMS (TIMS) to investigate site-specific N-ε-acetylation of lysine residues of full-length histone H4 catalyzed by histone lysine acetyltransferase KAT8. We demonstrate that KAT8 exhibits a preference for N-ε-acetylation of residue K16, while also installing N-ε-acetyl groups on residues K5 and K8 as the first degree of acetylation. Achieving TIMS resolving power values of up to 300, we fully separated mono-acetylated regioisomers (H4K5ac, H4K8ac, and H4K16ac). Each of these regioisomers produces unique MS/MS fragment ions, enabling estimation of their individual mobility distributions and the exact localization of the N-ε-acetylation sites. This study highlights the potential of top-down TIMS-MS/MS for conducting enzymatic assays at the intact protein level and, more generally, for separation and identification of isomeric proteoforms and precise PTM localization.

Advancing rare cancer research by MALDI mass spectrometry imaging: Applications, chal...

Maren Stillger

and 4 more

June 12, 2023

MALDI mass spectrometry imaging (MALDI imaging) is uniquely suited to advance cancer research by measuring the spatial distribution of endogenous and exogenous molecules directly from thin tissue sections. These molecular maps provide valuable insights into various aspects of basic and translational cancer research, including spatial tumor and tumor microenvironment biology, pharmacological interventions, and patient stratification. However, despite these advantages, the utilization of MALDI imaging in studying rare cancers, which comprise approximately 20% of all cancers, remains limited. Rare cancers pose unique challenges in medical research, resulting in understudied entities with suboptimal management and outcomes. In this review, we explore the value of MALDI imaging in sarcoma, as an example of a highly heterogeneous and challenging rare cancer. We summarize existing MALDI imaging studies in sarcoma and outline potential future applications. In addition, we address the specific challenges encountered when applying MALDI imaging to rare cancers and propose solutions, including the utilization of formalin-fixed paraffin-embedded tissues, multi-site studies, implementation of multiplexed experiments, and considerations for data sharing practices. Through this review, we aim to inspire collaboration between MALDI imaging researchers and clinical colleagues to deploy the unique capabilities of MALDI imaging in rare cancer research, particularly in the context of sarcoma.

Effects of genetic variation on the structure of biological macromolecules

Jingxuan Kang

and 16 more

June 01, 2023

Changes in the structure of biological macromolecules, such as RNA and protein, have an important impact on biological functions and can be important determinants of disease pathogenesis and treatment. Some genetic variations, including copy number variation and single nucleotide variation, can lead to changes in biological function and increased susceptibility to certain diseases by changing the structure of biological macromolecules. Here, we review the progress of research on the effects of genetic variation on the structure of macromolecules including RNAs and proteins, several typical methods and common tools, and the effects on several diseases. An online resource (http://www.onethird-lab.com/gems/) has also been built to support convenient retrieval of common tools. Finally, the challenges and future development of effect prediction are discussed.

Functional proteomics reveals that Slr0237 is a SigE-regulated glycogen debranching e...

Haitao Ge

and 2 more

May 17, 2023

The group 2 σ factor for RNA polymerase, SigE, plays an important role in regulating central carbon metabolism in cyanobacteria. However, the regulation of these pathways by SigE at the proteome level remains unknown. Using a sigE-deficient strain (ΔsigE) of Synechocystis sp. PCC 6803 and quantitative proteomics, we found that SigE depletion induces differential protein expression for sugar catabolic pathways including glycolysis, the oxidative pentose phosphate (OPP) pathway, and glycogen catabolism. Two glycogen debranching enzyme homologues, Slr1857 and Slr0237, were found to be differentially expressed in ΔsigE. Glycogen determination indicated that Δslr0237 accumulated glycogen under photomixotrophic conditions but was unable to utilize these reserves in the dark, whereas Δslr1857 accumulated and utilized glycogen in a similar way to the WT strain under the same conditions. These results suggest that Slr0237 plays the major role as the glycogen debranching enzyme in Synechocystis. To our knowledge, this is the first study to report the functional difference between the two glycogen debranching enzymes in Synechocystis, and the research highlights the intricate regulation of glycogen breakdown.