PROTEOMICS - Authorea

https://analyticalsciencejournals.onlinelibrary.wiley.com/journal/16159861

Deep quantitative proteomics of North American Pacific coast star tunicate (Botryllus...

Dietmar Kültz

and 9 more

March 28, 2024

Botryllus schlosseri is a model marine invertebrate for studying immunity, regeneration, and stress-induced evolution. Conditions for validating its predicted proteome were optimized using nanoElute® 2 deep-coverage LCMS, revealing up to 4,930 protein groups and 20,984 unique peptides per sample. Spectral libraries were generated and filtered to remove interferences and low-quality transitions and to retain only proteins with >3 unique peptides. The resulting DIA assay library enabled label-free quantitation of 3,426 protein groups represented by 22,593 unique peptides. Quantitative comparisons of a laboratory-raised population with two field-collected populations revealed (1) a more distinct proteome in the laboratory-raised population, and (2) proteins with high/low individual variabilities in each population. DNA repair/replication, ion transport, and intracellular signaling processes were unique in laboratory-cultured colonies. Spliceosome and Wnt signaling proteins were the least variable (highly functionally constrained) in all populations. In conclusion, we present the first deep quantitative proteome analysis of a colonial tunicate, identifying functional protein clusters associated with laboratory conditions, different habitats, and strong versus relaxed abundance constraints. These results empower research on B. schlosseri with proteomics resources and enable quantitative molecular phenotyping of changes associated with transfer from in situ to ex situ and from in vivo to in vitro culture conditions.
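
The library-filtering step mentioned in this abstract (retaining only proteins with >3 unique peptides) can be illustrated with a minimal sketch. The abstract does not give the authors' implementation: the function name, the input layout, and the reading of "unique peptide" as a peptide that maps to exactly one protein are all assumptions made here for illustration.

```python
from collections import defaultdict


def proteins_passing_filter(peptide_to_proteins, min_unique_exclusive=3):
    """Return proteins that have MORE than `min_unique_exclusive` unique peptides.

    Assumption: a peptide counts as "unique" when it maps to exactly one protein.
    `peptide_to_proteins` is a hypothetical mapping {peptide: iterable of protein IDs}.
    """
    unique_counts = defaultdict(int)
    for peptide, proteins in peptide_to_proteins.items():
        protein_set = set(proteins)
        if len(protein_set) == 1:
            # Peptide maps to a single protein, so it counts toward that protein.
            (protein,) = protein_set
            unique_counts[protein] += 1
    return {p for p, n in unique_counts.items() if n > min_unique_exclusive}


# Hypothetical toy library: P1 has four unique peptides, P2 has one.
library = {
    "PEPA": {"P1"},
    "PEPB": {"P1"},
    "PEPC": {"P1"},
    "PEPD": {"P1"},
    "PEPE": {"P1", "P2"},  # shared peptide, not counted as unique
    "PEPF": {"P2"},
}
print(proteins_passing_filter(library))
```

In this toy example only P1 exceeds the threshold; shared peptides are ignored when counting.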

Open Source Large Language Models in Action: A Bioinformatics Chatbot for PRIDE datab...

Jingwen Bai

and 5 more

March 12, 2024

Here we present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database, the most popular proteomics data repository. Our system utilizes two advanced Large Language Models (LLMs), llama2-13b and chatglm2-6b, and includes a web service API (Application Programming Interface), a web interface, and sophisticated algorithms. We have developed a novel approach to construct vector-based representations for enabling the LLM responses, featuring a curated version and a comprehensive database of relevant links and paragraphs for each generated response. An important part of the framework is a benchmark component based on an Elo-ranking system, providing a scalable method for evaluating the performance not only of llama2-13b and chatglm2-6b but also of any other available and future open-source LLMs. Throughout the benchmarking process, the PRIDE documentation for external users was refined to enhance its clarity and efficacy in addressing user queries. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively forms a robust and transferable chatbot assistant infrastructure.
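
To make the Elo-ranking idea concrete, the following is a minimal sketch of the textbook Elo update applied to pairwise comparisons between two models. It is not the PRIDE chatbot's benchmark code: the K-factor of 32, the 400-point scale, the starting rating of 1000, the model labels, and the comparison outcomes are all assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was, 0.5 for a tie.
    k = 32 is an assumed K-factor, not a value taken from the abstract.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical judgments: (model preferred-or-not, opponent, score for first model).
ratings = {"model_a": 1000.0, "model_b": 1000.0}
comparisons = [
    ("model_a", "model_b", 1.0),
    ("model_a", "model_b", 0.5),
    ("model_a", "model_b", 0.0),
]
for first, second, score_first in comparisons:
    ratings[first], ratings[second] = update_elo(
        ratings[first], ratings[second], score_first
    )
print(ratings)
```

Because each update depends only on the two current ratings and one outcome, a new model can be added by giving it a starting rating and feeding in further comparisons, which is one reason a scheme of this kind scales to additional LLMs.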

The temporal protein signature analyses of developing human deciduous molar germ

Xiaohang Chen

and 7 more

October 27, 2023

The tooth is one of the ideal models for developmental studies, involving epithelial-mesenchymal transition and cell differentiation. The essential factors and pathways identified in tooth development will help in understanding the natural development process and the malformations of mineralized tissues such as the skeleton. Time-dependent proteomic changes were investigated by proteomic analysis of healthy human molars at embryonic stages from the cap stage to the early bell stage. A total of 713 differentially expressed proteins (DEPs) with five temporal expression patterns were filtered. Twenty-four potential driver proteins of tooth development were screened by weighted gene co-expression network analysis (WGCNA), including CHID1, RAP1GDS1, HAPLN3, AKAP12, WLS, GSS, DDAH1, CLSTN1, AFM, RBP1, AGO1, SET, HMGB2, HMGB1, ANP32A, SPON1, FREM1, C8B, PRPS2, FCHO2, PPP1R12A, GPALPP1, U2AF2 and RCC2. The hub proteins in the different temporal expression patterns were extracted. In addition, the potential cell sources and the temporal expression patterns at the transcriptomic level were explored using single-cell RNA sequencing (scRNA-seq). This study provides invaluable resources for mechanistic studies of human embryonic epithelial and mesenchymal cell differentiation and tooth development.

People with obesity exhibit losses in muscle proteostasis that are partly improved by...

Kanchana Srisawat

and 10 more

October 27, 2023

This pilot experiment examines whether a loss in muscle proteostasis occurs in people with obesity and whether endurance exercise positively influences either the abundance profile or turnover rate of proteins in this population. Men with (n = 3) or without (n = 4) obesity were recruited and underwent a 14-d measurement protocol of daily deuterium oxide (D2O) consumption and serial biopsies of vastus lateralis muscle. Men with obesity then completed 10 weeks of high-intensity interval training (HIIT), encompassing 3 sessions per week of cycle ergometer exercise with 1 min intervals at 100% maximum aerobic power interspersed by 1 min recovery periods. The number of intervals per session progressed from 4 to 8, and during weeks 8-10 the 14-d measurement protocol was repeated. Proteomic analysis detected 352 differences (p < 0.05, false discovery rate < 5%) in protein abundance and 19 (p < 0.05) differences in protein turnover, including components of the ubiquitin-proteasome system. HIIT altered the abundance of 53 proteins and increased the turnover rate of 22 proteins (p < 0.05) and tended to benefit proteostasis by increasing muscle protein turnover rates. Obesity and insulin resistance are associated with compromised muscle proteostasis, which may be partially restored by endurance exercise.

mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identif...

Colin William Combe

and 8 more

October 09, 2023

The mzIdentML file format, originally developed by the Proteomics Standards Initiative in 2011, is the open XML data standard for peptide and protein identification results coming from mass spectrometry. We present mzIdentML version 1.3.0, which introduces new functionality and support for additional use cases. First, it introduces a new mechanism for encoding identifications based on multiple spectra. Furthermore, the main mzIdentML specification document can now be supplemented by extension documents which provide further guidance for encoding specific use cases for different proteomics subfields. One extension document has been added, covering additional use cases for the encoding of crosslinked peptide identifications. The ability to add extension documents facilitates keeping the mzIdentML standard up to date with advances in the proteomics field, without having to change the main specification document. The crosslinking extension document provides further explanation of the crosslinking use cases already supported in mzIdentML version 1.2.0, and provides support for encoding additional scenarios that are critical to reflect developments in the crosslinking field and facilitate its integration in structural biology. These are: (i) support for cleavable crosslinkers, (ii) support for internally linked peptides, (iii) support for noncovalently associated peptides, and (iv) improved support for encoding scores and the corresponding thresholds.

Regulation of Non-Canonical Proteins Encoded by Small Open Reading Frames via the Non...

PARTHIBAN PERIASAMY

and 10 more

September 19, 2023

Immunotherapy harnesses neoantigens encoded within the human genome, but their therapeutic potential is hampered by low expression, which may be controlled by the Nonsense-Mediated Decay (NMD) pathway. This study investigates the impact of UPF1 knockdown on the expression of non-canonical/mutant proteins, employing proteogenomics to explore UPF1's role within the NMD pathway. Additionally, we conducted a comprehensive pan-cancer analysis of UPF1 expression and evaluated UPF1 expression in Triple-Negative Breast Cancer (TNBC) tissue in vivo. Our findings reveal that UPF1 knockdown leads to increased transcription of non-canonical/mutant proteins, particularly those originating from retained introns, pseudogenes, long non-coding RNAs, and unannotated transcript biotypes. Moreover, our analysis demonstrates elevated UPF1 expression in various cancer types, with notably heightened protein levels in patient-derived TNBC tumours compared to adjacent tissues. This study elucidates UPF1's role in mitigating transcriptional noise by degrading transcripts encoding non-canonical/mutant proteins. Intriguingly, we observe an upregulation of the NMD pathway in cancer, potentially acting as a “neoantigen-masking” mechanism that suppresses non-canonical/mutant protein expression. Targeting this mechanism may reveal a new spectrum of neoantigens accessible to the antigen presentation pathway. Our novel findings provide a strong foundation for the development of therapeutic strategies aimed at targeting UPF1 or modulating the NMD pathway.

((Commentary on pmic.202300239 - title of the commentary t.b.a.)) Beyond the model or...

Benjamin Orsburn

August 31, 2023

A historic challenge for shotgun proteomics has been the requirement for high-quality, simple and nonredundant curated protein sequences in small .fasta text files. Due to the intrinsic informatic challenges and time required to assemble these files, proteomics has struggled to expand beyond the confines of a few model organisms. When considering post-translational modifications that may or may not be present on a specific peptide sequence, these factors inevitably compound. A study on how mangos continue to ripen on the shelf may not be the first thing you'd think of as proof of a scientific discipline shedding historic limitations. However, the study by Bautiste-Valle et al. may be just that. These authors present a quantitative comparison of both peptide and glycopeptide alterations through the complexity of the fruit ripening process, and in this we see the present state of a field that no longer needs to wait on genomics to obtain deep mechanistic insights.

MassSpecPreppy - an end-to-end solution for automated protein concentration determina...

Alexander Reder

and 13 more

August 08, 2023

In proteomics, fast, efficient and highly reproducible sample preparation is of utmost importance, particularly in view of fast scanning mass spectrometers enabling analyses of large sample series. To address this need, we have developed the web application MassSpecPreppy that operates on the open science OT-2 liquid handling robot from Opentrons. This platform can prepare up to 96 samples at once, performing tasks like BCA protein concentration determination, sample digestion with normalization, reduction/alkylation and peptide elution into vials or loading specified peptide amounts onto Evotips in an automated and flexible manner. The performance of the developed workflows using MassSpecPreppy was compared with standard manual sample preparation workflows. The BCA assay experiments revealed an average recovery of 101.3% (SD: ±7.82%) for the MassSpecPreppy workflow, while the manual workflow had a recovery of 96.3% (SD: ±9.73%). The species mix used in the evaluation experiments showed that 94.5% of protein groups for OT-2 digestion and 95% for manual digestion passed the significance thresholds with comparable peptide-level coefficients of variation. These results demonstrate that MassSpecPreppy is a versatile and scalable platform for automated sample preparation, producing injection-ready samples for proteomics research.

Structural biology: the Transformational Era

Shoshana Wodak

July 26, 2023

HowDirty: An R package to evaluate molecular contaminants in LC-MS experiments

David Gomez-Zepeda

and 4 more

July 26, 2023

Contaminants derived from consumables, reagents, and sample handling often negatively affect LC-MS data acquisition. In proteomics experiments, they can markedly reduce identification performance, reproducibility, and quantitative robustness. Here, we introduce a data analysis workflow combining MS1 feature extraction in Skyline with HowDirty, an R-markdown-based tool that automatically generates an interactive report on the molecular contaminant level in LC-MS data sets. To facilitate the interpretation of the results, the HTML report is self-contained and self-explanatory, including plots that can be easily interpreted. The R package HowDirty is available from https://github.com/DavidGZ1/HowDirty. To showcase the application of HowDirty, we assessed the impact of ultrafiltration units from different providers on sample purity after filter-assisted sample preparation (FASP) digestion. This allowed us to select the filter units with the lowest contamination risk. Notably, the filter units with the lowest contaminant levels showed higher reproducibility regarding the number of peptides and proteins identified. Overall, HowDirty enables the efficient evaluation of sample quality covering a wide range of common contaminant groups that typically impair LC-MS analyses, which facilitates corrective or preventive actions to minimize instrument downtime.

A framework for considering prior information in network-based approaches to --omics...

Julia Somers

and 9 more

July 20, 2023

For decades, molecular biologists have been uncovering the mechanics of biological systems. Efforts to bring their findings together have led to the development of multiple databases and information systems that capture and present pathway information in a computable network format. Concurrently, the advent of modern omics technologies has empowered researchers to systematically profile cellular processes across different modalities. Numerous algorithms, methodologies, and tools have been developed to use prior knowledge networks in the analysis of omics datasets. Interestingly, it has been repeatedly demonstrated that the source of prior knowledge can greatly impact the results of a given analysis. For these methods to be successful it is paramount that their selection of prior knowledge networks is amenable to the data type and the computational task they aim to accomplish. Here we present a five-level framework that broadly describes network models in terms of their scope, level of detail, and ability to inform causal predictions. To contextualize this framework, we review a handful of network-based omics analysis methods at each level, while also describing the computational tasks they aim to accomplish.

Microproteins - discovery, structure and function

Jessica Mohsen

and 2 more

July 05, 2023

Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional tools are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.

Prediction of protein interactions is essential for studying biomolecular mechanisms

Ilya Vakser

June 30, 2023

Structural characterization of protein interactions is essential for our ability to understand and modulate physiological processes. Computational approaches to modeling of protein complexes provide structural information that far exceeds capabilities of the existing experimental techniques. Protein structure prediction in general, and prediction of protein interactions in particular, has been revolutionized by the rapid progress in Deep Learning techniques. The work of Schweke et al. presents a community-wide study of an important problem of distinguishing physiological protein-protein complexes/interfaces (experimentally determined or modeled) from non-physiological ones. The authors designed and generated a large benchmark set of physiological and non-physiological homodimeric complexes, and evaluated a large set of scoring functions, as well as AlphaFold predictions, on their ability to discriminate the non-physiological interfaces. The problem of separating physiological interfaces from non-physiological ones is very difficult, largely due to the lack of a clear distinction between the two categories in a crowded environment inside a living cell. Still, the ability to identify key physiologically significant interfaces in the variety of possible configurations of a protein-protein complex is important. The study presents a major data resource and methodological development in this important direction for molecular and cellular biology.

Proteomics analysis of C2C12 myotubes treated with atrophy inducing cancer cell-deriv...

Akbar Marzan

and 5 more

June 30, 2023

Cancer-associated cachexia is a wasting syndrome that results in dramatic loss of whole-body weight, predominantly due to loss of skeletal muscle mass. It has been established that cachexia-inducing cancer cells secrete proteins and extracellular vesicles (EVs) that can induce muscle atrophy. Though several studies have examined these cancer-cell-derived factors, targeting some of these components has shown little or no clinical benefit. To develop new therapies, an understanding of the dysregulated proteins and signalling pathways that regulate catabolic gene expression during muscle wasting is essential. Here, we sought to examine the effect of conditioned media (CM) that contain secreted factors and EVs from cachexia-inducing C26 colon cancer cells on C2C12 myotubes using mass spectrometry-based label-free quantitative proteomics. We identified significant changes in the protein profile of C2C12 cells upon exposure to C26-derived CM. Functional enrichment analysis revealed enrichment of proteins associated with inflammation, mitochondrial dysfunction, muscle catabolism, ROS production, and ER stress in CM-treated myotubes. Furthermore, strong downregulation of muscle structural integrity and development and/or regenerative pathways was observed. Together, these enriched proteins in atrophied muscle could be utilized as potential muscle wasting markers and the dysregulated biological processes could be employed for therapeutic benefit in cancer-induced muscle wasting.

Computational approaches to identify sites of phosphorylation

Alex Joyce

and 1 more

June 15, 2023

Due to their oftentimes ambiguous nature, phosphopeptide positional isomers can present challenges in bottom-up mass spectrometry-based workflows as search engine scores alone are often not enough to confidently distinguish them. Additional scoring algorithms can remedy this by providing confidence metrics in addition to these search results, reducing ambiguity. Here we describe challenges to interpreting phosphoproteomics data and review several different approaches to determine sites of phosphorylation for both data-dependent and data-independent acquisition-based workflows. Finally, we discuss open questions regarding neutral losses, gas-phase rearrangement, and false localization rate estimation experienced by both types of acquisition workflows and best practices for managing ambiguity in phosphosite determination.

Proteomic profiling of ovarian clear cell carcinomas identifies prognostic biomarkers...

Liang Yue

and 14 more

June 09, 2023

CCOC is a relatively rare subtype of ovarian cancer with a high degree of resistance to standard chemotherapy. Little is known about the underlying molecular mechanisms, and it remains a challenge to predict its prognosis after chemotherapy. We analyzed the proteome of CCOC tissue samples from two independent cohorts using DIA-MS. A total of 8697 proteins were characterized in the first cohort (H1 cohort, 32 patients, 35 FFPE samples) and 9409 proteins in the second cohort (H2 cohort, 24 patients, 28 FF samples). After bioinformatics analysis, we narrowed our focus to 15 proteins significantly correlated with RFS in both cohorts. These proteins are mainly involved in DNA damage response, extracellular matrix, and mitochondrial metabolism. We further developed a 13-protein model to predict the prognosis of patients with CCOC in the H2 cohort, and validated the model in the H1 cohort in both DIA and PRM data. Finally, we verified the modulated pathways from our CCOC proteomic dataset in several published CCOC transcriptome and proteome datasets. Taken together, this study presents a CCOC proteomic data resource and a promising 13-protein panel which could potentially predict the recurrence and survival of CCOC.

TopDownApp: An open and modular platform for analysis and visualisation of top-down p...

Mathias Walzer

and 3 more

June 06, 2023

Although top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, numerous improvements are possible in the area of open science, including the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include, e.g., increased data sharing practices and availability of tailored open data standards. Additionally, the field would benefit from the development of open analysis workflows that can enable, e.g., reuse of public datasets, something that is increasingly common in other proteomics fields. We present an open and modular platform for the analysis and visualisation of TD proteomics data called TopDownApp. It can be used as a flexible analysis platform, through the use of a common workflow engine, common data formats for input/output, and software containerisation. It can also serve as a tool for visual inspection through its simple setup. As a key point, it can also be used as a development platform for new tools through the use of Python, a modular design, software containerisation and common data formats. TopDownApp is open source and freely available at: https://github.com/mwalzer/TopDownApp.

A compositional data model to predict the isotope distribution for average peptides u...

Annelies Agten

and 2 more

May 30, 2023

We propose an updated approach for approximating the isotope distribution of average peptides given their monoisotopic mass. Our methodology involves in-silico cleavage of the entire UniProt database of reviewed human proteins using trypsin, generating a theoretical peptide dataset. The isotope distribution is computed using BRAIN. We apply a compositional data modelling strategy that utilizes an additive log-ratio transformation for the isotope probabilities followed by a penalized spline regression. Furthermore, due to the impact of the number of Sulphur atoms on the course of the isotope distribution, we develop separate models for peptides containing zero to five Sulphur atoms. Additionally, we propose three methods to estimate the number of Sulphur atoms based on an observed isotope distribution. The performance of the spline models and the Sulphur prediction approaches is evaluated using a mean squared error and a modified Pearson’s χ² goodness-of-fit measure on an experimental UPS2 data set. Our analysis reveals that the variability in spectral accuracy contributes more to the errors than the approximation of the theoretical isotope distribution by our proposed average peptide model. Moreover, we find that the accuracy of predicting the number of Sulphur atoms based on the observed isotope distribution is limited by measurement accuracy.
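
The additive log-ratio (ALR) transformation named in this abstract can be sketched as follows. This shows only the generic transform and its inverse, not the authors' model: the choice of the last isotope peak as the reference component and the example probabilities are assumptions, and the penalized spline regression on the transformed values is not shown.

```python
import math


def alr(probabilities):
    """Additive log-ratio transform of a composition with D positive parts.

    Returns D-1 unconstrained coordinates log(p_i / p_D), using the LAST
    component as the reference (an assumption made for this sketch).
    """
    if any(p <= 0 for p in probabilities):
        raise ValueError("ALR requires strictly positive components")
    reference = probabilities[-1]
    return [math.log(p / reference) for p in probabilities[:-1]]


def alr_inverse(coordinates):
    """Map D-1 ALR coordinates back to a D-part composition that sums to 1."""
    exps = [math.exp(y) for y in coordinates]
    total = 1.0 + sum(exps)
    return [e / total for e in exps] + [1.0 / total]


# Hypothetical isotope probabilities for the first four peaks of a peptide.
isotope_probs = [0.55, 0.30, 0.11, 0.04]
coords = alr(isotope_probs)
recovered = alr_inverse(coords)
print(coords)
print(recovered)
```

The appeal of working in ALR coordinates is that any regression prediction, once passed through the inverse transform, is guaranteed to be a valid set of probabilities: positive and summing to one.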