An Atlas of Human Kinase Regulation

David Ochoa, Mindaugas Jonikas, Robert T Lawrence, Bachir El Debs, Joel Selkrig, Athanasios Typas, Judit Villén, Silvia DM Santos, Pedro Beltrao

Now published in Molecular Systems Biology 10.15252/msb.20167295

AbstractThe coordinated regulation of protein kinases is a rapid mechanism that integrates diverse cues and swiftly determines appropriate cellular responses. However, our understanding of cellular decision-making has been limited by the small number of simultaneously monitored phospho-regulatory events. Here, we have estimated changes in activity in 215 human kinases in 399 conditions derived from a large compilation of phosphopeptide quantifications. This atlas identifies commonly regulated kinases as those that are central in the signaling network and defines the logic relationships between kinase pairs. Co-regulation along the conditions predicts kinase-complex and kinase-substrate associations. Additionally, the kinase regulation profile acts as a molecular fingerprint to identify related and opposing signaling states. Using this atlas, we identified essential mediators of stem cell differentiation, modulators of Salmonella infection and new targets of AKT1. This provides a global view of human phosphorylation-based signaling and the necessary context to better understand kinase driven decision-making.

#Introduction

Cells need to constantly adapt to internal and external conditions in order to maintain homoeostasis. During cellular decision-making, signal-transduction networks dynamically change in response to cues, triggering cellular state-defining responses. Multiple mechanisms exist to transfer this information from sensors to the corresponding molecular responses, one of the fastest being the reversible post-translational modification of proteins (PTMs). Through these targeted modifications, such as phosphorylation, the cell can quickly alter enzymatic activities, protein interactions or sub-cellular localization in order to produce a coordinated response to a given stimulus (Pawson 2004). Protein phospho-regulation constitutes a highly conserved regulatory mechanism relevant for a broad set of biological functions and diseases (Beltrao 2012).

Over the past decades, our view of cellular signaling has advanced from an idea of isolated and linear cascades to highly complex and cooperative regulatory networks (Jordan 2000, Gibson 2009). Different perturbations in cellular conditions often activate different sets of interconnected kinases, ultimately triggering appropriate cellular responses. The complete understanding of such cell-fate decisions would require the systematic measurement of changes in kinase activities under multiple perturbations, but the small number of quantified regulatory events (i.e. tens) that were possible to date has limited our knowledge of cellular decision making and its molecular consequences (Kim 2011, Bendall 2011, Niepel 2013, Garmaroudi 2010).

Advances in mass-spectrometry and enrichment methods now allow measuring changes in thousands of phosphorylated peptides at a very high temporal resolution (Olsen 2013, Kanshin 2015, Humphrey 2015). Recent studies on human quantitative phosphorylation include responses at different cell-cycle stages (Dephoure 2008, Olsen 2010), after DNA damage (Beli 2012), EGF stimulation (Olsen 2006, Mertins 2012), prostaglandin stimulation (de Graaf 2014) and different kinase inhibitions (Weber 2012, Oppermann 2013, Kettenbach 2011, Hsu 2011) among many others. More recently, improvements in experimental and computational methods have fostered the study of differential regulation of phosphosites and kinases in different cancer types (Casado 2013), the modeling of drug resistance (Wilkes 2015) and inference of more precise pathway models (Terfve 2015). We suggest that the integrated analysis of phosphoproteomic responses after a wide panel of heterogeneous perturbations can expedite our understanding of cell decision-making processes.

In this study, we have compiled condition-dependent changes in human protein phosphorylation derived from 2,940,379 phosphopeptide quantifications in 435 heterogeneous perturbations. After quality control and normalization, we infer and benchmark the changes in 215 kinase activities in 399 conditions. We show that the similarity of kinase regulatory profiles can be used as a fingerprint to compare conditions in order to, for example, identify perturbations that modulate the kinase activity changes of a condition of interest. The large number of conditions analyzed allowed us to identify the kinases that are broad regulators (i.e. generalist kinases), found to be central kinases of the signaling network. Individual kinase profiles across conditions were also informative to recapitulate known kinase-kinase interactions and to identify novel co-regulated complexes and phosphosites acting as potential effectors.

#Results

Landscape of kinase activity responses under perturbation

To extensively study the heterogeneity and specificity of the human signaling response, we compiled and standardized 41 quantitative studies reporting the relative changes in phosphopeptide abundance after perturbation (see Materials and Methods). From the detected peptides, we collected identifications for 119,710 phosphosites in 12,505 proteins, 63% of which already reported in phosphosite databases (Fig EV1). For these sites, we normalized a total of 2,940,379 quantitative changes in phosphopeptide abundance in a panel of 435 biological conditions covering a broad spectrum of perturbations including targeted kinase inhibition, induced hESC differentiation or cell cycle progression, among many others (Appendix Fig S1, Table EV1). Only 1% of all phosphorylated sites were reported in more than 60% of the studies, whereas 52% of the sites were quantified in one single study (Fig EV1). The observed data sparsity, a common problem in shotgun proteomics, is frequently derived from the accumulation of technical and biological variants. To prevent the aggregation of false positives, only phosphosites observed in two or more independent studies were considered in downstream analysis.

In order to circumvent the problem of incomplete coverage due to the different sets of quantified sites in each study, we avoid analysing individual phosphosites. Instead, we predicted the changes in kinase activity by testing the enrichment on differentially regulated phosphosites among the known substrates of each kinase (Fig 1A). Using a modified version of the weighted Kinase Set Enrichment Analysis (KSEA) (Casado 2013, Subramanian 2005), we quantified the regulation of 215 kinases in a range between 10 to 399 perturbations (Fig 1A, Table EV2, Materials and Methods). To verify the ability of KSEA to quantitatively measure the changes in kinase regulation, we performed a series of benchmark tests based on prior knowledge.

The known mechanism of action of certain biological processes or compounds suggests different perturbations in which specific kinase regulation is expected. For instance, the ATM (ataxia telangiectasia, mutated) and ATR (ATM and Rad3-related) kinases display direct regulation corroborated by the KSEA activities under DNA-damaging conditions (Fig 1B). Similarly, the kinases in the MAPK/Erk pathway accurately display activation 10 minutes after EGF stimulation (Fig 1B). Conversely, the KSEA estimates also report decrease in kinase activities as in the case of the Epidermal Growth Factor Receptor (EGFR) inhibition mediated by erlotinib and gefitinib or MTOR inhibition by Torin1 (Fig 1B). Overall, the KSEA activity shows no predictive power to discriminate expected regulation in 132 kinase-condition pairs (Fig 1C, Table EV3, area under ROC curve=0.75).

To further validate the kinase regulation inference, we compared the KSEA activities across conditions with the corresponding changes in kinase regulatory phosphosites collected in the atlas. For example, phosphorylation of the activation loop residue Threonine 287 (T287), known to result in an increased catalytic activity of AURKA, presents a significant co-regulation with the AURKA KSEA activity (Spearman's \(\rho = 0.6\), \(p=0.02\)). Phosphorylation of T287 and KSEA activity derived from AURKA substrates are both decreased as AURKA inhibitor MLN8054 concentration increases (Fig 1D). Overall, we observe a significantly higher correlation between the KSEA activities and the changes in kinase auto-regulatory sites (Student's t-test, \(p=1.7\times10^{-4}\)) (Fig 1E). Finally, we compared the kinase regulation with those assayed in a previous study using phospho-specific antibodies under similar conditions (Hill 2016). As an example, the KSEA activities 10 mins after EGF stimulation significantly correlate (\(\rho=0.53\), \(p=0.008\)) with the antibody-based quantified phospho-regulation 15 minutes after EGF stimulation (Fig 1F). Despite the differences of both assays, the profile of inferred changes based on 26 phospho-dependent antibodies and the MS-based KSEA activities for the equivalent kinases present significantly higher correlations when cells are stimulated with similar EGFR activating conditions (Fig 1G, Student's one-sided t-test \(p=2.7\times10^{-5}\), Appendix Fig S2).

Together, these results not only validate the activity inference for individual kinases but strongly suggest the profile of kinase activity changes can serve as molecular barcodes of the cellular signaling state.

Figure EV1 - A map of human conditional phosphoregulation.
A Venn diagram including the total number of phosphorylated residues for which quantifications were collected and the number of sites contained in the curated phosphosite databases: PhosphositePlus, HPRD and PhosphoELM (August 2014).
B Total number of quantified serines (S), threonines (T) and tyrosines (Y).
C Distribution of sites reported by multiple parallel publications.
D Accumulation of sites as publications are randomly aggregated (100 permutations). Red line shows mean and the shadowed area mean +/- 1 standard deviation.

Figure 1 - Kinome-wide activity regulation derived from known substrates and 41 quantitative phosphoproteomic studies.
A Schematic of the data compilation effort and kinase activity prediction using Kinase Set Enrichment Analysis (KSEA).
B Expected kinase response after activation or inhibition. When available, error bars represent standard deviation over the mean KSEA Activity.
C Receiver Operating Characteristic (ROC) representing the discriminative power of the KSEA Activity to separate a set of 132 kinase-conditions expected to display regulation. As negatives, 1000 random sets were generated containing the same number of kinase-condition pairs from the same set of kinases and conditions. Curve displays average ROC curve and bars standard deviation. AUC, Area Under ROC Curve.
D Regression analysis between Aurora Kinase A (AURKA) regulatory site T287 and AURKA KSEA activity across all quantified conditions. Labelled conditions correspond to different concentrations of the AURKA inhibitor MLN8054 under Mitosis.
E Comparison between Spearman correlation coefficients obtained between KSEA-inferred kinase activities, quantifications of regulatory sites susceptible of auto-phosphorylation or other regulatory sites in kinases. * Two-sided Student's t-test \(p=1.7\times10^{-4}\).
F Linear regression between KSEA activities 10 minutes after EGF stimulation and activities measured with RPPA targeting regulatory phosphorylations 15 minutes after adding EGF.
G Spearman correlation coefficients between the profile of 24 kinase activities estimated with KSEA 10 min after EGF stimulation and the activities of the same kinases measured with RPPA after stimulating the cell with different ligands. EGFR ligands, EGF or NRG1; other growth factors (GF), HGF, IGF, Insulin or FGF; or control conditions, Serum or PBS. ** Two-sided Student's t-test \(p=0.005\). *** \(p=0.004\).

Inhibition of inferred regulatory kinases impairs state transition during PMA-induced hESC differentiation

To further validate the inferred KSEA activities, we experimentally measured the activity changes of 10 kinases using immunohistochemistry (Table EV4) during human Embryonic Stem Cell (hESC) differentiation induced by Phorbol 12-myristate 13-acetate (PMA), a perturbation compiled in the phosphoproteomic atlas (Rigbolt 2011) (Fig 2, A and B). Immunofluorescence and KSEA substrate-derived activities 30 and 60 min after PMA treatment agree in their regulatory effect - activatory or inhibitory - for 14 out of the 20 quantifications (Fig 2C, Fig EV2). Several of the concordant changes are expected to occur during differentiation such as for PKC (PRKCA) (Feng 2012), Erk2, RSK (RPS6KA1), GSK3A and GSK3B (Kinehara 2013). For CDK1, the predicted activities were corroborated using an antibody targeting Cyclin B1 pS126 (CycB1/CCNB1), a phosphorylation required for the activation of the CDK1-Cyclin B1 complex.

Not all regulated kinases may be functional relevant for the process under study. To discriminate driver regulatory kinases from secondary kinases activated as a consequence of the differentiation process, we monitored the PMA-induced transition in the presence of kinase inhibitors (Table EV5). Using immunofluorescence, we quantified the cytoplasmic abundance of Oct4 and Erg1 as respective early and late markers of PMA-driven differentiation (Niwa 2000, Kinehara 2014) (Fig EV3, Appendix Fig S3). Interestingly, Erk2 inhibition induced the strongest disruption of Erg1 expression. Inhibition of CDK1 also appears to delay the increase in Erg1 expression and, potentially, the differentiation process. On the other hand, the inhibition of RSK (RPS6KA1) shows the strongest induction of Erg1 expression after treatment, suggesting a possible role of its activity in the maintenance of the pluripotent state. The inhibition of GSK and S6 (RPS6KB1) kinases results in a small increase in PMA induced Erg1 expression only during the early time points. Overall, these results show how the KSEA based inference can predict regulated kinases and therefore predict those that are more likely to be functionally relevant in specific conditions. This illustrates how the kinase atlas can serve as a useful cell signaling resource.

Figure 2 - Inhibition of Inferred Regulatory Kinases impairs PMA induced differentiation of hESC.
A Representative images of differentiation marker MAPK (pT202/Y204) expression in hES cells stimulated with PMA. Scale bar: 30\(\mu\)m.
B Time course quantification of MAPK activation levels after PMA stimulation in the presence or absence of MAPK inhibitor (PD184352).
C Relative changes in kinase activities using Kinase Set Enrichment Analysis (KSEA) benchmarked against antibody-measured reporter phosphosites in the intervals 0-30 and 0-60 minutes.

Figure EV2 - Agreement between immunoflorescence and KSEA activities during hES cell differentiation.
For each of the 10 kinases under study, the panel of phopsho-specific antibodies report the activity at 0, 30 and 60 minutes after PMA stimulation - left panel. Error bars represent mean ± SD. A minimum of 1000 cells were analyzed in each experimental condition. The KSEA activities are inferred from the MS-quantified substrates in the time intervals 0-30 and 0-60 minutes after PMA stimulation and represented as a fraction of the theoretical limit based on the number of KSEA permutations. Note that the MS-based horizontal bars represent a time interval (i.e 0-30min) as compared to the IHC data measured in each individual timepoint (i.e. 30 min). For both time intervals, the relative position of each of the known substrates within the quantitative phosphoproteomic profile indicate enrichments on upregulated - blue - or down-regulated - red- phosphosites - right panels.

Figure EV3 - Time course quantification of Erg1 expression levels after PMA stimulation in presence of kinase inhibitors. Inset displays kinase activity measured with phospho-specific antibody after inhibitory treatment. Error bars represent mean \(\pm\) SD (\(n \gt 1000\)).

Kinase regulation profiles as molecular fingerprints of cellular signaling states

The diversity of the compiled perturbations as well as the extent of the kinases for which regulation is inferred constitutes a resource to study fundamental aspects of cell signaling. To demonstrate that the biological variation dominates over the technical variation, we tested if related kinases display co-regulation across conditions and, similarly, related conditions show similar patterns of kinase regulation. We observed that significant correlations between the KSEA activities were more frequent between kinases one or two steps away in the pathway than between those farther away (Fig EV4A). This observation remains true when excluding kinase-pairs sharing substrate sites (Fig EV4B). Similarly, we confirmed that pairs of related conditions measured in different studies tend to have similar profiles of KSEA activities (Fig EV4C). Furthermore, the correlation of kinase regulatory profiles is a very strong predictor of related conditions assayed in different studies (Fig EV4D, AUC=0.93), but not of pairs of conditions from studies conducted in the same lab (AUC=0.499), with the same cell line (AUC=0.546) or with the same labelling method (AUC=0.475). These results strongly suggest that the variation of kinase activities across conditions is primarily driven by biological effects rather than technical variation.

In order to explore the space of different signaling responses, we performed a principal component analysis (PCA) using the kinase regulation profiles derived from 58 well characterized kinases (Fig 3A, Materials and Methods, Appendix Fig S4). The first two components separate related EGF conditions based on their expected signaling similarities and opposite to the EGFR pathway inhibitors (Fig 3A, symbols). The separation of perturbations in the reduced space is again independent of the publication of origin, reflecting instead the similarities in the signaling response. The systematic exploration of conditions in the reduced space also allows us to investigate commonalities in the decision-making process. Kinase loadings driving the sample separation in the PCA space reflect systematic differences on the regulation of different kinases (Appendix Fig S4C). In this way, we can identify different types of kinase logic relationships that apply to nearly all conditions (Fig 3B). Some kinase pairs are co-regulated - such as BRAF and PRKG1 (Fig 3B, AND) - or anti-correlated - such as CDK2 and CHEK (Fig 3B, OR). Alternatively, we also identify pairs of kinases that display exclusive regulation, whereby one is never regulated at the same time as the other. For example, AKT1 is regulated when CDK1 is not and vice versa (Fig 3B, NOT).

The results above show how extreme similarities or dissimilarities between profiles of activity changes facilitate the interpretation and generate hypothesis about the signaling in specific conditions. For example, perturbations under DNA damaging conditions display similar KSEA activity profiles that can be summarized as a signature of marker kinases (Fig EV5). Among the most similar conditions to two DNA damage conditions (ionizing radiation and etoposide), are compounds that are known to cause DNA damage and a sample under G1-S transition obtained using a thymidine block that likely resulted in DNA damage. Conversely, cells treated with the inhibitor VE-821 targeting the DNA damage response kinase ATR show changes in activities anti-correlated with DNA damaging conditions (Fig EV5C). Therefore, kinase regulatory profiles can be used to identify perturbations that may mimic or modulate the kinase regulation occurring in a condition of interest. We further explored this notion in the study of two related Shigella and Salmonella infection states (Fig 3E). Among the anti-correlated conditions are 4 compounds that could potentially interfere with the infection process or the host response: SB202190 (p38 MAP kinase inhibitor), mesalazine (anti-inflamatory), trichostatin A (HDAC inhibitor) and verapamil (an efflux pump inhibitor). To validate the effect of these compounds, we have measured their impact on the invasion and proliferation of Salmonella enterica Typhimurium (STm) in human cells (Fig 3D, Materials and Methods). Mesalazine showed no effect on either invasion or proliferation (data not shown). Trichostatin A and higher doses of SB202190 tend to promote invasion. SB202190 showed a consistent decrease in long-term STm proliferation while trichostatin A showed a trend for increase in STm proliferation that was clearer for lower doses of the drug. Verapamil had a significant effect on proliferation that was not consistent across different concentrations. These results show how modulators of signaling states of interest can be identified by comparing kinase regulatory profiles found in the atlas.

Figure EV4 - Kinase activity profiles as signatures of the molecular response.
A Fraction of significant correlations (\(FDR \lt 0.05\)) when comparing the activities across conditions for kinases at different distances in the signaling network. The network contains all kinase-substrate interactions from PhosphositePlus, HPRD and Phospho.ELM.
B Same but excluding kinase pairs sharing substrate residues.
C Differences on kinase activity profile correlation grouped by biological or technical origin. The profiles of activities for the 215 kinases are correlated by sample. Pairwise correlations between conditions assayed in different studies are grouped based on biological or technical criteria. Related conditions published in independent studies present significantly different correlations (Student's t-test, \(p=0.002\)), than the same number of random pairs of conditions also published in different studies.
D ROC curves denoting the predictive power of the same kinase activity profile correlations to discriminate between related conditions in different studies from random pairs of conditions (Mean AUC 0.933), conditions assayed in the same cell line against different (AUC = 0.546), same labelling methods versus different technique (AUC = 0.475) or samples coming from different publications with the same leading author against different leading authors (AUC = 0.499).

Figure 3 - Kinase activity profiles as fingerprints of the cell signaling state.
A Perturbation scores on the first 2 PCA components based on KSEA activity profiles of 52 well-characterized kinases. Symbols flag related perturbations in different studies.
B Boolean logic relationships between kinase responses. Samples in 2 first components are colored by different KSEA activities. Vectors display kinase loadings.
C Network displays significantly correlated or anti-correlated conditions in the context of early responses after bacterial infection. The strength of the correlations - blue - and anti-correlations - red - is displayed as the edge width.
D-E Infection rate at 0h - left - and bacterial proliferation after 20h - right - when adding different concentrations of compounds displaying anti-correlated KSEA activity profiles with early responses after bacterial infection. Displayed significant ANOVA p-values evaluate differences between 3 drug concentrations and the DMSO control.

Figure EV5 - Kinase activity profiles identify DNA damage responses.
A All conditions displayed on the PCA first 2 components. Perturbations potentially producing DNA damage are highlighted in red. Vectors report the loadings of 3 relevant kinases.
B Accumulated KSEA activity of the DNA damage biomarkers ATM, PRKAA1, CHEK2 across 399 perturbations.
C Network displays signficantly correlated conditions. Blue edges report significant correlation between conditions involving damaging conditions. Red edges report significant anti-correlations with ATR inhibition, a kinase involved in the DNA damage response. Edge width represent the strength of the correlation.

Activity signatures reveal generalist and specialist kinases

The large panel of estimated changes of kinase activity across conditions allows us to classify kinases according to their degree of specificity. Some kinases, such as AKT or CDK1, are very often regulated across all conditions and can be defined as generalist kinases. Other kinases such as ATM and ATR, are more narrowly regulated and can be considered specialist kinases. To study these two classes of kinases we correlated the number of conditions in which kinases show changes in activity with the functional importance of the kinases measured in genetic experiments. Functional importance was scored as either the degree of essentiality from a CRISPR screen (Wang 2015) or by the number of phenotypes from a compilation of RNAi screens (from www.genomernai.org) (Fig 4, A and B). Kinases that have changes in activity in many conditions (e.g. generalist kinases) are not more likely to be functionally important than specialist kinases. For example, ATR or PLK1 are regulated in few conditions but tend to be essential. We observed however that generalist kinases, such as AKT and CDK1, are more central in the kinase-signaling network as measured by the number of shortest paths that traverse them in the directed kinase-kinase network (Fig 4C, \(\rho=0.506\), \(p =9.8\times10^{-3}\), excluding kinases with 0 betweenness). Kinases that are often regulated tend to occupy positions in the network where signaling is very likely to flow through based on the wiring of the network. Understanding the properties of generalist and specialist kinases may allow us to better understand the specificity of the signaling response, as well as to propose novel therapeutic targets and inform on the potential consequences of kinase inhibition.

Figure 4 - Relevance of generalist or specialist kinases.
A Genetic relevance of generalist and specialist kinases. Number of conditions where the kinase is regulated (Absolute estimated kinase activity > 1.75) for each kinase with more than 10 known substrates against the depletion score from CRISPR essentiality screen (Wang 2015). A lower depletion scores is indicative of kinases that when knocked-out cause severe fitness defects.
B Same number of conditions in which a kinase is regulated against the number of phenotypes shown by the knocked-down kinase (from a compilation of RNAi screens www.genomernai.org).
C Same number of conditions in which a kinase is regulated against kinase centrality (betweeness) in signalling network. In the inner panel, a toy example illustrates the relationship between betweenness and the signaling network connectivity. Generalist kinases with more than 10 known substrates tend to have also high betweenness scores (Spearman's \(\rho=0.506\), \(p =9.8\times10^{-3}\)). Kinases without shortest paths going through them were excluded.

Kinase co-regulation identifies novel molecular effectors

The conditional depth of the kinase regulation atlas facilitates the search for co-regulated kinases and potential molecular effectors. Protein complexes are common signaling effectors that often display coordinated phospho-regulation with regulatory kinases. To search for kinase-complex co-regulation, we quantified the enrichment of regulated phosphosites within stable human complexes. We then correlate this enrichment with the KSEA activities across the panel of biological perturbations (Materials and Methods). Kinase-complex associations were validated if at least one subunit in the complex was a known substrate of the kinase. Overall, we found a very strong enrichment for known kinase targets among the kinase-complex associations predicted from co-regulation (Fig 5A, Table EV6). Using CDK1 as an example, we found a significant number of co-regulated complexes validated as direct substrates of CDK1 based on previous evidence, even though the actual substrate sites in the complex were not used to predict their association (Fig 5B). We have also identified examples of complex subunits functionally related with CDK1, but with no evidence yet of direct regulation. The chromatin assembly complex (CAF-1 Complex), for instance, delivers newly synthesized H3/H4 dimers to the replication fork during the DNA synthesis (S) phase, shifting to secondary functions during other stages of the cell cycle (Volk 2015). Although no specific site in the complex has been validated as CDK1 substrate, the observed co-regulation of CAF-1 and CDK1 (\(r=0.27\), \(FDR=5\times10^{-4}\)) was partially validated in vitro as CDK inhibition prevents the replication-dependent nucleosome assembly in human cell extracts (Keller 2000).

As an additional application of this approach, we tested if co-regulation can also be predictive of novel AKT1 kinase target sites. In order to validate these predictions we measured the phosphorylation changes of 15,255 phosphosites in insulin-stimulated HeLa cells in the presence or absence of the AKT inhibitor VIII (Fig 5C, Table EV7). As expected, previously known AKT targets are, on average, down-regulated in the presence of the inhibitor (Fig 5D). Additionally, the substrates of downstream related kinases, such as MTOR or GSK, are also regulated. When predicting the same number of AKT targets either as sites strongly matching the AKT sequence preference or sites showing the most significant co-regulation with AKT across conditions, the latter showed much stronger down-regulation after AKT inhibition (Fig 5D).

In order to propose a list of new bona fide AKT targets, we shortlisted those that are strongly co-regulated with the AKT activity (\(p \lt 0.01\)), match the AKT sequence preference, are down-regulated after AKT inhibition (\(\log_{2}L/H\lt-0.9\)) and were also reported as in vitro AKT substrate sites (Imamura 2014). We identified 12 AKT target phosphosites matching these stringent criteria (Fig 5E, Table EV8). S588 of TBC1D4 was previously described to be a potential AKT target sites, despite it not being present in our training set (Berwick 2002, Kane 2002). The tumor suppressor PDCD4 has also been shown to be regulated by AKT but at position S67 controlling its localization (Palamarchuk 2005). We identified S76 as an AKT target, a residue important for PDCD4 degradation (Matsuhashi 2014, Galan 2014) that is phosphorylated under EGF stimulation (Matsuhashi 2014). This suggests that AKT activity may directly control the protein levels of PDCD4. Another interesting target site is S801 of the kinesin KIF4A, a motor protein involved in the control of microtubule stability. The nearby position T799 is a target of Aurora B and the double alanine mutant T799A/S801A has impaired function (Nunes 2013). In addition, both AKT and KIF4A regulate microtubule stability during cell migration (Morris 2014, Onishi 2007). Taken together, these results suggest that AKT directly targets KIF4A S801 and that this interaction plays a role in microtubule stabilization during migration.

Figure 5 - Kinase co-regulation reveals candidate molecular effectors.
A Systematic evaluation of the kinase - complex associations based on the known direct interactions between kinases and complexes. The Positive Predictive Value (PPV) is displayed against the False Discovery Rate (FDR). The baseline random expectation - in grey - represents the PPV of a random predictor trying to estimate associations between kinases and complexes.
B Protein complexes showing correlated phospho-regulation with the activity of CDK1. The complexes marked in green contain at least one substrate of CDK1. Only the top correlated complexes are shown for the sake of clarity. Missing activities are displayed in white.
C Experimental workflow to study phosphoproteome dynamics under AKT (Akt1) inhibition in insulin-stimulated HeLa cells.
D Quantification of known kinase substrates after AKT inhibition of insulin stimulated cells for all kinases with at least 14 known sites - top left - and their respective KSEA enrichment after 10,000 permutations - bottom left. Regulation under AKT inhibition of the top 24 unknown sites (number of quantified AKT known substrates) ranked based on their motif similarity, co-regulation with the known substrates or the combination of both - top right - and their corresponding enrichment on regulated sites after inhibition - bottom right.
E List of high-confidence AKT substrates where the next criteria are required: downregulation on AKT inhibition \(\log_{2}L/H\lt-0.9\), positive co-regulation \(p\lt0.01\), motif similarity \(log-weights \gt0.8\), \(mss \gt 0.6\) and all sites reported as in vitro substrates of AKT (Imamura 2014).

#Discussion

Changes in internal or external conditions are sensed by cells which rapidly make decisions on how to mount an appropriate response. Difficulties in measuring activities and downstream consequences for a large number of kinases, have limited the comprehensive understanding of the decision-making process. Here, we have compiled the changes in the human phosphoproteome across 399 perturbations, in order to create an atlas of regulation for 215 kinases, as well as the phospho-regulatory status of hundreds of effector protein complexes.

One of the main limitations preventing the integration of MS quantitative data has been the stochasticity of the peptide discovery. In this study, we have demonstrated how the analysis of sets of phosphosites (e.g. all kinase substrates) can help overcome the lack of coverage on phosphosite quantifications. The enrichment on biologically meaningful sets of sites also increases the robustness against technical variability, mostly due to differences in experimental protocols and analysis pipelines. Several results suggest the variation of kinase activities along conditions in our compendium is driven by biological factors: related conditions have similar profiles of kinase regulation (Fig EV4C-D); pairs of related kinases are co-regulated across the conditions (Fig EV4A-B); and co-regulation between kinase activity and effectors is predictive of kinase-target interactions (Fig 5). These results strongly indicate that this compendium constitutes a resource to identify novel biological associations. The measured changes in phosphorylation were not normalized for the changes in total protein abundance for the conditions compiled for this study. In particular, for longer time-scales this could be a confounding effect that should also be mitigated by the analysis of sets of sites. The described analysis can be periodically updated as new kinase targets are characterized and new human perturbation experiments refine the map of signaling responses. As such, we will maintain a growing online application (http://phosfate.com), where the community can explore the signaling regulation of all conditions in the atlas and analyze and compare their own phosphoproteomic datasets.

The larger number of perturbations, identifies the possible roles of kinases and how they relate to each other. We have shown that generalist kinases do not tend to be more essential than specialist kinases but, instead, tend to be more central in the signaling network. One interpretation would be that generalist kinases are often regulated simply because they are located in the network where signals flow through more often. Recent phosphoproteomic studies of high temporal resolution have indicated that changes in phosphorylation occur in a time scale of seconds (Kanshin 2015, Humphrey 2015). Some fraction of the later changes in phosphorylation may be non-functional and potentially be consequence of the signaling propagating through the kinase-signaling network (Kanshin 2015). An alternative hypothesis could be that generalist kinases, central to the signaling network, can be robust to perturbation and the signal modulated by other kinases.

By comparing kinase and complex regulation in different signaling contexts, we have shown tight co-regulation between regulators and effectors that served to prioritize candidate interactions. However, regulation of substrate phosphosites or protein complexes is just one of the several potential consequences of a signaling response. Given the ease in collecting cellular phenotypes or large scale biological measurements such as gene-expression and metabolites, we propose this approach can be generalized to globally study the diversity of cellular molecular states. Additionally, as data from specific genetic backgrounds (e.g. cancer cell lines or primary tumours) becomes available, this research could help to interpret the actionable signaling consequences derived from specific sets of mutations, drawing a much more complete diagnostic of the cellular state.

#Materials and Methods

##Compilation of human quantitative phosphoproteomic data
The atlas integrates a compilation of 41 selected publications reporting human MS-derived changes in phosphopeptide abundance under 435 perturbations. In order to consider a dataset for the atlas, we required peptide sequences, phosphorylation identifications and detailed description of the biological perturbation and control for a minimum of 1000 phosphopeptides (Table EV1). From the 41 studies included, only the raw spectra derived from 16 studies was made available in public repositories. 3 additional studies were deposited in the no longer available Tranche. For the remaining 16 studies, the different labelling methodologies difficult the spectra processing with a common pipeline, the most frequent being SILAC in 10 studies. Additionally, due to the poor metadata annotation of some of the conditions, re-processing the spectra would not be possible for some of the datasets. For these reasons we relied on the original MS identifications performed by the experimental groups. We collected all quantifications from the supplementary data available in each of the 41 studies, including both significant and not significant changes. To standardize the peptide sequences and site positions across studies, we mapped all peptides to the same reference proteome using Ensembl v73 (Cunningham 2015). For data storage purposes, we mapped the peptide quantifications to all matching protein isoforms and transformed all quantifications into \(\log_{2}\) ratios. Finally, the re-mapped 2,940,379 phosphopeptide quantifications and all their metadata were stored in a MySQL database that is publicly available in the Downloads section of http://phosfate.com.

##Data pre-processing and normalization
To standardize and further compare the datasets, we applied a series of quality control criteria. (a) we restricted the analysis to peptides mapped in the Ensembl canonical transcripts, resulting in 12,427 canonical phospho-modified proteins. According to Ensembl, the canonical transcripts correspond to the longest Consensus CDS (CCDS) model in each gene (Pruitt 2009). (b) the quantified changes of peptides with the same set of modifications but different sequences were merged into one single entry, therefore increasing the conditional coverage. On those cases where more than one quantified peptide contains the same set of modifications for the same condition their ratios were averaged. (c) the conditional ratios between technical replicates were averaged, while remaining separated for biological replicates. (d) only monophosphorylated peptides were considered for all subsequent analysis requiring quantifications at single positions. (e) to prevent the accumulation of false positive sites, modified positions identified only in one single study were excluded. (f) we quantile normalized the quantifications across conditions and excluded all conditions with less than 1000 peptide quantifications. After applying all these quality control criteria, the size of the final set of conditions was reduced from 435 to 399 perturbations (Appendix Fig S1). On this reduced set of conditions, we considered for further analysis a total of 1.238.987 quantifications for 43.028 more reliable monophosphorylated peptides.

##Kinase Set Enrichment Analysis (KSEA)
To estimate the kinase regulation from the differential regulation of their known substrates, we modified the original Kinase Set Enrichment Analysis (KSEA) to incorporate the principles of the weighted Gene Set Enrichment Analysis (Casado 2013, Subramanian 2005). This modified KSEA uses the Kolmogorov-Smirnov statistical test to assess whether a predefined set of kinase substrates is statistically enriched in phosphosites that are at the two of extremes of a ranked list defined by their differential regulation. This algorithm is particularly helpful to detect changes on phosphosite regulation in the context of all site quantifications, even though the changes in site quantifications could be small. Moreover, the algorithm do not require any arbitrary threshold to define which phosphosites are significantly regulated or not. As in the GSEA algorithm, all site quantifications are considered in order to search for enrichments within the extreme fold changes. The KSEA algorithm proceeds as follows: (a) The Enrichment Score (ES) is calculated by walking down the ranked site quantifications and re-scoring a running-sum statistic. The statistic is increased by the site quantification when it encounters a substrate of the kinase and decreases when the site is not a substrate. Both increases and decreases are normalized by the total number of known and not known substrates and proportional to the observed fold change. Finally, the ES corresponds to the maximum deviation from zero - either positive or negative - encountered in the walking down. The metric is equivalent to a weighted Kolmogorov–Smirnov-like statistic. (b) The null distribution of ES scores is calculated by randomizing the sites but preserving the same distribution of site quantifications. (c) The statistical significance of the observed ES is calculated using the empirical p-values calculated from the ES null distribution based on the same number of known substrates and the same distribution of quantifications. An R package implementing the novel KSEA described above is provided at https://github.com/evocellnet/ksea.

##Predicting kinase activities using KSEA
In order to collect a comprehensive list of regulatory relationships, we merged all the interactions reported in PhosphoSitePlus (Hornbeck 2015), HPRD (Keshava 2009) and Phospho.ELM (Dinkel 2011) in June 2014. After excluding kinase auto-phosphorylations, we compiled a total of 7,815 interactions between 306 kinases and 5,617 individual sites. Using the above described regulatory information and the collected quantitative phosphoproteomic data, we applied the KSEA algorithm with a null distribution of 1,000 permutations per kinase and condition. In order to use the enrichment significance as a proxy of kinase activity, the resulting p-values were \(log_{10}\)-transformed and signed based on the average sign of all substrates. If the predominant change of all substrates is an increase in phosphorylation, the kinase is predicted as activated. If the majority of the substrates present reduced phosphorylation, the kinase is predicted as inactivated.

Kinase activity validation using kinase regulatory sites

To corroborate the kinase activity inference, KSEA activities were compared with the phosphorylation changes observed in kinase regulatory sites across conditions. From the total list of 941 human regulatory sites reported in PhosphositePlus, 150 were quantified in at least 10 perturbations. To evaluate the concordance between the KSEA activities and the quantitative changes in the regulatory phosphorylations, we performed a linear regression analysis across all available perturbations. Additionally, the results with those derived from the Spearman correlations between the kinase activities and the regulatory sites susceptible of auto-phosphorylation and other sites not reported to regulate the enzymatic activity. We discarded all kinase activity profiles under no regulation (absolute log p-value > 1 in at least 1 condition).

Kinase activity validation using RPPA data

We compared the predictions based on the collected quantitative phosphoproteomic data with the antibody-based kinase activities from a previous study (Hill 2016). We used the BT20 cell line as reference, showing the most responsive quantitative profiles after EGF receptor stimulation. We scaled the antibody-based measurements to make them comparable across conditions and antibodies. We quantile-normalized per antibody to assure equal final distributions. Next, we standardized each individual combination of cell line, inhibitor, stimulus and time point by calculating the z-score of each of the measurements based on the mean and standard deviation of the unstimulated conditions. Replicates were averaged. The final dataset contains 26 quantifications reporting changes in regulatory phosphosites in kinases. In order to use only the most reliable activity predictions, kinases with a number of known substrates smaller than 5 were excluded. To circumvent the effect of protein abundances, we restricted the analysis to the first hour after EGF stimulation. The normalized quantifications clustered together based on the sample similarities, with no apparent batch effects (Appendix Fig S1). The DREAM conditions were classified depending if they activate EGFR - EGF and NRG1 - other growth factors that eventually could have a similar downstream effects - HGF, IGF1, FGF1 and Insulin - or non-stimulating conditions - Serum and PBS.

Maintenance and treatment of human embryonic stem cells (hESC)

Human embryonic cells, H1 and H9 (WA01, WA09 from WiCell) were maintained on Matrigel (BD Biosciences) coated dishes in mTeSR™1 medium (StemCell Technologies). Differentiation of hESCs was induced by supplementing mTeSR™1 with PMA 50 nM. Differentiation time course experiments were typically 0 min, 30 min, 1 hour, 6 hours and 24 hours. Kinase inhibition experiments requiring were performed by supplementing mTeSR™1 medium with pharmacological inhibitors 1 hour before PMA treatments (Table EV8). In order to avoid inhibitor’s degradation during 24h experiment, fresh mTeSR™1 medium with a 50 nM of PMA was changed after 10 hours.

##hESC Immunofluorescence and image analysis
For each time course experiment, hESC were fixed for 10 minutes with a 4% paraformaldehyde, permeabilized for 5min with 0.3% Triton-x and incubated with a blocking solution (10% fetal bovine serum (FBS) and 3% bovine serum albumin (BSA) in PBS) for 1h. Primary antibodies (Table EV7) were incubated ON at 4 °C in antibody dilution buffer (1% bovine serum albumin (BSA). 0.1% Triton-x in PBS) at the indicated concentrations. Primary antibodies were visualized by using a secondary antibody conjugated to Alexa-488. Samples were counterstained with DAPI to facilitate analysis. Images were acquired using a high-content, widefield inverted microscope, Olympus ScanR System equipped with a sCMOS Flash 4.0 camera (Hamamatsu), universal plan semi-aprochromat 20x objective (NA 0.7) and a SpectraX LED light source. Image analysis was performed using Matlab or CellProfiler (Carpenter 2006). Briefly, a low-pass Gaussian filter was first applied to each image. The local background value of each pixel was then determined by searching for a surrounding ring area, with the outer and inner radii of the ring being 10 and 5 times the approximate nuclear radius, respectively. The lowest 5th percentile value of the ring area was used as the background intensity of the center pixel. Cell nuclei were identified using fluorescent DAPI images as masks. When needed cytoplasmic mask consisted of a ring around the nucleus. The MATLAB function regionprops was then used to label each nucleus and to retrieve the xy coordinates of all pixels in specific nuclei. The level of immunofluorescence staining in each cell was calculated as the average value of the intensities from each pixel of the specific nucleus. At least 2000 cells were used for analysis per each indicated condition.

PCA based on kinase activity profiles

To restrict the analysis to consistently estimated kinases, only those inferred in at least 75% of the perturbations were considered. Conditions displaying extreme redundancies were also excluded, reducing the matrix of kinase activities to 58 kinases and 387 conditions. For the 7.43% of the matrix containing missing values, the data was imputed using the regularized iterative PCA algorithm implemented in the imputePCA function contained in the R package missMDA. Using the resulting complete matrix Principal Components Analysis (PCA) was performed using the rda function in the R package vegan without any additional scaling. The expected (baseline) per cent variance in each PC stemming from noise in data was estimated using the stringent ‘broken stick’ method and the relaxed average eigenvalue (Kaiser–Guttman criterion) (Jackson 1993).

##Salmonella strains used for infection
Salmonella enterica Typhimurium 14028s (STm) transformed with the constitutive GPF expressing plasmid pDiGc (Helaine 2010) were cultivated in LB broth (Miller) containing 100\(\mu\)g/ml ampicillin by incubating on a rotating wheel at 37°C. HeLa cells (ATCC) were cultivated in DMEM 4.5 g/L glucose (Gibco cat. # 41965-039), pyruvate (100mM,Gibco), 10% FBS at 5% CO2 in a 37°C incubator. Stock drug solutions were dissolved in DMSO: trichostatin A (Sigma cat. T8552) and SB202190 (Sigma cat. S7067), or methanol: (±)-Verapamil hydrochloride (Sigma cat. V4629). Final drug concentrations used trichostatin A: 1.5, 1.0 and 0.5 \(\mu\)M, SB202190 15, 10 and 5 \(\mu\)M, (±)-Verapamil hydrochloride: 15, 10 and 5 \(\mu\)M. 100mg/mL stock solution of Gentamicin was dissolved in water (Sigma cat. G1914).Bacteria were prepared for HeLa cell invasion as previously described 20133586 (Helaine 2010) with the following modifications: overnight cultures of GFP expressing STm were diluted 1:33 into fresh LB broth and cultured for 3.5 hrs at 37°C prior to infection.

##HeLa cell preparation and infection
At 80% confluency, 3000 HeLa cells per well were seeded into a 384 wells clear bottom plate (Greiner cat. 781090) using a cell seeder (Thermo, Multidrop Combi) followed by an 18hr incubation overnight to allow cell attachment. Cells were then exposed to indicated drug concentrations in the presence of DMEM 1 g/L glucose + 10% FBS for 6 hrs. Prior to infection, cells were then washed 2 times with DMEM or PBS, followed by media replacement with fresh DMEM 1 g/L glucose + 10% FBS. Infection was carried out as previously described (Helaine 2010) using a liquid handler (Biomek FXP). STm was added directly to HeLa cells at an MOI of 100 in PBS. STm was then allowed to invade HeLa cells by incubating for 30 minutes at 5% CO2 in a 37°C incubator. Extracellular STm were then removed by washing 3 times with warm PBS, followed by treatment with 100 \(\mu\)g/mL gentamicin in DMEM 1 g/L glucose + 10% FBS for 1 hr. Media was then replaced with 10 \(\mu\)g/mL gentamicin in DMEM 1 g/L glucose + 10% FBS for the remainder of the experiment n.b. this step was considered t=0. At the indicated time points, cells were then washed with prewarmed PBS prior to fixation and permeabilization in 5% formaldehyde/0.2% Tritonx100 in PBS for 45 minutes. Fixing solution was then removed by washing with PBS and cells were stained using 2.5 \(\mu\)g/mL Hoechst33342 (Molecular Probes cat. H3570) and 80 ng/mL Phalloidin-Atto700 (Sigma cat. 79286) overnight at 4°C. Prior to imaging cells were washed 3 times in PBS.

##Salmonella microscopy and image analysis
384-well plates were imaged using a Molecular Devices, IXM XL microscope where 6 sites per well were imaged at 20x maginfication. Cellprofiler was used to analyze the images. Nuclear regions were determined by setting a manual intensity threshhold for the DAPI channel. Nuclei were expanded using the actin staining to determine the cellular regions. Salmonella colonies were determined by manual threshholding. Segmented cells were classified as infected or non-infected depending on the presence or the absence of a salmonella colony in a cell region. For every site imaged, the number of infected and non-infected cells was determined, along with the integrated salmonella fluorescence intensity inside infected cells. To determine the percentage of infected cells per well, the number of infected and non-infected cells from the 6 sites of every well was summed up and the ratio of infection was calculated. In addition, the mean integrated intesity of Salmonella in infected cells was determined for every site, and the average value for the 6 sites in a well was calculated to obtain the mean integrated intensity of salmonella in infected cells per well as a measure of salmonella intracellular proliferation.

Co-regulation between driver kinases and effector complexes

We first quantified the phospho-regulation of stable human complexes (from the Corum database (Ruepp 2010)) in each condition. We limited the complex redundancy by subsetting the interactions in which only one copy of the homologous protein complexes is included. For each of the 1,331 complexes and for each condition, we compared the distribution of absolute changes in phosphosite abundance in the complex against all phosphosites using the Kolmogorov-Smirnov (KS) test. The resulting p-values were log-transformed and signed based on the average fold change of all sites in the complex. We then fitted a linear regression to estimate those responses in protein complex phosphorylation that correlate with changes in kinase activity across conditions. For validation purposes and in order to avoid potential biases, the kinase substrates used to predict the kinase activities were excluded from the complex regulation estimates. The Pearson correlation p-values were corrected for multiple testing.

SILAC labelling, protein extraction and digestion

HeLa cells were passaged in DMEM (-Arg,-Lys) with penicillin-streptomycin, 10% dialyzed FBS at 37°C, 5% \(CO_{2}\), supplemented with either normal L-lysine and L-arginine (light K0, R0) or \(^{13}C_{6}\)-\(^{15}N_{2}\) lysine and \(^{13}C_{6}\)-\(^{15}N_{4}\) arginine (heavy K8, R10). Both populations of cells were deprived of serum overnight. 'Light' labeled cells were treated with 1\(\mu\)M Akt inhibitor VIII for 30 min prior to stimulation with 100 nM insulin for an additional 30 min. 'Heavy' labeled cells were stimulated with 100 nM insulin for 30 min. At the time of harvest, cells were washed three times with ice cold PBS and flash frozen over liquid nitrogen. Cells were scraped into ice-cold urea buffer (8M urea, 75 mM NaCl, 50 mM Tris HCl pH 8.2, complete protease inhibitor cocktail (Roche), 50 mM sodium fluoride, 50 mM beta-glycerophosphate, 1 mM sodium orthovanadate, 10 mM sodium pyrophosphate). Protein concentration was assayed using the BCA method and lysates from 'light' and 'heavy' cultures were mixed in a 1:1 ratio. Protein lysates were reduced with 5 mM DTT for 30 min at 55°C, alkylated with 10 mM iodoacetamide for 15 min at room temperature, and quenched with 10 mM DTT. Proteins were diluted 2-fold with 50 mM Tris pH 8.8 and digested with Lys-C (Wako) overnight at room temperature. The resulting peptides were desalted over a tC18 SepPak cartridge (Waters) and dried by lyophilization.

Strong cation exchange (SCX)/Immobilized metal affinity chromatography (IMAC)

Approximately 3 mg of peptides were resuspended in 50 mM Tris pH 8.2 and further digested with trypsin (Promega) overnight at 37°C. The resulting tryptic peptides were desalted over a tC18 SepPak cartridge (Waters) and dried by vacuum centrifugation. They were separated by strong cation exchange into 12 fractions using a volatile binary solvent system (A: 10 mM \(NH_{4}HCO_{2}\) + 25% MeCN + 0.05% FA, B: 500 mM \(NH_{4}HCO_{2}\) + 25% MeCN + 0.05% FA). Fractions were dried and desalted by vacuum centrifugation. Fractions were resuspended in 100 \(\mu\)l IMAC loading solution (80% MeCN+ 0.1% TFA). To prepare IMAC slurry, Ni-NTA magnetic agarose (Qiagen) was stripped with 40 mM EDTA for 30 min, reloaded with 10 mM \(FeCl_{3}\) for 30 min, washed 3 times and resuspended in IMAC loading solution. To enrich phosphopeptides, 50 \(\mu\)l of 5% bead slurry was added to each fraction and incubated with rotation for 30 min at room temperature, washed 3 times with 150 \(\mu\)l 80% MeCN, 0.1% TFA, and eluted with 60 \(\mu\)l 1:1 MeCN:1% \(NH_{4}OH\). The eluates were acidified with 10% FA and dried by vacuum centrifugation for LC-MS/MS.

LC-MS/MS

Phosphopeptide-enriched samples were resuspended in 4% formic acid, 3% MeCN and subjected to liquid chromatography on an EASY-nLC II system equipped with a 100 \(\mu\)m inner diameter x 40 cm column packed in-house with Reprosil C18 1.9 \(\mu\)m particles (Dr. Maisch GmbH) and column oven set to 50°C. Separations were performed using gradients of 9% to 32% MeCN in 0.125% formic acid ranging in length from 55-105 min and were coupled directly with a LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher) configured to conduct a full MS scan (60k resolution, 3e6 AGC target, 500 ms maximum injection time, 300 to 1500 m/z) followed by up to 20 data-dependent MS/MS acquisitions on the top 20 most intense precursor ions (3e3 AGC target, 100 ms maximum injection time, 35% normalized collision energy, 40 sec dynamic exclusion).

LC-MS/MS data processing

Raw data files were converted to mzXML and searched using Comet version 2015.01 against the human Swissprot database including reviewed isoforms (April 2015; 42,121 entries) allowing for binary (all or none) labeling of lysine (+8.0142) and arginine (+10.0083), and variable oxidation of methionine, protein N-terminal acetylation, and phosphorylation of serine, threonine, and tyrosine residues. Carbamidomethylation of cysteines was set as a fixed modification. Trypsin (KR|P) fully digested was selected allowing for up to 2 missed cleavages. Precursor mass tolerance was set to 50 ppm, and fragment ion tolerance to 1.0005 Daltons. Search results were filtered using Percolator to reach a 1% false discovery rate at the PSM level. Peak area Heavy/Light ratios were calculated using an in-house quantification algorithm. Phosphosite assignment was performed using an in-house implementation of Ascore, and sites with Ascore \(\ge\) 13 were considered localized (\(p=0.05\)). Phosphopeptides in the database with multiple non-localized instances spanning the same sequence were only considered to correspond to the minimum number of phosphosites that explain the data. Finally, the dataset was additionally filtered to reach a site-adjusted false discovery rate of 1%.

##Co-regulation between Akt and potential new substrates
For all human phosphorylated residues compiled in the atlas, a series of evidences were generated in order to weight their potential role as substrates of AKT. Firstly, all sites in the atlas were matched against the position weight matrixes (PWM) generated from all the known substrates of AKT. To weight the motif - PWM similarity, two algorithms were compared with similar results: the sum of the log weights of the matched residues and the MSS score provided by MATCH, which incorporates the information content of the PWM (Kel 2003, Wasserman 2004). Secondly, the aforementioned inference of kinase activities based on the known kinase substrates was used to find sites co-regulated with the known substrates of AKT. The response of all sites in the atlas across the panel of conditions were correlated with the estimated activity of AKT in the same conditions. We also included as an additional validation the set of 1,778 AKT in vitro substrates described by Imamura et al. (Imamura 2014). Finally, as a fourth independent evidence we included the differential regulation of each of the candidate AKT sites under inhibition by AKT inhibitor VIII in insulin-stimulated HeLa cells described above.

#Acknowledgements
We thank all the groups involved in generating the phosphoproteomics data for their invaluable public resource. We thank Jeff Johnson, Brandon Invergo and Inigo Martincorena for critical reading of the manuscript, Omar Wagih and Francesco Ioirio for their help on implementing some of the methods and Julio Saez-Rodríguez, John Marioni and the rest of the group at the EMBL-EBI for their insightful comments. PB acknowledges support from an HFSP CDA award (CDA00069/2013) and ERC Starting Grant (ERC-2014-STG 638884 – PhosFunc). JV acknowledges support from a Howard Temin Pathway to Independence Award in Cancer Research (NIH K99/R00 CA140789) and an Ellison Medical Foundation New Scholar Award (AG-NS-0953-12).MJ and SDMS are supported by the Medical Research Council (MRC) (MCA652-5PZ600).

#Author contributions

D.O. J.V. S.S. A.T. and P.B. designed experiments. R.L., M.J. S.S. J.S. B.D. performed the experiments. R.L. data-processed the MS samples. D.O. performed the data analysis. D.O. and P.B. wrote the manuscript. P.B. oversaw the work. All authors read and approved the final manuscript.

#Conflict of interest
None declared.

References

  1. P Beli, N Lukashchuk, SA Wagner, BT Weinert, JV Olsen, L Baskcomb, M Mann, SP Jackson, C Choudhary. Proteomic investigations reveal a role for RNA processing factor THRAP3 in the DNA damage response.. Mol Cell 46, 212-25 (2012).

  2. P Beltrao, V Albanèse, LR Kenner, DL Swaney, A Burlingame, J Villén, WA Lim, JS Fraser, J Frydman, NJ Krogan. Systematic functional prioritization of protein posttranslational modifications.. Cell 150, 413-25 (2012).

  3. SC Bendall, EF Simonds, P Qiu, el-AD Amir, PO Krutzik, R Finck, RV Bruggner, R Melamed, A Trejo, OI Ornatsky, RS Balderas, SK Plevritis, K Sachs, D Pe’er, SD Tanner, GP Nolan. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum.. Science 332, 687-96 (2011).

  4. DC Berwick, I Hers, KJ Heesom, SK Moule, JM Tavare. The identification of ATP-citrate lyase as a protein kinase B (Akt) substrate in primary adipocytes.. J Biol Chem 277, 33895-900 (2002).

  5. AE Carpenter, TR Jones, MR Lamprecht, C Clarke, IH Kang, O Friman, DA Guertin, JH Chang, RA Lindquist, J Moffat, P Golland, DM Sabatini. CellProfiler: image analysis software for identifying and quantifying cell phenotypes.. Genome Biol 7, R100 (2006).

  6. P Casado, JC Rodriguez-Prados, SC Cosulich, S Guichard, B Vanhaesebroeck, S Joel, PR Cutillas. Kinase-substrate enrichment analysis provides insights into the heterogeneity of signaling pathway activation in leukemia cells.. Sci Signal 6, rs6 (2013).

  7. F Cunningham, MR Amode, D Barrell, K Beal, K Billis, S Brent, D Carvalho-Silva, P Clapham, G Coates, S Fitzgerald, L Gil, CG Girón, L Gordon, T Hourlier, SE Hunt, SH Janacek, N Johnson, T Juettemann, AK Kähäri, S Keenan, FJ Martin, T Maurel, W McLaren, DN Murphy, R Nag, B Overduin, A Parker, M Patricio, E Perry, M Pignatelli, HS Riat, D Sheppard, K Taylor, A Thormann, A Vullo, SP Wilder, A Zadissa, BL Aken, E Birney, J Harrow, R Kinsella, M Muffato, M Ruffier, SM Searle, G Spudich, SJ Trevanion, A Yates, DR Zerbino, P Flicek. Ensembl 2015.. Nucleic Acids Res 43, D662-9 (2015).

  8. N Dephoure, C Zhou, J Villén, SA Beausoleil, CE Bakalarski, SJ Elledge, SP Gygi. A quantitative atlas of mitotic phosphorylation.. Proc Natl Acad Sci U S A 105, 10762-7 (2008).

  9. H Dinkel, C Chica, A Via, CM Gould, LJ Jensen, TJ Gibson, F Diella. Phospho.ELM: a database of phosphorylation sites–update 2011.. Nucleic Acids Res 39, D261-7 (2011).

  10. X Feng, J Zhang, K Smuga-Otto, S Tian, J Yu, R Stewart, JA Thomson. Protein kinase C mediated extraembryonic endoderm differentiation of human embryonic stem cells.. Stem Cells 30, 461-70 (2012).

  11. JA Galan, KM Geraghty, G Lavoie, E Kanshin, J Tcherkezian, V Calabrese, GR Jeschke, BE Turk, BA Ballif, J Blenis, P Thibault, PP Roux. Phosphoproteomic analysis identifies the tumor suppressor PDCD4 as a RSK substrate negatively regulated by 14-3-3.. Proc Natl Acad Sci U S A 111, E2918-27 (2014).

  12. FS Garmaroudi, D Marchant, X Si, A Khalili, A Bashashati, BW Wong, A Tabet, RT Ng, K Murphy, H Luo, KA Janes, BM McManus. Pairwise network mechanisms in the host signaling response to coxsackievirus B3 infection.. Proc Natl Acad Sci U S A 107, 17053-8 (2010).

  13. TJ Gibson. Cell regulation: determined to signal discrete cooperation.. Trends Biochem Sci 34, 471-82 (2009).

  14. S Helaine, JA Thompson, KG Watson, M Liu, C Boyle, DW Holden. Dynamics of intracellular bacterial replication at the single cell level.. Proc Natl Acad Sci U S A 107, 3746-51 (2010).

  15. SM Hill, LM Heiser, T Cokelaer, M Unger, NK Nesser, DE Carlin, Y Zhang, A Sokolov, EO Paull, CK Wong, K Graim, A Bivol, H Wang, F Zhu, B Afsari, LV Danilova, AV Favorov, WS Lee, D Taylor, CW Hu, BL Long, DP Noren, AJ Bisberg, GB Mills, JW Gray, M Kellen, T Norman, S Friend, AA Qutub, EJ Fertig, Y Guan, M Song, JM Stuart, PT Spellman, H Koeppl, G Stolovitzky, J Saez-Rodriguez, S Mukherjee. Inferring causal molecular networks: empirical assessment through a community-based effort.. Nat Methods 13, 310-8 (2016).

  16. PV Hornbeck, B Zhang, B Murray, JM Kornhauser, V Latham, E Skrzypek. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations.. Nucleic Acids Res 43, D512-20 (2015).

  17. PP Hsu, SA Kang, J Rameseder, Y Zhang, KA Ottina, D Lim, TR Peterson, Y Choi, NS Gray, MB Yaffe, JA Marto, DM Sabatini. The mTOR-regulated phosphoproteome reveals a mechanism of mTORC1-mediated inhibition of growth factor signaling.. Science 332, 1317-22 (2011).

  18. SJ Humphrey, SB Azimifar, M Mann. High-throughput phosphoproteomics reveals in vivo insulin signaling dynamics.. Nat Biotechnol 33, 990-5 (2015).

  19. H Imamura, N Sugiyama, M Wakabayashi, Y Ishihama. Large-scale identification of phosphorylation sites for profiling protein kinase selectivity.. J Proteome Res 13, 3410-9 (2014).

  20. Donald A. Jackson. Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches. Ecology 74, 2204–2214 Wiley-Blackwell, 1993. Link

  21. JD Jordan, EM Landau, R Iyengar. Signaling networks: the origins of cellular multitasking.. Cell 103, 193-200 (2000).

  22. S Kane, H Sano, SC Liu, JM Asara, WS Lane, CC Garner, GE Lienhard. A method to identify serine kinase substrates. Akt phosphorylates a novel adipocyte protein with a Rab GTPase-activating protein (GAP) domain.. J Biol Chem 277, 22115-8 (2002).

  23. E Kanshin, LP Bergeron-Sandoval, SS Isik, P Thibault, SW Michnick. A cell-signaling network temporally resolves specific versus promiscuous phosphorylation.. Cell Rep 10, 1202-14 (2015).

  24. AE Kel, E Gössling, I Reuter, E Cheremushkin, OV Kel-Margoulis, E Wingender. MATCH: A tool for searching transcription factor binding sites in DNA sequences.. Nucleic Acids Res 31, 3576-9 (2003).

  25. C Keller, T Krude. Requirement of Cyclin/Cdk2 and protein phosphatase 1 activity for chromatin assembly factor 1-dependent chromatin assembly during DNA synthesis.. J Biol Chem 275, 35512-21 (2000).

  26. Prasad TS Keshava, R Goel, K Kandasamy, S Keerthikumar, S Kumar, S Mathivanan, D Telikicherla, R Raju, B Shafreen, A Venugopal, L Balakrishnan, A Marimuthu, S Banerjee, DS Somanathan, A Sebastian, S Rani, S Ray, Kishore CJ Harrys, S Kanth, M Ahmed, MK Kashyap, R Mohmood, YL Ramachandra, V Krishna, BA Rahiman, S Mohan, P Ranganathan, S Ramabadran, R Chaerkady, A Pandey. Human Protein Reference Database–2009 update.. Nucleic Acids Res 37, D767-72 (2009).

  27. AN Kettenbach, DK Schweppe, BK Faherty, D Pechenick, AA Pletnev, SA Gerber. Quantitative phosphoproteomics identifies substrates and functional modules of Aurora and Polo-like kinase activities in mitotic cells.. Sci Signal 4, rs5 (2011).

  28. HD Kim, AS Meyer, JP Wagner, SK Alford, A Wells, FB Gertler, DA Lauffenburger. Signaling network state predicts twist-mediated effects on breast cell migration across diverse growth factor contexts.. Mol Cell Proteomics 10, M111.008433 (2011).

  29. M Kinehara, S Kawamura, D Tateyama, M Suga, H Matsumura, S Mimura, N Hirayama, M Hirata, K Uchio-Yamada, A Kohara, K Yanagihara, MK Furue. Protein kinase C regulates human pluripotent stem cell self-renewal.. PLoS One 8, e54122 (2013).

  30. M Kinehara, S Kawamura, S Mimura, M Suga, A Hamada, M Wakabayashi, H Nikawa, MK Furue. Protein kinase C-induced early growth response protein-1 binding to SNAIL promoter in epithelial-mesenchymal transition of human embryonic stem cells.. Stem Cells Dev 23, 2180-9 (2014).

  31. S Matsuhashi, H Hamajima, J Xia, H Zhang, T Mizuta, K Anzai, I Ozaki. Control of a tumor suppressor PDCD4: Degradation mechanisms of the protein in hepatocellular carcinoma cells.. Cell Signal 26, 603-10 (2014).

  32. P Mertins, ND Udeshi, KR Clauser, DR Mani, J Patel, SE Ong, JD Jaffe, SA Carr. iTRAQ labeling is superior to mTRAQ for quantitative global proteomics and phosphoproteomics.. Mol Cell Proteomics 11, M111.014423 (2012).

  33. EJ Morris, GP Nader, N Ramalingam, F Bartolini, GG Gundersen. Kif4 interacts with EB1 and stabilizes microtubules downstream of Rho-mDia in migrating fibroblasts.. PLoS One 9, e91568 (2014).

  34. M Niepel, M Hafner, EA Pace, M Chung, DH Chai, L Zhou, B Schoeberl, PK Sorger. Profiles of Basal and stimulated receptor signaling networks predict drug response in breast cancer lines.. Sci Signal 6, ra84 (2013).

  35. H Niwa, J Miyazaki, AG Smith. Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells.. Nat Genet 24, 372-6 (2000).

  36. Bastos R Nunes, SR Gandhi, RD Baron, U Gruneberg, EA Nigg, FA Barr. Aurora B suppresses microtubule dynamics and limits central spindle size by locally activating KIF4A.. J Cell Biol 202, 605-21 (2013).

  37. JV Olsen, M Mann. Status of large-scale analysis of post-translational modifications by mass spectrometry.. Mol Cell Proteomics 12, 3444-52 (2013).

  38. JV Olsen, M Vermeulen, A Santamaria, C Kumar, ML Miller, LJ Jensen, F Gnad, J Cox, TS Jensen, EA Nigg, S Brunak, M Mann. Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis.. Sci Signal 3, ra3 (2010).

  39. JV Olsen, B Blagoev, F Gnad, B Macek, C Kumar, P Mortensen, M Mann. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks.. Cell 127, 635-48 (2006).

  40. K Onishi, M Higuchi, T Asakura, N Masuyama, Y Gotoh. The PI3K-Akt pathway promotes microtubule stabilization in migrating fibroblasts.. Genes Cells 12, 535-46 (2007).

  41. FS Oppermann, M Klammer, C Bobe, J Cox, C Schaab, A Tebbe, H Daub. Comparison of SILAC and mTRAQ quantification for phosphoproteomics on a quadrupole orbitrap mass spectrometer.. J Proteome Res 12, 4089-100 (2013).

  42. A Palamarchuk, A Efanov, V Maximov, RI Aqeilan, CM Croce, Y Pekarsky. Akt phosphorylates and regulates Pdcd4 tumor suppressor protein.. Cancer Res 65, 11282-6 (2005).

  43. T Pawson. Specificity in signal transduction: from phosphotyrosine-SH2 domain interactions to complex cellular systems.. Cell 116, 191-203 (2004).

  44. KD Pruitt, J Harrow, RA Harte, C Wallin, M Diekhans, DR Maglott, S Searle, CM Farrell, JE Loveland, BJ Ruef, E Hart, MM Suner, MJ Landrum, B Aken, S Ayling, R Baertsch, J Fernandez-Banet, JL Cherry, V Curwen, M Dicuccio, M Kellis, J Lee, MF Lin, M Schuster, A Shkeda, C Amid, G Brown, O Dukhanina, A Frankish, J Hart, BL Maidak, J Mudge, MR Murphy, T Murphy, J Rajan, B Rajput, LD Riddick, C Snow, C Steward, D Webb, JA Weber, L Wilming, W Wu, E Birney, D Haussler, T Hubbard, J Ostell, R Durbin, D Lipman. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.. Genome Res 19, 1316-23 (2009).

  45. KT Rigbolt, TA Prokhorova, V Akimov, J Henningsen, PT Johansen, I Kratchmarova, M Kassem, M Mann, JV Olsen, B Blagoev. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation.. Sci Signal 4, rs3 (2011).

  46. A Ruepp, B Waegele, M Lechner, B Brauner, I Dunger-Kaltenbach, G Fobo, G Frishman, C Montrone, HW Mewes. CORUM: the comprehensive resource of mammalian protein complexes–2009.. Nucleic Acids Res 38, D497-501 (2010).

  47. A Subramanian, P Tamayo, VK Mootha, S Mukherjee, BL Ebert, MA Gillette, A Paulovich, SL Pomeroy, TR Golub, ES Lander, JP Mesirov. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.. Proc Natl Acad Sci U S A 102, 15545-50 (2005).

  48. CD Terfve, EH Wilkes, P Casado, PR Cutillas, J Saez-Rodriguez. Large-scale models of signal propagation in human cells derived from discovery phosphoproteomic data.. Nat Commun 6, 8033 (2015).

  49. A Volk, JD Crispino. The role of the chromatin assembly complex (CAF-1) and its p60 subunit (CHAF1b) in homeostasis and disease.. Biochim Biophys Acta 1849, 979-86 (2015).

  50. T Wang, K Birsoy, NW Hughes, KM Krupczak, Y Post, JJ Wei, ES Lander, DM Sabatini. Identification and characterization of essential genes in the human genome.. Science 350, 1096-101 (2015).

  51. WW Wasserman, A Sandelin. Applied bioinformatics for the identification of regulatory elements.. Nat Rev Genet 5, 276-87 (2004).

  52. C Weber, TB Schreiber, H Daub. Dual phosphoproteomics and chemical proteomics analysis of erlotinib and gefitinib interference in acute myeloid leukemia cells.. J Proteomics 75, 1343-56 (2012).

  53. EH Wilkes, C Terfve, JG Gribben, J Saez-Rodriguez, PR Cutillas. Empirical inference of circuitry and plasticity in a kinase signaling network.. Proc Natl Acad Sci U S A 112, 7719-24 (2015).

  54. EL de Graaf, P Giansanti, AF Altelaar, AJ Heck. Single-step enrichment by Ti4+-IMAC and label-free quantitation enables in-depth monitoring of phosphorylation dynamics with high reproducibility and temporal resolution.. Mol Cell Proteomics 13, 2426-34 (2014).

[Someone else is editing this]

You are editing this file