2 Cancer Hospital of Gansu Province, Lanzhou, P.R. China

Abstract

Immune-related genes (IRGs) affect the composition and abundance of tumor-infiltrating immune cells (TIICs) by encoding immune molecules. The utilization of IRGs and TIICs offers considerable potential for immunotherapy studies of breast cancer. In this study, the differentially expressed genes (DEGs) between breast cancer patients and healthy individuals were assessed using the Cancer Genome Atlas (TCGA). ImmPort and TISIDB databases, the Weighted Gene Co-Expression Network Analysis (WGCNA), univariate Cox regression and LASSO penalized Cox regression methods were used to calculate the composition and abundance of immune cell types. It was found that CD79A and GZMAcould independently predict the prognosis of breast cancer and were significantly associated with activated CD8 T cells, immature B cells, folic helper T cells and regulatory T cells, implying that these two genes are involved in the immune regulation and progression of breast cancer and could potentially be new targets for breast cancer immunotherapy.
Keywords:breast cancer, immune-related genes, immune infiltration, prognosis

Introduction

Breast cancer is a public health issue that seriously jeopardizes social development and human survival. By 2020, breast cancer has surpassed lung cancer as the most common type of cancer in women worldwide, accounting for 25% of new cancer cases and 6% of cancer-related deaths in females. Its incidence has increased at a rate of approximately 0.5% per year from 1975 to 2017. Compared to 2012, when breast cancer accounted for 15% of female cancer deaths, the global mortality rate from breast cancer in women is gradually increasing. In clinical, breast cancer is mainly staged by estrogen (ER+), progesterone (PR+), and human epidermal growth factor receptor 2 (HER2) status, which can be further classified as luminal A and B, HER2-positive, and triple-negative breast cancer (TNBC). Surgery, radiation therapy, chemotherapy, and targeted therapy are currently the most widely used treatments for breast cancer, depending on the stage of the disease, while these treatments are greatly hampered by the bioinvasive nature of TNBC, poor prognosis, and lack of therapeutic targets.Nowadays, immunotherapy, which has been considered as the fifth treatment modality for cancer, is effective in different breast cancer staging. However, immunotherapy does not benefit all breast cancer patients due to the existence of ”cold” tumors with low immune cell infiltration and an immunosuppressive tumor microenvironment (TME), so it is necessary to find ways to improve the effectiveness of immunotherapy.
In immunotherapy, the composition and abundance of tumor-infiltrating immune cells (TIICs),which comprise the tumor microenvironment (TME) and contain various types of immune cells such as T cells, macrophages, and neutrophils, play an important role. There is growing evidence that tumor-infiltrating lymphocytes (TILs) are associated with a good prognosis in breast cancer. TILs are a strong prognostic factor for TNBC and are considered as Class I evidence. And the level of TILs can identify breast cancer patients more suitable for immunotherapy. TILs reduce the risk of death in breast cancer patients with both ER-negative and ER-positive HER2-positive subtypes, and CD8+ T cells are associated with a significantly lower relative risk of death in this subtype. Additionally, CCR9+ T cells have also shown great potential in mediating anti-tumor immunity. Therefore, the identification of TIICs can be used in anti-tumor immunity and in the prediction of prognosis of breast cancer. Immune-related genes (IRG) regulate immune responses by encoding cytokines, tumor necrosis factors, chemokines, T-cell antigen receptor (TCR) signaling pathway, and B-cell antigen receptor (BCR) signaling pathway. Thus, the IRG has a large influence on the composition and abundance of TIICs, and its characteristics are very important. For example, CCR4 mediates the migration of regulatory T cells (Treg) into tumors, and blockade of CCR4 reduces the amount of Treg and enhances anti-tumor immune activity. Several surface molecules of TIICs can also be used as targets for the treatment of tumors, such as GITR.
In this study, 1,072 breast cancer patient samples and 99 normal samples were obtained from the Cancer Genome Atlas(TCGA) database to screen the differentially expressed genes (DEGs). The tumor microenvironment was evaluated using the ESTIMATE algorithm to obtain the stromal score and immune score of the tumor. To further mine the core regulatory genes, candidate hub genes associated with patients’ immunity were screened according to the immune score using the Weighted Gene Co-Expression Network Analysis (WGCNA). Then Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to explore their possible molecular functions, biological processes, cellular components of gene enrichment, and the signaling pathways involved. Risk scores were calculated using LASSO penalized Cox regression, and risk classes were classified according to medians, and nomogram was made to predict the probability of individual time of occurrence. For the screened IRG, K-M survival curves were plotted. The correlation between risk class and immune cell infiltration, which may play a key role in the prognosis of breast cancer, was analyzed.

Method

Data source

UCSC Xena (http://xena.ucsc.edu/) is a platform that contains a large amount of multiple cancer genomics data from TCGA. Individual breast cancer transcriptome RNA sequencing (RNA-seq) data (Htseq-FPKM and Htseq-Counts) and corresponding clinical data were obtained from UCSC Xena, including 1,072 breast cancer samples and 99 normal samples. The data was last updated on July 18, 2019.

Differential expression analysis

RNA transcriptome sequencing data were processed and normalized. The DESeq2 package provides methods for testing the expression of variability by using a negative binomial generalized linear model, and the estimates of dispersion and logarithmic multiplicative variation incorporate data-driven prior distributions. The R4.2.0 DESeq2 package version 1.36.0 was used to analyze DEGs with false discovery rate (FDR) < 0.05 and |log2 fold change (FC)| > 1 selected as the thresholds for screening the DEGs. The R4.2.0 ggpubr package version 0.4.0, ggthemes package version 4.2.4, pheatmap package version 1.0.12 were used to make a Volcano Plot of DEGs.

Immune score analysis

The ESTIMATE algorithm was used to estimate the TME of BRCA patients in TCGA samples, as well as the stromal score (stromal content), immune score (degree of immune cell infiltration), ESTIMATE score (combined marker of stroma and immunity), and tumor purity. The R4.2.0 estimate package version 1.0.13 was used to perform the ESTIMATE algorithm.

WGCNA

The R4.2.0 WGCNA package version 1.71 was used to perform WGCNA. WGCNA is used to perform complex data analysis of multiple samples. It identifies gene collections (modules) with similar expression patterns by calculating inter-gene expression relationships, resolves associations between gene collections and phenotypes, maps regulatory networks among genes in the gene collections, and identifies key regulatory genes. DEGs were selected for WGCNA analysis, and correlations between modules and models were analyzed using stromal score, immune score, ESTIMATE score, and tumor purity as phenotypes. Gene significance (GS) and module significance (MS) were calculated to measure the significance of genetic and clinical information.

Functional and Pathway Enrichment Analyses

GO analysis and KEGGpathway analysis enrichment results were obtained from the R4.2.0 clusterProfiler package version 4.4.2. The GO analysis mainly describes the biological process (BP), cellular component (CC), and the possible molecular function (MF) exercised by the gene involved. KEGG correlates gene catalogs derived from genomes that have been fully sequenced with higher-level of cellular, species, and ecosystem-level system functions. A P -value <0.05 was set as the cut-off standard.

Univariate Cox regression and LASSO penalized Cox regression

A univariate Cox regression was performed to analyze each gene individually, and the univariate was included in the LASSO-penalized Cox regression when it was significant (p<0.05) for analysis in univariate Cox regression. LASSO penalized Cox regression was performed by using the R4.2.0 glmnet package version 4.1-4 to build the prognostic model to avoid overfitting. The risk score formula is as follows:\(\text{Risk\ score}=\sum_{i}^{n}X_{i}\times Y_{i}(X:\ coefficients,\ Y:\ gene\ expression\ level)\). By comparing the median risk scores of low- and high-risk breast cancer patients, all patients were categorized as low- or high-risk.

Survival Analysis

To verify the potential of pivotal genes as diagnostic markers and therapeutic targets for tumors, the R4.2.0 survminer package version 0.4.9 was used to perform Kaplan–Meier survival curves.

Nomogram

Nomogram predicts the probability of occurrence time for an individual by using external data independent of how that individual scored on each metric in the model. The metrics include age, pathological stage, T-stage, N-stage, gene expression level, and risk level. The R4.2.0 rms package version 6.3-0 was used to plot the nomogram.

Immune infiltration analysis

CIBERSORT is a method for calculating cell composition based on expression profiles. This deconvolution algorithm is used to calculate the proportion of 22 immune cells in each breast cancer patient. The sum of the 22 immune cell type scores in each sample is 1. Based on CIBERSORT, differences in the proportion of immune cells in breast cancer patients with different gene expressions were analyzed. Based on the gene expression levels in 28 published immune cell genomes, the R4.2.0 GSVA package version 1.44.2 was used to calculate the degree of infiltration of 28 immune cell species.

Statistical Analysis

All statistical analyses were performed by R (4.2.0) software. Survival curves were constructed using the Kaplan-Meier method (log-rank test). Nomogram analysis was performed using the Wilcoxon rank-sum test. All hypothesis tests were two-sided, and a P -value <0.05 were considered significant.

Result

DEG screening

Based on the thresholds for DEGs (FDR < 0.05 and |log2FC| > 1), a total of 5078 DEGs were screened, of which 2139 were significantly down-regulated and 2939 were significantly up-regulated. The results were visualized using a volcano plot in Figure 1.

WGCNA

WGCNA was used to find which genes in the DEGs of breast cancer patients were most associated with immune scores (Figure 2). The green module was chosen because it correlated with the immune score (R = 0.95, P< 0.05). According to the absolute value of GS > 0.20 and the absolute value of MM > 0.80, 52 genes with the highest connectivity in the green module were identified as the pending candidate module genes.

Function analysis

To determine the biological function of 408 green module genes, we performed a GO enrichment analysis (Figure 3A) and a KEGG pathway enrichment analysis (Figure 3B). In GO enrichment analysis, the BP group genes were focused on cytokine-mediated signaling pathway; the CC group genes were mainly related to the external side of the plasma membrane; the MF group genes were mainly enriched in cytokine activity, immune receptor activity, cytokine receptor binding, and receptor ligand activity. The results of the KEGG pathway enrichment analysis showed that the 408 candidate module genes were mainly enriched in Cytokine-cytokine receptor interaction, Viral protein interaction with cytokine and cytokine receptor, Chemokine signaling pathway, Hematopoietic cell lineage, JAK-STAT signaling pathway.

Construction and Validation of a Prognostic Signature 

IRGs were obtained from two websites, ImmPort and TISIDB, and 1931 IRGs were obtained after removing duplicate values. 29 IRGs were obtained by making a Venn diagram with 52 candidate modular genes (Figure 4A). To further obtain survival-related genes, univariate Cox regression analysis was performed on the 29 IRGs in the previous step for preliminary screening. Based on the threshold of P -value < 0.05, we identified 28 genes that were significantly associated with overall survival. The expression levels of these 28 genes were then entered into the LASSO-Cox regression model for feature selection (Figure 4B, C). Under the penalty condition (α=1), two gene scores with non-zero coefficients were selected to enter the risk score model (Figure 4D):\(Risk\ score=\ \left(-0.111770369395437\times\ GZMA\ espression\right)+\left(-0.0505760467639284\times\ CD79A\ espression\right)\). Patients with breast cancer could be classified into high-risk and low-risk groups by the median of the risk score. According to the survival analysis at different GZMA and CD79A expression levels and at different risk scores (Figure 4E, F and G), the overall survival (OS) for patients with a high-risk score was significantly shorter than those with low-risk score. The area under the ROC curve for 1-, 3-, and 5-year OS was 0.619, 0.630 and 0.578 respectively (Figure 4H). Considering that the constructed risk model was significantly associated with poor prognosis, a nomogram plot was established to explore whether the expression levels of GZMA and CD79A and the risk class could be used as independent factors to predict prognosis (Figure 4I).

Immune correlation analysis of the risk model

The proportion of immune cells in different risk classes was analyzed to explore the association between immune status and risk class in breast cancer patients in the TCGA database. As shown in Figure 5A, patients in the low-risk group had a higher percentage of memory B cells, macrophages, and Tregs compared to patients in the high-risk group. Similarly, the expression levels of activated CD4 T cells, activated CD8 T cells, and natural killer T cells were significantly higher in the low-risk group compared with those of the high-risk group, suggesting that T-cell regulation differs between two groups (Figure 5B). In addition, the correlation between gene and immune cell expression levels was further explored, and GZMA and CD79A correlated with the expression levels of most immune cells (Figure 5C).

Discussion

Recent years have seen a gradual advancement in the use of immunotherapy to treat tumors, and TICLs is particularly significant in the context of the immunological environment in cancer. Immune cells have a dual role in the development of tumors, contributing to both anti-tumor immune responses as well as possible promotion of tumor growth. By infiltrating tumor tissue and eliminating tumor cells, T cells play the most important role as the immunotherapy target, and high T cell infiltration can further increase the effect of immunotherapy. One of the mechanisms of cancer-induced immune tolerance is based on the expansion of myeloid-derived suppressor cells (MDSC), where myeloid cells infiltrating tumor sites to induce resistance of tumor cells to cytotoxic T lymphocytes (CTLs) by modifying pMHC complexes on tumor cells. CD4+25+Foxp3+ Tregs control T cells or antigen-presenting cells (APCs) in a cell-to-cell contact-dependent way by secreting suppressive cytokines, such as IL-10 and transforming growth factors. These cytokines have broad immunosuppressive effects. Specific subpopulations of tumor-associated macrophages may have different roles in cancer progression and antitumor immunity, and tumor-associated FOLR2+ macrophages are potent APCs with the function of triggering CD8+ T cell activation. B cells have a complex composition and can use both the secretion of cytokines and other substances to create an environment that is immunosuppressive for the tumor and even further to enhance the anti-tumor immune effect through perforin. The question of how to use different TICLs-associated breast cancer pathologies and subtypes of TICLs to enhance antitumor immunity has not been addressed. Therefore, it is urgent to identify the subsets of TICLs and find immunotherapies based on various TICL subsets.
Blockade of CTLA-4 or PD-1/PD-L1, the approved immune checkpoint inhibitors working in the immunotherapy direction, can increase CD8+ T cells’ anti-tumor response. Moreover, with rapid development, abundant immunotherapeutic targets such as TIM-3, LAG3, 4-1BB, CD28, ICOS, OX-40, and VISTA are emerged. However, because the majority of patients are resistant to specific medications, it is necessary to identify new targets using TICLs, which are crucial for the immune response and open up new possibilities for immunotherapy. In TICLs, different infiltration patterns or differentiation of B cells in different directions induced by the tumor microenvironment can affect their final overall effect. As a highly accurate marker of B cells in immunohistochemistry, CD79A is necessary for the production and operation of B cell antigen receptors. It has been studied more often in hematologic tumors but has also recently been reported in studies related to breast cancer. In immune-enriched, completely pathologically sensitive basal-like triple-negative breast cancer, CD79A is extensively expressed. Another study has suggested differential expression of CD79A is independent of the subtype of breast cancer and has independent prognostic relevance, exhibiting differences in overall survival among 1,998 patients. Given that CD79A on MDSCs plays a crucial functional role in maintaining the immature immunosuppressive phenotype of MDSCs and stimulating the release of tumor-promoting cytokines, as well as the observation that tumor-induced MDSCs express CD79A, it is promising that CD79A may represent a novel target for cancer therapy. GZMA from cytotoxic lymphocytes induces pyroptosis by lysing and activating GSDMB, which can mediate immunotherapy of relevant cancers. Similar to CD79A, GZMA can be used as an independent predictor of survival in breast cancer patients and can be a robust biomarker. Our study showed that the proportion and abundance of immune cells in the low-risk group with high expression of both genes were higher in our risk model. The low-risk group, in particular, possessed some immune advantages in T cells and their subpopulations.
In this study, two IRGs affecting breast cancer prognosis were screened and identified by ESTIMATE algorithm, WGCNA expression analysis, CIBERSORT, and LASSO penalized Cox regression analysis, and based on the median risk score, breast cancer patients could be classified into high- and low-risk groups. This study provides reference value for classifying breast cancer from the perspective of prognostic risk. To examine the clinical application value of these two IRGs, we developed a prognostic model with age, stage, T-stage, N-stage, the expression level of the two IRGs, and risk grade as independent prognostic factors. As the results of previous study, high expression levels of CD79A and GZMA predicts a good prognosis.
However, this study was only analyzed from a bioinformatics perspective, lacking evidence of detection in clinical samples and research on related mechanisms, further experimental proof of specific regulatory mechanisms is needed.

Data Availability Statement

Publicly available data sets were used in this study. These data can be found in the the UCSC Xena database and the Cancer Genome Atlas (TCGA) database.

Informed Consent

Written informed consent was waived since all data are from public databases.

Acknowledgments

The author gratefully acknowledges the UCSC Xena database and the Cancer Genome Atlas (TCGA) database, which made the data available.

Disclosure Statement

No competing financial interests exist.

Funding Information

Natural Science Foundation of Gansu Province, Grant/ Award Number: 22JR5RA505; Fundamental Research Funds for the Central Universities, Grant/ Award Number: lzujbky-2022-34

Reference