2 Cancer Hospital of Gansu Province, Lanzhou, P.R. China
Abstract
Immune-related genes (IRGs) affect the composition and abundance of
tumor-infiltrating immune cells (TIICs) by encoding immune molecules.
The utilization of IRGs and TIICs offers considerable potential for
immunotherapy studies of breast cancer. In this study,
the differentially expressed
genes (DEGs) between breast cancer patients and healthy individuals were
assessed using the Cancer Genome Atlas (TCGA). ImmPort and TISIDB
databases, the Weighted Gene Co-Expression Network Analysis (WGCNA),
univariate Cox regression and LASSO penalized Cox regression methods
were used to calculate the composition and abundance of immune cell
types. It was found that CD79A and GZMAcould independently predict the
prognosis of breast cancer and were significantly associated with
activated CD8 T cells, immature B cells, folic helper T cells and
regulatory T cells, implying that these two genes are involved in the
immune regulation and progression of breast cancer and could potentially
be new targets for breast cancer immunotherapy.
Keywords:breast cancer, immune-related genes, immune infiltration,
prognosis
Introduction
Breast cancer is a public health issue that seriously jeopardizes social
development and human survival. By 2020, breast cancer has surpassed
lung cancer as the most common type of cancer in women worldwide,
accounting for 25% of new cancer cases and 6% of cancer-related deaths
in females. Its incidence has increased at a rate of approximately 0.5%
per year from 1975 to 2017. Compared to 2012, when breast cancer
accounted for 15% of female cancer deaths, the global mortality rate
from breast cancer in women is gradually increasing. In clinical, breast
cancer is mainly staged by estrogen (ER+), progesterone (PR+), and human
epidermal growth factor receptor 2 (HER2) status, which can be further
classified as luminal A and B, HER2-positive, and triple-negative breast
cancer (TNBC). Surgery, radiation therapy, chemotherapy, and targeted
therapy are currently the most widely used treatments for breast cancer,
depending on the stage of the disease, while these treatments are
greatly hampered by the bioinvasive nature of TNBC, poor prognosis, and
lack of therapeutic targets.Nowadays, immunotherapy, which has been
considered as the fifth treatment modality for cancer, is effective in
different breast cancer staging. However, immunotherapy does not benefit
all breast cancer patients due to the existence of ”cold” tumors with
low immune cell infiltration and an immunosuppressive tumor
microenvironment (TME), so it is necessary to find ways to improve the
effectiveness of immunotherapy.
In immunotherapy, the composition and abundance of tumor-infiltrating
immune cells (TIICs),which comprise the tumor microenvironment (TME) and
contain various types of immune cells such as T cells, macrophages, and
neutrophils, play an important role. There is growing evidence that
tumor-infiltrating lymphocytes (TILs) are associated with a good
prognosis in breast cancer. TILs are a strong prognostic factor for TNBC
and are considered as Class I evidence. And the level of TILs can
identify breast cancer patients more suitable for immunotherapy. TILs
reduce the risk of death in breast cancer patients with both ER-negative
and ER-positive HER2-positive subtypes, and CD8+ T cells are associated
with a significantly lower relative risk of death in this subtype.
Additionally, CCR9+ T cells have also shown great potential in mediating
anti-tumor immunity. Therefore, the identification of TIICs can be used
in anti-tumor immunity and in the prediction of prognosis of breast
cancer. Immune-related genes (IRG) regulate immune responses by encoding
cytokines, tumor necrosis factors, chemokines, T-cell antigen receptor
(TCR) signaling pathway, and B-cell antigen receptor (BCR) signaling
pathway. Thus, the IRG has a large influence on the composition and
abundance of TIICs, and its characteristics are very important. For
example, CCR4 mediates the migration of regulatory T cells (Treg) into
tumors, and blockade of CCR4 reduces the amount of Treg and enhances
anti-tumor immune activity. Several surface molecules of TIICs can also
be used as targets for the treatment of tumors, such as GITR.
In this study, 1,072 breast cancer patient samples and 99 normal samples
were obtained from the Cancer Genome Atlas(TCGA) database to screen the
differentially expressed genes (DEGs). The tumor microenvironment was
evaluated using the ESTIMATE algorithm to obtain the stromal score and
immune score of the tumor. To further mine the core regulatory genes,
candidate hub genes associated with patients’ immunity were screened
according to the immune score using the Weighted Gene Co-Expression
Network Analysis (WGCNA). Then
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
pathway enrichment analyses were performed to explore their possible
molecular functions, biological processes, cellular components of gene
enrichment, and the signaling pathways involved. Risk scores were
calculated using LASSO penalized Cox regression, and risk classes were
classified according to medians, and nomogram was made to predict the
probability of individual time of occurrence. For the screened IRG, K-M
survival curves were plotted. The correlation between risk class and
immune cell infiltration, which may play a key role in the prognosis of
breast cancer, was analyzed.
Method
Data source
UCSC Xena (http://xena.ucsc.edu/) is a platform that contains a large
amount of multiple cancer genomics data from TCGA. Individual breast
cancer transcriptome RNA sequencing (RNA-seq) data (Htseq-FPKM and
Htseq-Counts) and corresponding clinical data were obtained from UCSC
Xena, including 1,072 breast cancer samples and 99 normal samples. The
data was last updated on July 18, 2019.
Differential expression
analysis
RNA transcriptome sequencing data were processed and normalized. The
DESeq2 package provides methods for testing the expression of
variability by using a negative binomial generalized linear model, and
the estimates of dispersion and logarithmic multiplicative variation
incorporate data-driven prior distributions. The R4.2.0 DESeq2 package
version 1.36.0 was used to analyze DEGs with false discovery rate (FDR)
< 0.05 and |log2 fold change
(FC)| > 1 selected as the thresholds for screening
the DEGs. The R4.2.0 ggpubr package version 0.4.0, ggthemes package
version 4.2.4, pheatmap package version 1.0.12 were used to make a
Volcano Plot of DEGs.
Immune score analysis
The ESTIMATE algorithm was used to estimate the TME of BRCA patients in
TCGA samples, as well as the
stromal score (stromal content),
immune score (degree of immune cell infiltration), ESTIMATE score
(combined marker of stroma and immunity), and tumor purity. The R4.2.0
estimate package version 1.0.13 was used to perform the ESTIMATE
algorithm.
WGCNA
The R4.2.0 WGCNA package version 1.71 was used to perform WGCNA. WGCNA
is used to perform complex data analysis of multiple samples. It
identifies gene collections (modules) with similar expression patterns
by calculating inter-gene expression relationships, resolves
associations between gene collections and phenotypes, maps regulatory
networks among genes in the gene collections, and identifies key
regulatory genes. DEGs were selected for WGCNA analysis, and
correlations between modules and models were analyzed using stromal
score, immune score, ESTIMATE score, and tumor purity as phenotypes.
Gene significance (GS) and module significance (MS) were calculated to
measure the significance of genetic and clinical information.
Functional and Pathway Enrichment
Analyses
GO analysis and KEGGpathway analysis enrichment results were obtained
from the R4.2.0 clusterProfiler package version 4.4.2. The GO analysis
mainly describes the biological process (BP), cellular component (CC),
and the possible molecular function (MF) exercised by the gene involved.
KEGG correlates gene catalogs derived from genomes that have been fully
sequenced with higher-level of cellular, species, and ecosystem-level
system functions. A P -value <0.05 was set as the
cut-off standard.
Univariate Cox regression and LASSO penalized Cox
regression
A univariate Cox regression was performed to analyze each gene
individually, and the univariate was included in the LASSO-penalized Cox
regression when it was significant (p<0.05) for analysis in
univariate Cox regression. LASSO penalized Cox regression was performed
by using the R4.2.0 glmnet package version 4.1-4 to build the prognostic
model to avoid overfitting. The risk score formula is as follows:\(\text{Risk\ score}=\sum_{i}^{n}X_{i}\times Y_{i}(X:\ coefficients,\ Y:\ gene\ expression\ level)\).
By comparing the median risk scores of low- and high-risk breast cancer
patients, all patients were categorized as low- or high-risk.
Survival Analysis
To verify the potential of pivotal genes as diagnostic markers and
therapeutic targets for tumors, the R4.2.0 survminer package version
0.4.9 was used to perform Kaplan–Meier survival curves.
Nomogram
Nomogram predicts the probability of occurrence time for an individual
by using external data independent of how that individual scored on each
metric in the model. The metrics include age, pathological stage,
T-stage, N-stage, gene expression level, and risk level. The R4.2.0 rms
package version 6.3-0 was used to plot the nomogram.
Immune infiltration
analysis
CIBERSORT is a method for calculating cell composition based on
expression profiles. This deconvolution algorithm is used to calculate
the proportion of 22 immune cells in each breast cancer patient. The sum
of the 22 immune cell type scores in each sample is 1. Based on
CIBERSORT, differences in the proportion of immune cells in breast
cancer patients with different gene expressions were analyzed. Based on
the gene expression levels in 28 published immune cell genomes, the
R4.2.0 GSVA package version 1.44.2 was used to calculate the degree of
infiltration of 28 immune cell species.
Statistical Analysis
All statistical analyses were performed by R (4.2.0) software. Survival
curves were constructed using the Kaplan-Meier method (log-rank test).
Nomogram analysis was performed using the Wilcoxon rank-sum test. All
hypothesis tests were two-sided, and a P -value <0.05
were considered significant.
Result
DEG screening
Based on the thresholds for DEGs (FDR < 0.05 and
|log2FC| > 1), a total of
5078 DEGs were screened, of which 2139 were significantly down-regulated
and 2939 were significantly up-regulated. The results were visualized
using a volcano plot in Figure 1.
WGCNA
WGCNA was used to find which genes in the DEGs of breast cancer patients
were most associated with immune scores (Figure 2). The green module was
chosen because it correlated with the immune score (R = 0.95, P< 0.05). According to the absolute value of GS >
0.20 and the absolute value of MM > 0.80, 52 genes with the
highest connectivity in the green module were identified as the pending
candidate module genes.
Function analysis
To determine the biological function of 408 green module genes, we
performed a GO enrichment analysis (Figure 3A) and a KEGG pathway
enrichment analysis (Figure 3B). In GO enrichment analysis, the BP group
genes were focused on cytokine-mediated signaling pathway; the CC group
genes were mainly related to the external side of the plasma membrane;
the MF group genes were mainly enriched in cytokine activity, immune
receptor activity, cytokine receptor binding, and receptor ligand
activity. The results of the KEGG pathway enrichment analysis showed
that the 408 candidate module genes were mainly enriched in
Cytokine-cytokine receptor interaction, Viral protein interaction with
cytokine and cytokine receptor, Chemokine signaling pathway,
Hematopoietic cell lineage, JAK-STAT signaling pathway.
Construction and Validation of a Prognostic
Signature
IRGs were obtained from two websites,
ImmPort and TISIDB, and 1931 IRGs
were obtained after removing duplicate values. 29 IRGs were obtained by
making a Venn diagram with 52 candidate modular genes (Figure 4A). To
further obtain survival-related genes, univariate Cox regression
analysis was performed on the 29 IRGs in the previous step for
preliminary screening. Based on the threshold of P -value
< 0.05, we identified 28 genes that were significantly
associated with overall survival. The expression levels of these 28
genes were then entered into the LASSO-Cox regression model for feature
selection (Figure 4B, C). Under the penalty condition (α=1), two gene
scores with non-zero coefficients were selected to enter the risk score
model (Figure 4D):\(Risk\ score=\ \left(-0.111770369395437\times\ GZMA\ espression\right)+\left(-0.0505760467639284\times\ CD79A\ espression\right)\).
Patients with breast cancer could be classified into high-risk and
low-risk groups by the median of the risk score. According to the
survival analysis at different GZMA and CD79A expression levels and at
different risk scores (Figure 4E, F and G), the overall survival (OS)
for patients with a high-risk score was significantly shorter than those
with low-risk score. The area under the ROC curve for 1-, 3-, and 5-year
OS was 0.619, 0.630 and 0.578 respectively (Figure 4H). Considering that
the constructed risk model was significantly associated with poor
prognosis, a nomogram plot was established to explore whether the
expression levels of GZMA and CD79A and the risk class could be used as
independent factors to predict prognosis (Figure 4I).
Immune correlation analysis of the risk
model
The proportion of immune cells in different risk classes was analyzed to
explore the association between immune status and risk class in breast
cancer patients in the TCGA database. As shown in Figure 5A, patients in
the low-risk group had a higher percentage of memory B cells,
macrophages, and Tregs compared to patients in the high-risk group.
Similarly, the expression levels of activated CD4 T cells, activated CD8
T cells, and natural killer T cells were significantly higher in the
low-risk group compared with those of the high-risk group, suggesting
that T-cell regulation differs between two groups (Figure 5B). In
addition, the correlation between gene and immune cell expression levels
was further explored, and GZMA and CD79A correlated with the expression
levels of most immune cells (Figure
5C).
Discussion
Recent years have seen a gradual advancement in the use of immunotherapy
to treat tumors, and TICLs is particularly significant in the context of
the immunological environment in cancer. Immune cells have a dual role
in the development of tumors, contributing to both anti-tumor immune
responses as well as possible promotion of tumor growth. By infiltrating
tumor tissue and eliminating tumor cells, T cells play the most
important role as the immunotherapy target, and high T cell infiltration
can further increase the effect of immunotherapy. One of the mechanisms
of cancer-induced immune tolerance is based on the expansion of
myeloid-derived suppressor cells (MDSC), where myeloid cells
infiltrating tumor sites to induce resistance of tumor cells to
cytotoxic T lymphocytes (CTLs) by modifying pMHC complexes on tumor
cells. CD4+25+Foxp3+ Tregs control T cells or antigen-presenting cells
(APCs) in a cell-to-cell contact-dependent way by secreting suppressive
cytokines, such as IL-10 and transforming growth factors. These
cytokines have broad immunosuppressive effects. Specific subpopulations
of tumor-associated macrophages may have different roles in cancer
progression and antitumor immunity, and tumor-associated FOLR2+
macrophages are potent APCs with the function of triggering CD8+ T cell
activation. B cells have a complex composition and can use both the
secretion of cytokines and other substances to create an environment
that is immunosuppressive for the tumor and even further to enhance the
anti-tumor immune effect through perforin. The question of how to use
different TICLs-associated breast cancer pathologies and subtypes of
TICLs to enhance antitumor immunity has not been addressed. Therefore,
it is urgent to identify the subsets of TICLs and find immunotherapies
based on various TICL subsets.
Blockade of CTLA-4 or PD-1/PD-L1, the approved immune checkpoint
inhibitors working in the immunotherapy direction, can increase CD8+ T
cells’ anti-tumor response. Moreover, with rapid development, abundant
immunotherapeutic targets such as TIM-3, LAG3, 4-1BB, CD28, ICOS, OX-40,
and VISTA are emerged. However, because the majority of patients are
resistant to specific medications, it is necessary to identify new
targets using TICLs, which are crucial for the immune response and open
up new possibilities for immunotherapy. In TICLs, different infiltration
patterns or differentiation of B cells in different directions induced
by the tumor microenvironment can affect their final overall effect. As
a highly accurate marker of B cells in immunohistochemistry, CD79A is
necessary for the production and operation of B cell antigen receptors.
It has been studied more often in hematologic tumors but has also
recently been reported in studies related to breast cancer. In
immune-enriched, completely pathologically sensitive basal-like
triple-negative breast cancer, CD79A is extensively expressed. Another
study has suggested differential expression of CD79A is independent of
the subtype of breast cancer and has independent prognostic relevance,
exhibiting differences in overall survival among 1,998 patients. Given
that CD79A on MDSCs plays a crucial functional role in maintaining the
immature immunosuppressive phenotype of MDSCs and stimulating the
release of tumor-promoting cytokines, as well as the observation that
tumor-induced MDSCs express CD79A, it is promising that CD79A may
represent a novel target for cancer therapy. GZMA from cytotoxic
lymphocytes induces pyroptosis by lysing and activating GSDMB, which can
mediate immunotherapy of relevant cancers. Similar to CD79A, GZMA can be
used as an independent predictor of survival in breast cancer patients
and can be a robust biomarker. Our study showed that the proportion and
abundance of immune cells in the low-risk group with high expression of
both genes were higher in our risk model. The low-risk group, in
particular, possessed some immune advantages in T cells and their
subpopulations.
In this study, two IRGs affecting breast cancer prognosis were screened
and identified by ESTIMATE algorithm, WGCNA expression analysis,
CIBERSORT, and LASSO penalized Cox regression analysis, and based on the
median risk score, breast cancer patients could be classified into high-
and low-risk groups. This study provides reference value for classifying
breast cancer from the perspective of prognostic risk. To examine the
clinical application value of these two IRGs, we developed a prognostic
model with age, stage, T-stage, N-stage, the expression level of the two
IRGs, and risk grade as independent prognostic factors. As the results
of previous study, high expression levels of CD79A and GZMA predicts a
good prognosis.
However, this study was only analyzed from a bioinformatics perspective,
lacking evidence of detection in clinical samples and research on
related mechanisms, further experimental proof of specific regulatory
mechanisms is needed.
Data Availability
Statement
Publicly available data sets were used in this study. These data can be
found in the the UCSC Xena database and the Cancer Genome Atlas (TCGA)
database.
Informed Consent
Written informed consent was waived since all data are from public
databases.
Acknowledgments
The author gratefully acknowledges the UCSC Xena database and the Cancer
Genome Atlas (TCGA) database, which made the data available.
Disclosure Statement
No competing financial interests exist.
Funding Information
Natural Science Foundation of Gansu Province, Grant/ Award Number:
22JR5RA505; Fundamental Research Funds for the Central Universities,
Grant/ Award Number: lzujbky-2022-34
Reference