A novel prognostic target-gene signature and nomogram based on an integrated bioinformatics analysis in hepatocellular carcinoma
1Department of Engineering Structure and Mechanics, School of Science, Wuhan University of Technology, Wuhan, 430070, China
2Department of Geotechnical Engineering, Wuhan University of Technology, Wuhan, 430070, China
3Engineering Laboratory of Zhejiang Province for Pharmaceutical Development of Growth Factors & Biomedicine Collaborative Innovation Center of Wenzhou, Institute of Life Sciences, Wenzhou University, Wenzhou, 325035, China
*Address correspondence to: Qingjia Chi, firstname.lastname@example.org; Da Sun, email@example.com
Received: 03 August 2021; Accepted: 23 September 2021
Abstract: There is currently no effective solution to the problem of poor prognosis and recurrence of HCC. The technology of immunotherapy and prognosis of genetic material has made continuous progress in recent years. In the study, a 5-gene signature was established for the prognosis of HCC through biological information, and the immune infiltration of HCC patients was studied. After studied HCC patients’ immune infiltration, the paper screened the differential target genes of miR-126-3p in HCC downloaded from TCGA database, and uses WGCNA method to select the modular genes highly relevant to M2 macrophage. Then we use LASSO and COX regression analysis technology to establish the 5-gene signature. The nomogram is established by combining the prognostic score and clinical phenotype. Cibersort was empolyed to observe the immune infiltration in HCC patients. We revealed the biological pathways of HCC-related genes through GSEA and Metascape. The bioinformatics analysis of 2495 differential target genes finally constructed a 5-gene signature with a reliable prognostic ability (CDCA8, SLC41A3, PPM1G, TCOF1, GRPEL2). The combination of prognostic score and AJCC_Stage resulted in a more reliable prognosis ability. At the same time, 10 immune cells that are differentially expressed in HCC patients were also found. 8 GSEA pathways related to the prognosis were found. In the study, a reliable 5-gene signature was established based on the differential target gene of miR-126-3p to study the immune infiltration in HCC patients. It provides help for HCC-related prognosis research and immunotherapy.
Keywords: Hepatocellular carcinoma; Prognostic signature; Differentially expressed genes; microRNA
|AUC:||area under curve|
|DETGs:||differentially expressed target genes|
|GEO:||gene expression omnibus|
|LASSO:||least absolute shrinkage and selection operator|
|ROC:||receiver operating characteristic|
|TCGA:||the cancer genome atlas|
|WGCNA:||weighted gene co-expression network analysis|
|GSEA:||gene set enrichment analysis|
Primary hepatocellular carcinoma (HCC) and the corresponding chronic hepatitis or liver cirrhosis are widely regarded as precancerous lesions (Kanai et al., 1997). This disease is the third most common cause of death from cancer for human-beings (Pinato et al., 2020). The occurrence and progression of HCC are related to many factors (HBV infection, eating habits and drinking, etc.) (Piñero et al., 2020). Because there are no nerves in the liver tissue, most cancers have entered the advanced stage when patients feel pain and other discomfort. Untimely diagnosis leads to an extremely poor prognosis of patients. Owing to the high recurrence and metastasis characteristics of HCC, patients even lose the chance of radical treatment (Giannini et al., 2015). Therefore, a new and reliable method for the diagnosis and prognosis of liver cancer is urgently needed.
Carcinogenesis is a multi-step process that is caused by changes in signal pathways triggered by multiple genes that transform normal cells into malignant cells (Fantini et al., 2015). microRNAs (miRNAs) are small non-coding RNAs. As a kind of genetic material, miRNAs affect many biological processes of cancer by regulating genes. Studies have shown that the interacting genes revealed by HCC-related pathways are potential biomarkers and therapeutic targets for cancer (Guo et al., 2020). Simultaneously, in vivo detection technology feasibility has been improved, providing technical support for theoretical research on prognosis using genes. Therefore, the study of prognostic analysis of HCC human patients through the construction of multi-gene signatures is of great value. Studies reported hsa-miR-126-3p was differentially expressed and the progonostic role in various kinds of cancers (Han et al., 2020a; Nowicki et al., 2021). However, the prognostic role remainded to be expolored in liver cancer. Therefore, the study studied the expression of hsa-miR-126-3p, and established a prognostic gene-signature with the target genes of hsa-miR-126-3p.
Studies have shown that cancer development is closely linked to the immune system, especially the tumor microenvironment in human bodies (Han et al., 2020b). The HCC tumor immune microenvironment involves a series of non-cancer cell types, including normal liver cells, endothelial cells, pericytes, hepatic stellate cells (HSCs), portal fibroblasts and differential immune cell subgroups (Hou et al., 2020). Studies have shown that a variety of immunotherapy programs may significantly change the clinical outcome of HCC (Elsegood et al., 2017; Khemlina et al., 2017). Therefore, investigating the immune infiltration of HCC patients may be of great help to the judgment of cancer progression and prognosis.
In the paper, hsa-miR-126-3p and its target genes in liver cancer tissues were studied. Based on the TCGA and GEO databases, the 5-gene signature was constructed through WGCNA, LASSO regression analysis and other bioinformatics methods. The nomogram and survival curve are used to verify the 5-gene signature. Subsequently, we investigated the immune infiltration in HCC patients through the Cibersort deconvolution method. In this paper, the expression levels of 22 immune cells are related to gene expression levels, and the changes in the proportion of immune cells are analyzed. Finally, the tool GSEA was used to enrich the genes in HCC patients, 8 pathways were found (cell cycle, ECM–receptor interaction, herpes simplex virus 1 infection, p53 signaling pathway, inositol phosphate metabolism, microRNAs in cancer, phosphatidylinositol signaling system, renal cell carcinoma), GO and KEGG pathways related to HCC were found. The paper links the differential expression of genes, the level of immune infiltration, survival time, and clinical phenotypes in HCC patients, and provides help for HCC prognosis study and immunotherapy.
Materials and Methods
Twelve databases based on miRwalk2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2) identified potential target genes of miRNA-126-3p for subsequent research.
Both the transcriptome information expression matrix and clinical information matrix of HCC patients are downloaded from the TCGA database (https://cancergenome.nih.gov). The transcriptome expression matrix is composed of the expression levels of MiRNAs and genes in tissue samples and normal samples. The clinical information matrix is composed of visit information such as the age, survival time, tumor grade, tumor type, pathological stage of patients with tumor tissue samples. The tumor samples in the clinical information matrix correspond to the tumor tissue samples in the transcriptome expression matrix. Genes that were not recorded in more than a quarter of the samples were eliminated. The transcriptome information expression matrix was standardized by FPKM. Clinical samples with zero survival time were excluded. The HCC-related data set downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo) is used as an external verification set for 5-gene signatures.
The clinical evaluation
By selecting the method of analyzing the expression of miRNA-126-3p in each sample of the HCC transcriptome from the TCGA database, we observed the difference in the expression of miRNA-126-3p in the tumor group samples and the normal group samples, and study its clinical significance. To observe the differential expression of miRNA, we compared the expression of miRNA-126-3p in tumor samples and normal samples in groups. Based on the R language program package “survival”, the survival curves of the tumor group and the normal group are drawn. Based on the R language program package “ROCR” to draw the ROC curve of the tumor group and the normal group, with AUC value greater than 0.7 and P-value less than 0.05 as the standard, the differential expression of miRNA-126-3p in HCC patient samples was determined.
Immune infiltration analysis
To study the various immune cells infiltration levels in HCC patients, we uploaded the gene expression matrix of HCC patient samples to the Cibersort website (https://CIBERSORT.stanford.edu/). Based on the Cibersort deconvolution method, we can get the expression levels of 22 kinds of immune cells for each sample. The immune cells with the highest level of infiltration are selected, and the individual expression of the sample is used as the clinical phenotype for subsequent WGCNA analysis.
To study the difference in immune infiltration level between samples from high-risk groups and low-risk ones, we used the average of expression of 22 kinds of immune cells in samples from different risk groups as the ratio of immune cell infiltration. Then, we set perm = 1000 and selected immune cells with P < 0.05 as immune cells with significant differential expression. Visualization completed based on R language package “vioplot” and “ggplot2” Identification of differential target genes and modular genes.
Utilizing comparing the identified potential target genes with the genes in the transcriptome matrix in the TCGA database, the common genes were screened out. The common gene was screened based on the R language program package “DESeq2” with |logFC| > 1, P < 0.05 as thresholds, and used as a differential target gene (DETG) for subsequent research.
We analyzed the DETG obtained by WGCNA, and selected the modular gene by combining the sample gene expression with clinical information. This process is based on the R language package “wgcna”. WGCNA is a statistical method that constructs genes into a gene-weighted co-expression network through weighted processing of gene expression. The co-expressed gene set was identified through the analysis, the modules and the phenotype data were associated with the analysis, and the potential core genes were discovered.
First, performing a cluster analysis on the expression profile, calculate the Pearson correlation coefficient of the gene, and establish a correlation matrix.
Next, we build the adjacency matrix:
Among them, β is empolyed to indicate the weighted value which conforms to the scale-free network law (fit value R2 to 0.85). The gene pair correlation index aij is used to construct an adjacency matrix A. aij equals the β square of the correlation coefficient Sij.
Finally, the topological adjacency matrix was constructed:
So far, we believe that genes with similar expressions are functionally related and can be clustered into the same module based on the correlation. Genes that cannot be successfully aggregated into a certain module are oligogenes and are not included in the follow-up study due to weak correlation. According to the tree cutting method, the number of modules and the number of genes gathered in each module can be set. Then we imported the clinical data of the patient sample and correspond to the expression sample one-to-one, calculate the correlation and significance of the clinical traits and each module. And the module gene with the strongest correlation with the clinical phenotype can be taken as the modular gene.
Gene ontology and KEGG pathway
Based on the online tool Metascape (https://metascape.org/), the modular gene was enriched with GO Biological Processes, GO Cellular Components, GO Molecular Functions and KEGG. Enrichment is performed under the condition of perm = 1000, and the P-value is arranged from small to large. The enrichment of genes into pathways is visualized through the R package “GOplot”.
Development of prognostic model
Genes related to patients’ overall survival (OS) were identified through univariate Cox and Lasso Cox regression. Firstly, univariate COX regression analysis was used, with P < 0.05 as the standard, to initially screen out genes related to OS. Subsequently, the selected genes were analyzed by LASSO regression, and the LASSO analysis coefficient was calculated for subsequent risk model construction. LASSO regression applied the L1 norm summation method to punish the model parameters to fit the model properly,
Among them βi is a set of variables of the model, and λ controls the complexity of the model. Through LASSO regression analysis, an accurate and streamlined regression model was established. In this paper, the ten genes with the smallest partial likelihood deviation in the LASSO regression analysis were selected for ROC analysis, and they were sorted according to the AUC value. The top five genes with the AUC value were applied to construct the prognostic gene signature. The risk score (RS) of each sample can be calculated linearly:
In the equation, n is the number of genes contained in the final prognostic signature model. Coefi means the LASSO coefficient of gene i. Expi is the expression value of gene i. we selected the median of RS in HCC patient samples to complete the cutoff value, divided the patient samples into two groups accroding to their each RS, and completed the visualization of the relationship between different risk groups and patient survival. Simultaneously, based on the R language package “timeROC”, K-M curve and time-dependent ROC curve are applied to estimate the prognostic and diagnostic value of gene signature.
The patient’s risk score and other clinical information were analyzed together to analyze the variables, and the second univariate and multivariate COX regression analysis was performed. The C-index index of the regression analysis model is calculated to evaluate the predictive ability of the regression model for patient survival. The R program package “rms” is used to incorporate the prognostic factors obtained from the COX regression into the nomogram and calibration chart construction. Set the parameter perm = 1000, and select the 6-node difference fitting method to verify the consistency between the patient's survival prognosis and the real survival situation.
Gene set enrichment analysis
According to the predicted expression value of RS based on the patient sample, and the median as the screening criterion, the patient samples were divided into high risk or low risk groups. Based on the R speech package “limma”, the logFC value of the high and low risk group genes is calculated. Use the R package “clusterprofiler” to perform GSEA (Gene Set Enrichment Analysis) analysis on the whole genome. GSEA can find the biological pathways that are significantly enriched by the two sets of samples through the amount of gene expression, and find the phenotype with the expression pattern of the gene set closer. Set perm = 1000, and use |NES| > 1, P < 0.05 as the screening conditions to determine the significantly enriched gene set. Obtain and visualize the enriched pathways.
The R language package analyzed by WGCNA is called on the “R v3.6.1” platform. The rest of the R language packages for statistical analysis are all called on the “R v4.0.2” platform. P value less than 0.05 is considered to have statistically significant reliability. AUC value greater than 0.7 is considered to have significant specificity. TCGA database (https://cancergenome.nih.gov). miRwalk 2.0 website (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2). GEO database (https://www.ncbi.nlm.nih.gov/geo). Cibersort website (https://CIBERSORT.stanford.edu/). Online tool Metascape (https://metascape.org/).
Transcriptome and clinical data
We downloaded 372 HCC tumor tissue samples and 50 normal tissue samples from the TCGA database to study the expression level of miRNA-126-3p; download the gene expression matrix of HCC samples to obtain 19753 gene expression data. A total of 8671 potential target genes of miR-126-3p were identified for subsequent analysis. 8372 overlapped genes were prepared for the following study (Fig. 1A). The expression difference analysis of each gene finally determined that 5838 DEGs genes were studied, a total of 4760 up-regulated genes and 1078 down-regulated genes were identified (Fig. 1B). Downloaded 378 samples of clinical data, and eliminated 6 samples with missing visit information, which corresponded to samples of HCC tumor gene expression.
The paper obtained 21218 gene expression data in 219 tumor samples in GSE14520 in GEO data, and obtained clinical information of tumor samples through platform GPL3921-25447. Two samples with a survival time of 0 are eliminated, and the remaining data is used for 5-gene signature verification.
Clinical performance of miRNA-126-3p
By comparing miR-126-3p expression in tumor samples and normal samples, the expression of miR-126-3p in HCC tumor tissues was down-regulated (P = 2.9e−8) (Fig. 2A). Simultaneously, the expression of miR-126-3p has better recognition in tumor samples and normal samples (AUC = 0.7789, P = 0.029) (Fig. 2B). K–M curve analysis illustrated that the survival time of samples with low expression of miR-126-3p was significantly shortened (P = 0.0064) (Fig. 2C).
Selection of modular genes highly relevant to macrophages
By comparing the target genes of miR-126-3p with the DEGs obtained by pretreatment, we obtained 2495 overlapping genes. Among the overlapping genes, there are 2033 up-regulated genes and 462 down-regulated genes (Fig. 1C). Based on the WGCNA algorithm, we established a scale-free co-expression network with 2495 overlapping genes under the condition of β = 8. According to the tree cutting method, setting the parameters minModuleSize = 50 and mergeCutHeight = 0.15 as the conditions, we got 4 modules with 833 oligogenes. By calculating the Pearson correlation coefficient of each module and the patient's clinical information, we found that the turquoise module and ISUP_Grade (cor = 0.35, P = 6e−12) showed the most significant positive correlation (Figs. 4A and 4B). At the same time, the turquoise module has a significant positive correlation with AJCC_Stage (cor = 0.2, P = 3e−4) and Pathology Tumor (cor = 0.2, P = 1e−4). The turquoise module is negatively correlated with the expression of macrophages M2 (cor = −0.21, P = 3e−5), which is the most prominently expressed macrophages in patients (Fig. 3). Therefore, we identified 833 genes in the turquoise module as the modular genes that are the key to research in this study (Figs. 4C and 4D).
We accomplished the GO and KEGG enrichment analysis of the modular genes obtained by WGCNA analysis. In the GO biological processes analysis, 109 genes were enriched in the DNA repair pathway. 44 genes were enriched into DNA replication pathway. 45 genes were enriched into the DNA recombination pathway. 60 genes were enriched into the DNA conformation change pathway (Fig. 5A). In cellular components analysis, 117 genes were enriched in the chromosomal region pathway. 49 genes were enriched into the spindle pathway. 52 genes were enriched into the centrosome pathway. 16 genes were enriched into the replication fork pathway (Fig. 5B). In molecular functions analysis, 36 genes were enriched in catalytic activity, acting on DNA pathway. 58 genes were enriched into the chromatin binding pathway. 27 genes were enriched into the helicase activity pathway. 64 genes were enriched into the transcription coregulator activity pathway (Fig. 5C). In the KEGG analysis, 54 genes were enriched into the Herpes simplex virus 1 infection pathway. 22 genes were enriched into the cell cycle pathway. 12 genes were enriched into the DNA replication pathway. 20 genes were enriched into the cellular senescence pathway (Fig. 5D). All the above-enriched genes are up-regulated genes (Suppl. Tables 1–4).
Construction of target gene signature and nomogram
Through univariate COX regression analysis of 833 modular genes, we confirmed 511 genes significantly related to OS (P < 0.01). Based on the LASSO regression analysis of 511 genes, 10 genes (TCOF1, SLC41A3, FARSB, PFKFB4, OLA1, PPM1G, GRPEL2, CDCA8, MAFG, MEX3A) (Fig. 6) that are significantly related to the patient’s prognosis were further determined (Fig. 7A). Through ROC analysis of these 10 genes, we selected 5 genes with diagnostic significance to construct the gene signature CDCA8 (AUC = 0.9773), SLC41A3 (AUC = 0.9749), PPM1G (AUC = 0.9704), TCOF1 (AUC = 0.9701), GRPEL2 (AUC = 0.9631) (Figs. 6A–6E). The prognosis of these 5 genes does not influence clinical phenotype.
Finally, a prognostic formula is constructed based on gene expression and the corresponding LASSO coefficient, which is:
RS = 0.0324 × Expression of CDCA8 + 0.0087 × Expression of SLC41A3 + 0.0103 × Expression of PPM1G + 0.0039 × Expression of TCOF1 + 0.0954 × Expression of GRPEL2.
The median of the RS value was taken as the cutoff value, and 169 patient samples with RS values higher than the cutoff value were divided into high-risk groups, and 169 patients with RS values lower than the cutoff value were ranked into low-risk groups (Fig. 7B). By observing the dot-line and scatter plots that express the relationship between the two groups and the survival of patients, it is easy to find low-risk patients’ survival situation is better (Fig. 7C). By observing the heat map of signature gene expression in high and low-risk groups, RS increases with the increase of signature genes expression (Fig. 7D).
The risk score obtained based on this gene signature has diagnostic significance for the survival prediction of HCC patients at 1 year (AUC = 0.791), 3 years (AUC = 0.686), and 5 years (AUC = 0.645). At the same time, the K–M curve of gene level grouping also shows that low-risk samples’ prognosis is significantly better (P < 0.0001) (Figs. 8A and 8B). The differential expression of 5 signature genes was studied. CDCA8 (P < 2.22e−16), SLC41A3 (P = 1.5e−11), PPM1G (P < 2.2e−16), TCOF1 (P < 2.2e−16), GRPEL2 (P < 2.2e−16) In the high-risk group, the expression was high (Figs. 8C–8E).
We performed univariate and multivariate COX regression on RS and AJCC_Stage to explore the prognostic performance of the 5-gene gene signature. Univariate COX analysis illustrated that risk score and AJCC_Stage highly correlated with the survival of patients (P < 0.05). Nomogram analysis shows that the nomogram created by RS and AJCC_Stage (C-index = 0.644) is better than the nomogram created by RS (C-index = 0.607) or AJCC_Stage (C-index = 0.593) (Fig. 9A). Analyze the results separately. Simultaneous analysis of the prediction results of 1-year, 3-year and 5-year overall survival rate shows that the nomogram analysis based on the joint establishment of RS and AJCC_Stage has a better predictive ability for HCC patients (Fig. 9B).
According to the obtained 5-gene signature, the 1-year (AUC = 0.656), 3-year (AUC = 0.597), and 5-year (AUC = 0.605) survival predictions of data set GSE14520 for HCC patients have diagnostic significance. K–M curve demonstrated the high-risk samples showed a low overall survival time and poor prognosis (P < 0.0001) (Figs. 10A and 10B). The 5 genes that make up the gene signature CDCA8 (P = 9.1e−11), SLC41A3 (P = 1.5e−11), PPM1G (P = 0.0041), TCOF1 (P = 0.0037), GRPEL2 (P < 2.2e−16) High expression in the high-risk group (Figs. 10C–10G).
Subsequently, the sample RS and AJCC_Stage in the data set GSE14520 were used to complete univariate multivariate COX regression analysis, and jointly constructed a nomogram (C-index = 0.765) with good prognostic ability (Fig. 11A). At the same time, the 1-year, 3-year and 5-year prediction-true survival curves obtained better fitting results (Fig. 11B).
Immune infiltration and GSEA between high- and low- group
We obtained the proportion of 22 immune cells in each patient by applying Cibersort’s deconvolution method. The top five subtypes of immune cells in the HCC patient group are (19.336%), T cells CD4 memory resting (15.039%), Macrophages M0 (12.645%), T cells CD8 (9.949%), and macrophages M1 (8.396%) (Fig. 3). Based on the method of dividing risk groups according to RS, the top five subtypes in the high-risk samples we obtained are Macrophages M2 (17.735%), Macrophages M0 (16.725%), and T cells CD4 memory resting (13.368%), T cells CD8 (9.949%), macrophages M1 (8.158%) (Fig. 12A). The top five subtypes of immune cells in the low-risk samples we obtained were Macrophages M2 (20.919%), T cells CD4 memory resting (17.039%), T cells CD8 (10.036%), Macrophages M0 (8.631%), macrophages M1 (8.627%) (Fig. 12B). Subsequently, we compared the immune infiltration situation of the high-risk group and the low-risk group, using P < 0.05 as the screening condition, and determined a total of 10 immune cells with significant differential expression. Among the 10 immune cells, T cells CD4 memory resting, T cells regulartory (Tregs), NK cells resting, monocytes, and macrophages M2 were reduced in the high-risk group, while the expression of B cells memory, T cells CD4 memory activated, and T cells follicular helper, macrophages M0 and neutrophils in the high-risk group was significantly higher than in the low-risk group (Fig. 12C).
Eight pathways were analyzed by GSEA based on the gene expression of high- and low-risk patients: Cell cycle, ECM-receptor interaction, Herpes simplex virus 1 infection, p53 signaling pathway, Inositol phosphate metabolism, MicroRNAs in cancer, Phosphatidylinositol signaling system, Renal cell carcinoma (Fig. 13).
As a malignant tumor disease, the development process of HCC is highly related to the expression of transcriptome genetic material (Boyault et al., 2007). At the same time, the process of HCC tumor growth is very complicated, and it is affected by a variety of genetic materials and immune microenvironment. Therefore, the investigation of multiple genetic materials and the comprehensive investigation of multiple genetic materials need to be completed urgently. The study started with the differentially expressed microRNAs in HCC patients, found the core differential target genes of this miRNA by constructing a weighted gene co-expression network, and finally constructed a prognostic model through COX regression analysis and LASSO regression analysis. The survival analysis and nomogram confirmed that the proposed model has better predictive ability. The model and immune infiltration were investigated comprehensively, and the immune infiltration of patients with different prognostic scores was observed.
MicroRNAs are small non-coding endogenous regulatory RNA molecules that bind to target mRNA through base pairing, thereby regulating gene expression at the post-transcriptional level. Studies have shown that miR-126-3p is related to the development of HCC (Du et al., 2014). At the same time, some target genes of miR-126-3p are indeed involved in the relevant pathways of HCC (Lou et al., 2018). However, the construction of a prognostic model based on the target gene of miR-126-3p has not yet been completed. This study first analyzed the expression and prognosis of miR-126-3p in HCC patients. The obvious low expression of miRNA and outstanding prognostic differences proved that miR-126-3p has great research value in HCC. Then, by clustering the differential target genes of miR-126-3p with respect to HCC, a gene-weighted co-expression network was constructed. The differential target gene expression level is linked with the clinical phenotype Macrophages M2 expression level, Tumor, Node, Metastases, AJCC_Stage, ISUP_Grade, and the core genes are screened out. These genes are more closely related to other genes and are also more closely associated with the clinical phenotype of HCC patients. In this study, the core genes were enriched and analyzed. The main enriched KEGG pathways include Herpes simplex virus 1 infection, Cell cycle, DNA replication, and cellular senescence, and found the main ways for these core genes to regulate the development of HCC. The TNM staging system of the American Joint Committee on Cancer (AJCC_Stage) is based on the most important prognostic indicators, namely: tumor number (Tumor), lymph node positive and distant metastasis (Node), and whether there is vascular invasion (Metastases) in the tumor (Xing et al., 2016). And the ISUP_Grade grading system has the dual benefits of predicting patient outcomes and promoting patient communication (Egevad et al., 2016). This article links the expression of Macrophages M2 in patients with clinical phenotypes such as AJCC_Stage, ISUP_Grade and the patient’s survival time, and completes the high correlation between differential target genes and clinical phenotypes to ensure that the core genes screened have better prognosis. Studies have shown that in the process of Herpes simplex virus 1 (HSV-1) infection, the level of IFI16 protein is greatly reduced. IFI16 interacts with tumor suppressor proteins p53 and BRCA1 and affects the process of tumor development (Llaca, 2016). At the same time, cell cycle and DNA replication have long been studied and proved to be highly relevant to the process of cancer, and the molecular mechanism of cell transformation has been proposed, which may help determine potential targets for improving cancer treatment (Hartwell and Kastan, 1994; Tachibana et al., 2005; Zhang et al., 2016b). Similarly, cellular senescence has also been studied extensively and has an impact on the progression and prognosis of cancer (Hartwell and Kastan, 1994; Tachibana et al., 2005; Zhang et al., 2016b). Therefore, this study found the primary way that miRNA-126-3p differentially target genes in HCC affect tumor progression. It also proved that the modular-genes obtained by screening significantly related to the prognosis of cancer.
In the study, univariate COX regression analysis and LASSO regression analysis were used to select the 10 genes with the smallest partial likelihood deviation among the core genes. In order to make the prediction model have a more accurate prognostic ability, this study selected the 5 genes with the largest expression differences (CDCA8, SLC41A3, PPM1G, TCOF1, GRPEL2) by drawing ROC curves to complete the construction of gene signatures. Studies have shown that Cell division cycle associated 8 (CDCA8) as a component of the chromosome complex, it participates in the formation of the mitotic spindle (Jeon et al., 2017). According to research, CDCA8 is highly correlated with activating inflammation-induced motility, manifesting in liver tissues and affecting lung tissues and breast tissues (Zhou et al., 2015). Meanwhile, the significant pathways of CDCA8 are metabolism of proteins and cell cycle, same as the major pathway of GRPEL2. Both of the two genes up-regulated in HCC, may suppress hepatocellular carcinoma growth. As reported CDCA8 can prevent HCC tumors keep growing by altering protein expression. In this study, the expression of CDCA8 in the high-risk group was significantly increased, which is consistent with previous studies. SLC41A3 encodes protein with cation transmembrane transport protein activity, which may help Mg2+ transport (de Baaij et al., 2016). At the same time, a lot of research work has included the gene SLC41A3 in their gene signatures (Liu et al., 2020; Zhou et al., 2020b). The ectopic expression of PPM1G enhances the stability of p27 protein and delays the progression of the cell cycle from G1 to S phase (Sun et al., 2016). PPM1G dephosphorylates USP7S, which plays a major role in cancer prevention, thereby affecting the progress of cancer (Khoronenkova et al., 2012). The significant pathways of PPM1G are mRNA splicing, development Dopamine D2 receptor transactivation of EGFR. And the epidermal growth factor receptor (EGFR) is reported as a rational target for cancer therapy because it is commonly expressed at a high level in a variety of solid tumours and implicated in the control of cell survival, proliferation, metastasis and angiogenesis (Ciardiello and Tortora, 2003). Thus, PPM1G may influence the survival, proliferation, metastasis of tumor cells. TCOF1 may cause tumor formation by affecting the expression of p53 (Al-Sayegh, 2009). Increased p53 protein levels in response to DNA damage activate a G1-phase cell cycle checkpoint (Graeber et al., 1994). It affected the proliferation of HCC in the liver tissue. And it affect the proliferation and division of bone marrow cells in the bone marrow, leading to the occurrence of leukemia (Kastan et al., 1991). Research has shown that GrpEL2 as a stress resistant protein in higher vertebrates, can maintain the activity of molecular chaperone under specific conditions (Srivastava et al., 2017). The expression level is multiplied by the corresponding LASSO regression coefficient and summed to obtain the prognostic score of each sample. Subsequently, the median of the overall prognostic score of the HCC sample was selected as the cutoff value, and the high-scoring samples were classified as the high-risk group and the low-scoring samples as the low-risk group. The time-dependent ROC curve and K–M survival curve results show that the scoring formula has credible prognostic ability.
Subsequently, the prognostic score and clinical data AJCC_Stage were used as variables for multivariate COX regression analysis, and the nomogram C-index = 0.644 was constructed, indicating that it also has a good prognostic ability. Stage III&IV patients’ risk score is higher than I&II one’s with a long span, as seen from the nomogram. It means high-risk patients’ primary tumor is larger, metastasis of the tumor in the distant organs and tissues is farther and the involvement of local lymph nodes is worse, considering from the pathological aspect. The analysis from the validation data set confirmed the results. Although some people have investigated these genes (Luyao et al., 2017), and proposed a variety of biological beacons for HCC, including protein symbols (Nan et al., 2018), RNA symbols (Zhou et al., 2020a), 22-gene signature (Zhou et al., 2020b), 4-lncRNA signature (Zhang et al., 2016a), but no one has constructed a gene signature for the target genes of differentially expressed miRNAs. The 5-gene signature was finally grouped based on the risk score, and 8 GSEA pathways that differentiated into high and low groups enriched by HCC genes were found.
In the study, the immune status of HCC patients was linked with gene expression, and the proportion of 22 immune cells was analyzed. It was observed that Macrophages M2, which has the highest level of infiltration in patients, had a significant decrease in infiltration level in the high-risk group. According to research, Macrophages M2 activity can promote cell proliferation and tissue repair (Mills, 2012). At the same time, it was observed that the infiltration level of macrophages M0 showed a significant upward trend from low risk to high risk, which may be caused by the differentiation of M2 to M0. T cell CD4 memory resting after stimulation in culture produced a wide range of T-cell cytokines, inhibiting spontaneous tumour necrosis factor-alpha (Brennan et al., 2008). And it shown that T cell CD8 is involved in the process of liver cirrhosis, a high-risk factor for liver cancer (Behboudi and Pereira, 2010). These research conclusions are consistent with the results of the work. This study links the target genes of miRNA with the level of immune cell infiltration, observes the changes in the level of immune cell infiltration, enhances the accuracy of the prognosis of gene signatures, and also investigates the immune infiltration of HCC.
The study innovatively selected the differential target genes of miRNAs differentially expressed in HCC patients as the research object, and applied WGCNA technology to find the core differential target genes to ensure the correctness of subsequent analysis and provide a guarantee for the prognostic ability of gene signatures. At the same time, LASSO regression analysis is used to construct gene signatures, which avoids the excessive punishment of coefficients. In the process of constructing the gene signature, the gene is screened to ensure that the obtained gene signature is highly correlated with the patient’s survival prognosis. Finally, AJCC_Stage and survival time are used as variables to construct a nomogram, which further improves the prognostic ability.
In summary, the paper revealed five hub genes that significantly associated with patients’ prognosis and immune cell infiltration in HCC. And the paper constructed a 5-gene signature (CDCA8, SLC41A3, PPM1G, TCOF1, GRPEL2) and a nomogram based on the differential target genes of miRNA-126-3p, which have been verified to have a good prognostic ability. The comprehensive evaluation of patient gene signature may help guide us to more effective immunotherapy strategies.
Author Contributions: Conceptualization, Qingjia Chi; Methodology, Qingjia Chi and Rui Xu; Validation, Rui Xu and Qibiao Wu; Formal Analysis, Rui Xu and Qibiao Wu; Investigation, Yuhan Gong, Yongzhe Wu; Writing–Original Draft Preparation, Rui Xu and Qibiao Wu; Writing–Review & Editing, Yuhan Gong, Yongzhe Wu; Visualization, Rui Xu and Qibiao Wu; Supervision, Qingjia Chi; Project Administration, Qingjia Chi and Da Sun; Funding Acquisition, Qingjia Chi and Da Sun. All authors reviewed the results and approved the final version of the manuscript.
Availability of Data and Materials: The datasets analysed during the current study are available in the TCGA database (https://cancergenome.nih.gov). miRwalk 2.0 website (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2). GEO database (https://www.ncbi.nlm.nih.gov/geo). Cibersort website (https://CIBERSORT.stanford.edu/).
Funding Statement: The study was supported by grants from the National Natural Science Foundation of China (51901160).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|