International Medicine and Health Guidance News ›› 2022, Vol. 28 ›› Issue (13): 1864-1871. DOI: 10.3760/cma.j.issn.1007-1245.2022.13.018

• Original Article •

Screening of lncRNA related to prognosis of colon cancer based on TCGA database and establishment of prognostic risk model 

He Tian, Cao Tiansheng   

  1. Department of Gastroenterological Surgery, People's Hospital of Huadu District, Guangzhou 510800, China
  • Received:2022-04-23 Online:2022-07-01 Published:2022-07-01
  • Contact: Cao Tiansheng, Email: caotiansheng2088@sina.com
  • Supported by:
    Scientific Research Fund of People's Hospital of Huadu District (2020B04)

Abstract: Objective To screen for long non-coding RNAs (lncRNAs) associated with the prognosis of colon cancer and to build a prognostic risk model for colon cancer. Methods Data were retrieved from database inception to March 1, 2022. Colon cancer transcriptome data were downloaded from The Cancer Genome Atlas (TCGA) and curated, and an lncRNA expression matrix was constructed for the paired samples. Differentially expressed lncRNAs (DElncRNAs) were identified with the R package "edgeR". The DElncRNAs were then filtered successively by univariate Cox regression analysis, Lasso regression analysis, Kaplan-Meier (K-M) survival analysis, and multivariate Cox regression analysis to obtain prognosis-associated lncRNAs. The prognostic risk model was built from the regression coefficients of the multivariate Cox model. Its predictive accuracy was evaluated with the C-index, time-dependent receiver operating characteristic (ROC) curves, the area under the ROC curve (AUC), and K-M survival analysis. A competing endogenous RNA (ceRNA) network was constructed for the lncRNAs in the model, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed on the related mRNAs to explore how these lncRNAs affect the progression of colon cancer. Results Curating the transcriptome data yielded 5,460 lncRNAs. Paired-sample analysis identified 868 DElncRNAs, of which 548 were up-regulated and 320 were down-regulated. Univariate Cox regression analysis retained 40 lncRNAs; Lasso regression analysis, used to filter out collinear factors, reduced these to 34; and K-M survival analysis left 14 candidate lncRNAs. Multivariate Cox regression analysis then identified 7 prognosis-associated lncRNAs (down-regulated: LINC01132; up-regulated: ELFN1-AS1, RP5-884M6.1, LINC00461, RP1-79C4.4, RP4-816N1.7, and RP3-380B8.4), and the prognostic risk model was constructed from their regression coefficients. The C-index of the model was 0.82, and the AUC values at 3 and 5 years were 0.79 and 0.84, respectively. K-M survival analysis showed a statistically significant difference in survival between the high-risk and low-risk groups (P<0.0001). In the ceRNA network, KEGG enrichment analysis suggested that the down-regulated lncRNA may inhibit the progression of colon cancer through regulation of the actin cytoskeleton, proteoglycans in cancer, and the PI3K-Akt signaling pathway, whereas the up-regulated lncRNAs may promote progression through cell adhesion molecules, focal adhesion, and phagosome pathways. Conclusions This study constructed a prognostic risk model of colon cancer comprising 7 lncRNAs that predicts patient survival with good accuracy. Each lncRNA is a potential independent prognostic biomarker, and the model offers some reference value for the clinical prognostic assessment of patients with colon cancer.

Key words: Colon cancer, Colorectal cancer, TCGA, lncRNA, Prognostic model
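
Illustrative sketch (not the authors' code): the abstract describes a risk model built from multivariate Cox regression coefficients, a comparison of high-risk and low-risk groups, and evaluation by C-index. The Python below shows one common way such a pipeline is scored, under stated assumptions. The coefficient values and patient data are hypothetical, because the abstract does not report them. Only three of the seven lncRNAs are used for brevity. The median cut-off is an assumption, since the abstract does not state how the two groups were defined. The study itself used R.

```python
from statistics import median

# Hypothetical coefficients: the abstract does not report the fitted values.
COEFS = {
    "LINC01132": -0.40,  # down-regulated lncRNA, assumed protective
    "ELFN1-AS1": 0.25,   # up-regulated lncRNA, assumed risk-increasing
    "LINC00461": 0.30,   # up-regulated lncRNA, assumed risk-increasing
}


def risk_score(expression, coefs=COEFS):
    """Cox linear predictor: sum of coefficient * expression over model genes."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def split_by_median(scores):
    """Label a patient 'high' when the score exceeds the cohort median (assumed cut-off)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def concordance_index(times, events, scores):
    """Harrell's C: fraction of comparable pairs in which the higher score fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in score count as half
    return concordant / comparable


# Toy cohort (hypothetical expression values, follow-up in months, 1 = death observed).
patients = {
    "P1": {"LINC01132": 1.0, "ELFN1-AS1": 4.0, "LINC00461": 3.0},
    "P2": {"LINC01132": 3.0, "ELFN1-AS1": 1.0, "LINC00461": 1.0},
    "P3": {"LINC01132": 2.0, "ELFN1-AS1": 2.0, "LINC00461": 2.0},
    "P4": {"LINC01132": 4.0, "ELFN1-AS1": 0.0, "LINC00461": 1.0},
}
times = [12, 40, 20, 60]
events = [1, 1, 1, 0]

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
c_index = concordance_index(times, events, [scores[p] for p in patients])
```

In practice the coefficients would come from a fitted Cox model and the C-index from an established survival-analysis library. The hand-written version here only makes the arithmetic behind the reported metrics explicit.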