KIZ OpenIR
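Exploring Human Complex Diseases Using Transcriptome Data
王倩
Degree type: Doctoral
2018-06
Degree-granting institution: University of Chinese Academy of Sciences
Place of degree conferral: Beijing
Degree name: Doctor of Science
Keywords: Transcriptome Integrative Analysis, Parkinson's Disease, Alzheimer's Disease, Cancer Overall Survival, T Cell Receptor Signaling Pathway
Abstract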

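The transcriptome is the complete set of RNAs transcribed by a given tissue or cell at a particular developmental stage or functional state; the term sometimes refers specifically to the mRNAs that are translated into protein. Transcriptome analysis can both characterize gene expression across the developmental stages of an organism and reveal the mechanisms underlying human disease. Over the past several decades, differential gene expression analysis under physiological and pathological conditions has been one of the most effective ways to study human disease at the molecular level. As next-generation sequencing has matured, transcriptome data have grown rapidly, and extracting useful information from these massive datasets with data-mining methods is a major challenge for bioinformaticians. We therefore developed an integrative analysis pipeline that investigates the pathogenesis of complex human diseases mainly by collecting and integrating gene expression data from public databases. Integrative analysis combines data from multiple independent studies of the same kind, which enlarges the sample size, increases statistical power, allows the consistency of results to be assessed, and can yield new findings. Using this approach we investigated the pathogenesis of two classes of complex human disease, neurodegenerative diseases and cancer, and made some progress.

1. A transcriptional dysregulation mechanism of NRF2 shared by Parkinson's disease and Alzheimer's disease

Neurodegenerative diseases are a group of diseases characterized mainly by neuronal degeneration or apoptosis; two typical examples are Parkinson's disease (PD) and Alzheimer's disease (AD). PD and AD are two entirely distinct diseases, yet they share many clinical features, for example mitochondrial dysfunction and oxidative stress. To examine the relationship between them in detail, we collected 9 PD-related human microarray gene expression datasets covering 4 brain tissues and 7 AD-related human microarray gene expression datasets covering 6 brain tissues for integrative analysis. Functional enrichment analysis of the differentially expressed genes showed that multiple brain regions of PD and AD patients are impaired in the synaptic vesicle cycle, GABAergic synapse, phagosome, oxidative phosphorylation, and tricarboxylic acid (TCA) cycle pathways, and that more differentially expressed genes are enriched in these pathways in AD. Comparing the differentially expressed genes across the PD and AD brain regions, we obtained 54 differentially expressed genes shared by at least 6 brain regions; the upstream promoter sequences of 31 down-regulated genes among them all contain the antioxidant response element (ARE), a cis-acting element that NRF2 binds and regulates. NRF2 is a transcription factor that activates the expression of genes whose upstream promoter regions contain an ARE, thereby protecting cells from oxidative stress damage. We found that in PD and AD, although NRF2 expression is up-regulated, its 31 ARE-containing target genes are down-regulated, which contradicts earlier conclusions. Further analysis showed that MAFF expression is significantly up-regulated in all brain regions of PD and AD patients and is negatively correlated with the expression of the 31 down-regulated ARE-containing genes. Previous studies have shown that over-expressed small MAF proteins can bind to themselves to form homodimers that inhibit the transcriptional regulatory activity of NRF2. We therefore propose the following hypothesis: in PD and AD patients, because MAFF is over-expressed, MAFF homodimers compete with MAFF/NRF2 heterodimers for binding to the ARE region, which inhibits the transcriptional regulatory activity of NRF2 and thereby suppresses the expression of antioxidant stress genes.

2. The T cell receptor signaling pathway plays opposite roles in the prognosis of patients with different types of cancer

Overall survival is a key endpoint for assessing the survival of cancer patients, and studies have shown that cancer pathogenesis is closely related to immune dysfunction. In this study, to explore the relationship between overall survival and the immune system in patients with different types of cancer, we collected and downloaded gene expression data and clinical data for 22 types of cancer from the TCGA and CBioPortal databases. Survival analysis used the Cox proportional hazards regression model in the R package survival. Human immune-related genes were downloaded from the ImmPort and InnateDB databases and merged. The results show that the number of survival-prognostic genes, and the hazard ratios of the survival-prognostic genes with immune functions, vary considerably among cancers. Functional enrichment analysis showed that the survival-prognostic genes of 7 types of cancer are closely related to immune function. Comparing the immune survival-prognostic genes of these 7 cancers, we identified 55 immune survival-prognostic genes shared by at least 5 cancers, 11 of which participate in the T cell receptor (TCR) signaling pathway. Among these genes, ZAP70, CD3E, CD3G, CD3D, and CD247 constitute the triggering module of TCR signaling, and all are phosphorylated and activated by LCK. The results show that higher expression of these 11 genes favors longer overall survival in patients with 5 cancers (cervical squamous cell carcinoma and endocervical adenocarcinoma, head and neck squamous cell carcinoma, breast invasive carcinoma, sarcoma, and lung adenocarcinoma) but is unfavorable to the survival of patients with brain lower grade glioma. Our study provides guidance for cancer immunotherapy.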
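
The screening step used in both parts of the abstract, keeping genes that appear in at least a given number of brain regions or cancers, can be sketched roughly as follows. This is only an illustration and not the pipeline of the thesis: the function name `shared_genes`, the region labels, and the gene names are hypothetical placeholders.

```python
from collections import Counter


def shared_genes(gene_sets, min_groups):
    """Return, sorted, the genes present in at least `min_groups` groups.

    `gene_sets` maps a group label (for example a brain region or a cancer
    type) to the collection of genes selected for that group.
    """
    # Count in how many groups each gene occurs; set() guards against
    # a gene being listed twice within the same group.
    counts = Counter(gene for genes in gene_sets.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_groups)


# Hypothetical toy input: differentially expressed genes per brain region.
deg_by_region = {
    "region_A": {"GENE1", "GENE2", "GENE3"},
    "region_B": {"GENE1", "GENE3"},
    "region_C": {"GENE1", "GENE4"},
}

print(shared_genes(deg_by_region, min_groups=2))  # ['GENE1', 'GENE3']
```

With the thresholds reported in the abstract, the same call would use `min_groups=6` for brain regions or `min_groups=5` for cancers.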

Alternative abstract

A transcriptome is the set of RNAs transcribed by a tissue or cell at a given developmental or functional state; the term sometimes refers only to the protein-coding transcripts (mRNA). Transcriptome analysis can not only characterize gene expression at different developmental stages but also reveal the pathogenesis of human diseases. Over the past few decades, gene expression analysis under physiological and pathological conditions has been one of the most effective ways to explore the pathogenesis of human disease at the molecular level. As RNA sequencing technologies have matured, transcriptome data have grown increasingly large, and mining useful information from such massive data is a major challenge for bioinformatics. We therefore developed an integrative analysis protocol to investigate the pathogenesis of complex human diseases by collecting and integrating gene expression data from public databases. Integrative analysis combines data from multiple independent studies of the same type, thereby enlarging the sample size and improving statistical power. We used the protocol to explore the pathogenesis of two kinds of human disease, neurodegenerative diseases and cancer, and made some progress.

1. Abnormal transcriptional regulation of NRF2 in Parkinson's disease and Alzheimer's disease

Neurodegenerative diseases are a class of diseases characterized by neuronal degeneration or apoptosis; two typical representatives are Parkinson's disease (PD) and Alzheimer's disease (AD). PD and AD are distinct diseases, but they share many clinical features, such as mitochondrial dysfunction and oxidative stress. To explore the relationship between the two diseases, we performed a meta-analysis of 9 microarray datasets from PD studies and 7 from AD studies.
Functional enrichment analysis revealed that both PD and AD showed dysfunction in the synaptic vesicle cycle, GABAergic synapse, phagosome, oxidative phosphorylation, and TCA cycle pathways, with more differentially expressed genes enriched in AD. Comparing the differentially expressed genes between AD and PD, we identified 54 genes shared by at least six tissues. Among them, 31 down-regulated genes contained the antioxidant response element (ARE) consensus sequence bound by NRF2. NRF2 is a transcription factor that protects cells against oxidative stress through coordinated up-regulation of ARE-driven genes. To our surprise, although NRF2 was up-regulated, these 31 target genes were all down-regulated. Further exploration found that MAFF was up-regulated in all tissues and significantly negatively correlated with the 31 NRF2-dependent genes in diseased conditions. Previous studies have demonstrated that over-expressed small MAFs can form homodimers and act as transcriptional repressors. Therefore, MAFF might play a significant role in the dysfunction of the NRF2 regulatory network in PD and AD.

2. The T cell receptor signaling pathway plays opposite roles in the prognosis of different cancers

Overall survival is a key endpoint in the management of patients with cancer, and the pathogenesis of cancer has been shown to be closely linked to immune system disorders. In this study, we collected datasets for 22 types of cancer from TCGA and the corresponding clinical data from CBioPortal to investigate the relationship between overall survival and immune genes. Survival analysis was performed with the Cox proportional hazards regression model from the survival R package. Lists of human immune genes were downloaded from two databases, ImmPort and InnateDB. The results showed that the number of prognostic genes and the hazard ratio (HR) of prognostic immune genes (PIGs) varied greatly among cancers.
Functional enrichment analysis identified 7 types of cancer whose prognostic genes were closely related to the immune system. By comparing the PIGs in these 7 cancers, we screened out 55 key PIGs shared by at least 5 cancers. Of these 55 PIGs, 11 were confirmed to participate in the T cell receptor (TCR) signaling pathway in the STRING protein-protein interaction network. High expression of the 11 genes (ZAP70, PTPRC, LCK, ICOS, CD3E, CD3G, CD3D, ITK, CD247, and CD40LG) improved overall survival in 5 kinds of cancer (Breast Invasive Carcinoma (BRCA), Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC), Head and Neck Squamous Cell Carcinoma (HNSC), Lung Adenocarcinoma (LUAD), and Sarcoma (SARC)), but decreased overall survival in Brain Lower Grade Glioma (LGG). Among these genes, ZAP70, CD3E, CD3G, CD3D, and CD247 form the TCR ‘signal triggering module’ and are all activated by LCK. Activation of the TCR signaling pathway thus correlated with a good prognosis in the five cancers (BRCA, CESC, HNSC, LUAD, and SARC) but a poor prognosis in LGG. Therefore, the state of the TCR signaling pathway should be considered carefully when immunotherapy is applied to different cancers.
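
The abstract states that survival analysis used the Cox proportional hazards model from the R survival package. As a rough illustration of what such a model estimates, and explicitly not the code of the thesis, the Python sketch below fits a Cox model with a single covariate (for example the expression of one gene) by Newton-Raphson on the partial likelihood, handling tied event times in the Breslow style. The function names and the toy data are assumptions made up for this example.

```python
import math


def score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood.

    Single covariate; tied event times share the same risk set (Breslow).
    """
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:  # censored subjects contribute only through risk sets
            continue
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_cox_single_covariate(times, events, x, n_iter=25, tol=1e-9):
    """Return (beta, hazard_ratio) for a one-covariate Cox model."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_information(beta, times, events, x)
        if info == 0.0:  # no variation in any risk set: nothing to estimate
            break
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


# Hypothetical toy data: follow-up time, event flag (1 = death, 0 = censored),
# and the expression level of one gene for eight patients.
toy_times = [2, 3, 5, 7, 8, 11, 12, 15]
toy_events = [1, 1, 1, 0, 1, 1, 0, 1]
toy_expression = [0.5, 1.8, 0.9, 2.1, 1.2, 2.5, 1.6, 2.2]

beta, hazard_ratio = fit_cox_single_covariate(toy_times, toy_events, toy_expression)
print(f"beta = {beta:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

A hazard ratio below 1 means that higher expression goes with lower hazard (longer survival) and a ratio above 1 the opposite, which is the sense in which the abstract contrasts the five cancers with LGG. A real analysis would rely on an established implementation and report confidence intervals and p-values as well.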

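Language: Chinese
Document type: Thesis
Item identifier: http://ir.kiz.ac.cn/handle/152453/12628
Collection: Kunming Institute of Zoology
State Key Laboratory of Genetic Resources and Evolution
Structural Bioinformatics
Recommended citation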
GB/T 7714
王倩. Exploring Human Complex Diseases Using Transcriptome Data [D]. Beijing. University of Chinese Academy of Sciences, 2018.
Files in this item
File name/size Document type Version type Access type License
201518010415030王倩 【导 (2732KB) Thesis Open access CC BY-NC-SA