KMS KUNMING INSTITUTE OF ZOOLOGY.CAS
Mining Human Disease-Related Genes and Their Functions through Integrative Analysis of Multi-Omics Big Data | |
申秋硕 | |
Degree type | Doctoral |
2019-01 | |
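Degree-granting institution | University of Chinese Academy of Sciences |
Place of degree conferral | Beijing |
Degree name | Doctor of Medicine |
Keywords | Integrative analysis, disease, stem cells, tumor, high-altitude adaptation; Integrative Analysis, Disease, SCZ, Cancer, High-altitude Adaptation |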
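Abstract | With the rapid development of high-throughput screening technologies, exemplified by second-generation sequencing, life science research, and the study of human disease in particular, has entered the era of big data. Substantial advances in transcriptomics, proteomics, epigenomics, metabolomics, and interactomics allow life phenomena to be understood and described from multiple perspectives at the levels of mRNA, protein, epigenetic modification, metabolism, and biomolecular interaction. High-throughput omics analysis has enabled researchers to identify a series of potential gene targets associated with human diseases. However, the onset and progression of human disease is a complex pathological process that spans the genome, transcriptome, epigenome, proteome, and metabolome. Analysis of a single omics dataset usually captures changes at only one of these levels in disease samples, which greatly limits the use of omics big data in screening for disease targets. Integrative analysis of omics big data can therefore effectively filter out random events at any single omics level, reveal the candidate factors that genuinely participate in disease onset and progression, and support mechanistic study of how different factors act at different times and in different locations, thereby providing more effective measures for targeted therapy. In this thesis, we use integrative analysis of multi-omics big data to screen for human disease-related genes and dissect their functions in tumors and stem cells.
Although recent genome-wide association studies (GWAS) have identified multiple risk loci associated with schizophrenia (SCZ), pinpointing the underlying risk genes remains a major challenge. Here, using the Sherlock integrative analysis method, we identified ALMS1, GLT8D1, and CSNK2B as potential SCZ risk genes, and validated these findings with independent human brain expression quantitative trait loci (eQTL) data and the integrative analysis method SMR. In addition, dysregulated expression of these three genes in clinical schizophrenia patients compared with healthy controls further supports their potential role in the pathogenesis of schizophrenia. Because schizophrenia is a neurodevelopmental disorder, we carried out in-depth studies in neural stem cells and found that knockdown of GLT8D1 and CSNK2B significantly promoted neural stem cell proliferation, inhibited neural stem cell differentiation, and altered neuronal morphology and synaptic transmission. Thus, by integrating SCZ GWAS data, eQTL data, transcriptome data, and interaction network data, we propose that ALMS1, CSNK2B, and GLT8D1 may regulate the pathological process of SCZ by modulating neural stem cell function, and are SCZ risk genes. The functional results for GLT8D1 and CSNK2B in neural stem cells also suggest a potential regulatory role in nervous system tumors (tumor stem cells); in-depth studies are under way.
Identifying tumor-associated molecular and pathway changes is central to understanding the mechanisms of tumorigenesis, and transcriptomics is an important means of doing so. However, the heterogeneity of tumor tissue limits the identification of differentially expressed genes (DEGs) and the understanding of cancer mechanisms, mainly because current methods rely heavily on cross-sample normalization and/or distributional assumptions, both of which are sensitive to heterogeneous data. In collaboration with the 孔庆鹏 research group at the Kunming Institute of Zoology, Chinese Academy of Sciences, we developed a new method for the integrative analysis of transcriptomic data, Cross-Value Association Analysis (CVAA), to identify DEGs. We applied CVAA to an integrative analysis of a complex pan-cancer dataset comprising 5,540 transcriptomes, discovered many new DEGs, and identified regulatory pathways/processes shared across multiple tumor types. Some of these were confirmed both in vitro and in vivo to play crucial roles in tumorigenesis, for example alcohol metabolism (ADH1B), chromosome remodeling (NCAPH), and the complement system (Adipsin).
To further demonstrate the accuracy of CVAA, we investigated and validated the function in colon cancer (CC) of one gene it identified, non-SMC condensin I complex subunit H (NCAPH). Using immunohistochemistry and hematoxylin-eosin staining, we found that NCAPH levels in colon cancer tissues were markedly higher than in all matched adjacent non-cancerous tissues. NCAPH was likewise more highly expressed in colorectal cancer cell lines than in normal human colonic epithelial cells, and many NCAPH mutations were found in CC patients. Knockdown of NCAPH in colon cancer cell lines inhibited CC cell proliferation, migration, and xenograft tumor formation in vivo. In addition, NCAPH knockdown promoted apoptosis and induced cell cycle arrest at G2/M phase. Interestingly, among colon cancer tissues, the high-NCAPH-expression group had better prognosis and survival than the low-expression group, indicating that high NCAPH expression promotes colon cancer cell proliferation on the one hand, but may also sensitize colon cancer cells to chemotherapy or radiotherapy on the other. In summary, by developing the new method CVAA and applying it to the integrative analysis of pan-cancer transcriptomic data, we identified a series of tumor-associated DEGs, such as NCAPH, and validated its function in colon cancer through functional experiments, demonstrating the accuracy of CVAA.
Beyond integrative analysis of multi-omics data such as transcriptomes and GWAS from diseased tissues (e.g., tumor tissue), we also used high-altitude hypoxia adaptation as a basis for exploring the function of hypoxia adaptation-related genes in tumors, given that tumors, and solid tumors in particular, are characterized by local hypoxia. In collaboration with the 施鹏 research group at the Kunming Institute of Zoology, Chinese Academy of Sciences, we carried out a new, higher-resolution characterization of the genomes of the plateau zokor and the pika, and integrated them with published genome data for the yak and the Tibetan antelope and with genome data for their closely related lowland species. This identified a hypoxia adaptation gene, RETSAT (unpublished data). Functional experiments verified that mice carrying the high-altitude mutation (Q280R) have a more robust cardiovascular system and adapt better to hypoxic environments. Considering that hypoxia adaptation genes also play important roles in tumors (together with unpublished data from our laboratory), we found through online databases that RETSAT has a high mutation rate in skin cancer. In a DMBA/TPA-induced SCC model, mice carrying the high-altitude mutation showed markedly reduced susceptibility to DMBA/TPA-induced SCC, indicating that the hypoxia adaptation gene RETSAT plays a role in tumors.
In conclusion, through integrative multi-omics analysis we identified many human disease-related genes, such as GLT8D1 and CSNK2B, which influence the onset and progression of SCZ and glioma by regulating stem cell function, and NCAPH and RETSAT, which affect the progression of colorectal cancer and skin SCC, and we validated the functions of these genes in human disease. Integrative analysis will thus play an increasingly important role in future research on human disease. |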
Other abstract | With the rapid development of high-throughput screening technology, marked by second-generation sequencing, life science research, especially the study of human diseases, has entered the big data era. Advances in transcriptomics, proteomics, epigenomics, metabolomics, and interactomics have made it possible to understand life from multiple perspectives at the mRNA, protein, metabolic, and molecular interaction levels. Thanks to high-throughput big data analysis, researchers have uncovered a range of potential genetic targets associated with human disease. However, the pathogenesis of human disease is a complex pathological process involving many different levels, such as the genome, transcriptome, epigenome, proteome, and metabolome. Analysis of single-omics data can reflect changes at only one level of disease samples, which greatly limits the application of big data in screening disease targets. Therefore, integrative analysis of multi-omics data can effectively remove random events at the single-omics level, reveal the candidate factors that are truly involved in human disease, and explore the mechanisms of different factors at different times and in different locations, so as to provide more effective measures for targeted treatment. In this thesis, we mine many genes associated with human diseases, such as schizophrenia and tumors, based on integrative analysis of multi-omics data, and verify the function of these genes in disease.
Recent genome-wide association studies (GWAS) have identified multiple risk loci that show strong associations with schizophrenia. However, pinpointing the potential causal genes at the reported loci remains a major challenge. Here we identify candidate causal genes for schizophrenia using an integrative approach. 
Sherlock integrative analysis shows that ALMS1, GLT8D1, and CSNK2B are schizophrenia risk genes, a finding validated using independent brain expression quantitative trait loci (eQTL) data and the integrative analysis method SMR. Consistently, gene expression analysis in schizophrenia cases and controls further supports the potential role of these three genes in the pathogenesis of schizophrenia. Finally, we show that knockdown of GLT8D1 and CSNK2B promotes the proliferation and inhibits the differentiation of neural stem cells, and alters the morphology and synaptic transmission of neurons. These convergent lines of evidence suggest that the ALMS1, CSNK2B, and GLT8D1 genes may be involved in the pathophysiology of schizophrenia.
Heterogeneity in transcriptional data hampers the identification of differentially expressed genes (DEGs) and the understanding of cancer, essentially because current methods rely on cross-sample normalization and/or distributional assumptions, both of which are sensitive to heterogeneous values. Here, we developed a new method, Cross-Value Association Analysis (CVAA), which overcomes this limitation and is more robust to heterogeneous data than other methods. Applying CVAA to a more complex pan-cancer dataset containing 5,540 transcriptomes revealed numerous new DEGs and many previously rarely explored pathways/processes; some of them were validated, both in vitro and in vivo, as crucial in tumorigenesis, e.g., alcohol metabolism (ADH1B), chromosome remodeling (NCAPH), and the complement system (Adipsin). To confirm the accuracy of CVAA, we explored the role of NCAPH in colon cancer (CC). Using immunohistochemistry and hematoxylin-eosin staining of tissue microarrays, we found that the level of non-SMC condensin I complex subunit H (NCAPH) in colon cancer tissues was higher than in the corresponding adjacent non-cancerous tissues. We then characterized the function of NCAPH in CC. 
We provide evidence that NCAPH is highly expressed in colon cancer cell lines compared with normal human colonic epithelial cells, and we identified many NCAPH mutations in CC patients. We found that depletion of NCAPH inhibits CC cell proliferation and migration in vitro and xenograft tumor formation in vivo. Furthermore, NCAPH knockdown promotes apoptosis and cell cycle arrest at G2/M phase. Interestingly, colon cancer patients with high NCAPH expression in tumor tissue had significantly better prognosis and survival than patients with low expression, suggesting that although high NCAPH expression promotes colon cancer cell proliferation, it may also sensitize these cells to chemotherapy or radiotherapy. Together, we developed CVAA for the integrative analysis of pan-cancer transcriptomes.
In the high-altitude adaptation study, we sequenced de novo the genomes of the plateau zokor (Myospalax baileyi) and plateau pika (Ochotona curzoniae) and analyzed them together with the published genomes of the yak (Bos grunniens) and Tibetan antelope (Pantholops hodgsonii). By comparison with their lowland relatives, we predicted that the enzyme retinol saturase (RETSAT), which carries a parallel substitution, Q247R, in all four Qinghai-Tibet Plateau (QTP) mammals, might be functionally related to hypoxia adaptation. We further found that knock-in (KI) mice have a better cardiovascular system and a greater ability to adapt to acute hypoxia. Interestingly, the RETSAT mutation frequency in cancers is high, and RETSAT KI mice show decreased sensitivity to DMBA/TPA-induced squamous cell carcinoma (SCC).
In summary, through integrative analysis of multi-omics data we discovered many genes related to human diseases, such as GLT8D1 and CSNK2B in SCZ and NCAPH and RETSAT in tumors, and verified the function of these genes in human diseases. Thus, integrative analysis will play an increasingly important role in future human disease research. |
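Discipline | Pathology and Pathophysiology |
Language | Chinese |
Document type | Thesis |
Item identifier | http://ir.kiz.ac.cn/handle/152453/12642 |
Collection | Kunming Institute of Zoology; Research Departments_Key Laboratory of Animal Models and Human Disease Mechanisms; Research Departments_Tumor Signal Transduction (陈勇彬) |
Recommended citation (GB/T 7714) | 申秋硕. Mining Human Disease-Related Genes and Their Functions through Integrative Analysis of Multi-Omics Big Data [D]. Beijing: University of Chinese Academy of Sciences, 2019. |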
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
201618010415005申秋硕 【(6489KB) | Thesis | | Open access | CC BY-NC-SA | Request full text |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.