False gene and chromosome losses in genome assemblies caused by GC content variation and repeats
Kim, J; Lee, C; Ko, BJ; Yoo, DA; Won, S; Phillippy, AM; Fedrigo, O; Zhang, GJ; Howe, K; Wood, J; Durbin, R; Formenti, G; Brown, S; Cantin, L; Mello, CV; Cho, S; Rhie, A; Kim, H; Jarvis, ED
2022
Journal: GENOME BIOLOGY
ISSN: 1474-760X
Volume: 23, Issue: 1
Abstract: Background: Many short-read genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project has been producing new reference genome assemblies with an emphasis on being as complete and error-free as possible, which requires utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. A more thorough evaluation of the recent references relative to prior assemblies can provide a detailed overview of the types and magnitude of improvements. Results: Here we evaluate new vertebrate genome references relative to the previous assemblies for the same species and, in two cases, the same individuals, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish (climbing perch). We find that up to 11% of genomic sequence is entirely missing in the previous assemblies. In the Vertebrate Genomes Project zebra finch assembly, we identify eight new GC- and repeat-rich micro-chromosomes with high gene density. The impact of missing sequences is biased towards GC-rich 5'-proximal promoters and 5' exon regions of protein-coding genes and long non-coding RNAs. Between 26 and 60% of genes include structural or sequence errors that could lead to misunderstanding of their function when using the previous genome assemblies. Conclusions: Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the Vertebrate Genomes Project reference genomes.
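Indexed by: SCI
Language: English
Document type: Journal article
Item identifier: http://ir.kiz.ac.cn/handle/152453/13437
Collection: Research Departments_Genetic Diversity and Genomics Research Group (Zhang Guojie)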
Recommended citation
GB/T 7714
Kim, J, Lee, C, Ko, BJ, et al. False gene and chromosome losses in genome assemblies caused by GC content variation and repeats[J]. GENOME BIOLOGY, 2022, 23(1).
APA Kim, J., Lee, C., Ko, BJ., Yoo, DA., Won, S., ... & Jarvis, ED. (2022). False gene and chromosome losses in genome assemblies caused by GC content variation and repeats. GENOME BIOLOGY, 23(1).
MLA Kim, J, et al. "False gene and chromosome losses in genome assemblies caused by GC content variation and repeats". GENOME BIOLOGY 23.1 (2022).
Files in this item
File name/size; Document type; Version type; Access type; License
2023031531.pdf (8635KB); Journal article; Published version; Open access; CC BY-NC-SA