Cancer-Alterome: How Literature Resources Contribute to the Refined Interpretation of Cancer Pathology

“This study introduces Cancer-Alterome, a literature-mined dataset that focuses on the regulatory events of an organism’s biological processes or clinical phenotypes caused by genetic alterations. It empowers investigation of cancer pathology, enabling tracking of relevant literature support.”


Behind the paper (https://www.nature.com/articles/s41597-024-03083-9):


Cancer has long been a significant global health concern, posing a serious threat to human health and life. The occurrence and progression of cancer are often associated with complex regulatory events brought about by genetic alterations. Meanwhile, gaining a deeper understanding of the molecular mechanisms behind these regulatory events holds the promise of treating and overcoming cancer.


Our team has long worked on text mining methods that capture, from the literature, the regulatory events brought about by genetic alterations, with the aim of elucidating fine-grained disease mechanisms. Cancer-Alterome[2] is a milestone in this series of studies. It builds on our previously developed AGAC[1] (Annotation of Genes with Alteration-Centric function changes) corpus, which defines eight categories of biological concepts spanning the molecular to cellular levels, including genes, mutations, molecular process activities, cellular process activities, diseases, and more. AGAC also defines two types of relationships that compose alteration-centric biological regulatory events: “ThemeOf” relationships between genes and alterations, and “CauseOf” relationships between alterations and downstream events. The Genetic Alteration caused Regulatory Events (GARE) in Cancer-Alterome continue this definition.
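As a rough illustration of how the two relationship types could compose an alteration-centric event, here is a minimal Python sketch. Only the relation names “ThemeOf” and “CauseOf” come from the description above. The `Entity` and `Relation` classes, the category labels, the `compose_events` helper, and the BRAF example are all hypothetical, chosen for illustration; they are not the AGAC annotation format or records from Cancer-Alterome.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Relation names come from the text; everything else here is hypothetical.
THEME_OF = "ThemeOf"
CAUSE_OF = "CauseOf"


@dataclass(frozen=True)
class Entity:
    category: str  # hypothetical category label, e.g. "gene", "alteration"
    text: str      # the mention as it would appear in a sentence


@dataclass(frozen=True)
class Relation:
    kind: str      # THEME_OF or CAUSE_OF
    source: Entity
    target: Entity


def compose_events(relations: List[Relation]) -> List[Tuple[str, str, str]]:
    """Join gene -ThemeOf-> alteration with alteration -CauseOf-> downstream event."""
    theme = [r for r in relations if r.kind == THEME_OF]
    cause = [r for r in relations if r.kind == CAUSE_OF]
    events = []
    for t in theme:
        for c in cause:
            # The shared alteration links the gene to the downstream event.
            if t.target == c.source:
                events.append((t.source.text, t.target.text, c.target.text))
    return events


# Illustrative example only, not taken from the dataset.
gene = Entity("gene", "BRAF")
alteration = Entity("alteration", "V600E mutation")
disease = Entity("disease", "melanoma")

relations = [
    Relation(THEME_OF, gene, alteration),
    Relation(CAUSE_OF, alteration, disease),
]
events = compose_events(relations)
print(events)
```

In this sketch the alteration is the pivot: an event is only formed when the same alteration is both the target of a “ThemeOf” relation and the source of a “CauseOf” relation, which mirrors the alteration-centric framing described above.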


We are proud to introduce Cancer-Alterome (http://lit-evi.hzau.edu.cn/PanCancer/), a finely curated resource of cancer pathology descriptions constructed from the scientific literature. To build it, we used a series of mature text mining methods to gather and process cancer-related scientific literature. Leveraging the AGAC corpus, we defined GARE and captured them precisely. Finally, we provide data repositories and web services so that domain experts can use and further develop this resource (Fig. 1).

Figure 1. Data processing details of the pipeline with an example.

Cancer-Alterome is a pioneering effort to cover the cancer literature at this breadth and granularity. Specifically, it focuses on the 32 pan-cancer types defined in TCGA, gathering 4,354K articles and ultimately extracting 16,581K GARE records. On average, each cancer type entails 521K regulatory events. These records encompass 21K human genes, 136K dbSNP-standardized genetic mutations, and descriptions of 20K genetic alterations. Moreover, all downstream events have been standardized to over 4K GO terms, 2K HPO terms, and 146K MeSH terms. Finally, Cancer-Alterome’s web services (http://lit-evi.hzau.edu.cn/PanCancer/) and data repository (https://github.com/YaoXinZhi/Cancer-Alterome) have been made available to the scientific community.

The downstream applications of Cancer-Alterome hold promising prospects, and our team has made meaningful attempts in this regard. On one hand, the resource can provide literature and statistical support for the mechanistic roles and biological correlations of key biomarkers in diseases, as the Cancer-Alterome study itself demonstrates. On the other hand, integrating the mutation regulatory events described in the literature with multi-omics data has the potential to further pinpoint crucial disease biomarkers, thereby facilitating knowledge discovery. In our 2021 work[3], to integrate heterogeneous mutation data, we proposed the “Gene-Disease Association Prediction through Mutation Data Bridging (GDAMDB)” pipeline and established a statistical generative model. This model learns the distribution parameters of mutation associations and mutation types, and identifies false-negative GWAS mutations: mutations that were not significant in conventional tests but are supported by literature evidence representing functional biological processes. More recently, our work[4] combined GARE knowledge with sequence analysis data and proposed a Bayesian deep learning model called PheSeq, which enhances and interprets association studies by integrating and perceiving phenotypic descriptions. This model also generates a vast dataset of association evidence, opening new possibilities for interpreting and exploring gene-disease associations.


Finally, we are delighted to share our work with the scientific community and domain experts in a prestigious journal like Scientific Data. We sincerely hope that this resource can provide valuable research groundwork and further insights for the community.


  • References
  • 1. Yuxing Wang, Kaiyin Zhou, Jin-Dong Kim, Kevin Cohen, Mina Gachloo, Yuxin Ren, Shanghui Nie, Xuan Qin, Panzhong Lu, Jingbo Xia. An Active Gene Annotation Corpus and Its Application on Anti-epilepsy Drug Discovery. BIBM 2019: International Conference on Bioinformatics & Biomedicine, pp. 512-519, San Diego, U.S., Nov. 2019.
  • 2. Xinzhi Yao, Zhihan He, Yawen Liu, Yuxing Wang, Sizhuo Ouyang, Jingbo Xia. Cancer-Alterome: a literature-mined resource for regulatory events caused by genetic alterations in cancer. Scientific Data, 2024, 11:265. DOI: 10.1038/s41597-024-03083-9.
  • 3. Kaiyin Zhou#, Yuxing Wang#, Kevin Bretonnel Cohen, Jin-Dong Kim, Xiaohang Ma, Zhixue Shen, Xiangyu Meng, Jingbo Xia. Bridging Heterogeneous Mutation Data to Enhance Disease-Gene Discovery. Briefings in Bioinformatics, 2021, bbab079.
  • 4. Xinzhi Yao, Sizhuo Ouyang, Yulong Lian, Qianqian Peng, Xionghui Zhou, Feier Huang, Xuehai Hu, Feng Shi, Jingbo Xia. PheSeq, A Bayesian Deep Learning Model to Enhance and Interpret the Gene Disease Association Studies. Genome Medicine, 2024, 16:56. DOI: 10.1186/s13073-024-01330-7.

Original post: https://communities.springernature.com/posts/cancer-alterome-how-literature-resources-contribute-to-the-refined-interpretation-of-cancer-pathology (Written by Zhihan He, Xinzhi Yao, Jingbo Xia)
