AGAC on the Literature of 84 Cancers


We applied AGAC to PubMed-wide cancer literature and used the resulting annotations as features for large-scale prediction of mutation triples, i.e., “gene-LOF/GOF-disease” triples.


Pipeline: Predicting Mutation Triples for 84 Cancers Based on AGAC

  • Train a joint NER and relation extraction model on the AGAC corpus
  • Retrieve the literature related to the 84 cancers from PubMed using MeSH term queries
  • Convert the literature to the input format of the joint learning model
  • Combine the model predictions with PubTator to extract mutation triples
  • Obtain 28,895 mutation triples in total
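
The combination step in the pipeline above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the joint model yields per-article predictions of the form (gene mention, function change, disease mention) and that PubTator supplies a mention-to-identifier mapping for each article. The function `assemble_triples`, both input structures, and the placeholder values are hypothetical and are not taken from the AGAC or PubTator tooling.

```python
from collections import defaultdict


def assemble_triples(model_output, pubtator_ids):
    """Group PMIDs by normalized (gene ID, function change, disease ID).

    model_output: PMID -> list of (gene mention, function change, disease mention)
    pubtator_ids: PMID -> {mention text: normalized identifier}
    Both structures are assumptions made for this sketch.
    """
    triples = defaultdict(set)
    for pmid, predictions in model_output.items():
        ids = pubtator_ids.get(pmid, {})
        for gene_mention, function_change, disease_mention in predictions:
            gene_id = ids.get(gene_mention)
            disease_id = ids.get(disease_mention)
            # Skip predictions whose mentions could not be normalized.
            if gene_id is None or disease_id is None:
                continue
            triples[(gene_id, function_change, disease_id)].add(pmid)
    # Sort PMIDs so the output is deterministic.
    return {key: sorted(pmids) for key, pmids in triples.items()}


# Placeholder inputs (not real predictions or annotations).
model_output = {
    "100": [("gene A", "LOF", "cancer X")],
    "101": [("gene A", "LOF", "cancer X"), ("gene B", "GOF", "cancer X")],
}
pubtator_ids = {
    "100": {"gene A": "G1", "cancer X": "D1"},
    "101": {"gene A": "G1", "cancer X": "D1"},  # "gene B" has no identifier
}
triples = assemble_triples(model_output, pubtator_ids)
```

Keying the result on the normalized identifiers rather than on the surface mentions is what allows evidence from several articles to accumulate under a single triple.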

Statistics of Predicted Mutation Triples

Figure: Statistics of genes, diseases, and triples.


The 28,895 extracted triples involve a total of 84 cancers and 4,832 genes.
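
Counts of this kind can be derived from a list of triples by collecting the distinct gene and disease identifiers. The sketch below is an illustration only, not the authors' script: it uses the seven sample rows published further down this page rather than the full dataset, and the function name `summarize` is hypothetical.

```python
from collections import Counter

# The seven sample triples from the data example on this page:
# (gene ID, function change, disease MeSH ID).
sample_triples = [
    ("970", "GOF", "D065646"),
    ("2475", "GOF", "D065646"),
    ("3791", "GOF", "D065646"),
    ("3815", "LOF", "D065646"),
    ("7157", "GOF", "D065646"),
    ("7157", "LOF", "D065646"),
    ("4176", "COM", "D065646"),
]


def summarize(triples):
    """Count distinct triples, genes, and diseases, plus triples per label."""
    unique = set(triples)  # guard against accidental duplicates
    return {
        "triples": len(unique),
        "genes": len({gene for gene, _, _ in unique}),
        "diseases": len({disease for _, _, disease in unique}),
        "by_function": dict(Counter(func for _, func, _ in unique)),
    }


summary = summarize(sample_triples)
```

Note that a gene can contribute to more than one triple, as gene 7157 does here, so the gene count is lower than the triple count.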


Data Example and Full Data Download

Gene ID   Function Change   Disease MeSH ID   PMID
970       GOF               D065646           28383817
2475      GOF               D065646           29301825, 25295501
3791      GOF               D065646           29615459
3815      LOF               D065646           18622894
7157      GOF               D065646           9768682, 26376962
7157      LOF               D065646           28068873, 9661637
4176      COM               D065646           15899946

The table above shows sample results obtained by applying the AGAC-based mutation event extraction model to the literature of the 84 cancers.
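
For readers who want to process rows in this layout, here is a minimal parsing sketch. It assumes whitespace-separated columns with a comma-separated PMID field, as in the sample rows; the format of the full download may differ, and the names `TripleRecord` and `parse_row` are hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple


class TripleRecord(NamedTuple):
    gene_id: str
    function_change: str
    disease_mesh_id: str
    pmids: List[int]


def parse_row(line: str) -> TripleRecord:
    """Parse one row: three whitespace-separated fields, then the PMID list."""
    gene_id, function_change, mesh_id, pmid_field = line.split(maxsplit=3)
    # int() tolerates the optional space after each comma.
    pmids = [int(p) for p in pmid_field.split(",")]
    return TripleRecord(gene_id, function_change, mesh_id, pmids)


# Sample rows copied from the table on this page.
rows = """\
970 GOF D065646 28383817
2475 GOF D065646 29301825, 25295501
3791 GOF D065646 29615459
3815 LOF D065646 18622894
7157 GOF D065646 9768682,26376962
7157 LOF D065646 28068873,9661637
4176 COM D065646 15899946
"""

records = [parse_row(line) for line in rows.splitlines() if line.strip()]

# Collect the function-change labels reported for each gene.
labels_by_gene = defaultdict(set)
for record in records:
    labels_by_gene[record.gene_id].add(record.function_change)
```

Grouping by gene makes it easy to spot genes such as 7157, for which the sample contains both GOF and LOF triples supported by different articles.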


Contact Us

College of Informatics
Huazhong Agricultural University
Wuhan, Hubei 430070
China
Jingbo Xia, xiajingbo.math@gmail.com
Kaiyin Zhou, zhoukaiyinhzau@gmail.com
Yuxing Wang, yuxingwang.www@gmail.com
