We applied AGAC to the PubMed-wide cancer literature and used the resulting annotations as features for large-scale prediction of mutation triples, i.e., “gene-LOF/GOF-disease” triples.
Pipeline: The Prediction of Mutation Triples from 84 Cancers Based on AGAC
- Train an NER-relation joint learning model on the AGAC corpus
- Retrieve the literature related to 84 cancers from PubMed using MeSH term queries
- Convert the retrieved literature to the input format of the joint learning model
- Combine the model output with PubTator to extract mutation triples
- Finally, obtain 28,895 mutation triples
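The retrieval step can be illustrated with a minimal sketch that builds a PubMed search URL for one MeSH heading through the NCBI E-utilities ESearch endpoint. The helper name `build_mesh_query_url`, the `retmax` value, and the example heading are assumptions made for illustration; this page does not list the exact queries or tooling used in the pipeline.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_query_url(mesh_term: str, retmax: int = 100) -> str:
    """Return an ESearch URL that searches PubMed for a single MeSH heading.

    Hypothetical helper: the name and defaults are not taken from the pipeline.
    """
    params = {
        "db": "pubmed",                        # search the PubMed database
        "term": f'"{mesh_term}"[MeSH Terms]',  # restrict the match to the MeSH field
        "retmax": retmax,                      # maximum number of PMIDs returned
        "retmode": "json",                     # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Example heading chosen for illustration only; the 84 cancers are not listed here.
url = build_mesh_query_url("Breast Neoplasms")
print(url)
```

The sketch only constructs the URL. Fetching it, paging through the results, and respecting NCBI rate limits are left out.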
Statistics of Predicted Mutation Triples

Statistics of genes, diseases, and triples.
In total, the 28,895 extracted triples involve 84 cancers and 4,832 genes.
Data Example and Full Data Downloading
Gene ID | Function Change | Disease MeSH ID | PMID |
---|---|---|---|
970 | GOF | D065646 | 28383817 |
2475 | GOF | D065646 | 29301825, 25295501 |
3791 | GOF | D065646 | 29615459 |
3815 | LOF | D065646 | 18622894 |
7157 | GOF | D065646 | 9768682, 26376962 |
7157 | LOF | D065646 | 28068873, 9661637 |
4176 | COM | D065646 | 15899946 |
… | … | … | … |
The table above shows sample results obtained by applying the AGAC-based mutation event extraction model to the literature on 84 cancers.
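For readers who want to work with the data programmatically, the following is a minimal sketch that parses rows in the pipe-delimited layout shown above into records and finds genes reported with both GOF and LOF evidence. The `MutationTriple` type and the `parse_row` helper are hypothetical names introduced here, and the sketch assumes the full download uses the same four-column layout as the sample.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class MutationTriple(NamedTuple):
    gene_id: str
    function_change: str   # values seen in the sample: GOF, LOF, COM
    disease_mesh_id: str
    pmids: Tuple[str, ...]


def parse_row(row: str) -> MutationTriple:
    """Parse one 'gene | change | MeSH ID | PMIDs |' row (hypothetical helper)."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    gene_id, change, mesh_id, pmid_cell = cells
    # A PMID cell may hold several comma-separated identifiers.
    pmids = tuple(p.strip() for p in pmid_cell.split(",") if p.strip())
    return MutationTriple(gene_id, change, mesh_id, pmids)


# The sample rows from the table above.
rows = [
    "970 | GOF | D065646 | 28383817 |",
    "2475 | GOF | D065646 | 29301825, 25295501 |",
    "3791 | GOF | D065646 | 29615459 |",
    "3815 | LOF | D065646 | 18622894 |",
    "7157 | GOF | D065646 | 9768682,26376962 |",
    "7157 | LOF | D065646 | 28068873,9661637 |",
    "4176 | COM | D065646 | 15899946 |",
]
triples = [parse_row(r) for r in rows]

# Collect the function changes reported for each gene.
changes_by_gene = defaultdict(set)
for t in triples:
    changes_by_gene[t.gene_id].add(t.function_change)

# Genes that appear with both GOF and LOF annotations in the sample.
both = sorted(g for g, c in changes_by_gene.items() if {"GOF", "LOF"} <= c)
print(both)
```

Keeping the PMIDs as a tuple per triple preserves the link between each prediction and its supporting articles.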
Contact Us
College of Informatics
Huazhong Agricultural University
Wuhan, Hubei 430070
China
Jingbo Xia, xiajingbo.math@gmail.com
Kaiyin Zhou, zhoukaiyinhzau@gmail.com
Yuxing Wang, yuxingwang.www@gmail.com