We applied AGAC to PubMed-wide cancer literature and used the annotations as features for large-scale prediction of mutation triples, i.e., “gene-LOF/GOF-disease” triples.
Pipeline: The Prediction of Mutation Triples from 84 Cancers Based on AGAC
- Train an NER-relation joint learning model on the AGAC corpus
- Retrieve literature related to 84 cancers from PubMed using MeSH term queries
- Convert the literature to the input format of the joint learning model
- Combine the model output with PubTator to extract mutation triples
- Finally, we obtain 28,895 mutation triples
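The retrieval step can be illustrated with a minimal sketch that builds a PubMed query for one MeSH term through the NCBI E-utilities ESearch endpoint. This only constructs the request URL; it does not reproduce the authors' actual retrieval code. The function name `build_mesh_query_url`, the `retmax` default, and the example heading "Breast Neoplasms" are assumptions made for illustration and do not come from the source.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_query_url(mesh_term: str, retmax: int = 100) -> str:
    """Return an ESearch URL that asks for PubMed IDs indexed under a MeSH term.

    Hypothetical helper: the pipeline's real query code is not shown in the source.
    """
    params = {
        "db": "pubmed",                        # search the PubMed database
        "term": f'"{mesh_term}"[MeSH Terms]',  # restrict the match to the MeSH field
        "retmax": retmax,                      # maximum number of PMIDs returned
        "retmode": "json",                     # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative MeSH heading only; the list of 84 cancers is not given here.
example_url = build_mesh_query_url("Breast Neoplasms", retmax=20)
print(example_url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the matching PMIDs, which could then be passed to the conversion step. For large result sets, NCBI's usage guidelines on request rate and paging apply.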
Statistics of Predicted Mutation Triples
Statistics of genes, diseases, triplets.
A total of 84 cancers and 4,832 genes are involved in the 28,895 extracted triples.
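Counts of this kind can be derived from a list of extracted records, as in the rough sketch below. It assumes each record carries a gene ID, a function change (LOF or GOF), a disease MeSH ID, and a PMID. The `Triple` type, the `summarize` function, and all sample values are hypothetical placeholders, not real data, and whether the reported total counts distinct triples or individual records is not stated in the source.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One extracted record (hypothetical layout)."""
    gene_id: str
    function_change: str  # "LOF" or "GOF"
    disease_mesh_id: str
    pmid: str


def summarize(triples: Iterable[Triple]) -> dict:
    """Count distinct genes, diseases, and gene/function-change/disease triples."""
    records = list(triples)
    # A triple supported by several PMIDs is counted once here (an assumption).
    unique = {(t.gene_id, t.function_change, t.disease_mesh_id) for t in records}
    return {
        "records": len(records),
        "genes": len({t.gene_id for t in records}),
        "diseases": len({t.disease_mesh_id for t in records}),
        "triples": len(unique),
        "by_function_change": Counter(fc for _, fc, _ in unique),
    }


# Placeholder values only; these are not entries from the released data.
sample = [
    Triple("GENE_A", "LOF", "MESH_X", "PMID_1"),
    Triple("GENE_A", "LOF", "MESH_X", "PMID_2"),
    Triple("GENE_B", "GOF", "MESH_X", "PMID_3"),
    Triple("GENE_B", "LOF", "MESH_Y", "PMID_3"),
]
stats = summarize(sample)
print(stats)
```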
Data Example and Full Data Downloading
| Gene ID | Function Change | Disease MESH ID | PMID |
The above table shows the results obtained by applying the mutation event extraction model based on the AGAC corpus to 84 cancers.