AGAC, “Annotation of Genes with Alteration-Centric function changes”: a customized corpus for mining function changes caused by mutations.
This portal collects everything about AGAC: the corpus annotation idea, data information, use of the data in the AGAC track shared task, baseline Python code for the AGAC track, and downstream applications of AGAC.
AGAC annotates the following labels in biomedical texts, aiming to describe the downstream molecular/cellular physiological mechanism after a gain/loss-of-function mutation.
Annotation Task 1. NER with the following labels:
- Variation (突变)
- Molecular physiological function (分子功能)
- Interaction (互作)
- Pathway (通路)
- Cellular physiological function (细胞功能)
- Positive regulation (正调控)
- Negative regulation (负调控)
- Regulation (调控)
- Protein/Gene (蛋白/基因)
Annotation Task 2. Thematic role labeling with the following roles:
- Theme of (主事)
- Cause of (致事)
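The two annotation layers above can be pictured as a small data structure: labeled text spans for Task 1 and directed role links between spans for Task 2. The sketch below is only an illustration under stated assumptions. The label strings are copied from the lists above and may differ from the tag names in the released files; the class names, the helper functions, the example sentence, and the direction chosen for each role link are all hypothetical.

```python
from dataclasses import dataclass

# Label strings copied from the lists above. The released corpus files may
# use different (for example abbreviated) tag names; treat these as placeholders.
NER_LABELS = {
    "Variation",
    "Molecular physiological function",
    "Interaction",
    "Pathway",
    "Cellular physiological function",
    "Positive regulation",
    "Negative regulation",
    "Regulation",
    "Protein/Gene",
}
ROLE_LABELS = {"Theme of", "Cause of"}


@dataclass(frozen=True)
class Entity:
    id: str
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


@dataclass(frozen=True)
class Role:
    subj: str  # id of the entity that fills the role
    pred: str  # "Theme of" or "Cause of"
    obj: str   # id of the entity the role attaches to


def make_entity(eid, text, phrase, label):
    """Locate the first occurrence of phrase in text and wrap it as an Entity."""
    begin = text.index(phrase)
    return Entity(eid, begin, begin + len(phrase), label)


def validate(text, entities, roles):
    """Return a list of problems; an empty list means the annotation is consistent."""
    problems = []
    ids = {e.id for e in entities}
    for e in entities:
        if e.label not in NER_LABELS:
            problems.append(f"{e.id}: unknown NER label {e.label!r}")
        if not (0 <= e.begin < e.end <= len(text)):
            problems.append(f"{e.id}: span out of range")
    for r in roles:
        if r.pred not in ROLE_LABELS:
            problems.append(f"unknown role {r.pred!r}")
        for ref in (r.subj, r.obj):
            if ref not in ids:
                problems.append(f"role refers to missing entity {ref}")
    return problems


# Invented sentence and annotation, for illustration only.
text = "A mutation in GENE1 decreases kinase activity."
entities = [
    make_entity("T1", text, "mutation", "Variation"),
    make_entity("T2", text, "GENE1", "Protein/Gene"),
    make_entity("T3", text, "decreases", "Negative regulation"),
    make_entity("T4", text, "kinase activity", "Molecular physiological function"),
]
roles = [
    Role("T2", "Theme of", "T1"),  # assumed direction: gene is the theme of the variation
    Role("T1", "Cause of", "T3"),  # assumed direction: variation causes the regulation
    Role("T4", "Theme of", "T3"),  # assumed direction: function is the theme of the regulation
]
print(validate(text, entities, roles))
```

Keeping spans as character offsets rather than token indices avoids committing to any particular tokenizer; the annotation guideline remains the authority on which spans and role directions are actually valid.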
Corpus Guideline Book
The design of AGAC trigger words follows the central dogma and the fundamental functional processes of molecular biology.
The annotation guideline is released via the following publication.
Ref: Yuxing Wang, et al. Guideline Design of an Active Gene Annotation Corpus for the Purpose of Drug Repurposing. 11th CISP-BMEI 2018, Oct 2018, Beijing.
Corpus construction started in September 2017 and reached its latest version in 2019, when it was released as the training data for the AGAC Track in BioNLP-OST 2019.
- 1) AGAC (v1.0). (Sep 2017 to Aug 2018)
- 2) AGAC (v1.1). (Sep 2018 to Jan 2019)
- 3) AGAC (v2.0). (Feb 2019 to Aug 2019)
Kaiyin Zhou, et al. GOF/LOF Knowledge Inference with Tensor Decomposition in Support of High order Link Discovery for Gene, MBE, 2019, 16(3):1376-1391.
Mina Gachloo, et al. A Review of the Drug Knowledge Discovery by Using BioNLP and Tensor or Matrix Decomposition. Genomics & Informatics, 2019, 17(2): e18.
Yuxing Wang, et al. An Active Gene Annotation Corpus and Its Application on Anti-epilepsy Drug Discovery. BIBM 2019: International Conference on Bioinformatics & Biomedicine, San Diego, USA, Nov 2019.
Corpus Availability and BioNLP Open Shared Task
The AGAC corpus is stored in the PubAnnotation repository.
The corpus served as the data for the AGAC track in BioNLP OST 2019.
Ref: Yuxing Wang, Kaiyin Zhou, Mina Gachloo, Jingbo Xia*. An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks. BioNLP Open Shared Task 2019, workshop in EMNLP-IJCNLP 2019, Hong Kong.
Data in JSON format
The AGAC corpus is provided in JSON format.
Data description: https://sites.google.com/view/bionlp-ost19-agac-track/description
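As a rough idea of how such a file might be read, here is a minimal sketch that assumes the PubAnnotation-style JSON layout: a `text` field, a `denotations` list of labeled spans, and a `relations` list of subject/predicate/object links. The sample record, its label strings, and the function name `read_annotations` are made up for illustration; the data description linked above is the authoritative reference for the real field names and tags.

```python
import json

# Hypothetical record in PubAnnotation-style JSON. The sentence, ids, and
# label strings are invented; real AGAC files may use different tag names.
SAMPLE = """
{
  "text": "A mutation in GENE1 decreases kinase activity.",
  "denotations": [
    {"id": "T1", "span": {"begin": 2, "end": 10}, "obj": "Variation"},
    {"id": "T2", "span": {"begin": 14, "end": 19}, "obj": "Protein/Gene"},
    {"id": "T3", "span": {"begin": 20, "end": 29}, "obj": "Negative regulation"},
    {"id": "T4", "span": {"begin": 30, "end": 45}, "obj": "Molecular physiological function"}
  ],
  "relations": [
    {"id": "R1", "subj": "T2", "pred": "Theme of", "obj": "T1"},
    {"id": "R2", "subj": "T1", "pred": "Cause of", "obj": "T3"},
    {"id": "R3", "subj": "T4", "pred": "Theme of", "obj": "T3"}
  ]
}
"""


def read_annotations(raw):
    """Parse one record and resolve span offsets and relation ids to text."""
    doc = json.loads(raw)
    text = doc["text"]
    # Map each denotation id to (covered text, label).
    entities = {
        d["id"]: (text[d["span"]["begin"]:d["span"]["end"]], d["obj"])
        for d in doc.get("denotations", [])
    }
    # Rewrite each relation as (subject text, predicate, object text).
    triples = [
        (entities[r["subj"]][0], r["pred"], entities[r["obj"]][0])
        for r in doc.get("relations", [])
    ]
    return entities, triples


entities, triples = read_annotations(SAMPLE)
for eid, (surface, label) in entities.items():
    print(eid, repr(surface), label)
for triple in triples:
    print(triple)
```

Resolving offsets against `text` right after loading is a cheap sanity check: if a printed surface string looks wrong, the offsets and the text are out of sync.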
Baseline Python Code
The baseline Python code is released on GitHub:
Task 2 (thematic role labeling, a.k.a. shallow semantic parsing): https://github.com/bionlp-hzau/BERT-for-BioNLP-OST2019-AGAC-Task2
AGAC on Alzheimer’s Disease
Ref: Kaiyin Zhou, Yuxing Wang, Kevin Bretonnel Cohen, Jin-Dong Kim, Xiaohang Ma, Zhixue Shen, Xiangyu Meng, Jingbo Xia. Bridging Heterogeneous Mutation Data to Enhance Disease-Gene Discovery. Briefings in Bioinformatics, 2021, doi: 10.1093/bib/bbab079.
AGAC on 84 Cancer Literature
AGAC on Covid-19 Literature
AGAC for Covid-19 Hackathon Project
AGAC for LitCovid literature is listed as one of the projects in Biomedical Linked Annotation Hackathon 7.