AGAC Portal

AGAC, “Annotation of Genes with Alteration-Centric function changes”, is a customized corpus for mining the functional changes caused by mutations.

This portal collects everything about AGAC: the corpus annotation idea and rules, data information and format, the use of the data in the AGAC track as a shared task, baseline Python code for the AGAC track, and downstream applications of AGAC.

Toy Example


AGAC annotates the following labels in biomedical texts, aiming to describe the downstream molecular and cellular physiological mechanisms that follow a gain-of-function or loss-of-function mutation.

Annotation Task 1. Named entity recognition (NER) with the following labels:

  • Variation
  • Molecular physiological function
  • Interaction
  • Pathway
  • Cellular physiological function
  • Positive regulation
  • Negative regulation
  • Regulation
  • Protein/Gene

Annotation Task 2. Thematic role labeling with the following roles:

  • Theme of
  • Cause of
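As a rough illustration of how the two tasks fit together, the sketch below models the label set as Python enums and encodes one hypothetical annotated sentence. The sentence, the short label codes (such as "Var" or "NegReg"), and the direction chosen for each role are assumptions made for illustration; they are not taken from the released corpus or the guideline.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """Task 1 entity labels (the short string codes are illustrative assumptions)."""
    VARIATION = "Var"
    MOLECULAR_FUNCTION = "MPA"
    INTERACTION = "Interaction"
    PATHWAY = "Pathway"
    CELLULAR_FUNCTION = "CPA"
    POSITIVE_REGULATION = "PosReg"
    NEGATIVE_REGULATION = "NegReg"
    REGULATION = "Reg"
    PROTEIN_GENE = "Gene"


class Role(Enum):
    """Task 2 thematic roles."""
    THEME_OF = "ThemeOf"
    CAUSE_OF = "CauseOf"


@dataclass(frozen=True)
class Mention:
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: Label


@dataclass(frozen=True)
class Relation:
    subj: Mention
    role: Role
    obj: Mention


# Hypothetical sentence, not taken from the corpus.
text = "Mutations in GENE1 reduce kinase activity."
var = Mention(0, 9, Label.VARIATION)
gene = Mention(13, 18, Label.PROTEIN_GENE)
neg = Mention(19, 25, Label.NEGATIVE_REGULATION)
mpa = Mention(26, 41, Label.MOLECULAR_FUNCTION)

# Assumed role directions, for illustration only.
relations = [
    Relation(gene, Role.THEME_OF, var),  # the gene is the theme of the variation
    Relation(var, Role.CAUSE_OF, neg),   # the variation causes the down-regulation
    Relation(mpa, Role.THEME_OF, neg),   # the molecular function is what is down-regulated
]

for r in relations:
    subj = text[r.subj.begin:r.subj.end]
    obj = text[r.obj.begin:r.obj.end]
    print(f"{subj} --{r.role.value}--> {obj}")
```

Keeping mentions as character offsets rather than copied strings means every relation stays anchored to an exact position in the source text.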

Corpus Guideline Book


The design of AGAC trigger words in the guideline follows the central dogma and the fundamental functional procedures of molecular biology.

The annotation guideline is published in the following paper:

Ref: Yuxing Wang, et al. Guideline Design of an Active Gene Annotation Corpus for the Purpose of Drug Repurposing. 2018 11th CISP-BMEI, Beijing, Oct 2018.

Corpus Development


Corpus construction started in September 2017 and reached its latest version in 2019, when it was released as the training data for the AGAC track of BioNLP-OST 2019.

  • 1) AGAC v1.0 (Sep 2017 to Aug 2018)
  • 2) AGAC v1.1 (Sep 2018 to Jan 2019)
  • 3) AGAC v2.0 (Feb 2019 to Aug 2019)


Kaiyin Zhou, et al. GOF/LOF Knowledge Inference with Tensor Decomposition in Support of High order Link Discovery for Gene, MBE, 2019, 16(3):1376-1391.

Mina Gachloo, et al. A Review of the Drug Knowledge Discovery by Using BioNLP and Tensor or Matrix Decomposition. Genomics & Informatics, 2019, 17(2): e18.

Yuxing Wang, et al. An Active Gene Annotation Corpus and Its Application on Anti-epilepsy Drug Discovery. BIBM 2019: International Conference on Bioinformatics & Biomedicine, San Diego, USA, Nov 2019.

Corpus Availability and BioNLP Open Shared Task


The AGAC corpus is stored in the PubAnnotation repository:

The corpus served as the data for the AGAC track in BioNLP OST 2019.

Ref: Yuxing Wang, Kaiyin Zhou, Mina Gachloo, Jingbo Xia*. An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks. BioNLP Open Shared Task 2019, workshop in EMNLP-IJCNLP 2019, Hong Kong. 

Data in JSON Format


The AGAC corpus is provided in JSON format.

Data description:
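Because the corpus is hosted on PubAnnotation, a reasonable assumption is that each document follows the PubAnnotation JSON layout: a `text` field, a `denotations` list of labeled character spans, and a `relations` list linking denotation ids. The snippet below parses one such document with only the standard library. The sample record and its label strings are hypothetical, so check the field names against the released files before relying on it.

```python
import json

# Hypothetical record in PubAnnotation-style JSON (not copied from the corpus).
raw = """
{
  "text": "Mutations in GENE1 reduce kinase activity.",
  "denotations": [
    {"id": "T1", "span": {"begin": 0, "end": 9}, "obj": "Var"},
    {"id": "T2", "span": {"begin": 13, "end": 18}, "obj": "Gene"},
    {"id": "T3", "span": {"begin": 19, "end": 25}, "obj": "NegReg"},
    {"id": "T4", "span": {"begin": 26, "end": 41}, "obj": "MPA"}
  ],
  "relations": [
    {"id": "R1", "subj": "T1", "pred": "CauseOf", "obj": "T3"},
    {"id": "R2", "subj": "T4", "pred": "ThemeOf", "obj": "T3"}
  ]
}
"""


def load_document(doc: dict):
    """Return (entities, relations) with surface strings resolved from the text."""
    text = doc["text"]
    entities = {}
    for d in doc.get("denotations", []):
        begin, end = d["span"]["begin"], d["span"]["end"]
        entities[d["id"]] = (text[begin:end], d["obj"])
    # Replace denotation ids in each relation by their surface strings.
    relations = [
        (entities[r["subj"]][0], r["pred"], entities[r["obj"]][0])
        for r in doc.get("relations", [])
    ]
    return entities, relations


entities, relations = load_document(json.loads(raw))
for eid, (surface, label) in entities.items():
    print(eid, label, surface)
for triple in relations:
    print(triple)
```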

Baseline Python Code


The baseline Python code is released on GitHub:

Task 1 (NER)
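NER models are commonly trained on token-level BIO tags rather than character spans. The sketch below converts character-offset annotations into BIO tags using a naive word-and-punctuation tokenizer. It is a generic illustration, not the released baseline, and the tokenizer, sample sentence, and label strings are assumptions.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (begin, end, label) as character offsets


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each token B-<label>, I-<label>, or O.

    Simplification: a token only partially covered by a span is tagged O.
    """
    tagged = []
    for token, start, end in tokenize(text):
        tag = "O"
        for begin, stop, label in spans:
            if start >= begin and end <= stop:
                tag = ("B-" if start == begin else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


text = "Mutations in GENE1 reduce kinase activity."
spans = [(0, 9, "Var"), (13, 18, "Gene"), (19, 25, "NegReg"), (26, 41, "MPA")]
for token, tag in to_bio(text, spans):
    print(f"{token}\t{tag}")
```

A multi-word mention such as "kinase activity" becomes one B- tag followed by I- tags, which is what lets a sequence labeler recover span boundaries.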

Task 2 (thematic role labeling, a.k.a. shallow semantic parsing)
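Thematic role labeling can be framed as classifying ordered pairs of entity mentions as ThemeOf, CauseOf, or no relation. The sketch below enumerates candidate pairs within one sentence and attaches the gold role where an annotated relation exists, which is one common way to build training instances. It is not necessarily how the released baseline works, and the mentions, ids, and the `NoRelation` placeholder label are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical mentions: id -> (surface string, entity label)
mentions: Dict[str, Tuple[str, str]] = {
    "T1": ("Mutations", "Var"),
    "T3": ("reduce", "NegReg"),
    "T4": ("kinase activity", "MPA"),
}
# Hypothetical gold relations keyed by (subject id, object id)
gold: Dict[Tuple[str, str], str] = {("T1", "T3"): "CauseOf", ("T4", "T3"): "ThemeOf"}


def candidate_pairs(
    mentions: Dict[str, Tuple[str, str]],
    gold: Dict[Tuple[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Every ordered pair of distinct mentions, labeled with its role or 'NoRelation'."""
    return [
        (subj, obj, gold.get((subj, obj), "NoRelation"))
        for subj, obj in permutations(sorted(mentions), 2)
    ]


for subj, obj, role in candidate_pairs(mentions, gold):
    print(mentions[subj][0], "->", mentions[obj][0], ":", role)
```

Pairs are ordered because the roles are directional: (T1, T3) and (T3, T1) are separate instances. With n mentions this yields n(n-1) candidates, most of them negative.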

AGAC on Alzheimer’s Disease


Ref: Kaiyin Zhou, Yuxing Wang, Kevin Bretonnel Cohen, Jin-Dong Kim, Xiaohang Ma, Zhixue Shen, Xiangyu Meng, Jingbo Xia. Bridging Heterogeneous Mutation Data to Enhance Disease-Gene Discovery. Briefings in Bioinformatics, 2021, doi: 10.1093/bib/bbab079.

AGAC on Literature of 84 Cancers


AGAC on COVID-19 Literature


AGAC for COVID-19 Hackathon Project

AGAC for LitCovid literature is one of the projects at the Biomedical Linked Annotation Hackathon 7 (BLAH7).

Contact Us

College of Informatics
Huazhong Agricultural Univ
Wuhan, Hubei 430070
AGAC Teams
