Automatic Mining of Rice Gene-to-Trait Associations

This project uses NLP techniques to automatically mine the literature for plant traits associated with rice genes, with the aim of advancing knowledge discovery in plant cultivation and molecular breeding. The main work of this research includes:

◆ Developed the Rice Trait Ontology (RTO). (Details in 1.1 and 1.2.)

◆ Developed enrRiceTrait, a Python tool built on the rice trait ontology. The traits are normalized with RTO1.0. (Details in 1.1.)

◆ Developed an unsupervised gene-to-trait association extraction (GTAE) pipeline. (See 2.2.)

◆ Developing a gold-standard corpus for rice traits (ongoing). (See 2.4.)


1. Tools and Resources

❏ 1.1 enrRiceTrait, an ontology-based tool for rice trait enrichment

Narrator: Yun Liu

(Video released in April 2021.) Alternatively, watch the video here.


❏ 1.2 Rice Trait Ontology Development

We released RTO (format version 1.1, referenced ontologies: TO, WTO). Download the RTO obo file here. (12/23/2020)
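For readers who want to inspect the downloaded file programmatically, here is a minimal sketch that reads `[Term]` stanzas from OBO-formatted text using only the Python standard library. It assumes nothing beyond the general OBO tag-value layout (`id:`, `name:`, `synonym:`, `is_a:`). The function name `parse_obo_terms`, the sample stanza, and the identifiers in it are hypothetical and are not taken from the RTO file.

```python
from collections import defaultdict


def parse_obo_terms(lines):
    """Return {term_id: {tag: [values]}} for every [Term] stanza."""
    terms = {}
    stanza = None  # tags of the [Term] being read; None outside a [Term]
    # A sentinel stanza header at the end flushes the last real stanza.
    for raw in list(lines) + ["[EOF]"]:
        line = raw.strip()
        if line.startswith("["):
            if stanza and "id" in stanza:
                terms[stanza["id"][0]] = dict(stanza)
            stanza = defaultdict(list) if line == "[Term]" else None
        elif stanza is not None and ": " in line and not line.startswith("!"):
            tag, value = line.split(": ", 1)
            # Drop the trailing "! comment" that OBO allows after a value.
            stanza[tag].append(value.split(" ! ", 1)[0].strip())
    return terms


# Hypothetical stanzas for illustration; not copied from the RTO file.
SAMPLE = """\
[Term]
id: RTO:0000001
name: plant height
synonym: "stem height" RELATED []
is_a: TO:0000001 ! placeholder parent

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(SAMPLE.splitlines())
print(terms["RTO:0000001"]["name"])
```

To run it on a real download, pass an open file handle instead of the sample lines, for example `parse_obo_terms(open("rto.obo", encoding="utf-8"))`, where `rto.obo` is a placeholder file name.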


2. Main Idea

❏ 2.1 Methodology Video: Gene-to-trait association extraction (GTAE) Pipeline

Narrator: Yun Liu

(Video released in January 2021.) Alternatively, watch the video here.


❏ 2.2 Methodology Figure: Gene-to-trait association extraction (GTAE) Pipeline


❏ 2.3 Methodology Figure: Unsupervised Rice Trait Extraction

Candidate mention extraction followed by re-ranking provides a workable concept-linking strategy for rice traits.
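The two-stage idea can be illustrated with a small sketch. It is not the team's implementation: the page does not specify how candidates are generated or scored, so the sketch substitutes token-overlap (Jaccard) filtering for candidate extraction and `difflib.SequenceMatcher` string similarity for re-ranking. The mini ontology, the term identifiers, the gene name `GeneX`, and the example sentence are all invented for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini ontology: term id -> names and synonyms.
ONTOLOGY = {
    "RTO:0000001": ["plant height", "stem height"],
    "RTO:0000002": ["grain yield"],
    "RTO:0000003": ["heading date", "flowering time"],
}


def ngrams(tokens, max_n=3):
    """Yield every word n-gram of length 1..max_n as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def extract_candidates(sentence, ontology, min_overlap=0.5):
    """Stage 1: keep (mention, term_id, name) triples with enough token overlap."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    candidates = []
    for mention in set(ngrams(tokens)):
        mention_tokens = set(mention.split())
        for term_id, names in ontology.items():
            for name in names:
                name_tokens = set(name.split())
                jaccard = len(mention_tokens & name_tokens) / len(mention_tokens | name_tokens)
                if jaccard >= min_overlap:
                    candidates.append((mention, term_id, name))
    return candidates


def rerank(candidates):
    """Stage 2: order candidates by character-level similarity, best first."""
    scored = [
        (SequenceMatcher(None, mention, name).ratio(), mention, term_id)
        for mention, term_id, name in candidates
    ]
    return sorted(scored, reverse=True)


def link_traits(sentence, ontology):
    """Return {term_id: (best_mention, score)} for one sentence."""
    best = {}
    for score, mention, term_id in rerank(extract_candidates(sentence, ontology)):
        best.setdefault(term_id, (mention, score))  # first hit is the highest score
    return best


# Invented example sentence; GeneX is a placeholder gene name.
print(link_traits("GeneX increases plant height and grain yield in rice.", ONTOLOGY))
```

Separating the stages keeps the first one cheap and recall-oriented, while the second one can be swapped for a stronger scorer without touching candidate generation.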

❏ 2.4 Methodology: 2.1k Project for Rice Traits Annotation

Ongoing project…

3. Early Results


❏ 3.1 Result: Unsupervised Rice Gene Mention Extraction

We developed a HunFlair-based unsupervised method and compared its performance with that of other known gene taggers on the rice gene mentions of OryzaGP (29,098 keywords across 13,136 PubMed abstracts).
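The page does not state which metrics were used for this comparison. Tagger comparisons of this kind are commonly reported as precision, recall, and F1 over exactly matching mention spans, so the sketch below assumes that setup. The function name `prf1`, the `(document id, start, end)` span representation, and the toy spans are hypothetical and are not OryzaGP data.

```python
def prf1(gold, predicted):
    """Exact-match precision, recall, and F1 over sets of mention spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy spans as (document id, start offset, end offset); all values invented.
gold_spans = {("doc1", 0, 7), ("doc1", 20, 25), ("doc2", 3, 9), ("doc2", 30, 36)}
tagger_spans = {("doc1", 0, 7), ("doc2", 3, 9), ("doc2", 40, 44)}

precision, recall, f1 = prf1(gold_spans, tagger_spans)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Running the same function over each tagger's output against the same gold spans gives directly comparable scores.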


❏ 3.2 Result: Novel Discovery of Gene-to-Trait Associations for Rice

Venn diagram of gene-to-trait associations in OryzaBase, TAS, and the GTAE system.

The GTAE pipeline discovered thousands of novel gene-to-trait associations for rice. Download the associations with sentence- and abstract-level evidence here.
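The regions of such a Venn diagram reduce to set operations over (gene, trait) pairs. The sketch below assumes that "novel" means an association found by GTAE and present in neither OryzaBase nor TAS; the page does not define the term, so treat that as an assumption. The function name `venn_regions` and all gene and trait names are placeholders.

```python
def venn_regions(oryzabase, tas, gtae):
    """Split GTAE associations by which reference sets also contain them."""
    return {
        "gtae_only": gtae - oryzabase - tas,  # assumed meaning of "novel"
        "gtae_and_oryzabase_only": (gtae & oryzabase) - tas,
        "gtae_and_tas_only": (gtae & tas) - oryzabase,
        "shared_by_all": gtae & oryzabase & tas,
    }


# Placeholder (gene, trait) pairs; none are real associations.
oryzabase_pairs = {("G1", "plant height"), ("G2", "grain yield")}
tas_pairs = {("G1", "plant height"), ("G3", "heading date")}
gtae_pairs = {
    ("G1", "plant height"),
    ("G2", "grain yield"),
    ("G3", "heading date"),
    ("G4", "drought tolerance"),
}

for region, pairs in venn_regions(oryzabase_pairs, tas_pairs, gtae_pairs).items():
    print(region, sorted(pairs))
```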


❏ 3.3 Results Visualization

Visualization of the relations between plant traits and related genes.

The graph contains 323 red nodes, which represent plant traits (for all plants, not only rice), and 119 yellow nodes, which represent genes. A line between a red node and a yellow node indicates that the corresponding trait and gene co-occur within a single sentence on at least four occasions. Lines between two yellow nodes indicate that the two genes may interact.
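The gene-to-trait edges of such a graph can be derived by counting sentence-level co-occurrences and keeping pairs that reach the threshold of four. The sketch below assumes each sentence has already been tagged with its gene and trait mentions. The function name `build_cooccurrence_edges`, the input layout, and the toy data are hypothetical. Gene-to-gene edges are left out because the page does not say how they are derived.

```python
from collections import Counter
from itertools import product


def build_cooccurrence_edges(tagged_sentences, min_count=4):
    """Count (gene, trait) pairs per sentence and keep those seen >= min_count times.

    tagged_sentences: iterable of (genes, traits) pairs, one per sentence.
    """
    pair_counts = Counter()
    for genes, traits in tagged_sentences:
        # Using sets means a pair is counted at most once per sentence.
        for gene, trait in product(set(genes), set(traits)):
            pair_counts[(gene, trait)] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}


# Toy input: G1 and "plant height" share four sentences, the other pairs only three.
tagged = [({"G1"}, {"plant height"})] * 4 + [({"G1", "G2"}, {"grain yield"})] * 3

print(build_cooccurrence_edges(tagged))
```

The resulting dictionary maps each retained edge to its count, which could also serve as an edge weight when drawing the graph.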

4. RTO Web Service


❏ 4.1 RTO Ontology web service

Yufei is developing a web page to showcase RTO terms. Click to visit the site.


❏ 4.2 Video of the work

Yufei’s talk at the Bio-Ontologies COSI, ISMB 2022


Publications


◆ Xinzhi Yao, Yun Liu, Qidong Deng, Yusha Liu, Xinchen Ma, Yufei Shen, Qianqian Peng, Zaiwen Feng, Jingbo Xia*. RTO, A Specific Crop Ontology for Rice Trait Concepts. Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), Madison, WI, 10-14 July 2022 (Session Bio-Ontologies COSI). https://doi.org/10.5281/zenodo.6950749



Developer: HZAU BioNLP Team

Links to lab projects:

▶ BioNLP beginners' entry

▶ Lab members' entry (hidden)

▶ Lab projects page
