Automatic Mining of Rice Gene-to-Trait Associations

This project uses NLP techniques to automatically mine the plant traits associated with rice genes from the literature, with the aim of advancing knowledge discovery in plant cultivation and molecular breeding. The main work of this research includes:

◆ Developed the Rice Trait Ontology (RTO). (Details in 1.1 and 1.2.)

◆ Developed enrRiceTrait, an ontology-based Python tool for rice trait enrichment. The traits are normalized with RTO 1.0. (Details in 1.1.)

◆ Developed an unsupervised gene-to-trait association extraction (GTAE) pipeline. (See 2.2.)

◆ Developing a gold-standard corpus for rice traits (ongoing). (See 2.4.)


1. Tools and Resources

❏ 1.1 enrRiceTrait, an ontology-based tool for rice trait enrichment

Narrator: Yun Liu

(Video released in April 2021.) Alternatively, watch the video here.


❏ 1.2 Rice Trait Ontology Development

We released RTO (format version 1.1, referenced ontologies: TO, WTO). Download the RTO obo file here. (12/23/2020)


2. Main Idea

❏ 2.1 Methodology Video: Gene-to-trait association extraction (GTAE) Pipeline

Narrator: Yun Liu

(Video released in January 2021.) Alternatively, watch the video here.


❏ 2.2 Methodology Figure: Gene-to-trait association extraction (GTAE) Pipeline


❏ 2.3 Methodology Figure: Unsupervised Rice Trait Extraction

Candidate mention extraction followed by re-ranking makes a workable concept-linking strategy for rice traits.
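The two stages named above can be illustrated with a minimal sketch. It is not the project's implementation: the source does not say how candidates are generated or scored, so this example assumes word n-grams as candidates and token-level Jaccard similarity as the re-ranking score. The ontology entries and their `TOY:` identifiers are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-ontology (IDs and names are illustrative, not from RTO).
TOY_ONTOLOGY: Dict[str, str] = {
    "TOY:0001": "grain length",
    "TOY:0002": "plant height",
    "TOY:0003": "grain width",
}


def candidate_mentions(sentence: str, max_len: int = 3) -> List[str]:
    """Stage 1 (assumed): every word n-gram up to max_len is a candidate."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two phrases (assumed scoring function)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def link_concepts(
    sentence: str, ontology: Dict[str, str], top_k: int = 2
) -> List[Tuple[str, float, str]]:
    """Stage 2 (assumed): keep the best candidate per term, then rank terms."""
    best: Dict[str, Tuple[float, str]] = {}
    for cand in candidate_mentions(sentence):
        for term_id, name in ontology.items():
            score = jaccard(cand, name)
            if score > best.get(term_id, (0.0, ""))[0]:
                best[term_id] = (score, cand)
    # Sort by descending score, breaking ties by term ID for determinism.
    ranked = sorted(
        ((tid, s, c) for tid, (s, c) in best.items()),
        key=lambda x: (-x[1], x[0]),
    )
    return ranked[:top_k]


linked = link_concepts(
    "The mutant showed reduced grain length and plant height.", TOY_ONTOLOGY
)
```

Here `linked` holds (term ID, score, matched text) triples for the two best-matching terms. A real system would likely swap the Jaccard score for a stronger similarity measure, but the extract-then-re-rank structure stays the same.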

❏ 2.4 Methodology: 2.1k Project for Rice Traits Annotation

Ongoing project…

3. Early Results


❏ 3.1 Result: Unsupervised Rice Gene Mention Extraction

We developed a HunFlair-based unsupervised method and compared its performance with other established gene taggers on the rice gene mentions of OryzaGP (29,098 keywords across 13,136 PubMed abstracts).


❏ 3.2 Result: Novel Discovery of Gene-to-Trait Associations for Rice

Venn diagram of gene-to-trait associations in OryzaBase, TAS, and the GTAE system.

The GTAE pipeline discovered thousands of novel gene-to-trait associations for rice. Download the associations with sentence- and abstract-level evidence here.
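As a rough illustration of what "novel" means in such a comparison, the sketch below treats each association as a (gene, trait) pair and keeps the extracted pairs that appear in none of the reference sources. The function name and the placeholder gene and trait names are assumptions for illustration, not project data.

```python
from typing import Set, Tuple

Association = Tuple[str, str]  # (gene, trait)


def novel_associations(
    extracted: Set[Association], *known_sources: Set[Association]
) -> Set[Association]:
    """Return the extracted pairs that are absent from every known source."""
    known: Set[Association] = set().union(*known_sources)
    return extracted - known


# Placeholder data only; these pairs are not taken from OryzaBase, TAS, or GTAE.
oryzabase = {("GENE_A", "trait_x"), ("GENE_B", "trait_y")}
tas = {("GENE_A", "trait_x"), ("GENE_C", "trait_y")}
gtae = {("GENE_A", "trait_x"), ("GENE_C", "trait_y"), ("GENE_C", "trait_z")}

novel = novel_associations(gtae, oryzabase, tas)
```

In practice the gene and trait names would need to be normalized to shared identifiers before comparison, otherwise spelling variants would be counted as novel.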


❏ 3.3 Results Visualization

Visualization of the relations between plant traits and related genes.

The 323 red nodes represent plant traits (covering all plants, not only rice), and the 119 yellow nodes represent genes. A line between a red node and a yellow node indicates that the corresponding trait and gene co-occur within the same sentence at least four times. Lines between two yellow nodes suggest that the genes may interact.
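The trait-gene edges described above could be derived along the following lines. This is a sketch under two assumptions: the sentences are already tagged with trait and gene mentions, and "at least four times" means at least four sentences in which the pair appears together. The function name and the placeholder data are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple

# One tagged sentence = (set of trait mentions, set of gene mentions).
TaggedSentence = Tuple[Set[str], Set[str]]


def cooccurrence_edges(
    tagged_sentences: Iterable[TaggedSentence], min_count: int = 4
) -> Dict[Tuple[str, str], int]:
    """Count sentence-level (trait, gene) co-occurrences and keep frequent pairs."""
    counts: Counter = Counter()
    for traits, genes in tagged_sentences:
        # Each pair is counted once per sentence, however often it is mentioned.
        for pair in product(traits, genes):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Placeholder data: GENE_A co-occurs with trait_x in four sentences, GENE_B in three.
sentences = [({"trait_x"}, {"GENE_A"})] * 4 + [({"trait_x"}, {"GENE_B"})] * 3
edges = cooccurrence_edges(sentences)
```

With the default threshold, only the `("trait_x", "GENE_A")` pair survives as an edge. The resulting dictionary can be passed to any graph library for drawing.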


Developer: HZAU BioNLP Team
