This project aims to automatically mine the plant traits associated with rice genes from the literature using NLP techniques, to advance knowledge discovery in plant cultivation and molecular breeding. The main work of this research includes:
◆ Developed the Rice Trait Ontology (RTO) (see 1.1 and 1.2).
◆ Developed enrRiceTrait, a Python tool for ontology-based rice trait enrichment; traits are normalized with RTO 1.0 (see 1.1).
◆ Developed an unsupervised gene-to-trait association extraction (GTAE) pipeline (see 2.2).
◆ Developing a gold-standard corpus for rice traits (ongoing; see 2.4).
1. Tools and Resources
❏ 1.1 enrRiceTrait, an ontology-based tool for rice trait enrichment
(Video released in April 2021.) Alternatively, watch the video here.
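The statistics behind enrRiceTrait are not described on this page. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for ontology enrichment, not confirmed as enrRiceTrait's method), the code below scores each trait for over-representation in a query gene list. The function names `enrichment_pvalue` and `enrich`, and the toy gene and trait labels, are hypothetical.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the trait,
    n: genes in the query list, k: query genes annotated to the trait.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def enrich(query: set[str], annotations: dict[str, set[str]],
           background: set[str]) -> list[tuple[str, int, float]]:
    """Test each trait for over-representation in the query gene list."""
    query = query & background  # ignore genes outside the background
    N, n = len(background), len(query)
    results = []
    for trait, genes in annotations.items():
        genes = genes & background
        k = len(query & genes)
        if k == 0:
            continue  # nothing to test for this trait
        results.append((trait, k, enrichment_pvalue(k, n, len(genes), N)))
    # Most significant traits first; no multiple-testing correction is applied.
    return sorted(results, key=lambda r: r[2])


# Toy example with hypothetical genes and traits.
background = {f"g{i}" for i in range(1, 11)}
annotations = {"trait_A": {"g1", "g2", "g3", "g4", "g5"}, "trait_B": {"g9"}}
print(enrich({"g1", "g2"}, annotations, background))
```

In practice the p-values would also need a correction such as Benjamini-Hochberg when many traits are tested at once.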
❏ 1.2 Rice Trait Ontology Development
We released RTO on 12/23/2020 (format version 1.1; referenced ontologies: TO, WTO). Download the RTO OBO file here.
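RTO is distributed as an OBO file. As a minimal sketch (not the project's own loader), the function below reads the `[Term]` stanzas of OBO text and keeps each term's `id`, `name`, quoted `synonym` text, and `is_a` parents. It ignores escaped quotes, qualifiers, and all other tags, so an established parser such as `pronto` or `obonet` is a safer choice for the real file. The function name and the `DEMO:` IDs are placeholders.

```python
def parse_obo(text: str) -> dict[str, dict]:
    """Collect id, name, synonyms and is_a parents from [Term] stanzas."""
    terms: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; keep [Term] stanzas, skip [Typedef] etc.
            current = {"synonyms": [], "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "synonym" and value.count('"') >= 2:
            # synonym: "text" SCOPE [xrefs] -> keep the quoted text only
            current["synonyms"].append(value.split('"')[1])
        elif tag == "is_a":
            # is_a: PARENT:ID ! parent name -> drop the trailing comment
            current["is_a"].append(value.split("!")[0].strip())
    return terms


sample = """format-version: 1.2

[Term]
id: DEMO:0000001
name: plant height
synonym: "plant stature" EXACT []
is_a: DEMO:0000000 ! plant morphology trait

[Typedef]
id: part_of
name: part of
"""
print(parse_obo(sample))
```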
2. Main Idea
❏ 2.1 Methodology Video: Gene-to-trait association extraction (GTAE) Pipeline
(Video released in January 2021.) Alternatively, watch the video here.
❏ 2.2 Methodology Figure: Gene-to-trait association extraction (GTAE) Pipeline
❏ 2.3 Methodology Figure: Unsupervised Rice Trait Extraction
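The methodology figure for unsupervised rice trait extraction is not reproduced here, so the exact method is not specified. A simple dictionary-based baseline, which is an assumption and not necessarily the pipeline's approach, maps trait mentions to ontology term IDs by greedy longest match against term names and synonyms. The function names and the `T:` IDs below are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lower-case and re-join word tokens so '1000-grain weight' -> '1000 grain weight'."""
    return " ".join(re.findall(r"\w+", text.lower()))


def build_lexicon(terms: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized name or synonym to its ontology term ID."""
    return {normalize(label): term_id
            for term_id, labels in terms.items() for label in labels}


def extract_traits(sentence: str, lexicon: dict[str, str]) -> list[tuple[str, str, int]]:
    """Greedy longest-match lookup; returns (matched phrase, term ID, token index)."""
    tokens = normalize(sentence).split()
    max_len = max((len(key.split()) for key in lexicon), default=0)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first so 'plant height' beats 'height'.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found.append((phrase, lexicon[phrase], i))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return found


lexicon = build_lexicon({
    "T:1": ["plant height"],
    "T:2": ["grain weight", "1000-grain weight"],
    "T:3": ["height"],
})
print(extract_traits("The mutant showed reduced plant height and 1000-grain weight.", lexicon))
```

Longest match is what keeps the generic entry "height" from firing inside "plant height"; an unsupervised system would typically add fuzzy or embedding-based matching on top of such a lexicon.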
❏ 2.4 Methodology: 2.1k Project for Rice Trait Annotation
3. Early Results
❏ 3.1 Result: Unsupervised Rice Gene Mention Extraction
We developed an unsupervised method based on HunFlair and compared its performance with other well-known gene taggers on the rice gene mentions in OryzaGP (29,098 keywords across 13,136 PubMed abstracts).
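The evaluation metric used in this comparison is not stated. The sketch below assumes exact-match precision, recall, and F1 over (abstract ID, mention) pairs, which is one common way to score gene taggers against keyword annotations such as OryzaGP's. Tagger names, PMIDs, and gene strings are placeholders, and `prf1` and `compare_taggers` are hypothetical helpers.

```python
def prf1(gold: set[tuple[str, str]],
         predicted: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over (abstract ID, mention) pairs."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def compare_taggers(gold: set[tuple[str, str]],
                    runs: dict[str, set[tuple[str, str]]]) -> dict[str, tuple[float, float, float]]:
    """Score each tagger's predictions against the same gold set."""
    return {name: prf1(gold, preds) for name, preds in runs.items()}


gold = {("pmid_1", "gene_a"), ("pmid_1", "gene_b"), ("pmid_2", "gene_c")}
runs = {
    "tagger_x": {("pmid_1", "gene_a"), ("pmid_2", "gene_c"), ("pmid_2", "gene_d")},
    "tagger_y": {("pmid_1", "gene_a")},
}
for name, (p, r, f) in compare_taggers(gold, runs).items():
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Mentions should be normalized the same way on both sides (for example, case-folded) before comparison; otherwise exact matching undercounts true positives.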
❏ 3.2 Result: Novel Discovery of Gene-to-Trait Associations for Rice
The GTAE pipeline discovered thousands of novel gene-to-trait associations for rice. Download the associations, with sentence- and abstract-level evidence, here.
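How an association is judged "novel" is not described here. A minimal sketch, assuming novelty simply means absence from a reference set of known gene-trait pairs, groups extracted hits by pair, keeps the sentence-level evidence, and counts abstract-level support as distinct PubMed IDs. The input tuple layout and the names `Evidence`, `novel_associations`, and `abstract_support` are hypothetical and do not describe the format of the downloadable file.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Evidence(NamedTuple):
    pmid: str
    sentence: str


def novel_associations(
    extracted: Iterable[tuple[str, str, str, str]],
    known: set[tuple[str, str]],
) -> dict[tuple[str, str], list[Evidence]]:
    """Group (gene, trait, pmid, sentence) hits by pair, dropping known pairs."""
    grouped: dict[tuple[str, str], list[Evidence]] = defaultdict(list)
    for gene, trait, pmid, sentence in extracted:
        if (gene, trait) in known:
            continue  # already recorded in the reference set
        evidence = Evidence(pmid, sentence)
        if evidence not in grouped[(gene, trait)]:
            grouped[(gene, trait)].append(evidence)  # keep sentence-level evidence
    return dict(grouped)


def abstract_support(evidence: list[Evidence]) -> int:
    """Abstract-level support: number of distinct PubMed IDs."""
    return len({e.pmid for e in evidence})


hits = [
    ("gene_a", "trait_1", "pmid_1", "sentence 1"),
    ("gene_a", "trait_1", "pmid_2", "sentence 2"),
    ("gene_b", "trait_1", "pmid_1", "sentence 3"),
]
novel = novel_associations(hits, known={("gene_b", "trait_1")})
for (gene, trait), ev in novel.items():
    print(gene, trait, len(ev), abstract_support(ev))
```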
❏ 3.3 Results Visualization
Visualization of the relations between plant traits and their related genes.
The 323 red nodes represent plant traits (covering all plants, not only rice), and the 119 yellow nodes represent genes. An edge between a red node and a yellow node means that the corresponding trait and gene co-occur in the same sentence at least four times. Edges between two yellow nodes indicate that the genes may interact.
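The trait-gene edge rule described above (co-occurrence in one sentence at least four times) can be sketched as follows. The criterion for gene-gene edges is not stated; this sketch assumes, for illustration only, the same sentence-level threshold. The input format (one pair of gene and trait sets per sentence) and the function name `build_network` are hypothetical.

```python
from collections import Counter
from itertools import combinations

MIN_COOCCURRENCE = 4  # "at least four times", as stated for trait-gene edges


def build_network(sentences, min_count: int = MIN_COOCCURRENCE):
    """sentences: iterable of (genes, traits) mentioned in each sentence.

    Returns (trait-gene edges, gene-gene edges) whose sentence-level
    co-occurrence count reaches min_count.
    """
    trait_gene = Counter()
    gene_gene = Counter()
    for genes, traits in sentences:
        genes, traits = set(genes), set(traits)  # count each pair once per sentence
        trait_gene.update((t, g) for t in traits for g in genes)
        # Assumption: gene-gene edges also come from sentence co-occurrence.
        gene_gene.update(combinations(sorted(genes), 2))
    trait_edges = {pair for pair, c in trait_gene.items() if c >= min_count}
    gene_edges = {pair for pair, c in gene_gene.items() if c >= min_count}
    return trait_edges, gene_edges


sentences = [({"gene_a"}, {"trait_1"})] * 4 + [({"gene_a", "gene_b"}, {"trait_2"})] * 3
trait_edges, gene_edges = build_network(sentences)
print(trait_edges)
print(gene_edges)
```

The two edge sets can then be passed to a graph library such as networkx to draw trait nodes and gene nodes in different colors.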
Developer: HZAU BioNLP Team