Javeed attends the BLAH9 hackathon in Tachikawa, Tokyo

Javeed's project is “A curation system of rice trait ontology with reliable interoperation by LLM and PubAnnotation”

Project Outline

1. Scientific Motives

Supported by BLAH9, we invite discussion and collaboration on the theme “Ensuring Robustness in LLM-based Research: Reproducibility, Interoperability, and Reliable Evaluation”. The project issues and discussion points are as follows:

  1. How to use multicultural LLMs and PubAnnotation effectively to build a more knowledge-supportive ontology.
  2. How human experts can interoperate with the LLM and PubAnnotation to provide a concrete definition and literature annotation for each concept.
  3. Showcase a user-friendly web design that explains ontology descriptions with concrete explanations and concepts.
  4. The pipeline designed in this project can be applied to other species and research areas.
2. Project Theme

The development of specialized ontologies such as the Rice Trait Ontology facilitates the organization and standardization of knowledge in plant biology. Rice, one of the most important food crops in the world, has been studied for many years. Ontologies are also important tools for integrating and retrieving phylogenetic information, enabling more comprehensive investigation of biological systems.

◎ Advancing ontology development through the integration of large language models

Even so, constructing and maintaining high-quality ontologies such as the Rice Trait Ontology (RTO) [Yao, 2022] remains one of the biggest challenges, owing to the volume of biological information and the rate at which it grows. Experts cannot consult many separate tools every time they need a proper definition of a trait term, particularly in specialized fields. Many trait terms share the same definition across different fields, which makes it confusing and ambiguous for an expert to settle on a solid definition. Ontologies are also often highly hierarchical, with multiple levels of parent-child relationships, and navigating this complexity to find the exact term that matches a specific experimental condition or biological context can be difficult.
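
To make the parent-child navigation concrete, here is a minimal sketch of walking up a term hierarchy. The `ancestors` helper and the "leaf shape" entry are hypothetical and only for illustration; the link from "plant morphology trait" to "plant trait" follows the RTO definition quoted later in this post.

```python
def ancestors(term, parents):
    """Return every ancestor of `term`, nearest first, without duplicates."""
    seen = []
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            # Continue upward through the parents of the current term.
            stack.extend(parents.get(current, []))
    return seen


# Child -> list of parents. "leaf shape" is a hypothetical example term.
parents = {
    "leaf shape": ["plant morphology trait"],
    "plant morphology trait": ["plant trait"],
}

lineage = ancestors("leaf shape", parents)
print(lineage)
```

A real ontology allows several parents per term, which is why the mapping stores a list and the walk keeps track of terms it has already visited.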

Leveraging tools like Large Language Models (LLMs) and PubAnnotation can make ontology development more efficient and accurate. LLMs generate concise, context-rich definitions for ontology terms based on existing literature, while PubAnnotation allows researchers to collaboratively annotate scientific papers with definitions, comments, and contextual information linked directly to ontology terms. This structured approach creates a clear relationship between traits, definitions, and relevant literature. For human experts, these tools provide comprehensive definitions and literature annotations that require only verification to ensure reliability, ultimately enhancing the accuracy and robustness of the ontology as a trustworthy resource for researchers.

◎ Leveraging PubAnnotation for reliable data curation

Utilizing PubAnnotation enhances the reliability of data curation in research by enabling collaborative annotation of scientific literature. Researchers can add definitions, comments, and contextual information, which can be linked directly to ontology terms. This structured approach fosters a comprehensive understanding of traits and their relationships within the ontology, ensuring that the curated data is accurate and trustworthy. By integrating these annotations, researchers can efficiently verify concepts, leading to a more robust and reliable ontology resource. 

3. Background
◎ What is BLAH9?

BLAH9 (the 9th Biomedical Linked Annotation Hackathon): BLAH is an annual hackathon event that promotes the development of the BioNLP community, covering the sharing and linking of biomedical literature annotation and mining resources. This year, BLAH9 is organized around a special theme, “Ensuring Robustness in LLM-based Research: Reproducibility, Interoperability, and Reliable Evaluation”. The registration, timeline and more information about BLAH9 can be found here.

◎ What is Ontology?

Ontologies are organized structures that describe how the concepts, entities and terms in a field of knowledge relate to one another. When ontologies are built with LLMs or BioNLP tools, the aim is to facilitate the extraction, verification and structuring of these terms so that the ontology stays up to date and relevant. More information about ontologies can be found here.

◎ Target Ontology: Rice Trait Ontology
    • Website of RTO: Rice Trait Ontology
    • Here is an example ontology concept, “Plant Morphology Trait”, with its definition, which is concrete and refers to related concepts: A plant trait (TO:0000387) which is a morphological quality of a plant anatomical entity (PO:0025131) or a constituent cellular component (GO:0005575) contained therein.
    • Enable human experts to interoperate with the LLM and PubAnnotation to provide a concrete definition and literature annotation for each concept.
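
The example definition above embeds identifiers from three ontologies (TO, PO and GO). As a rough sketch of how such cross-references could be pulled out for linking, the snippet below uses a regular expression. The identifier shape (two to four uppercase letters, a colon, seven digits) is an assumption based on the three IDs shown, and `extract_term_ids` is a hypothetical helper, not part of any RTO tooling.

```python
import re

# Assumed identifier shape: 2-4 uppercase letters, a colon, then 7 digits.
TERM_ID_PATTERN = re.compile(r"\b([A-Z]{2,4}):(\d{7})\b")


def extract_term_ids(definition):
    """Return (prefix, number) pairs in the order they appear in the text."""
    return TERM_ID_PATTERN.findall(definition)


definition = (
    "A plant trait (TO:0000387) which is a morphological quality of a plant "
    "anatomical entity (PO:0025131) or a constituent cellular component "
    "(GO:0005575) contained therein."
)

term_ids = extract_term_ids(definition)
for prefix, number in term_ids:
    print(f"{prefix}:{number}")
```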
4. Plan Strategies
API Testing and Construction
  • Focus on testing and understanding the APIs of the two platforms (KIMI and PubAnnotation) to support knowledge-driven ontology development.
  • Test the results from each API individually, since results can sometimes be ambiguous, and make sure that queries return accurate and reliable information.
  • The two platforms work in different ways, so it is crucial to construct tailored queries for each to obtain solid results. The goal is to create web services that can interact seamlessly with a web interface.
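
As a sketch of what "tailored queries" might look like, the snippet below builds one request per platform without sending anything. The PubAnnotation URL layout is an assumption modelled on its public REST paths and should be checked against the official API documentation; the prompt wording, the function names, the PMID `12345` and the project name `demo-project` are all hypothetical placeholders. No KIMI endpoint is called here, because its request format is not described in this post.

```python
from typing import Optional
from urllib.parse import quote

PUBANNOTATION_BASE = "https://pubannotation.org"  # assumed base URL


def build_annotation_url(pmid: str, project: Optional[str] = None) -> str:
    """Build an annotations URL for a PubMed document (assumed path layout)."""
    doc_path = f"docs/sourcedb/PubMed/sourceid/{quote(pmid)}/annotations.json"
    if project:
        return f"{PUBANNOTATION_BASE}/projects/{quote(project)}/{doc_path}"
    return f"{PUBANNOTATION_BASE}/{doc_path}"


def build_definition_prompt(label: str, term_id: str) -> str:
    """Build a hypothetical LLM prompt asking for one concise definition."""
    return (
        f"Give a one-sentence definition of the rice trait term '{label}' "
        f"({term_id}). Cite the supporting literature if you know it, and "
        "answer 'unknown' rather than guessing."
    )


url = build_annotation_url("12345")  # placeholder PMID
project_url = build_annotation_url("12345", project="demo-project")
prompt = build_definition_prompt("plant trait", "TO:0000387")
print(url)
print(prompt)
```

Keeping query construction separate from the network call makes each platform's request easy to inspect and test on its own before wiring it into a web service.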
Ontology Expert Web Interface
  • Provide a unified interface for domain experts to explore and discover specific ontology terms, offering solid and precise definitions.
  • The interface will integrate both the LLM and PubAnnotation to generate concrete definitions and annotate literature.
  • Human experts will work with the LLM and PubAnnotation platforms to verify concepts, update definitions, and refine descriptions.
Performance Evaluation
  • The project aims to enhance interoperability between humans and machines. Experts will evaluate whether the machine-generated definitions are accurate and reliable.
  • Performance evaluation will follow a day-by-day plan spanning five days, to track efficiency improvements and results.
  • The concepts and methodologies can be reproduced in other studies and research areas, especially those that encounter issues with ontology concepts.
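
One possible way to track the day-by-day evaluation is the share of machine-generated definitions that experts accept each day. The sketch below assumes each review is recorded as a `(day, accepted)` pair; that record format, the `acceptance_rate_by_day` helper and the sample values are illustrative placeholders, not project results.

```python
from collections import defaultdict


def acceptance_rate_by_day(reviews):
    """Map each day to the fraction of reviewed definitions experts accepted."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for day, ok in reviews:
        totals[day] += 1
        if ok:
            accepted[day] += 1
    return {day: accepted[day] / totals[day] for day in sorted(totals)}


# Placeholder verdicts for illustration only: (day number, expert accepted?).
sample_reviews = [(1, True), (1, False), (2, True), (2, True)]

rates = acceptance_rate_by_day(sample_reviews)
for day, rate in rates.items():
    print(f"day {day}: {rate:.0%} accepted")
```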
