Tasks
GutBrainIE CLEF 2025 is Task #6 of the BioASQ CLEF Lab 2025. It proposes a Natural Language Processing (NLP) challenge on biomedical texts within the context of the EU-supported project HEREDITARY.
Specifically, it focuses on extracting structured information from biomedical abstracts about the gut microbiota and its connections with Parkinson's disease and mental health. The aim is to foster the development of Information Extraction (IE) systems that can support experts in understanding the gut-brain interplay.
The GutBrainIE task is divided into two main subtasks. In the first subtask, participants are asked to identify text spans and classify them into predefined categories. In the second, they have to determine whether a particular relation defined between two categories holds.
The training data for all subtasks are available upon registration for the challenge, and the test data will be released around two weeks before the run submission deadline. The training set is divided into three parts:
- Gold-Standard Annotations: A highly curated dataset manually annotated by a team of 7 expert annotators from the University of Padua, Italy;
- Silver-Standard Annotations: A weakly curated dataset manually annotated by a team of about 40 students of Linguistics and Terminology trained and supervised by the experts;
- Distant Annotations: A distantly supervised dataset comprising automatically generated annotations.
Please see the Datasets and Important Dates sections for more information.
Subtask 6.1 - Named Entity Recognition
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to classify specific text spans (entity mentions) into one of 15 predefined categories, such as bacteria, chemical, and microbiota.
Entity mentions are expressed as triples (entityLabel ; startOffset ; endOffset).
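As an illustration of the (entityLabel ; startOffset ; endOffset) representation, the following is a minimal Python sketch, not the official data or submission format. The sentence is invented for the example, the names EntityMention, make_mention and span_text are hypothetical, and the end offset is assumed to be exclusive (Python slice convention); the offset convention actually used by the task should be checked against the Datasets section.

```python
from typing import NamedTuple


class EntityMention(NamedTuple):
    """One entity mention as (entityLabel ; startOffset ; endOffset)."""
    entity_label: str
    start_offset: int
    end_offset: int  # ASSUMPTION: exclusive end offset (Python slice convention)


def make_mention(text: str, surface: str, label: str) -> EntityMention:
    """Build a mention for the first occurrence of `surface` in `text`."""
    start = text.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} does not occur in the text")
    return EntityMention(label, start, start + len(surface))


def span_text(text: str, mention: EntityMention) -> str:
    """Recover the text span that a mention points to."""
    return text[mention.start_offset:mention.end_offset]


# Invented sentence, used only to illustrate the representation.
text = "Lactobacillus may influence dopamine levels."
mentions = [
    make_mention(text, "Lactobacillus", "bacteria"),
    make_mention(text, "dopamine", "chemical"),
]

for m in mentions:
    print(f"({m.entity_label} ; {m.start_offset} ; {m.end_offset}) -> {span_text(text, m)}")
```

Storing offsets rather than the span text keeps each mention tied to a unique position in the abstract, which matters when the same string occurs more than once.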
Subtask 6.2.1 - Binary Relation Extraction
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify which entities are in relation within a document. No relation type needs to be predicted within this subtask.
Relations are expressed as pairs (entityLabel1 ; entityLabel2).
Subtask 6.2.2 - Ternary Tag-Based Relation Extraction
Participants are asked to identify which entities are in relation within a document and predict the type of relation between them.
Relations are expressed as triples (entityLabel1 ; relationLabel ; entityLabel2).
Subtask 6.2.3 - Ternary Mention-Based Relation Extraction
Participants are required to identify the actual entities involved in a relation and predict the type of relation.
Relations are expressed as triples (entityMention1 ; relationLabel ; entityMention2).
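The three relation extraction outputs differ only in granularity, which the following Python sketch tries to make concrete. It is an illustration under stated assumptions, not the official format: the MentionRelation type and the three helper functions are hypothetical, entity mentions are assumed to be plain text spans, the example relations are invented, and "influence" is a placeholder relation label rather than one taken from the task definition.

```python
from typing import NamedTuple, Set, Tuple


class MentionRelation(NamedTuple):
    """A fully specified relation between two entity mentions (hypothetical type)."""
    mention1: str        # ASSUMPTION: a mention is represented by its text span
    label1: str          # entity category of the first mention
    relation_label: str
    mention2: str
    label2: str          # entity category of the second mention


def binary_relations(rels) -> Set[Tuple[str, str]]:
    """Subtask 6.2.1 style: (entityLabel1 ; entityLabel2), no relation type."""
    return {(r.label1, r.label2) for r in rels}


def ternary_tag_relations(rels) -> Set[Tuple[str, str, str]]:
    """Subtask 6.2.2 style: (entityLabel1 ; relationLabel ; entityLabel2)."""
    return {(r.label1, r.relation_label, r.label2) for r in rels}


def ternary_mention_relations(rels) -> Set[Tuple[str, str, str]]:
    """Subtask 6.2.3 style: (entityMention1 ; relationLabel ; entityMention2)."""
    return {(r.mention1, r.relation_label, r.mention2) for r in rels}


# Invented relations for one document; "influence" is a placeholder label.
relations = [
    MentionRelation("Lactobacillus", "bacteria", "influence", "dopamine", "chemical"),
    MentionRelation("Bifidobacterium", "bacteria", "influence", "serotonin", "chemical"),
]

print(binary_relations(relations))           # one label pair
print(ternary_tag_relations(relations))      # one label triple
print(ternary_mention_relations(relations))  # two mention triples
```

In this sketch both relations collapse to a single entry at the label level, while the mention-based output keeps them apart, which is why the mention-based variant is the most demanding of the three.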
Participating
To participate in GutBrainIE CLEF 2025, groups need to register for the BioASQ Laboratory at the following link: