Tasks
GutBrainIE CLEF 2025 is Task #6 of the BioASQ CLEF Lab 2025. It proposes a Natural Language Processing (NLP) challenge on biomedical texts within the context of the EU-supported project HEREDITARY.
Specifically, it is focused on extracting structured information from biomedical abstracts related to the gut microbiota and its connections with Parkinson's disease and mental health, aiming to foster the development of Information Extraction (IE) systems that can support experts in understanding the gut-brain interplay.
The GutBrainIE task is divided into two main subtasks. In the first, participants are asked to identify and classify specific text spans into predefined categories; in the second, they have to determine whether a particular relation defined between two categories holds.
The training data for all the subtasks are available upon registration for the challenge and the test data will be available around two weeks before the run submission deadline. The training set is divided into four parts:
- Gold Collection: A highly curated dataset manually annotated by a team of 7 expert annotators from the University of Padua, Italy;
- Platinum Collection: A subset of the gold annotations further validated by biomedical experts from the Radboud University Medical Center, Netherlands;
- Silver Collection: A weakly curated dataset manually annotated by a team of about 40 students of Linguistics and Terminology trained and supervised by the experts;
- Bronze Collection: A distantly supervised dataset comprising automatically generated annotations. Note that no manual revision has been performed on this set.
The test set is a held-out selection of documents from the gold and platinum collections, consisting exclusively of titles and abstracts for each document, ensuring representativeness and full coverage of all entity and relation types.
Please see the Datasets and Important Dates sections for more information.
Subtask 6.1 - Named Entity Recognition
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to classify specific text spans (entity mentions) into one of 13 predefined categories, such as bacteria, chemical, and microbiota.
Entity mentions are expressed as tuples (entityLabel ; entityLocation (title/abstract) ; startOffset ; endOffset).
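As a rough illustration of the entity tuple, the sketch below models one mention and recovers the text it points to. The class name, the `doc` dictionary layout, the sample title, and the `end_inclusive` switch are assumptions made for this example; in particular, whether `endOffset` is inclusive should be checked against the released data and the Submitting section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMention:
    """One NER tuple: (entityLabel ; entityLocation ; startOffset ; endOffset)."""
    entity_label: str     # one of the 13 predefined categories, e.g. "microbiota"
    entity_location: str  # "title" or "abstract"
    start_offset: int     # character position where the mention starts
    end_offset: int       # character position where the mention ends


def mention_text(doc: dict, mention: EntityMention, end_inclusive: bool = True) -> str:
    """Return the surface string that a mention refers to.

    ASSUMPTION: offsets are character positions counted within the field
    named by entity_location. Whether end_offset is inclusive is not stated
    in the task description, so it is exposed here as a parameter.
    """
    field = doc[mention.entity_location]
    stop = mention.end_offset + 1 if end_inclusive else mention.end_offset
    return field[mention.start_offset:stop]


# Invented document, for illustration only.
doc = {
    "title": "Gut microbiota in Parkinson's disease",
    "abstract": "We review links between gut bacteria and motor symptoms.",
}
mention = EntityMention("microbiota", "title", 0, 13)
print(mention_text(doc, mention))  # Gut microbiota
```

Keeping the location next to the offsets matters because the same offset pair can address different text in the title and in the abstract.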
Subtask 6.2.1 - Binary Relation Extraction
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify which entities are in relation within a document. No relation type needs to be predicted within this subtask.
Relations are expressed as pairs (entityLabel1 ; entityLabel2).
Subtask 6.2.2 - Ternary Tag-Based Relation Extraction
Participants are asked to identify which entities are in relation within a document and predict the type of relation between them.
Relations are expressed as triples (entityLabel1 ; relationLabel ; entityLabel2).
Subtask 6.2.3 - Ternary Mention-Based Relation Extraction
Participants are required to identify the actual entities involved in a relation and predict the type of relation.
Relations are expressed as triples (entityMention1 ; relationLabel ; entityMention2).
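The three relation subtasks differ only in how much detail each output tuple carries. The sketch below shows one way to view them as projections of the same mention-level annotations. It assumes each mention carries its entity label and that outputs are de-duplicated per document; the type names, the sample mentions, and the relation label `"placeholder-relation"` are invented for illustration and are not taken from the task's label set.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    text: str   # surface form of the entity mention
    label: str  # entity category, e.g. "bacteria"


class MentionRelation(NamedTuple):
    head: Mention
    relation_label: str
    tail: Mention


def to_mention_triples(relations):
    """Subtask 6.2.3 shape: (entityMention1 ; relationLabel ; entityMention2)."""
    return {(r.head.text, r.relation_label, r.tail.text) for r in relations}


def to_tag_triples(relations):
    """Subtask 6.2.2 shape: (entityLabel1 ; relationLabel ; entityLabel2)."""
    return {(r.head.label, r.relation_label, r.tail.label) for r in relations}


def to_binary_pairs(relations):
    """Subtask 6.2.1 shape: (entityLabel1 ; entityLabel2), no relation type."""
    return {(r.head.label, r.tail.label) for r in relations}


# Invented annotations for a single document.
relations = [
    MentionRelation(Mention("Lactobacillus", "bacteria"),
                    "placeholder-relation",
                    Mention("butyrate", "chemical")),
    MentionRelation(Mention("Bifidobacterium", "bacteria"),
                    "placeholder-relation",
                    Mention("acetate", "chemical")),
]

print(len(to_mention_triples(relations)))  # 2
print(to_tag_triples(relations))           # {('bacteria', 'placeholder-relation', 'chemical')}
print(to_binary_pairs(relations))          # {('bacteria', 'chemical')}
```

Under these assumptions, two distinct mention-level relations collapse into a single tag-based triple and a single binary pair, which is why the mention-based subtask is the most fine-grained of the three.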
For each subtask, the test set consists of a collection of documents, each including only the PubMed ID, title, and abstract. Given a test document, participants are required to extract tuples that include all the fields for the respective subtask. For example, in the NER subtask, tuples must contain all and only the following fields: (entityLabel ; entityLocation (title/abstract) ; startOffset ; endOffset). For more details, please refer to the submission format examples provided in the Submitting section.
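The "all and only" requirement can be checked mechanically before submitting. The following is a minimal sketch, assuming predictions are held as dictionaries keyed by the field names used in the tuple descriptions above; the subtask keys (`"6.1"`, `"6.2.1"`, and so on) and the dictionary representation are choices made for this example, and the actual file format is the one defined in the Submitting section.

```python
# Field names copied from the tuple descriptions; the dict-based record
# layout is an assumption of this sketch, not the official format.
REQUIRED_FIELDS = {
    "6.1": {"entityLabel", "entityLocation", "startOffset", "endOffset"},
    "6.2.1": {"entityLabel1", "entityLabel2"},
    "6.2.2": {"entityLabel1", "relationLabel", "entityLabel2"},
    "6.2.3": {"entityMention1", "relationLabel", "entityMention2"},
}


def check_fields(subtask: str, record: dict) -> list:
    """Return a list of problems; an empty list means all and only the fields are present."""
    required = REQUIRED_FIELDS[subtask]
    present = set(record)
    problems = []
    for name in sorted(required - present):
        problems.append(f"missing field: {name}")
    for name in sorted(present - required):
        problems.append(f"unexpected field: {name}")
    return problems


good = {"entityLabel": "bacteria", "entityLocation": "abstract",
        "startOffset": 10, "endOffset": 22}
bad = {"entityLabel": "bacteria", "startOffset": 10, "endOffset": 22,
       "score": 0.9}

print(check_fields("6.1", good))  # []
print(check_fields("6.1", bad))   # ['missing field: entityLocation', 'unexpected field: score']
```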
Participating
To participate in GutBrainIE CLEF 2025, groups need to register for the BioASQ Laboratory at the following link:
Access to the data will be granted within 24 hours of registration.