Tasks
GutBrainIE CLEF 2026 is Task 6 of the BioASQ CLEF Lab 2026, proposing a Natural Language Processing (NLP) challenge on biomedical texts within the context of the EU-supported project HEREDITARY.
Specifically, it is focused on extracting structured information from biomedical abstracts related to the gut microbiota and its connections with Alzheimer's, Parkinson's, Multiple Sclerosis, Amyotrophic Lateral Sclerosis, and mental health. The task aims to foster the development of Information Extraction (IE) systems that can support experts in understanding the gut-brain interplay.
The GutBrainIE task is divided into two main subtasks. In the first, participants identify and classify specific text spans into predefined categories; in the second, they determine whether a given relationship between two entities holds.
The training data for all subtasks are available upon registration for the challenge and the test data will be available around two weeks before the run submission deadline. The training set is divided into four parts:
- Gold Collection: A highly curated dataset manually annotated primarily by a team of 7 expert annotators from the University of Padua, Italy, with the involvement of 6 additional external expert contributors;
- Silver Collection: A weakly curated dataset manually annotated by a team of about 55 students of Linguistics and Terminology trained and supervised by the experts;
- Silver Collection 2025: A weakly curated dataset manually annotated for the 2025 edition by a team of about 40 students of Linguistics and Terminology trained and supervised by the experts, with automatically generated concept-level annotations;
- Bronze Collection: A distantly supervised dataset comprising automatically generated annotations. Note that no manual revision has been performed on this set.
The test set is a held-out selection of documents from the gold collection, consisting exclusively of titles and abstracts for each document, ensuring representativeness and full coverage of all entity and relation types.
Please see the Datasets and Important Dates sections for more information.
Subtask 6.1.1 - Named Entity Recognition (NER)
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify and classify specific text spans (entity mentions) into one of the 13 predefined categories, such as bacteria, chemical, or microbiota.
Entity mentions are expressed as tuples (entityCategory ; entityLocation (title/abstract) ; startOffset ; endOffset).
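As an illustrative sketch only (not the official submission format), a predicted mention tuple can be checked against the source text by slicing with its offsets. The abstract text, the mention values, and the exclusive-end offset convention below are all invented for demonstration; follow the official annotation guidelines for the actual conventions.

```python
# Toy NER mention tuple: (entityCategory, entityLocation, startOffset, endOffset).
# Everything here is illustrative, including the assumption that the end
# offset is exclusive -- check the official guidelines.

abstract = "Lactobacillus rhamnosus modulates anxiety-like behavior in mice."

mention = ("bacteria", "abstract", 0, 23)

category, location, start, end = mention
span_text = abstract[start:end]  # assuming an exclusive end offset
print(category, repr(span_text))
```

Slicing the predicted span back out of the text is a cheap sanity check that offsets were computed against the right field (title vs. abstract).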
Subtask 6.1.2 - Named Entity Recognition and Disambiguation (NERD)
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify and classify specific text spans (entity mentions) into one of the 13 predefined categories, such as bacteria, chemical, or microbiota. Each identified entity must also be linked to a concept identifier from one of the defined biomedical reference resources.
Entity mentions are represented as tuples (entityCategory ; entityLocation (title/abstract) ; startOffset ; endOffset ; conceptURI).
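A NERD tuple extends the NER fields with a concept identifier. In this hedged sketch the URI is invented for illustration; the task specification defines which biomedical reference resources valid identifiers must come from.

```python
# Toy NERD tuple: the NER fields plus a conceptURI.
# The URI below is illustrative only, not an endorsed reference resource.

mention = (
    "bacteria",              # entityCategory
    "abstract",              # entityLocation (title/abstract)
    0,                       # startOffset
    23,                      # endOffset
    "uri:example/bact/1",    # conceptURI (invented for illustration)
)

category, location, start, end, concept_uri = mention
print(concept_uri)
```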
Subtask 6.2.1 - Mention-Level Relation Extraction (M-RE)
Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify relations between specific entity mentions within a document. Each relation must include the two involved entity mentions and the relation predicate connecting them.
Relations are expressed as triples (subjectMention ; relationPredicate ; objectMention).
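A mention-level relation can be pictured as a triple whose endpoints are mention tuples. In this sketch the predicate name and both mentions are invented; the actual predicates are the task's predefined ones.

```python
# Toy mention-level relation triple:
# (subjectMention, relationPredicate, objectMention).
# Categories, offsets, and the predicate "influence" are all invented.

subject_mention = ("bacteria", "abstract", 0, 23)
object_mention = ("disease", "abstract", 34, 55)   # hypothetical
relation = (subject_mention, "influence", object_mention)

subj, predicate, obj = relation
print(predicate)
```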
Subtask 6.2.2 - Concept-Level Relation Extraction (C-RE)
This subtask extends the mention-level setting to the concept level. Participants must identify and classify relations between linked concepts rather than between their textual mentions. Concept-level relations aim to capture knowledge connections abstracted from surface forms and lexical variations.
Relations are expressed as tuples (subjectConceptURI ; subjectCategory ; relationPredicate ; objectConceptURI ; objectCategory).
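One way to picture the abstraction from mentions to concepts is deduplication: mention-level relations that link the same pair of concepts, possibly through different surface forms, collapse into a single concept-level tuple. A minimal sketch, with all URIs, categories, and predicates invented:

```python
# Collapse mention-level relations (already linked to concepts) into
# unique concept-level tuples:
# (subjectConceptURI, subjectCategory, relationPredicate,
#  objectConceptURI, objectCategory). All identifiers are invented.

mention_relations = [
    ("uri:bact/1", "bacteria", "influence", "uri:dis/9", "disease"),
    ("uri:bact/1", "bacteria", "influence", "uri:dis/9", "disease"),  # same pair, different surface forms
    ("uri:chem/4", "chemical", "interact", "uri:bact/1", "bacteria"),
]

# One concept-level tuple per unique combination.
concept_relations = sorted(set(mention_relations))
print(len(concept_relations))
```

The two duplicate mention-level relations yield a single concept-level tuple, leaving two unique relations in total.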
For each task, the test set consists of a collection of documents, each including only the PubMed ID, title, and abstract. Given a test document, participants are required to extract tuples that include all the fields for the respective task. For example, in the NER task, tuples must contain all and only the following fields: (entityCategory ; entityLocation (title/abstract) ; startOffset ; endOffset). For more details, please refer to the submission format examples provided in the Submitting section.
Participating
To participate in GutBrainIE CLEF 2026, groups need to register for the BioASQ Lab at the following link:
Data are available! Access to the data will be granted within 24 hours of registration.