Participation Guidelines

GutBrainIE CLEF 2025

Task #6 of the BioASQ Lab 2025

(A HEREDITARY challenge)

Tasks

GutBrainIE CLEF 2025 is Task #6 of the BioASQ CLEF Lab 2025. It proposes a Natural Language Processing (NLP) challenge on biomedical texts within the context of the EU-supported project HEREDITARY.

Specifically, it is focused on extracting structured information from biomedical abstracts related to the gut microbiota and its connections with Parkinson's disease and mental health, aiming to foster the development of Information Extraction (IE) systems that can support experts in understanding the gut-brain interplay.

The GutBrainIE task is divided into two main subtasks. In the first subtask, participants are asked to identify and classify specific text spans into predefined categories; in the second, they have to determine whether a particular relationship defined between two categories holds.

The training data for all the subtasks are available upon registration for the challenge and the test data will be available around two weeks before the run submission deadline. The training set is divided into three parts:

  • Gold-Standard Annotations: A highly curated dataset manually annotated by a team of 7 expert annotators from the University of Padua, Italy;
  • Silver-Standard Annotations: A weakly curated dataset manually annotated by a team of about 40 students of Linguistics and Terminology trained and supervised by the experts;
  • Distant Annotations: A distantly supervised dataset comprising automatically generated annotations.
The test set consists of gold-standard annotations, further validated by biomedical experts at the Radboud University Medical Centre (RUMC).

Please see the Datasets and Important Dates sections for more information.

Subtask 6.1 - Named Entity Recognition

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to classify specific text spans (entity mentions) into one of the 15 predefined categories, such as bacteria, chemical, and microbiome.

Entity mentions are expressed as triplets (entityLabel ; startOffset ; endOffset).

Subtask 6.2.1 - Binary Relation Extraction

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify which entities are in relation within a document. No relation type needs to be predicted within this subtask.

Relations are expressed as pairs (entityLabel1 ; entityLabel2).

Subtask 6.2.2 - Ternary Tag-Based Relation Extraction

Participants are asked to identify which entities are in relation within a document and predict the type of relation between them.

Relations are expressed as triples (entityLabel1 ; relationLabel ; entityLabel2).

Subtask 6.2.3 - Ternary Mention-Based Relation Extraction

Participants are required to identify the actual entities involved in a relation and predict the type of relation.

Relations are expressed as triples (entityMention1 ; relationLabel ; entityMention2).
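
The three relation formats above nest inside one another: a mention-based triple can be reduced to a tag-based triple, which in turn can be reduced to a binary pair. A minimal Python sketch of that relationship follows. The class and function names are illustrative, representing a mention as a (text, label) pair is an assumption, and the example relation is invented; none of this is the official submission format.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    """An entity mention; pairing text with label is an assumption of this sketch."""
    text: str
    label: str


class MentionRelation(NamedTuple):
    """Subtask 6.2.3 shape: (entityMention1 ; relationLabel ; entityMention2)."""
    head: Mention
    predicate: str
    tail: Mention


def to_tag_triple(rel: MentionRelation) -> tuple[str, str, str]:
    """Subtask 6.2.2 shape: (entityLabel1 ; relationLabel ; entityLabel2)."""
    return (rel.head.label, rel.predicate, rel.tail.label)


def to_binary_pair(rel: MentionRelation) -> tuple[str, str]:
    """Subtask 6.2.1 shape: (entityLabel1 ; entityLabel2)."""
    return (rel.head.label, rel.tail.label)


# Hypothetical relation, invented for illustration only.
example = MentionRelation(
    head=Mention("Lactobacillus", "bacteria"),
    predicate="part of",
    tail=Mention("gut microbiota", "microbiome"),
)

# Several mention-level relations can collapse to the same label-level item,
# so the label-level outputs are collected as sets here.
tag_triples = {to_tag_triple(r) for r in [example]}
binary_pairs = {to_binary_pair(r) for r in [example]}
```

Under these assumptions, a system that solves the mention-based subtask could derive candidate outputs for the other two relation subtasks from the same predictions.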

Participating

To participate in GutBrainIE CLEF 2025, groups need to register for the BioASQ Laboratory at the following link:

Register

Important Dates

  • Registration closes: April 25, 2025
  • Test data release: April 28, 2025
  • Run submission deadline: May 10, 2025
  • Evaluation results out: May 19, 2025
  • Participant and position paper submission deadline: May 30, 2025
  • Notification of acceptance for participant and position papers: June 27, 2025
  • Camera-ready participant papers submission: July 7, 2025
  • GutBrainIE CLEF Workshop: September 9-12, 2025 during the CLEF Conference in Madrid, Spain

Datasets

Data Description

The data for this task is a set of abstracts of biomedical articles retrieved from PubMed, focusing on the gut-brain interplay and its implications in neurological and mental health.

The datasets are organized into:
  • Gold-Standard Annotations: Expert-curated, high-quality annotations.
  • Silver-Standard Annotations: Annotated by trained students under expert supervision.
  • Distant Annotations: Automatically generated annotations for NER using fine-tuned GLiNER.
To foster the development of effective IE systems for the subtasks proposed within GutBrainIE, the provided datasets include:
  • Entity Mentions: Text spans classified into predefined categories.
  • Relations: Associations between entities, specifying that a particular relationship holds between two entities.

Collections

  • Training and Development Data: Available soon...
  • Test Data: The test set will be available from April 28, 2025. This data will be used for final system evaluation.

Dataset Format

Annotations are provided in JSON format, for ease of use with NLP systems. Each entry in the dataset corresponds to a PubMed article, identified by its PubMed ID (PMID), and includes the following fields:
  • Metadata: Article-related information, including title, author, year, and abstract.
  • Entities: An array of objects where each object represents an annotated entity mention in the article, with the following attributes:
    • annotator: The identifier of the annotator, which participants may use to assign different weights to annotations depending on who performed them.
    • start and end: Character offsets marking the span of the entity mention.
    • mention_location: Indicates if the entity mention is located in the title (title_value) or in the abstract (abstract_value).
    • mention_text: The actual text span of the entity mention.
    • entity_label: The label assigned to the entity mention (e.g., bacteria, microbiome).
  • Relations: An array of objects where each object represents an annotated relationship between two entity mentions in the article, with the following attributes:
    • annotator: The identifier of the annotator.
    • subject_start and subject_end: Character offsets marking the span of the subject entity mention.
    • subject_mention_location: Indicates if the subject entity mention is located in the title (title_value) or in the abstract (abstract_value).
    • subject_mention_text: The actual text span of the subject entity mention.
    • subject_entity_label: The label assigned to the subject entity mention (e.g., bacteria, microbiome).
    • predicate: The label assigned to the relationship.
    • object_start and object_end: Character offsets marking the span of the object entity mention.
    • object_mention_location: Indicates if the object entity mention is located in the title (title_value) or in the abstract (abstract_value).
    • object_mention_text: The actual text span of the object entity mention.
    • object_entity_label: The label assigned to the object entity mention (e.g., bacteria, microbiome).
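
As a rough illustration of the layout above, the sketch below walks one entry and tallies entity labels, weighting each annotation by its annotator. The top-level key names (`metadata`, `entities`, `relations`), the sample PMID, the annotator identifiers, the offsets, and the weights are all assumptions made for illustration; only the attribute names inside each entity come from the field list above.

```python
import json
from collections import defaultdict

# Hypothetical one-article dataset shaped after the field list above.
# Offsets are placeholders; writing "end" as inclusive is an assumption.
SAMPLE = json.loads("""
{
  "00000000": {
    "metadata": {"title": "Gut microbiota and depression", "author": "N/A",
                 "year": "2020", "abstract": "Lactobacillus was reduced."},
    "entities": [
      {"annotator": "expert_1", "start": 0, "end": 13,
       "mention_location": "title_value",
       "mention_text": "Gut microbiota", "entity_label": "microbiome"},
      {"annotator": "student_7", "start": 0, "end": 12,
       "mention_location": "abstract_value",
       "mention_text": "Lactobacillus", "entity_label": "bacteria"}
    ],
    "relations": []
  }
}
""")

# Hypothetical weights: trust expert annotations more than student ones.
ANNOTATOR_WEIGHTS = {"expert": 1.0, "student": 0.5}
DEFAULT_WEIGHT = 0.25


def annotation_weight(annotator: str) -> float:
    """Map an annotator identifier to a weight via its (assumed) prefix."""
    prefix = annotator.split("_", 1)[0]
    return ANNOTATOR_WEIGHTS.get(prefix, DEFAULT_WEIGHT)


def weighted_label_counts(dataset: dict) -> dict[str, float]:
    """Sum annotation weights per entity label across all articles."""
    totals: dict[str, float] = defaultdict(float)
    for article in dataset.values():
        for entity in article.get("entities", []):
            totals[entity["entity_label"]] += annotation_weight(entity["annotator"])
    return dict(totals)


counts = weighted_label_counts(SAMPLE)
```

The same loop structure would apply to the relations array, reading `subject_entity_label`, `predicate`, and `object_entity_label` instead of `entity_label`.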

Entity and Relation Labels

(To be revised after the training set is completed)

The set of entities considered for annotations includes:

Entity Label | URI | Definition
Anatomical Location | NCIT_C13717 | Named locations of or within the body.
Animal | NCIT_C14182 | A non-human living organism that has membranous cell walls, requires oxygen and organic foods, and is capable of voluntary movement, as distinguished from a plant or mineral.
Biomedical Technique | NCIT_C15188 | Research concerned with the application of biological and physiological principles to clinical medicine.
Bacteria | NCBITaxon_2 | One of the three domains of life (the others being Eukarya and ARCHAEA), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal.
Chemical | CHEBI_59999 | A chemical substance is a portion of matter of constant composition, composed of molecular entities of the same type or of different types.
Dietary Supplement | MESH_68019587 | Products in capsule, tablet or liquid form that provide dietary ingredients, and that are intended to be taken by mouth to increase the intake of nutrients. Dietary supplements can include macronutrients, such as proteins, carbohydrates, and fats; and/or MICRONUTRIENTS, such as VITAMINS; MINERALS; and PHYTOCHEMICALS.
Disease, Disorder, or Finding (DDF) | NCIT_C7057 | A condition that is relevant to human neoplasms and non-neoplastic disorders. This includes observations, test results, history and other concepts relevant to the characterization of human pathologic conditions.
Drug | CHEBI_23888 | Any substance which when absorbed into a living organism may modify one or more of its functions. The term is generally accepted for a substance taken for a therapeutic purpose, but is also commonly used for abused substances.
Food | NCIT_C1949 | A substance consumed by humans and animals for nutritional purpose.
Gene | SNOMEDCT_67261001 | A functional unit of heredity which occupies a specific position on a particular chromosome and serves as the template for a product that contributes to a phenotype or a biological function.
Human | NCBITaxon_9606 | Members of the species Homo sapiens.
Metabolite | CHEBI_25212 | In biochemistry, a metabolite is an intermediate or end product of metabolism.
Microbiome | OHMI_0000003 | This term refers to the entire habitat, including the microorganisms (bacteria, archaea, lower and higher eukaryotes, and viruses), their genomes (i.e., genes), and the surrounding environmental conditions.
Neurotransmitter | CHEBI_25512 | An endogenous compound that is used to transmit information across the synapse between a neuron and another cell.

while the defined set of relations includes:

Head Entity | Tail Entity | Predicate
Anatomical Location | Human / Animal | Located in
Bacteria | Microbiome | Part of
Bacteria | Disease, Disorder, or Finding | Influence
Bacteria | Gene | Change expression
Disease, Disorder, or Finding | Bacteria | Change abundance
Disease, Disorder, or Finding | Human / Animal | Affect
Disease, Disorder, or Finding | Microbiome | Change abundance
Drug / Chemical / Dietary Supplement | Disease, Disorder, or Finding | Change effect
Drug / Chemical / Dietary Supplement | Microbiome | Impact
Drug / Chemical / Dietary Supplement | Bacteria | Impact
Drug / Chemical / Dietary Supplement | Gene | Change expression
Human / Animal | Biomedical Technique | Used by
Metabolite | Microbiome | Produced by
Metabolite | Anatomical Location | Located in
Microbiome | Human / Animal | Located in
Microbiome | Biomedical Technique | Used by
Microbiome | Gene | Change expression
Microbiome | Disease, Disorder, or Finding | Is linked to
Microbiome | Microbiome | Compared to
Neurotransmitter | Microbiome | Related to
Neurotransmitter | Anatomical Location | Located in
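
One possible use of the relation table above is as a consistency filter on predicted relations. The sketch below encodes a few of its rows as a lookup keyed by (head label, tail label). It is deliberately partial, the label strings are spelled as in the table rather than as they may appear in the released JSON, and the helper names are illustrative.

```python
# Partial, illustrative encoding of the relation table:
# (head label, tail label) -> set of predicates listed for that pair.
ALLOWED: dict[tuple[str, str], set[str]] = {}


def _add(heads: list[str], tails: list[str], predicate: str) -> None:
    """Register one table row, expanding 'A / B' cells into separate labels."""
    for head in heads:
        for tail in tails:
            ALLOWED.setdefault((head, tail), set()).add(predicate)


# Only a handful of rows are encoded here; the rest follow the same pattern.
_add(["Anatomical Location"], ["Human", "Animal"], "Located in")
_add(["Bacteria"], ["Microbiome"], "Part of")
_add(["Metabolite"], ["Microbiome"], "Produced by")
_add(["Drug", "Chemical", "Dietary Supplement"], ["Microbiome"], "Impact")
_add(["Microbiome"], ["Microbiome"], "Compared to")


def is_allowed(head: str, predicate: str, tail: str) -> bool:
    """Return True if the table lists this predicate for the (head, tail) pair."""
    return predicate in ALLOWED.get((head, tail), set())
```

Because the relations are directed, `is_allowed("Bacteria", "Part of", "Microbiome")` holds in this sketch while the reversed pair does not.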

For more details on entity and relation labels, refer to the Annotation Guidelines available for download HERE.

Submitting

Participating teams should satisfy the following guidelines:

  • The runs should be submitted in the format described below;
  • Each group can submit a maximum of 25 runs for each subtask.

Submission Guidelines

Participants are invited to submit results for any or all of the four subtasks (NER, Binary RE, Tag-based Ternary RE, Mention-based Ternary RE) independently.

Submissions will be handled via the BioASQ submission system. More detailed information coming soon...

In the following, a "run" refers to the predictions made by a single system on the test set. Each run must be submitted as a single zipped file named "teamID_taskID_runID_systemDesc.zip", without spaces or special characters, where:

  • teamID is the identifier of your team, chosen when you registered for CLEF 2025;
  • taskID is the identifier of the chosen task, i.e., one of these tokens: T61 for task 6.1, T621 for task 6.2.1, T622 for task 6.2.2, and T623 for task 6.2.3;
  • runID is the identifier of the run, chosen by the participants and containing only letters and numbers (a-z, A-Z, 0-9);
  • systemDesc is an optional (short) string that further describes your submission.
The content of the zipped file consists of two files:
  • teamID_taskID_runID_systemDesc.json: A JSON file with predictions on the test set.
  • teamID_taskID_runID_systemDesc.meta: A metadata file briefly describing the approach, including:
    • Team ID.
    • Task ID.
    • Run ID.
    • Type of training applied.
    • Pre-processing methods.
    • Training data used.
    • Relevant details of the run.
    • A link to a GitHub repository enabling easy reproducibility of the run.
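
A minimal sketch of the naming and packaging rules above, using only the Python standard library. The helper names and the sample team and run identifiers are hypothetical. Restricting every name part to letters and digits and dropping the trailing underscore when systemDesc is omitted are assumptions of this sketch. The file contents are placeholders, since the official prediction and metadata formats are not described here.

```python
import io
import re
import zipfile

TASK_IDS = {"T61", "T621", "T622", "T623"}  # tokens listed in the guidelines
ALNUM = re.compile(r"[A-Za-z0-9]+")


def run_basename(team_id: str, task_id: str, run_id: str, system_desc: str = "") -> str:
    """Build 'teamID_taskID_runID_systemDesc' (systemDesc is optional)."""
    if task_id not in TASK_IDS:
        raise ValueError(f"unknown task ID: {task_id!r}")
    parts = [team_id, task_id, run_id]
    if system_desc:
        parts.append(system_desc)
    for part in parts:
        # The guidelines restrict runID to letters and digits; applying the
        # same rule to every part is an assumption of this sketch.
        if not ALNUM.fullmatch(part):
            raise ValueError(f"only letters and digits allowed: {part!r}")
    return "_".join(parts)


def build_run_zip(basename: str, predictions_json: str, meta_text: str) -> bytes:
    """Return the bytes of a zip holding '<basename>.json' and '<basename>.meta'."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{basename}.json", predictions_json)
        archive.writestr(f"{basename}.meta", meta_text)
    return buffer.getvalue()


# Hypothetical run: placeholder predictions and a stub metadata file.
name = run_basename("myteam", "T61", "run1", "baseline")
payload = build_run_zip(name, "{}", "Team ID: myteam\nTask ID: T61\nRun ID: run1\n")
```

Writing `payload` to a file named `name + ".zip"` would then produce an archive whose name and contents share the same base name.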

Each zipped file should carefully follow this file-naming structure. The team name must remain consistent across all submissions, and the system name should reflect the approach used by the submitted run. The run number serves as a progressive identifier for multiple submissions of the same system. An example of a valid run submission will be available soon...

Submissions not adhering to these guidelines might be rejected. Please ensure accuracy and completeness of all submitted files to streamline evaluation.

Example of Submission Format

Submissions should follow the JSON format specified below. A script that participants might use to validate their submissions will be available soon...

Coming soon...

Evaluation Metrics

Coming soon...

Practice [Pre-Evaluation]

Coming soon...

Results

Leaderboards

Available after the systems submission deadline...

Details

Available after the systems submission deadline...

FAQs

Find answers to common questions about the GutBrainIE challenge, dataset, and submissions.

If you need any additional information, please contact us by writing to:

Coming soon...

Organizers

(For further details about the organizers, visit the official website of the IIIA Hub Research Group.)