Participation Guidelines

GutBrainIE CLEF 2026

Task #6 of the BioASQ Lab 2026

(An HEREDITARY challenge)


Tasks

GutBrainIE CLEF 2026 is Task #6 of the BioASQ CLEF Lab 2026, proposing a Natural Language Processing (NLP) challenge on biomedical texts within the context of the EU-supported project HEREDITARY.

Specifically, it is focused on extracting structured information from biomedical abstracts related to the gut microbiota and its connections with Alzheimer's, Parkinson's, Multiple Sclerosis, Amyotrophic Lateral Sclerosis, and mental health. The task aims to foster the development of Information Extraction (IE) systems that can support experts in understanding the gut-brain interplay.

The GutBrainIE task is divided into two main subtasks. In the first, participants are asked to identify and classify specific text spans into predefined categories; in the second, they must determine whether a particular relationship, defined between two categories, holds.

The training data for all subtasks are available upon registration for the challenge and the test data will be available around two weeks before the run submission deadline. The training set is divided into four parts:

  • Gold Collection: A highly curated dataset manually annotated primarily by a team of 7 expert annotators from the University of Padua, Italy, with the involvement of 6 additional external expert contributors;
  • Silver Collection: A weakly curated dataset manually annotated by a team of about 55 students of Linguistics and Terminology trained and supervised by the experts;
  • Silver Collection 2025: A weakly curated dataset manually annotated for the 2025 edition by a team of about 40 students of Linguistics and Terminology trained and supervised by the experts, with automatically generated concept-level annotations;
  • Bronze Collection: A distantly supervised dataset comprising automatically generated annotations. Note that no manual revision has been performed on this set.

The test set is a held-out selection of documents from the gold collection, consisting exclusively of the title and abstract of each document and selected to ensure representativeness and full coverage of all entity and relation types.

Please see the Datasets and Important Dates sections for more information.

Subtask 6.1.1 - Named Entity Recognition (NER)

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify and classify specific text spans (entity mentions) into one of the 13 predefined categories, such as bacteria, chemical, or microbiome.

Entity mentions are expressed as tuples (entityCategory ; entityLocation (title/abstract) ; startOffset ; endOffset).

Subtask 6.1.2 - Named Entity Recognition and Disambiguation (NERD)

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify and classify specific text spans (entity mentions) into one of the 13 predefined categories, such as bacteria, chemical, or microbiome. Each identified entity must also be linked to a concept identifier from one of the defined biomedical reference resources.

Entity mentions are represented as tuples (entityCategory ; entityLocation (title/abstract) ; startOffset ; endOffset ; conceptURI).

Subtask 6.2.1 - Mention-Level Relation Extraction (M-RE)

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify relations between specific entity mentions within a document. Each relation must include the two involved entity mentions and the relation predicate connecting them.

Relations are expressed as triples (subjectMention ; relationPredicate ; objectMention).

Subtask 6.2.2 - Concept-Level Relation Extraction (C-RE)

This subtask extends the mention-level setting to the concept level. Participants must identify and classify relations between linked concepts rather than between their textual mentions. Concept-level relations aim to capture knowledge connections abstracted from surface forms and lexical variations.

Relations are expressed as tuples (subjectConceptURI ; subjectCategory ; relationPredicate ; objectConceptURI; objectCategory).


For each task, the test set consists of a collection of documents, each including only the PubMed ID, title, and abstract. Given a test document, participants are required to extract tuples that include all the fields for the respective task. For example, in the NER task, tuples must contain all and only the following fields: (entityCategory ; entityLocation (title/abstract) ; startOffset ; endOffset). For more details, please refer to the submission format examples provided in the Submitting section.

Participating

To participate in GutBrainIE CLEF 2026, groups need to register to the BioASQ Laboratory at the following link:

Register

Data are available! Access to the data will be granted within 24 hours of registration.

Important Dates

  • Training data release: Available Now! (access granted upon registration)
  • Registration closes: April 23, 2026
  • Test data release: April 28, 2026
  • Runs submission deadline: May 7, 2026
  • Evaluation results out: May 19, 2026
  • Participant and position paper submission deadline: May 28, 2026
  • Notification of acceptance for participant and position papers: June 30, 2026
  • Camera-ready participant papers submission: July 6, 2026
  • GutBrainIE CLEF Workshop: September 21-24, 2026 during the CLEF Conference in Jena, Germany

Results

Leaderboards

Available after the systems submission deadline...

Datasets

Data Description

The data for this task is a set of titles and abstracts of biomedical articles retrieved from PubMed, focusing on the gut-brain interplay and its implications in neurological and mental health.

The datasets are organized into:
  • Gold-Standard Annotations: High-quality annotations, expert-curated.
  • Silver-Standard Annotations: Mid-quality annotations, created by trained students under expert supervision. The Silver-Standard collection is divided into:
    • Silver: Mid-quality annotations created by trained students under expert supervision. The students are organized into two clusters:
      • StudentA, including those with more consistent annotation performance,
      • StudentB, including those with less consistent annotation performance.
    • Silver 2025: Same as Silver but for the 2025 edition. Concept-level annotations were automatically generated.
  • Bronze-Standard Annotations: Automatically generated annotations, using fine-tuned GLiNER for Named Entity Recognition (NER) and fine-tuned ATLOP for Relation Extraction (RE).
To foster the development of effective IE systems for the subtasks proposed within GutBrainIE, the provided datasets include:
  • Entity Mentions: Text spans classified into predefined categories and linked to concept URIs from biomedical reference resources.
  • Relations: Associations between entities, specifying that a particular relationship holds between two entities.

Collections

  • Training and Development Data: Available upon registration. After registering, participants will receive an email with a download link (starting from February 2–9, 2026).
  • Test Data: The test set will be available from April 28, 2026. This data will be used for final system evaluation.

Dataset Format

Annotations are provided in JSON format, for ease of use with NLP systems. Each entry in the dataset corresponds to a PubMed article, identified by its PubMed ID (PMID), and includes the following fields:
  • Metadata: Article-related information, including title, author, journal, year, and abstract. It also includes the identifier of the annotator: expert annotators are labeled as expert_1 to expert_7, student annotators are grouped into two clusters, identified as student_A and student_B, and automatically generated annotations are labeled as distant. Participants may use this information to assign different weights to annotations based on who performed them.
  • Entities: An array of objects where each object represents an annotated entity mention in the article, with the following attributes:
    • start and end indices: Character offsets marking the span of the entity mention.
    • location: Indicates if the entity mention is located in the title or in the abstract.
    • text_span: The actual text span of the entity mention.
    • label: The label assigned to the entity mention (e.g., bacteria, microbiome).
    • uri: The concept URI to which the entity mention is linked, taken from one of the biomedical reference resources.
  • Relations: An array of objects where each object represents an annotated relationship between two entity mentions in the article, with the following attributes:
    • subject_start and subject_end indices: Character offsets marking the span of the subject entity mention.
    • subject_location: Indicates if the subject entity mention is located in the title or in the abstract.
    • subject_text_span: The actual text span of the subject entity mention.
    • subject_uri: The concept URI to which the subject entity mention is linked, taken from one of the biomedical reference resources.
    • subject_label: The label assigned to the subject entity mention (e.g., bacteria, microbiome).
    • predicate: The label assigned to the relationship.
    • object_start and object_end indices: Character offsets marking the span of the object entity mention.
    • object_location: Indicates if the object entity mention is located in the title or in the abstract.
    • object_text_span: The actual text span of the object entity mention.
    • object_label: The label assigned to the object entity mention (e.g., bacteria, microbiome).
    • object_uri: The concept URI to which the object entity mention is linked, taken from one of the biomedical reference resources.
  • Mention-level Relations: Relations extracted from the Relations array, formatted as mention-based tuples of subject_text_span, subject_label, predicate, object_text_span, and object_label.
  • Concept-level Relations: Relations extracted from the Relations array, formatted as concept-based tuples of subject_uri, subject_label, predicate, object_uri, and object_label.
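Under these conventions, a dataset entry can be parsed with the standard json module. A minimal sketch, assuming the entity field names mirror those in the submission samples below (start_idx, end_idx, location, text_span, label); verify them against the released files:

```python
import json

def entity_tuples(article: dict) -> list[tuple]:
    """Extract (label, location, start_idx, end_idx) tuples, the NER
    representation described above, from one parsed article entry."""
    return [
        (e["label"], e["location"], e["start_idx"], e["end_idx"])
        for e in article.get("entities", [])
    ]

# Toy entry mirroring the submission samples in the Submitting section.
# Note: in those samples the end offset appears inclusive ("patients"
# spans 75-82); verify this against the released data.
sample = json.loads("""
{
  "34870091": {
    "entities": [
      {"start_idx": 75, "end_idx": 82, "location": "title",
       "text_span": "patients", "label": "human"}
    ]
  }
}
""")

for pmid, article in sample.items():
    print(pmid, entity_tuples(article))
```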

Alternative Formats

For those more familiar with CSV or tabular formats, the dataset is also provided in these formats. In this case, each of the fields mentioned above is stored in a separate file:
  • Metadata file
  • Entities file
  • Relations file
  • Mention-level relations file
  • Concept-level relations file
The CSV files use the pipe symbol (|) as a separator, while tabular files use the tab character (\t) for separation.
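For example, the pipe-separated files can be read with the standard csv module by setting the delimiter accordingly. The column names in this sketch are hypothetical; check them against the actual files:

```python
import csv
import io

# Hypothetical excerpt of the pipe-separated entities file; the real
# column names may differ from this sketch.
csv_text = (
    "pmid|start_idx|end_idx|location|text_span|label\n"
    "34870091|75|82|title|patients|human\n"
)

# csv.DictReader maps each row to a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter="|"))
print(rows[0]["text_span"])  # patients
```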

Entity and Relation Labels

The set of entities considered for annotations includes:

Entity Label URI Definition
Anatomical Location NCIT_C13717 Named locations of or within the body.
Animal NCIT_C14182 A non-human living organism that has membranous cell walls, requires oxygen and organic foods, and is capable of voluntary movement, as distinguished from a plant or mineral.
Biomedical Technique NCIT_C15188 Research concerned with the application of biological and physiological principles to clinical medicine.
Bacteria NCBITaxon_2 One of the three domains of life (the others being Eukarya and Archaea), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal.
Chemical CHEBI_59999 A chemical substance is a portion of matter of constant composition, composed of molecular entities of the same type or of different types. This category also includes metabolites, which in biochemistry are the intermediate or end product of metabolism, and neurotransmitters, which are endogenous compounds used to transmit information across the synapses.
Dietary Supplement MESH_68019587 Products in capsule, tablet or liquid form that provide dietary ingredients, and that are intended to be taken by mouth to increase the intake of nutrients. Dietary supplements can include macronutrients, such as proteins, carbohydrates, and fats, and/or micronutrients, such as vitamins, minerals, and phytochemicals.
Disease, Disorder, or Finding (DDF) NCIT_C7057 A condition that is relevant to human neoplasms and non-neoplastic disorders. This includes observations, test results, history and other concepts relevant to the characterization of human pathologic conditions.
Drug CHEBI_23888 Any substance which when absorbed into a living organism may modify one or more of its functions. The term is generally accepted for a substance taken for a therapeutic purpose, but is also commonly used for abused substances.
Food NCIT_C1949 A substance consumed by humans and animals for nutritional purposes.
Gene SNOMEDCT_67261001 A functional unit of heredity which occupies a specific position on a particular chromosome and serves as the template for a product that contributes to a phenotype or a biological function.
Human NCBITaxon_9606 Members of the species Homo sapiens.
Microbiome OHMI_0000003 This term refers to the entire habitat, including the microorganisms (bacteria, archaea, lower and higher eukaryotes, and viruses), their genomes (i.e., genes), and the surrounding environmental conditions.
Statistical Technique NCIT_C19044 A method of calculating, analyzing, or representing statistical data.

while the defined set of relations includes:

Head Entity Tail Entity Predicate
Anatomical Location Human / Animal Located in
Bacteria Bacteria / Chemical / Drug Interact
Bacteria DDF Influence
Bacteria Gene Change expression
Bacteria Human / Animal Located in
Bacteria Microbiome Part of
Chemical Anatomical Location / Human / Animal Located in
Chemical Chemical Interact / Part of
Chemical Microbiome Impact / Produced by
Chemical / Dietary Supplement / Drug / Food Bacteria / Microbiome Impact
Chemical / Dietary Supplement / Food DDF Influence
Chemical / Dietary Supplement / Drug / Food Gene Change expression
Chemical / Dietary Supplement / Drug / Food Human / Animal Administered
DDF Anatomical Location Strike
DDF Bacteria / Microbiome Change abundance
DDF Chemical Interact
DDF DDF Affect / Is a
DDF Human / Animal Target
Drug Chemical / Drug Interact
Drug DDF Change effect
Human / Animal / Microbiome Biomedical Technique Used by
Microbiome Anatomical Location / Human / Animal Located in
Microbiome Gene Change expression
Microbiome DDF Is linked to
Microbiome Microbiome Compared to
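The head/tail constraints in the table above can serve as a sanity check on predicted triples before submission. A minimal sketch transcribing only a few rows (the lowercase label and predicate casing follows the submission samples and is an assumption; extend the table with the remaining rows):

```python
# Partial transcription of the relation schema above; extend with the
# remaining rows before using it in practice.
SCHEMA = {
    ("bacteria", "microbiome"): {"part of"},
    ("microbiome", "DDF"): {"is linked to"},
    ("drug", "DDF"): {"change effect"},
}

def is_valid_triple(head: str, tail: str, predicate: str) -> bool:
    """True iff the (head, tail, predicate) combination is defined."""
    return predicate in SCHEMA.get((head, tail), set())

print(is_valid_triple("bacteria", "microbiome", "part of"))   # True
print(is_valid_triple("bacteria", "microbiome", "interact"))  # False
```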

For more details on entity and relation labels, refer to the Annotation Guidelines available for download HERE.


Dataset Statistics

The table below provides an overview of the key statistics for each dataset collection:

Collection Num of Documents Total Entities Avg Entities per Doc Total Relations Avg Rels per Doc
Train Gold 639 20'530 32.13 8'556 13.39
Train Silver 811 26'134 32.22 10'907 13.45
Train Silver 2025 499 15'275 30.61 10'616 21.27
Train Bronze 2972 89'987 30.28 29'692 9.99
Development Set 80 2'521 31.51 1'261 15.76

Train Gold Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Gold collection.

The Ratio column represents the proportion of each entity or relation label relative to the total number of entities or relations in the collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 6'909 0.337 10.81
Chemical 3'119 0.152 4.88
Microbiome 1'920 0.094 3.00
Human 1'786 0.087 2.79
Bacteria 1'535 0.075 2.40
Animal 1'094 0.053 1.71
Biomedical Technique 1'093 0.053 1.71
Anatomical Location 993 0.048 1.55
Dietary Supplement 769 0.037 1.20
Drug 379 0.018 0.59
Statistical Technique 361 0.018 0.56
Gene 301 0.015 0.47
Food 271 0.013 0.42

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 1'481 0.173 2.32
Located In 1'282 0.150 2.01
Target 1'177 0.138 1.84
Is Linked To 888 0.104 1.39
Affect 831 0.097 1.30
Interact 524 0.061 0.82
Impact 512 0.060 0.80
Used By 478 0.056 0.75
Change Abundance 297 0.035 0.46
Is A 280 0.033 0.44
Administered 240 0.028 0.38
Part Of 206 0.024 0.32
Strike 132 0.015 0.21
Change Effect 105 0.012 0.16
Change Expression 87 0.010 0.14
Produced By 29 0.003 0.05
Compared To 7 0.001 0.01

Train Silver Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Silver collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 8'504 0.325 10.49
Chemical 4'310 0.165 5.31
Microbiome 2'243 0.086 2.77
Bacteria 2'031 0.078 2.50
Human 2'000 0.077 2.47
Anatomical Location 1'661 0.064 2.05
Biomedical Technique 1'657 0.063 2.04
Animal 1'018 0.039 1.26
Dietary Supplement 831 0.032 1.02
Drug 629 0.024 0.78
Statistical Technique 444 0.017 0.55
Food 414 0.016 0.51
Gene 392 0.015 0.48

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 2'099 0.192 2.59
Located In 1'645 0.151 2.03
Target 1'254 0.115 1.55
Is Linked To 903 0.083 1.11
Affect 899 0.082 1.11
Is A 740 0.068 0.91
Interact 702 0.064 0.87
Impact 521 0.048 0.64
Part Of 441 0.040 0.54
Used By 428 0.039 0.53
Strike 343 0.031 0.42
Administered 318 0.029 0.39
Change Effect 258 0.024 0.32
Change Abundance 237 0.022 0.29
Change Expression 63 0.006 0.08
Produced By 52 0.005 0.06
Compared To 4 0.000 0.00

Train Silver 2025 Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Silver 2025 collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 5'584 0.366 11.19
Chemical 1'871 0.122 3.75
Microbiome 1'599 0.105 3.20
Human 1'199 0.078 2.40
Bacteria 1'129 0.074 2.26
Anatomical Location 856 0.056 1.72
Dietary Supplement 660 0.043 1.32
Biomedical Technique 655 0.043 1.31
Drug 500 0.033 1.00
Animal 483 0.032 0.97
Gene 319 0.021 0.64
Statistical Technique 258 0.017 0.52
Food 162 0.011 0.32

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 2'536 0.239 5.08
Is Linked To 1'710 0.161 3.43
Target 1'540 0.145 3.09
Located In 1'049 0.099 2.10
Impact 835 0.079 1.67
Change Abundance 535 0.050 1.07
Change Effect 469 0.044 0.94
Used By 391 0.037 0.78
Affect 383 0.036 0.77
Part Of 355 0.033 0.71
Interact 288 0.027 0.58
Produced By 247 0.023 0.49
Strike 124 0.012 0.25
Change Expression 92 0.009 0.18
Administered 34 0.003 0.07
Is A 18 0.002 0.04
Compared To 10 0.001 0.02

Train Bronze Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Bronze collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 34'560 0.384 11.63
Chemical 13'054 0.145 4.39
Microbiome 8'625 0.096 2.90
Anatomical Location 6'409 0.071 2.16
Human 5'874 0.065 1.98
Bacteria 4'959 0.055 1.67
Biomedical Technique 3'851 0.043 1.30
Animal 3'090 0.034 1.04
Dietary Supplement 3'017 0.034 1.02
Drug 2'358 0.026 0.79
Food 1'742 0.019 0.59
Gene 1'304 0.014 0.44
Statistical Technique 1'144 0.013 0.38

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 6'652 0.224 2.24
Target 4'163 0.140 1.40
Is Linked To 3'905 0.132 1.31
Located In 3'830 0.129 1.29
Affect 3'078 0.104 1.04
Is A 1'713 0.058 0.58
Impact 1'494 0.050 0.50
Used By 919 0.031 0.31
Strike 850 0.029 0.29
Interact 779 0.026 0.26
Part Of 601 0.020 0.20
Administered 554 0.019 0.19
Change Abundance 498 0.017 0.17
Produced By 317 0.011 0.11
Change Effect 293 0.010 0.10
Change Expression 44 0.001 0.01
Compared To 2 0.000 0.00

Dev Collection Statistics

The tables below provide detailed statistics for entities and relations in the Dev collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 793 0.315 9.91
Chemical 366 0.145 4.58
Microbiome 231 0.092 2.89
Human 192 0.076 2.40
Bacteria 183 0.073 2.29
Anatomical Location 169 0.067 2.11
Animal 152 0.060 1.90
Biomedical Technique 139 0.055 1.74
Drug 75 0.030 0.94
Dietary Supplement 64 0.025 0.80
Gene 63 0.025 0.79
Food 59 0.023 0.74
Statistical Technique 35 0.014 0.44

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Located In 209 0.166 2.61
Influence 177 0.140 2.21
Target 172 0.136 2.15
Affect 142 0.113 1.77
Is Linked To 134 0.106 1.68
Administered 76 0.060 0.95
Used By 74 0.059 0.93
Impact 64 0.051 0.80
Is A 49 0.039 0.61
Interact 43 0.034 0.54
Change Effect 34 0.027 0.42
Part Of 31 0.025 0.39
Strike 23 0.018 0.29
Change Abundance 16 0.013 0.20
Change Expression 7 0.006 0.09
Produced By 6 0.005 0.07
Compared To 4 0.003 0.05

Submitting

Participating teams should satisfy the following guidelines:

  • The runs should be submitted in the format described below;
  • Each group can submit a maximum of 25 runs for each subtask.

Submissions are handled via the BioASQ submission system linked below:

Available after test set release

Submission Guidelines

Participants are invited to submit results for any or all of the four subtasks (NER, NERD, M-RE, C-RE) independently.

In the following, a "run" refers to the predictions made by a single system on the test set.

All runs must be submitted in a single zipped file named "teamID_GutBrainIE_2026.zip". Within this zip archive, each run should be placed in a separate folder named "teamID_taskID_runID_systemDesc", without spaces or special characters, where:

  • teamID is the identifier of your team, chosen when you registered for CLEF 2026;
  • taskID is the identifier of the chosen task, i.e., one of the following tokens: T611 for subtask 6.1.1, T612 for subtask 6.1.2, T621 for subtask 6.2.1, and T622 for subtask 6.2.2;
  • runID is the identifier of the run, chosen by the participants and containing only letters and numbers (a-z, A-Z, 0-9);
  • systemDesc is an optional (short) string that further describes your submission.
The content of each folder consists of two files:
  • teamID_taskID_runID_systemDesc.json: A JSON file with predictions on the test set.
  • teamID_taskID_runID_systemDesc.meta: A metadata file briefly describing the approach, including:
    • Team ID.
    • Task ID.
    • Run ID.
    • Type of training applied.
    • Pre-processing methods.
    • Training data used.
    • Relevant details of the run.
    • A link to a GitHub repository enabling easy reproducibility of the run.

Please ensure that the zip archive you submit strictly adheres to this file-naming structure. The team name must remain consistent across all submissions, and the system name should reflect the approach used by the submitted run. The run number serves as a progressive identifier for multiple submissions of the same system.

Submissions not adhering to these guidelines might be rejected. Please ensure accuracy and completeness of all submitted files to streamline evaluation.

An example of a valid submission can be downloaded from HERE. Please note that the predicted entities and relations included in this example are dummy data, provided solely to illustrate the correct structure and formatting of a valid submission folder.

To ensure your submission follows the required structure, you can use the validation script available HERE.
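The folder and archive naming rules above can also be assembled programmatically. A minimal sketch with hypothetical identifiers (teamX, run1, bertBase) and a placeholder predictions dict; it is not a substitute for the downloadable example or the validation script:

```python
import json
import tempfile
import zipfile
from pathlib import Path

# Hypothetical identifiers; replace with the team ID chosen at registration.
team, task, run, desc = "teamX", "T611", "run1", "bertBase"
folder = f"{team}_{task}_{run}_{desc}"
predictions = {"34870091": {"entities": []}}  # your system's output goes here

with tempfile.TemporaryDirectory() as tmp:
    # One folder per run, holding the .json predictions and the .meta file.
    run_dir = Path(tmp) / folder
    run_dir.mkdir()
    (run_dir / f"{folder}.json").write_text(json.dumps(predictions, indent=2))
    (run_dir / f"{folder}.meta").write_text(
        f"Team ID: {team}\nTask ID: {task}\nRun ID: {run}\n"
    )
    # All runs go into a single archive named teamID_GutBrainIE_2026.zip.
    archive = Path(tmp) / f"{team}_GutBrainIE_2026.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        for f in sorted(run_dir.iterdir()):
            zf.write(f, arcname=f"{folder}/{f.name}")
    names = sorted(zipfile.ZipFile(archive).namelist())

print(names)
```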

Example of Submission Format

Submissions should follow the same JSON format used in the provided datasets and must include only the field associated with the specific subtask for which the submission is made:

  • Subtask 6.1.1 (NER): Include the entities field WITHOUT the uri subfield.
  • Subtask 6.1.2 (NERD): Include the entities field WITH the uri subfield.
  • Subtask 6.2.1 (Mention-level RE): Include the mention_level_relations field.
  • Subtask 6.2.2 (Concept-level RE): Include the concept_level_relations field.

Each entry must correspond to the PubMed ID of the article from the test set being considered.

Find below samples of valid entries for each submission type:

Subtask 6.1.1 (NER) Submission Sample

					
{
	"34870091": {
		"entities": [
			{
				"start_idx": 75,
				"end_idx": 82,
				"location": "title",
				"text_span": "patients",
				"label": "human"
			},
			{
				"start_idx": 250,
				"end_idx": 270,
				"location": "abstract",
				"text_span": "intestinal microbiome",
				"label": "microbiome"
			}
		]
	}
}
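Since each subtask admits all and only its own fields, a quick informal check that an entity entry carries exactly the NER fields (in particular, no uri) can complement the official validation script. The helper below is illustrative only:

```python
# The five fields required for a Subtask 6.1.1 (NER) entity entry.
REQUIRED_NER_FIELDS = {"start_idx", "end_idx", "location", "text_span", "label"}

def check_ner_entity(entity: dict) -> bool:
    """True iff the entity carries exactly the NER fields (no extras)."""
    return set(entity) == REQUIRED_NER_FIELDS

entity = {"start_idx": 75, "end_idx": 82, "location": "title",
          "text_span": "patients", "label": "human"}
print(check_ner_entity(entity))                                     # True
print(check_ner_entity({**entity, "uri": "http://example.org/x"}))  # False
```

The uri field would make the same entry valid for Subtask 6.1.2 (NERD) instead.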
				
				

Subtask 6.1.2 (NERD) Submission Sample

					
{
	"34870091": {
		"entities": [
			{
				"start_idx": 75,
				"end_idx": 82,
				"location": "title",
				"text_span": "patients",
				"label": "human",
				"uri": "http://id.nlm.nih.gov/mesh/D010361"
			},
			{
				"start_idx": 250,
				"end_idx": 270,
				"location": "abstract",
				"text_span": "intestinal microbiome",
				"label": "microbiome",
				"uri": "http://purl.obolibrary.org/obo/NCIT_C93019"
			}
		]
	}
}
				
				

Subtask 6.2.1 (Mention-Level RE) Submission Sample

				
{
	"34870091": {
		"mention_level_relations": [
			{
				"subject_text_span": "intestinal microbiome",
				"subject_label": "microbiome",
				"predicate": "located in",
				"object_text_span": "patients",
				"object_label": "human"
			}
		]
	}
}
				
				

Subtask 6.2.2 (Concept-Level RE) Submission Sample

				
{
	"34870091": {
		"concept_level_relations": [
			{
				"subject_uri": "http://purl.obolibrary.org/obo/NCIT_C93019",
				"subject_label": "microbiome",
				"predicate": "located in",
				"object_uri": "http://id.nlm.nih.gov/mesh/D010361",
				"object_label": "human"
			}
		]
	}
}
				
				

Validation Script

A validation script that can be used to verify the correctness of the submission format can be downloaded from HERE.

Participant Papers

Participants are expected to write a report describing their participation in GutBrainIE CLEF 2026, their proposed solution, what features have been used for prediction, the analysis of the experimental results and insights derived from them.

Participant reports will be peer-reviewed and accepted reports will be published in the CLEF 2026 Working Notes at CEUR-WS, indexed in DBLP and Scopus.

Participants are strongly encouraged to use LaTeX, for which a template is available for download HERE. The template follows the CEURART single-column style and not only defines the layout of the report but also provides a suggested structure and outlines the expected content for each section.

It is also possible to use ODT (LibreOffice) or DOCX (Microsoft Word) formats. In these cases, participants should download the official CEURART templates directly from the CEUR website HERE and ensure that the content and structure match those outlined in the provided LaTeX template.

Participant papers are expected to be 10-20 pages in length, excluding references.

The schedule for submission, revision, and camera ready of the participant report is detailed above in the Important Dates section.

Participant papers can be submitted via Easychair HERE.

Baselines & Evaluation

Official Baselines

The official baselines for the GutBrainIE CLEF 2026 challenge can be found and reproduced by accessing this GitHub repository.

The repository provides a complete implementation of the baseline pipeline, allowing participants to reproduce the results from scratch, test the pre-trained models, or evaluate the provided predictions (i.e., runs) of the baselines for all subtasks (NER, NERD, and RE). These runs can also be used by participants as references of valid submission formats.

The repository also contains the official evaluation script, which will be used to assess submitted runs after the system submission deadline. It is crucial that participants ensure their submitted runs are compatible with this script, as submissions that cannot be evaluated with it will be considered invalid. The evaluation script can also be used to obtain preliminary performance results on the development set.

The detailed results of the baselines on the development set can be found in the Results section.

Evaluation Metrics

Submitted runs will be evaluated using standard Information Extraction metrics to assess both per-label and overall system performance. The same metrics are employed for all subtasks (a detailed explanation can be found HERE):

In the equations below:

  • \(TP\) (True Positives) refers to the number of correctly predicted entities or relations.
  • \(FP\) (False Positives) refers to the number of wrongly predicted entities or relations.
  • \(FN\) (False Negatives) refers to the number of entities or relations in the ground truth that were not predicted.
  • \(\mathcal{L}\) is the set of labels referring to:
    • For subtasks 6.1.1 and 6.1.2: entity labels.
    • For subtasks 6.2.1 and 6.2.2: triples of (subject label, predicate, object label).
  • Macro-average Precision: \( P_{\text{macro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} \, \frac{TP_l}{TP_l \,+\, FP_l}}{|\mathcal{L}|} \)
  • Macro-average Recall: \( R_{\text{macro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} \, \frac{TP_l}{TP_l \,+\, FN_l}}{|\mathcal{L}|} \)
  • Macro-average F1-score: \( F1_{\text{macro}_{\text{avg}}} = 2 \, \times \, \frac{P_{\text{macro}_{\text{avg}}} \; \times \; R_{\text{macro}_{\text{avg}}}}{P_{\text{macro}_{\text{avg}}} \; + \; R_{\text{macro}_{\text{avg}}}} \)
  • Micro-average Precision: \( P_{\text{micro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} \; TP_l}{\sum_{l \in \mathcal{L}} \left( TP_l \; + \; FP_l \right)} \)
  • Micro-average Recall: \( R_{\text{micro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} \; TP_l}{\sum_{l \in \mathcal{L}} \left( TP_l \; + \; FN_l \right)} \)
  • Micro-average F1-score: \( F1_{\text{micro}_{\text{avg}}} = 2 \, \times \, \frac{P_{\text{micro}_{\text{avg}}} \; \times \; R_{\text{micro}_{\text{avg}}}}{P_{\text{micro}_{\text{avg}}} \; + \; R_{\text{micro}_{\text{avg}}}} \)

For each subtask, the reference metric for the final leaderboard will be the micro-average F1-score, as it better accounts for class imbalances. However, system rankings will also be provided for each of the metrics detailed above.
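For preliminary self-evaluation on the development set, the formulas above can be implemented directly from per-label TP/FP/FN counts. A sketch (returning 0.0 on zero denominators is one common convention; the official evaluation script may handle them differently):

```python
def micro_macro(per_label):
    """Macro- and micro-averaged precision/recall/F1 from per-label
    (TP, FP, FN) counts, following the formulas above."""
    def safe(num, den):
        return num / den if den else 0.0

    counts = list(per_label.values())
    # Macro: average the per-label precision/recall, then combine into F1.
    macro_p = sum(safe(tp, tp + fp) for tp, fp, _ in counts) / len(counts)
    macro_r = sum(safe(tp, tp + fn) for tp, _, fn in counts) / len(counts)
    macro_f1 = safe(2 * macro_p * macro_r, macro_p + macro_r)

    # Micro: pool the counts across labels first.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro_p, micro_r = safe(tp, tp + fp), safe(tp, tp + fn)
    micro_f1 = safe(2 * micro_p * micro_r, micro_p + micro_r)
    return (macro_p, macro_r, macro_f1), (micro_p, micro_r, micro_f1)

# Two hypothetical labels with (TP, FP, FN) counts.
macro, micro = micro_macro({"bacteria": (1, 1, 0), "human": (1, 0, 1)})
print(macro, micro)
```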

Baseline Results

The scores reported in the table below refer to the development set and reflect the inference capabilities of the proposed baseline models across the four subtasks.

Sub-task Macro-P Macro-R Macro-F1 Micro-P Micro-R Micro-F1
NER 0.7114 0.7480 0.7267 0.7782 0.8221 0.7996
NERD 0.3820 0.4045 0.3916 0.4281 0.4522 0.4398
Mention-level RE 0.3660 0.2862 0.3003 0.4462 0.3453 0.3893
Concept-level RE 0.1009 0.1021 0.0966 0.1409 0.1292 0.1348

FAQs

Find answers to common questions about the GutBrainIE challenge, dataset, and submissions.

Coming soon...

If you need any additional information, please get in touch with us by writing to the organizers listed below.

Organizers

(For further details about the organizers, visit the official website of the IIIA Hub Research Group.)

Organizer Support

(Supporters of the annotation effort)
Guglielmo Faggioli
University of Padua
Italy
Benedikt Kantz
Graz University of Technology
Austria
Simone Merlo
University of Padua
Italy
Riccardo Michelotto
University of Padua
Italy
Gaia Tussardi
University of Padua
Italy
Peter Waldert
Graz University of Technology
Austria