Participation Guidelines

GutBrainIE CLEF 2025

Task #6 of the BioASQ Lab 2025

(A HEREDITARY challenge)


Tasks

GutBrainIE CLEF 2025 is Task #6 of the BioASQ CLEF Lab 2025 and proposes a Natural Language Processing (NLP) challenge on biomedical texts within the context of the EU-supported project HEREDITARY.

Specifically, it is focused on extracting structured information from biomedical abstracts related to the gut microbiota and its connections with Parkinson's disease and mental health, aiming to foster the development of Information Extraction (IE) systems that can support experts in understanding the gut-brain interplay.

The GutBrainIE task is divided into two main subtasks: in the first, participants are asked to identify and classify specific text spans into predefined categories, while in the second they have to determine whether given relationships hold between entities.

The training data for all the subtasks are available upon registration for the challenge and the test data will be available around two weeks before the run submission deadline. The training set is divided into four parts:

  • Gold Collection: A highly curated dataset manually annotated by a team of 7 expert annotators from the University of Padua, Italy;
  • Platinum Collection: A subset of the gold annotations further validated by biomedical experts from the Radboud University Medical Center, Netherlands;
  • Silver Collection: A weakly curated dataset manually annotated by a team of about 40 Linguistics and Terminology students, trained and supervised by the experts;
  • Bronze Collection: A distantly supervised dataset comprising automatically generated annotations. Note that no manual revision has been performed on this set.

The test set is a held-out selection of documents from the gold and platinum collections, consisting exclusively of the title and abstract of each document, and is selected to ensure representativeness and full coverage of all entity and relation types.

Please see the Datasets and Important Dates sections for more information.

Subtask 6.1 - Named Entity Recognition

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to classify specific text spans (entity mentions) into one of 13 predefined categories, such as bacteria, chemical, or microbiome.

Entity mentions are expressed as tuples (entityLabel ; entityLocation (title/abstract) ; startOffset ; endOffset).
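
For illustration, the minimal sketch below (with an invented title) shows how such a tuple maps back to the text through character offsets. The end offset is treated as inclusive, mirroring the submission samples in the Submitting section; double-check this convention against the Annotation Guidelines.

# Minimal illustration with an invented title; offsets are character positions.
# The end offset is assumed to be inclusive, as in the submission samples below.
title = "Gut microbiome alterations in Parkinson's disease patients"
label, location, start, end = "human", "title", 50, 57   # (entityLabel ; entityLocation ; startOffset ; endOffset)
assert title[start:end + 1] == "patients"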

Subtask 6.2.1 - Binary Relation Extraction

Participants are provided with PubMed abstracts discussing the gut-brain interplay and are asked to identify which entities are in relation within a document. No relation type needs to be predicted within this subtask.

Relations are expressed as pairs (entityLabel1 ; entityLabel2).

Subtask 6.2.2 - Ternary Tag-Based Relation Extraction

Participants are asked to identify which entities are in relation within a document and predict the type of relation between them.

Relations are expressed as triples (entityLabel1 ; relationLabel ; entityLabel2).

Subtask 6.2.3 - Ternary Mention-Based Relation Extraction

Participants are required to identify the actual entities involved in a relation and predict the type of relation.

Relations are expressed as triples (entityMention1 ; relationLabel ; entityMention2).


For each task, the test set consists of a collection of documents, each including only the PubMed ID, title, and abstract. Given a test document, participants are required to extract tuples that include all the fields for the respective task. For example, in the NER task, tuples must contain all and only the following fields: (entityLabel ; entityLocation (title/abstract) ; startOffset ; endOffset). For more details, please refer to the submission format examples provided in the Submitting section.

Participating

To participate in GutBrainIE CLEF 2025, groups need to register for the BioASQ Lab at the following link:

Register

Access to the data will be given within 24 hours of registration.

Important Dates

  • Registration closes: April 25, 2025
  • Test data release: April 28, 2025
  • Runs submission deadline: May 10, 2025
  • Evaluation results out: May 19, 2025
  • Participant and position paper submission deadline: May 30, 2025
  • Notification of acceptance for participant and position papers: June 27, 2025
  • Camera-ready participant papers submission: July 7, 2025
  • GutBrainIE CLEF Workshop: September 9-12, 2025 during the CLEF Conference in Madrid, Spain

Datasets

Data Description

The data for this task is a set of titles and abstracts of biomedical articles retrieved from PubMed, focusing on the gut-brain interplay and its implications for neurological and mental health.

The datasets are organized into:
  • Platinum-Standard Annotations: Highest-quality annotations, expert-curated and reviewed by external biomedical specialists.
  • Gold-Standard Annotations: High-quality annotations, expert-curated.
  • Silver-Standard Annotations: Mid-quality annotations, created by trained students under expert supervision. The students are organized into two clusters:
    • StudentA, including those with more consistent annotation performance,
    • StudentB, including those with less consistent annotation performance.
  • Bronze-Standard Annotations: Automatically generated annotations, using fine-tuned GLiNER for Named Entity Recognition (NER) and fine-tuned ATLOP for Relation Extraction (RE).
To foster the development of effective IE systems for the subtasks proposed within GutBrainIE, the provided datasets include:
  • Entity Mentions: Text spans classified into predefined categories.
  • Relations: Associations between entities, specifying that a particular relationship holds between two entities.

Collections

  • Training and Development Data: Available upon registration. After registering, participants will receive an email with a download link.
  • Test Data: The test set will be available from April 28, 2025. This data will be used for final system evaluation.

Dataset Format

Annotations are provided in JSON format, for ease of use with NLP systems. Each entry in the dataset corresponds to a PubMed article, identified by its PubMed ID (PMID), and includes the following fields (a minimal loading sketch is shown after this list):
  • Metadata: Article-related information, including title, author, journal, year, and abstract. It also includes the identifier of the annotator: expert annotators are labeled as expert_1 to expert_7, student annotators are grouped into two clusters, identified as student_A and student_B, and automatically generated annotations are labeled as distant. Participants may use this information to assign different weights to annotations based on who performed them.
  • Entities: An array of objects where each object represents an annotated entity mention in the article, with the following attributes:
    • start and end indices: Character offsets marking the span of the entity mention.
    • location: Indicates if the entity mention is located in the title or in the abstract.
    • text_span: The actual text span of the entity mention.
    • label: The label assigned to the entity mention (e.g., bacteria, microbiome).
  • Relations: An array of objects where each object represents an annotated relationship between two entity mentions in the article, with the following attributes:
    • subject_start and subject_end indices: Character offsets marking the span of the subject entity mention.
    • subject_location: Indicates if the subject entity mention is located in the title or in the abstract.
    • subject_text_span: The actual text span of the subject entity mention.
    • subject_label: The label assigned to the subject entity mention (e.g., bacteria, microbiome).
    • predicate: The label assigned to the relationship.
    • object_start and object_end indices: Character offsets marking the span of the object entity mention.
    • object_location: Indicates if the object entity mention is located in the title or in the abstract.
    • object_text_span: The actual text span of the object entity mention.
    • object_label: The label assigned to the object entity mention (e.g., bacteria, microbiome).
  • Binary Tag-based Relations: Relations extracted from the Relations array, formatted as tag-based pairs of subject_label and object_label.
  • Ternary Tag-based Relations: Relations extracted from the Relations array, formatted as tag-based triples of subject_label, predicate, and object_label.
  • Ternary Mention-based Relations: Relations extracted from the Relations array, formatted as mention-based tuples of subject_text_span, subject_label, predicate, object_text_span, and object_label.
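
Below is a minimal Python sketch that loads one training collection and iterates over its annotations. The file name and the exact key names (e.g., start_idx, end_idx, annotator) are assumptions based on the submission samples in the Submitting section; adapt them to the released files.

import json

# Minimal loading sketch (not an official script); file and key names are assumptions.
with open("train_gold.json", encoding="utf-8") as f:
    articles = json.load(f)   # one entry per PubMed article, keyed by PMID

for pmid, article in articles.items():
    meta = article["metadata"]           # title, journal, year, annotator, ...
    annotator = meta.get("annotator")    # e.g., expert_1..expert_7, student_A, student_B, distant

    for ent in article.get("entities", []):
        # Character offsets, location (title/abstract), surface form, and label.
        print(pmid, annotator, ent["label"], ent["location"],
              ent["start_idx"], ent["end_idx"], ent["text_span"])

    for rel in article.get("relations", []):
        # Subject mention, predicate, and object mention of an annotated relation.
        print(pmid, rel["subject_text_span"], rel["predicate"], rel["object_text_span"])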

Alternative Formats

For those more familiar with CSV or tabular formats, the dataset is also provided in these formats. In this case, each of the fields mentioned above is stored in a separate file:
  • Metadata file
  • Entities file
  • Relations file
  • Binary Tag-based file
  • Ternary Tag-based file
  • Ternary Mention-based file
The CSV files use the pipe symbol (|) as a separator, while tabular files use the tab character (\t) for separation.
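
For example, the pipe-separated files can be read with the standard csv module; the file name "entities.csv" and its columns are assumptions, so check the header of the released files.

import csv

# Minimal sketch for reading a pipe-separated file (assumed file name).
with open("entities.csv", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f, delimiter="|"):
        print(row)   # one annotated entity mention per row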

Entity and Relation Labels

The set of entities considered for annotations includes:

Entity Label URI Definition
Anatomical Location NCIT_C13717 Named locations of or within the body. 
Animal NCIT_C14182 A non-human living organism that has membranous cell walls, requires oxygen and organic foods, and is capable of voluntary movement, as distinguished from a plant or mineral.
Biomedical Technique NCIT_C15188 Research concerned with the application of biological and physiological principles to clinical medicine.
Bacteria NCBITaxon_2 One of the three domains of life (the others being Eukarya and Archaea), also called Eubacteria. They are unicellular prokaryotic microorganisms which generally possess rigid cell walls, multiply by cell division, and exhibit three principal forms: round or coccal, rodlike or bacillary, and spiral or spirochetal.
Chemical CHEBI_59999 A chemical substance is a portion of matter of constant composition, composed of molecular entities of the same type or of different types. This category also includes metabolites, which in biochemistry are the intermediate or end product of metabolism, and neurotransmitters, which are endogenous compounds used to transmit information across the synapses.
Dietary Supplement MESH_68019587 Products in capsule, tablet or liquid form that provide dietary ingredients, and that are intended to be taken by mouth to increase the intake of nutrients. Dietary supplements can include macronutrients, such as proteins, carbohydrates, and fats; and/or micronutrients, such as vitamins; minerals; and phytochemicals.
Disease, Disorder, or Finding (DDF) NCIT_C7057 A condition that is relevant to human neoplasms and non-neoplastic disorders. This includes observations, test results, history and other concepts relevant to the characterization of human pathologic conditions.
Drug CHEBI_23888 Any substance which when absorbed into a living organism may modify one or more of its functions. The term is generally accepted for a substance taken for a therapeutic purpose, but is also commonly used for abused substances.
Food NCIT_C1949 A substance consumed by humans and animals for nutritional purposes.
Gene SNOMEDCT_67261001 A functional unit of heredity which occupies a specific position on a particular chromosome and serves as the template for a product that contributes to a phenotype or a biological function.
Human NCBITaxon_9606 Members of the species Homo sapiens.
Microbiome OHMI_0000003 This term refers to the entire habitat, including the microorganisms (bacteria, archaea, lower and higher eukaryotes, and viruses), their genomes (i.e., genes), and the surrounding environmental conditions.
Statistical Technique NCIT_C19044 A method of calculating, analyzing, or representing statistical data.

while the defined set of relations includes:

Head Entity | Tail Entity | Predicate
Anatomical Location | Human / Animal | Located in
Bacteria | Bacteria / Chemical / Drug | Interact
Bacteria | DDF | Influence
Bacteria | Gene | Change expression
Bacteria | Human / Animal | Located in
Bacteria | Microbiome | Part of
Chemical | Anatomical Location / Human / Animal | Located in
Chemical | Chemical | Interact / Part of
Chemical | Microbiome | Impact / Produced by
Chemical / Dietary Supplement / Drug / Food | Bacteria / Microbiome | Impact
Chemical / Dietary Supplement / Food | DDF | Influence
Chemical / Dietary Supplement / Drug / Food | Gene | Change expression
Chemical / Dietary Supplement / Drug / Food | Human / Animal | Administered
DDF | Anatomical Location | Strike
DDF | Bacteria / Microbiome | Change abundance
DDF | Chemical | Interact
DDF | DDF | Affect / Is a
DDF | Human / Animal | Target
Drug | Chemical / Drug | Interact
Drug | DDF | Change effect
Human / Animal / Microbiome | Biomedical Technique | Used by
Microbiome | Anatomical Location / Human / Animal | Located in
Microbiome | Gene | Change expression
Microbiome | DDF | Is linked to
Microbiome | Microbiome | Compared to

For more details on entity and relation labels, refer to the Annotation Guidelines available for download HERE.


Dataset Statistics

The table below provides an overview of the key statistics for each dataset collection:

Collection Num of Documents Total Entities Avg Entities per Doc Total Relations Avg Rels per Doc
Train Platinum 111 3'638 32.77 1'455 13.11
Train Gold 208 5'192 24.96 1'994 9.59
Train Silver 499 15'275 30.61 10'616 21.27
Train Bronze 749 21'357 28.51 8'165 11.90
Development Set 40 1'117 27.93 623 15.58

Train Platinum Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Platinum collection.

The Ratio column represents the proportion of each entity or relation label relative to the total number of entities or relations in the collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 1'232 0.339 11.10
Chemical 421 0.116 3.79
Bacteria 416 0.114 3.75
Human 401 0.110 3.61
Microbiome 379 0.104 3.41
Dietary Supplement 210 0.058 1.89
Biomedical Technique 150 0.041 1.35
Anatomical Location 149 0.041 1.34
Animal 101 0.028 0.91
Gene 64 0.018 0.58
Drug 50 0.014 0.45
Statistical Technique 45 0.012 0.41
Food 20 0.005 0.18

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 249 0.171 2.24
Target 216 0.148 1.95
Is Linked To 214 0.147 1.93
Located In 177 0.122 1.59
Affect 131 0.090 1.18
Impact 105 0.072 0.95
Used By 102 0.070 0.92
Change Abundance 69 0.047 0.62
Part Of 55 0.038 0.50
Change Expression 45 0.031 0.41
Interact 31 0.021 0.28
Strike 19 0.013 0.17
Administered 10 0.007 0.09
Change Effect 10 0.007 0.09
Is A 9 0.006 0.08
Produced By 7 0.005 0.06
Compared To 6 0.004 0.05

Train Gold Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Gold collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 2'027 0.390 9.75
Chemical 621 0.120 2.99
Microbiome 574 0.111 2.76
Human 474 0.091 2.28
Bacteria 281 0.054 1.35
Anatomical Location 262 0.050 1.26
Animal 226 0.044 1.09
Dietary Supplement 185 0.036 0.89
Biomedical Technique 183 0.035 0.88
Statistical Technique 119 0.023 0.57
Drug 118 0.023 0.57
Gene 64 0.012 0.31
Food 58 0.011 0.28

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Target 335 0.168 1.61
Is Linked To 317 0.159 1.52
Influence 303 0.152 1.46
Located In 299 0.150 1.44
Affect 276 0.138 1.33
Used By 96 0.048 0.46
Impact 92 0.046 0.44
Change Abundance 70 0.035 0.34
Interact 65 0.033 0.31
Is A 54 0.027 0.26
Administered 28 0.014 0.13
Part Of 17 0.009 0.08
Change Effect 17 0.009 0.08
Strike 14 0.007 0.07
Produced By 6 0.003 0.03
Change Expression 4 0.002 0.02
Compared To 1 0.001 0.00

Train Silver Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Silver collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 5'584 0.366 11.19
Chemical 1'871 0.122 3.75
Microbiome 1'599 0.105 3.20
Human 1'199 0.078 2.40
Bacteria 1'129 0.074 2.26
Anatomical Location 856 0.056 1.72
Dietary Supplement 660 0.043 1.32
Biomedical Technique 655 0.043 1.31
Drug 500 0.033 1.00
Animal 483 0.032 0.97
Gene 319 0.021 0.64
Statistical Technique 258 0.017 0.52
Food 162 0.011 0.32

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 2'536 0.239 5.08
Is Linked To 1'710 0.161 3.43
Target 1'540 0.145 3.09
Located In 1'049 0.099 2.10
Impact 835 0.079 1.67
Change Abundance 535 0.050 1.07
Change Effect 469 0.044 0.94
Used By 391 0.037 0.78
Affect 383 0.036 0.77
Part Of 355 0.033 0.71
Interact 288 0.027 0.58
Produced By 247 0.023 0.49
Strike 124 0.012 0.25
Change Expression 92 0.009 0.18
Administered 34 0.003 0.07
Is A 18 0.002 0.04
Compared To 10 0.001 0.02

Train Bronze Collection Statistics

The tables below provide detailed statistics for entities and relations in the Train Bronze collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 8'421 0.394 11.24
Chemical 2'719 0.127 3.63
Microbiome 2'299 0.108 3.07
Human 1'674 0.078 2.23
Bacteria 1'336 0.063 1.78
Anatomical Location 1'142 0.053 1.52
Animal 857 0.040 1.14
Dietary Supplement 787 0.037 1.05
Biomedical Technique 770 0.036 1.03
Drug 581 0.027 0.78
Statistical Technique 336 0.016 0.45
Gene 241 0.011 0.32
Food 194 0.009 0.26

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Influence 1'572 0.193 2.10
Target 1'357 0.166 1.81
Is Linked To 1'337 0.164 1.79
Affect 1'189 0.146 1.59
Located In 621 0.076 0.83
Is A 581 0.071 0.78
Impact 344 0.042 0.46
Used By 341 0.042 0.46
Change Effect 258 0.032 0.34
Change Abundance 229 0.028 0.31
Administered 132 0.016 0.18
Strike 109 0.013 0.15
Interact 61 0.007 0.08
Part Of 18 0.002 0.02
Produced By 12 0.001 0.02
Change Expression 3 0.000 0.00
Compared To 1 0.000 0.00

Dev Collection Statistics

The tables below provide detailed statistics for entities and relations in the Dev collection.

Entity Statistics

Entity Label Count Ratio Avg Per Doc
DDF 379 0.339 9.47
Chemical 131 0.117 3.27
Microbiome 127 0.114 3.17
Human 86 0.077 2.15
Anatomical Location 76 0.068 1.90
Animal 73 0.065 1.82
Drug 60 0.054 1.50
Bacteria 54 0.048 1.35
Gene 39 0.035 0.97
Biomedical Technique 36 0.032 0.90
Dietary Supplement 27 0.024 0.68
Food 26 0.023 0.65
Statistical Technique 3 0.003 0.07

Relation Statistics

Relation Label Count Ratio Avg Per Doc
Located In 99 0.159 2.48
Affect 89 0.143 2.23
Target 85 0.136 2.12
Is Linked To 74 0.119 1.85
Influence 68 0.109 1.70
Impact 54 0.087 1.35
Change Effect 32 0.051 0.80
Is A 31 0.050 0.78
Used By 17 0.027 0.42
Administered 16 0.026 0.40
Part Of 13 0.021 0.33
Strike 11 0.018 0.28
Change Abundance 11 0.018 0.28
Interact 9 0.014 0.23
Produced By 6 0.010 0.15
Change Expression 4 0.006 0.10
Compared To 4 0.006 0.10

Submitting

Participating teams should satisfy the following guidelines:

  • The runs should be submitted in the format described below;
  • Each group can submit a maximum of 25 runs for each subtask.

Submission Guidelines

Participants are invited to submit results for any or all of the four subtasks (NER, Binary Tag-Based RE, Ternary Tag-Based RE, Ternary Mention-Based RE) independently.

Submissions will be handled via the BioASQ submission system. More detailed information coming soon...

In the following, a "run" refers to the predictions made by a single system on the test set. Each run must be submitted as a single zipped file named "teamID_taskID_runID_systemDesc.zip", without spaces or special characters, where:

  • teamID is the identifier of your team, chosen when you registered for CLEF 2025;
  • taskID is the identifier of the chosen task, i.e., one of the following tokens: T61 for Subtask 6.1, T621 for Subtask 6.2.1, T622 for Subtask 6.2.2, and T623 for Subtask 6.2.3;
  • runID is the identifier of the run, chosen by the participants and containing only letters and numbers (a-z, A-Z, 0-9);
  • systemDesc is an optional (short) string that further describes your submission.
The content of the zipped file consists of two files:
  • teamID_taskID_runID_systemDesc.json: A JSON file with predictions on the test set.
  • teamID_taskID_runID_systemDesc.meta: A metadata file briefly describing the approach, including:
    • Team ID.
    • Task ID.
    • Run ID.
    • Type of training applied.
    • Pre-processing methods.
    • Training data used.
    • Relevant details of the run.
    • A link to a GitHub repository enabling easy reproducibility of the run.

Each zipped file should carefully follow this filename structure. The team name must remain consistent across all submissions, and the system name should reflect the approach used by the submitted run. The run number serves as a progressive identifier for multiple submissions of the same system. An example of a valid run submission will be available soon...

Submissions not adhering to these guidelines might be rejected. Please ensure accuracy and completeness of all submitted files to streamline evaluation.
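
The following minimal sketch shows one way to package a run according to the naming convention above; all identifiers, file contents, and the predictions dictionary are placeholder values, not prescribed ones.

import json
import zipfile

# Placeholder identifiers; replace with your registered team ID, task token, and run ID.
team_id, task_id, run_id, system_desc = "myTeam", "T61", "run1", "bertNER"
base = f"{team_id}_{task_id}_{run_id}_{system_desc}"

# Predictions keyed by PubMed ID, following the Subtask 6.1 submission format.
predictions = {
    "34870091": {
        "entities": [
            {"start_idx": 75, "end_idx": 82, "location": "title",
             "text_span": "patients", "label": "human"}
        ]
    }
}

with open(f"{base}.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)

# Free-text metadata describing the run (team/task/run IDs, training data, repository link, ...).
with open(f"{base}.meta", "w", encoding="utf-8") as f:
    f.write("Team ID: myTeam\nTask ID: T61\nRun ID: run1\n"
            "Training data: Platinum + Gold\nDetails: <short description>\n"
            "Repository: <link to GitHub repository>\n")

# Zip both files using the required naming convention.
with zipfile.ZipFile(f"{base}.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(f"{base}.json")
    zf.write(f"{base}.meta")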

Example of Submission Format

Submissions should follow the same JSON format used in the provided datasets and must include only the field associated with the specific subtask for which the submission is made:

  • Subtask 6.1 (NER): Include the entities field.
  • Subtask 6.2.1 (Binary Tag-Based RE): Include the binary_tag_based_relations field.
  • Subtask 6.2.2 (Ternary Tag-Based RE): Include the ternary_tag_based_relations field.
  • Subtask 6.2.3 (Ternary Mention-Based RE): Include the ternary_mention_based_relations field.

Each entry must correspond to the PubMed ID of the article from the test set being considered.

Find below samples of valid entries for each submission type:

Subtask 6.1 (NER) Submission Sample

					
{
	"34870091": {
		"entities": [
			{
				"start_idx": 75,
				"end_idx": 82,
				"location": "title",
				"text_span": "patients",
				"label": "human"
			},
			{
				"start_idx": 250,
				"end_idx": 270,
				"location": "abstract",
				"text_span": "intestinal microbiome",
				"label": "microbiome"
			}
		]
	}
}
				
				

Subtask 6.2.1 (Binary Tag-Based RE) Submission Sample

				
{
	"34870091": {
		"binary_tag_based_relations": [
			{
				"subject_label": "microbiome",
				"object_label": "human"
			}
		]
	}
}
				
				

Subtask 6.2.2 (Ternary Tag-Based RE) Submission Sample

				
{
	"34870091": {
		"ternary_tag_based_relations": [
			{
				"subject_label": "microbiome",
				"predicate": "located in",
				"object_label": "human"
			}
		]
	}
}
				
				

Subtask 6.2.3 (Ternary Mention-Based RE) Submission Sample

				
{
	"34870091": {
		"ternary_mention_based_relations": [
			{
				"subject_text_span": "intestinal microbiome",
				"subject_label": "microbiome",
				"predicate": "located in",
				"object_text_span": "patients",
				"object_label": "human"
			}
		]
	}
}
				
				

Validation Script

A validation script that can be used to verify the correctness of the submission format can be downloaded from HERE.
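
Before running the official script, a quick informal check of a Subtask 6.1 run can catch obvious problems; the file name and required keys below are assumptions based on the samples above, and this sketch does not replace the official validation script.

import json

# Informal sanity check for a Subtask 6.1 run (assumed file name and keys).
REQUIRED = {"start_idx", "end_idx", "location", "text_span", "label"}

with open("myTeam_T61_run1_bertNER.json", encoding="utf-8") as f:
    run = json.load(f)

for pmid, doc in run.items():
    assert pmid.isdigit(), f"{pmid} is not a valid PubMed ID"
    for ent in doc.get("entities", []):
        missing = REQUIRED.difference(ent)
        assert not missing, f"{pmid}: entity missing fields {missing}"
        assert ent["location"] in {"title", "abstract"}, f"{pmid}: invalid location"
        assert ent["start_idx"] <= ent["end_idx"], f"{pmid}: inverted offsets"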

Participant Papers

Participants are expected to write a report describing their participation in GutBrainIE CLEF 2025, including the proposed solution, the features used for prediction, an analysis of the experimental results, and the insights derived from them.

Participant reports will be peer-reviewed and accepted reports will be published in the CLEF 2025 Working Notes at CEUR-WS, indexed in DBLP and Scopus.

Participants are strongly encouraged to use LaTeX, for which a template is available for download HERE. The template follows the CEURART single-column style and not only defines the layout of the report but also provides a suggested structure and outlines the expected content for each section.

It is also possible to use ODT (LibreOffice) or DOCX (Microsoft Word) formats. In these cases, participants should download the official CEURART templates directly from the CEUR website HERE and ensure that the content and structure match those outlined in the provided LaTeX template.

Participant papers are expected to be 10-20 pages in length, excluding references.

The schedule for submission, revision, and camera ready of the participant report is detailed above in the Important Dates section.

Participant papers can be submitted via Easychair HERE.


Baselines & Evaluation

The official baselines for the GutBrainIE CLEF 2025 challenge can be found and reproduced by accessing this GitHub repository.

The repository provides a complete implementation of the baseline pipeline, allowing participants to reproduce the results from scratch, test the pre-trained models, or evaluate the provided predictions (i.e., runs) of the baselines for both NER and RE. These runs can also be used by participants as references for the valid submission format.

The repository also contains the official evaluation script, which will be used to assess submitted runs after the system submission deadline. It is crucial that participants ensure their submitted runs are compatible with this script, as submissions that cannot be evaluated with it will be considered invalid. The evaluation script can also be used to obtain preliminary performance results on the development set.

The detailed results of the baselines on the development set can be found in the Results section.

Evaluation Metrics

Submitted runs will be evaluated using standard Information Extraction metrics to assess both per-label and overall system performance. The same metrics are employed for all the subtasks (a detailed explanation of these can be found HERE):

In the equations below:

  • \(TP\) (True Positives) refers to the number of correctly predicted entities or relations.
  • \(FP\) (False Positives) refers to the number of wrongly predicted entities or relations.
  • \(FN\) (False Negatives) refers to the number of entities or relations in the ground truth that were not predicted.
  • \(\mathcal{L}\) is the set of labels referring to:
    • For subtask 6.1: entity labels.
    • For subtask 6.2.1: pairs of (subject label, object label).
    • For subtasks 6.2.2 and 6.2.3: triples of (subject label, predicate, object label).
  • Macro-average Precision: \( P_{\text{macro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} \, \frac{TP_l}{TP_l \,+\, FP_l}}{|\mathcal{L}|} \)
  • Macro-average Recall: \( R_{\text{macro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} \, \frac{TP_l}{TP_l \,+\, FN_l}}{|\mathcal{L}|} \)
  • Macro-average F1-score: \( F1_{\text{macro}_{\text{avg}}} = 2 \, \times \, \frac{P_{\text{macro}_{\text{avg}}} \; \times \; R_{\text{macro}_{\text{avg}}}}{P_{\text{macro}_{\text{avg}}} \; + \; R_{\text{macro}_{\text{avg}}}} \)
  • Micro-average Precision: \( P_{\text{micro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} TP_l}{\sum_{l \in \mathcal{L}} \left( TP_l \,+\, FP_l \right)} \)
  • Micro-average Recall: \( R_{\text{micro}_{\text{avg}}} = \frac{\sum_{l \in \mathcal{L}} TP_l}{\sum_{l \in \mathcal{L}} \left( TP_l \,+\, FN_l \right)} \)
  • Micro-average F1-score: \( F1_{\text{micro}_{\text{avg}}} = 2 \, \times \, \frac{P_{\text{micro}_{\text{avg}}} \; \times \; R_{\text{micro}_{\text{avg}}}}{P_{\text{micro}_{\text{avg}}} \; + \; R_{\text{micro}_{\text{avg}}}} \)

For each subtask, the reference metric for the final leaderboard will be the micro-average F1-score, as it better accounts for class imbalances. However, system rankings will also be provided for each of the metrics detailed above.
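
For reference, the sketch below computes these metrics exactly as defined above from per-document sets of gold and predicted items; it is an illustrative sketch, not the official evaluation script, and the input representation is an assumption.

from collections import defaultdict

def evaluate(gold, pred):
    """Compute the macro- and micro-averaged metrics defined above.

    `gold` and `pred` map each document ID to a set of (label, details) pairs,
    where `label` is the element of L the metrics are aggregated over (entity
    label, label pair, or label triple) and `details` carries any remaining
    information (e.g., offsets for NER, or None for tag-based RE).
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for doc_id in gold.keys() | pred.keys():
        g, p = gold.get(doc_id, set()), pred.get(doc_id, set())
        for label, _ in p & g:
            tp[label] += 1   # correctly predicted items
        for label, _ in p - g:
            fp[label] += 1   # predicted items not in the ground truth
        for label, _ in g - p:
            fn[label] += 1   # ground-truth items that were not predicted

    labels = set(tp) | set(fp) | set(fn)

    def prf(t, false_pos, false_neg):
        p = t / (t + false_pos) if t + false_pos else 0.0
        r = t / (t + false_neg) if t + false_neg else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Macro-average: mean of per-label precision/recall, then their harmonic mean.
    n = max(len(labels), 1)
    macro_p = sum(prf(tp[l], fp[l], fn[l])[0] for l in labels) / n
    macro_r = sum(prf(tp[l], fp[l], fn[l])[1] for l in labels) / n
    macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0

    # Micro-average: pool TP/FP/FN over all labels, then compute precision/recall/F1.
    micro_p, micro_r, micro_f1 = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    return {"macro_P": macro_p, "macro_R": macro_r, "macro_F1": macro_f1,
            "micro_P": micro_p, "micro_R": micro_r, "micro_F1": micro_f1}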

Baseline Results

The scores reported on the development set represent the actual evaluation of the inference capabilities of the proposed baseline models. Scores on the training collections are provided to give a sense of how well the models fit the data they were trained on; they should therefore not be interpreted as evaluation results, but rather as indicators of performance on seen data.

Subtask 6.1 (NER)

Dataset Macro-P Macro-R Macro-F1 Micro-P Micro-R Micro-F1
Dev 0.6627 0.7473 0.6917 0.7561 0.8272 0.7901
Train Platinum 0.7784 0.8967 0.8228 0.8516 0.8994 0.8749
Train Gold 0.8047 0.9226 0.8567 0.8653 0.9368 0.8997
Train Silver 0.7891 0.8347 0.8095 0.8402 0.8729 0.8562
Train Bronze 0.7618 0.8586 0.8023 0.8478 0.8954 0.8710

Subtask 6.2.1 (Binary Tag-Based RE)

Dataset Macro-P Macro-R Macro-F1 Micro-P Micro-R Micro-F1
Dev 0.5181 0.4330 0.4404 0.6585 0.4909 0.5625
Train Platinum 0.8542 0.9298 0.8672 0.8580 0.9430 0.8985
Train Gold 0.8195 0.9559 0.8667 0.7806 0.9680 0.8643
Train Silver 0.8509 0.8462 0.8414 0.8619 0.9027 0.8818
Train Bronze 0.2453 0.2454 0.2257 0.6778 0.6190 0.6471

Subtask 6.2.2 (Ternary Tag-Based RE)

Dataset Macro-P Macro-R Macro-F1 Micro-P Micro-R Micro-F1
Dev 0.4875 0.4231 0.4270 0.6585 0.4696 0.5482
Train Platinum 0.8533 0.9299 0.8677 0.8590 0.9435 0.8993
Train Gold 0.8228 0.9566 0.8696 0.7834 0.9670 0.8656
Train Silver 0.8547 0.8459 0.8431 0.8627 0.9022 0.8820
Train Bronze 0.1126 0.1182 0.1033 0.6618 0.5483 0.5998

Subtask 6.2.3 (Ternary Mention-Based RE)

Dataset Macro-P Macro-R Macro-F1 Micro-P Micro-R Micro-F1
Dev 0.2746 0.1906 0.2111 0.4574 0.2875 0.3531
Train Platinum 0.7741 0.8352 0.7856 0.7800 0.8681 0.8217
Train Gold 0.7586 0.8574 0.7883 0.7440 0.8927 0.8116
Train Silver 0.7997 0.7778 0.7828 0.7939 0.8071 0.8005
Train Bronze 0.0800 0.0867 0.0736 0.5084 0.4130 0.4557

Results

Leaderboards

Available after the systems submission deadline...

Details

Available after the systems submission deadline...

FAQs

Find answers to common questions about the GutBrainIE challenge, dataset, and submissions.

Coming soon...

If you need any additional information, please get in touch with us by writing to:

Organizers

(For further details about the organizers, visit the official website of the IIIA Hub Research Group.)