
The HEREDITARY Ontology for multi-center phenoclinical and genomics data

Release October 2024

Authors:
Laura Menotti (University of Padua)
Gianmaria Silvello (University of Padua)
Contributors:
Members of the HEREDITARY project, HEREDITARY Horizon Europe
Publisher:
Hereditary project
This project has received funding from the HEREDITARY Project, as part of the European Union's Horizon Europe research and innovation programme under grant agreement No GA101137074.
Genomics Data Modelling:
Ontology Documentation
Phenoclinical Data Modelling:
Ontology Documentation
License:
https://creativecommons.org/licenses/by/4.0/

Ontology Specification Document

This webpage describes the design and development of the HEREDITARY Ontology (HERO), which refines the Brainteaser Ontology (BTO) for modeling clinical and genomics data related to Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS).

HERO is currently structured to capture both the phenoclinical and the genomics aspects of these diseases. The ontology is divided into HERO-Clinical, which covers the phenoclinical aspects of brain-related diseases, and HERO-Genomics, which represents genomic data.

Future phases will refine HERO-Genomics to expand its genomics data capabilities and extend HERO-Clinical to cover additional brain-related diseases and mental disorders. HERO will also be expanded to cover the gut-brain interplay, with a focus on developing a gut microbiome ontology, an area that currently has limited foundational work and few related models. This approach is intended to keep HERO robust and scalable, so that it can capture complex biomedical data across a range of brain-related conditions and emerging research areas.

HERO-Clinical

HERO-Clinical models the phenoclinical aspects of brain-related diseases, focusing on ALS and MS, while preparing to extend coverage to Parkinson’s disease, Alzheimer’s disease, and various mental disorders. This extension of the BTO incorporates the diverse and heterogeneous clinical characteristics of the medical centers involved in the HEREDITARY project.

The complete documentation of HERO-Clinical, including technical details, is available at: hereditary.dei.unipd.it/ontology/phenoclinical/.

HERO-Genomics

HERO-Genomics supports the representation of genomics data specific to ALS and MS, enabling the documentation of gene mutations linked to these diseases. The current version includes a detailed framework for representing genetic testing data and for storing specific gene sequencing variations, such as Single Nucleotide Polymorphisms (SNPs).

The complete documentation of HERO-Genomics, including technical details, is available at: hereditary.dei.unipd.it/ontology/genomics/.

HERO Gut-Brain Axis

The HERO Gut-Brain Ontology (GBO) captures the interactions along the gut-brain axis. By formalizing both anatomical and microbiota relationships, GBO enables systematic exploration of how gut microbiota, metabolites, and immune signals influence neurological health and disease. Its primary focus is a broad spectrum of disorders, each connected to gut-brain interaction: Amyotrophic Lateral Sclerosis, Multiple Sclerosis, Parkinson's disease, Alzheimer's disease, Frontotemporal Dementia, stroke, diabetes, obesity, and neuropsychiatric conditions such as depression and ADHD. By integrating clinical phenotypes, experimental findings, and molecular pathways into a unified and consistent ontology, GBO aims to help researchers discover new risk factors, predict treatment responses, and raise public awareness of the critical role the gut-brain axis plays in human health.

The complete documentation of GBO, including technical details, is available at: https://hereditary.dei.unipd.it/ontology/gutbrain/.

Role of the HEREDITARY Ontology

HERO is the backbone of Semantic Data Integration, which uses Ontology-Based Data Access (OBDA) technology to query, aggregate, and join large heterogeneous data in a distributed manner through a single query language, SPARQL.

Figure 1 shows the architecture of the federated data analytics system. The polystore, based on the OBDA paradigm, uses GraphDB, an Ontotext graph DBMS integrated with ONTOP. This integration enables SPARQL querying over the HEREDITARY ontology. Rather than instantiating RDF data according to the ontology, we map the ontology to local data. ONTOP, integrated within GraphDB, unfolds each SPARQL query into SQL queries. These SQL queries are then executed on the integrated schema by the Virtual Data Lakehouse, powered by Dremio.
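
The unfolding step can be illustrated with a deliberately small sketch. It is not ONTOP's algorithm or API: the mapping table, the `ex:` terms, and the `patients` table are all hypothetical, and an in-memory SQLite database stands in for a local data source. The sketch only shows the general idea that an ontology-level pattern is rewritten into SQL over the source schema, so no RDF triples are materialized.

```python
import sqlite3

# Hypothetical mappings from ontology terms to SQL over a local schema.
# The "ex:" terms, the table, and the columns are invented for illustration.
MAPPINGS = {
    "ex:Patient": "SELECT id AS s FROM patients",
    "ex:hasDiagnosis": "SELECT id AS s, diagnosis AS o FROM patients",
}


def unfold(class_term, property_term):
    """Rewrite the pattern { ?s a <class> ; <property> ?o } into one SQL query."""
    class_sql = MAPPINGS[class_term]
    property_sql = MAPPINGS[property_term]
    return (
        f"SELECT c.s, p.o FROM ({class_sql}) AS c "
        f"JOIN ({property_sql}) AS p ON c.s = p.s ORDER BY c.s"
    )


def run_demo():
    """Build a toy local source, unfold one pattern, and run the resulting SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, diagnosis TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)", [(1, "ALS"), (2, "MS")]
    )
    rows = conn.execute(unfold("ex:Patient", "ex:hasDiagnosis")).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(unfold("ex:Patient", "ex:hasDiagnosis"))
    print(run_demo())
```

In a real OBDA deployment the mappings would be written in a dedicated mapping language and the rewriting would handle full SPARQL; the dictionary above is only a stand-in for that machinery.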

Importantly, the HEREDITARY approach maintains data integrity by not physically copying data into the lakehouse; instead, the data remains virtually accessible. The virtual data lakehouse federates queries to the local data sources (including relational databases, CSV files, and document stores), where they are executed. Additionally, a privacy layer sits atop the local data stores, ensuring that sensitive data remains within these local repositories and does not travel to the virtual lakehouse.
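
As a rough illustration of this federation principle, the sketch below runs a query inside each local store and returns only an aggregate to the caller. It does not reflect how Dremio or the HEREDITARY privacy layer are implemented: the two in-memory SQLite databases, the `patients` table, and the function names are assumptions made purely for the example.

```python
import sqlite3


def make_center(rows):
    """Create a toy local data store for one medical center (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (name TEXT, diagnosis TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)", rows)
    return conn


def local_count(conn, diagnosis):
    """Run the query inside the local store; only the count leaves it."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM patients WHERE diagnosis = ?", (diagnosis,)
    ).fetchone()
    return count


def federated_count(centers, diagnosis):
    """Send the same query to every center and combine the aggregates."""
    return sum(local_count(conn, diagnosis) for conn in centers)


# Toy data: patient-level rows stay inside each center's own database.
centers = [
    make_center([("A", "ALS"), ("B", "MS")]),
    make_center([("C", "ALS")]),
]

if __name__ == "__main__":
    print(federated_count(centers, "ALS"))
```

The caller of `federated_count` never sees patient-level rows, which mirrors, in a very simplified way, the idea that sensitive data stays in the local repositories.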

HEREDITARY architecture
Figure 1. The HEREDITARY ontology and its role in the overall HEREDITARY Federated Analytics architecture.