Prototyping with ontologies
2025 is here, with a lot on the horizon, but we found it interesting to look back at some of our work from 2024.
Our mantra is "Without metadata, your data make no sense", and we think that expressing metadata using the tools of the semantic web is a must.
Ontologies form a core component of this web of meaning, helping to produce quality open metadata and to achieve FAIR (Findable, Accessible, Interoperable, Reusable) science.
We explored many topics including:
- provenance with PROV-O and Structured Data Transformation History (SDTH)
- medical examinations with Semantic Sensor Network (SSN)
Documenting provenance
One of the main challenges of modern data management is to provide information about what happened to data over time, that is, to establish provenance.
PROV-O provides high-level concepts for describing which data (Entities) have been transformed (through Activities) by whom (Agents).
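To make these three concepts concrete, here is a minimal Python sketch that builds a small PROV-O graph as plain tuples and prints it as N-Triples. The PROV-O class and property IRIs are the standard ones; the `http://example.org/` base and the resource names (`raw-survey`, `clean-survey`, `cleaning-run`, `data-team`) are made up for illustration, and a real project would more likely rely on an RDF library than on string formatting.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base IRI, for illustration only


def prov_triples(source, result, activity, agent):
    """Return PROV-O triples stating that `agent` ran `activity`,
    which used `source` and generated `result`."""
    s, r, a, g = (EX + name for name in (source, result, activity, agent))
    return [
        (s, RDF_TYPE, PROV + "Entity"),
        (r, RDF_TYPE, PROV + "Entity"),
        (a, RDF_TYPE, PROV + "Activity"),
        (g, RDF_TYPE, PROV + "Agent"),
        (a, PROV + "used", s),
        (r, PROV + "wasGeneratedBy", a),
        (r, PROV + "wasDerivedFrom", s),
        (a, PROV + "wasAssociatedWith", g),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI tuples as N-Triples."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical names: a raw survey file cleaned by a data team.
triples = prov_triples("raw-survey", "clean-survey", "cleaning-run", "data-team")
print(to_ntriples(triples))
```

Each line of the output is one statement of the provenance graph, which can be loaded as-is into any triple store.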
We were able to demonstrate the creation of such an information graph directly from source material. More precisely, we developed a program that takes various unstructured documentation and produces RDF triples.
We extended our work to build an even more detailed lineage of how the data has been transformed, using a soon-to-be-published vocabulary named SDTH.
SDTH is intended as a means to open the black boxes that Activities are in PROV-O, in order to document which part of a dataset has been changed, for example by tracking how a particular set of instructions in a program has transformed a variable (e.g. the column of a table).
As a proof of concept, we implemented the building of an SDTH graph directly from a VTL (Validation and Transformation Language) program, with fairly good results.
For that, we developed a dedicated module in Trevas, the Java VTL engine, that produces the RDF triples during program execution. For more details, check out the blog post here.
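The idea of variable-level lineage can be sketched in a few lines of Python. This is only an illustration under strong assumptions: it is not how the Trevas module works (Trevas produces triples during execution, while this sketch merely scans the text), it only handles simple `target := expression;` assignments, and the `http://example.org/lineage#` namespace with its `assignsVariable` and `usesVariable` properties is a placeholder, not the SDTH vocabulary.

```python
import re

LIN = "http://example.org/lineage#"  # placeholder namespace, NOT the SDTH vocabulary
EX = "http://example.org/program/"   # hypothetical base IRI

# Matches simple statements of the form `target := expression;`
ASSIGNMENT = re.compile(r"(\w+)\s*:=\s*([^;]+);")
# Matches identifiers; assumes expressions contain only variables, numbers and operators
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def lineage_triples(script):
    """Return one 'assigns' triple and one 'uses' triple per variable for each step."""
    triples = []
    for number, match in enumerate(ASSIGNMENT.finditer(script), start=1):
        step = f"{EX}step/{number}"
        target, expression = match.group(1), match.group(2)
        triples.append((step, LIN + "assignsVariable", EX + "var/" + target))
        for name in sorted(set(IDENTIFIER.findall(expression))):
            triples.append((step, LIN + "usesVariable", EX + "var/" + name))
    return triples


# A two-step script written in a VTL-like assignment syntax.
script = """
ds_sum := ds_1 + ds_2;
ds_ratio := ds_sum / ds_1;
"""
lineage = lineage_triples(script)
for triple in lineage:
    print(triple)
```

Following the `usesVariable` and `assignsVariable` links from step to step is what lets a reader trace `ds_ratio` back to `ds_1` and `ds_2`.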
Documenting medical examinations
Longitudinal health research studies rely on qualitative data (collected, for example, through questionnaires) but also use quantitative data that can come directly from medical examinations.
When collecting data from the machines used for the examinations, SSN helps add meaning and structure to this rather raw data.
For example, we can represent the cardiac activity of a patient (here, the member of a health cohort) and the ECG recorder used to observe it:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ssn:  <http://www.w3.org/ns/ssn/> .

<patient/1234/activiteCardiaque> a sosa:ObservableProperty , ssn:Property ;
  rdfs:label "Cardiac activity of volunteer #1234"@en , "Activité cardiaque du volontaire n° 1234"@fr ;
  ssn:isPropertyOf <patient/1234> .

<enregistreurECG/27> a sosa:Sensor ;
  rdfs:label "ECG recorder #27"@en , "Enregistreur ECG n° 27"@fr ;
  rdfs:comment "ECG recorder #27"@en , "Enregistreur ECG n° 27"@fr ;
  sosa:isHostedBy <ces/1> .
Because SSN builds upon the Sensor, Observation, Sample, and Actuator ontology (SOSA), we use both ontologies to represent this information.
Of course, because we help lift existing systems toward quality data management, we also worked on mapping actual data to this semantic representation.
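As an illustration of what such a mapping can look like, the Python sketch below turns one raw record into a `sosa:Observation` expressed as N-Triples. The SOSA properties used are standard; everything else is an assumption made for the example: the record layout (`id`, `patient`, `sensor`, `bpm`, `time`), the `http://example.org/` base IRI, and the choice of a heart rate in beats per minute as the result.

```python
SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base IRI, for illustration only


def observation_to_ntriples(record):
    """Map one raw record (hypothetical layout) to SOSA observation triples."""
    obs = f"<{BASE}observation/{record['id']}>"
    patient = f"<{BASE}patient/{record['patient']}>"
    prop = f"<{BASE}patient/{record['patient']}/activiteCardiaque>"
    sensor = f"<{BASE}enregistreurECG/{record['sensor']}>"
    bpm, time = record["bpm"], record["time"]
    return [
        f"{obs} <{RDF_TYPE}> <{SOSA}Observation> .",
        f"{obs} <{SOSA}madeBySensor> {sensor} .",
        f"{obs} <{SOSA}hasFeatureOfInterest> {patient} .",
        f"{obs} <{SOSA}observedProperty> {prop} .",
        f'{obs} <{SOSA}hasSimpleResult> "{bpm}"^^<{XSD}integer> .',
        f'{obs} <{SOSA}resultTime> "{time}"^^<{XSD}dateTime> .',
    ]


# Hypothetical record, as it might be exported by an examination machine.
record = {"id": 42, "patient": 1234, "sensor": 27, "bpm": 72,
          "time": "2024-06-03T09:15:00Z"}
lines = observation_to_ntriples(record)
print("\n".join(lines))
```

The observation points back to the property and sensor resources described above, so the raw value gains its context (who was measured, with which device, and when) without changing the source system.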
As we continue our work on these two topics, we will provide more guidance on how to implement solutions. We might even open-source some of them, so stay tuned!