PolyMat - bringing semantics to polymer membrane research
Marta Dembska1,*,†, Martin Held2,† and Sirko Schindler1,†
1 German Aerospace Center (DLR), Institute of Data Science, Mälzerstraße 5, 07745 Jena, Germany
2 Helmholtz-Zentrum Hereon, Institute of Membrane Research, Max-Planck-Str. 1, 21502 Geesthacht, Germany
Abstract
Electronic laboratory notebooks (ELNs) and other data management systems are increasingly replacing
more traditional means of documenting experimental processes to foster (meta)data capture and reuse.
However, the use of free text hinders automated, large-scale data processing and invites a variety of
data quality issues. Ontologies can address this issue by providing the necessary vocabulary and context
— a quality that also puts them at the core of the FAIR principles. The reusability of free-text lab notes
increases enormously through semantic descriptions of experimental data. Still, to date, many domains
are lacking sufficiently expressive ontologies for more advanced features like consistency checks at data
collection or large cross-experiment analyses.
For the field of polymer membranes, we present PolyMat, an ontology to document laboratory
experiments and their results. Located at the crossroads of material science and chemistry, this ontology
acts as a bridge and can enable new cross-domain discoveries. It is specifically designed to be used in
electronic laboratory notebooks and applicable for standardisation of terminology there, to ease and
improve FAIR-compliant data collection from the get-go.
Keywords
Ontology, Polymer Membrane, Electronic Lab Notebook
1. Introduction
Membranes play a crucial role in various applications of chemical technology including desalina-
tion of seawater, removal of fertiliser residue from drinking water, purification of carbon dioxide
before storage, or separation of natural gas and hydrogen in a mixed gas grid – essential tasks
in a sustainable world dealing with the effects of climate change [1, 2]. Polymeric materials,
renowned for their outstanding processability, cost-effectiveness, and abundance, remain central
in membrane development. In the interdisciplinary field of membrane science and technology,
collaboration spans various disciplines. Polymer chemists contribute to the development of
innovative membrane materials, while physical chemists and mathematicians work on models
to characterise transport properties. Finally, chemical engineers design large-scale industrial
separation processes. The expanse of this domain adds a considerable layer of complexity when
Research Data Management (RDM) is concerned.
SeMatS 2024: The 1st International Workshop on Semantic Materials Science co-located with the 20th International
Conference on Semantic Systems (SEMANTiCS), September 17–19, Amsterdam, The Netherlands.
* Corresponding author.
† These authors contributed equally.
Email: marta.dembska@dlr.de (M. Dembska); martin.held@hereon.de (M. Held); sirko.schindler@dlr.de (S. Schindler)
ORCID: 0000-0002-8180-1525 (M. Dembska); 0000-0003-1869-463X (M. Held); 0000-0002-0964-4457 (S. Schindler)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Similar to many areas of research, membrane science and technology is undergoing a shift
towards an increasingly digitalised environment. With controlled laboratory experiments at its
core, electronic laboratory notebooks (ELNs) become an indispensable building block in this
transition by providing the interface to document each step in an experiment and sometimes
even (semi)automatically import results from measurement devices. This enhances the quality
of such records and improves reproducibility of experiments. The FAIR principles [3], which
focus on the findability, accessibility, interoperability, and reusability of research artefacts,
are a strong influence on this shift. Being at the core of modern RDM, they are commonly addressed by using
suitable ontologies in metadata descriptions. Semantic concepts complement free-text fields
by providing contextual frameworks for measurements, research software, and other research
outputs. The domain of polymer membranes lacks a formal semantic description, impeding a
common understanding of terminology by the various disciplines with their local vocabularies.
A commonly agreed membrane ontology would constitute a machine-actionable, structured knowledge
base, paralleling literature and enabling cognitive reprocessing by researchers or machine
learning models.
In this paper, we introduce PolyMat, an ontology aimed at knowledge representation within
the domain of polymer membrane research. PolyMat serves as a framework for capturing and
organising information relevant to laboratory experiments, their results, and the modelling of
laboratory processes. The ontology is specifically designed to support the semantic annotation
of experiments using ELNs, thus fostering low-effort FAIRification of experimental results.
Being at the intersection of material science and chemistry, it provides a bridge between these
domains and can enable cross-domain discoveries.
1.1. Online resources
Resource type: Ontology
Licence: CC BY 4.0 International
URL: https://w3id.org/polymat/
GitLab: https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology
DOI: 10.5281/zenodo.10286389
TIB terminology service: https://terminology.tib.eu/ts/ontologies/pmat
1.2. Terminology
The PolyMat ontology is crafted for a particular domain and thus requires certain domain-specific
terms. While they are also defined in the ontology itself, we repeat some here to ease understanding.
Monomer: A chemical substance whose molecules can be joined together to form a polymer.
Polymer: A material made of long, repeating chains of molecules.
Membrane: A barrier separating two volumes (both can be in similar phase states).
Module: Several membranes arranged sequentially.
2. Related work
2.1. Ontologies in materials science and engineering and chemistry
Materials science and engineering (MSE) comprises a vast community with diverse subfields,
presenting challenges in data interoperability and standardisation in RDM. One strategy for
achieving interoperability involves the reuse of existing ontologies and adherence to best
practices during the development. Further, alignment with Top Level Ontologies (TLOs) stream-
lines the harmonisation of different conceptual frameworks. To establish a standardised rep-
resentational ontology framework rooted in current understanding of materials modelling
and characterisation, the Elementary Multiperspective Material Ontology (EMMO) was developed [4].
It currently serves as the most commonly used TLO in MSE, followed by the Basic Formal
Ontology (BFO) [5]. However, as polymer membrane research incorporates aspects
of chemical engineering, existing MSE-specific ontologies frequently lack critical elements
describing corresponding processes, inhibiting their reuse.
Considering the prominence of polymer membranes and chemical processes, specific ontolo-
gies related to chemistry become essential components for knowledge representation within
the polymer membrane domain. Chemistry relies heavily on the accurate identification and
categorisation of chemical substances and their reactions. Information on chemical substances is
readily accessible in databases such as PubChem [6], CAS [7], or ChemSpider [8] where unique
numerical identifiers or chemical structure identifiers like SMILES [9] or InChI [10] facilitate
the linkage of data from diverse sources. Unfortunately, the multiple domain ontologies in
chemistry are mainly used without a coordinated approach unlike, e.g., the field of biomedicine
where the OBO Foundry [11] is the nucleus for most if not all relevant ontologies. Still, most of
the prominent ontologies for the chemistry domain adhere to the OBO Foundry principles and
are aligned to BFO [12]. In particular, the BFO- and OBO-based Chemical Entities of Biological
Interest (ChEBI) [13] ontology is used extensively in chemistry, as it integrates seamlessly
with other domain-specific ontologies and offers a comprehensive and well-documented
classification of chemical entities. It is also considered a domain ontology in MSE.
2.2. Electronic Laboratory Notebooks
As laboratories increasingly embrace digitalisation, ontologies become pivotal in providing
a structured understanding of chemical experiments. Thus, effectively integrating the use of
electronic laboratory notebooks (ELNs) implies simultaneously modelling the course of the
experiments themselves. This involves collecting data provenance of laboratory processes as
well as the measurement results themselves. Ontologies may be used in an ELN for knowledge
representation, classification, and connection of entries. Only a limited number of ELNs are
specifically designed for the chemical sciences. Chemotion [14] is a prominent web-based ELN
developed for the field of synthetic and analytical chemistry. Implementing the Chemical Methods
Ontology (CHMO) [15] and the Name Reaction Ontology (RXNO) [16], it is one of very
few free ELNs in chemistry employing ontologies. Users can annotate the method used in a chemical
analysis and the type of reaction with concepts from CHMO and RXNO, respectively.
These annotations enhance the Chemotion dataset by enabling connections based on method or
reaction types and facilitating ontology-based searches within the Chemotion repository [17].
Presently, the ELN facilitates data transfer to the Chemotion repository, a platform designed for
storing and managing chemical data that also offers various search options, including filtering
by CHMO.
Due to the diversity and interdisciplinary nature of MSE, general-purpose ELNs fall short
in addressing the entire range of the domain, and only a few specialised ELNs are available.
Kadi4Mat [18] is a generic ELN directed towards MSE experiments and simulation. Garabe-
dian et al. present several studies concerning the integration of a vocabulary [19, 20] and
an ontology [21] into Kadi4Mat, aiming at generating user interfaces and a completely FAIR
treatment of laboratory data [22, 23]. Furthermore, the non-generic ELN Herbie [24] is intended
to focus even more on the MSE aspects in the domain by enabling fully structured and validated
entries. Notably, this ELN stands out as one of the first in the domain of chemistry and MSE
that incorporates ontologies at its core. The system is designed to offer users the flexibility
to select and implement ontologies during form design, allowing for automatic form creation
based on the chosen ontology. This feature ensures a semantic connection between form fields
and their outputs in the ELN records, enhancing data integration and usability.
3. PolyMat ontology development
The motivation behind the PolyMat ontology stems from two key objectives: the documentation
of laboratory processes and knowledge representation within the domain. Because polymer
membrane research uniquely combines aspects of materials science, chemistry, and chemical
engineering, reusing existing domain-specific ontologies alone is challenging. Despite these challenges, by establishing relationships between
domain-specific terms and connecting them with existing domain ontologies, the semantic
annotation of laboratory metadata becomes possible. This documentation also enhances RDM
practices in the domain of polymer membrane research, aligning with the ongoing digitalisation
initiatives in the field. These efforts of establishing digital processes and tools encompass both
bottom-up and top-down approaches, without necessarily implying organised consortia.
In terms of knowledge representation, the PolyMat ontology is designed to address
future needs for modelling laboratory processes. By semantically representing a substantial
portion of knowledge, which has so far been documented only in paper notebooks or resides
within researchers’ minds, we can capture this information and make it accessible for both
human and machine use. The ultimate goal is to seamlessly integrate the ontology with ELNs,
contributing to a searchable, queryable, more efficient, and streamlined research environment.
3.1. Methodology
The ontology was constructed utilising the Linked Open Terms (LOT) methodology [25], recog-
nised as a suitable framework for crafting ontologies and vocabularies tailored for industry
projects. The LOT methodology is known for its lightweight and iterative approach, which
proved useful when involving domain experts without formal ontology engineering back-
grounds. Built on existing methodologies, LOT thoroughly addresses various crucial aspects
of PolyMat ontology development. The intentional inclusion of significant ontology reuse
supports ongoing community development, which aligns well with the methodology. Given our
emphasis on aligning with industrial development alongside academic, research, and software
development initiatives, adopting LOT is the most suitable approach for our context. Moreover,
LOT’s focus on crafting ontologies and vocabularies for Linked Data generation makes it a
natural fit for our scope and ontology development process.
[Figure 1 diagram. Creation: polymer synthesis, membrane fabrication, envelope production, module fabrication. Analysis: chemical analysis, membrane morphology, gas membrane performance, liquid membrane performance, module performance.]
Figure 1: Laboratory experiments workflow in the polymer membrane domain. The workflow includes
two main groups of methods: creation and analysis. Creation involves synthesising polymers and
fabricating them into membranes used to produce envelopes and fabricate modules. Analysis involves
chemical characterisation of polymers, morphological examination of membranes, and performance
evaluation of membranes and modules. The arrows indicate the possible order of methods in a given
workflow but the starting point depends on a specific use case.
3.1.1. Requirements specification
The requirements formulation process relied on close collaboration among ontology developers,
domain experts, and future ontology users. The primary task for ontology developers was
to thoroughly familiarise themselves with the specifics of the scientific work in the field of
polymer membrane research. This was achieved through hybrid collaboration between domain
experts and ontology developers, including an on-site research stay where domain experts were
accompanied in their daily work. Besides direct conversations, domain data sources included
posters, paper laboratory notebooks, experiment protocols, and others. This collaboration
provided valuable insights into the procedures, intricacies of laboratory processes, and the typical
infrastructure found in such institutions. Given the nature of this field, the processing of both
physical data (e.g., substrates or laboratory materials) and electronic data (e.g., measurements
obtained from digital output devices) played a significant role. Particularly, aspects like data
storage and access were crucial from an RDM perspective. It was necessary to gather information
about processes and practices other than laboratory-related ones, including the document
circulation cycle, planning and preparation of laboratory processes, and the flow of information
within the organisation. The on-site stay was followed up by remote collaboration between
domain experts and ontology developers. Based on the acquired domain knowledge, use cases,
research applications, and competency questions were defined in accordance with the LOT
methodology. The PolyMat ontology emerged as a result of this collaboration.
3.1.2. Scope
First, ontology developers organised laboratory processes within the polymer membrane domain,
as shown in Figure 1, to define essential terminology and the structuring of the PolyMat ontology
at an early stage. Additionally, the ontology developers identified Chemotion and Herbie as
suitable ELNs for the research applications relevant in polymer membrane research due to
their ability to manage complex, multi-step processes with standardised protocols enabling
reproducibility and quality assurance. Herbie’s modular structure and lifecycle management
capabilities allow for comprehensive tracking of experiments, while Chemotion’s specialised
features for chemical documentation and data transfer support interdisciplinary work and
facilitate easy data sharing within the scientific community. Both systems provide the flexibility
and integration needed for efficient research in polymer membranes. The close collaboration
between developers ensures a seamless integration of both ELNs.
3.1.3. Use cases
The PolyMat ontology is intended to fulfil the following objectives and use cases:
#1 The primary aim of the PolyMat ontology is knowledge representation within the domain
of polymer membrane research.
#2 The ontology is designed to document scientific work and laboratory processes. This is
instrumental in promoting good practices in data management and advancing RDM.
#3 The integration of the PolyMat ontology is planned for the Herbie ELN, a system presently
undergoing development at Helmholtz-Zentrum Hereon.
#4 The PolyMat ontology is set to provide the basis for a future modelling of laboratory
processes, complementing another ontology currently in development.
3.1.4. Competency Questions
The initiation of the ontology implementation process involved the collaborative creation of a
set of Competency Questions (CQs). Following the visit, ontology engineers, aided by domain
experts, determined the scope and purpose of the ontology. This led to the selection of groups of
upper-level concepts that met the requirements. Using these concepts (e.g. experiment, method,
device, characteristic, person, data), 19 CQs were formulated. Several examples of these CQs
are presented below:
1 What polymers were used to fabricate a given membrane?
4 What method was used for a given polymer synthesis or membrane fabrication?
9 Who were the persons performing a given experiment?
10 What equipment was used in a given experiment?
11 What characteristics of polymers or membranes are recorded?
15 Where are the results of a given calculation stored?
18 When was an experiment request submitted?
The full list of the CQs is available in the GitLab repository of the ontology1. The CQs, integral
to the ontology development framework, played a pivotal role in shaping the definition of
necessary classes and properties. To systematically address each CQ, examples of use were
developed. These were later leveraged in the creation of an example dataset.
1 https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology/-/blob/main/doc/competency_questions.md
3.1.5. Ontology reuse
To enhance the interoperability of the PolyMat ontology with other domain-specific ontologies,
we adopted a soft reuse approach. This approach, as demonstrated by Poveda Villalón et al. [26],
involves referencing the IRIs of terms from the reused ontology rather than importing it as a whole. This decision was made to minimise
unnecessary overhead, especially when importing ontologies that contain a substantial number
of concepts, not all of which may be directly applicable. This consideration was particularly
relevant for the reuse of ChEBI. Given that the PolyMat ontology is intended for implementation
in Herbie, which will be synchronised with Chemotion, a primary objective was to achieve a
high level of compatibility with Chemotion. Additionally, the reuse of other chemistry-specific
ontologies necessitated BFO compatibility, requiring the use of selected BFO classes. Further, the
incorporation of selected object properties from the OBO Relation Ontology (RO) [27] in PolyMat
was deemed essential. Since research data provenance documentation is a key requirement,
this was achieved through the reuse of selected parts of the PROV Ontology (PROV-O) [28].
Additionally, to facilitate the formalisation of measurements and their results, the Ontology of
Units of Measure (OM) [29] was incorporated.
3.2. Conceptualisation and implementation
This ontology was formulated in the OWL language utilising Protégé [30]. Based on an
understanding of the domain, the main entities and relationships were defined and subsequently organised
into a structured taxonomy. Properties and attributes of these entities were identified to cap-
ture essential characteristics. Furthermore, elements from existing ontologies were seamlessly
integrated, enhancing the ontology’s comprehensiveness and interoperability.
Validation of intermediate states as well as the final result took two forms: Firstly, at various
stages we discussed the current state of the ontology with domain experts to verify its content-
related correctness and with ELN developers to align it with their current development. Secondly,
we created a set of artificial test data to evaluate whether the ontology can accurately represent
the domain of polymer membranes. Adherence to data privacy and intellectual property
protection regulations necessitates the exclusion of authentic laboratory data. Instead, test
data was created by domain experts to emulate real laboratory experiences. These examples
serve as the foundation for manually constructing a knowledge graph that fulfils all aspects
outlined in the competency questions. It is important to emphasise that the knowledge graph
was exclusively developed for ontology evaluation purposes. To streamline the process, not
every potential connection for each individual was generated. Subsequently, SPARQL queries
were defined and evaluated based on the previously developed CQs. This not only
confirmed that the posed requirements could be fulfilled but also allowed us to spot regressions
throughout the development, similar to unit tests in software development.
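The regression-style validation described above can be illustrated with a minimal sketch. It uses neither SPARQL nor an RDF library; triples are plain Python tuples of prefixed names, and each CQ is paired with a query function and an expected answer. All ex: instance names, the helper functions, and the choice of CQ 9 and CQ 10 are assumptions made for illustration and are not taken from the PolyMat repository.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# All ex: instances are invented for illustration only.
TRIPLES = {
    ("ex:experiment1", "rdf:type", "pmat:Experiment"),
    ("ex:experiment1", "prov:used", "ex:device1"),
    ("ex:device1", "rdf:type", "pmat:Device"),
    ("ex:experiment1", "prov:wasAssociatedWith", "ex:person1"),
    ("ex:person1", "rdf:type", "pmat:Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def instances_linked(triples, subject, predicate, cls):
    """Objects linked from `subject` via `predicate` that are typed as `cls`."""
    return sorted(
        o for (_, _, o) in match(triples, s=subject, p=predicate)
        if match(triples, s=o, p="rdf:type", o=cls)
    )


# Each competency question is paired with a query and its expected answer.
CQ_CHECKS = {
    "CQ9: persons performing a given experiment": (
        lambda g: instances_linked(
            g, "ex:experiment1", "prov:wasAssociatedWith", "pmat:Person"),
        ["ex:person1"],
    ),
    "CQ10: equipment used in a given experiment": (
        lambda g: instances_linked(
            g, "ex:experiment1", "prov:used", "pmat:Device"),
        ["ex:device1"],
    ),
}


def run_checks(triples):
    """Return the names of all competency questions whose answer changed."""
    return [name for name, (query, expected) in CQ_CHECKS.items()
            if query(triples) != expected]
```

Removing the type assertion for ex:person1 from the data makes run_checks report CQ 9, which mirrors how a failing query would flag a regression after a change to the ontology or the test data.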
[Figure 2 diagram. Classes shown: Method, Device, Software, InhouseSoftware, Person, Experiment, Module, Membrane, Polymer, Monomer, Substance, Calculation, Quantity (om:Quantity), Measure (om:Measure), Data, Location. Relations shown: prov:used, prov:generated, prov:wasAssociatedWith, prov:wasAttributedTo, ro:has output, ro:contains, ro:has characteristic, ro:located in, om:has Value.]
Figure 2: Summary of the PolyMat ontology. Details omitted for readability. PolyMat classes are colored
in yellow and classes from re-used ontologies in green. Subclasses are indicated by a white-headed
arrow.
3.2.1. Documentation and publication
Besides the documentation inherently part of the ontology itself, we created human-readable
documentation using WIDOCO [31]. The namespace of PolyMat, https://w3id.org/polymat/,
relies on the w3id.org service for persistent IRIs. All intermediate results like CQs or
the aforementioned SPARQL queries as well as the final ontology are published at
https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology under a CC BY 4.0 licence. We welcome
contributions, comments, and other feedback via the corresponding issue tracker and are
committed to further maintaining and advancing the ontology.
4. The PolyMat ontology
In polymer membrane research, specialised terminology is used to describe various concepts
and characteristics associated with membrane materials, fabrication, devices, and systems. The
core components of terminology include structure and properties of monomers, polymers and
membranes, creation techniques of polymers, modules and envelopes, and characterisation
methods of all the polymer-based elements for membrane technology. Figure 2 illustrates the
most significant elements of the PolyMat ontology structure (selected elements and relations
have been omitted, despite their discussion in the text, for better readability)2.
2 Here and in the following examples, we use the following namespace prefixes: chebi:, om:, pmat:, prov:, rdfs:, ro:, and xsd:.
At the core of the ontology is pmat:Experiment characterised by the pmat:Persons
involved, the tools used (e.g., pmat:Software, pmat:Device, or pmat:Method), the actual
objects of interest, i.e. pmat:Polymers and pmat:Monomers, and the measured characteristics
being recorded in form of om:Quantity. Further, instances of pmat:Method describe the
underlying workflow of experiments as well as the om:Quantitys being involved.
[a pmat:Experiment] prov:used [a pmat:Device] ;
    prov:used [a pmat:Method] ;
    prov:used [a pmat:Monomer] ;
    prov:generated [a pmat:Polymer] ;
    ro:has_output [a om:Quantity] .
A pmat:ExperimentRequestSubmission precedes the execution of a pmat:Experiment and
encapsulates the planning phase of actual experiments. This documents
a rather administrative process but provides a link to resources residing in different
systems. A pmat:ComputationalModel can participate in modelling in two ways. First,
the model can be utilised in the preparation of an experiment scenario as depicted in Figure 2.
Second, the application of PROV-O via prov:used allows indicating whether the model
was actively employed during the execution of a pmat:Experiment.
[a pmat:ExperimentRequestSubmission]
    ro:is_basis_for_realizable [a pmat:Experiment] .
[a pmat:ComputationalModel]
    ro:has_role_in_modelling [a pmat:Experiment] .
[a pmat:Experiment]
    prov:used [a pmat:ComputationalModel] .
pmat:Polymer, pmat:Monomer, and pmat:Membrane and their respective subclasses are
the main objects of interest in experiments. Their chemical relationships are represented using
ro:contains. Instances are further described by possible additional physical features given
by instances of om:Quantity or links to external databases, e.g., via pmat:hasCASNr.
[a pmat:Polymer] ro:contains [a pmat:Monomer] .
[a pmat:Membrane] ro:contains [a pmat:Polymer] .
pmat:Copolymer rdfs:subClassOf pmat:Polymer .
pmat:MembraneForLiquids rdfs:subClassOf pmat:Membrane .
pmat:FlatSheetEnvelopeModule rdfs:subClassOf pmat:Module .
[a pmat:Membrane] ro:has_characteristic [a om:Quantity] .
[a pmat:Monomer] pmat:hasCASNr "XXX-XX-X"^^xsd:string .
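To illustrate how the ro:contains pattern supports a question such as CQ 1 ("What polymers were used to fabricate a given membrane?"), the following minimal sketch walks the composition links in two explicit steps. It is a plain-Python stand-in for a graph query; the ex: instances and both helper functions are hypothetical, and subclass typing (e.g., pmat:Copolymer as a kind of pmat:Polymer) is deliberately not resolved here.

```python
# Illustrative data only: one membrane, one polymer, two monomers.
TRIPLES = [
    ("ex:membrane1", "rdf:type", "pmat:Membrane"),
    ("ex:membrane1", "ro:contains", "ex:polymer1"),
    ("ex:polymer1", "rdf:type", "pmat:Polymer"),
    ("ex:polymer1", "ro:contains", "ex:monomer1"),
    ("ex:polymer1", "ro:contains", "ex:monomer2"),
    ("ex:monomer1", "rdf:type", "pmat:Monomer"),
    ("ex:monomer2", "rdf:type", "pmat:Monomer"),
]


def contained(triples, subject, cls):
    """Direct ro:contains targets of `subject` that are typed as `cls`."""
    types = {(s, o) for (s, p, o) in triples if p == "rdf:type"}
    return sorted(
        o for (s, p, o) in triples
        if s == subject and p == "ro:contains" and (o, cls) in types
    )


def monomers_of_membrane(triples, membrane):
    """Monomers reached in two steps: membrane -> polymer -> monomer."""
    found = set()
    for polymer in contained(triples, membrane, "pmat:Polymer"):
        found.update(contained(triples, polymer, "pmat:Monomer"))
    return sorted(found)


polymers = contained(TRIPLES, "ex:membrane1", "pmat:Polymer")
monomers = monomers_of_membrane(TRIPLES, "ex:membrane1")
```

The two-step traversal is written out explicitly because the sketch makes no assumption about whether ro:contains is treated as transitive by a reasoner.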
Physical features (om:Quantity) often cannot be measured directly but are the results of
more or less complex calculations. This fact is represented by instances of pmat:Calculation
and its relation to the resulting instances of om:Quantity. The software used to execute
those calculations is documented in instances of pmat:Software. More details can be in-
cluded via, e.g., pmat:hasVersion or pmat:usesRuntimeEnvironment. In general, results
of pmat:Calculation will also include other outputs represented by instances of pmat:Data.
[a pmat:Calculation] ro:has_output [a om:Quantity] ;
    ro:has_output [a pmat:Data] ;
    prov:used [
        a pmat:Software ;
        pmat:hasVersion "1.2.3" ;
        pmat:usesRuntimeEnvironment [
            a pmat:RuntimeEnvironment
        ]
    ] .
For the resulting instances of pmat:Data, the location, both physical and digital, is defined
via ro:located_in. This especially considers cases when results are exclusively stored on
offline media or cannot be accessed via generic interfaces. While such cases are slowly fading
out, they remain an important use case to consider.
[a pmat:Data] ro:located_in [a pmat:Location] .
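The combination of ro:has_output and ro:located_in is what makes CQ 15 ("Where are the results of a given calculation stored?") answerable. A minimal sketch of that lookup, again with plain Python tuples instead of an RDF store, could look as follows; the ex: instances and the function names are assumptions for illustration.

```python
# Illustrative data only: one calculation with a quantity and a data output.
TRIPLES = [
    ("ex:calculation1", "rdf:type", "pmat:Calculation"),
    ("ex:calculation1", "ro:has_output", "ex:quantity1"),
    ("ex:calculation1", "ro:has_output", "ex:data1"),
    ("ex:quantity1", "rdf:type", "om:Quantity"),
    ("ex:data1", "rdf:type", "pmat:Data"),
    ("ex:data1", "ro:located_in", "ex:location1"),
    ("ex:location1", "rdf:type", "pmat:Location"),
]


def objects(triples, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


def result_locations(triples, calculation):
    """Map each pmat:Data output of a calculation to its recorded locations."""
    locations = {}
    for output in objects(triples, calculation, "ro:has_output"):
        # Quantities are outputs too, but only pmat:Data carries a location.
        if "pmat:Data" in objects(triples, output, "rdf:type"):
            locations[output] = objects(triples, output, "ro:located_in")
    return locations


stored = result_locations(TRIPLES, "ex:calculation1")
```

A pmat:Data output without any ro:located_in statement would appear with an empty list, which makes missing storage information easy to spot.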
The provenance record also covers the people involved, both in the experiments and in
software development, at least for tools maintained in-house. Their contributions are
encoded with the PROV-O properties prov:wasAttributedTo and prov:wasAssociatedWith.
[a pmat:InhouseSoftware] prov:wasAttributedTo [a pmat:Person] .
[a pmat:Experiment] prov:wasAssociatedWith [a pmat:Person] .
Kindly note that the overview of Figure 2 omits large parts of the details modelled in Poly-
Mat. Especially class hierarchies have been omitted for readability’s sake. Examples include
the above-mentioned hierarchy of pmat:Polymer, pmat:Monomer, pmat:Membrane, and
pmat:Module but extend to other areas like pmat:Software and pmat:Calculation.
Proxy classes (e.g., pmat:ChemicalEntity) allow the reuse of ontologies like ChEBI, which
represent substances, to provide more detail on some aspects. However, this approach does not make
any assumptions on which ontology is used to provide the corresponding entities.
4.1. Examples of use
We generated examples of use, which formed the foundation for producing test data3. This
sample data serves two purposes: Firstly, it documents the intended use of the ontologies by
providing examples for common scenarios. Secondly, together with SPARQL queries4 for each
CQ, it allowed us to validate the ontology at several stages. The results of each SPARQL query
were assessed with respect to their expected completeness and accuracy. This process was
repeated for every example of use (and their respective SPARQL queries) after completing the
knowledge graph to validate the ontology. Consequently, we also provide examples for all CQs
alongside the ontology5. An example is illustrated in Figure 3.
3 https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology/-/blob/main/data/example_data.ttl
4 https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology/-/blob/main/doc/queries.md
5 https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology/-/blob/main/doc/competency_questions.md
[Figure 3 diagram: experiment43 (Experiment) prov:used synthesis32 (CationicPolymerisation) and prov:generated butylRubber1 (BlockCopolymer; chebi:Isobutylene-isoprene copolymer).]
Figure 3: CQ#4. What method was used for a given polymer synthesis or membrane fabrication?
Answer: The experiment involves the use of cationic polymerization as the method to synthesise a
polymer instance known as butyl rubber.
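The example of Figure 3 can be replayed in a minimal sketch that answers CQ 4 for the butyl rubber instance. The instance names experiment43, synthesis32, and butylRubber1 come from the figure; the ex: prefix, the pmat: prefix on the class names, the rdfs:subClassOf link from CationicPolymerisation to pmat:Method, and both helper functions are assumptions made for illustration.

```python
TRIPLES = [
    ("ex:experiment43", "rdf:type", "pmat:Experiment"),
    ("ex:experiment43", "prov:used", "ex:synthesis32"),
    ("ex:experiment43", "prov:generated", "ex:butylRubber1"),
    ("ex:synthesis32", "rdf:type", "pmat:CationicPolymerisation"),
    ("ex:butylRubber1", "rdf:type", "pmat:BlockCopolymer"),
    # Assumed hierarchy, only so that the method can be recognised as such.
    ("pmat:CationicPolymerisation", "rdfs:subClassOf", "pmat:Method"),
]


def is_instance_of(triples, node, cls):
    """True if `node` has type `cls` directly or via rdfs:subClassOf."""
    pending = [o for (s, p, o) in triples if s == node and p == "rdf:type"]
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(
            o for (s, p, o) in triples
            if s == current and p == "rdfs:subClassOf"
        )
    return False


def methods_for(triples, product):
    """(method instance, method class) pairs for experiments generating `product`."""
    experiments = [s for (s, p, o) in triples
                   if p == "prov:generated" and o == product]
    answers = []
    for experiment in experiments:
        for (s, p, used) in triples:
            if (s == experiment and p == "prov:used"
                    and is_instance_of(triples, used, "pmat:Method")):
                classes = [o for (s2, p2, o) in triples
                           if s2 == used and p2 == "rdf:type"]
                answers.extend((used, c) for c in classes)
    return sorted(answers)


answer = methods_for(TRIPLES, "ex:butylRubber1")
```

The subclass walk matters here because prov:used also links experiments to devices and substances; only resources that are methods should be returned for CQ 4.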
5. Discussion
We briefly examine the lessons learned from the ontology development process, the impact of
this work, its application, and future directions.
Ontology development process. Knowledge representation and adapting modelling for
implementation in ELNs posed significant challenges. First, the quality, reusability, and potential
harmonisation of existing ontologies are critical considerations. This led us to diverge from
the EMMO model and lean more towards ontologies from chemistry, as they exhibit better
compatibility and less complex structures. The latter was crucial to ensure the active involvement
of domain experts. Here, LOT’s support for an agile, iterative approach, particularly concerning
ontology reuse, fulfils these requirements. Next, existing ontologies were often very
complex, hindering their use in ELNs. Even the ontology implementation in Chemotion initially
limits the number of concepts before allowing users to annotate their data.
Scientific impact. In the context of polymer membrane research, existing ontologies often
suffer from limitations in quality, completeness, and interoperability, as revealed during the
development of PolyMat. Additionally, in this domain, the description of knowledge and experi-
mental procedures are closely intertwined. To ensure a model that is relatively user-friendly, we
decided against detailed separation of concepts, as seen in EMMO or other typically modular
ontologies. However, the need to integrate descriptions and other laboratory processes led to
the development of a separate ontology (to be published soon) solely for modelling laboratory
procedures at the project’s inception. Both ontologies are designed to be mutually compatible
and incorporate data provenance. After we shared our concept with other MSE laboratories, it
inspired similar approaches in which they develop their own modules representing domain knowledge
that can be aligned with laboratory process modelling. Specific use cases will be detailed in
a paper currently in preparation. The adoption of PROV-O contributes to the reproducibility
of experimental results. Moreover, employing the reference model description of provenance
ensures interoperability across different domains.
Application and future directions. The integrated development of ELN and ontology, as
exemplified by Herbie and PolyMat, enables adjustments to be made cohesively before the
testing or implementation phase and serves multiple purposes. First, the ontology will be
used to generate SHACL shapes, facilitating the automated creation of forms within the ELN.
Second, the ontology is used to semantically annotate (meta)data entries within the ELN forms. This
is especially crucial for free-text entries, which are typically more challenging for machines
to interpret. With PolyMat, scientific resources can be described in a semantically consistent
way. This semantic annotation is particularly valuable for future records of laboratory protocols,
as it supports text mining applications. Third, irrespective of the data type, the ontology is
used to enrich the metadata of records from specific experiments. This enrichment aims to
contextualise them within a broader framework by fostering more efficient interconnection
between specific fields of ELN forms.
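The first purpose, deriving forms from SHACL shapes, can be sketched as follows. This is a minimal illustration only, not Herbie's implementation: the shape is written as a plain Python dictionary rather than parsed RDF, its keys mirror the SHACL terms sh:targetClass, sh:property, sh:path, sh:name, sh:datatype and sh:minCount, and the "ex:" identifiers as well as the datatype-to-widget mapping are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-in for a SHACL node shape.
# The "ex:" identifiers are placeholders, not actual PolyMat terms.
MEMBRANE_SHAPE = {
    "sh:targetClass": "ex:Membrane",
    "sh:property": [
        {"sh:path": "ex:hasPolymer", "sh:name": "Polymer",
         "sh:datatype": "xsd:string", "sh:minCount": 1},
        {"sh:path": "ex:hasThickness", "sh:name": "Thickness",
         "sh:datatype": "xsd:decimal", "sh:minCount": 1},
        {"sh:path": "ex:hasComment", "sh:name": "Comment",
         "sh:datatype": "xsd:string"},
    ],
}

# Assumed mapping from XSD datatypes to form widget types.
WIDGETS = {"xsd:string": "text", "xsd:decimal": "number", "xsd:boolean": "checkbox"}


@dataclass(frozen=True)
class FormField:
    label: str       # text shown to the user
    predicate: str   # property used to annotate the entered value
    widget: str      # kind of input element
    required: bool   # True if the shape demands at least one value


def shape_to_form(shape):
    """Turn each property shape into one form field description."""
    fields = []
    for prop in shape.get("sh:property", []):
        fields.append(FormField(
            label=prop.get("sh:name", prop["sh:path"]),
            predicate=prop["sh:path"],
            widget=WIDGETS.get(prop.get("sh:datatype"), "text"),
            required=prop.get("sh:minCount", 0) >= 1,
        ))
    return fields


form = shape_to_form(MEMBRANE_SHAPE)
for field in form:
    print(field)
```

Since every field keeps the predicate it was derived from, values entered into the form are already tied to an ontology term when they are stored.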
While this approach may be limited to a particular setup, it allows for necessary adjustments
to both ELN and ontology simultaneously. As the integration of ontologies in ELNs is currently
under intense development, diverse coupling mechanisms for different ELNs may arise. Still,
the semantic structure provided by ontologies ensures the interoperability of (meta)data from
two ELNs that adhere to the same ontology. Future plans include expanding the existing
knowledge graph with data from use cases across diverse institutions within the same domain.
PolyMat, as a pioneering ontology tailored for ELNs, will be reused to model the knowledge of
further membrane research groups, thereby fostering more efficient interconnection between
specific fields of forms. It will establish semantically annotated connections between records of
laboratory activities to enhance the reproducibility and queryability of results.
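The interoperability argument can be illustrated with a minimal Python sketch: records exported from two ELNs that annotate their data with the same ontology terms can be merged and queried together without any mapping step. The triples, prefixes and identifiers below are hypothetical placeholders, and plain tuples stand in for an RDF graph.

```python
# Each export is a set of (subject, predicate, object) triples.
# All identifiers are hypothetical; "ex:" stands for the shared ontology.
ELN_A = {
    ("a:sample1", "rdf:type", "ex:Membrane"),
    ("a:sample1", "ex:hasPolymer", "ex:PolymerX"),
}
ELN_B = {
    ("b:run7", "rdf:type", "ex:Membrane"),
    ("b:run7", "ex:hasPolymer", "ex:PolymerX"),
    ("b:run8", "rdf:type", "ex:Membrane"),
    ("b:run8", "ex:hasPolymer", "ex:PolymerY"),
}


def instances_with(graph, cls, predicate, value):
    """Find instances of `cls` that have `value` for `predicate`."""
    typed = {s for s, p, o in graph if p == "rdf:type" and o == cls}
    return sorted(
        s for s, p, o in graph
        if s in typed and p == predicate and o == value
    )


# Merging is a plain set union because both exports share the vocabulary.
merged = ELN_A | ELN_B
matches = instances_with(merged, "ex:Membrane", "ex:hasPolymer", "ex:PolymerX")
print(matches)  # ['a:sample1', 'b:run7']
```

A single query over the merged data returns matching records from both notebooks, which is the effect the shared semantic structure is meant to achieve.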
6. Summary and conclusion
We presented PolyMat, an ontology for polymer membrane research specifically designed
for use within electronic laboratory notebooks (ELNs). It allows documenting laboratory
experiments and thus represents a building block towards the further FAIRification of the
domain. The development has been conducted in close collaboration with both domain experts
and practitioners as well as the developers of a specific ELN, Herbie. We provided a detailed
account of its development including Competency Questions as well as SPARQL queries to verify
the ontology’s comprehensiveness and suitability. All resources are published under a permissive
license and are publicly available under both the namespace URL, https://w3id.org/polymat/,
and the corresponding development repository,
https://gitlab.com/dlr-dw/poly-ontologies/polymat-ontology. The ontology is also accessible through
the TIB Terminology Service, https://terminology.tib.eu/ts/ontologies/pmat. We remain committed to
advancing the ontology and to adapting it to emerging needs, especially in the context of the
continuing spread of ELNs.
Acknowledgments
MD acknowledges the Helmholtz Information & Data Science Academy (HIDA) for their financial
support enabling a short-term research stay at the Institute of Membrane Research of the
Helmholtz-Zentrum Hereon in Geesthacht to get familiar with the domain and create PolyMat.
Acknowledgements are also due to over a dozen domain experts whose work MD could observe
and who served as points of contact for laboratory work and other processes. MD and MH
thank Fabian Kirchner for rich discussions on the integration into the Herbie ELN.
References
[1] V. Abetz, T. Brinkmann, M. Dijkstra, K. Ebert, D. Fritsch, K. Ohlrogge, D. Paul, K.-V.
Peinemann, S. Pereira-Nunes, N. Scharnagl, M. Schossig, Developments in Membrane Research:
from Material via Process Design to Industrial Application, Advanced Engineering
Materials 8 (2006) 328–358. doi:10.1002/adem.200600032.
[2] V. Abetz, Isoporous block copolymer membranes, Macromolecular Rapid Communications
36 (2015) 10–22. doi:10.1002/marc.201400556.
[3] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak,
N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes,
T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-
Beltran, A. J. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. ’t Hoen, R. Hooft, T. Kuhn,
R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra,
M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A.
Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg,
K. Wolstencroft, J. Zhao, B. Mons, The FAIR Guiding Principles for scientific data
management and stewardship, Scientific Data 3 (2016). doi:10.1038/sdata.2016.18.
[4] G. Goldbeck, E. Ghedini, A. Hashibon, G. Schmitz, J. Friis, A reference language and
ontology for materials modelling and interoperability, in: Proceedings of the 2019 NAFEMS
World Congress, 2019.
[5] A. De Baas, P. D. Nostro, J. Friis, E. Ghedini, G. Goldbeck, I. M. Paponetti, A. Pozzi, A. Sarkar,
L. Yang, F. A. Zaccarini, D. Toti, Review and Alignment of Domain-Level Ontologies for
Materials Science, IEEE Access 11 (2023) 120372–120401. doi:10.1109/ACCESS.2023.3327725.
[6] S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B. A.
Shoemaker, J. Wang, B. Yu, J. Zhang, S. H. Bryant, PubChem Substance and Compound
databases, Nucleic Acids Research 44 (2015) D1202–D1213. doi:10.1093/nar/gkv951.
[7] P. G. Dittmar, R. E. Stobaugh, C. E. Watson, The Chemical Abstracts Service Chemical
Registry System. I. General Design, Journal of Chemical Information and Computer
Sciences 16 (1976) 111–121. doi:10.1021/ci60006a016.
[8] H. E. Pence, A. Williams, ChemSpider: An Online Chemical Information Resource, Journal
of Chemical Education 87 (2010) 1123–1124. doi:10.1021/ed100697w.
[9] D. Weininger, SMILES, a chemical language and information system. 1. Introduction to
methodology and encoding rules, Journal of Chemical Information and Computer Sciences
28 (1988) 31–36. doi:10.1021/ci00057a005.
[10] S. Heller, A. McNaught, S. Stein, D. Tchekhovskoi, I. Pletnev, InChI - the worldwide
chemical structure identifier standard, Journal of Cheminformatics 5 (2013).
doi:10.1186/1758-2946-5-7.
[11] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck,
A. Ireland, C. J. Mungall, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S.-A. Sansone, R. H.
Scheuermann, N. Shah, P. L. Whetzel, S. Lewis, The OBO Foundry: coordinated evolution
of ontologies to support biomedical data integration, Nature Biotechnology 25 (2007)
1251–1255. doi:10.1038/nbt1346.
[12] P. Strömert, J. Hunold, A. Castro, S. Neumann, O. Koepler, Ontologies4Chem: the landscape
of ontologies in chemistry, Pure and Applied Chemistry 94 (2022) 605–622.
doi:10.1515/pac-2021-2007.
[13] K. Degtyarenko, P. de Matos, M. Ennis, J. Hastings, M. Zbinden, A. McNaught, R. Alcantara,
M. Darsow, M. Guedj, M. Ashburner, ChEBI: a database and ontology for chemical entities
of biological interest, Nucleic Acids Research 36 (2007) D344–D350.
doi:10.1093/nar/gkm791.
[14] P. Tremouilhac, C. Lin, P. Huang, Y. Huang, A. Nguyen, N. Jung, F. Bach, R. Ulrich,
B. Neumair, A. Streit, S. Bräse, The Repository Chemotion: Infrastructure for Sustainable
Research in Chemistry, Angewandte Chemie International Edition 59 (2020) 22771–22778.
doi:10.1002/anie.202007702.
[15] C. Batchelor, CHMO – Chemical Methods Ontology, 2019.
URL: https://github.com/rsc-ontologies/rsc-cmo.
[16] C. Batchelor, RXNO: reaction ontologies, 2020.
URL: https://github.com/rsc-ontologies/rxno.
[17] P. Tremouilhac, P.-C. Huang, C.-L. Lin, Y.-C. Huang, A. Nguyen, N. Jung, F. Bach, S. Bräse,
Chemotion Repository, a Curated Repository for Reaction Information and Analytical
Data, Chemistry–Methods 1 (2021) 8–11. doi:10.1002/cmtd.202000034.
[18] N. Brandt, L. Griem, C. Herrmann, E. Schoof, G. Tosato, Y. Zhao, P. Zschumme, M. Selzer,
Kadi4Mat: A Research Data Infrastructure for Materials Science, Data Science Journal 20
(2021). doi:10.5334/dsj-2021-008.
[19] N. Garabedian, I. Bagov, K. Weber, C. Greiner, B. Klusemann, F. Bock, M. Held,
F. Wieland, C. Eschke, MetaCook: FAIR Vocabularies Cookbook, 2022.
doi:10.5281/zenodo.7125643.
[20] I. Bagov, M. Flachmann, N. Garabedian, T. Tiezema, Y. Li, J. Rau, I. Blatter, A. Dollmann,
M. Seitz, C. Greiner, Vocabulary of Materials Tribology Lab at KIT, 2023.
doi:10.5281/zenodo.7709546.
[21] N. Garabedian, I. Bagov, TriboDataFAIR Ontology, 2023. doi:10.5281/zenodo.5720197.
[22] N. Garabedian, P. J. Schreiber, N. Brandt, P. Zschumme, I. L. Blatter, A. Dollmann, C. Haug,
D. Kümmel, Y. Li, F. Meyer, C. E. Morstein, J. S. Rau, M. Weber, J. Schneider, P. Gumbsch,
M. Selzer, C. Greiner, Generating FAIR research data in experimental tribology, Scientific
Data 9 (2022). doi:10.1038/s41597-022-01429-9.
[23] N. Brandt, N. Garabedian, E. Schoof, P. J. Schreiber, P. Zschumme, C. Greiner, M. Selzer,
Managing FAIR Tribological Data Using Kadi4Mat, Data 7 (2022) 15.
doi:10.3390/data7020015.
[24] F. Kirchner, C. Eschke, A.-L. Höhme, M. Meller, A. Foremny, M. Held, S. A. Sahim,
R. Willumeit-Römer, Herbie - The Semantic Laboratory Notebook & Research Database,
2024. doi:10.5281/zenodo.12205430.
[25] M. Poveda-Villalón, A. Fernández-Izquierdo, M. Fernández-López, R. García-Castro, LOT:
An industrial oriented ontology engineering framework, Engineering Applications of
Artificial Intelligence 111 (2022) 104755. doi:10.1016/j.engappai.2022.104755.
[26] M. Poveda Villalón, M. C. Suárez-Figueroa, A. Gómez-Pérez, The Landscape of Ontology
Reuse in LinkedData, in: Proceedings Ontology Engineering in a Data-driven World
(OEDW 2012), 2012.
[27] C. Mungall, J. A. Overton, D. Osumi-Sutherland, M. Haendel, Mbrush, RO, 2015. URL:
http://obofoundry.org/ontology/ro.html. doi:10.5281/zenodo.32899.
[28] T. Lebo, S. Sahoo, D. Mcguinness, K. Belhajjame, J. Cheney, D. Corsar, D. Garijo, S. Soiland-
Reyes, S. Zednik, J. Zhao, PROV-O: The PROV ontology, 2013.
[29] H. Rijgersberg, M. van Assem, J. Top, Ontology of units of measure and related concepts,
Semantic Web 4 (2013) 3–13. doi:10.3233/sw-2012-0069.
[30] M. A. Musen, The protégé project: a look back and a look forward, AI Matters 1 (2015)
4–12. doi:10.1145/2757001.2757003.
[31] D. Garijo, WIDOCO: a wizard for documenting ontologies, in: International Semantic Web
Conference, Springer, Cham, 2017, pp. 94–102. doi:10.1007/978-3-319-68204-4_9.