<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>EFive: An Ontology Design Pattern for Representing Statistical Variation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David K. Kedrowski</string-name>
          <email>david.kedrowski@maine.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Torsten Hahmann</string-name>
          <email>torsten.hahmann@maine.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing and Information Science, University of Maine</institution>
          ,
          <addr-line>Orono, ME 04469</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Our world is replete with variation; however, formal logic is not well-suited to representing and reasoning about statistical summaries and variation. The EFive ontology design pattern (ODP) addresses this gap by providing a formal structure for representing five-number summaries from exploratory data analysis, along with two key measures of variation: the interquartile range (IQR) and the values beyond which outliers lie. To enable querying and reasoning over these summaries, EFive is encoded entirely in OWL2.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology</kwd>
        <kwd>ontology design pattern</kwd>
        <kwd>knowledge graph</kwd>
        <kwd>statistical summarization</kwd>
        <kwd>variation</kwd>
        <kwd>five-number summary</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nearly everything in the world around us exhibits some level of variability. That is, things are rarely
identical; they vary, often across space and/or time.</p>
      <sec id="sec-1-1">
        <p>
          Variation refers to, for example, the differences we see in the definitions of concepts, among the members of classes, or when measuring
attributes that exhibit variability. This distinction between variability and variation is from Reading
and Shaughnessy [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Probability can be used to describe events that exhibit variability, as in games of
chance where there is variation in the results.
        </p>
        <p>
          Much of statistics involves working with variation – according to [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], it “is at the heart of statistics.”
When we have a dataset, it is helpful to know how much variation is present (think of the example in
nearly every introductory statistics textbook where a measure of variation is used to show a distinction
between two datasets with identical measures of central tendency). Variation makes modeling hard
because data rarely (if ever) matches any function exactly [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
          Variation can be leveraged to do much more. Hahmann and McIlraith [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] provide six areas where
knowledge of variation can be useful: object classification tasks, purely logical reasoning, statistical
querying, object retrieval tasks, inductive reasoning, and clustering. These tasks are aided by knowledge
of how things within a class or between classes are similar (low variation) and how they differ (high
variation). Many of these tasks are related to work in machine learning (ML) and artificial intelligence
(AI) where variation plays a crucial role in, e.g., evaluation, classification, and clustering.
        </p>
        <sec id="sec-1-1-1">
          <title>Overall Objective</title>
          <p>
            Ontological languages based on first-order logic (FOL) or description logic (DL)
subsets, which form the basis of OWL2, are not well-suited for representing variation. Our goal is to
develop EFive as an ontology design pattern (ODP) [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] that (1) can succinctly represent variation in the
spirit of five-number summaries, (2) can be implemented in OWL2 to represent variation, and (3) can
facilitate basic reasoning about/with variation. The EFive ODP is not intended as a domain ontology
about statistics; rather, it is meant to be used with existing ontologies in any domain that considers
variation to facilitate the explicit representation of variation therein. However, this leaves open an
important question: what do we mean by variation?</p>
          <p>Proceedings of FOIS 2025 Satellite events, co-located with the 15th International Conference on Formal Ontology in Information Systems. CEUR Workshop Proceedings, ISSN 1613-0073.</p>
          <p>To answer that question, we first note that nearly all of the eforts to explicitly capture uncertainty—a
concept that is distinct from but closely related to variability—in ontological representations take a
Bayesian probabilistic approach. A hallmark of this approach is the focus on subjective probability and
“degrees of belief.” Work with uncertainty typically attempts to quantify the likelihood a given statement
is true (see Section 2.1), while variation is about how the values in a dataset are dispersed. Therefore, for
specifically representing variation we choose to take a frequentist, or empirical, approach. Empiricism,
because it is rooted in data, is the natural approach for this work but presumes that the developed
ODP will be used in practice with a domain ontology that encompasses both a TBox (terminology and
axioms) and an ABox (i.e. data) in order to represent variation in the data and reason with it.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Two distinct bodies of knowledge inform our work on the EFive ODP: one focusing on unifying logic and
probability with the goal of representing and reasoning about uncertainty in knowledge more generally
and the second aiming to capture statistical knowledge in some fashion within OWL2 ontologies and
knowledge graphs (KGs) that incorporate them. We also briefly consider datacubes.</p>
      <p>2.1. Unifying Logical and Statistical Representations of Knowledge</p>
      <p>
        Efforts to represent and reason with variation appear to be absent from the literature on logic-based
representations of knowledge. The closest efforts, seeking to unify first-order logic or description logics
with probability-based representations of uncertainty,1 date back to at least the 1950s. The emphasis
for much of the work in this area is on representing and reasoning about uncertainty using subjective
probability. A common example in the literature involves the statement “This bird can fly.” While a
statement like “All birds can fly” is logically false, “This bird can fly” will be false only some of the
time and is dependent on the specific bird in question. A (subjective) probability can be applied to this
statement, essentially assigning it a truth value somewhere in the interval [0, 1].
      </p>
      <p>
        There are many extensions of Nilsson’s [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] influential framework for embedding probability within
logic, including work with nonmeasurable probability events [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], reference classes [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], and
Bayesian [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and multi-entity Bayesian networks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Other work has focused on probabilistic
extensions of DLs [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ]. We found only a single work [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] that has considered empirical
probabilities instead of subjective probabilities; however, we note that its focus was still on reasoning with
probabilities and did not extend to other statistical concepts such as variation.
      </p>
      <p>
        There have been a variety of efforts to extend OWL and OWL2 to work with probability [
        <xref ref-type="bibr" rid="ref17 ref18 ref19 ref20 ref21">17, 18, 19,
20, 21</xref>
        ] – again for the purpose of reasoning with and about uncertainty. They mostly differ in how
probabilities are represented in OWL, whether or not a Bayesian network is stored in OWL, and the
semantics that can be stored within a Bayesian network. In general, much of the reasoning needs to
be done outside purely logical reasoners, whereas we aim to reason with and about variation
within the confines of a logical reasoner.</p>
      <p>2.2. Ontologies for Representing Statistical Knowledge</p>
      <p>A variety of ontologies and ontology design patterns (ODPs) incorporate statistical concepts in some
fashion. They can generally be classified as defining statistical concepts and procedures, dealing with
values and units, or defining data structures that facilitate data analysis (see Section 2.3).
      </p>
      <p>
        Statistical concepts include measures of variation, measures of central tendency, measures of position,
and many more. Procedures are algorithms for calculating values that correspond to these concepts.
1 Though related, variation and uncertainty are different concepts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Ontologies like GovStat [22], the Ontology for Biomedical Investigations (OBI) [23], the Information
Artifact Ontology (IAO) [24], the Ontology of Biological and Clinical Studies (OBCS) [25], and the
Statistical Methods Ontology (STATO) [26] all represent many statistical concepts hierarchically and
generally capture both concepts and procedures. They tend to suffer from incompleteness (e.g., GovStat
focused only on concepts found on US government websites), atypical naming conventions (e.g., OBI uses
the ambiguous names center value and average value for the specific concepts of median and arithmetic
mean, respectively), and problematic hierarchies (e.g., OBCS classifies the descriptive statistics percentile
and interquartile range as inferential statistics). These and similar issues, along with their focus on
concepts and procedures as opposed to the use of aggregation results, make them unsuitable for reuse
in this work.</p>
      <p>Statistical calculations result in values which often have an accompanying unit. The ontologies
mentioned so far have limited capacity for representing values, units, and related concepts. For that we
consider ontologies like the Ontology of units of Measure (OM) 1.8 [27], the Extensible Observation
Ontology (OBOE) [28], and the Quantities, Units, Dimensions and Types (QUDT) [29] ontology, as well
as the Spatial and Temporal Aggregate Data (STAD) [30, 31] ODP. A common feature is a focus on
quantitative measurements (especially in OM and OBOE). While QUDT allows for qualitative values, it
is still focused on quantities as evidenced by its focus on quantity kinds. We find STAD most useful for
two primary reasons: it extends the notion of quantity kind to the more general quality kind (which
encompasses both qualitative and quantitative kinds) and it differentiates between observed data values
and calculated (aggregate) data values, the latter of which include statistical values. STAD also allows
for the representation of underlying datasets and their descriptions as well as algorithms (and their
implementations and executions) used for producing aggregates. These concepts are crucial to the
appropriate reuse of aggregated values.</p>
      <p>2.3. Datacubes</p>
      <p>Datacubes are an altogether different approach for structuring data storage to support statistical queries
of datasets. The general notion of a datacube does not include semantics; however, The RDF Data
Cube Vocabulary [32] (QB) provides a pattern for representing multi-dimensional datasets within an
OWL2-based KG where it is also possible to capture semantics. QB4OLAP [33] is an extension of QB
for integrating data cubes with the online analytical processing (OLAP) [34] model. However, it takes
considerable planning to implement a data cube structure, so it is better undertaken when first creating
a KG rather than attempting to adapt a KG later. We see this as a disadvantage to this approach.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Example Use Cases</title>
      <p>To guide the development of and to evaluate the EFive ODP, we rely on two use cases. The first use case
involves a synthetic dataset simulating a fictitious medical practice with approximately 100 patients. It
includes attributes such as age, height, weight, gender, educational attainment, body temperature, eye
color, and patient addresses. This dataset has proven useful for testing EFive’s core ideas on statistics
across the entire dataset and within subsets defined by individual attributes (e.g., gender, patient type,
or zip code) or combinations thereof.</p>
      <p>The second use case draws from the Safe Agricultural Products and Water Graph (SAWGraph)
[35, 36], an NSF Proto-OKN project. SAWGraph currently contains about 1.5 billion triples, with
ongoing expansion. Its data includes extensive records about PFAS environmental testing, including
observations and measurements represented using the Contaminant Observations and Samples Ontology
(ContaminOSO) [37]. Some of the capabilities of EFive have been motivated by anticipated needs within
SAWGraph. For example, stakeholders from the EPA and other government agencies have highlighted
the challenges posed by so-called “non-detects”—measurements that fall below detection or quality
assurance thresholds but do not necessarily indicate the complete absence of the tested contaminant.
These common cases complicate data analysis and prompted the recommendation of non-parametric
methods, such as the five-number summary, for more robust statistical summarization. Although EFive
has not yet been applied to SAWGraph data, we expect to use it for future testing and evaluation of the
pattern at scale.</p>
      <p>Examples of specific questions relevant to the two use cases help illustrate the kinds of knowledge the
pattern should represent and the types of questions it should help answer. They suggest the following
broader categories of questions:
• How does the variation in data for A compare with the variation in B?
– How does the IQR for adult male patient height compare with the IQR for adult females in a
medical practice? for adult males vs. females living in the 04411 zip code?
– Do private wells in Illinois show more/less variation in PFOA levels than in Ohio? in 2024?
• Is a specific data value an outlier?
– In a medical practice, is a 57 inch tall male adult an outlier? among men who are overweight
(BMI &gt; 25)?
– In the state of Maine, is a public water supply PFOS level of 500 ng/g an outlier? in Hancock</p>
      <p>County? for public water supplies within one mile of a known PFOS source?
• Is a specific data value typical? (Define typical as in the middle 50%.)
– In a medical practice, is 52 in. a typical height for a 10 year old male? in the 04411 zip code?
– In Kansas, is an aquifer measurement of 25 ng/g PFOS typical? for the Ozark Aquifer?
• Into which quartile does a given data value fall?2
– In a medical practice, in which quartile does a 100 kg male fall? a 100 kg male among males
aged 40-50?
– In the state of Maine, in which quartile does a 25 ng/g PFOS level for a private well fall? a
25 ng/g PFOS level among shallow (less than 100 ft deep) wells?</p>
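      <p>The first three categories of questions above can be answered directly from a stored five-number summary. The logic can be sketched in a few lines of Python; this is an illustrative sketch only, and the height summary used below is hypothetical, not drawn from either use case:</p>
```python
def classify(value, summary):
    """Classify a data value against a five-number summary
    (minimum, q1, median, q3, maximum)."""
    minimum, q1, median, q3, maximum = summary
    iqr = q3 - q1
    # Outlier test via Tukey's fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR.
    if q1 - 1.5 * iqr > value or value > q3 + 1.5 * iqr:
        return "outlier"
    # "Typical" defined as falling within the middle 50% of the data.
    if value >= q1 and q3 >= value:
        return "typical"
    return "neither typical nor an outlier"

# Hypothetical five-number summary of adult male heights, in inches:
heights = (60, 67, 70, 73, 78)
print(classify(57, heights))  # → outlier (the lower fence is 67 - 9 = 58)
print(classify(71, heights))  # → typical
```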
      <p>More broadly, the EFive ODP is expected to be applicable across a wide range of applications,
including business intelligence (e.g. comparison of different classes of customers, products or locations),
environmental AI (e.g. identification of locations with similar ecological or climate patterns), census data
(e.g. aggregation and comparison of locations by demographics), and health and biomedical applications
(e.g. comparison of patient populations and responses to treatment). Once information from any of these
areas is represented via a domain ontology, the inclusion of EFive would add a way to store precomputed
measures of variation with the ontology or a knowledge graph (KG) based on it. This added statistical
knowledge can then be accessed like any other data from the graph, facilitating queries that, for example,
compare the variability of disjoint subsets of a dataset against each other (e.g., county-by-county or zip
code-by-zip code within a state). And once something is explicitly represented in an ontology or a KG
it can be reasoned over as well.</p>
      <p>The work with these use cases has helped define requirements that inform our choice of approach and
the five-number summary. Most importantly, the pattern should represent a non-parametric measure
of statistical variation. It should explicitly represent datapoints as statistical aggregates. As shown in
the questions above, the pattern should support questions that compare variation across and within
classes, allow for the classification of particular datapoints as outliers or as typical values, and indicate
the approximate position of a datapoint within a distribution. Further, the pattern should be reasonably
simple to apply to existing ontologies and KGs and not require changes to underlying ontologies or
datasets.
2 A five-number summary creates four intervals: [min, Q1], [Q1, median], [median, Q3], and [Q3, max], which are often referred
to as the first, second, third, and fourth quartiles, respectively.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <p>Ultimately, the EFive ODP is intended to be implemented in OWL2-based KGs where querying and
reasoning over aggregate statistics—focusing on variation—is desired. Measures of variation are
generally expected to be calculated over a given class or over disjoint partitions of a class (e.g., private
water wells partitioned by counties within a given state or by depth). They can be used for comparisons
across classes as well as within classes. Our approach centers on three key components: the adoption
of the five-number summary as a statistical foundation, the reuse of existing ontology design patterns
to represent aggregates, and an emphasis on interoperability with domain ontologies.</p>
      <p>Five-Number Summary Typical introductory statistics courses take the frequentist approach and
include the range, variance, standard deviation, and interquartile range (IQR) as measurements of
variation. Motivated by the use cases and because we purposely seek a representation that is broadly
applicable and agnostic toward the type of distribution a dataset may have, we choose to work with the
five-number summary of exploratory data analysis [38] as a simple statistical summary of data. A
five-number summary consists of the minimum, first quartile (Q1), median, third quartile (Q3), and maximum
values for a dataset. As such, it provides a simple yet powerful non-parametric statistical summary that
includes not only a measure of central tendency (the median) but also two simple measures of variation:
the range (range = max − min) and the interquartile range (IQR = Q3 − Q1).</p>
      <p>Reuse of STAD and QUDT Another critical element of our approach is the reuse of the Spatial and
Temporal Aggregate Data [30] ontology design pattern (STAD) as well as the Quantities, Units, Dimensions
and Types [29] ontology (QUDT). A five-number summary is a set of aggregate statistics about a dataset
and STAD extends QUDT for the express purpose of better representing aggregate statistics. As we
discuss in more detail in Sections 5.2, 5.3, and 5.5, STAD provides useful concepts for representing
not only statistical aggregate values but also their base datasets and the algorithm(s) involved, which
facilitate semantic integration and comparison of data across datasets.</p>
      <p>Interoperability with Domain Ontologies As discussed in Section 2.3, datacubes are an example of
an existing approach for representing data that supports a wide variety of statistical analyses. However,
datacubes require the data to be formatted in specific ways to define the structure of the cube, which
can make adapting existing ontologies and datasets difficult and time-consuming. Conversely, the
EFive ODP has been designed to co-exist with existing ontologies and their associated KGs. While it is
important to know the structure of an ontology to use its associated data with EFive, it is not necessary
to change the structure of or design the ontology in any particular way.</p>
    </sec>
    <sec id="sec-5">
      <title>5. The EFive ODP</title>
      <p>We now present how we ontologically model five-number summaries via the Extended Five-Number
Summary (EFive) ontology design pattern (ODP). We first provide a more in-depth review of the
five-number summary in Section 5.1. We introduce its conceptualization as an ODP in Section 5.2, and then
generalize and extend the ODP—thus the “Extended” in its name—in Sections 5.3 and 5.4, respectively,
before discussing more tangential aspects arising from its integration with the STAD ODP.</p>
      <p>5.1. The Five-Number Summary</p>
      <p>The five-number summary, introduced by Tukey in 1977 [38], is a widely accepted summarization
method within descriptive statistics. The measures that constitute a five-number summary are useful
on their own or for other statistical analyses like comparison of central tendency and spread, outlier
detection (“outside” and “far out” per Tukey), and spatial techniques such as median polish [39].</p>
      <p>The five-number summary of a dataset includes the minimum, first quartile (Q1), median (second
quartile or Q2), third quartile (Q3), and maximum values from the dataset. The median, a measure of
central tendency, is determined by putting the data values in ascending order and selecting the middle
value. The median therefore bisects the data into two smaller datasets of equal size. We get Q1 if we
repeat this process on the dataset values to the left of the median and Q3 if repeated on the values to the
right of the median. The original dataset is now subdivided into four parts, each containing essentially
the same number of values (within 1), where min ≤ Q1 ≤ median ≤ Q3 ≤ max.</p>
      <p>
        For example, consider the ordered dataset [9, 10, 14, 18, 31, 42, 42, 43, 65, 72, 76]. Its five-number
summary is shown in Figure 1 in the form of a box-and-whisker plot. It has a minimum of 9 and maximum
of 76. As visualized in Figure 2, the median value is 42 with 5 values smaller and larger, while the median
of the first half is 14 and that of the second half is 65, which are the values for Q1 and Q3, respectively.3
      </p>
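      <p>The median-of-halves procedure just described can be sketched in a few lines of Python. This is an illustrative snippet rather than part of the EFive artifacts, and it follows Tukey's hinge convention; other quartile conventions interpolate and may give slightly different values for Q1 and Q3:</p>
```python
def five_number_summary(values):
    """Tukey-style five-number summary: Q1 and Q3 are the medians of the
    lower and upper halves (excluding the overall median when n is odd)."""
    data = sorted(values)
    n = len(data)

    def median(xs):
        mid = len(xs) // 2
        return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2

    lower = data[: n // 2]        # values to the left of the median
    upper = data[(n + 1) // 2 :]  # values to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

# The ordered dataset from the running example:
print(five_number_summary([9, 10, 14, 18, 31, 42, 42, 43, 65, 72, 76]))
# → (9, 14, 42, 65, 76)
```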
      <p>These five values allow us to capture variation in at least three different ways: range, interquartile
range (IQR), and outliers. The range is simply the difference between the maximum value and the
minimum value (max − min). Since it is generally considered to be of limited use, particularly because
it is strongly influenced by outliers [40], we choose not to include it explicitly in this work. Instead, we
rely on the IQR as our principal measure of variation. It is the difference between the third and first
quartiles: IQR = Q3 − Q1. Outliers have little to no influence on this value, making it a much more
useful and robust comparator across datasets. It provides a sense of the spread of the data values as well
as a glimpse into their distribution. The IQR also supports a standard method for detecting outliers that
labels any data value less than Q1 − 1.5 ⋅ IQR or greater than Q3 + 1.5 ⋅ IQR as an outlier. Identifying
outliers this way can enhance our understanding of variation and can reveal important characteristics,
such as skewness or long tails, in the distribution of the dataset. For example, consider Figure 1. Recall
that each of the four segments of the plot contains ∼25% of the dataset and note how values “clump” in
the first and third segments, while the second and fourth segments show greater spread, indicating an
uneven distribution across the entire dataset. As an historical aside, Tukey referred to the values at
Q1 − 1.5 ⋅ IQR and Q3 + 1.5 ⋅ IQR as inner fences. We simply refer to them as fences as we do not model
outer fences in the ODP.</p>
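      <p>The fence computation and outlier test just described can be sketched in Python (illustrative only; the helper names are our own, not part of the ODP):</p>
```python
def fences(q1, q3):
    """Tukey's inner fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Return the values lying outside the fences."""
    low, high = fences(q1, q3)
    return [v for v in values if low > v or v > high]

# Using Q1 = 14 and Q3 = 65 from the running example (IQR = 51):
print(fences(14, 65))  # → (-62.5, 141.5)
print(outliers([9, 10, 14, 18, 31, 42, 42, 43, 65, 72, 76], 14, 65))  # → []
```
      <p>With such a wide IQR, none of the example values fall outside the fences, which matches the box-and-whisker plot in Figure 1.</p>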
      <p>We have chosen not to use standard deviation (and, by extension, variance) as a measure of variation.
We do so for three reasons: IQR is simpler to calculate and understand, it does not imply any specific
distribution (e.g., normal), and it provides useful thresholds for outlier detection. Using IQR to identify
outliers is similar to the ±3σ threshold for normally distributed data (for normal data it is approximately
equivalent to ±2.7σ, which classifies 0.70% of normally distributed data as outliers as opposed to 0.27%
at ±3σ), but works well regardless of statistical distribution.</p>
      <p>3 This is a simplistic example with an odd number of values so there is an obvious middle value. If there is an even
number of values, the median is defined as the mean of the middle two numbers.</p>
      <p>Figure 2: (a) The median of an ordered dataset. (b) The first quartile (Q1) of the dataset. (c) The third quartile (Q3) of the dataset.</p>
      <p>
5.2. The Core Pattern: :Percentile and :OrderedE5Summary</p>
      <p>Figure 3 shows the high-level conceptual structure of the EFive ODP.4 At its heart are the classes
:Percentile and :OrderedE5Summary. An :OrderedE5Summary represents a five-number summary,
which consists of a collection of five statistical values, each linked to it via the :hasPercentile object
property. Each of them can be thought of as a specific percentile of the form Pn where n is any integer
in the closed interval [0, 100]. This is captured by the :Percentile class and its datatype property
:n to represent the percentile number n, restricted to the interval [0, 100] in OWL2 using a datatype
restriction involving xsd:minInclusive and xsd:maxInclusive. A cardinality restriction is placed on
:Percentile requiring instances to have exactly one :n property. The :Percentile class is modeled
as a subclass of stad:StatisticalAggregateDatapoint, which encompasses statistical aggregates
as opposed to specific observations or predictions from a model. The algorithm used to calculate a
:Percentile is represented by the associated stad:DataTransformation.</p>
      <p>We use the shortcut notation :P&lt;n&gt; to denote named subclasses of :Percentile with a fixed n. The
statistical values :Minimum, :Q1, :Median, :Q3, and :Maximum can then be represented as :P0, :P25,
:P50, :P75, and :P100, respectively. The axiomatic definitions of these :P&lt;n&gt; in OWL2 are exemplified
by the following definition of :P50:
:P50 owl:equivalentClass [ owl:intersectionOf ( :Percentile
    [ rdf:type owl:Restriction ;
      owl:onProperty :n ;
      owl:hasValue "50"^^xsd:integer
    ] ) ] .</p>
      <p>All these defined classes are subclasses of :Percentile; the hierarchy is shown in Figure 4. They
are associated with an :OrderedE5Summary by specific properties :hasMin, :hasQ1, etc., which are all
subproperties of the generic :hasPercentile object property. They are shown as attributes inside the
:OrderedE5Summary class in Figure 5.</p>
      <p>Only data that can be ordered can have a five-number summary, thus we name the
appropriate class :OrderedE5Summary. It is further described by an associated :SummaryDescription; the
dataset it summarizes can be specified and described further by the associated stad:Dataset and
stad:DatasetDescription classes that we reuse from STAD.
5.3. Generalizing the :OrderedE5Summary</p>
      <p>We have focused thus far on the :OrderedE5Summary, which is modeled for use with ordinal scale data.
However, it does not apply to nominal scale data and is inappropriate for measures like IQR, which
require interval or ratio scale data.</p>
      <p>4 The EFive ODP and supplementary materials, including sample queries, are available at https://github.com/theSKAILab/EFive.</p>
      <p>Figure 4: The hierarchy of stad:StatisticalAggregateDatapoint subclasses in EFive, including :Percentile (with :Minimum/:P0, :Q1/:P25, :Median/:P50, :Q3/:P75), :Mode, and :VariationDatapoint (with :IQR, :LowerFence, and :UpperFence).</p>
      <p>As a brief refresher on measurement scales [41], nominal scale data has no natural order (e.g.,
gender, eye color), ordinal scale data can be ordered but lacks consistent differences (e.g., educational
attainment), interval scale data can be ordered and has consistent differences but lacks meaningful
ratios (e.g., body temperature), and ratio scale data can be ordered, has consistent differences, and has
meaningful ratios (e.g., height, weight). We note that, typically, nominal and ordinal scale data are
qualitative while interval and ratio scale data are quantitative. As shown in Figure 5, we extend the
:AggregateVariable class to fully capture the measurement scale hierarchy (see also Section 5.3.4).
An :OrderedE5Summary can summarize an :AggregateOrdinalVariable and, likewise, we restrict
instances of :QuantitativeE5Summary in Section 5.4 to summarizing an :AggregateIntervalVariable
or an :AggregateRatioVariable. While, in everyday language, we talk about measurement scales as
disjoint, their definition clearly reflects a nested hierarchy.</p>
      <p>5.3.1. The :StatisticalSummary Class</p>
      <p>
The hierarchy of measurement scales for variables suggests an analogous class hierarchy for the
statistical summaries like :OrderedE5Summary. As shown in Figure 5, we introduce at the top of the
hierarchy the :StatisticalSummary class that can be used with all kinds of data, including nominal
scale data. With none of the five values from a five-number summary being applicable to nominal
scale data, we have added the :Mode as another subclass of stad:StatisticalAggregateDatapoint,
which is inherited by all of the other summary classes. :OrderedE5Summary is then a subclass of
:StatisticalSummary.</p>
      <p>We introduce :hasStatistic as a generalization of :hasPercentile and :hasMode. Therefore,
all object properties that link a :StatisticalSummary to a stad:StatisticalAggregateDatapoint
are subproperties of :hasStatistic. This creates an object property hierarchy that mirrors the class
hierarchy of Figure 4.</p>
      <sec id="sec-5-1">
        <title>5.3.2. Extending STAD’s Dataset Class</title>
        <p>A :StatisticalSummary summarizes a variable over a dataset. That dataset is a class or a subset of
a class. Subsets of classes are created using attribute filters; for example, the set of all patients (class)
who are male (filter) and live in the 04411 zip code (filter). These datasets reside in the ABox of some
ontology or KG and EFive uses the stad:Dataset class to represent them (see Figure 6).</p>
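        <p>As a plain-Python illustration of such attribute filtering (with hypothetical records standing in for an ABox, not actual SAWGraph or EFive data), a dataset underlying a summary corresponds to a class plus filters:</p>
```python
# Hypothetical patient records standing in for the ABox of a domain ontology.
patients = [
    {"id": "p1", "gender": "male", "zip": "04411", "height": 70},
    {"id": "p2", "gender": "female", "zip": "04411", "height": 64},
    {"id": "p3", "gender": "male", "zip": "04401", "height": 68},
]

# The dataset for a summary is the class (patients) narrowed by attribute filters:
subset = [p for p in patients if p["gender"] == "male" and p["zip"] == "04411"]
print([p["id"] for p in subset])  # → ['p1']
```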
        <p>There are two links via the stad:hasBaseDataset property to a dataset, one from the
:StatisticalSummary and one from each stad:StatisticalAggregateDatapoint. The latter one
was already provided by STAD while the former is an addition. Its introduction allows us to
explicitly express that a :StatisticalSummary—which may consist of multiple aggregate datapoints,
e.g., five aggregate datapoints for a five-number summary—is tied to a single dataset and each
stad:StatisticalAggregateDatapoint that is part of that summary must be calculated over that
same dataset. We enforce this axiomatically with a combination of restrictions that, collectively, have
the desired effect: the composition of stad:hasBaseDataset with :hasStatistic is defined as a
subproperty of stad:hasBaseDataset, while cardinality restrictions require each :StatisticalSummary
and stad:StatisticalAggregateDatapoint to be related to exactly one stad:Dataset.
(Figure 5 diagram omitted: it shows :StatisticalSummary with its :hasSummaryDescription and
:aggregateVariable properties and :hasMode (range :Mode); its subclass :OrderedE5Summary with
:hasMin, :hasQ1, :hasMedian, :hasQ3, and :hasMax (ranges :Minimum, :Q1, :Median, :Q3, and :Maximum);
the further subclass :QuantitativeE5Summary with :hasIQR, :hasLowerFence, and :hasUpperFence
(ranges :IQR, :LowerFence, and :UpperFence); and :hasAggVar links to :AggregateNominalVariable,
:AggregateOrdinalVariable, :AggregateIntervalVariable, and :AggregateRatioVariable.)</p>
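        <p>One way to axiomatize the single-dataset constraint described above is a property chain combined with a qualified cardinality restriction. The following Turtle is a sketch under that reading; the actual EFive axioms may differ, and the prefix declarations are placeholders as before:</p>

```turtle
# If a summary has a statistic that was computed over a dataset,
# then the summary itself is based on that dataset:
#   x :hasStatistic y . y stad:hasBaseDataset d  =>  x stad:hasBaseDataset d
stad:hasBaseDataset owl:propertyChainAxiom ( :hasStatistic stad:hasBaseDataset ) .

# Every summary is tied to exactly one dataset (analogously for datapoints).
:StatisticalSummary rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty stad:hasBaseDataset ;
    owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
    owl:onClass stad:Dataset
] .
```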
      </sec>
      <sec id="sec-5-2">
        <title>5.3.3. Extending STAD’s DatasetDescription Class</title>
        <p>The stad:DatasetDescription class provides a detailed description of a dataset to support user
understanding and to enhance the reusability of :Dataset instances across multiple EFive summaries.
EFive extends it by adding the data property :datasetRepresentation to allow describing the related
dataset in more detail using OWL2 class expressions. It is a string representation of either the name of
a class or, if filters are used, a class expression that uses OWL2 property restrictions. As an example,
consider an :OrderedE5Summary that represents the median educational attainment of male patients
with addresses in the 04411 zip code. The corresponding dataset can be described—as a string—as
the intersection of the patient class, a property restriction on gender with value male, and a property
restriction on zip code with value 04411. This representation provides a mechanism to quickly find
the slice of data from which a descriptive statistic was calculated. Moreover, it simplifies updating a
summary as the ABox changes and instances and attributes of this slice may change.</p>
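        <p>For the example above, the dataset description could carry a string such as the following (a sketch: the ex: patient vocabulary and the property linking a stad:Dataset to its stad:DatasetDescription are hypothetical):</p>

```turtle
ex:malePatients04411 a stad:Dataset ;
    stad:hasDatasetDescription [                   # linking property assumed
        a stad:DatasetDescription ;
        :datasetRepresentation
            "ex:Patient and (ex:hasGender value ex:male) and (ex:hasZipCode value \"04411\")"
    ] .
```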
        <p>The annotation properties :aggregateFilterOnDataProp, :aggregateFilterOnObjectProp, and
:aggregateClass capture some of those details of the dataset definition in more granular form to
facilitate the automated generation of instances of :StatisticalSummary from an ABox. The
properties :aggregateFilterOnDataProp and :aggregateFilterOnObjectProp are only needed when the
:aggregateClass is filtered by some attribute(s). Each can be used with individual properties or with
property paths.5 There can be as many of each of these as needed.
</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3.4. The :SummaryDescription Class</title>
        <p>
The instances in a dataset are typically described by multiple attributes, so it is possible to have different
instances of :StatisticalSummary that aggregate over different variables from the same dataset. For
example, we can consider the mode of eye color or the median of weight for the same set of male
patients. The variable of interest is therefore specific to a summary rather than a characteristic
of the underlying dataset, as are the date when the summary was created and the size of the dataset at that time.</p>
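        <p>For instance, the eye-color and weight examples above could be represented as two summaries over one dataset (a sketch with hypothetical ex: instance names):</p>

```turtle
ex:eyeColorSummary a :StatisticalSummary ;
    stad:hasBaseDataset ex:malePatients ;
    :hasSummaryDescription [ a :SummaryDescription ;
        :aggregateVariable [ a :AggregateNominalVariable ;
                             :hasVariableName "eyeColor" ] ] .

ex:weightSummary a :OrderedE5Summary ;
    stad:hasBaseDataset ex:malePatients ;
    :hasSummaryDescription [ a :SummaryDescription ;
        :aggregateVariable [ a :AggregateRatioVariable ;
                             :hasVariableName "weight" ] ] .
```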
        <p>EFive uses the :SummaryDescription class (see Figure 6) to represent summary-specific details. The
data properties :sizeOfDataset and :dateOfSummary capture when the summary was created as well
as the size of the underlying dataset at that time. The object property :aggregateVariable and the class
:AggregateVariable as its range represent the variable of interest. :AggregateVariable has the data
property :hasVariableName and the object property :hasScale. The latter more explicitly captures
the variable’s measurement scale—also captured by the summary and aggregate variable hierarchies—
using a controlled vocabulary of instances of :MeasurementScale: :NominalScale, :OrdinalScale,
:IntervalScale, and :RatioScale. Each of them are connected to related QUDT classes using
skos:closeMatch.
</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Specializing the :OrderedE5Summary to Capture Variation</title>
        <p>
Neither the original :OrderedE5Summary nor its :StatisticalSummary generalization yet addresses
how to capture variation. Doing so requires working with interval or ratio scale data because we
need consistent differences among values for measures like the IQR and fences. We therefore
introduce :QuantitativeE5Summary as a specialization of :OrderedE5Summary equipped with three
additional properties and associated classes to describe variation within a dataset. The classes for
representing variation are :IQR, :LowerFence, and :UpperFence, all of which are modeled as subclasses
of the :VariationDatapoint subclass of stad:StatisticalAggregateDatapoint as shown in
Figure 4. Because they are intended for use only with interval or ratio scale data, the domains of the
associated object properties—:hasIQR, :hasLowerFence, and :hasUpperFence—are restricted to the
:QuantitativeE5Summary class as shown in Figure 5. To continue mirroring the class structure of
Figure 4, they are generalized by :hasVariationDatapoint which is a subproperty of :hasStatistic.</p>
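        <p>Putting this together, a quantitative summary instance might look as follows (a sketch with hypothetical ex: names):</p>

```turtle
ex:weightSummary a :QuantitativeE5Summary ;    # also an :OrderedE5Summary by subsumption
    :hasMedian     ex:weightMedian ;
    :hasIQR        ex:weightIQR ;
    :hasLowerFence ex:weightLowerFence ;
    :hasUpperFence ex:weightUpperFence .

ex:weightIQR        a :IQR .          # :IQR, :LowerFence, and :UpperFence are
ex:weightLowerFence a :LowerFence .   # subclasses of :VariationDatapoint
ex:weightUpperFence a :UpperFence .
```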
        <p>We follow common practice and define :IQR = :Q3 − :Q1. The lower and upper fences are used as
thresholds beyond which (less than the lower fence or greater than the upper fence) data values may be
classified as outliers. They can then be defined in terms of the IQR as follows:
Definition 5.1 (Lower Fence and Upper Fence). The :LowerFence is a value 1.5 times the :IQR below
:P25 and the :UpperFence is a value 1.5 times the :IQR above :P75:
:LowerFence = :P25 − 1.5 ⋅ :IQR
:UpperFence = :P75 + 1.5 ⋅ :IQR
</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Reusing Other Features of STAD’s StatisticalAggregateDatapoint</title>
        <p>
Recall that each instance of :StatisticalSummary (or its subclasses) may be linked to multiple instances
of stad:StatisticalAggregateDatapoint or any of its subclasses that are shown in Figure 4.</p>
        <p>In addition to the classification from Figure 4, each stad:StatisticalAggregateDatapoint must
also be an instance of either stad:QualitativeDatapoint or stad:QuantitativeDatapoint.</p>
        <p>5. The last property in a path determines whether the filter is an object or data property type.</p>
        <p>(Figure 7 excerpt: an :OrderedE5Summary is linked via :hasMedian to an instance of :Median,
whose stad:QualitativeValue carries its value via qudt:value.) Each datapoint is related to a data
value, a quality kind, and a data transformation. These features can be specified by reusing concepts
from STAD [30] and QUDT, as illustrated in Figure 7 using the example of a :Median. They are
summarized in the remainder of this section.</p>
        <p>Data Values Semantically, a datapoint (stad:Datapoint) and its value (stad:DataValue) are
different things. In the conceptualization we use here (based on STAD and QUDT), a data value includes a
value and, when applicable, a unit. A datapoint includes a data value (as just described), a quality kind,
and, since we are working with aggregate statistical values, a data transformation.</p>
        <p>STAD provides stad:DataValue as an extension of the qudt:QuantityValue that
includes subclasses for both qualitative values (stad:QualitativeValue, associated with
stad:QualitativeDatapoint instances) and quantitative values (stad:QuantitativeValue,
associated with stad:QuantitativeDatapoint instances). A value along with an optional unit are
attached to each stad:DataValue instance. Figure 7 shows the use of stad:QualitativeValue on
the left and stad:QuantitativeValue on the right. The property qudt:value is used for qualitative
values, while qudt:numericValue is used for quantitative ones. In either case, units are optional and
can be added via the qudt:unit property.</p>
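        <p>The two value patterns can be sketched side by side (hypothetical ex: instances; the QUDT unit IRI is illustrative):</p>

```turtle
# quantitative: numeric value plus an optional unit
ex:weightMedianValue a stad:QuantitativeValue ;
    qudt:numericValue 72.5 ;
    qudt:unit unit:KiloGM .

# qualitative: a plain value, no unit
ex:eyeColorModeValue a stad:QualitativeValue ;
    qudt:value "brown" .
```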
        <p>QualityKind The notion of quality kind goes beyond the value (and unit) of a datapoint and answers
the question “What was measured and aggregated?” Examples include length, color, and age. Quality
kinds can be adapted to statistical results, like MedianLength using the stad:StatisticalQualityKind
subclass of stad:QualityKind. While the unit associated with a data value is helpful, the quality kind
adds additional semantics; for example, QUDT’s quantitykind ontology6 includes 57 qudt:QuantityKind
instances that have the unit qudt:M (meter), such as length, depth, and diameter.</p>
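        <p>As a sketch, a statistical quality kind for median lengths might be introduced as follows; the ex: names and the property attaching a quality kind to a datapoint are assumptions, not part of the published pattern:</p>

```turtle
ex:MedianLength a stad:StatisticalQualityKind .       # "a median of lengths was computed"
ex:heightMedian stad:hasQualityKind ex:MedianLength . # attaching property assumed
```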
        <p>DataTransformation EFive uses the stad:DataTransformation class to represent the specific
algorithm used to calculate a descriptive statistic. This can be important for data reuse; for
example, there are at least 15 different methods to calculate percentiles [42]. The object property
stad:hasTransformationKind is used to link a stad:StatisticalAggregateDatapoint to the
specific algorithm used in calculating the statistical value. As outlined in Figure 8, STAD includes a
custom version of the Algorithm, Implementation, and Execution ODP [43] using the prefix
stad-mls:. An algorithm is an instance of stad:DataTransformation.</p>
        <p>6. https://www.qudt.org/doc/DOC_VOCAB-QUANTITY-KINDS.html</p>
        <p>(Figure 8 diagram omitted: it shows a stad:StatisticalAggregateDatapoint, reached from a
summary via :hasStatistic, that is stad:generatedBy a stad-mls:AlgorithmExecution; the execution
stad-mls:executes a stad-mls:Implementation and stad-mls:realizes a stad:DataTransformation,
which the implementation stad-mls:implements; stad:hasTransformationKind links the datapoint
directly to the transformation.) Different implementations of an
algorithm can be captured as instances of stad-mls:Implementation. The execution of an
algorithm’s implementation that generates a specific result can then be represented as an instance of
stad-mls:AlgorithmExecution. Data transformations from the Ontology for Biomedical
Investigations (OBI) [23], STATO, and other ontologies discussed in Section 2.2 can be reused here as well.</p>
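        <p>The chain from result back to algorithm can then be instantiated as follows; the ex: names are hypothetical, while the classes and properties are those from STAD shown in Figure 8:</p>

```turtle
ex:linearInterpPercentile a stad:DataTransformation .   # the algorithm
ex:myPercentileImpl a stad-mls:Implementation ;
    stad-mls:implements ex:linearInterpPercentile .
ex:execution42 a stad-mls:AlgorithmExecution ;
    stad-mls:executes ex:myPercentileImpl ;
    stad-mls:realizes ex:linearInterpPercentile .

ex:weightMedian stad:hasTransformationKind ex:linearInterpPercentile ;
    stad:generatedBy ex:execution42 .
```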
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Summary and Future Work</title>
      <p>To the best of our knowledge, no prior work has attempted to represent empirical variation and its
semantics using only the expressivity offered by the OWL2 language. This paper develops
the EFive ODP as a semantic framework for modeling five-number summaries in
OWL2 in a way that enables integration with instance data grounded in other OWL2 domain ontologies.
Central to this pattern are the interquartile range (IQR) and the notion of fences, which explicitly capture
variation. These values allow users to quantify and compare variation within and across datasets; help
identify outliers (values that show excessive variation); and determine whether specific data points fall
within a typical range based on some percentile-based distance from the median.</p>
      <p>This work is ongoing. This paper presents a complete working draft of the ODP that
is now ready for more comprehensive evaluation. Considerable work remains to be done:
• More in-depth evaluation, including of the consistency and completeness, sound ontological
structure, and validation via implementation, querying, and reasoning with the ODP.
• Extending the pattern to allow for different approaches to variation by defining Q1 and Q3 as
“hinges” (a term from Tukey [38]) and then expanding to offer additional options, such as the
20th and 80th percentiles, the 10th and 90th percentiles, and the 5th and 95th percentiles.
• Extending the notion of an interquartile range (IQR) to the generalized interquantile range (GIQR)
as a measure of variation for different sets of hinges.
• Developing the class :PositionSet, instances of which will connect to all quartiles, quintiles,
deciles, or twentiles. Users can use position sets to determine where in a distribution specific
values lie, get a general sense of what a distribution looks like, and more.
• Making sure summaries can be added to an ontology or KG and subsequently queried and reasoned
over, even without access to their underlying data.
• Possibly adding empirical frequency distributions of some form. While we do not have current
plans to pursue this line of work, we see it as a useful long-term expansion of the ODP.
• Evaluating how the ODP can be used to analyze data that includes “non-detect” values. A result
of “non-detect” is different from a result of zero because, while the actual value could be zero,
it could also be any value up to some minimum detection limit. Non-parametric methods, like
five-number summaries and the IQR, may prove helpful in analyzing data that includes these values.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This material is based in part upon work supported by the National Science Foundation under Grant
Number ITE-2333782.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have employed ChatGPT for improving the clarity of the text. The authors reviewed and
edited the content afterwards as needed and take full responsibility for the publication’s content.</p>
      <p>[22] M. C. Pattuelli, The GovStat Ontology, Technical Report, University of North
Carolina School of Information and Library Science, 2004. Accessed on March 27, 2025.
[23] A. Bandrowski, R. Brinkman, M. Brochhausen, M. H. Brush, B. Bug, M. C. Chibucos, K. Clancy,
M. Courtot, D. Derom, M. Dumontier, et al., The ontology for biomedical investigations, PloS one
11 (2016) e0154556.
[24] W. Ceusters, An information artifact ontology perspective on data collections and associated
representational artifacts, Stud Health Technol Inform 180 (2012) 68–72.
[25] J. Zheng, M. R. Harris, A. M. Masci, Y. Lin, A. Hero, B. Smith, Y. He, The ontology of biological and
clinical statistics (obcs) for standardized and reproducible statistical analysis, Journal of Biomedical
Semantics 7 (2016) 1–13.
[26] A. Gonzalez-Beltran, P. Rocca-Serra, O. Burke, S.-A. Sansone, Statistics ontology (STATO),
http://stato-ontology.org/, 2012. Accessed on March 27, 2025.
[27] C. Keßler, M. d’Aquin, S. Dietze, H. Rijgersberg, M. van Assem, J. Top, Ontology of units of measure
and related concepts, Semantic Web 4 (2013) 3–13.
[28] J. Madin, S. Bowers, M. Schildhauer, S. Krivov, D. Pennington, F. Villa, An ontology for describing
and synthesizing ecological observation data, Ecological informatics 2 (2007) 279–296.
[29] R. Hodgson, P. J. Keller, J. Hodges, J. Spivak, QUDT: Quantities, units, dimensions and types,
https://qudt.org/, 2022.
[30] K. Wiafe-Kwakye, T. Hahmann, K. Beard, An ontology design pattern for spatial and temporal
aggregate data (stad), in: 13th Workshop on Ontology Design and Patterns (WOP 2022), 2022.
[31] K. Wiafe-Kwakye, T. Hahmann, K. Beard, STAD: An ontology design pattern and ontology for the
semantic representation of aggregate spatial and temporal data, submitted for review to Semantic
Web Journal (2025).
[32] Government Linked Data Working Group, The RDF Data Cube Vocabulary, Technical Report,
World Wide Web Consortium (W3C), 2014. Recommendation.
[33] L. Etcheverry, A. A. Vaisman, QB4OLAP: a new vocabulary for olap cubes on the semantic web,
in: 3rd Intern. Conf. on Consuming Linked Data, volume 905, CEUR-WS.org, 2012, pp. 27–38.
[34] S. Chaudhuri, U. Dayal, An overview of data warehousing and olap technology, ACM Sigmod
record 26 (1997) 65–74.
[35] T. Hahmann, P. Hitzler, H. K. McGinty, G. Hettiarachchi, O. Apul, et al., Safe Agricultural Products
and Water Graph (SAWGraph): An Open Knowledge Network to Monitor and Trace PFAS and
Other Contaminants in the Nation’s Food and Water Systems, https://sawgraph.github.io/, 2024.
[36] K. Schweikert, D. Kedrowski, S. Stephen, T. Hahmann, Precomputed topological relations for
integrated geospatial analysis across knowledge graphs, in: 13th Intern. Conf. on Geographic
Information Science (GIScience 2025), LIPIcs 346, 2025 (to appear), pp. 4:1–21.
[37] T. Hahmann, K. Schweikert, S. Stephen, D. Kedrowski, ContaminOSO: Ontological foundations
and key design choices for an ontology for environmental contaminant data, in: 25th International
Conference on Formal Ontology in Inf. Systems (FOIS-25), IOS Press, 2025 (to appear).
[38] J. W. Tukey, Exploratory Data Analysis, Addison-Wesley, 1977.
[39] G.-H. Strand, Large-scale variations in radial tree growth in norway: an application of median
polish for spatial trend detection, Applied Geography 18 (1998) 153–168.
[40] R. W. Cooksey, Descriptive statistics for summarising data, Illustrating statistical procedures:
Finding meaning in quantitative data (2020) 61–139.
[41] S. S. Stevens, On the theory of scales of measurement, Science 103 (1946) 677–680.
[42] E. Langford, Quartiles in elementary statistics, Journal of Statistics Education 14 (2006).
[43] A. Ławrynowicz, D. Esteves, P. Panov, T. Soru, S. Džeroski, J. Vanschoren, An algorithm,
implementation and execution ontology design pattern, in: Advances in Ontology Design and Patterns,
IOS Press, 2017, pp. 55–68.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Reading</surname>
          </string-name>
          ,
          <article-title>Reasoning about variation</article-title>
          , in: The Challenge of Developing Statistical Literacy, Reasoning and Thinking, Kluwer Academic Publishers (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Garfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ben-Zvi</surname>
          </string-name>
          ,
          <article-title>A framework for teaching and assessing reasoning about variability</article-title>
          ,
          <source>Statistics Education Research Journal</source>
          <volume>4</volume>
          (
          <year>2005</year>
          )
          <fpage>92</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Wild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pfannkuch</surname>
          </string-name>
          ,
          <article-title>Statistical thinking in empirical enquiry</article-title>
          ,
          <source>International statistical review 67</source>
          (
          <year>1999</year>
          )
          <fpage>223</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hahmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>McIlraith</surname>
          </string-name>
          ,
          <article-title>Towards ontologies in variation</article-title>
          , in: 2015 AAAI Spring Symposium Series,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <article-title>Pattern-based ontology design</article-title>
          , in: Ontology Engineering in a Networked World, Springer,
          <year>2011</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Begg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Welsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Bratvold</surname>
          </string-name>
          ,
          <article-title>Uncertainty vs. variability: What's the difference and why is it important?, in: SPE hydrocarbon economics and evaluation symposium</article-title>
          , SPE,
          <year>2014</year>
          , p.
          <fpage>D011S003R002</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <article-title>Probabilistic logic</article-title>
          ,
          <source>Artificial intelligence 28</source>
          (
          <year>1986</year>
          )
          <fpage>71</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Halpern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Megiddo</surname>
          </string-name>
          ,
          <article-title>A logic for reasoning about probabilities</article-title>
          ,
          <source>Information and computation 87</source>
          (
          <year>1990</year>
          )
          <fpage>78</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Kyburg</surname>
          </string-name>
          <article-title>The reference class</article-title>
          ,
          <source>Philosophy of science 50</source>
          (
          <year>1983</year>
          )
          <fpage>374</fpage>
          -
          <lpage>397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bacchus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Grove</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Halpern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koller</surname>
          </string-name>
          ,
          <article-title>A response to “believing on the basis of the evidence”</article-title>
          ,
          <source>Computational Intelligence</source>
          <volume>10</volume>
          (
          <year>1994</year>
          )
          <fpage>21</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Koller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pfeffer</surname>
          </string-name>
          , P-classic:
          <article-title>A tractable probabilistic description logic</article-title>
          ,
          <source>AAAI/IAAI</source>
          <year>1997</year>
          (
          <year>1997</year>
          )
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Laskey</surname>
          </string-name>
          ,
          <article-title>MEBN: A language for first-order Bayesian knowledge bases</article-title>
          ,
          <source>Artificial intelligence 172</source>
          (
          <year>2008</year>
          )
          <fpage>140</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giugno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          ,
          P-SHOQ(D):
          <article-title>a probabilistic extension of SHOQ(D) for probabilistic ontologies in the semantic web</article-title>
          ,
          <source>in: European Workshop on Logics in Artificial Intelligence</source>
          , Springer,
          <year>2002</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          ,
          <article-title>Expressive probabilistic description logics</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>172</volume>
          (
          <year>2008</year>
          )
          <fpage>852</fpage>
          -
          <lpage>883</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <article-title>From SHIQ and RDF to OWL: The making of a web ontology language</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>1</volume>
          (
          <year>2003</year>
          )
          <fpage>7</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bacchus</surname>
          </string-name>
          ,
          <article-title>Lp, a logic for representing and reasoning with statistical knowledge</article-title>
          ,
          <source>Computational Intelligence</source>
          <volume>6</volume>
          (
          <year>1990</year>
          )
          <fpage>209</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Demey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kooi</surname>
          </string-name>
          ,
          <article-title>Logic and probabilistic update</article-title>
          ,
          <source>Johan van Benthem on Logic and Information Dynamics</source>
          (
          <year>2014</year>
          )
          <fpage>381</fpage>
          -
          <lpage>404</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <article-title>A probabilistic extension to ontology language OWL</article-title>
          ,
          <source>in: 37th Hawaii Intern. Conf. on System Sciences, IEEE</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P. C. G.</given-names>
            <surname>Da Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Laskey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Laskey</surname>
          </string-name>
          ,
          <article-title>PR-OWL: a bayesian ontology language for the semantic web</article-title>
          ,
          <source>in: Intern. Workshop on Uncertainty Reasoning for the Semantic Web</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Tjoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hudec</surname>
          </string-name>
          ,
          <article-title>Ontology-based generation of bayesian networks</article-title>
          ,
          <source>in: 2009 Intern. Conf. on Complex, Intelligent and Software Intensive Systems</source>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>712</fpage>
          -
          <lpage>717</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Calmet</surname>
          </string-name>
          ,
          <article-title>OntoBayes: An ontology-driven uncertainty model</article-title>
          ,
          <source>in: Intern. Conf. on Computational Intelligence for Modelling</source>
          ,
          <source>Control and Automation and on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06)</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2005</year>
          , pp.
          <fpage>457</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>