<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying principles of Ontology-Driven Conceptual Modeling to the interpretation process</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcelo Jaccoud Amaral</string-name>
          <email>jaccoud@ufrj.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vânia Borges</string-name>
          <email>vjborges30@ufrj.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João L. R. Moreira</string-name>
          <email>j.luizrebelomoreira@utwente.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Luiza M. Campos</string-name>
          <email>mluiza@ppgi.ufrj.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Programa de Pós-Graduação em Informática, Universidade Federal do Rio de Janeiro</institution>
          ,
          <addr-line>Rio de Janeiro</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>De Boelelaan 1105, 1081 HV Amsterdam, NL</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <fpage>147</fpage>
      <lpage>159</lpage>
      <abstract>
        <p>Effective interpretation of symbolic information is a critical challenge, especially as data grows in length and complexity, leading to an unpredictable or inadequately described surrounding context. Software development becomes a challenging task in the absence of a systematic description of the operating context, resulting in imprecise coding and ambiguous documentation. Interoperability standards, designed to align information meaning in heterogeneous environments, face similar issues. This work uses ontology-driven conceptual modeling to formalize the dynamic nature of interpretation as described by Peircian Semiotics. It presents a generic model of the interpretation process that, by explicitly affirming its dynamic nature, directly contributes to the development of context-aware metadata standards. This formalization can significantly enhance the design of data models and standards, enabling the proper inclusion of contextual metadata, roles, and their dependencies, even within the limitations of current representation languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Object</kwd>
        <kwd>Semiotics</kwd>
        <kwd>Interpretation</kwd>
        <kwd>Conceptual Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        To ensure that information is reused in another system with exactly the meaning intended
at the source, interoperability standards strive to register it with detailed documentation
in clear language, and, more recently, with machine-readable semantic metadata. Standards for
highly reusable data types, such as those directly supported by programming languages, have
already coalesced into a few universally adopted ones, like Unicode [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for text or IEEE 754 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for
floating-point arithmetic. Most of these standards succeed in their task because they are designed
to encode highly abstract types that are rarely used in their pure form, but rather as primitive types
to build more complex and actually usable types.
      </p>
      <p>For example, consider the words cats and dogs, which, in most sentences, refer respectively to
animals of the species Felis catus and Canis familiaris. However, when talking about animals in
general, they may refer to the whole families Felidae or Canidae, which include wild cats like
cougars and tigers, or coyotes and wolves. Consider now the meaning of the expression “It’s
raining cats and dogs”. Now the same words convey a completely different meaning, that it is
raining heavily. Note that this type of expression occurs in every language, and does not
necessarily use the same words: in Portuguese, we would say “Está chovendo canivetes” (literally:
it is raining pocket knives). Sometimes they are semantically linked: “Chove a cântaros” (lit. it rains
in pitchers) correlates to “It’s pouring rain” (from a vase). The context may influence the meaning
of a word either subtly or radically. Consider also these two sentences:
(a) The box could not pass through the window because it was too large.
(b) The box could not pass through the window because it was too small.</p>
      <p>Notice that in (a) the pronoun it refers to the box, but in (b) the same word, in exactly the same
syntactic structure, refers to the window. This well-known linguistic problem is also linked to the
context: the target of the pronoun is determined not only by the adjective ending the phrase, but
also by the reader’s knowledge about the domain of discourse.</p>
      <p>This context dependence is not exclusive to natural languages. It is present in
computer languages (the plus sign may signify addition for numbers, concatenation for strings, or
character repetition in regular expressions) and in visual codes (red means danger on a road sign, a spelling
error in a syntax verifier, and closing in a computer dialog window). Music, dance, and all forms of
expression exhibit this same contextual behavior. As information grows in complexity, its
dependence on context becomes more pronounced, often in ways that are difficult to describe formally. We
believe the writing of documentation, and more broadly, interoperability standards, may benefit
from a more detailed understanding of the interpretation process itself. As in most tasks where
humans play a role, it is convoluted, complex, and sometimes impossible to automate. Already
plagued by low-quality data, most diligent data managers have concerns about interpretations that
may be incorrect, lossy, or imprecise.</p>
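      <p>As a minimal illustration of the plus-sign example above, the following Python sketch shows the same character being read as addition, concatenation, or repetition, depending only on the context in which it appears. The variable names are ours and purely illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# The same symbol "+" is interpreted differently depending on its context.
number_sum = 2 + 3              # numeric context: addition
joined_text = "rain" + "coat"   # string context: concatenation

# Regular-expression context: "+" means "one or more of the preceding item".
repeated = re.fullmatch(r"a+", "aaa") is not None

print(number_sum, joined_text, repeated)  # 5 raincoat True
```
]]></preformat>
      <p>In each line the interpreting software (the Python evaluator or the regular-expression engine) supplies the context that fixes the meaning of the symbol.</p>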
      <p>
        In this paper, we aim to address these issues through ontology-driven conceptual modeling
(ODCM), seeking a deeper understanding of the elements and mechanisms involved in the
interpretation process. ODCM is the activity of capturing and formalizing how a community
perceives a domain of interest using modeling primitives inherited from a foundational ontology
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It plays a fundamental role, helping us to understand, elaborate, negotiate and precisely
represent subtle distinctions in our multiple conceptualizations of reality.
      </p>
      <p>This work presents a generic conceptual model for the interpretation process. It aims to uncover
the dynamic nature of interpretation, an aspect often disregarded in static relational schemas that
assume data will always be used in the manner prescribed by data publishers. We believe that by
using proper metadata to better describe the context in which data should be used, we end up with
datasets that are more safely directed to software written for that scenario. To design such models,
we need to understand the semiotic role of each property and its dependencies, considering the
limits imposed by the representation, either human- or machine-related.</p>
      <p>This paper is organized as follows: Section 2 presents background information on the
terminology required for understanding this work. Section 3 addresses the interpretation process
and presents an associated generic model. Section 4 explores this generic model by applying it to
the interpretation of digital entities. Section 5 discusses requirements derived from the
categorization of signs for metadata involved in the interpretation processes. Section 6 draws some
conclusions and proposes directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        The Semiotics of Charles Sanders Peirce (1839–1914) has been the object of many books and
dissertations. The reader may refer to our short introduction in the discussion about the digital
object concept [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A more detailed introduction may be found in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], whose author applied Peircian
concepts in a dissertation on Trinitology. We shall review here the main topics regarding the
process of interpretation, which Peirce called semiosis.
      </p>
      <p>
        According to Peirce [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the atomic unit of interpretation is the sign, which arises when
something takes the place of something else in some context. Signs emerge continuously in our
brains when cognitive signals from our senses match previously stored information regarding
things they may represent. Later in his work, Peirce admitted that this process may also happen in
the minds of other animals, or even in machines designed to reproduce the behavior of a mind. We
interpret dark clouds as a rain forecast, a red light as a command to stop, a picture as the thing it
depicts, and an utterance as corresponding words in a language. Signs usually come in chains: the
smell of smoke may be interpreted as something being on fire, which in turn may be interpreted as
danger, prompting us to run away or to seek means of extinguishing the fire. Signs do not emerge
spontaneously: they require an intentional act by the interpreter agent. Things that happen
mechanically, such as the production of electric signals by a microphone or a movement induced
by a relay, occur without interpretation; they are simply physical reactions that happen in response
to other phenomena. This is an important distinction to which we shall return later when
discussing software.
      </p>
      <p>
        Signs may vary largely in different scenarios. The well-known Saussurian linguistic sign is
dyadic, composed of two parts: the signifier and the signified [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The Peircian sign takes into
account that the meaning may vary in different situations, and models the sign not as having three
parts itself, but as something that emerges from the conjunction of a triad [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:



      </p>
      <list list-type="bullet">
        <list-item>
          <p>The representamen is the relevant aspect of something that depicts or describes something
else, the kind of thing that in everyday language we usually call a symbol or a graphic sign. Peirce
specifically avoided the common terminology for this role in order to render it generic: any
physical or mental manifestation may act as a representation of something else.</p>
        </list-item>
        <list-item>
          <p>The immediate object is the relevant aspect of the thing being recognized. It is not the actual
object in its plenitude, which is called the dynamic object.</p>
        </list-item>
        <list-item>
          <p>The interpretant embodies the knowledge of the situation, what links the representamen to
the immediate object and the possible resulting effects. This is not the product of the sign,
but what determines it. It is also not the interpreter, but depends on the interpreter’s competence.</p>
        </list-item>
      </list>
      <p>Recalling the previous examples: dark clouds may be considered the representamen of a sign
where imminent rain is the immediate object and the interpretant is the knowledge of the observer
that the former usually precedes the latter; the redness of a traffic light is the representamen that
signifies the danger of continuing to drive (the immediate object), which is determined by the
driving knowledge (the interpretant) acquired by the driver and results in braking the car.
These three elements can be identified in every interpretation step.</p>
      <p>It is important to note that the sign does not imply truth or correctness, since misinterpretations
are also interpretations, and the interpretant is not limited to logical rules or formal knowledge.
Some interpretations, and the actions they induce, are based on beliefs and convictions inherent in
the agent. One common effect of a sign that has an assertion as its object is the storing of such
information for future needs, changing the interpretant itself. Peirce called this recurrent
sign-making process semiosis and believed this is how we acquire knowledge and pursue truth.</p>
      <p>
        Another persistent element in Peirce’s writings is the three universal categories, an abstract
classification that he uses repeatedly in many situations. These trichotomies pervade his writings,
and he purposely named the categories in a way that avoids confusing them with more concrete,
quotidian categories. In a very simplified way, paraphrasing Peirce’s words [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], these can be
described as follows: firstness is what simply exists in itself, without referring to anything;
secondness is what it is by force of something to which it is second; and thirdness is what it is
through two other things that it mediates and brings into relation.
      </p>
      <p>When we apply these categories to the sign model, we derive the three basic sign types: icon,
index, and symbol. An icon is a sign in which the representamen displays firstness and relates to
the object by a property of its own similitude, such as an image representing the thing it depicts.
An index is a sign in which the representamen references or points to the object it represents,
displaying secondness, such as a pointing finger, an arrow, an address, a compass needle that
indicates North. A symbol is a sign in which representamen and object are bound by an arbitrary
relation that mediates between them, displaying thirdness, such as words, ideograms, and all sorts
of arbitrary rule encoding.</p>
      <p>When the categories are reapplied to each type, we can derive other subtypes. We shall use
these universal categories as a template to derive important metadata categories that expose these
monadic, dyadic, and triadic valences.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Generic semiosis</title>
      <p>
        As signs are very general in scope, a formal model of semiosis must address high-level entities in
some upper-level ontology of choice. Because most of them define a description as a static relation,
we had to resort to very abstract entities. We ended up choosing the Unified Foundational
Ontology (UFO) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which is a grounded ontology that provides a foundation for domain analysis
in conceptual modeling [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The UFO categories and relations enable the construction of a robust
conceptual model that helps clarify concepts, facilitates meaning negotiation, and precisely defines
the ontological semantics of represented notions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. UFO provides us with an adequate model for
events and also an entity called relator, which is particularly useful in reifying complex relations
like the ones involving a sign. UFO classes are expressed here with OntoUML, an ontology-driven
modeling language that incorporates the distinctions underlying UFO into UML class diagrams
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It introduces various stereotypes (shown between «guillemets») that correspond to the
concepts defined in UFO, as well as grammatical formal constraints that reflect UFO axiomatization
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which allows us to better point out some analogies and patterns using a more concise notation.
      </p>
      <p>To improve clarity, we have highlighted stereotypes related to UFO categories used in this
paper for endurants and perdurants. Endurants are entities that exist in time with all their parts.
They have essential and accidental properties and can undergo qualitative changes while
maintaining their identity. Before proceeding, it is essential to discuss the concepts of sortal and
non-sortal, as defined in UFO. Sortals are types that aggregate individuals with the same identity
principle. In contrast, non-sortals aggregate individuals with different identity principles. In sortals,
the identity is defined by the single «kind» that the individual instantiates. The kind may have specializations that can
be either rigid or anti-rigid. The former are stereotyped as «subkind» (e.g., hatchback car as a
subkind of car). The latter classify their instances only contingently and are stereotyped as «role»
when grounded on relational properties (e.g., employee as a role of a person within the scope of an
employment relationship). For non-sortal endurants, «roleMixin» defines contingent relational
properties for individuals of multiple kinds (e.g., 'customer' for the kinds person and organization).
Intrinsic aspects of individuals, which depend on them or other individuals, are stereotyped as
«quality» (e.g., car color) and «mode» (e.g., symptom). Extrinsic aspects are stereotyped with
«relator» (e.g., marriage, enrollment). The «event» stereotype refers to perdurants, which are
individuals that occur in time, accumulating temporal parts. They are existentially dependent on
endurants that participate in them.</p>
      <p>The main subject of a Peircian model of interpretation is the sign. We declared it as a perdurant,
or, in OntoUML terms, gave it the «event» stereotype. It is important to stress that the sign, which
denotes the meaning of something, is not a static object, but a temporal product of an intentional
act happening in a particular situation. Even when a situation seems to repeat itself, the meaning
may not be the same. Take, for example, the repetition of a mother’s plea, “Go tidy your room!”.
Each utterance, even if using the same volume, results in a different interpretation. In this case, the
same sign can never emerge again at a later point in time, because at least one of the components
of the triad, the interpretant, is time-dependent. Even when the triad repeats itself, we consider
the generated sign as another event, happening at a different time. This distinction between the
triad and the sign is an important aspect of the Peircian model: the sign is not the triad, but
emerges from it.</p>
      <p>
        To clarify the roles played by the different components of the sign triad, we could just declare
all related agents as participants. However, to better understand how each agent contributes to the
sign emergence, we decided to reify the triad, a common modeling practice described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The
result, however, is not a simple relator class. It not only brings together three very different
participants, but their roles are characterized by complex dispositions embedded in each of them.
Figure 1 depicts this general view and helps us unravel each component’s role separately. If no
multiplicity is shown on an association end, it implies exactly 1. In the following text, expressions
in Capitalized Bold Font refer to the ontology classes in the diagrams.
      </p>
      <p>An important aspect of interpretation is that, being a process carried out by beings with a
partial and imperfect view of the world, it works on the limited, segmented information that is
available to the interpreter. Peirce called these segments ‘cuts’ of the real things, and they apply to the three
components, even if they are of distinct ontological natures. We have adjusted the terminology
here to avoid confusion, since Peirce himself frequently used more than one term for the same
concept.</p>
      <p>
        The Representamen is “that character of a thing by virtue of which, for the production of a
certain mental effect, it may stand in place of another thing” [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It summarizes the qualities of a
larger thing, which, for practical purposes, we named a Phenomenon. A phenomenon is the
object of the interpreter’s perception that has the potential to be interpreted as the
occurrence or presence of something else, the signified thing. This is not restricted to physical
phenomena, such as pictures or sounds, but also encompasses abstract or surreal ones, such as the
shadows of monsters in a dark room.
      </p>
      <p>The same cutting pattern happens with the signified component. In this respect, Peirce himself
clearly differentiated between the actual full signified object, which he called Dynamic Object,
and its qualities that are active in the triad, called Immediate Object. It is important to preserve
this distinction because the actual dynamic object may be anything in the domain of discourse, real
or imaginary, and it is never known to the interpreter in its plenitude. We deal with and talk about
things we only know in part, either by descriptions or direct inspection (their representamens).</p>
      <p>The third component is what actually sets Peirce’s model apart from other semiotic models, since Peirce postulated
that something can only be said to represent another in a particular context. A static
signifier/signified pairing fails to capture innumerable situations where the context changes the
meaning. The case of linguistics is exemplary: the same word or expression can mean many
different things in different languages, different sentences, or different contexts, and may even
signify two things at once. This is not necessarily an error; it may be intentional: a pun only
achieves its goal (being funny or outrageous) if two meanings emerge simultaneously. A model of semiosis
must account for all these strange kinds of interpretations. Peirce denoted this contextual part of
the triad Interpretant, and although he also dissected this complex entity into many subtypes, we
shall take a more pragmatic, task-oriented view of keeping the same pattern adopted for the other
two components. We consider that, whichever qualities (modes or moments) the interpretant
represents, they always inhere in the Interpreter, the real-world intentional agent who actually
performs the task. These qualities may include:</p>
      <list list-type="bullet">
        <list-item>
          <p>Knowledge of the relevant qualities in the representamen, which means acquaintance with
important perceivable properties, such as physical modes (e.g. shapes or colors), symbolic
codes (e.g. languages, their vocabulary, syntax or schemata), movements (e.g. in dance or
sign languages), sounds (e.g. phones, tones, chords and intonation), etc.</p>
        </list-item>
        <list-item>
          <p>Knowledge of the relevant objects present in the domain, which may lead to the
identification of suitable immediate objects.</p>
        </list-item>
        <list-item>
          <p>Time and spatial context: the meaning of words and gestures may vary radically from one
decade to another, from one place to another, or from one user group to another.</p>
        </list-item>
        <list-item>
          <p>The source of the representamen. For example, the folded hands emoji (🙏) means please,
thank you, praying, namaste, añjali mudrā or even a high-five, depending on where you are
and with whom you are chatting.</p>
        </list-item>
        <list-item>
          <p>The principles used to match the representamen and the immediate object and determine
the resulting effect of the emergent sign. Peirce frequently called these “habits”, to avoid
restricting them to logical rules. Interpretation may be based on stochastic functions, good
or bad correlations, emotions, beliefs, etc.</p>
        </list-item>
        <list-item>
          <p>The history of the interpreter’s dispositions. The agent’s knowledge about a situation may
change with time, resulting in different interpretations or actions.</p>
        </list-item>
      </list>
      <p>This list may change drastically depending on the situation the interpreter is presented with. An
agent who is experienced in translating languages may not have the proficiency to appreciate a
complex piece of music or interpret a potentially dangerous situation while driving. The main
intrinsic limitation in automating decision procedures using artificial intelligence is related to
the colossal amount of information and modeling needed to support even simple scenarios.</p>
      <p>Once the Sign Triad is formed, by matching the relevant parts in the three components, a Sign
emerges, which in turn causes or evokes a Sign Effect, also determined by the interpretant. There
are two basic types of effects: (i) the representamen is replaced based on the resulting immediate
object (now in focus), creating a new matching step that will result in a new sign and a sign chain;
we named this event Introspection, in analogy to the mental process of examining one’s own
thoughts; and (ii) the sign causes a physical action of some kind. If the resulting object is some
assertion of a fact, the effect may be the storing of the information in memory for further use. If the
result is a command of some kind, the agent executes it. This type of effect terminates the sign
chain. This complex sequence of events constitutes the special process called Semiosis.</p>
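      <p>The sequence of events described above can be sketched roughly in code. The following Python fragment is only an illustration under strong simplifying assumptions: the interpretant is reduced to a hypothetical lookup table of habits, and the names Sign, Habits, and semiosis are ours rather than classes of the model. It shows introspection feeding the immediate object back as the next representamen until a physical action terminates the chain.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Sign:
    """One emerged sign, treated as an event produced by a matched triad."""
    representamen: str
    immediate_object: str
    step: int  # position in the chain; stands in for the time of the event


# Hypothetical interpretant: each habit links a representamen to an immediate
# object and, optionally, to a physical action. No action means introspection.
Habits = Dict[str, Tuple[str, Optional[str]]]


def semiosis(representamen: str, habits: Habits,
             max_steps: int = 10) -> Tuple[List[Sign], Optional[str]]:
    """Follow a sign chain until an action occurs or no triad can be formed."""
    chain: List[Sign] = []
    current = representamen
    for step in range(max_steps):
        if current not in habits:
            break  # the interpretant offers no match, so no sign emerges
        immediate_object, action = habits[current]
        chain.append(Sign(current, immediate_object, step))
        if action is not None:
            return chain, action  # a physical action terminates the chain
        current = immediate_object  # introspection: the object is the next representamen
    return chain, None


# Example taken from Section 2: smoke, then fire, then danger and flight.
habits: Habits = {
    "smell of smoke": ("something on fire", None),
    "something on fire": ("danger", "run away"),
}
chain, action = semiosis("smell of smoke", habits)
print([sign.immediate_object for sign in chain], action)
```
]]></preformat>
      <p>Each Sign instance is a separate event, so running the same chain twice yields new signs even when the triads are identical, which mirrors the distinction between the triad and the sign. A dictionary lookup is of course far narrower than the habits Peirce had in mind; it only makes the control flow visible.</p>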
      <p>One should note that artificial symbolic representations, which act as representamens when
they are perceived by another agent, are also the result of similar sign chains, in which the objects
play different roles. For example, in reading, written characters are interpreted into spoken
phrases, and in dictation the heard utterances are interpreted into written symbols. Note that they
are not reverse processes; the signs and actions are not the same executed in reverse order, but, if
executed correctly, they produce an inverse effect. As any foreign language student knows, it takes
time to master both directions and perfect the different skills needed for each language.</p>
      <p>This model is purposely abstract in order to be used as a template that must be specialized to
reflect elements of the domain of choice. In the next section, we show how this may be applied to
the interpretation of data.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Digital object semiosis and an application example</title>
      <p>
        Applying Peirce’s model to the world of digital computing may raise some doubts and
misunderstandings. If the sign results from an intentional disposition from the interpreter, how
would a machine with no volition perform a similar task? Souza [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], in her work on
human-computer interaction, provides a clue to answering this question. She argues that our common
perception of conversing with a user interface is, in fact, an illusion. Instead, the interface's
responses are a direct consequence of how its human designers programmed it to logically react,
mimicking anticipated human behavior. So, we are not actually talking to the machine, but talking
to the designers via the machine, in the way they intended the conversation to unfold. We can say
the machine acts as a delegate to the programmers, automatically responding to user inputs in the
way the programmers would react. Also, a computer program does not spontaneously initiate; it is
intentionally started by a human, either directly or indirectly. This also holds for very complex
interfaces such as AI-driven assistants — there is a large infrastructure behind the interface,
carefully constructed to behave in a human-like fashion, but the responses are all driven by the
intentional acts of the many humans involved in its construction. So, if a piece of software is built
to behave according to the will of one or many humans, we can say it inherits such intentional
dispositions from the programmers. The software will always behave in the way it was
programmed to behave and will do so because it is intentionally started and used. We can thus say
that any interpretation that eventually happens by means of such code is intentional.
      </p>
      <p>An important aspect of the digital environment is that it is highly uniform, since everything
is represented by bit-encoded sequences. Because there is no natural source of data, every bit
sequence present in the universe of a digital system comes either from some other system or from
an analog-to-digital converter (ADC), a device that produces digital data from some analog sensor.
For example, a microphone transforms sound waves into an analog signal, which is fed to an ADC
circuit that regularly generates data encoded in some digital format like pulse code modulation
(PCM). Also, everything that exists in the digital world needs a digital-to-analog converter (DAC) or
similar circuitry that reacts to data stored in some memory and produces a physical reaction. For
example, a sound DAC captures octets regularly from a serial interface and generates an analog
electrical signal that may be later transformed into sound by a loudspeaker. So, our interpretation
model deals exclusively with things that are derived from bit sequences. This is not a conceptual
restriction, since in the generic model all components of the sign triad were also informational in
nature, but a format restriction: in a digital system, everything must be represented by discrete bit
sequences.</p>
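      <p>To make the ADC and DAC round trip more concrete, the sketch below quantizes a few samples of a sine wave into PCM octets and reads them back. It is a software stand-in for the circuitry, not a description of any particular device: the 16-bit signed little-endian format, the 8 kHz sampling rate, and the function names pcm16_encode and pcm16_decode are all assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import struct


def pcm16_encode(samples):
    """Quantize samples in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    frames = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        frames.append(struct.pack("<h", int(round(clipped * 32767))))
    return b"".join(frames)


def pcm16_decode(data):
    """Turn 16-bit PCM octets back into approximate sample values."""
    count = len(data) // 2
    return [value / 32767 for value in struct.unpack("<%dh" % count, data)]


sample_rate = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(8)]

encoded = pcm16_encode(tone)     # what the ADC side would emit
decoded = pcm16_decode(encoded)  # what the DAC side would reconstruct

print(len(encoded), "octets for", len(tone), "samples")
```
]]></preformat>
      <p>The decoded values differ slightly from the originals because of quantization, which is one concrete sense in which everything inside the digital system is a discrete bit sequence rather than the phenomenon itself.</p>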
      <p>Porting our generic model into the digital space requires that each concrete class be replaced
with some sort of subclass from a superclass we call Bit Sequence. To facilitate visualization,
Figure 2 illustrates a model with the entities associated with this superclass, highlighting those
involved in the process. Note that bit sequences and their subclasses are all of an informational
nature, and these bits are always realized in some physical media, in this case, the computer
memory. The derived model is illustrated in Figure 3.</p>
      <p>
        The definition of Data in this ontology is derived from [15], which borrows from DOLCE. It is a
sequence of bits ordered according to some Encoding System. Although this may be used with all
kinds of data, this example illustrates how it would work for a Digital Object (DO), which,
following [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we define as “a finite bit sequence that is identified and has a set of assigned descriptors (also bit
sequences) that support or aid in its interpretation”. These descriptors were represented here by the
common concept of Metadata, data that describes another piece of data. To simplify the example,
we considered only DOs that are data, excluding those that may describe untyped or random bit
sequences, which may appear in some scenarios.
      </p>
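      <p>The definition quoted above can be read as a small data structure. The following Python sketch is one possible reading, with hypothetical names (DigitalObject, interpret_as_text) and a single hypothetical descriptor key, "encoding"; it is not the schema of the ontology. It shows the same content bits yielding different text when the descriptor changes.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class DigitalObject:
    """A finite bit sequence with an identifier and descriptors (also bits)."""
    identifier: str
    content: bytes
    descriptors: Dict[str, bytes] = field(default_factory=dict)


def interpret_as_text(do: DigitalObject) -> str:
    """Use the 'encoding' descriptor (an assumed key) to decode the content."""
    encoding = do.descriptors["encoding"].decode("ascii")
    return do.content.decode(encoding)


do = DigitalObject(
    identifier="example:do-1",  # hypothetical identifier
    content="Est\u00e1 chovendo".encode("utf-8"),
    descriptors={"encoding": b"utf-8"},
)

# Same bits, different metadata: the interpretation changes.
misdescribed = DigitalObject(do.identifier, do.content, {"encoding": b"latin-1"})

print(interpret_as_text(do))
print(interpret_as_text(misdescribed))
```
]]></preformat>
      <p>The second call does not fail; it simply produces a different, unintended text, which illustrates why descriptors are needed to aid interpretation.</p>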
      <p>Because the computer engine can only deal with bits, everything is a bit sequence that encodes
some other entity. This includes descriptions of entities and all sorts of metainformation.
Interpretation, thus, is the process of converting between representation spaces with different
encodings. For example, in interpreting a program source code within a specific context (text
encoding, programming language, syntax), the bit sequence for the string “0x34” is translated into a
different bit sequence that represents the integer number 52. Note that the computer does not
actually know what 52 means, because it has no idea of what an integer number is. However, its
circuitry knows how to do integer arithmetic on a limited range of integers by manipulating the
bits with digital logic gates, and so it is able to do integer arithmetic without knowing what an
integer is.</p>
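      <p>The “0x34” example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name interpret_hex_literal, the choice of UTF-8 as the text encoding, and the 32-bit big-endian target encoding are hypothetical choices and are not part of the ontology. The sketch shows two successive changes of representation space, each governed by its own context.</p>
      <preformat><![CDATA[
```python
def interpret_hex_literal(raw: bytes, text_encoding: str = "utf-8") -> int:
    """Interpret a bit sequence as text, then the text as a hexadecimal literal."""
    text = raw.decode(text_encoding)   # first context: the character encoding
    return int(text, 16)               # second context: the literal syntax (base 16)


# Representamen: the bit sequence for the four characters "0x34" (assumed UTF-8).
source_bits = "0x34".encode("utf-8")

# Result of the interpretation: the integer value.
value = interpret_hex_literal(source_bits)

# One possible target encoding (assumption: 32-bit, big-endian, unsigned).
target_bits = value.to_bytes(4, byteorder="big")

print(source_bits.hex(), "->", value, "->", target_bits.hex())
```
]]></preformat>
      <p>The two bit sequences differ in both length and content, although they stand for the same number once the appropriate context is applied.</p>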
      <p>In fact, we have built software that can deal with thousands of other entity bit representations.
But the computer can only correctly operate on data that is properly encoded, and representations
must be constantly adapted in memory. Consider the example of software itself. It starts as ideas in
a human mind, which are then encoded in text, the source code. Once the source code is input to a
machine, it becomes a bit sequence, text encoded in some character encoding standard, such as
UTF-8. The computer cannot run a program encoded as text; it needs to interpret the text into
an executable form, and it does so using another piece of software called a compiler. The resulting bit
sequence is now object code, but it is still not ready to be executed. Another piece of software, part
of the operating system, needs to copy such code into a specific memory address, adjust
boundaries, and start a process to execute the code. The software is now a dynamic sequence of bits
that continuously changes (the quintessential Turing tape) until the program terminates. We use
the same term — software — for all these bit sequences in different encodings, but they are distinct
instances with different representations (source and object code) and even different ontological
natures (a static plan, the program, and a mutable sequence manipulated by the running process).</p>
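      <p>The chain of representations described above can be approximated with the built-in compile and exec functions of Python. This is a loose analogy rather than the full compile, load and run cycle of an operating system: CPython bytecode stands in for object code, and a dictionary stands in for process memory. The sample program and all variable names are hypothetical.</p>
      <preformat><![CDATA[
```python
# The program as text: a static plan written by a human.
source_text = "result = 0x34 + 1\n"

# The same program as a bit sequence in a character encoding (UTF-8).
source_bits = source_text.encode("utf-8")

# "Compilation": the text is interpreted into another encoding (CPython bytecode).
code_object = compile(source_bits, "<example>", "exec")
object_bits = code_object.co_code

# "Loading and running": the code now acts on mutable state (the namespace).
namespace = {}
exec(code_object, namespace)

print(len(source_bits), "bytes of source text")
print(len(object_bits), "bytes of compiled code")
print("result =", namespace["result"])
```
]]></preformat>
      <p>The names source_bits, object_bits and namespace refer to three distinct bit sequences, even though all of them would informally be called the software.</p>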
      <p>Figure 3 shows the relevant parts of the interpretation process that, in different granularities,
operate in such encoding transformations. The representamen is the relevant BitSeq Aspects of
the Running BitSeq that is being interpreted. When dealing with digital objects, the immediate
object is the DO Description, which is the metadata associated with a particular Running DO.
The interpretant is the context-aware Algorithm that the Running Software realizes. This triad
is here a form of pattern match that results in the sign, the emergence of a DO Occurrence. The
same algorithm determines which DO Command will result as an effect. This sequence of
occurrences represents the Software Process.</p>
      <p>To help visualize this model applied to a real system, consider an XML parser, whose task is to
interpret an XML-conforming text according to the semantics of an XML Schema. There are two
possible approaches: the Document Object Model (DOM) parser [16] and the Simple API for XML
(SAX) parser [17], as seen in Figure 4.</p>
      <p>The DOM parser (a) uses only its knowledge of syntax rules to determine and produce a new
DOM graph, which is a new representation for the same information. Interpreting this graph is left to
another routine, which knows nothing about XML but needs to know the DOM encoding and the
semantics inherent in the schema. The DOM sign focuses on the whole XML file at once, producing a
single DOM graph, so the application has the whole context of the XML file to work with.</p>
      <p>The SAX parser (b), which is used for very large files, translates every structural XML component
(starting tag, attribute, ending tag, comment, etc.) into discrete events, passing their representations
as notifications to particular subroutines that can only work with a smaller context of
the XML tree. These are simpler signs, which result in less memory consumption and smaller
routines, but are potentially restricted in interpreting the data because of the shrunken context.</p>
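      <p>The contrast between the two approaches can be sketched with the xml.dom.minidom and xml.sax modules of the Python standard library. These modules are used here only for illustration and are not the implementations discussed in [16] and [17]; the sample document and the class name QuantityHandler are hypothetical.</p>
      <preformat><![CDATA[
```python
import xml.sax
from xml.dom import minidom

XML_TEXT = b'<order id="7"><item qty="2">pen</item><item qty="1">ink</item></order>'

# (a) DOM: the whole document becomes one graph, so the application that
# interprets it has the full context available (root attributes, siblings).
dom = minidom.parseString(XML_TEXT)
order_id = dom.documentElement.getAttribute("id")
dom_total = sum(
    int(item.getAttribute("qty")) for item in dom.getElementsByTagName("item")
)


# (b) SAX: each structural component arrives as a separate event, and the
# handler sees only the current tag and its attributes.
class QuantityHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.total += int(attrs.getValue("qty"))


handler = QuantityHandler()
xml.sax.parseString(XML_TEXT, handler)

print(order_id, dom_total, handler.total)
```
]]></preformat>
      <p>Both routes compute the same total, but only the DOM route can relate an item back to the order identifier without keeping extra state of its own.</p>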
      <p>In alignment with Peirce’s view, parsing has already been considered as a knowledge acquisition
method [18]. In a SAX parser, the schema and other information previously collected while
traversing the XML tree may be cached to help resolve cross-references or improve the semantic
resolution of terms. Every relevant piece of information added to the interpretant should result in a better
interpretation.</p>
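      <p>A minimal sketch of such caching, again using the xml.sax module of the Python standard library, is given below. The sample document, the attribute names id and by, and the class name ResolvingHandler are assumptions made for this example only.</p>
      <preformat><![CDATA[
```python
import xml.sax

XML_TEXT = (
    b"<catalog>"
    b'<author id="a1">Peirce</author>'
    b'<book by="a1">Collected Papers</book>'
    b"</catalog>"
)


class ResolvingHandler(xml.sax.ContentHandler):
    """Caches earlier events so that later ones are read in a wider context."""

    def __init__(self):
        super().__init__()
        self.authors = {}    # cache: author id -> name, filled while traversing
        self.resolved = []   # (book title, author name) pairs
        self._stack = []     # open elements: (tag, attributes, text parts)

    def startElement(self, name, attrs):
        self._stack.append((name, dict(attrs.items()), []))

    def characters(self, content):
        if self._stack:
            self._stack[-1][2].append(content)

    def endElement(self, name):
        tag, attrs, parts = self._stack.pop()
        text = "".join(parts)
        if tag == "author":
            self.authors[attrs["id"]] = text
        elif tag == "book":
            # Without the cache, the reference "a1" would stay unresolved.
            author = self.authors.get(attrs["by"], attrs["by"])
            self.resolved.append((text, author))


handler = ResolvingHandler()
xml.sax.parseString(XML_TEXT, handler)
print(handler.resolved)
```
]]></preformat>
      <p>This sketch resolves only backward references; a reference to an author declared later in the document would be left as the raw identifier.</p>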
    </sec>
    <sec id="sec-5">
      <title>5. Metadata requirements</title>
      <p>For each element of a data structure that requires interpretation, the interpretant must comprise
knowledge of its ontological nature and the context for the operation. This information is provided
statically by the software code or dynamically by metadata. Knowing how much metadata to
provide is important to guarantee a proper interpretation. How much metadata is needed for each
type of sign? Not coincidentally, there is a parallel with Peirce's sign typology and universal
categories.</p>
      <p>The universal categories may be applied to partition domains into segments that are characterized
by the self (showing firstness), by something else (showing secondness), or by relating two other
things (displaying thirdness). Dozens of such trichotomies appear in Peirce’s writings, and many
more have been developed by his disciples in many different domains. Table 1 lists a few from a long
list collected in [19] that are of interest to our analysis. The last three were used by Peirce to
develop a more granular categorization of signs, which we have used to derive requirements for
the metadata involved in the interpretation process.</p>
      <p>Besides the already seen icon/index/symbol distinction, based on the way representamen and
object are matched, Peirce considered how the sign is used: (i) a qualisign is based on intrinsic
qualities that convey character, possibility, chance, disposition, not facts; (ii) a sinsign represents
occurrence, reality, factuality, individuality, instantiation, and conveys haecceity and identity; and
(iii) a legisign displays continuity, generality, relationship and changes in space-time. All symbols
are legisigns.</p>
      <p>The last trichotomy considers what drives the interpretant in the process. A rheme represents
purpose and focuses on marks or the character of the representamen. A dicisign stands for fact,
assertion, and, as such, focuses on the object. Only symbols and indexes behave this way, since
icons are based on their own attributes. An argument expresses reason, rules, habits, and the
influence of context. Examples are premises, conclusions, and a sequence of statements. Only
symbols may act as arguments.</p>
      <p>These sign trichotomies help us identify the type of knowledge required for the interpretant to
perform its job, as well as the corresponding metadata. Table 2 outlines the metadata requirements
in the case of digital objects.</p>
      <p>Standards for addressing bit sequences.</p>
      <p>Handled mainly by the operating system and
network protocols. However, special
standards may be developed for specific
scenarios.</p>
      <p>Binary encoding standards. Formats and
packing standards. Locally defined types.</p>
      <p>Knowledge of the possible arbitrary values an
attribute may assume, which are provided by
the Mode definitions.</p>
      <p>Standards for identifying resources, such as
URIs, UUIDs, CRCs etc.</p>
      <p>It is worth noting that the context dependence varies according to the complexity of the sign.
At one extreme, qualisigns, which are based only on intrinsic properties, can only be iconic (nothing to
index or mediate) and rhematic (fixed purpose), and thus require very little contextual information.
Low-level routines in the software handle most of this encoding, since the possible
scenarios for bit encoding and memory manipulation are fixed and cannot be changed. At the
other extreme, an argument needs many different standards to be efficiently interpreted, including
the contextual information to which the algorithms apply. Some metadata annotation properties
that are considered simple, like the Resource Description Framework (RDF) Schema comment<sup>2</sup> or
the Simple Knowledge Organization System (SKOS) definition<sup>3</sup>, are actually arguments that map
from an artificial language (RDF or SKOS) into some natural language used to convey the intention
of the author. Any system that needs to interpret such information will need a sophisticated
interpretant armed with computational linguistic capabilities. This sort of metadata is clearly aimed
at the human reader. Still, there is a lot of research using such information to help modelers in
building or aligning ontologies [20, 21] or building natural language interfaces [22].</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and future work</title>
      <p>The analysis proposed in this paper has revealed that various factors influence the interpretation of
symbolic information encoded in digital systems. Although we automate this procedure with
software and employ many methods to assure the quality of its intended execution, every exchange
of data to or from another system implies a change of context that needs to be managed. It is not
enough to guarantee that encodings and formats are compatible; other contextual
information must also be aligned, such as ontology versions, inference machines and rules. In the case of
human interaction, localization parameters such as language and formatting must likewise be the same. This is
especially important when data is exchanged not in a neutral format, but in one designed for human
consumption.</p>
      <p>The distinct aspects of knowledge required during interpretation depend on the various
components of the sign triad, and this knowledge must be integrated into the running software for
the system to provide the intended interpretation. This knowledge may already be embedded in the
form of encoding and format conformance, but it may also be loaded dynamically, in response to
changes in the environment of accumulated knowledge. Proper validation of these new data is
crucial to ensure the consistency of the running software. This is especially important with
knowledge graphs, which cannot be fed conflicting information, and language models, which
cannot receive their own output as feedback.</p>
      <p>We hope this ontology helps the derivation of interpretation scenarios for other domains, but
especially the case of proper metadata specification for digital objects, which lacks better validation
procedures for completeness and suitability. Incoherent metadata is an omen of incorrect data
processing.</p>
      <p>As future work, we intend to demonstrate how these principles can be applied to an RDF-based
knowledge base and its associated services and show how we can derive Shapes Constraint
Language<sup>4</sup> (SHACL) validations to pinpoint lags or inconsistencies within datasets, or problems
resulting from the simultaneous use of multiple datasets with incompatible metadata.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work has been partially supported with research grants from RNP, CAPES (Process
number 88887.613048/2021-00), and FINEP/DCT/FAPEB (nº 2904/20-01.20.0272.00).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[15] D. Oberle. Semantic Management of Middleware. Springer (2006). ISBN 978-0-387-27630-4.</p>
      <p>[16] T. Leithead. DOM Parsing and Serialization [Internet]. W3C (2016). URL: https://www.w3.org/TR/DOM-Parsing/.</p>
      <p>[17] S. M. Foo, W. M. Lee. Simple API for XML (SAX). (2002). doi:10.1007/978-1-4302-0829-7_7.</p>
      <p>[18] A. Wallis, G. Nelson. Syntactic Parsing as a Knowledge Acquisition Problem. Proceedings of the 10th European Workshop on Knowledge Acquisition, Modeling and Management, EKAW'97, pp. 285-300. Springer (1997).</p>
      <p>[19] M. K. Bergman. A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce, pp. 143-147. Springer (2018). doi:10.1007/978-3-319-98092-8.</p>
      <p>[20] B. Biébow, S. Szulman. TERMINAE: A Linguistics-Based Tool for the Building of a Domain Ontology. Proceedings of the 11th European Workshop on Knowledge Acquisition, Modeling and Management, EKAW'99, pp. 49-66. Springer (1999).</p>
      <p>[21] S. J. Ali, V. Naganathan, D. Bork. Establishing Traceability Between Natural Language Requirements and Software Artifacts by Combining RAG and LLMs. Proceedings of the 43rd International Conference on Conceptual Modeling, ER 2024, pp. 295-314. Springer (2025).</p>
      <p>[22] A. Oliveira, E. Nascimento, J. Pinheiro et al. Small, Medium, and Large Language Models for Text-to-SQL. Proceedings of the 43rd International Conference on Conceptual Modeling, ER 2024, pp. 276-294. Springer (2025).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>The Unicode Consortium</collab>
          .
          <source>The Unicode Standard, Version 16.0.0</source>
          . South San Francisco: The Unicode Consortium (
          <year>2024</year>
          ). ISBN 978-1-936213-34-4. https://www.unicode.org/versions/Unicode16.0.0/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>IEEE Computer Society</collab>
          .
          <source>IEEE Standard for Floating-Point Arithmetic</source>
          . IEEE Std 754-2019. IEEE (
          <year>2019</year>
          -07-22), pp.
          <fpage>1</fpage>
          -
          <lpage>84</lpage>
          . doi:10.1109/IEEESTD.2019.8766229. ISBN 978-1-5044-5924-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Sales</surname>
          </string-name>
          .
          <source>Ontology Validation for Managers</source>
          . 312 f. Universidade Federal do Espírito Santo, Vitória, ES (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Amaral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Borges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L. M.</given-names>
            <surname>Campos</surname>
          </string-name>
          .
          <article-title>A Terminological and Semiotic Review of the Digital Object Concept</article-title>
          , ER (
          <year>2023</year>
          ). doi:10.1007/978-3-031-47262-6_5.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Robinson</surname>
          </string-name>
          .
          <source>God and the world of signs: Trinity, evolution, and the metaphysical semiotics of CS Peirce</source>
          (Vol.
          <volume>2</volume>
          ). Brill (
          <year>2010</year>
          ). ISBN 978 90 04 18799 3.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoopes</surname>
          </string-name>
          (ed.).
          <source>Peirce on Signs - Writings on Semiotic by Charles Sanders Peirce</source>
          . The University of North Carolina Press (
          <year>1991</year>
          ). https://muse.jhu.edu/book/41103.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>de Saussure</surname>
          </string-name>
          .
          <source>Curso de Linguística Geral</source>
          (Cours de linguistique générale). Portuguese translation by
          <string-name>
            <given-names>A.</given-names>
            <surname>Chelini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Paes</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Blikstein</surname>
          </string-name>
          . 28th ed. Editora Cultrix (
          <year>2012</year>
          ). ISBN 978-85-316-0102-6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Peirce</surname>
          </string-name>
          ,
          <source>The Collected Papers of Charles Sanders Peirce, 8 Volumes</source>
          . Cambridge, MA: Harvard University Press (
          <year>1931</year>
          ), p. 2.356.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P. A.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Sales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.B.</given-names>
            <surname>Benevides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Porello</surname>
          </string-name>
          ,
          <article-title>Types and taxonomic structures in conceptual modeling: a novel ontological theory and engineering support</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          (
          <year>2021</year>
          ). doi:10.1016/j.datak.2021.101891.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Benevides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Porello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P. A.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Sales</surname>
          </string-name>
          .
          <article-title>UFO: Unified foundational ontology</article-title>
          .
          <source>Applied Ontology</source>
          (
          <year>2022</year>
          ). doi:10.3233/AO-210256.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Proper</surname>
          </string-name>
          .
          <article-title>On understanding the value of domain modeling</article-title>
          .
          In:
          <source>15th International Workshop on Value Modelling and Business Ontologies, VMBO</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. P.</given-names>
            <surname>Sales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          .
          <article-title>Reification and Truthmaking Patterns</article-title>
          , ER (
          <year>2018</year>
          ). doi:10.1007/978-3-030-00847-5_13.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Benedict</surname>
          </string-name>
          .
          <article-title>What Are Representamens?</article-title>
          <source>Transactions of the Charles S. Peirce Society</source>
          (
          <year>1985</year>
          ). http://www.jstor.org/stable/40320088.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>de Souza</surname>
          </string-name>
          .
          <source>The semiotic engineering of human-computer interaction</source>
          . MIT Press (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>