<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Exchange Format for Modular Knowledge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Florian Rabe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Kohlhase</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jacobs University Bremen</institution>
        </aff>
      </contrib-group>
      <fpage>50</fpage>
      <lpage>66</lpage>
      <abstract>
        <p>We present a knowledge exchange format that is designed to support the exchange of modularly structured information in mathematical and logical systems. This format allows to encode mathematical knowledge in a logic-neutral representation format that can represent the meta-theoretic foundations of the systems together with the knowledge itself and to interlink the foundations at the meta-logical level. This “logics-as-theories” approach, makes system behaviors as well as their represented knowledge interoperable and thus comparable. The advanced modularization features of our system allow communication between systems without losing structure. We equip the proposed format with a web-scalable XML/URI-based concrete syntax so that it can be used as a universal interchange and archiving format for existing systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Mathematical knowledge is at the core of science, engineering, and economics, and we are seeing a
trend towards employing computational systems like semi-automated theorem provers, model checkers,
computer algebra systems, constraint solvers, or concept classifiers to deal with it. It is a characteristic
feature of these systems that they either have mathematical knowledge implicitly encoded in their critical
algorithms or (increasingly) manipulate explicit representations of the relevant mathematical knowledge
often in the form of logical formulae. Unfortunately, these systems have differing domains of
applications, foundational assumptions, and input languages, which makes them non-interoperable and difficult
to compare and evaluate in practice. Moreover, the quantity of mathematical knowledge is growing faster
than our ability to formalize and organize it, aggravating the problem that mathematical software systems
cannot share knowledge representations.</p>
      <p>The work reported in this paper focuses on developing an exchange format between math-knowledge
based systems. We concentrate on a foundationally unconstrained framework for knowledge
representation that allows to represent the meta-theoretic foundations of the mathematical knowledge in the same
format and to interlink the foundations at the meta-logical level. In particular, the logical foundations of
domain representations for the mathematical knowledge can be represented as modules themselves and
can be interlinked via meta-morphisms. This “logics-as-theories” approach, makes systems behavior
as well as their represented knowledge interoperable and thus comparable at multiple levels. Note that
the explicit representation of epistemic foundations also benefits systems whose mathematical
knowledge is only implicitly embedded into the algorithms. Here, the explicit representation can serve as a
documentation of the system as well as a basis for verification or testing attempts.</p>
      <p>Of course communication by translation to the lowest common denominator logic — the current state
of the art — is always possible. But such translations lose the very structural properties of the knowledge
representation that drive computation and led to the choice of logical system in the first place. Therefore
our format incorporates a module system geared to support flexible reuse of knowledge items via theory
morphisms. This module system is the central part of the proposed format — emphasizing
interoperability between theorem proving systems, and the exchange and reusability of mathematical facts across
different systems. In contrast to the formula level, which needs to be ontologically unconstrained, the
module system level must have a clear semantics (relative to the semantics of the formula level) and
be expressive enough to encode current structuring practice so that systems can communicate without
losing representational structure.</p>
      <p>On a practical level, an exchange format should support interoperability by being easy to parse and
standards-compatible. Today, where XML-parsers are available for almost all programming language
and higher-level protocols for distributed computation are XML-based, an exchange format should be
XML-based and in particular an extension of the MATHML [ABC+03] and OPENMATH [BCC+04]
standards. Moreover, an exchange format must be web-scalable in the sense that it supports the
distribution of resources (theories, conjectures, proofs, etc.) over the Internet, so that they can be managed
collaboratively. In the current web architecture this means that all (relevant) resources must be
addressable by a URI-based naming scheme [BLFM05]. Note that in the presence of complex modularity and
reusability infrastructure this may mean that resources have to be addressable, even though they are only
virtually induced by the inheritance structure.
2
2.1</p>
    </sec>
    <sec id="sec-2">
      <title>Syntax</title>
      <p>A Four-Level Model of Mathematical Knowledge
ring ring?add cgroup
add mon; inv
mult</p>
      <p>cgroup?mon
ring?mult</p>
      <p>monoid
comp; unit
v2
integers</p>
      <p>0; +;
v1
comp 7! +
unit 7! 0
v2
mon=comp 7! +
mon=unit 7! 0
inv 7!
import
view
or mon 7! v1
Example 1. We begin the exposition of our language with a simple motivating example that already
shows some of the crucial features of MMT and that we will use as a running example: Fig. 1 gives a
theory graph for a portion of the algebraic hierarchy. The bottom node represents the theory of monoids,
which declares operations for composition and unit. (We omit the axioms and the types here.) The theory
cgroup for commutative groups arises by importing from monoid and adding an operation inv for the
inverse element. This theory does not have to declare the operations for composition and unit again
because they are imported from the theory of monoids. The import is named mon, and it can be referenced
by the qualified name cgroup?mon; it induces a theory morphism from monoids to commutative groups.</p>
      <p>Then the theory of rings can be formed by importing from monoid (via an import named mult
that provides the multiplicative structure) and from cgroup (via an import named add that provides
the additive structure). Because all imports are named, different import paths can be distinguished: By
concatenating import and symbol names, the theory ring can access the symbols add=mon=comp (i.e.,
addition), add=mon=unit (i.e, zero), add=inv (i.e., additive inverse), mult=comp (i.e., multiplication),
and mult=unit (i.e., one).</p>
      <p>The node on the right side of the graph represents a theory for the integers declaring the operations 0,
+, and . The fact that the integers are a monoid is represented by the view v1. It is a theory morphism
that is explicitly given by its interpretations of comp as + and of unit as 0. (If we did not omit axioms,
this view would also have to interpret all the axioms of monoid as — using Curry-Howard representation
— proof terms.)</p>
      <p>The view v2 is particularly interesting because there are two ways to represent the fact that the
integers are a commutative group. Firstly, all operations of cgroup can be interpreted as terms over
integers: This means to interpret inv as and the two imported operations mon=comp and mon=unit
as + and 0, respectively. Secondly, v2 can be constructed along the modular structure of cgroup and
use the existing view v1 to interpret all operations imported by mon. In MMT, this can be expressed
elegantly by the interpretation mon 7! v1, which interprets a named import with a theory morphism. The
intuition behind such an interpretation is that it makes the right triangle commute: v2 is defined such
that v2 cgroup?mon = v1. Clearly, both ways lead to the same theory morphism; the second one is
conceptually more complex but eliminates redundancy. (This redundancy is especially harmful when
axioms are considered, which must be interpreted as proofs.)
Ontology and Grammar We characterize mathematical theories on four levels: the document,
module, symbol, and object level. On each level, there are several kinds of expressions. In Fig. 2, the
relations between the MMT-concepts of the first three levels are defined in an ontology. The MMT
knowledge items are grouped into six primitive concepts. Documents (Doc, e.g., the whole graph in
Fig. 1) comprise the document level. Theories (T hy, e.g., monoid and integers) and views (Viw, e.g.,
v1 and v2) comprise the module level. And finally constants (Con, e.g., comp and inv) and
structures (Str, e.g., mon and add) as well as assignments to them (ConAss, e.g., inv 7! , and StrAss, e.g.,
mon 7! v2) comprise the symbol level.</p>
      <p>In addition, we define four unions of concepts: First modules (Mod) unite theories and views,
symbols (Sym) unite constants and structures, and assignments (Ass) unite the assignments to constants and
structures. The most interesting union is that of links (Lnk): They unite the module level concept of
views and the symbol level concept of structures. Structures being both symbols and links makes our
approach balanced between the two different ways to understand them.</p>
      <p>These higher three levels are called the structural lev- subconcept of Doc
els because they represent the structure of mathematical
knowledge: Documents are sets of modules, theories are declared in
sets of symbols, and links are sets of assignments. The Mod
actual mathematical objects are represented at the fourth
level: They occur as arguments of symbols and assign- T hy Lnk Viw
ments. MMT provides a formalization of the structural
levels while being parametric in the specific choice of Sym Ass
objects used on the fourth level.</p>
      <p>The declarations of the four levels along with the Con Str ConAss StrAss
meta-variables we will use to reference them are given
in Fig. 3 and 4. The MMT knowledge items of the struc- Figure 2: The MMT Ontology
tural levels are declared with unqualified names
(underlined Latin letter) in a certain scope and referenced by qualified names (Latin letter). Greek letters are
used as meta-variables for composed expressions. Following practice in programming languages, we
will use as an unnamed meta-variable for irrelevant values.</p>
      <p>The grammar for MMT is given in Fig. 5 where , +, j, and [ ] denote repetition, non-empty
repetition, alternative, and optional parts, respectively. The rules for the non-terminals URI and pchar are
defined in RFC 3986. Thus, g produces a URI without a query or a fragment. (The query and fragment
components of a URI are those starting with the special characters ? and #, respectively.) pchar,
essentially, produces any Unicode character, possibly using percent-encoding for reserved characters. (Thus,</p>
      <sec id="sec-2-1">
        <title>Level Declared concept</title>
        <p>Document document g
Module theory T; S; R; M
view v
structure or view m
Symbol symbol s
constant c
structure h; i; j
assignment to a constant
assignment to a structure</p>
        <p>Object variable x
percent-encoding is necessary for the characters ?/#[]% and all characters generally illegal in URIs.) In
this section, we will describe the syntax of MMT and the intuition behind it in a bottom-up manner, i.e.,
from the object to the document level. Alternatively, the following subsections can be read in top-down
order.
2.1.1</p>
        <sec id="sec-2-1-1">
          <title>The Object Level</title>
          <p>We distinguish two kinds of objects: terms and morphisms. Atomic terms are references to declared
constants, and general terms are built up from the constants via operations such as application and binding
as detailed below. Atomic morphisms are references to structures and views, and general morphisms are
built up using structures, views, identity, and composition as detailed below.</p>
          <p>The MMT objects are typed. For terms, we do not distinguish terms and types syntactically: Rather,
typing is a binary relation on terms that is not further specified by MMT. Instead, it is left to be determined
by the foundation, and MMT is parametric in the specific choice of typing relation. In particular, terms
may be untyped or may have multiple types. For morphisms, the domain theory doubles as the type.
Thus, every morphism has a unique type.</p>
          <p>Well-formedness of objects is checked relative to a home theory. For a term w, the home theory
must declare or import all symbols that occur in w. For a morphism, the home theory is the codomain.
Objects with home theory T are also called objects over T . These relations are listed in Fig. 6.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Terms</title>
          <p>be</p>
          <p>MMT-terms are a generalization of a fragment of OPENMATH objects ([BCC+04]). They can
Library
Document
Document body
Module
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
::=
constants c declared or imported by the home theory,
variables x declared by a binder,
bindings b (w1; ¡; w2) by a binder w1 of a list of variables ¡ with scope w2,
attributions a(w1; w2 7! w3) to a term w1 with key w2 and value w3,
morphism applications w m of m to w,
special term &gt;.</p>
          <p>A term over T , may use the constant T ?c to refer to a previously declared symbol c. And if i is a
previously declared structure instantiating S, and c is a constant declared in S, then T may use T ?i=c to
refer to the imported constant. By concatenating structure names, any indirectly imported constant has a
unique qualified name.</p>
          <p>The attributions of OPENMATH mean that every term can carry a list of key-value pairs that are
themselves OPENMATH objects. In particular, attributions are used to attach types to bound variables.
In OPENMATH, the keys must be symbols, which we relax to obtain a more uniform syntax. Because
OPENMATH specifies that nested attributions are equivalent to a single attribution, we can introduce
attributions of multiple key-value pairs as abbreviations:
a(w; w1 7! w10 ; : : : ; wn 7! wn0 ) := a(: : : (a(w; w1 7! w10 ); : : :); wn 7! w0 )
n
We use the auxiliary function str( ) to strip toplevel attributions from terms, i.e.,
str(a(w; )) = str(w)
str(w) = w otherwise:
This is used in the grammar to make sure that only attributed variables and not arbitrary terms may occurs
in the context bound in a binding.</p>
          <p>The special term &gt; is used for terms that are inaccessible because they refer to constants that have
been hidden by a structure or view. &gt; is also used to subsume hidings under assignments (see below).
Example 2 (Continued). The running example only contains the atomic terms given by symbols.
Composed terms arise when types and axioms are covered. For example, the type of the inverse in a
commutative group is @(!; i; i). Here ! represents the function type constructor and i the carrier set. These
two constants are not declared in the example. Instead, we will add them later by giving cgroup a
metatheory, in which these symbols are declared. A more complicated term is the axiom for left-neutrality of
the unit:
Here 8 and = are further constants that must be declared in the so far omitted meta-theory. The same
applies to oftype, which is used to attribute the type i to the bound variable x. We assume that the
example is located in a document with URI e. Thus, for example, e?monoid?comp is used to refer to the
constant comp in the theory monoid.</p>
          <p>Morphisms Morphisms are built up by compositions m1 m2 of structures i, views m, and the identity
idT of T . Here m1 is applied before m2, i.e., is composition in diagram order. Morphisms are not used in
OPENMATH, which only has an informal notion of theories, namely the content dictionaries, and thus no
need to talk about theory morphisms. A morphism application w m takes a term w over S and a morphism
m from S to T , and returns a term over T . Similarly, a morphism over S, e.g., a morphism m0 from R
to S becomes a morphism over T by taking the composition m0 m. The normalization given below will
permit to eliminate all morphism applications.</p>
          <p>Example 3 (Continued). In the running example, an example morphism is</p>
          <p>me := e?cgroup?mon e?v2:
It has domain e?monoid and codomain e?integers. The intended semantics of the term weme is that it
yields the result of applying me to we, i.e.,
Here, we assume me has no effect on those constants that are inherited from the meta-theory. We will
make that more precise below.
2.1.2</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>The Symbol Level</title>
          <p>We distinguish symbol declarations and assignments to symbols. Declarations are the constituents of
theories: They introduce named objects, i.e., the constants and structures. Similarly, assignments are
the constituents of links: A link from S to T can be defined by a sequence of assignments that instantiate
constants or structures declared in S with terms or morphisms, respectively, over T . This yields four
kinds of knowledge items which are listed in Fig. 7. Both constant and structure declarations are further
subdivided as explained below.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Terms Morphisms</title>
      </sec>
      <sec id="sec-2-3">
        <title>Declaration Assignment</title>
        <p>of a constant Con to a constant c 7! w
of a structure Str to a structure i 7! m</p>
      </sec>
      <sec id="sec-2-4">
        <title>Declarations There are two kinds of symbols:</title>
        <p>Constant declarations c : t := d declare a constant c of type t with definition d . Both the type
and the definition are optional yielding four kinds of constant declarations. If both are given, then
d must have type t. In order to unify these four kinds, we will sometimes write ? for an omitted
type or definition.</p>
        <p>[m]
Structure declarations i : S := fs g declare a structure i from the theory S defined by assignments
s . Such structures can have an optional meta-morphism m (see below). Alternatively, structures
may be introduced using an existing morphism: i : S := m, which simply means that i serves as an
abbreviation for m; we call these structures defined structures. While the domain of a structure
is always given explicitly (in the style of a type), the codomain is always the theory in which the
structure is declared. Consequently, if i : S := m is declared in T , m must be a morphism from S
to T .</p>
        <p>In well-formed theory bodies, the declared or imported names must be unique.</p>
        <p>Assignments Parallel to the declarations, there are two kinds of assignments that can be used to define
a link m:</p>
        <p>Assignments to constants of the form c 7! w express that m maps the constant c of S to the term w
over T . Assignments of the form c 7! &gt; express that the constant c is hidden, i.e., m is undefined
for c.</p>
        <p>Assignments to structures of the form i 7! m for a structure i declared in S and a morphism m
over (i.e., into) T express that m maps the structure i of S to m. This results in the commuting
triangle S?i m = m.</p>
        <p>Both kinds of assignments must type-check. For a link m with domain S and codomain T defined
by among others an assignment c 7! w, the term w must type-check against tm where t is the type of c
declared in S. This ensures that typing is preserved along links. For an assignment i 7! m where i is a
structure over S of type R, type-checking means that m must be a morphism from R to T .
Virtual Symbols Intuitively, the semantics of a structure i with domain S declared in T is that all
symbols of S are imported into T . For example, if S contains a symbol s, then i=s is available as a symbol
in T . In other words, the slash is used as the operator that dereferences structures. Another way to say it
is that structures create virtual or induced symbols. This is significant because these virtual symbols and
their properties must be inferred, and this is non-trivial because it is subject to the translations given by
the structure. While these induced symbols are easily known to systems built for a particular formalism,
they present great difficulties for generic knowledge management services.</p>
        <p>Similarly, every assignment to a structure induces virtual assignments to constants. Continuing the
above example, if a link with domain T contains an assignment to i, this induces assignments to the
imported symbols i=s. Furthermore, assignments may be deep in the following sense: If c is a constant
of S, a link with domain T may also contain assignments to the virtual constant i=c. Of course, this
could lead to clashes if a link contains assignments for both i and i=c; links with such clashes are not
well-formed.</p>
        <p>Example 4 (Continued). The symbol declarations in the theory cgroup are written formally like this:
and
mon : e?monoid := fg:
The latter induces the virtual symbols e?cgroup?mon=comp and e?cgroup?mon=unit.</p>
        <p>Using an assignment to a structure, the assignments of the view v2 look like this:
inv 7! e?integers?
and
mon 7! e?v1:
The latter induces virtual assignments for the symbols e?cgroup?mon=comp and e?cgroup?mon=unit.
For example, e?cgroup?mon=comp is mapped to e?monoid?compe?v1.</p>
        <p>The alternative formulation of the view v2 arises if two deep assignments to the virtual constants are
used instead of the assignment to the structure mon:
mon=comp 7! e?integers? +
and
mon=unit 7! e?integers?0
2.1.3</p>
        <sec id="sec-2-4-1">
          <title>The Module Level</title>
          <p>The module level consists of two kinds of declarations: theory and view declarations.</p>
          <p>[M]
Theory declarations T := fJ g declare a theory T defined by a list of symbol declarations J ,
which we call the body of T . Theories have an optional meta-theory M.</p>
          <p>[m]
View declarations v : S ! T := fs g declare a link v from S to T defined by a list of assignments
s . If S has a meta-theory M, a meta-morphism m from M to T must be provided. Just like
structures, views may also be defined by an existing morphism: v : S ! T := m.</p>
          <p>Meta-Theories Above, we have already mentioned that theories may have meta-theories and that links
may have meta-morphisms. Meta-theories provide a second dimension in the graph induced by theories
and links. If M is the meta-theory of T , then there is a special structure instantiating M in T , which
we denote by T ?::. M provides the syntactic material that T can use to define the semantics of its
symbols: T can refer to a symbol s of M by T ?::=s. While meta-theories could in principle be replaced
with structures altogether, it is advantageous to make them explicit because the conceptual distinction
pervades mathematical discourse. For example, systems can use the meta-theory to determine whether
they understand a specific theory they are provided as input.</p>
          <p>Because theories S with meta-theory M implicitly import all symbols of M, a link from S to T must
provide assignments for these symbols as well. This is the role of the meta-morphism: Every link from
S to T must provide a meta-morphism, which must be a morphism from M to T . Defined structures or
views with definition m do not need a meta-morphism because a meta-morphism is already implied by
the meta-morphisms of the links occurring in m.</p>
          <p>Example 5 (Continued). An MMT theory for the logical framework LF could be declared like this
lf := ftype; funtype; : : :g
where we only list the constants that are relevant for our running example. If this theory is located in a
document with URI m, we can declare a theory for first-order logic in a document with URI f like this:
fol m:?=lf fi : ??::=type; o : ??::=type; equal : @(??::=funtype; ??i; ??i; ??o); : : :g
Here we already use relative names (see Sect. 2.3.2) in order to keep the notation readable: Every name
of the form ??s is relative to the enclosing theory: For example, ??i resolves to f ?fol?i. Furthermore,
we use the special structure name :: to refer to those constants inherited from the meta-theory. Again we
restrict ourselves to a few constant declarations: types i and o for terms and formulas and the equality
operation that takes two terms and returns a formula.</p>
          <p>Then the theories monoid, cgroup, and ring can be declared using f ?fol as their meta-theory. For
example, the declaration of the theory cgroup finally looks like this:</p>
          <p>f ?fol
cgroup :=
inv : @(??::=::=funtype; ??i; ??i); inv : e?monoid
e?cgroup?::
:=
fg
Here ::=::=funtype refers to the function type constant declared in the meta-theory of the meta-theory.
And the structure mon must have a meta-morphism with domain f ?fol and codomain e?cgroup. This
is trivial because the meta-theory of e?cgroup is also f ?fol: The meta-morphism is simply the implicit
structure e?cgroup?:: via which e?cgroup inherits from f ?fol.</p>
          <p>A more complicated meta-morphism must be given in the view v1 if we assume that the meta-theory
of integers is some other theory, i.e., a representation of set theory.</p>
          <p>Structures and Views Both structures and views from S to T are defined by a list of assignments s
that assigns T -objects to the symbols declared in S. And both induce theory morphisms from S to T that
permit to map all objects over S to objects over T . The major difference between structures and views
is that a view only relates two fixed theories without changing either one. On the other hand, structures
from S to T occur within T because they change the theory T . Structures have definitional flavor, i.e.,
the symbols of S are imported into T . In particular, if s contains no assignment for a constant c, this is
equivalent to copying (and translating) the declaration of c from S to T . If s does provide an assignment
c 7! w, the declaration is also copied, but in addition the imported constant receives w as its definiens.</p>
          <p>Views, on the other hand, have theorem flavor: s must provide assignments for the symbols of S. If
a constant c : t represents an axiom stating t, giving an assignment c 7! p means that p is a proof of the
translation of t.</p>
          <p>Therefore, the assignments defining a structure may be (and typically are) partial whereas a view
should be total. This leads to a crucial technical difficulty in the treatment of structures: Contrary to
views from S to T , the assignments by themselves in a structure from S to T do not induce a theory
morphism from S to T — only by declaring the structure do the virtual symbols become available in T
that serve as the images of (some of) the symbols of S. This is unsatisfactory because it makes it harder
to unify the concepts of structures and views.</p>
          <p>Therefore, we admit partial views as well. As it turns out, this is not only possible, but indeed
desirable. A typical scenario when working with views is that some of the specific assignments making up
the view constitute proof obligations and must be found by costly procedures. Therefore, it is reasonable
to represent partial views, namely views where some proof obligations have already been discharged
whereas others remain open. Thus, we use hiding to obtain a semantics for partial views: All constants
for which a view does not provide an assignment are implicitly hidden, i.e., &gt; is assigned to them.</p>
          <p>If a link m from S to T is applied to an S-constant that is hidden, there are two cases: If the hidden
symbol has a definition in S, it is replaced by this definition before applying the link. If it does not have
a definition, it is mapped to &gt;. Hiding is strict: If a subterm is mapped to &gt;, then so is the whole term.
In that case, we speak of hidden terms.
2.1.4</p>
        </sec>
        <sec id="sec-2-4-2">
          <title>The Document Level</title>
          <p>Document declarations are of the form g := fgg where g is a document body and g is a URI identifying
the document. The meaning of a document declaration is that g is accessible via the name g. Since g is
a URI, it is not necessarily only the name, but can also be the primary location of g. By forming lists
of documents, we obtain libraries, which represent mathematical knowledge bases. Special cases of
libraries are single self-contained documents and the internet seen as a library of MMT documents.</p>
          <p>Documents provide the border between formal and informal treatment: How documents are stored,
copied, cached, mirrored, transferred, and combined is subject to knowledge management services that
may or may not take the mathematical semantics of the document bodies in the processed documents
into account. For example, libraries may be implemented as web servers, file systems, databases, or any
combination of these. The only important thing is that they provide the query interface described below.</p>
          <p>Theory graphs are the central notion of MMT. The theory graph is a directed acyclic multigraph.
The nodes are all theories of all documents in the library. And similarly, the edges are the structures and
views of all documents. Then theory morphisms can be recovered as the paths in the theory graph.
2.2</p>
          <p>Querying a Library
MMT is designed to scale to a mathematical web. This means that we define access functions to MMT
libraries that have the form of HTTP requests to a RESTful web server [Fie00]. Specifically, there
is a lookup function that takes a library L and a URI U as arguments and returns an MMT fragment
L(U ). This specifies the behavior of a web server hosting an MMT library in response to GET requests.
Furthermore, all possible changes to an MMT library can be formulated as POST, PUT, and DELETE
requests that add, change, and delete knowledge items, respectively.</p>
          <p>It is non-trivial to add such a RESTful interface to formal systems a posteriori. It requires the rigorous
use of URIs as identifiers for all knowledge items that can be the object of a change. And it requires
to degrade gracefully if the documents in a library are not in the same main memory or on the same
machine. In large applications, it is even desirable to load only the relevant parts of a document into the
main memory and request further fragments on demand from a low-level database service. In MMT,
webscalability is built into the core: All operations on a library L including the definition of the semantics
only depend on the lookup function L( ) and not on L itself. In particular, two libraries that respond
in the same way to lookup requests are indistinguishable by design. Therefore, MMT scales well to
web-based scenarios.</p>
          <p>Here we will only give a brief overview over the lookup function L( ). It is a partial function
from URIs to MMT fragments. For example, assume L to contain a theory S containing the symbol
declarations c : t; c0 : t0; and h : R := fg; furthermore, assume a theory T = g?T declaring a structure
i : S := fc 7! d ; h 7! mg.</p>
          <p>Then the lookups of S?c and S?h yield c : t and h : R := fg, respectively. These are lookups of
explicit symbol declarations — an important feature of MMT is that all induced, symbols can be looked
up in the same way. For example, L(T ?i=c) yields i=c : tT?i := d . Here i=c is the unique name of the
constant c induced by the structure i; tT?i is the translation of the type of S?c along the structure T ?i; and
the assignment c 7! d causes d to occur as the definiens of the constant i=c. Similarly, the lookup of the
induced structure T ?i=h yields i=h : R := m. Here the assignment i 7! m causes i=h to be defined by a
morphism.</p>
          <p>To query and edit documents, assignments are also accessible by URIs even though they are not
referenced by any rule of the syntax. An assignment c 7! w in a view v is referenced by the URI v?c
and similarly for assignments to structures. To access assignments in structures, we distinguish the URIs
g?T ?i=c and g?T =i?c: The former identifies the constant c that is induced by i as defined above; the
latter identifies the assignment c 7! d to the symbol c in the structure T ?i. In particular, the lookup of the
former always is always defined, while the lookup of the latter is undefined if no assignment is present.</p>
          <p>Just like symbols induced by structures have URIs, so have assignments induced by assignments to
structures. For example, if additionally R declares a constant d : r := ?, then the lookup of g?T =i?h=d
yields h=d 7! (R?d)m . This means that the assignment h 7! m in T ?i induces an assignment to h=d. The
lookup of assignments to induced structures is defined accordingly.</p>
          <p>The above lookup functions are actually modified to accommodate for hiding. If the lookup of a
constant would yield c : t := d according to the above definition, but d is (or simplifies to) &gt;, then the
lookup is actually undefined.</p>
          <p>Example 6 (Continued). If L is a library containing the three documents with URIs m, f , and e from the
running example, we obtain the following lookup results:</p>
          <p>L(e?monoid) = ( f ?fol; J ) where J contains the declarations for comp and unit,
L(e?cgroup=mon) = (e?monoid; e?cgroup; e?cgroup?::; ),
i.e., e?cgroup=mon is a morphism from e?monoid to e?cgroup with meta-morphism e?cgroup?::,
and without any assignments,
Le?monoid(unit) = (e?monoid?::=i; ?),
i.e., the theory monoid has a constant unit with type e?monoid?::=i and no definition,
Le?cgroup(mon=unit) = (e?monoid?::=ie?cgroup?mon; ?),
i.e., the type of the virtual constant mon=unit arises by translating the type from the source theory
along the importing structure,
Le?cgroup=mon(unit) = ?,
i.e., the lookup is undefined because the structure e?cgroup=mon does not have an assignment for
unit,
the lookup Le?v2(mon=unit) yields e?integers?0 if the variant with the deep assignment mon=unit 7!
e?integers?0 is used to define v2, and e?monoid?unite?v1 if the variant with the structure
assignment mon 7! e?v1 is used.
2.3</p>
          <p>Concrete Syntax</p>
        </sec>
        <sec id="sec-2-4-3">
          <title>2.3.1 XML-Encoding</title>
          <p>The XML grammar mostly follows the abstract grammar. Documents are omdoc elements with a
theory graph, i.e., theory and view elements as children. The children of theories are constant and
structure. And the children of views and structures are maps (assignment to a constant or structure).</p>
          <p>Both terms and morphisms are represented by OPENMATH elements. And all names of theories
are URIs. The grammar does not account for the well-formedness of objects and names. In particular,
well-formedness is not checked at this level.</p>
          <p>Formally, we define an encoding function E( ) that maps MMT-expressions to sequences of XML
elements. The precise definition of E( ) is given in Fig. 9 and 10 where we assume that the following
namespace bindings are in effect:
xmlns=”http: // www.omdoc.org/ns/omdoc”
xmlns:om=”http: // www.openmath.org/OpenMath”
And we assume the following content dictionary with cdbase http://cds.omdoc.org/omdoc/mmt.
omdoc, which is itself given as an OMDOC theory:
&lt;omdoc&gt;
&lt;theory name=”mmt”&gt;
&lt;constant name=”hidden”/&gt;
&lt;constant name=”identity”/&gt;
&lt;constant name=”composition”/&gt;
&lt;/theory&gt;
&lt;/omdoc&gt;
We abbreviate the OMS elements referring to these symbols by OMDoc(identity) etc.</p>
          <p>The encoding of the structural levels is straightforward. The encoding of objects is a
generalization of the XML encoding of OPENMATH objects. We use the OMS element of OPENMATH to refer to
symbols, theories, views, and structures. The symbol with name OMDoc(hidden) is used to encode
the special term &gt;. To encode morphisms, OMDoc(identity) and OMDoc(composition) are used.
OMDoc(identity) takes a theory as an argument and returns a morphism. OMDoc(composition)
takes a list of structures and views as arguments and returns their (left-associative) diagram order
composition. Morphism application is encoded by reusing the OMA element from OPENMATH.</p>
          <p>The encoding of names is giving
separately in Fig. 8 on the right. There are two OMS-triple E(g?m?s) base=”g” module=”m” name=”s”
different ways to encode names. In OMS el- E(g?m) base=”g” module=”m”
ements, triples (g; m; s) of document, mod- URI E(g?m?s) g?m?s
ule, and symbol name are used, albeit with E(g?m) g?m
more fitting attribute names. These triples
correspond to the (cdbase; cd; name) triples Figure 8: XML Encoding of Names
of the OPENMATH standard ([BCC+04]).</p>
          <p>We also use them to refer to module level names by omitting the name attribute. When names occur in
attribute values, their URIs are used.
2.3.2</p>
        </sec>
        <sec id="sec-2-4-4">
          <title>Relative Names</title>
          <p>In practice it is very inconvenient to always give qualified names. Therefore, we define relative references
as a relaxation of the syntax that is elaborated into the official syntax.</p>
          <p>A relative reference consists of three optional components: a document reference g, a module
reference m and a symbol reference s. We write relative references as triples (g; m; s) where we write ? if a
component is omitted. g must be a URI reference as defined in RFC 3986 ([BLFM05]) but without query
or fragment. m and s must be unqualified names, i.e., slash-separated non-empty sequences of names.
Furthermore, m ans s may optionally start with a slash, which is used to distinguish absolute module and
symbol references from relative ones.</p>
          <p>An absolute reference, which serves as the base of the resolution, is an MMT-name G, G?M, or
G?M?S. Then the resolution of relative references is a partial function that takes a relative reference</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>Library Document Theory View</title>
        <p>R = (g; m; s) and an absolute reference B as input and returns an MMT-name resolve(B; R) as output. It
is defined as follows:</p>
        <p>If g 6= ?, then possible starting slashes of m and s are ignored and
– if R = (g; m; s): resolve(B; R) = (G + g)?m?s,
– if R = (g; m; ?): resolve(B; R) = (G + g)?m,
– if R = (g; ?; ?): resolve(B; R) = G + g,
where G + g denotes the resolution of the URI reference g relative to the URI G as defined in RFC
3986 ([BLFM05]).</p>
        <p>If g = ? and m 6= ?, then a possible starting slash of s is ignored and
– if R = (?; m; s): resolve(B; R) = G?M + m?s,
– if R = (?; m; ?): resolve(B; R) = G?M + m,
where M + m resolves m relative to M: If M is not defined or if m starts with a slash, M + m is m
with a possible starting slash removed; otherwise, it is M=m.</p>
        <p>If g = m = ? and M is defined, then resolve(B; R) = G?M?S + s, where S + s is defined like M + m
above.</p>
        <p>resolve(B; R) is undefined otherwise.</p>
        <p>Relative references can also be encoded as URIs: The triple (g; m; s) is encoded as g?m?s. If
components are omitted, they are encoded as the empty string. Trailing (but not leading) ? characters can be
dropped. For example,
(g; m; ?) is encoded as g?m,
(?; =m; s) is encoded as ?=m?s,
(?; ?; s) is encoded as ??s,
This encoding can be parsed back uniquely by splitting a URI into up to three components around the
separator ?.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Constant</title>
        <p>E(c : [t ] := [d ])</p>
      </sec>
      <sec id="sec-2-7">
        <title>Structure</title>
      </sec>
      <sec id="sec-2-8">
        <title>Term Assignments</title>
        <p>E(Ass1; : : : ; Assn)</p>
        <p>[m]
E(i : S := fs g)
E(i : S := m )
E(c 7! w )
E(i 7! m )
E(c)
E(x)
E( )</p>
        <p>&gt;
E(w m )
E(b (w1; x1; : : : ; xn; w2))
E(a (w1; w2 7! w3))</p>
      </sec>
      <sec id="sec-2-9">
        <title>Morphism</title>
        <p>E(idT )
E(m1 : : : mn)
E(i)
E(v)
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Advanced Concepts</title>
      <p>In this section, we give an overview over the most important advanced concepts related to MMT. Full
definitions are given in [Rab08].</p>
      <p>Well-formed Expressions The well-formedness of expressions is defined by an inference system about
MMT fragments. This system guarantees in particular the uniqueness of all names. The most difficult of
the well-formedness definition is to deal with assignments to structures: In general, an assignment i 7! m
to a structure with domain S may cause inconsistencies if a constant S?c has a definition in S that differs
from (S?c)m . The definition of well-formedness makes sure that such inconsistencies are prevented. And
it does so in a way that can be checked efficiently because the modular structure is exploited throughout
the well-formedness checking.</p>
      <p>MMT does not provide a definition of well-typed terms or of logical consequence. Rather, the MMT
inference system is parametric in the judgments for equality and typing of terms. The definitions of these
judgments must be provided externally. In our MMT implementation, these judgments are defined by
plugins where the meta-theory of the home theory of a term w determines which plugin is used to check
the well-typedness of w.</p>
      <p>Semantics The representation of theory graphs is geared towards expressing mathematical knowledge
with the least redundancy by using inheritance between theories. This style of writing mathematics has
been cultivated by the Bourbaki group ([Bou74]) and lends itself well to a systematic development of
theories. However, it is often necessary to eliminate the modular structure, for example when interfacing
with a system that cannot understand it or to elaborate modular theories into a non-modular trusted core.</p>
      <p>Therefore, we define a so-called flattening operation that eliminates all structures, meta-theories, and
morphisms, and reduces theories to collections of constants, possibly with types and definitions, which
conforms to the non-modular logical view of theories. For a given MMT-library L, we can view the
flattening of L as its semantics, since flattening eliminates all specific MMT-representation
infrastructure. Essentially, the flattening replaces all structures with the induced symbols and all assignments to
structures with the induced assignments as described in Sect. 2.2. The crucial invariant of the flattening
is that if a library is well-formed, then so is its flattening; and furthermore, a library and its flattening are
indistinguishable by the lookup function L( ). This guarantees that flattening can be performed
transparently. In particular, the above-mentioned external definitions of typing and equality are only needed
for the non-modular case.</p>
      <p>Applications There are two main use cases for MMT: the use of MMT as a simple interface language
between (i) two logical systems as well as between (ii) a logical system and a knowledge management
service.</p>
      <p>The first use case uses MMT as an interlingua for system interoperability. This works well because
the choice of primitives in MMT is the result of a careful trade-off between simplicity and expressivity.
Thus, MMT provides a simple abstraction over the most important language constructs that are
common to systems maintaining logical libraries, while deliberately avoiding other features that would have
increased the burden on systems reading MMT. Here the logics-as-theories approach provides an
extremely helpful way to document the semantics of the logic-specific concepts and their counterparts in
other logics; furthermore, systems can use the meta-theory relation to detect which theories they can
interpret and how to interpret them.</p>
      <p>By knowledge management services, we mean services that can be applied to logical knowledge
but are not primarily logical in nature. Such services include user interaction, concurrent versioning,
management of change, and search. Implementing such services often requires intricate details about
the underlying system, thus severely limiting the set of potential developers. But in fact these services
can often be developed independently of the logical systems and with relatively little logical expertise.
Here MMT comes in as an interface language providing knowledge management services with exactly
the information they need while hiding all logic- and system-specific details. Therefore, for example, the
concrete syntax of MMT fully marks up the term structure, while the well-formedness definition of MMT
does not take type-checking into account.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We have presented an exchange format for mathematical knowledge that supports system interoperability
by providing an infrastructure for efficiently re-using representations of mathematical knowledge and for
formalizing foundational assumptions and structures of the underlying logical systems themselves in a
joint web-scalable representation infrastructure.</p>
      <p>We consider it a major achievement of the work reported here that the MMT format integrates
theoretical aspects like the syntax/semantics interface of the module systems or meta-logical aspects with
practical considerations like achieving web-scalability via a URI-based referencing scheme that is
wellintegrated with the module system.</p>
      <p>The proposed MMT format takes great care to find a minimal set of primitives to keep the
representational core of the language small and therefore manageable but expressive enough to cover the semantic
essentials of current (semi)-automated theorem provers, model checkers, computer algebra systems,
constraint solvers, concept classifiers and mathematical knowledge management systems.</p>
      <p>We are currently evaluating the MMT format and in particular are developing translators from the
Isabelle [Pau94] and CASL [CoF04] formats into MMT. Both representation formats have strong module
systems and well-specified underlying semantics, which make them prime evaluation targets. We are also
integrating the MMT infrastructure into the upcoming version 2 of the OMDOC format [Koh06].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [ABC+03]
          <string-name>
            <surname>Ron</surname>
            <given-names>Ausbrooks</given-names>
          </string-name>
          , Stephen Buswell, David Carlisle, Ste´phane Dalmas, Stan Devitt, Angel Diaz, Max Froumentin, Roger Hunter, Patrick Ion, Michael Kohlhase, Robert Miner, Nico Poppelier,
          <string-name>
            <given-names>Bruce</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Neil</given-names>
            <surname>Soiffer</surname>
          </string-name>
          , Robert Sutor, and Stephen Watt.
          <article-title>Mathematical Markup Language (MathML) version 2.0 (second edition)</article-title>
          .
          <source>W3C recommendation, World Wide Web Consortium</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [BCC+04]
          <string-name>
            <given-names>S.</given-names>
            <surname>Buswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Caprotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Carlisle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dewar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaetano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          .
          <source>The Open Math Standard, Version 2.0. Technical report, The Open Math Society</source>
          ,
          <year>2004</year>
          . See http://www.openmath. org/standard/om20.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [BLFM05]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          , Roy. Fielding, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Masinter</surname>
          </string-name>
          .
          <article-title>Uniform resource identifier (URI): Generic syntax</article-title>
          .
          <source>RFC 3986</source>
          , Internt Engineering Task Force,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Bou74] N. Bourbaki. Algebra I. Elements of Mathematics</source>
          . Springer,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [CoF04]
          <article-title>CoFI (The Common Framework Initiative)</article-title>
          .
          <source>CASL Reference Manual. LNCS 2960 (IFIP Series)</source>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Fie00]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fielding</surname>
          </string-name>
          .
          <article-title>Architectural Styles and the Design of Network-based Software Architectures</article-title>
          .
          <source>PhD thesis</source>
          , University of California, Irvine,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Koh06] Michael Kohlhase. OMDOC - An open markup format for mathematical documents [Version</source>
          <volume>1</volume>
          .2]. Number 4180 in LNAI. Springer Verlag,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Pau94]
          <string-name>
            <surname>Lawrence</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Paulson</surname>
          </string-name>
          .
          <article-title>Isabelle: A Generic Theorem Prover</article-title>
          .
          <source>LNCS</source>
          . Springer Verlag,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>