An Exchange Format for Modular Knowledge
                                      Florian Rabe, Michael Kohlhase
                                         Jacobs University Bremen

                                                   Abstract
          We present a knowledge exchange format that is designed to support the exchange of modularly
      structured information in mathematical and logical systems. This format allows to encode math-
      ematical knowledge in a logic-neutral representation format that can represent the meta-theoretic
      foundations of the systems together with the knowledge itself and to interlink the foundations at the
      meta-logical level. This “logics-as-theories” approach, makes system behaviors as well as their rep-
      resented knowledge interoperable and thus comparable. The advanced modularization features of
      our system allow communication between systems without losing structure.
          We equip the proposed format with a web-scalable XML/URI-based concrete syntax so that it
      can be used as a universal interchange and archiving format for existing systems.


1    Introduction
Mathematical knowledge is at the core of science, engineering, and economics, and we are seeing a
trend towards employing computational systems like semi-automated theorem provers, model checkers,
computer algebra systems, constraint solvers, or concept classifiers to deal with it. It is a characteristic
feature of these systems that they either have mathematical knowledge implicitly encoded in their critical
algorithms or (increasingly) manipulate explicit representations of the relevant mathematical knowledge
often in the form of logical formulae. Unfortunately, these systems have differing domains of applica-
tions, foundational assumptions, and input languages, which makes them non-interoperable and difficult
to compare and evaluate in practice. Moreover, the quantity of mathematical knowledge is growing faster
than our ability to formalize and organize it, aggravating the problem that mathematical software systems
cannot share knowledge representations.
    The work reported in this paper focuses on developing an exchange format between math-knowledge
based systems. We concentrate on a foundationally unconstrained framework for knowledge representa-
tion that allows to represent the meta-theoretic foundations of the mathematical knowledge in the same
format and to interlink the foundations at the meta-logical level. In particular, the logical foundations of
domain representations for the mathematical knowledge can be represented as modules themselves and
can be interlinked via meta-morphisms. This “logics-as-theories” approach, makes systems behavior
as well as their represented knowledge interoperable and thus comparable at multiple levels. Note that
the explicit representation of epistemic foundations also benefits systems whose mathematical knowl-
edge is only implicitly embedded into the algorithms. Here, the explicit representation can serve as a
documentation of the system as well as a basis for verification or testing attempts.
    Of course communication by translation to the lowest common denominator logic — the current state
of the art — is always possible. But such translations lose the very structural properties of the knowledge
representation that drive computation and led to the choice of logical system in the first place. Therefore
our format incorporates a module system geared to support flexible reuse of knowledge items via theory
morphisms. This module system is the central part of the proposed format — emphasizing interoper-
ability between theorem proving systems, and the exchange and reusability of mathematical facts across
different systems. In contrast to the formula level, which needs to be ontologically unconstrained, the
module system level must have a clear semantics (relative to the semantics of the formula level) and

Rudnicki P, Sutcliffe G., Konev B., Schmidt R., Schulz S. (eds.);
Proceedings of the Combined KEAPPA - IWIL Workshops, pp. 50-66


                                                       50
An Exchange Format for Modular Knowledge                                                   Rabe, Kohlhase


be expressive enough to encode current structuring practice so that systems can communicate without
losing representational structure.
    On a practical level, an exchange format should support interoperability by being easy to parse and
standards-compatible. Today, where XML-parsers are available for almost all programming language
and higher-level protocols for distributed computation are XML-based, an exchange format should be
XML-based and in particular an extension of the M ATH ML [ABC+ 03] and O PEN M ATH [BCC+ 04]
standards. Moreover, an exchange format must be web-scalable in the sense that it supports the distri-
bution of resources (theories, conjectures, proofs, etc.) over the Internet, so that they can be managed
collaboratively. In the current web architecture this means that all (relevant) resources must be address-
able by a URI-based naming scheme [BLFM05]. Note that in the presence of complex modularity and
reusability infrastructure this may mean that resources have to be addressable, even though they are only
virtually induced by the inheritance structure.


2     Syntax
2.1   A Four-Level Model of Mathematical Knowledge

                                                                v2
      ring ring?add                                                                
                     cgroup          v2    integers               mon/comp 7→ +
      add                                                                               or mon 7→ v1
                    mon, inv                0, +, −               mon/unit 7→ 0
      mult
                                                                inv 7→ −
             cgroup?mon                    v1
      ring?mult                            comp 7→ +
                                           unit 7→ 0
                                                               import
                        monoid
                                                               view
                      comp, unit

                                            Figure 1: Example


Example 1. We begin the exposition of our language with a simple motivating example that already
shows some of the crucial features of M MT and that we will use as a running example: Fig. 1 gives a
theory graph for a portion of the algebraic hierarchy. The bottom node represents the theory of monoids,
which declares operations for composition and unit. (We omit the axioms and the types here.) The theory
cgroup for commutative groups arises by importing from monoid and adding an operation inv for the
inverse element. This theory does not have to declare the operations for composition and unit again
because they are imported from the theory of monoids. The import is named mon, and it can be referenced
by the qualified name cgroup?mon; it induces a theory morphism from monoids to commutative groups.
    Then the theory of rings can be formed by importing from monoid (via an import named mult
that provides the multiplicative structure) and from cgroup (via an import named add that provides
the additive structure). Because all imports are named, different import paths can be distinguished: By
concatenating import and symbol names, the theory ring can access the symbols add/mon/comp (i.e.,
addition), add/mon/unit (i.e, zero), add/inv (i.e., additive inverse), mult/comp (i.e., multiplication),
and mult/unit (i.e., one).
    The node on the right side of the graph represents a theory for the integers declaring the operations 0,
+, and −. The fact that the integers are a monoid is represented by the view v1. It is a theory morphism
that is explicitly given by its interpretations of comp as + and of unit as 0. (If we did not omit axioms,

                                                    51
An Exchange Format for Modular Knowledge                                                 Rabe, Kohlhase


this view would also have to interpret all the axioms of monoid as — using Curry-Howard representation
— proof terms.)
    The view v2 is particularly interesting because there are two ways to represent the fact that the
integers are a commutative group. Firstly, all operations of cgroup can be interpreted as terms over
integers: This means to interpret inv as − and the two imported operations mon/comp and mon/unit
as + and 0, respectively. Secondly, v2 can be constructed along the modular structure of cgroup and
use the existing view v1 to interpret all operations imported by mon. In M MT, this can be expressed
elegantly by the interpretation mon 7→ v1, which interprets a named import with a theory morphism. The
intuition behind such an interpretation is that it makes the right triangle commute: v2 is defined such
that v2 ◦ cgroup?mon = v1. Clearly, both ways lead to the same theory morphism; the second one is
conceptually more complex but eliminates redundancy. (This redundancy is especially harmful when
axioms are considered, which must be interpreted as proofs.)


Ontology and Grammar We characterize mathematical theories on four levels: the document, mod-
ule, symbol, and object level. On each level, there are several kinds of expressions. In Fig. 2, the
relations between the M MT-concepts of the first three levels are defined in an ontology. The M MT
knowledge items are grouped into six primitive concepts. Documents (Doc, e.g., the whole graph in
Fig. 1) comprise the document level. Theories (T hy, e.g., monoid and integers) and views (Viw, e.g.,
v1 and v2) comprise the module level. And finally constants (Con, e.g., comp and inv) and struc-
tures (Str, e.g., mon and add) as well as assignments to them (ConAss, e.g., inv 7→ −, and StrAss, e.g.,
mon 7→ v2) comprise the symbol level.
     In addition, we define four unions of concepts: First modules (Mod) unite theories and views, sym-
bols (Sym) unite constants and structures, and assignments (Ass) unite the assignments to constants and
structures. The most interesting union is that of links (Lnk): They unite the module level concept of
views and the symbol level concept of structures. Structures being both symbols and links makes our
approach balanced between the two different ways to understand them.
     These higher three levels are called the structural lev-        subconcept of Doc
els because they represent the structure of mathematical
                                                                     declared in
knowledge: Documents are sets of modules, theories are
                                                                                    Mod
sets of symbols, and links are sets of assignments. The
actual mathematical objects are represented at the fourth
                                                                         T hy Lnk             Viw
level: They occur as arguments of symbols and assign-
ments. M MT provides a formalization of the structural
levels while being parametric in the specific choice of                  Sym                  Ass
objects used on the fourth level.
     The declarations of the four levels along with the           Con           Str   ConAss       StrAss
meta-variables we will use to reference them are given
in Fig. 3 and 4. The M MT knowledge items of the struc-              Figure 2: The M MT Ontology
tural levels are declared with unqualified names (under-
lined Latin letter) in a certain scope and referenced by qualified names (Latin letter). Greek letters are
used as meta-variables for composed expressions. Following practice in programming languages, we
will use as an unnamed meta-variable for irrelevant values.
     The grammar for M MT is given in Fig. 5 where ∗ , + , |, and [−] denote repetition, non-empty rep-
etition, alternative, and optional parts, respectively. The rules for the non-terminals URI and pchar are
defined in RFC 3986. Thus, g produces a URI without a query or a fragment. (The query and fragment
components of a URI are those starting with the special characters ? and #, respectively.) pchar, essen-
tially, produces any Unicode character, possibly using percent-encoding for reserved characters. (Thus,

                                                   52
An Exchange Format for Modular Knowledge                                                       Rabe, Kohlhase


                                   Level         Declared concept
                                   Document      document g
                                   Module        theory T, S, R, M
                                                 view v
                                                 structure or view m
                                   Symbol        symbol s
                                                 constant c
                                                 structure h, i, j
                                                 assignment to a constant
                                                 assignment to a structure
                                   Object        variable x

                      Figure 3: Meta-Variables for References to M MT Declarations


                     Variable    Type
                     Λ           library (list of document declarations)
                     γ           document body (list of module declarations)
                     ϑ           theory body (list of symbol declarations)
                     σ           view/structure body (list of assignments to symbols)
                     ω           term
                     µ           morphism

                              Figure 4: Meta-Variables for M MT Expressions


percent-encoding is necessary for the characters ?/#[]% and all characters generally illegal in URIs.) In
this section, we will describe the syntax of M MT and the intuition behind it in a bottom-up manner, i.e.,
from the object to the document level. Alternatively, the following subsections can be read in top-down
order.

2.1.1   The Object Level
We distinguish two kinds of objects: terms and morphisms. Atomic terms are references to declared
constants, and general terms are built up from the constants via operations such as application and binding
as detailed below. Atomic morphisms are references to structures and views, and general morphisms are
built up using structures, views, identity, and composition as detailed below.
    The M MT objects are typed. For terms, we do not distinguish terms and types syntactically: Rather,
typing is a binary relation on terms that is not further specified by M MT. Instead, it is left to be determined
by the foundation, and M MT is parametric in the specific choice of typing relation. In particular, terms
may be untyped or may have multiple types. For morphisms, the domain theory doubles as the type.
Thus, every morphism has a unique type.
    Well-formedness of objects is checked relative to a home theory. For a term ω, the home theory
must declare or import all symbols that occur in ω. For a morphism, the home theory is the codomain.
Objects with home theory T are also called objects over T . These relations are listed in Fig. 6.

Terms    M MT-terms are a generalization of a fragment of O PEN M ATH objects ([BCC+ 04]). They can
be

                                                      53
An Exchange Format for Modular Knowledge                                                      Rabe, Kohlhase


         Library                    Λ              ::=    Doc∗
         Document                   Doc            ::=    g := {γ}
         Document body              γ              ::=    Mod ∗
         Module                     Mod            ::=    T hy | Viw
                                                             [T ]
         Theory                     T hy           ::=    T := {ϑ }
                                                                          [µ]
         View                      Viw             ::=    v : T → T := {σ } | v : T → T := µ
         Theory body               ϑ               ::=    Sym∗
         Symbol                    Sym             ::=    Con | Str
         Link body                 σ               ::=    Ass∗
         Assignment                Ass             ::=    ConAss | StrAss
         Constant                  Con             ::=    c : ω := ω | c : ω | c := ω | c
                                                                    [µ]
         Structure                 Str             ::=    i : T := {σ } | i : T := µ
         Ass. to constant          ConAss          ::=    c 7→ ω
         Ass. to structure         StrAss          ::=    i 7→ µ
         Term                      ω               ::=    > | c | x | ω µ | @(ω, ω + )
                                                            | β (ω, ϒ, ω) | α(ω, ω 7→ ω)
         Variable context          ϒ               ::=    · | ϒ, ω     if str(ω) of the form x
         Morphism                  µ               ::=    idT | i | v | µ • µ
         Document reference        g               ::=    URI, no query, no fragment
         Module reference          T, v            ::=    g?T | g?v
         Symbol reference          c, i            ::=    T ?c | T ?i
         Assignment reference      a               ::=    v?c | v?i
         Local name                T , v, c, i     ::=    C+ [/C+ ]∗
         Variable name             x               ::=    C+
         Character                 C               ::=    pchar
                                   URI, pchar             see RFC 3986 [BLFM05]

                           Figure 5: The Grammar for Raw M MT Expressions


                          Atomic object     Composed object           Type      Checked relative to
           Terms          constant          term                      term      home theory
           Morphisms      structure/view    morphism                  domain    codomain

                                        Figure 6: The Object Level


   • constants c declared or imported by the home theory,

   • variables x declared by a binder,

   • applications @(ω, ω1 , . . . , ωn ) of ω to arguments ωi ,

   • bindings β (ω1 , ϒ, ω2 ) by a binder ω1 of a list of variables ϒ with scope ω2 ,

   • attributions α(ω1 , ω2 7→ ω3 ) to a term ω1 with key ω2 and value ω3 ,

   • morphism applications ω µ of µ to ω,

                                                     54
An Exchange Format for Modular Knowledge                                                                  Rabe, Kohlhase


   • special term >.
     A term over T , may use the constant T ?c to refer to a previously declared symbol c. And if i is a
previously declared structure instantiating S, and c is a constant declared in S, then T may use T ?i/c to
refer to the imported constant. By concatenating structure names, any indirectly imported constant has a
unique qualified name.
     The attributions of O PEN M ATH mean that every term can carry a list of key-value pairs that are
themselves O PEN M ATH objects. In particular, attributions are used to attach types to bound variables.
In O PEN M ATH, the keys must be symbols, which we relax to obtain a more uniform syntax. Because
O PEN M ATH specifies that nested attributions are equivalent to a single attribution, we can introduce
attributions of multiple key-value pairs as abbreviations:
                 α(ω, ω1 7→ ω10 , . . . , ωn 7→ ωn0 ) := α(. . . (α(ω, ω1 7→ ω10 ), . . .), ωn 7→ ωn0 )
We use the auxiliary function str(·) to strip toplevel attributions from terms, i.e.,
                               str(α(ω, )) = str(ω)          str(ω) = ω otherwise.
This is used in the grammar to make sure that only attributed variables and not arbitrary terms may occurs
in the context bound in a binding.
     The special term > is used for terms that are inaccessible because they refer to constants that have
been hidden by a structure or view. > is also used to subsume hidings under assignments (see below).
Example 2 (Continued). The running example only contains the atomic terms given by symbols. Com-
posed terms arise when types and axioms are covered. For example, the type of the inverse in a commu-
tative group is @(→, ι, ι). Here → represents the function type constructor and ι the carrier set. These
two constants are not declared in the example. Instead, we will add them later by giving cgroup a meta-
theory, in which these symbols are declared. A more complicated term is the axiom for left-neutrality of
the unit:
           ωe := β (∀, α(x, oftype 7→ ι), @(=, @(e?monoid?comp, e?monoid?unit, x), x)).
Here ∀ and = are further constants that must be declared in the so far omitted meta-theory. The same
applies to oftype, which is used to attribute the type ι to the bound variable x. We assume that the
example is located in a document with URI e. Thus, for example, e?monoid?comp is used to refer to the
constant comp in the theory monoid.

Morphisms Morphisms are built up by compositions µ1 • µ2 of structures i, views m, and the identity
idT of T . Here µ1 is applied before µ2 , i.e., • is composition in diagram order. Morphisms are not used in
O PEN M ATH, which only has an informal notion of theories, namely the content dictionaries, and thus no
need to talk about theory morphisms. A morphism application ω µ takes a term ω over S and a morphism
µ from S to T , and returns a term over T . Similarly, a morphism over S, e.g., a morphism µ 0 from R
to S becomes a morphism over T by taking the composition µ 0 • µ. The normalization given below will
permit to eliminate all morphism applications.
Example 3 (Continued). In the running example, an example morphism is
                                          µe := e?cgroup?mon • e?v2.
It has domain e?monoid and codomain e?integers. The intended semantics of the term ωe µe is that it
yields the result of applying µe to ωe , i.e.,
                                β (∀, α(x, oftype 7→ ι), @(=, @(+, 0, x), x)).
Here, we assume µe has no effect on those constants that are inherited from the meta-theory. We will
make that more precise below.

                                                          55
An Exchange Format for Modular Knowledge                                                 Rabe, Kohlhase


2.1.2   The Symbol Level

We distinguish symbol declarations and assignments to symbols. Declarations are the constituents of
theories: They introduce named objects, i.e., the constants and structures. Similarly, assignments are
the constituents of links: A link from S to T can be defined by a sequence of assignments that instantiate
constants or structures declared in S with terms or morphisms, respectively, over T . This yields four
kinds of knowledge items which are listed in Fig. 7. Both constant and structure declarations are further
subdivided as explained below.

                                          Declaration          Assignment
                         Terms            of a constant Con    to a constant c 7→ ω
                         Morphisms        of a structure Str   to a structure i 7→ µ

                                     Figure 7: The Statement Level


Declarations There are two kinds of symbols:

   • Constant declarations c : τ := δ declare a constant c of type τ with definition δ . Both the type
     and the definition are optional yielding four kinds of constant declarations. If both are given, then
     δ must have type τ. In order to unify these four kinds, we will sometimes write ⊥ for an omitted
     type or definition.
                                    [µ]
   • Structure declarations i : S := {σ } declare a structure i from the theory S defined by assignments
     σ . Such structures can have an optional meta-morphism µ (see below). Alternatively, structures
     may be introduced using an existing morphism: i : S := µ, which simply means that i serves as an
     abbreviation for µ; we call these structures defined structures. While the domain of a structure
     is always given explicitly (in the style of a type), the codomain is always the theory in which the
     structure is declared. Consequently, if i : S := µ is declared in T , µ must be a morphism from S
     to T .

In well-formed theory bodies, the declared or imported names must be unique.

Assignments Parallel to the declarations, there are two kinds of assignments that can be used to define
a link m:

   • Assignments to constants of the form c 7→ ω express that m maps the constant c of S to the term ω
     over T . Assignments of the form c 7→ > express that the constant c is hidden, i.e., m is undefined
     for c.

   • Assignments to structures of the form i 7→ µ for a structure i declared in S and a morphism µ
     over (i.e., into) T express that m maps the structure i of S to µ. This results in the commuting
     triangle S?i • m = µ.

    Both kinds of assignments must type-check. For a link m with domain S and codomain T defined
by among others an assignment c 7→ ω, the term ω must type-check against τ m where τ is the type of c
declared in S. This ensures that typing is preserved along links. For an assignment i 7→ µ where i is a
structure over S of type R, type-checking means that µ must be a morphism from R to T .

                                                      56
An Exchange Format for Modular Knowledge                                                    Rabe, Kohlhase


Virtual Symbols Intuitively, the semantics of a structure i with domain S declared in T is that all
symbols of S are imported into T . For example, if S contains a symbol s, then i/s is available as a symbol
in T . In other words, the slash is used as the operator that dereferences structures. Another way to say it
is that structures create virtual or induced symbols. This is significant because these virtual symbols and
their properties must be inferred, and this is non-trivial because it is subject to the translations given by
the structure. While these induced symbols are easily known to systems built for a particular formalism,
they present great difficulties for generic knowledge management services.
     Similarly, every assignment to a structure induces virtual assignments to constants. Continuing the
above example, if a link with domain T contains an assignment to i, this induces assignments to the
imported symbols i/s. Furthermore, assignments may be deep in the following sense: If c is a constant
of S, a link with domain T may also contain assignments to the virtual constant i/c. Of course, this
could lead to clashes if a link contains assignments for both i and i/c; links with such clashes are not
well-formed.
Example 4 (Continued). The symbol declarations in the theory cgroup are written formally like this:

                           inv : @(→, ι, ι)      and        mon : e?monoid := {}.

The latter induces the virtual symbols e?cgroup?mon/comp and e?cgroup?mon/unit.
   Using an assignment to a structure, the assignments of the view v2 look like this:

                             inv 7→ e?integers? −            and   mon 7→ e?v1.

The latter induces virtual assignments for the symbols e?cgroup?mon/comp and e?cgroup?mon/unit.
For example, e?cgroup?mon/comp is mapped to e?monoid?compe?v1 .
    The alternative formulation of the view v2 arises if two deep assignments to the virtual constants are
used instead of the assignment to the structure mon:

                 mon/comp 7→ e?integers? +             and     mon/unit 7→ e?integers?0

2.1.3   The Module Level
The module level consists of two kinds of declarations: theory and view declarations.
                                [M]
   • Theory declarations T := {ϑ } declare a theory T defined by a list of symbol declarations ϑ ,
     which we call the body of T . Theories have an optional meta-theory M.
                                      [µ]
   • View declarations v : S → T := {σ } declare a link v from S to T defined by a list of assignments
     σ . If S has a meta-theory M, a meta-morphism µ from M to T must be provided. Just like
     structures, views may also be defined by an existing morphism: v : S → T := µ.

Meta-Theories Above, we have already mentioned that theories may have meta-theories and that links
may have meta-morphisms. Meta-theories provide a second dimension in the graph induced by theories
and links. If M is the meta-theory of T , then there is a special structure instantiating M in T , which
we denote by T ?... M provides the syntactic material that T can use to define the semantics of its
symbols: T can refer to a symbol s of M by T ?../s. While meta-theories could in principle be replaced
with structures altogether, it is advantageous to make them explicit because the conceptual distinction
pervades mathematical discourse. For example, systems can use the meta-theory to determine whether
they understand a specific theory they are provided as input.

                                                       57
An Exchange Format for Modular Knowledge                                                  Rabe, Kohlhase


    Because theories S with meta-theory M implicitly import all symbols of M, a link from S to T must
provide assignments for these symbols as well. This is the role of the meta-morphism: Every link from
S to T must provide a meta-morphism, which must be a morphism from M to T . Defined structures or
views with definition µ do not need a meta-morphism because a meta-morphism is already implied by
the meta-morphisms of the links occurring in µ.
Example 5 (Continued). An M MT theory for the logical framework LF could be declared like this

                                      lf := {type, funtype, . . .}

where we only list the constants that are relevant for our running example. If this theory is located in a
document with URI m, we can declare a theory for first-order logic in a document with URI f like this:
             m?lf
        fol := {i : ??../type, o : ??../type, equal : @(??../funtype, ??i, ??i, ??o), . . .}

Here we already use relative names (see Sect. 2.3.2) in order to keep the notation readable: Every name
of the form ??s is relative to the enclosing theory: For example, ??i resolves to f ?fol?i. Furthermore,
we use the special structure name .. to refer to those constants inherited from the meta-theory. Again we
restrict ourselves to a few constant declarations: types i and o for terms and formulas and the equality
operation that takes two terms and returns a formula.
    Then the theories monoid, cgroup, and ring can be declared using f ?fol as their meta-theory. For
example, the declaration of the theory cgroup finally looks like this:
                                                                                                
                   f ?fol                                                          e?cgroup?..
          cgroup :=        inv : @(??../../funtype, ??i, ??i), inv : e?monoid          :=      {}

Here ../../funtype refers to the function type constant declared in the meta-theory of the meta-theory.
And the structure mon must have a meta-morphism with domain f ?fol and codomain e?cgroup. This
is trivial because the meta-theory of e?cgroup is also f ?fol: The meta-morphism is simply the implicit
structure e?cgroup?.. via which e?cgroup inherits from f ?fol.
     A more complicated meta-morphism must be given in the view v1 if we assume that the meta-theory
of integers is some other theory, i.e., a representation of set theory.

Structures and Views Both structures and views from S to T are defined by a list of assignments σ
that assigns T -objects to the symbols declared in S. And both induce theory morphisms from S to T that
permit to map all objects over S to objects over T . The major difference between structures and views
is that a view only relates two fixed theories without changing either one. On the other hand, structures
from S to T occur within T because they change the theory T . Structures have definitional flavor, i.e.,
the symbols of S are imported into T . In particular, if σ contains no assignment for a constant c, this is
equivalent to copying (and translating) the declaration of c from S to T . If σ does provide an assignment
c 7→ ω, the declaration is also copied, but in addition the imported constant receives ω as its definiens.
     Views, on the other hand, have theorem flavor: σ must provide assignments for the symbols of S. If
a constant c : τ represents an axiom stating τ, giving an assignment c 7→ π means that π is a proof of the
translation of τ.
     Therefore, the assignments defining a structure may be (and typically are) partial whereas a view
should be total. This leads to a crucial technical difficulty in the treatment of structures: Contrary to
views from S to T , the assignments by themselves in a structure from S to T do not induce a theory
morphism from S to T — only by declaring the structure do the virtual symbols become available in T
that serve as the images of (some of) the symbols of S. This is unsatisfactory because it makes it harder
to unify the concepts of structures and views.

                                                    58
An Exchange Format for Modular Knowledge                                                      Rabe, Kohlhase


    Therefore, we admit partial views as well. As it turns out, this is not only possible, but indeed de-
sirable. A typical scenario when working with views is that some of the specific assignments making up
the view constitute proof obligations and must be found by costly procedures. Therefore, it is reasonable
to represent partial views, namely views where some proof obligations have already been discharged
whereas others remain open. Thus, we use hiding to obtain a semantics for partial views: All constants
for which a view does not provide an assignment are implicitly hidden, i.e., > is assigned to them.
    If a link m from S to T is applied to an S-constant that is hidden, there are two cases: If the hidden
symbol has a definition in S, it is replaced by this definition before applying the link. If it does not have
a definition, it is mapped to >. Hiding is strict: If a subterm is mapped to >, then so is the whole term.
In that case, we speak of hidden terms.

2.1.4    The Document Level
Document declarations are of the form g := {γ} where γ is a document body and g is a URI identifying
the document. The meaning of a document declaration is that γ is accessible via the name g. Since g is
a URI, it is not necessarily only the name, but can also be the primary location of γ. By forming lists
of documents, we obtain libraries, which represent mathematical knowledge bases. Special cases of
libraries are single self-contained documents and the internet seen as a library of M MT documents.
    Documents provide the border between formal and informal treatment: How documents are stored,
copied, cached, mirrored, transferred, and combined is subject to knowledge management services that
may or may not take the mathematical semantics of the document bodies in the processed documents
into account. For example, libraries may be implemented as web servers, file systems, databases, or any
combination of these. The only important thing is that they provide the query interface described below.
    Theory graphs are the central notion of M MT. The theory graph is a directed acyclic multigraph.
The nodes are all theories of all documents in the library. And similarly, the edges are the structures and
views of all documents. Then theory morphisms can be recovered as the paths in the theory graph.

2.2     Querying a Library
M MT is designed to scale to a mathematical web. This means that we define access functions to M MT
libraries that have the form of HTTP requests to a RESTful web server [Fie00]. Specifically, there
is a lookup function that takes a library Λ and a URI U as arguments and returns an M MT fragment
Λ(U). This specifies the behavior of a web server hosting an M MT library in response to GET requests.
Furthermore, all possible changes to an M MT library can be formulated as POST, PUT, and DELETE
requests that add, change, and delete knowledge items, respectively.
     It is non-trivial to add such a RESTful interface to formal systems a posteriori. It requires the rigorous
use of URIs as identifiers for all knowledge items that can be the object of a change. And it requires
to degrade gracefully if the documents in a library are not in the same main memory or on the same
machine. In large applications, it is even desirable to load only the relevant parts of a document into the
main memory and request further fragments on demand from a low-level database service. In M MT, web-
scalability is built into the core: All operations on a library Λ including the definition of the semantics
only depend on the lookup function Λ(−) and not on Λ itself. In particular, two libraries that respond
in the same way to lookup requests are indistinguishable by design. Therefore, M MT scales well to
web-based scenarios.
     Here we will only give a brief overview over the lookup function Λ(−). It is a partial function
from URIs to M MT fragments. For example, assume Λ to contain a theory S containing the symbol
declarations c : τ, c0 : τ 0 , and h : R := {}; furthermore, assume a theory T = g?T declaring a structure
i : S := {c 7→ δ , h 7→ µ}.

                                                      59
An Exchange Format for Modular Knowledge                                                       Rabe, Kohlhase


     Then the lookups of S?c and S?h yield c : τ and h : R := {}, respectively. These are lookups of
explicit symbol declarations — an important feature of M MT is that all induced, symbols can be looked
up in the same way. For example, Λ(T ?i/c) yields i/c : τ T ?i := δ . Here i/c is the unique name of the
constant c induced by the structure i; τ T ?i is the translation of the type of S?c along the structure T ?i; and
the assignment c 7→ δ causes δ to occur as the definiens of the constant i/c. Similarly, the lookup of the
induced structure T ?i/h yields i/h : R := µ. Here the assignment i 7→ µ causes i/h to be defined by a
morphism.
     To query and edit documents, assignments are also accessible by URIs even though they are not
referenced by any rule of the syntax. An assignment c 7→ ω in a view v is referenced by the URI v?c
and similarly for assignments to structures. To access assignments in structures, we distinguish the URIs
g?T ?i/c and g?T /i?c: The former identifies the constant c that is induced by i as defined above; the
latter identifies the assignment c 7→ δ to the symbol c in the structure T ?i. In particular, the lookup of the
former always is always defined, while the lookup of the latter is undefined if no assignment is present.
     Just like symbols induced by structures have URIs, so have assignments induced by assignments to
structures. For example, if additionally R declares a constant d : ρ := ⊥, then the lookup of g?T /i?h/d
yields h/d 7→ (R?d)µ . This means that the assignment h 7→ µ in T ?i induces an assignment to h/d. The
lookup of assignments to induced structures is defined accordingly.
     The above lookup functions are actually modified to accommodate for hiding. If the lookup of a
constant would yield c : τ := δ according to the above definition, but δ is (or simplifies to) >, then the
lookup is actually undefined.
Example 6 (Continued). If Λ is a library containing the three documents with URIs m, f , and e from the
running example, we obtain the following lookup results:
      • Λ(e?monoid) = ( f ?fol, ϑ ) where ϑ contains the declarations for comp and unit,
      • Λ(e?cgroup/mon) = (e?monoid, e?cgroup, e?cgroup?.., ·),
        i.e., e?cgroup/mon is a morphism from e?monoid to e?cgroup with meta-morphism e?cgroup?..,
        and without any assignments,
      • Λe?monoid (unit) = (e?monoid?../i, ⊥),
        i.e., the theory monoid has a constant unit with type e?monoid?../i and no definition,
      • Λe?cgroup (mon/unit) = (e?monoid?../ie?cgroup?mon , ⊥),
        i.e., the type of the virtual constant mon/unit arises by translating the type from the source theory
        along the importing structure,
      • Λe?cgroup/mon (unit) = ⊥,
        i.e., the lookup is undefined because the structure e?cgroup/mon does not have an assignment for
        unit,
      • the lookup Λe?v2 (mon/unit) yields e?integers?0 if the variant with the deep assignment mon/unit 7→
        e?integers?0 is used to define v2, and e?monoid?unite?v1 if the variant with the structure as-
        signment mon 7→ e?v1 is used.

2.3     Concrete Syntax
2.3.1    XML-Encoding
The XML grammar mostly follows the abstract grammar. Documents are omdoc elements with a the-
ory graph, i.e., theory and view elements as children. The children of theories are constant and
structure. And the children of views and structures are maps (assignment to a constant or structure).

                                                       60
An Exchange Format for Modular Knowledge                                                   Rabe, Kohlhase


    Both terms and morphisms are represented by O PEN M ATH elements. And all names of theories
are URIs. The grammar does not account for the well-formedness of objects and names. In particular,
well-formedness is not checked at this level.
    Formally, we define an encoding function E(−) that maps M MT-expressions to sequences of XML
elements. The precise definition of E(−) is given in Fig. 9 and 10 where we assume that the following
namespace bindings are in effect:
xmlns=”http: // www.omdoc.org/ns/omdoc”
xmlns:om=”http: // www.openmath.org/OpenMath”

And we assume the following content dictionary with cdbase http://cds.omdoc.org/omdoc/mmt.
omdoc, which is itself given as an OMD OC theory:
<omdoc>
 <theory name=”mmt”>
   <constant name=”hidden”/>
   <constant name=”identity”/>
   <constant name=”composition”/>
 </theory>
</omdoc>

We abbreviate the OMS elements referring to these symbols by OMDoc(identity) etc.
     The encoding of the structural levels is straightforward. The encoding of objects is a generaliza-
tion of the XML encoding of O PEN M ATH objects. We use the OMS element of O PEN M ATH to refer to
symbols, theories, views, and structures. The symbol with name OMDoc(hidden) is used to encode
the special term >. To encode morphisms, OMDoc(identity) and OMDoc(composition) are used.
OMDoc(identity) takes a theory as an argument and returns a morphism. OMDoc(composition)
takes a list of structures and views as arguments and returns their (left-associative) diagram order com-
position. Morphism application is encoded by reusing the OMA element from O PEN M ATH.
     The encoding of names is giving sepa-
rately in Fig. 8 on the right. There are two OMS-triple E(g?m?s) base=”g” module=”m” name=”s”
different ways to encode names. In OMS el-                   E(g?m)        base=”g” module=”m”
ements, triples (g, m, s) of document, mod- URI              E(g?m?s) g?m?s
ule, and symbol name are used, albeit with                   E(g?m)        g?m
more fitting attribute names. These triples
correspond to the (cdbase, cd, name) triples              Figure 8: XML Encoding of Names
of the O PEN M ATH standard ([BCC 04]).+

We also use them to refer to module level names by omitting the name attribute. When names occur in
attribute values, their URIs are used.


2.3.2   Relative Names

In practice it is very inconvenient to always give qualified names. Therefore, we define relative references
as a relaxation of the syntax that is elaborated into the official syntax.
    A relative reference consists of three optional components: a document reference g, a module ref-
erence m and a symbol reference s. We write relative references as triples (g, m, s) where we write ⊥ if a
component is omitted. g must be a URI reference as defined in RFC 3986 ([BLFM05]) but without query
or fragment. m and s must be unqualified names, i.e., slash-separated non-empty sequences of names.
Furthermore, m ans s may optionally start with a slash, which is used to distinguish absolute module and
symbol references from relative ones.
    An absolute reference, which serves as the base of the resolution, is an M MT-name G, G?M, or
G?M?S. Then the resolution of relative references is a partial function that takes a relative reference

                                                    61
An Exchange Format for Modular Knowledge                                                      Rabe, Kohlhase


           Library      E(Doc1 , . . . , Docn )          E(Doc1 ) . . . E(Docn )
           Document     E(g := {Mod1 , . . . , Modn })   <omdoc>E(Mod1 ) . . . E(Modn )</omdoc>
                              [M]                        <theory name=”T ” [metatheory=”E(M)”]>
           Theory       E(T := {Sym1 , . . . , Symn })    E(Sym1 ) . . . E(Symn )
                                                         </theory>
                                                         <view name=”v” from=”E(S)” to=”E(T )”>
                                       [µ]
                                                          [<metamorphism>E(µ)</metamorphism>]
           View         E(v : S → T := {σ })              E(σ )
                                                         </view>
                                                         <view name=”i” from=”E(S)” to=”E(T )”>
                                                          <definition>
                        E(i : S → T := µ)                    <OMOBJ>E(µ)</OMOBJ>
                                                          </ definition >
                                                         </view>

                       Figure 9: XML Encoding of Document and Module Level


R = (g, m, s) and an absolute reference B as input and returns an M MT-name resolve(B, R) as output. It
is defined as follows:
   • If g 6= ⊥, then possible starting slashes of m and s are ignored and

         – if R = (g, m, s): resolve(B, R) = (G + g)?m?s,
         – if R = (g, m, ⊥): resolve(B, R) = (G + g)?m,
         – if R = (g, ⊥, ⊥): resolve(B, R) = G + g,

      where G + g denotes the resolution of the URI reference g relative to the URI G as defined in RFC
      3986 ([BLFM05]).

   • If g = ⊥ and m 6= ⊥, then a possible starting slash of s is ignored and

         – if R = (⊥, m, s): resolve(B, R) = G?M + m?s,
         – if R = (⊥, m, ⊥): resolve(B, R) = G?M + m,

      where M + m resolves m relative to M: If M is not defined or if m starts with a slash, M + m is m
      with a possible starting slash removed; otherwise, it is M/m.

   • If g = m = ⊥ and M is defined, then resolve(B, R) = G?M?S + s, where S + s is defined like M + m
     above.

   • resolve(B, R) is undefined otherwise.
   Relative references can also be encoded as URIs: The triple (g, m, s) is encoded as g?m?s. If com-
ponents are omitted, they are encoded as the empty string. Trailing (but not leading) ? characters can be
dropped. For example,
   • (g, m, ⊥) is encoded as g?m,

   • (⊥, /m, s) is encoded as ?/m?s,

   • (⊥, ⊥, s) is encoded as ??s,
This encoding can be parsed back uniquely by splitting a URI into up to three components around the
separator ?.

                                                    62
An Exchange Format for Modular Knowledge                                                        Rabe, Kohlhase


                                                             <constant name=”c”>
                                                              [<type>
                                                                 <OMOBJ>E(τ)</OMOBJ>
                                                              </type>]
          Constant       E(c : [τ] := [δ ])                   [<definition>
                                                                 <OMOBJ>E(δ )</OMOBJ>
                                                              </ definition >]
                                                             </constant>
                                                             <structure name=”i” from=”E(S)”>
                                    [µ]
                                                              [<metamorphism>E(µ)</metamorphism>]
          Structure      E(i : S := {σ })                     E(σ )
                                                             </structure>
                                                             <structure name=”i” from=”E(S)”>
                                                              <definition>
                         E(i : S := µ)                           <OMOBJ>E(µ)</OMOBJ>
                                                              </ definition >
                                                             </structure>
          Assignments    E(Ass1 , . . . , Assn )             E(Ass1 ) . . . E(Assn )
                                                             <conass name=”E(c)”>
                         E(c 7→ ω)                            <OMOBJ>E(ω)</OMOBJ>
                                                             </conass>
                                                             <strass name=”E(i)”>
                         E(i 7→ µ)                            <OMOBJ>E(µ)</OMOBJ>
                                                             </ strass >
          Term           E(c)                                <om:OMS E(c)/>
                         E(x)                                <om:OMV name=”x”/>
                         E(>)                                OMDoc(hidden)
                         E(ω µ )                             <om:OMA>E(µ) E(ω)</om:OMA>
                         E(@(ω1 , . . . , ωn ))              <om:OMA>E(ω1 ) . . . E(ωn )</om:OMA>
                                                             <om:OMBIND>
                                                              E(ω1 )
                                                              <om:OMBVAR>
                         E(β (ω1 , x1 , . . . , xn , ω2 ))      E(x1 ) . . . E(xn )
                                                              </om:OMBVAR>
                                                              E(ω2 )
                                                             </om:OMBIND>
                                                             <om:OMATTR>
                                                              <om:OMATP>E(ω2 ) E(ω3 )</om:OMATP>
                         E(α(ω1 , ω2 7→ ω3 ))                 E(ω1 )
                                                             </om:OMATTR>
                                                             <om:OMA>
                                                              OMDoc(identity)
          Morphism       E(idT )                              <om:OMS E(T )/>
                                                             </om:OMA>
                                                             <om:OMA>
                                                              OMDoc(composition)
                         E(µ1 • . . . • µn )                  E(µ1 ) . . . E(µn )
                                                             </om:OMA>
                         E(i)                                <om:OMS E(i)/>
                         E(v)                                <om:OMS E(v)/>

                      Figure 10: XML Encoding of Symbol and Object Level


                                                        63
An Exchange Format for Modular Knowledge                                                       Rabe, Kohlhase


3    Advanced Concepts
In this section, we give an overview over the most important advanced concepts related to M MT. Full
definitions are given in [Rab08].

Well-formed Expressions The well-formedness of expressions is defined by an inference system about
M MT fragments. This system guarantees in particular the uniqueness of all names. The most difficult of
the well-formedness definition is to deal with assignments to structures: In general, an assignment i 7→ µ
to a structure with domain S may cause inconsistencies if a constant S?c has a definition in S that differs
from (S?c)µ . The definition of well-formedness makes sure that such inconsistencies are prevented. And
it does so in a way that can be checked efficiently because the modular structure is exploited throughout
the well-formedness checking.
     M MT does not provide a definition of well-typed terms or of logical consequence. Rather, the M MT
inference system is parametric in the judgments for equality and typing of terms. The definitions of these
judgments must be provided externally. In our M MT implementation, these judgments are defined by
plugins where the meta-theory of the home theory of a term ω determines which plugin is used to check
the well-typedness of ω.

Semantics The representation of theory graphs is geared towards expressing mathematical knowledge
with the least redundancy by using inheritance between theories. This style of writing mathematics has
been cultivated by the Bourbaki group ([Bou74]) and lends itself well to a systematic development of
theories. However, it is often necessary to eliminate the modular structure, for example when interfacing
with a system that cannot understand it or to elaborate modular theories into a non-modular trusted core.
     Therefore, we define a so-called flattening operation that eliminates all structures, meta-theories, and
morphisms, and reduces theories to collections of constants, possibly with types and definitions, which
conforms to the non-modular logical view of theories. For a given M MT-library Λ, we can view the
flattening of Λ as its semantics, since flattening eliminates all specific M MT-representation infrastruc-
ture. Essentially, the flattening replaces all structures with the induced symbols and all assignments to
structures with the induced assignments as described in Sect. 2.2. The crucial invariant of the flattening
is that if a library is well-formed, then so is its flattening; and furthermore, a library and its flattening are
indistinguishable by the lookup function Λ(−). This guarantees that flattening can be performed trans-
parently. In particular, the above-mentioned external definitions of typing and equality are only needed
for the non-modular case.

Applications There are two main use cases for M MT: the use of M MT as a simple interface language
between (i) two logical systems as well as between (ii) a logical system and a knowledge management
service.
    The first use case uses M MT as an interlingua for system interoperability. This works well because
the choice of primitives in M MT is the result of a careful trade-off between simplicity and expressivity.
Thus, M MT provides a simple abstraction over the most important language constructs that are com-
mon to systems maintaining logical libraries, while deliberately avoiding other features that would have
increased the burden on systems reading M MT. Here the logics-as-theories approach provides an ex-
tremely helpful way to document the semantics of the logic-specific concepts and their counterparts in
other logics; furthermore, systems can use the meta-theory relation to detect which theories they can
interpret and how to interpret them.


                                                       64
An Exchange Format for Modular Knowledge                                                      Rabe, Kohlhase


    By knowledge management services, we mean services that can be applied to logical knowledge
but are not primarily logical in nature. Such services include user interaction, concurrent versioning,
management of change, and search. Implementing such services often requires intricate details about
the underlying system, thus severely limiting the set of potential developers. But in fact these services
can often be developed independently of the logical systems and with relatively little logical expertise.
Here M MT comes in as an interface language providing knowledge management services with exactly
the information they need while hiding all logic- and system-specific details. Therefore, for example, the
concrete syntax of M MT fully marks up the term structure, while the well-formedness definition of M MT
does not take type-checking into account.


4    Conclusion
We have presented an exchange format for mathematical knowledge that supports system interoperability
by providing an infrastructure for efficiently re-using representations of mathematical knowledge and for
formalizing foundational assumptions and structures of the underlying logical systems themselves in a
joint web-scalable representation infrastructure.
    We consider it a major achievement of the work reported here that the M MT format integrates the-
oretical aspects like the syntax/semantics interface of the module systems or meta-logical aspects with
practical considerations like achieving web-scalability via a URI-based referencing scheme that is well-
integrated with the module system.
    The proposed M MT format takes great care to find a minimal set of primitives to keep the representa-
tional core of the language small and therefore manageable but expressive enough to cover the semantic
essentials of current (semi)-automated theorem provers, model checkers, computer algebra systems, con-
straint solvers, concept classifiers and mathematical knowledge management systems.
    We are currently evaluating the M MT format and in particular are developing translators from the
Isabelle [Pau94] and CASL [CoF04] formats into M MT. Both representation formats have strong module
systems and well-specified underlying semantics, which make them prime evaluation targets. We are also
integrating the M MT infrastructure into the upcoming version 2 of the OMD OC format [Koh06].


References
[ABC+ 03] Ron Ausbrooks, Stephen Buswell, David Carlisle, Stéphane Dalmas, Stan Devitt, Angel Diaz, Max
          Froumentin, Roger Hunter, Patrick Ion, Michael Kohlhase, Robert Miner, Nico Poppelier, Bruce
          Smith, Neil Soiffer, Robert Sutor, and Stephen Watt. Mathematical Markup Language (MathML)
          version 2.0 (second edition). W3C recommendation, World Wide Web Consortium, 2003.
     +
[BCC 04] S. Buswell, O. Caprotti, D. Carlisle, M. Dewar, M. Gaetano, and M. Kohlhase. The Open Math
          Standard, Version 2.0. Technical report, The Open Math Society, 2004. See http://www.openmath.
          org/standard/om20.
[BLFM05] Tim Berners-Lee, Roy. Fielding, and L. Masinter. Uniform resource identifier (URI): Generic syntax.
          RFC 3986, Internt Engineering Task Force, 2005.
[Bou74]   N. Bourbaki. Algebra I. Elements of Mathematics. Springer, 1974.
[CoF04]   CoFI (The Common Framework Initiative). C ASL Reference Manual. LNCS 2960 (IFIP Series).
          Springer, 2004.
[Fie00]   R. Fielding. Architectural Styles and the Design of Network-based Software Architectures. PhD thesis,
          University of California, Irvine, 2000.
[Koh06]   Michael Kohlhase. OMD OC – An open markup format for mathematical documents [Version 1.2].
          Number 4180 in LNAI. Springer Verlag, 2006.
[Pau94]   Lawrence C. Paulson. Isabelle: A Generic Theorem Prover. LNCS. Springer Verlag, 1994.


                                                      65
An Exchange Format for Modular Knowledge                                                   Rabe, Kohlhase


[Rab08]   F. Rabe. Representing Logics and Logic Translations. PhD thesis, Jacobs University Bremen, 2008.


                                                   66