Introduction

0 Computer Science, Jacobs University , Bremen , Germany

Mmt is a mathematical knowledge representation language, whose object layer is strongly inspired by OpenMath. In fact, the rst version of the Mmt system implemented exactly OpenMath objects. But over time Mmt has evolved and deviated from OpenMath in some respects. Some of these deviations are experimental or speci c to Mmt applications, but others may be interesting for future versions of OpenMath. This papers presents Mmt objects and discusses the di erences to OpenMath objects.

Introduction

Mmt [RK13] is a formal knowledge representation language following the OMDoc approach [Koh06]. It focuses on the foundation-independent and modular representation of formal mathematical content. Like OMDoc, it subsumes OpenMath as the representation language for formal mathematical objects.

The Mmt system [Rab13] coherently implements the Mmt data structures along with various logical services (e.g., type reconstruction, simpli cation), knowledge management services (e.g., presentation, search), and applications (e.g., IDE, library browser).

Notably, the Mmt language and system are generic in the sense that they can be easily instantiated with di erent formal languages such as logics or type theories.

Over 7 years of development, the implementation of objects in Mmt has deviated from OpenMath in several ways. The main motivation was to reduce the number of productions in the grammar. This greatly simpli es the development of inductive algorithms and is a key enabling factor to keep the implementation manageable.

In this paper, we present and discuss the di erences between OpenMath and Mmt objects. We start by presenting the two grammars in Sect. 2. Then we discuss individual di erences in Sect. 4, 5, 6. Mmt also provides a more precise de nition of variable scope and substitution than OpenMath, which we describe in Sect. 3.

Overview

Symbols and Variables We write c for a symbol reference (cdbase; cd; name). This is a triple of a URI cdbase and two names of a content dictionary cd and a symbol name de ned in cd.

We write x for a variable. This is just a name.

OpenMath Objects

grammar:

OpenMath objects O are formed by the following O KV ::= j j ::=

I(i) j F (f ) j S(s) j BA(b) c j x A(O; O ) j AT T (O; KV ) j B(O; AT T (x; KV ) ; O) j E (c; O ) c 7! O where is (possibly empty) repetition.

There are 10 di erent productions for objects: 4 literals: integers I(i), oating point numbers F (f ), strings S(s), and byte arrays BA(b), 2 name references to symbols (named objects in a global namespace) and variables (named objects in the local context), 4 complex objects: { application A(O; O1; : : : ; On) of a function to arguments, { attribution AT T (O; KV1; : : : ; KVn) of a key-value list to an object, { binding B(B; V1; : : : ; Vn; S) of a binder B with attributed variables

Vi (which are bound in S) and scope S, { errors E (c; O1; : : : ; On) formed from applying a symbol to arguments.

In the grammar above, we omitted foreign objects. These are arbitrary nonOpenMath data, which may occur as the arguments of an error and the values of a key-value pair.

MMT Objects

Mmt objects E are formed by the following grammar:

E ::= j j ::= ::=

Lc(s) c j x c( ; ; E ) (x[: E][= E]) (x = E) There are 4 di erent productions for objects:

Literals Lc(s) consists of a symbol c identifying the type and a string encoding s of the value.

Symbols c and variables x are as for OpenMath.

Complex objects c( ; ; E1; : : : ; En) are formed from a head symbol c, a substitution , a context (whose variables are bound in the Ei), and a list of arguments Ei.

In addition to objects, Mmt de nes two additional concepts: contexts and substitutions.

Contexts are lists of variable declarations x[: T ][= D] where the type T and the de niens D are optional. Substitutions are lists of assignments x = E. 3

Scoping

We will use the notation Xi for an Mmt variable declaration xi[: Ti][= Di] Objects in Context OpenMath does not clarify whether bound variables may occur in attributions of other bound variables of the same binding.

Mmt formalizes this as follows. A variable x is free in E if it occurs in E without being bound. And in c( ; X1; : : : ; Xm; E1; : : : ; En), any occurrence of xi in Ti+1; Di+1; : : : ; Tm; Dm; E1; : : : ; En is bound.

In particular, Mmt allows every bound variable to occur in the type and de niens of later variables of the same binding. Note that this includes the degenerate case of a context x; x : x, which declares two variables of the same name with the rst one occurring in the type of the second one.

A context = X1; : : : ; Xm is well-formed if every Ti and Di (if given) is well-formed in context X1; : : : ; Xi 1. And an object E is well-formed in context if all free variables of E are declared in .

We further de ne that if xi = xj = x for i < j, i.e., multiple variables in have the same name, then all free occurrences of x in E are occurrences of xj . Intuitively, the declaration of xj shadows the one xi. This is irrelevant for well-formedness but matters when looking up the type or de niens of x in . Substitution Well-formedness relative to a context permits a formal de nition of substitution. of the form x1 = E1; : : : ; xm = Em is a well-formed substitution from = X1; : : : ; Xm to 0 if every Ei is well-formed in context 0.

In that case, if E is well-formed in context , we write E[ ] for the object arising from replacing every free occurrence of xi in E with Ei. Then E[ ] is well-formed in context 0.

As usual, we assume substitution to be capture-avoiding. This, however, means that E[ ] is under-speci ed because the necessary -renaming of the bound variables of E is not speci ed. The Mmt implementation -renames bound variables only when necessary and then does so by appending distinguished characters to the name. 4

Attributions

Attributed Variables Mmt contexts can be seen as a list of attributed variables restricted to 2 distinguished keys for type and de niens.

Let type and def be special symbols. If we have the correspondence O '

E and

O0 '

E0 between OpenMath and Mmt objects, then

AT T (x; [type 7! O]; [def 7! O0]) ' and accordingly for lists of attributed OpenMath variables corresponding to Mmt contexts.

E ectively, Mmt allows only 2 distinguished keys when attributing variables. This is a major limitation compared to OpenMath, which allows any symbol as a key. However, to the author's knowledge, type and de niens attributions account for the vast majority of variable attributions in practice.

At the same time, building type and de niens into the grammar explicitly tremendously simpli es the grammar and arguably makes the treatment of bound variables more intuitive.

Attribution Objects Mmt does not allow for attribution objects other than the limited variable declarations from above.

However, OpenMath objects like AT T (O; k 7! V ) for a semantic attribution key k can often be represented as A(k; O; V ). Therefore, this might not be a critical limitation in practice.

Ignorable Attributions The above does not cover non-semantic (i.e., ignorable) attributions.

However, the Mmt implementation anyway permits attaching metadata to any object (including subobjects). Mmt metadata is orthogonal to the syntax of objects and includes URI-object pairs (which subsumes OpenMath attributions) as well as RDF-style typed links and tags.

Such extra-linguistic attributions may be preferable in practice because they make it much easier for applications to ignore the ignorable attributions (e.g., during pattern-matching). 5

Complex Ob jects

Mmt uses only one constructor for complex objects. The basic idea is that every inner node of an Mmt syntax tree is labeled with a symbol, which de nes the meaning of the subtree.

In particular, the syntactic constraints and intended semantics of a complex Mmt object are completely relegated to the head symbol (i.e., speci ed in the content dictionary in which the head symbol is declared). Note that this is similar to OpenMath binding and attribution objects, but di erent from OpenMath application objects (for which OpenMath does not prescribe but strongly suggests function application as the intended semantics).

Complex Mmt objects subsume most OpenMath application and binding and all error objects as follows. If

Oi '

Vj '

Xj ; and then

A(c; O1; : : : ; On) B(c; V1; : : : ; Vm; O1) ' ' c( ; ; E1; : : : ; En) c( ; X1; : : : ; Xn; E1) E (c; O1; : : : ; On)

' Here we write for the empty substitution, context, or argument list. c( ; ; E1; : : : ; En) 5.1

Limitations

A major limitation of Mmt objects is that the rst child of application and binding objects is assumed to be a symbol.

Application Objects This limitation is quite severe for application objects, where it excludes objects A(O; O1; : : : ; On) where O is, e.g., a variable or a -abstraction.

In the setting of Mmt, this is acceptable because Mmt assumes that such application objects are anyway only meaningful in the presence of a formal language that de nes the meaning of functions, e.g., a -calculus.

Representations of -calculi in Mmt are most natural if they provide three symbols arrow, lambda, and apply. In that case, application objects with complex functions can be written in Mmt as apply( ; ; E; E1; : : : ; Em).

However, for OpenMath in general, it is much less clear whether such a limitation would be reasonable.

Binding Objects The limitation is less severe for objects B(O; V1; : : : ; Vm; O1) because O is usually a symbol anyway.

Occasionally, it is useful to allow binders where O = A(c; o1; : : : ; ol), e.g., for a de nite integral Rab with O = A(R ; a; b). In Mmt, these arguments can be represented by using the substitution as in

B(A(R ; a; b); x; f (x)) '

R (from = a; to = b; x; f (x)) Arguably, the Mmt representation is even more natural than the OpenMath representation in this example because it makes all arguments available at the toplevel of the object.

Error Objects The rst child of an OpenMath error object is a symbol anyway so that the representation in Mmt does not limit expressivity.

However, it has the e ect that applications and errors are represented in the same way. This may be seen as a limitation because the syntactical distinction between error and application objects is lost. For example, Mmt makes it harder to spot errors and to separate errors from the mathematically meaningful objects.

On the other hand, the syntactic distinction employed by OpenMath can also cause additional work. For example, in many programming languages exceptions are normal values that can, e.g., be passed as arguments to functions.

In the setting of Mmt, more emphasis is placed on the head symbol of a complex object anyway. Thus, it remains reasonably easy to distinguish errors from non-error objects by inspecting the type or role of the head symbol. 5.2

Generalizations

Complex Mmt objects are more general than OpenMath objects in 2 ways: by allowing 0 or more scopes in a binder and by adding the substitution . Scopes The generalization to 0 or more scopes in binding objects has already been proposed for future versions of OpenMath in [DK09, Hel13]. Substitution There are various situations where the substitution is useful. For example, it can be used to represent rei ed substitutions (e.g., in an explicit substitution calculus), record values (which OpenMath does not support naturally), or named arguments (instead or in addition to unnamed positional arguments as used in OpenMath application objects).

Examples The following gives example uses for the various combinations of substitution, context, and argument list in an complex Mmt object: None: Objects of the form c( ; ; ) can represent the application of a function to 0 arguments. For example, +( ; ; ) is the empty sum (which simpli es to 0). This case is the reason why Mmt does not identify c and c( ; ; ) (even though that would save one more production in the grammar). Substitution: Objects of the form c( ; ; ) can represent record values. Context: Objects of the form c( ; ; ) can represent record types, classes, or dependent products.

Argument list: Objects of the form c( ; ; E~ ) can represent applications in the usual way.

Substitution+Context: Objects of the form c( ; ; ) can represent matching pairs of a context and a substitution out of it.

Substitution+Argument list: Objects of the form c( ; ; E~ ) can represent applications with named arguments and unnamed positional arguments E~ .

Context+Argument list: Objects of the form c( ; ; E~ ) can represent bindings (possibly with multiple scopes) in the usual way.

Substitution+Context+Argument list: Objects of the form c( ; ; E~ ) can represent bindings where the binder takes additional parameters, e.g., as in the integral example above. Further Generalizations Some of the above examples would in fact be even more natural in the absence of -conversion. It may be interesting to investigate whether -conversion could be made optional in languages like OpenMath or Mmt.

Another possible generalization is to allow the cases of the substitution , the declarations in , and the arguments in E~ to occur mixed. The scoping rule would be that every variable is bound in anything to the right of it. 6

Literals

Literals present a challenge to knowledge representation languages like OpenMath because there is no good canonical choice which xed set of literals to build into the language. OpenMath somewhat arbitrarily xes 4 types of literals. Thus, it lacks, e.g., bounded or arbitrary-base integer literals or character literals (not to mention less mathematical types like URIs, dates, or colors).

Mmt merges all literals into a single production and keeps the set of literals extensible by content dictionaries. This simpli es the language tremendously while making it more general.

Every literal Lc(l) value occurs as a string l in the concrete syntax. The symbol c has two functions. Firstly, it de nes the mappings between the string l and the actual values. This information should be speci ed in the content dictionary that de nes c. Secondly, it is the type of Lc(l) in the presence of a type system.

For example, by using four xed symbols OMI, OMF, OMSTRING, and OMB, we can recover OpenMath literals as

I(i) F (f )

BA(b) S(s) ' ' ' '

LOMI(i0) LOMF(f 0)

LOMB(b0) OMSTRING(s0) where l0 denotes an appropriate string-encoding of the literal value l.

Note that we could already use Z instead of OMI, but there seem to be no corresponding symbols for the other 3 literal types in existing content dictionaries.

[DK09]

Davenport and

Kohlhase . Unifying Math Ontologies: A Tale of Two Standards . In J. Carette,

Dixon ,

Sacerdoti Coen , and S. Watt, editors, Intelligent Computer Mathematics , pages 263 { 278 . Springer, 2009 .

[Hel13]

Hellstr om. Quanti ers and n-ary binders: an OpenMath standard enhancement proposal . In C. Lange,

Aspinall ,

Carette ,

Davenport ,

Kohlhase ,

Libbrecht ,

Quaresma ,

Rabe ,

Sojka , I. Whiteside , and W. Windsteiger, editors, Joint Proceedings of the MathUI , OpenMath, PLMMS, and ThEdu Workshops and Work in Progress at CICM. CEUR-WS .org, 2013 .

[Koh06]

Kohlhase . OMDoc: An Open Markup Format for Mathematical Documents (Version 1.2) . Number 4180 in Lecture Notes in Arti cial Intelligence . Springer, 2006 .

[Rab13]

Rabe . The MMT API: A Generic MKM System . In J. Carette,

Aspinall ,

Lange ,

Sojka , and W. Windsteiger, editors, Intelligent Computer Mathematics , pages 339 { 343 . Springer, 2013 .

[RK13]

Rabe and

Kohlhase . A Scalable Module System . Information and Computation , 230 ( 1 ):1{ 54 , 2013 .