<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mathematical Grammar Library: from OpenMath to natural languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Caprotti</string-name>
          <email>caprotti@chalmers.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jordi Saludes</string-name>
          <email>jordi.saludes@upc.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chalmers and University of Gothenburg</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universitat Politècnica de Catalunya</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Since 2005 we have been developing a software library for rendering, reading, and translating mathematical expressions, whether expressed in formal languages such as OpenMath or LaTeX or in a number of natural languages. The work began with the WebALT [1] project as a way to serve mathematical exercises in the native language of the student: the library can be used to generate natural language descriptions of formally encoded mathematical expressions with no loss of meaning. The applications of this technology, which comes from the area of grammar-based machine translation, rely on the possibility of parsing and generating high-quality representations of mathematics. In this short paper we concentrate on a few technical details that made the work interesting from the linguistic point of view. We introduce the computational linguistic software used as the backbone of the work, called Grammatical Framework, and then present the mathematical library, its organization and modular design. We then discuss some examples that required careful thought.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>1.1 The Grammatical Framework</p>
      <p>
The Grammatical Framework (GF) is a type-theoretic programming language for writing
grammars for multiple languages at once [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Multilingual applications use an interlingua: the
semantics of an expression in natural language that should be rendered or translated is captured
in an abstract tree, which is its language-independent representation. As it turns out, the
abstract tree representation is also a natural representation of mathematical expressions, one that
is also akin to the OpenMath abstract objects.
      </p>
      <p>These trees are described by an abstract grammar defining what can be expressed in
the specific application, whilst the concrete grammars (one for each language) define how the
abstract meaning is converted to the given language. Once an abstract grammar is given, adding
yet another language to the application amounts to adding a new concrete grammar. Ideally,
if a concrete grammar for a language in the same linguistic group is already available, the
grammar for the new language is almost an exact copy of the existing grammar, modulo some
lexicon adaptations. GF hides the linguistic details of a specific language from the programmer
in a low-level resource grammar library, so that in principle a domain expert is able to develop
new languages for a given application. Details of the GF Grammar Library, including language
coverage, are available online.</p>
      <p>A GF abstract grammar defines how expressions in given categories are combined. An
example tree in the Mathematical Grammar Library (MGL) looks as follows:</p>
      <preformat>mkProp
  (lt_num (abs (plus (BaseValNum (Var2Num x) (Var2Num y))))
          (plus (BaseValNum (abs (Var2Num x)) (abs (Var2Num y)))))</preformat>
      <p>When linearized with the English and Spanish concrete grammars, it yields the natural
language expressions<sup>2</sup>:</p>
      <disp-quote>
        <p>the absolute value of the sum of x and y is less than the sum of the absolute value of x and the absolute value of y</p>
        <p>el valor absoluto de la suma de x e y es menor que la suma del valor absoluto de x y el valor absoluto de y</p>
      </disp-quote>
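      <p>The idea of one abstract tree with several linearizations can be sketched outside GF. The following Python fragment is only a toy model under stated assumptions: the class names Var, Abs, Plus and Lt and the functions lin_eng and lin_spa are hypothetical stand-ins for the MGL productions, and the Spanish contraction and euphony rules are crude approximations of what the GF resource grammar does properly.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:            # a variable used as a numeric value (cf. Var2Num)
    name: str


@dataclass(frozen=True)
class Abs:            # absolute value (cf. abs)
    arg: "Expr"


@dataclass(frozen=True)
class Plus:           # sum over a list of at least two values (cf. plus)
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class Lt:             # proposition "left is less than right" (cf. lt_num)
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Abs, Plus]


def join(parts, conj):
    """Join phrases as 'a, b CONJ c'; CONJ may depend on the last phrase."""
    head, last = parts[:-1], parts[-1]
    return ", ".join(head) + " " + conj(last) + " " + last


def lin_eng(t):
    """Toy English linearization of an abstract tree."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Abs):
        return "the absolute value of " + lin_eng(t.arg)
    if isinstance(t, Plus):
        return "the sum of " + join([lin_eng(a) for a in t.args], lambda _: "and")
    if isinstance(t, Lt):
        return lin_eng(t.left) + " is less than " + lin_eng(t.right)
    raise TypeError(t)


def de(np):
    """Spanish 'de' plus noun phrase, contracting 'de el' to 'del'."""
    return "del " + np[3:] if np.startswith("el ") else "de " + np


def y_or_e(next_np):
    """Crude euphony rule (assumption): 'y' becomes 'e' before an i-sound."""
    first = next_np.split()[0].lower()
    return "e" if first == "y" or first.startswith(("i", "hi")) else "y"


def lin_spa(t):
    """Toy Spanish linearization of the same abstract tree."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Abs):
        return "el valor absoluto " + de(lin_spa(t.arg))
    if isinstance(t, Plus):
        return "la suma " + de(join([lin_spa(a) for a in t.args], y_or_e))
    if isinstance(t, Lt):
        return lin_spa(t.left) + " es menor que " + lin_spa(t.right)
    raise TypeError(t)


x, y = Var("x"), Var("y")
tree = Lt(Abs(Plus((x, y))), Plus((Abs(x), Abs(y))))
print(lin_eng(tree))
print(lin_spa(tree))
```
]]></preformat>
      <p>Both linearizers walk the same tree, so nesting depth does not affect correctness; only the per-language functions differ, which is the role concrete grammars play in GF.</p>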
      <p>As mentioned above, the abstract tree is not far from the OpenMath expression. The
linguistic function mkProp wraps the wording produced by the subexpressions. In terms of
computational linguistic technology, this approach differs from standard statistics-based
approaches, such as Google Translate, in that it can generate high-quality translations for
arbitrarily deep nesting of subexpressions, as opposed to being limited by n-gram distance.</p>
      <p>The number of categories in a GF application is a trade-off between how much ambiguity
is tolerable and the expressiveness of the whole system. The categories defined in the MGL
are Value X and Variable X, where X is a Number, a Function, a Set or a Tensor (namely
vectors or matrices). The current version of the library implements these by defining a fixed
category for each combination in {Variable, Value} × {Number, Set, Function, Tensor}. Thus, for
instance, VarNum = Variable Number and ValSet = Value Set. Other categories stand for
propositions, geometric constructions and indexes.</p>
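      <p>The fixed categories form a small Cartesian product, which can be enumerated as in the Python sketch below. Only VarNum, ValSet and ValNum are attested in the text; the abbreviations chosen here for Function and Tensor are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Prefixes as they appear in VarNum and ValSet.
KINDS = {"Variable": "Var", "Value": "Val"}
# "Num" and "Set" are attested; "Fun" and "Tensor" are assumed abbreviations.
SORTS = {"Number": "Num", "Set": "Set", "Function": "Fun", "Tensor": "Tensor"}

# One fixed category name per (kind, sort) combination.
CATEGORIES = {
    KINDS[kind] + SORTS[sort]: (kind, sort)
    for kind, sort in product(KINDS, SORTS)
}

for name, (kind, sort) in sorted(CATEGORIES.items()):
    print(name, "=", kind, sort)
```
]]></preformat>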
      <p>Each abstract category corresponds to a linguistic category in a concrete grammar of a
specific language. Usually a Value points to a noun phrase and a Variable to a string. More
complex expressions, those combining categories, correspond in a natural way to linguistic
entities composed from these elements: propositions are mapped into clauses with grammatical
polarity, operations to sentences and simple exercises to texts.</p>
    </sec>
    <sec id="sec-2">
      <title>The Mathematical Grammar Library</title>
      <p>The library can be organized in a matrix, where the horizontal axis runs over the languages
while the vertical axis covers layers of complexity of mathematical expressions.</p>
      <p>
        At present the supported languages are: Bulgarian, Catalan, English, Finnish, French, German, Hindi,
Italian, Polish, Romanian, Russian, Spanish, Swedish and Urdu. As a proof of concept, the library
also includes a couple of computer languages that are relevant to mathematics, namely
LaTeX and Sage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>The vertical axis runs over three layers of increasing complexity:</p>
      <sec id="sec-2-1">
        <list list-type="order">
          <list-item>
            <p>Ground: literals, indexes and variables.</p>
          </list-item>
          <list-item>
            <p>OpenMath: modeled after the following Content Dictionaries, considered useful for
expressing the mathematical fragments at the time of the WebALT project:</p>
            <list list-type="bullet">
              <list-item><p>arith1, arith2, complex1, integer1, integer2, logic1, nums1, quant1, relation1, rounding1;</p></list-item>
              <list-item><p>calculus1, fns1, fns2, interval1, limit1, transc1, veccalc1;</p></list-item>
              <list-item><p>linalg1, linalg2;</p></list-item>
              <list-item><p>minmax1, plangeo1, s_data1, set1, setname1.</p></list-item>
            </list>
          </list-item>
          <list-item>
            <p>Operations: takes care of simple mathematical exercises. These appear in drilling
exercises and usually begin with directives such as 'Compute', 'Find', 'Prove', 'Give an
example of', etc.</p>
          </list-item>
        </list>
        <p><sup>2</sup> Notice the special form of the conjunction "x e y": the usual Spanish conjunction "y" must be changed for
euphony before a vowel that sounds alike. This is taken care of automatically by the GF Spanish resource grammar.</p>
        <p>
          Objects in the OpenMath standard [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] relate to GF types: each symbol in a Content
Dictionary (CD) roughly corresponds to a production of the same name in a GF module named
after that CD. Application of a function to a number is expressed by the production At, which
takes a Value Function and a Value Number and returns a Value Number. More examples are
given in Table 1.
        </p>
        <p>Following the lines of the Small Type System [4, principle 4], we imposed that binary
associative functions take a list of values and return a value of the same kind. For example, plus
in arith1 has signature plus : [ValNum] → ValNum, while the category [ValNum] (meaning
a list of numeric values) is declared to take at least two values. Therefore it is impossible by
construction to add a single number (i.e. "the sum of 3").</p>
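        <p>The "at least two values" constraint can be mimicked with constructors that never produce a shorter list. The Python sketch below is an illustration, not the MGL code: base_val_num mirrors the two-argument BaseValNum seen in the example tree, while cons_val_num and the ValNumList class are hypothetical names, the former modeled on the usual shape of GF list constructors.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValNumList:
    """A list of numeric values that always holds at least two items."""
    items: Tuple[str, ...]

    def __post_init__(self):
        if len(self.items) < 2:
            raise ValueError("[ValNum] needs at least two values")


def base_val_num(a: str, b: str) -> ValNumList:
    """Smallest possible list: exactly two values."""
    return ValNumList((a, b))


def cons_val_num(a: str, rest: ValNumList) -> ValNumList:
    """Prepend one more value to an existing list."""
    return ValNumList((a,) + rest.items)


def plus(args: ValNumList) -> str:
    """English wording of a sum over a list of values."""
    *head, last = args.items
    return "the sum of " + ", ".join(head) + " and " + last


print(plus(base_val_num("3", "4")))
print(plus(cons_val_num("2", base_val_num("3", "4"))))
```
]]></preformat>
        <p>With only these two constructors there is no way to build a one-element list, so a phrase such as "the sum of 3" cannot be produced.</p>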
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Linguistic peculiarities</title>
      <p>Some interesting points in the implementation are related to language specifics. For example,
the simple exercise that asks for computing a numeric value<sup>3</sup>:</p>
      <preformat>DoComputeN ComputeV (determinant (Var2Tensor M))</preformat>
      <p>gives in English:</p>
      <sec id="sec-3-1">
        <disp-quote><p>Compute the determinant of M.</p></disp-quote>
        <p>This pattern is shared by most of the languages, so it was abstracted into an incomplete concrete
grammar file, OperationsI. From this module, one obtains OperationsL for a language L simply
by specifying the lexicon and paradigms modules for L, much as a function is applied
to its arguments. In French, however, it is impolite to use an imperative in this case; therefore the
module OperationsFre has to re-implement this production in a specific way.</p>
        <p>Another point worth mentioning is function application. Notice the different forms:</p>
        <list list-type="bullet">
          <list-item><p>"the cosine of 3"</p></list-item>
          <list-item><p>"f at 3"</p></list-item>
          <list-item><p>"the derivative of the sine at 3"</p></list-item>
          <list-item><p>"x to the cosine of x where x is 3"</p></list-item>
        </list>
        <p>They are all mathematically equivalent but differ in structure: in the first case, the function
being applied is a named symbol (the cosine), while in the last one it is a λ-abstraction. In the
other cases, it is a function variable or it comes from a functional operator.</p>
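        <p>One way to picture how the wording depends on the kind of function being applied is a dispatch on the function's structure. The Python sketch below reproduces the four English forms listed above; the classes Named, FunVar, Derivative and Lam and the function apply_eng are hypothetical names chosen for illustration and do not claim to reflect how the MGL implements application.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Named:          # named symbol, e.g. the cosine
    noun: str


@dataclass(frozen=True)
class FunVar:         # function variable, e.g. f
    name: str


@dataclass(frozen=True)
class Derivative:     # functional operator applied to a function
    of: "Fun"


@dataclass(frozen=True)
class Lam:            # lambda abstraction: bound variable and worded body
    var: str
    body: str


Fun = Union[Named, FunVar, Derivative, Lam]


def fun_np(f):
    """Word a function on its own, as a noun phrase."""
    if isinstance(f, Named):
        return "the " + f.noun
    if isinstance(f, FunVar):
        return f.name
    if isinstance(f, Derivative):
        return "the derivative of " + fun_np(f.of)
    if isinstance(f, Lam):
        return f.var + " to " + f.body
    raise TypeError(f)


def apply_eng(f, arg):
    """Word the application of f to arg; the form depends on the kind of f."""
    if isinstance(f, Named):
        return fun_np(f) + " of " + arg
    if isinstance(f, Lam):
        return fun_np(f) + " where " + f.var + " is " + arg
    return fun_np(f) + " at " + arg


cosine = Named("cosine")
print(apply_eng(cosine, "3"))
print(apply_eng(FunVar("f"), "3"))
print(apply_eng(Derivative(Named("sine")), "3"))
print(apply_eng(Lam("x", apply_eng(cosine, "x")), "3"))
```
]]></preformat>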
        <p><sup>3</sup> determinant belongs to the OpenMath layer of the library and Var2Tensor makes a value out of a variable.
DoComputeN denotes an exercise asking to compute a number, while ComputeV gives finer control over which verb
to use to denote computation ('to compute' in this case).</p>
      </sec>
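      <sec id="sec-3-2">
        <table-wrap id="tab1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>OpenMath</th><th>GF</th></tr>
            </thead>
            <tbody>
              <tr><td>Symbol name in CD</td><td>name in module CD</td></tr>
              <tr><td>Integer n</td><td>n converted to Value from predefined type Int in module Literals</td></tr>
              <tr><td>Variable name</td><td>name in category Variable X</td></tr>
              <tr><td>Application of a on b</td><td>a b</td></tr>
              <tr><td>Binding z app</td><td>lambda z app, where z is a Variable and app a Value</td></tr>
              <tr><td>Attribution, Error, Bytearray</td><td>Not supported</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>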
    </sec>
    <sec id="sec-4">
      <title>Applications and future development</title>
      <p>
        The library is publicly available at the MOLTO repository [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and is documented at [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. It is
being used in the mathbar demo in the MOLTO project [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] accessible from [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. An example of
natural language interaction with a computer algebra system can be retrieved from the sage
directory of the library distribution and has been recently presented at [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>For the future, the library needs to grow in breadth and shape: at the moment, it is
systematically tested for three languages but depends on domain experts who are native speakers to
polish the remaining ones.</p>
      <p>
        Integration of natural language productions and formulas is also prominent in the TODO
list. This is a variegated issue as [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] shows, but it is necessary for fluent mathematics in
applications. Also more natural renderings of logical propositions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] will open the door to
usage in automatic reasoners and theorem provers.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] http://webalt.math.helsinki.fi/content/index_eng.html Last viewed June
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] http://www.sagemath.org/ Last viewed May
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ranta</surname>
          </string-name>
          , "
          <article-title>Grammatical Framework: Programming with Multilingual Grammars,</article-title>
          "
          <source>CSLI Studies in Computational Linguistics</source>
          , Stanford,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Davenport</surname>
          </string-name>
          , "
          <article-title>A small OpenMath type system,</article-title>
          "
          <source>ACM SIGSAM Bulletin</source>
          , vol.
          <volume>34</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>16</fpage>
          –
          <lpage>21</lpage>
          , Jun.
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>OpenMath Consortium</collab>
          , "
          <article-title>The OpenMath Standard,</article-title>
          "
          <source>OpenMath Deliverable</source>
          , vol.
          <volume>1</volume>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] <article-title>The MOLTO project</article-title>
          . http://www.molto-project.eu/ Last viewed May
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] <article-title>The mathematical library</article-title>. svn://molto-project.eu/mgl.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Saludes</surname>
          </string-name>
          et al.,
          "
          <article-title>Simple drill grammar library,</article-title>
          " http://www.molto-project.eu/sites/default/files/d61.pdf. Last viewed May
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] <article-title>Grammatical Framework demos</article-title>
          . http://www.grammaticalframework.org/demos/index.html. Last viewed May
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Mohan</given-names>
            <surname>Ganesalingam</surname>
          </string-name>
          , "
          <source>The Language of Mathematics.</source>
          " PhD thesis
          , Cambridge University,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ranta</surname>
          </string-name>
          , "
          <article-title>Translating between language and logic: what is easy and what is difficult.</article-title>
          "
          <source>Automated Deduction, CADE-23</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Dominique</given-names>
            <surname>Archambault</surname>
          </string-name>
          , Olga Caprotti, Aarne Ranta and Jordi Saludes, "
          <article-title>Using GF in multimodal assistants for mathematics.</article-title>
          "
          <source>Digitization and E-Inclusion in Mathematics and Science</source>
          <year>2012</year>
          , Tokyo, Japan.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>