<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pattern matching for mathematical expressions with OpenMath</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ken Wenzel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer-Institute for Machine Tools and Forming Technology IWU</institution>
          ,
          <addr-line>Chemnitz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>An OpenMath Content Dictionary with symbols for pattern matching of tree-like structures is presented. Furthermore, a mapping to RDF and SPARQL is introduced that allows to execute search queries against an OpenMath RDF representation. Structural search on mathematical expressions can be useful in the field of mathematical knowledge management, in diferent engineering disciplines and also in modeling and simulation environments, e.g., for refactoring of existing formulas. Searching tree-like structures is usually not possible with common text-based search engines, so specialized systems were developed in the past. One of these systems is MathWebSearch that implements formula search by using substition tree indexing [1, 2, 3]. Another system is Sewelis, a tool for guided exploration and editing of RDF graphs, that was extended with query operators for tree-like structures to enable search in mathematical expressions [4]. Both systems use some kind of pattern language to express search queries. MathWebSearch uses Content MathML as internal query language which is extended by additional tags (mq:query, mq:and, mq:or and mq:not) and attributes (mq:target, mq:generic, mq:anyorder and mq:anycount). &lt;math xmlns="http://www.w3.org/1998/Math/MathML"&gt; &lt;apply&gt;&lt;csymbol cd="calculus1"&gt;int&lt;/csymbol&gt; &lt;bind&gt;&lt;csymbol cd="fns1"&gt;lambda&lt;/csymbol&gt; &lt;bvar&gt;&lt;ci mq:generic="x"&gt;x&lt;/ci&gt;&lt;/bvar&gt; &lt;apply&gt;&lt;csymbol cd="arith1"&gt;power&lt;/csymbol&gt; &lt;ci mq:generic="x"&gt;x&lt;/ci&gt;&lt;cn type="integer"&gt;0&lt;/cn&gt; &lt;/apply&gt; &lt;/bind&gt; &lt;/apply&gt; &lt;/math&gt;</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Sewelis defines an extended Turtle [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and SPARQL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] syntax to express patterns for
mathematical expressions including placeholders, containment relationships and complex filters as
supported by SPARQL.
      </p>
      <p>SELECT ?e WHERE {
?e is ...math:Power(?x,2)... .</p>
      <p>math:Integral(?e,?x) .
}</p>
      <p>Both approaches use some kind of mathematical notation and combine it with boolean,
binding or more complex operators, for example, to match containment relationships. While
the resulting query languages are powerful, they use proprietary extensions for MathML or
SPARQL to express patterns and search operations.</p>
      <p>OpenMath and its extension mechanism allows to define mathematical symbols for the
representation of search patterns as mathematical expressions. This makes it possible to
exchange patterns as normal objects with OpenMath or Strict Content MathML.</p>
      <p>The following section introduces the newly developed OpenMath Content Dictionary that
defines a basic set of operators to represent search patterns as mathematical objects.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The patterns Content Dictionary</title>
      <p>The patterns Content Dictionary defines symbols to encode simple queries for mathematical
objects with OpenMath. The symbols and their proposed textual representations (extended
POPCORN syntax) are shown in Table 1. Most symbols are application symbols and can be
used in the form
(1, . . . , )
(1)
where  is the particular symbol and 1, . . . ,  are one or more OpenMath objects that may
also recursively contain patterns. The only exceptions are the symbol any, that can be used as
a constant to match an arbitrary expression, and the attribution symbol bind for binding of
result values to named variables.</p>
      <p>By using symbols of the patterns Content Dictionary, for example, the following pattern can
be defined:
patterns:root(patterns:descendant(</p>
      <p>patterns:any_of(arith1:sum, arith1:product)
))
This pattern finds mathematical expressions that contain sums or products at any nesting level.
The rather long names of the symbols (like descendant or any_of) hinder the readability of
such expressions. Therefore we propose shorthand notations as defined by Table 1 that would
allow to express the pattern as:</p>
      <p>Symbol
all_of
any
any_of
argument
descendant
none_of</p>
      <p>root
self_or_descendant
.&amp;
?
.|
.,
..+
.!
.</p>
      <p>...</p>
      <sec id="sec-2-1">
        <title>Matches if all of the given patterns are satisfied.</title>
      </sec>
      <sec id="sec-2-2">
        <title>Matches an arbitrary expression.</title>
      </sec>
      <sec id="sec-2-3">
        <title>Matches if at least one of the given patterns is satisfied.</title>
      </sec>
      <sec id="sec-2-4">
        <title>Matches argument lists that satisfy all of the given patterns.</title>
      </sec>
      <sec id="sec-2-5">
        <title>Matches subexpressions at any hierarchy level below an expression that satisfy all of the given patterns.</title>
      </sec>
      <sec id="sec-2-6">
        <title>Matches if none of the given patterns is satisfied.</title>
      </sec>
      <sec id="sec-2-7">
        <title>Matches expressions that are not part of any other expression and that satisfy all of the given patterns.</title>
      </sec>
      <sec id="sec-2-8">
        <title>Matches the expression or any subexpression that satisfy all of the given patterns.</title>
      </sec>
      <sec id="sec-2-9">
        <title>Binds values matched by the annotated patterns to a result</title>
        <p>variable. When used with an Openmath variable the value of
bind is ignored, the variable is interpreted as placeholder and
the target variable name is used for binding of result values.</p>
      </sec>
      <sec id="sec-2-10">
        <title>Multiple occurrences of a named variable defined with bind must match an equal expression.</title>
        <p>bind</p>
        <p>bind</p>
        <p>Patterns with named placeholders as used by the examples given in the introduction can be
expressed as follows:
patterns:root(
calculus1:int($x{patterns:bind -&gt; 1} -&gt;</p>
        <p>patterns:self_or_descendant($x{patterns:bind -&gt; 1}^2)
)</p>
        <p>)
A possible short form of the previous pattern could be:
.^(calculus1:int(?x -&gt; ...(?x^2)))
The term ?x is the short form of $x{patterns:bind -&gt; 1} and represents a placeholder
that matches variables with arbitrary names.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Query execution</title>
      <p>If the OpenMath objects are encoded in OpenMath-RDF then the query language SPARQL
can be used to evaluate the patterns. This is similar to the approach followed by Sewelis. The
transformation of the patterns to SPARQL reuses the transformation operator  that translates
OpenMath-XML to OpenMath-RDF. This approach is possible since SPARQL uses the Turtle
syntax for graph patterns whereby OpenMath-RDF can be directly represented in SPARQL.</p>
      <p>Table 2 shows additional rules that are necessary to support all patterns as defined in Table 1.
The SPARQL representation uses extended functions of the query language like complex path
expressions and filters to traverse parent-child relationsships as well as argument lists.</p>
      <p>By using shorthands for the symbols the previous example search pattern for sums and
products can be compactly represented as:
.^(..+(.|(sum, product)))</p>
      <p>This expression can be translated to SPARQL by applying the rules from Table 2. The result
is the following SPARQL query:</p>
      <p>SELECT ?result WHERE {
{</p>
      <p>?n0 (&lt;&gt;)? &lt;http://www.openmath.org/cd/arith1#sum&gt; .
} UNION {</p>
      <p>?n0 (&lt;&gt;)? &lt;http://www.openmath.org/cd/arith1#product&gt; .
}
?result (math:arguments|math:symbol|...|rdf:rest)+ ?n0 .</p>
      <p>FILTER NOT EXISTS {</p>
      <p>[] math:arguments|math:symbol|...|rdf:rest ?result .
}</p>
      <p>}
The generated query represents the alternative (.|) between the symbols sum and product
by using the UNION set operator. Furthermore, path expressions are used to determine the
descendants (..+) and to filter any non-root nodes .^ with the FILTER NOT EXISTS SPARQL
operator. As can be seen, the OpenMath-based representation of the search pattern is
considerably more compact and understandable as the equivalent SPARQL query.</p>
      <p>Besides the unpredictable execution time for complex search patterns another limitation
should be considered if OpenMath-RDF and SPARQL are used to execute search queries: SPARQL
uses names (URIs or blank node identifiers) to compare nodes of an RDF graph and not their
structural equality. Therefore it is necessary to preprocess the RDF data if patterns with named
variables, that use the patterns:bind annotation, should be matched via SPARQL. A possible
algorithm is the Universal RDF Dataset Canonicalization Algorithm 20151 that generates stable
content-based identifiers for blank (anonymous) nodes of an RDF graph. After applying the
algorithm blank nodes of an RDF graph have the same name if they recursively share the same
properties with the same values. This allows to use a simple comparison of node names to
reason about their structural equality.
1https://json-ld.github.io/rdf-dataset-canonicalization/spec/ (09.07.2021)</p>
      <p>For verifying the functionality of the developed Content Dictionary a prototypical web
application was created that allows to execute search queries on available OpenMath Content
Dictionaries. A screenshot of the application is shown in Figure 1.</p>
      <p>The upper part contains an input field for the search pattern. This pattern is parsed into
an OpenMath-XML representation and then converted to SPARQL by applying the rules from
Example(This represents the summation of the reciprocals of all the integers between 1 and 10 inclusive., ∑x1=01 x1)
meta:Example("This represents the summation of the reciprocals of all the integers between\n 1 and 10 inclusive.", su
m(interval1:integer_interval(1, 10), $x -&gt; 1 / $x))</p>
      <p>n
Bell(n) = ∑Stirling2(n, k)</p>
      <p>k=0
combinat1:Bell($n) = sum(interval1:integer_interval(alg1:zero, $n), $k -&gt; combinat1:Stirling2($n, $k))

Examples SPARQL
describe</p>
      <p>XML
describe</p>
      <p>XML
describe</p>
      <p>XML
Stirling1(n, m) = n∑−m −1k × (nn−− m1++kk) × (n2−nm− −mk) × Stirling2(n, m)</p>
      <p>k=0
combinat1:Stirling1($n, $m) = sum(interval1:integer_interval(alg1:zero, $n - $m), $k -&gt; -(alg1:one) ^ $k * binomial
($n - alg1:one + $k, $n - $m + $k) * binomial(2 * $n - $m, ($n - $m) - $k) * combinat1:Stirling2($n, $m))</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>A Content Dictionary to express search patterns for mathematical expressions with OpenMath
was presented. This enables the exchange of such patterns in a standardized way as OpenMath
objects.</p>
      <p>Furthermore, a mapping of patterns to RDF and SPARQL was given as an option to execute
search patterns against an OpenMath RDF representation. Although the scalability of this
2The RDF version of the Content Dictionaries is available at https://github.com/numerateweb/openmath-cd
(29.06.2021)
approach has to be investiged it is a proof of concept to use general purpose graph databases
for storage and retrieval of mathematical formulas.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Sucan,</surname>
          </string-name>
          <article-title>A search engine for mathematical formulae</article-title>
          ,
          <source>in: Proceedings of the 8th International Conference on Artificial Intelligence and Symbolic Computation</source>
          ,
          <source>AISC'06</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2006</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>253</lpage>
          . URL: https://doi.org/10. 1007/11856290_21. doi:
          <volume>10</volume>
          .1007/11856290_
          <fpage>21</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jucovschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Palomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Sucan</surname>
          </string-name>
          ,
          <article-title>Mathwebsearch 0.4, a semantic search engine for mathematics</article-title>
          , Manuscript at http://mathweb. org/projects/mws/pubs/mkm08. pdf (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Matican</surname>
          </string-name>
          , C.-C.
          <article-title>Prodescu, Mathwebsearch 0.5: Scaling an open formula search engine</article-title>
          , in: J.
          <string-name>
            <surname>Jeuring</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Carette</surname>
            ,
            <given-names>G. Dos</given-names>
          </string-name>
          <string-name>
            <surname>Reis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Sojka</surname>
          </string-name>
          , M. Wenzel, V. Sorge (Eds.),
          <source>Intelligent Computer Mathematics</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>342</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferré</surname>
          </string-name>
          ,
          <article-title>An RDF Vocabulary for the Representation and Exploration of Expressions with an</article-title>
          <source>Illustration on Mathematical Search</source>
          ,
          <year>2012</year>
          . URL: https://hal.inria.fr/hal-00812197.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Beckett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          , E. Prud'hommeaux, G. Carothers, RDF
          <volume>1</volume>
          .1 Turtle, W3C
          <string-name>
            <surname>Recommendation</surname>
          </string-name>
          (
          <year>2014</year>
          ). URL: http://www.w3.org/TR/turtle/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>Prud'hommeaux, SPARQL 1.1 Query Language</article-title>
          , W3C
          <string-name>
            <surname>Recommendation</surname>
          </string-name>
          (
          <year>2013</year>
          ). URL: http://www.w3.org/TR/sparql11-query/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>