=Paper= {{Paper |id=None |storemode=property |title=Getting the Units Right |pdfUrl=https://ceur-ws.org/Vol-1785/W45.pdf |volume=Vol-1785 |authors=Moritz Schubotz,David Veenhuis,Howard S. Cohl |dblpUrl=https://dblp.org/rec/conf/cikm/SchubotzVC16 }} ==Getting the Units Right== https://ceur-ws.org/Vol-1785/W45.pdf
                        Getting the units right

            Moritz Schubotz1 , David Veenhuis1 , and Howard S. Cohl2
              1
                 Database Systems and Information Management Group,
         Technische Universität Berlin, Einsteinufer 17, 10587 Berlin, Germany
          schubotz@tu-berlin.de, david.veenhuis@campus.tu-berlin.de
                  2
                    Applied and Computational Mathematics Division,
    National Institute of Standards and Technology, Gaithersburg, Maryland, U.S.A.
                                 howard.cohl@nist.gov
                        http://units.formulasearchengine.com



        Abstract. To understand applied physics, and physical formulae in par-
        ticular, the investigation of identifier units is beneficial. However, nor-
        mally the units are not given explicitly in formulae and have to be in-
        ferred. In this paper, we investigate how this process can be automated.
        As an example application, we use physical formulae from Wikipedia to-
        gether with information from the related knowledge-base Wikidata. We
        envision that this method can be generalized and describe how, in the
        future, hard logical constraints may be used as a feedback mechanism
        for statistical methods in the context of natural language processing.


Keywords: Wikidata, Wikipedia, Units, Natural Language Processing, Infer-
ence, Mathematical Language Processing, Constraint propagation


1     Introduction
Units play an essential role in the physical sciences. Especially, dimensional anal-
ysis is one of the most significant tools for comprehension and understanding
of physical formulae. We claim that this technique is not only beneficial for hu-
mans on their way to physical understanding, but also to machines that are
programmed to semantically enrich mathematical and especially, physical con-
tent.
    In this paper, we consider physical units in Wikipedia, as a first step towards
a general solution of the underlying problem. We aim to: (1) identify formulae
that deal with physical relationships; (2) automatically derive the units of the
identifiers used in those formulae; and (3) integrate and store the learned data
in the central Wikimedia triple store Wikidata.
    Our paper is structured as follows. First, we analyze related works that can
be used to complete the task at hand. In that context, we recap how possible
definitions can automatically be extracted from the text surrounding formulae.
Thereafter, we describe our method to relate the identifier to dimensions using
the Wikidata knowledge base and our approach to unit constraint propagation.
This will be followed by a refinement of the definition candidates based on the
2       Moritz Schubotz, David Veenhuis, and Howard S. Cohl




Fig. 1: As of January 2016, Wikidata users (visualized by the grey box ‘User’, top
left in the figure) can store mathematical expressions in Wikidata. Thereafter,
these expressions can be displayed in all language versions of Wikipedia (visual-
ized by “Wikipedia user”, bottom left). Moreover, other use cases for this data
are possible. For example, there is the article place-holder service which displays
information regarding a topic in languages, for which human generated articles
in that language do not exist yet. For more details see [6] (picture by Julian
Hilbig and Duc Linh Tran [6]).


constraints. Lastly, we describe the reinsertion of learned units into the Wikidata
knowledge base, which will be used for the formulae we processed. Finally, we
provide an outlook on how this method can be used for feedback driven self-
tuning of Mathematical Language Processing [11].


2    Related Work

A method to find identifier definiens tuples in natural language text is proposed
in [11]. There, the authors use Natural Language Processing (part of speech tag-
ging combined with word distance based scoring) to get the tuples. This approach
shows advantages over the more static pattern-matching approaches, because it
is able to also retrieve results that do not follow the pattern ⟨identifier⟩ is
⟨description⟩(see also [8]).
    This method has been applied to Wikipedia articles to enrich formulae with
definitions for their included identifiers. It works for Wikipedia sites with different
languages. The result may have more than a single possible definition for an
identifier, each with a probability that expresses the likeliness for the definition
to be the relevant. The selection of the correct definition is a problem addressed
in our paper.
                                                     Getting the units right      3

    A method to map the meaning to identifiers is used in [12]. In that paper
the namespace concept known from programming languages is used to assign
documents to namespaces and thus the meaning of the identifiers are mapped
to the meaning belonging to the chosen namespace.
    In [9], an algorithm is proposed to use the need for dimensional homogeneity
in physical formulae compared with constraint propagation to prove formulae
that students gave as answers to physics problems. It aims on validating the
formulae for known units.
    Dimensional analysis is also a widely investigated field in the area of program-
ming languages. In [4], the authors use the need for dimensional homogeneity in
physical equations in conjunction with constraint solving to automatically infer
unit types for programs handling scientific problems. Their approach infers a
general set of unit types using constraints created over the variables and con-
stants occurring in the program. The user can then annotate the inferred unit
types with real units thereby cannot violate the dimensional correctness as it
is proved for the inferred unit type system. In [1], constraint solving is used to
prove the unit correctness of calculations in spreadsheets. More examples for
validating dimensional correctness in programs can be found in [3, 7].
    As preparation of this work, a new feature in Wikidata, the data-type math-
ematical expression, has been developed [6]. Properties with data-type mathe-
matical expression with for instance, defining formulae, represent mathematical
expressions. As of January 2016, these expressions are rendered by the software
which runs Wikidata.

3     Our Method
3.1   Identify Physical Formulae
In this paper, we limit ourselves to the following definition.
Definition 1 (physical formula). A physical formula is a binary mathemati-
cal relation of type equation or inequality containing one or more physical quan-
tities.
Consider the following examples:
                             E = mc2 ,                                          (1)
                                2        2
                             sin θ + cos θ = 1,                                 (2)
                             mmoon < mearth < msun ,                           (3)
                                  λp
                             ℏ=      .                                         (4)
                                  2π
Expressions (1), (4) are physical formulae according to Definition 1. They contain
the physical quantities E, m, c, λ, p, ℏ of which c, ℏ are physical constants.
Expressions (2), (3) are not physical formulae since (2) does not contain physical
quantities, and (3) is not binary. Note that the above definition can be extended
to non-binary relation chains without loss of generality.
    This definition implies the following algorithm to identify physical formulae:
4      Moritz Schubotz, David Veenhuis, and Howard S. Cohl

1. identify mathematical expressions;
2. check if they are binary relations of type equation or inequality;
3. extract identifiers; and
4. decide for each identifier, if it is a physical quantity or expression.
    While we will apply the heuristics from [12] for steps 1 and 3, we need to
develop new approaches for steps 2 and 4. A simple approach to 2, is to convert
the mathematical expression to content MathML using LATExml [10] and after-
wards analyze the content MathML tree using fixed rules. We thereby rely on
LATExml. Possibly occurring problems and limitations of LATExmlfor our appli-
cation will be listed in the final report. The main focus of our work will be on
step 4. While we can find Wikidata items from the algorithm presented in [12],
we might need to improve these algorithms.
    Our main focus is on the development of an algorithm which decides if an
item is a physical quantity or entity. Our approach to address this problem is to
analyze the semantic properties of the relevant Wikidata item using the SPARQL [5]
3
  endpoint. This means all information from Wikidata that expose information on
the units or dimensions respectively. More technically, we will develop a method
to check the relatedness to Q107715 (physical quantity)4 . This will be one of the
key contributions of our research project.

3.2   Identifying units and dimension of physical quantities




Table 1: Dimension, base unit, and symbol according to the international system
of units (SI)
                     Dimension            Unit     Symbol
                     Length               meter      L
                      Mass                kilogram   M
                      Time                second     T
                      Electric Current    ampere     I
                      Luminous Intensity  candela    J
                      Temperature         kelvin     θ
                      Amount of Substance mole       N



   The dimension of a physical quantity is an inherent property of each quantity.
Dimensions are for example length L, mass M or time T . The derived quantity
3
  SPARQL is used to query RDF (Ressource Description Framework) triples. They
  consist of subject, predicate, object. Example: Find all subjects (items) in Wikidata
  that have predicate "subclass of" and object "physical quantity"
4
  Wikidata stores information in form of triples like ("length","subclass of","physical
  quantity") where the unique id of "physical quantity" in Wikidata is Q107715. If a
  Item like "length" has a relation like "subclass of" or "instance of" to the Item for
  "physical quantity" we suppose it to be of kind physical quantity.
                                                       Getting the units right     5

speed has the dimension LT −1 . For all physical quantities, we try to derive their
dimension from Wikidata using SPARQL queries.
     To obtain that, we also take unit information into account, since it is more
prevalent in the Wikidata dataset compared to dimension information. Because
there are multiple unit systems (e.g., imperial units using yard for length versus
SI units using meter) more than one unit per dimension exists. However, physical
laws are usually valid independent from the unit system that is used. In this
context, it has to be noted that some adjustment needs to be done for units that
disregard conceptually important physical properties. For instance the Carnot
efficiency ηCarnot depends on the absolute temperature scale (e.g., kelvin) of the
hot TH and the cold TC reservoir via

                                                TC
                                ηCarnot = 1 −      .                             (5)
                                                TH

Note that the fact that those temperatures are based on an absolute tempera-
ture scale is essential. This requires that non-cardinal temperature units such as
Celsius and Fahrenheit to be converted to prior to computation.
    Given the extracted dimensions, the mathematical domain of physical quan-
tities can be specified better. Approaches to formally describe this domain are
presented in [2]. We introduce the following notation for a physical quantity x
                                           [ ]
                                            x
                                      x≡       ,
                                            d

where x is the spatial part of x as defined in [2] and d is the dimension of the
unit of x. To simplify readability, we use the identifier for x and its spatial part.
We write x = x if d = 1. Thus, (1) can be written as
                          [          ] [ ][     ]2
                              E         m   c
                                      =            ,
                           M L2 T −2    M LT −1

and (5) reads
                                [ ]
                                  TC
                                            [ TC ]
                                   θ                   TC
                  ηCarnot = 1 − [    ] = 1 − T−1H  =1−    .                      (6)
                                 TH          θθ        TH
                                   θ


3.3   Compatible operations

This notation leads to our next definition

Definition 2 (valid physical formula). We call a physical formula valid, if
the dimensions are compatible with the mathematical operators used in that
formula.
6         Moritz Schubotz, David Veenhuis, and Howard S. Cohl


Table 2: Compatibility of Mathematical Operations with Physical units. Here,
|.|1 denotes the 1-norm.
class                   rule               constraint operators
                             [           ]
                                o(a, b)
               ([ ] [ ])
                 a    b
 3: map      o      ,     →      o(x,y)               times, division, integration, differ-
                 x    y        |o(x,y)|1              entiation
               ([ ] [ ])     [         ]
                 a    b        o(a, b)
 2: restrict o      ,     →                  x=y      plus, minus, equals
                 x    y           x
               ([ ] [ ])     [         ]
                 a    b        o(a, b)
 1: apply    o      ,     →                  y=1      power, roots
                 x    y        o(x, b)
                    ([ ])    [      ]
                      a        o(a)
 0: unitless      o       →                  y=1      function application
                      y          y




We exemplify the meaning of compatible operations based on (1). This physical
formula contains the mathematical operators equals (=), times (·), and power
(∧). For ‘equals’, the dimensions on the right-hand side and left-hand side must
be the same, for ‘times’, no restrictions apply and for ‘power’ the unit of the
exponent must be 1. If any of these constraints are violated (e.g., with the
incorrect E = mc) then the units or the formula can not be correct. Note also
that this method is not limited to scalar physical quantities and can be applied
to physical quantities of higher dimensions such as vectors like

      [           ] ∫ [       ] [ ] ∫ [t2 ] [        ][     ] [ ]
           W              F      s      T
                                                F        v     t
                   =        −2 d   = [ ]                  −1 d    .
        M L2 T −2    C M LT      L    t1     M LT −2
                                                       LT      T
                                                 T



We will implement validation for the mathematical operations enumerated in
Table 2, which we group into four classes. For some of those cases [9] defined
detailed unit propagation rules.



3.4     Unit Inference


After having defined those fundamental concepts, we apply them to the actual
data extracted by the Mathematical Language Processing Project and the Unit
information fetched from Wikidata.
    As an example, we demonstrate the work-flow for adding dimensional infor-
mation to formulae extracted from Wikipedia. The result of the process, proposed
in [11], is a probability distribution over identifiers and their possible definiens.
An identifier can have more than one definition candidate. That may lead to
more than one possible dimension for an identifier.
                                                             Getting the units right          7


    Example 1: Mass-energy equivalence

    The relation between energy and mass is described by the mass-energy
    equivalence formula E = mc2 , where E is energy, m is mass, and c is the
    speed of light.

    For Example 1 (from Wikipedia), which was also used in [12], the identifier-
definiens pairs for E may have this form:
     Id. Definition score Id. Definition score Id. Definition score
      E    energy      0.42 m    energy      0.35 c    energy      0.30
      E     mass       0.42 m     mass       0.35 c     mass       0.35
      E speed of light 0.16 m speed of light 0.30 c speed of light 0.35
All three definitions are found in the same sentence, but have different distances
between identifier and definiens. That results in different scores. Note, that the
actual scoring computation is more evolved and the score values have been made
up for demonstration purposes. We now describe this more formally.
    For a physical formula f (x1 , . . . , xn , on+1 , . . . , om ), with the identifiers (phys-
ical quantities and other identifiers) xi and mathematical operators oj , and a set
of definiens candidates N , ∑ the MLP Project returns         ∑ a probability distribution
for the identifiers χ(x) =       N ∈N w 0,n N , with             N ∈N w0,N = 1. From Wiki-
data
∑     we  get  the unit and
                          ∑  implied      dimension       information     denoted as ρ1 (x) =
   N ∈N w 0,n dim(N ) =      i w 1,i di , where w 1,j is the sum over all w 0,N with
dim N = dj . Next, we apply the operator compatibility rules, which affects
the probability distribution according to their constraints. The dimension prob-
ability distribution of an operator o, dim (o(a, b)) implies operator dependent
constraints to dim a and dim b.

                    class(o) = 3 =⇒ dim o(a, b) = o(dim a, dim b).
                    class(o) = 2 =⇒ dim a = dim b.
                    class(o) = 1 =⇒ dim b = 1.
                    class(o) = 0 =⇒ dim a = 1, o(a, b) = o(a).
                                                                         ∑
   To∑reflect that, we define a refined probability distribution ρ(x) = i w2,i di ,
with i w2,i di = 1. Finally, we need to solve a system of linear equations describ-
ing the w2 = Aw1 . We propose the following method to realize that. Assume
that the mathematical notation is good enough to extract the operator tree,
with tools such as LATExml. We convert this operator to the root form and call
O(f ) ∈ Nm×m the adjacent matrix. Since the tree is not directed O(f ) = O(f )T ,
and since identifiers are always connected by an operator, the top left entries of
the adjacent matrix are zero, i.e., ∀i ≤ n ∧ j ≤ n =⇒ O(f )i,j = 0. Based on
that, we will elaborate different strategies for the most efficient execution of the
constraint propagation problem. Our first approach is the following sketch of an
algorithm:
 1. start with the identifiers (x) as working set;
8         Moritz Schubotz, David Veenhuis, and Howard S. Cohl
                                                                                      ⎡             ⎤
                                                                                            =
                                                                                      ⎢ ? M L2 T −2 ⎥
                                                                                      ⎢             ⎥
                                                                                      ⎣ +? M        ⎦
                                                                                        +? LT −1

                           =
                                                                      ⎡               ⎤
                                                                              E                                           ·
                                                                      ⎢ .42 M L2 T −2 ⎥
                                                                      ⎢               ⎥
⎡               ⎤                                                     ⎣ +.42 M
                                      ·
                                                                                      ⎦
        E
⎢ .42 M L2 T −2 ⎥                                                       +.16 LT −1      ⎡
                                                                                                m
                                                                                                        ⎤                         ⎡
                                                                                                                                          ∧
                                                                                                                                                    ⎤
⎢
⎣ +.42 M
                ⎥
                ⎦                                                                       ⎢ .35 M L2 T −2 ⎥                         ⎢ .30 M 2 L4 T −4 ⎥
                                                                                        ⎢               ⎥                         ⎢                 ⎥
                                                                                        ⎣ +.35 M                                  ⎣ +.35 M 2
  +.16 LT −1      ⎡               ⎤                                                                     ⎦                                           ⎦
                          m                      ∧                                        +.30 LT −1                                +.35 L2 T −2
                  ⎢ .35 M L2 T −2 ⎥
                  ⎢               ⎥⎡                ⎤
                  ⎣ +.35 M                  c         2                                                               ⎡               ⎤
                                                                                                                              c         2
                                  ⎦
                    +.30 LT −1      ⎢ .30 M L2 T −2 ⎥
                                    ⎢               ⎥                                                                 ⎢ .30 M L2 T −2 ⎥
                                                                                                                      ⎢               ⎥
                                    ⎣ +.35 M        ⎦                                                                 ⎣ +.35 M        ⎦
                                      +.35 LT −1                                                                        +.35 LT −1

        (1) Set identifier dimensions                                               (2) Propagate upwards


                                                                                      ⎡             ⎤
               ⎡             ⎤                                                              =
                     =                                                                         2 −2
               ⎢ ? M L2 T −2 ⎥                                                        ⎢ ? ML T ⎥
               ⎢             ⎥                                                        ⎢             ⎥
               ⎣ +? M        ⎦                                                        ⎣ +? M        ⎦
                 +? LT −1                                                                     −1
                                                                                        +? LT

⎡               ⎤                  ⎡             ⎤
        E                                 ·                           ⎡               ⎤                           [                    ]
⎢ .42 M L2 T −2 ⎥                  ⎢ ? M L2 T −2 ⎥                            E                                               ·
                                                                      ⎢ .42 M L2 T −2 ⎥                           M L2 T −2
⎢               ⎥                  ⎢             ⎥
⎣ +.42 M        ⎦                  ⎣ +? M        ⎦
  +.16 LT −1                         +? LT −1
                                                                      ⎢               ⎥
                                                                      ⎣ +.42 M        ⎦
                                                                        +.16 LT −1      [                     ]                       [          ]
                    ⎡
                            m
                                    ⎤           ⎡
                                                        ∧
                                                                  ⎤                                       m                               ∧
                    ⎢ .35 M L2 T −2 ⎥
                    ⎢               ⎥
                                                ⎢ .30 M 2 L4 T −4 ⎥
                                                ⎢                 ⎥                                       M                             L T −2
                                                                                                                                           2
                    ⎣ +.35 M        ⎦           ⎣ +.35 M 2        ⎦
                              −1                        2 −2
                      +.30 LT                     +.35 L T                                                        ⎡               ⎤
                                                                                                                          c         2
                                            ⎡
                                                    c
                                                           ⎤
                                                              2                                                   ⎢ .30 M L2 T −2 ⎥
                                            ⎢ .30 M L2 T −2 ⎥                                                     ⎢               ⎥
                                            ⎢
                                            ⎣ +.35 M
                                                            ⎥
                                                            ⎦
                                                                                                                  ⎣ +.35 M        ⎦
                                              +.35 LT −1                                                            +.35 LT −1

          (3) Propagate downwards                                                         (4) Process maps

                    [                  ]
                           =
                        M L2 T −2
                                                                                [                     ]
                                                                                    =
                                                                                 M L2 T −2
⎡                        ⎤             [               ]
        E                                     ·
⎢ .42 M L2 T −2 ⎥                          M L2 T −2
⎢               ⎥
⎣ +.42 M        ⎦                                                     [               ]       [                       ]
  +.16 LT −1                                                                 E                            ·
                                                                          M L2 T −2               M L2 T −2
                  [                ]                 [            ]
                               m                          ∧
                               M                       L2 T −2
                                                                                          [       ]           [               ]
                                       ⎡                       ⎤                              m            ∧
                                               c         2                                    M         L2 T −2
                                       ⎢ .30 M L2 T −2 ⎥                                              [      ]
                                       ⎢               ⎥
                                       ⎣ +.35 M        ⎦                                                 c     2
                                                 −1                                                       −1
                                         +.35 LT                                                       LT
            (5) Propagate upwards                                               (6) Propagate downwards


           Fig. 2: Demonstration of our constraint propagation algorithm.
                                                      Getting the units right      9

 2. propagate the constraints to their parent operators of classes 0-2 and re-
    member the parent operators of class 3 (maps);
 3. update the working set to consist of the updated parents;
 4. propagate the constraints downwards;
 5. the changed children are now the new working set;
 6. continue with upwards and downwards propagation until the working set is
    empty, and a steady state is reached;
 7. update the working set with the class 3 operators;
 8. resolve the class 3 operators, in appropriate order (note that this might lead
    to many possible dimensions for the class 3 operators);
 9. propagate the new constraints by going back to step 2;
10. the algorithm terminates if the working set and the set of unprocessed class
    3 operators is empty.
We demonstrate this algorithm in figure 2 based on Example 1. Formula (1)
depends on (E, m, c, 2, =, ·, ∧) and the adjacent matrix reads

                                                    ⎛E m c 2 = · ∧⎞
                                  ⎛         ⎞
                                      =
                                  ⎜         ⎟    E 0000100
                                  ⎜E · ⎟ m ⎜0 0 0 0 0 1 0⎟
                         2
            O(“E = mc ”) = O ⎜
                                  ⎜
                                            ⎟= c ⎜
                                            ⎟         0 0 0 0 0 0 1⎟ .
                                                  2 ⎜ 0 0 0 0 0 0 1⎟
                                  ⎜ m ∧⎟
                                  ⎜         ⎟
                                                 = ⎝1 0 0 0 0 1 0⎟
                                                    ⎜
                                                  · 0100101
                                  ⎝         ⎠                      ⎠
                                         c2      ∧    0011010
One alternative to this approach is [9]. The evaluation of our approach must
show the performance of the probabilistic approach.

3.5   Wikidata insert/update
After having propagated the units, we might have found a unique solution as
demonstrated in Example 1. In cases where we found this unique solution, we
will write back the dimension information to Wikidata. If the formula already
has an item in Wikidata, we check the properties and update/insert the missing
information. Since we use the Wikidata database for finding information for all
the identifiers in a formula we have automatically a defining item to link to for
every identifier.

3.6   Limitations
For level 3 operators, numerical artifacts (such as scalar ∫factors from the integra-
tion) and probability density are mixed. For example in sdt and a hypothetical
ρ0 (s) = .5M + .5T leads to a propagated probability of
          (∫     )       ∫               (               )
                           ρ0 (s)dt     4 1         1 2        2        1
        ρ     sdt = ⏐  ⏐ ∫         ⏐  =      MT + T         = M T + T 2.
                           ρ0 (s)dt⏐1   3 2         4          3        3

The likelihood for M T is higher in comparison to T 2 , which does not seem
plausible at the first place. Consequently, we will search for better suitable ways
to re-scale the unit vector to length 1.
10      Moritz Schubotz, David Veenhuis, and Howard S. Cohl

4    Future work
The finding of items in Wikidata can be improved by using methods like stem-
ming to match the definition with the item caption. This is due to the fact that
the definitions are extracted from free-text and may be conjugated. The iden-
tification of the units/dimensions can be done by the querying property, has
quality. Due to the community-driven inserts of items, it is not guaranteed that
every item has all of the possible properties. For example, the authors may omit
a property such as, has quality. Instead the property, quantity symbol, may be
queried.
     Moreover, our algorithm can be used as a feedback mechanism for the MLP
process. With the additional unit and dimension information, the algorithm
will learn about mistakes from the unit checking. With this information, the
algorithm will be able to tune itself.


5    Conclusion
The knowledge about units in physical formulae is fundamental. However, in
most cases they are not marked up explicitly. We investigated how they can
be determined automatically. We described a process to automatically infer the
unit information for identifiers in physical formulae. This process is based on for-
mulae and identifier-definition pairs that are extracted from text using Natural
Language Processing techniques. The process may result in multiple definition
candidates for identifiers. To infer a consistent identifier-definition mapping, we
use an algorithm based on the principle of dimensional homogeneity in physical
formulae. The unit/dimensional information for the definitions is retrieved from
the structured data store Wikidata. The results are then used to extend Wikidata
with the formulae and links Wikidata entries describing the identifiers.

Acknowledgements. We would like to thank Volker Markl, Michael Kohlhase and
Abdou Youssef for their support of this research project.


Bibliography
 [1] T. Antoniu, P. A. Steckler, S. Krishnamurthi, E. Neuwirth, and M. Felleisen.
     Validating the unit correctness of spreadsheet programs. In A. Finkelstein,
     J. Estublier, and D. S. Rosenblum, editors, 26th International Conference
     on Software Engineering (ICSE 2004), 23-28 May 2004, Edinburgh, United
     Kingdom, pages 439–448. IEEE Computer Society, 2004.
 [2] J. B. Collins. A mathematical type for physical variables. In S. Autexier,
     J. A. Campbell, J. Rubio, V. Sorge, M. Suzuki, and F. Wiedijk, editors, In-
     telligent Computer Mathematics, 9th International Conference, AISC 2008,
     15th Symposium, Calculemus 2008, 7th International Conference, MKM
     2008, Birmingham, UK, July 28 - August 1, 2008. Proceedings, volume
     5144 of Lecture Notes in Computer Science, pages 370–381. Springer, 2008.
                                                  Getting the units right   11

 [3] M. Contrastin, A. C. Rice, M. Danish, and D. A. Orchard. Units-of-measure
     correctness in fortran programs. Computing in Science and Engineering,
     18(1):102–107, 2016.
 [4] P. Guo and S. McCamant. Annotation-less unit type inference for c. In
     Final Project, 6.883: Program Analysis, CSAIL, MIT, 2005.
 [5] S. Harris and A. Seaborne. SPARQL 1.1 Query Language. https://www.
     w3.org/TR/sparql11-query. seen May, 2016.
 [6] J. Hilbig and D. L. Tran.          Mathematical expression as new data
     type for WikiData - Database project - supervised by Moritz Schubotz.
     Technical Report Winter-term 2015/2016, Technische Universität Berlin,
     feb 2016.     https://github.com/TU-Berlin/WikidataMath/releases/
     download/v1.0.0/ReportWikiDataDBPRO.pdf.
 [7] L. Jiang and Z. Su. Osprey: A practical type system for validating dimen-
     sional unit correctness of C programs. In Proceedings of the International
     Conference on Software Engineering, 2006.
 [8] G. Y. Kristianto, G. Topic, and A. Aizawa. Extracting textual descrip-
     tions of mathematical expressions in scientific papers. D-Lib Magazine,
     20(11/12), 2014.
 [9] C. W. Liew. Checking for dimensional correctness in physics equations.
     In In Proceedings of Fourteenth International Florida AI Research Society
     Conference, 2002.
[10] B. R. Miller. LaTeXML: A LATEX to XML converter. http://dlmf.nist.
     gov/LaTeXML. seen May, 2016.
[11] R. Pagel and M. Schubotz. Mathematical language processing project. In
     M. England, J. H. Davenport, A. Kohlhase, M. Kohlhase, P. Libbrecht,
     W. Neuper, P. Quaresma, A. P. Sexton, P. Sojka, J. Urban, and S. M. Watt,
     editors, Joint Proceedings of the MathUI, OpenMath and ThEdu Workshops
     and Work in Progress track at CICM, number 1186 in CEUR Workshop
     Proceedings, Aachen, 2014.
[12] M. Schubotz, A. Grigorev, M. Leich, H. S. Cohl, N. Meuschke, B. Gipp,
     A. S. Youssef, and V. Markl. Semantification of identifiers in mathematics
     for better math information retrieval. In Proceedings of the 39th Int. ACM
     SIGIR Conference on Research and Development in Information Retrieval,
     2016.