<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>KAT: Enabling the Semantification of STEM Documents</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Felix Schmoll</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Given the constantly growing body of mathematical knowledge, it becomes more and more difficult for individuals to take full advantage of all available information. To semantify documents, we need to be able to create annotations (marking definitions or declarations as well as usages of concepts) efficiently and conveniently on a large corpus of documents. Eventually this can be achieved automatically, but as a first step a gold standard has to be created by humans. KAT, the KWARC Annotation Tool, has the goal of allowing users to create, view and update annotations on arbitrary (X)HTML documents. We have presented our approach before; in this paper we give an update on our progress.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We have developed KAT, the KWARC Annotation Tool, to address these problems. KAT is a
browser-based tool that allows users to annotate arbitrary (X)HTML documents. In this paper we
present its newest iteration, building on the previous versions presented in [DGKMMW14] and [GLKW15].
In Section 2 we give a short review of the existing KAT system architecture. In Section 3 we take
a more detailed look at how annotations can be created in the browser, focusing in particular
on the progress we have achieved since the last iteration. We conclude in Section 4 with an outlook
on how we plan to develop the system further.</p>
    </sec>
    <sec id="sec-2">
      <title>Recap of KAT system architecture</title>
      <p>KAT is implemented as a JavaScript library working with (X)HTML5 documents and can be
integrated into arbitrary websites with little effort. The system architecture, shown in Figure 1,
consists of four main components: i) the KAT Annotator, the main tool running in the browser;
ii) the KAnnSpec, which serves as a description of the annotation ontology; iii) the annotation
ontology; and iv) a document management system (here CorTeX).</p>
      <p>[Figure 1: The KAT System Architecture. Recoverable labels: KAT Annotator, KAnnSpec, Annotation Ontology,
CorTeX, Document Store ((X)HTML5), Annotation Store (RDF); edge labels: interact, read, load, export, import, ref.]</p>
      <p>Instead of describing this architecture in detail, we refer interested readers to [DGKMMW14] and
[GLKW15]. KAT can work as a standalone tool, but it is best used in the context
of a corpus management system such as CorTeX [CT]. This scenario is intended for creating and
improving a "gold standard". Users receive a document
from the document store in CorTeX and create annotations for it using KAT. These annotations are then sent
back to the system in RDF form. Alternatively, users receive an already annotated document and check
the existing annotations. These two scenarios drive the KAT development.</p>
      <p>KAT annotations are represented as RDF subject / predicate / object triples. The subjects are the text
fragments that we are annotating, whereas the objects are either concepts from the annotation ontology (in the
case of classificational annotations) or other text fragments (in the case of relational annotations). KAT is not
tied to a particular annotation ontology. On the contrary, at startup time it loads a set of annotation descriptions
referred to as KAT Annotation Specification, or KAnnSpec for short. These are a set of custom XML documents
that describe the annotation ontology, its concepts and additional constraints for the KAT user interface.</p>
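      <p>The distinction between classificational and relational annotations can be illustrated with a minimal sketch. It is not taken from the KAT code base: the type names, the fragment URIs and the predicate ex:refersTo are hypothetical and serve only to show how the kind of annotation follows from the object of the triple.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextFragment:
    """A text fragment identified by a URI (hypothetical helper type)."""
    uri: str


@dataclass(frozen=True)
class Concept:
    """A concept from the annotation ontology, identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Triple:
    subject: TextFragment
    predicate: str
    obj: Union[Concept, TextFragment]

    def kind(self) -> str:
        # Classificational: the object is an ontology concept.
        # Relational: the object is another text fragment.
        if isinstance(self.obj, Concept):
            return "classificational"
        return "relational"


# Illustrative data only; the URIs and "ex:refersTo" are made up.
fragment_a = TextFragment("doc.html#fragment-a")
fragment_b = TextFragment("doc.html#fragment-b")
is_symbol = Triple(fragment_a, "kat:type", Concept("http://omdoc.org/KAnnSpec#Symbol"))
refers_to = Triple(fragment_b, "ex:refersTo", fragment_a)

print(is_symbol.kind(), refers_to.kind())
```
]]></preformat>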
      <p>As KAT is XHTML-based, we refer to text fragments using URIs. For this we make use of the XPointer
framework [GMMW03] and have developed a custom XPointer scheme. Each text fragment is a contiguous range
of elements in the DOM tree and can be identified by giving the first and last elements contained in it. This
imposes the limitation that text fragments must consist of entire elements, making it very difficult to
annotate linguistically meaningful concepts. In previous iterations of the system we worked around this by
TEI-tokenizing the document, that is, adding elements around each word in the document. We are, however, in the
process of implementing an updated XPointer scheme that allows us to reference text ranges within elements and
thereby eliminates the limitation entirely.</p>
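      <p>The tokenization workaround can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of span elements and whitespace-based word splitting are ours, and the id pattern (sentence.N, word.N) merely imitates the ids that appear in exported annotations.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def tokenize_sentence(text, sentence_no, first_word_no):
    """Wrap every whitespace-separated word of `text` in its own element.

    Assumption: words are delimited by whitespace and each word element
    receives a running id, so that any word range can be addressed by its
    first and last element.
    """
    sentence = ET.Element("span", {"id": f"sentence.{sentence_no}"})
    for offset, word in enumerate(text.split()):
        element = ET.SubElement(sentence, "span", {"id": f"word.{first_word_no + offset}"})
        element.text = word
        element.tail = " "  # keep the words separated when serialized
    return sentence


sentence = tokenize_sentence("a semi-Euclidean space", 11, 201)
print(ET.tostring(sentence, encoding="unicode"))
```
]]></preformat>
      <p>After this step a fragment such as "semi-Euclidean space" is exactly the range from element word.202 to element word.203, which is what an element-based pointer needs.</p>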
    </sec>
    <sec id="sec-3">
      <title>Editing Annotations In The Browser</title>
      <p>KAT offers three different operating modes that can be selected via a sidebar. Each of them supports a
different way of working with annotations:</p>
      <p>1. Annotation Mode. This mode provides the user interactions for creating and editing annotations on documents.
It is deliberately kept separate from the viewing and evaluating of annotations, which are handled by different modes;
this allows specialized operations, e.g. on right-clicking.</p>
      <p>2. Reading Mode. Once annotations have been created they can be used to obtain a better understanding of
the structure of a mathematical document as a whole. This mode attempts to provide as much information
as possible about all given annotations in an intuitive way.</p>
      <p>3. Review Mode. When the creation of annotations is automated, one wants to go over the created annotations
and evaluate them. This mode allows an evaluation of annotations by providing an interface for judging
them as good (thumbs up) or bad (thumbs down).</p>
      <sec id="sec-3-1">
        <title>Creating Annotations</title>
        <p>As KAT is a browser-based tool, creating annotations is heavily form-oriented. A range
of text is selected and then an annotation category is chosen from a right-click context menu. Subsequently a
form appears in the sidebar where more specific information can be entered. An example of this can be seen in
Figure 2.</p>
        <p>The KAnnSpec can restrict the input, for example by prescribing the format of a name via a
regular expression that has to be matched. To ensure that the entered content is valid, an annotation can only be
saved once all conditions are fulfilled. To indicate this visually, the color of a text field changes from red to green, and
the save button becomes active only once all validation constraints are fulfilled. Additionally, the question mark
right next to the input field can be used to obtain more information about these constraints (in this particular
case the regular expression restricting the input format would be shown).</p>
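        <p>The validation behaviour described above can be summarized in a short sketch. It is not the KAT implementation (KAT itself is a JavaScript library); the names FieldSpec, field_color and can_save as well as the example pattern are hypothetical, and we assume that a value must match the regular expression in its entirety.</p>
        <preformat><![CDATA[
```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """A form field with a validation pattern (hypothetical structure)."""
    name: str
    pattern: str


def is_valid(spec, value):
    # Assumption: the whole value has to match, not just a prefix.
    return re.fullmatch(spec.pattern, value) is not None


def field_color(spec, value):
    # Mirrors the visual cue: red while invalid, green once valid.
    return "green" if is_valid(spec, value) else "red"


def can_save(specs, values):
    # The save button is only active once every field is valid.
    return all(is_valid(spec, values.get(spec.name, "")) for spec in specs)


# Illustrative constraint: a name made of letters, spaces and hyphens.
specs = [FieldSpec("symbolname", r"[A-Za-z][A-Za-z -]*")]

print(field_color(specs[0], "semi-Euclidean space"))
print(can_save(specs, {"symbolname": ""}))
```
]]></preformat>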
      </sec>
      <sec id="sec-3-2">
        <title>Visualizing Annotations And References</title>
        <p>In Reading mode all annotations are displayed to the user. The respective information is conveyed by several
means:</p>
        <p>Highlighting. Each concept from the KAnnSpec is assigned its own color and each annotation is highlighted
in the appropriate color.</p>
        <p>Tooltip. When hovering over an annotation, information about its fields is displayed. The tooltip is generated
using the &lt;template/&gt; tag in the KAnnSpec.</p>
        <p>Relations. Some annotations may have referential fields. These are visualised on demand.</p>
        <p>Upon creation, an annotation is assigned a UUID (universally unique identifier). Later on, this is the only
way to refer to a specific annotation. As this is not a very intuitive way to reference an annotation, visual cues
are needed when displaying relations.</p>
        <p>To achieve this, KAT can show relations as paths between annotations, which appear when an annotation is
clicked. The paths are labeled with the type of reference between the two annotations. The
direction of a path is clear from the user interaction that causes it to be displayed: paths start at the clicked annotation.</p>
        <p>Internally this is implemented using an SVG overlay over the document. Paths are drawn from the
selected annotation to all other annotations that are referenced in the respective fields of the KAnnSpec and are decorated
with captions describing the kind of relation. The upper left corners of the respective selections serve as start and
end points of the paths.</p>
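        <p>As a rough sketch of such an overlay, the following code generates SVG markup for labeled paths between the upper left corners of two selections. The function names, the straight-line path shape, the label placement at the midpoint and the styling attributes are our assumptions; the actual KAT overlay is produced in the browser and may draw its paths differently.</p>
        <preformat><![CDATA[
```python
from xml.sax.saxutils import escape


def relation_path(start, end, label):
    """SVG markup for one labeled path between two upper left corners.

    Assumption: a straight line with the caption at its midpoint.
    """
    (x1, y1), (x2, y2) = start, end
    label_x, label_y = (x1 + x2) / 2, (y1 + y2) / 2
    return (
        f'<path d="M {x1} {y1} L {x2} {y2}" stroke="black" fill="none"/>'
        f'<text x="{label_x}" y="{label_y}">{escape(label)}</text>'
    )


def overlay(width, height, relations):
    """Wrap all relation paths in a single SVG element covering the document."""
    body = "".join(relation_path(start, end, label) for start, end, label in relations)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f"{body}</svg>"
    )


# Illustrative coordinates: corners of the selected and the referenced annotation.
svg = overlay(800, 600, [((10, 20), (110, 60), "refers to")])
print(svg)
```
]]></preformat>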
      </sec>
      <sec id="sec-3-3">
        <title>Import And Export Of Annotations</title>
        <p>[Figure 4: annotation graph for the sample annotation. Recoverable labels: Document, Run, KAnnSpec, Annotation, Symbol (http://omdoc.org/KAnnSpec#Symbol) and the literal "semi-Euclidean space"; edge labels: kat:annotates, kat:run, kat:kannspec, kat:concept, kat:type, o:symbolname.]</p>
        <sec id="sec-3-3-4">
          <p>By itself KAT only saves the state of annotations for the duration of a given session. To make annotations
persistent, it is possible to export and later re-import annotations via buttons in the side pane. The export format is
based on the Resource Description Framework (RDF) [SR14]. This approach has been successfully tested in the
creation of a gold standard of declarations in math [Sch16].</p>
          <p>The current implementation also features a prototype for retrieving new documents from a document store
and submitting annotations to it. If the system were fully integrated with CorTeX, it would no longer
be necessary to store annotations manually; they could instead be stored centrally together with the document.</p>
          <p>In Listing 1 we show a sample of how an annotation is exported to RDF. Each annotation consists of a single
node (lines 4-11). Additionally, each annotation has meta-data, such as the used KAnnSpec, which is omitted
here. To generate this node, we use the properties of the annotation graph shown above in Figure 4.</p>
          <p>Listing 1: Exported RDF generated for a single annotation of an OMDoc Symbol</p>
          <p>1 &lt;rdf:RDF xmlns:o="http://omdoc.org/KAnnSpec#"
2     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:kat="https://github.com/KWARC/KAT/"&gt;
3   &lt;!-- omitted a lot of meta-information here --&gt;
4   &lt;rdf:Description rdf:nodeID="KAT_1433087821332_4477"&gt;
5     &lt;kat:annotates rdf:resource="https://kwarc.github.io/KAT/content/sample1.html#cse(%2F%2F*%5B%40id%3D'sentence.11'%5D%2C%2F%2F*%5B%40id%3D'word.202'%5D%2C%2F%2F*%5B%40id%3D'word.203'%5D)" /&gt;
6     &lt;kat:run rdf:nodeID="kat_run"/&gt;
7     &lt;kat:kannspec rdf:nodeID="KAT_1433087757661_OMDoc"/&gt;
8     &lt;kat:concept&gt;Symbol&lt;/kat:concept&gt;
9     &lt;kat:type rdf:resource="http://omdoc.org/KAnnSpec#Symbol" /&gt;
10    &lt;o:symbolname&gt;semi-Euclidean space&lt;/o:symbolname&gt;
11  &lt;/rdf:Description&gt;
12 &lt;/rdf:RDF&gt;</p>
          <p>The export starts by declaring the text fragment it annotates using the kat:annotates relation (line 5). For this
we use the KAT XPointer scheme, in this case given by the URI
https://kwarc.github.io/KAT/content/sample1.html#cse(//*[@id='sentence.11'],//*[@id='word.202'],//*[@id='word.203'])
(shown here with the percent-encoding removed). In this URI
cse stands for Container, Start and End. The URI itself consists of
(i) the document URI https://kwarc.github.io/KAT/content/sample1.html, identifying the document in which the
annotated text fragment is located;
(ii) an XPath to the deepest element that fully contains the annotated text fragment, here
//*[@id='sentence.11'];
(iii) an XPath that points to the start of the annotated text fragment, that is, the first element contained in it,
here //*[@id='word.202']; and
(iv) an XPath that points to the end of the annotated fragment, that is, the last element contained in it, here
//*[@id='word.203'].</p>
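          <p>The structure of such a URI can be made explicit with a small parser. This is a sketch for illustration, not part of KAT: the function name parse_cse is ours, and we assume that none of the three XPaths contains a comma, so that splitting on commas is sufficient.</p>
          <preformat><![CDATA[
```python
from urllib.parse import unquote, urldefrag


def parse_cse(uri):
    """Split a cse(...) pointer URI into document, container, start and end.

    Assumption: the three XPaths contain no commas themselves.
    """
    document, fragment = urldefrag(uri)
    fragment = unquote(fragment)  # undo the percent-encoding used in the export
    if not (fragment.startswith("cse(") and fragment.endswith(")")):
        raise ValueError("not a cse() pointer: " + fragment)
    container, start, end = fragment[len("cse("):-1].split(",")
    return {"document": document, "container": container, "start": start, "end": end}


SAMPLE_URI = (
    "https://kwarc.github.io/KAT/content/sample1.html#cse("
    "%2F%2F*%5B%40id%3D'sentence.11'%5D%2C"
    "%2F%2F*%5B%40id%3D'word.202'%5D%2C"
    "%2F%2F*%5B%40id%3D'word.203'%5D)"
)

parsed = parse_cse(SAMPLE_URI)
for key, value in parsed.items():
    print(key, value)
```
]]></preformat>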
          <p>Next, a kat:run is given (line 6). This is intended to provide meta-information such as when and how this
annotation was generated, which has been omitted from the listing. The node then references a KAnnSpec (line
7) and the actual concept it annotates (line 8). We further specify the type of the annotation with the
kat:type relation (line 9) as given in the KAnnSpec. Finally, we provide all the fields and their values. In this
case, we just give the concept the name semi-Euclidean space (line 10).</p>
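          <p>To illustrate how such an export can be consumed, the following sketch reads the annotation node back with a standard XML parser. It is a simplified reading of the listing, not KAT's import routine: the function name read_annotations is ours, and we assume that every child element outside the kat namespace is a field of the annotation.</p>
          <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
KAT = "https://github.com/KWARC/KAT/"

SAMPLE = """<rdf:RDF xmlns:o="http://omdoc.org/KAnnSpec#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:kat="https://github.com/KWARC/KAT/">
  <rdf:Description rdf:nodeID="KAT_1433087821332_4477">
    <kat:annotates rdf:resource="https://kwarc.github.io/KAT/content/sample1.html#cse(%2F%2F*%5B%40id%3D'sentence.11'%5D%2C%2F%2F*%5B%40id%3D'word.202'%5D%2C%2F%2F*%5B%40id%3D'word.203'%5D)"/>
    <kat:run rdf:nodeID="kat_run"/>
    <kat:kannspec rdf:nodeID="KAT_1433087757661_OMDoc"/>
    <kat:concept>Symbol</kat:concept>
    <kat:type rdf:resource="http://omdoc.org/KAnnSpec#Symbol"/>
    <o:symbolname>semi-Euclidean space</o:symbolname>
  </rdf:Description>
</rdf:RDF>"""


def read_annotations(rdf_xml):
    """Return one dictionary per annotation node found in the RDF/XML text."""
    root = ET.fromstring(rdf_xml)
    annotations = []
    for node in root.findall(f"{{{RDF}}}Description"):
        target = node.find(f"{{{KAT}}}annotates")
        if target is None:
            continue  # not an annotation node (e.g. meta-information)
        type_node = node.find(f"{{{KAT}}}type")
        # Assumption: children outside the kat namespace are annotation fields.
        fields = {
            child.tag.split("}", 1)[1]: child.text
            for child in node
            if not child.tag.startswith(f"{{{KAT}}}")
        }
        annotations.append({
            "id": node.get(f"{{{RDF}}}nodeID"),
            "annotates": target.get(f"{{{RDF}}}resource"),
            "concept": node.findtext(f"{{{KAT}}}concept"),
            "type": None if type_node is None else type_node.get(f"{{{RDF}}}resource"),
            "fields": fields,
        })
    return annotations


annotations = read_annotations(SAMPLE)
print(annotations[0]["concept"], annotations[0]["fields"])
```
]]></preformat>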
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>Evaluating Annotations In The Review Mode</title>
        <p>After annotations have been created it is possible to evaluate them using a special review mode. Here
automatically created annotations can be assessed by a human operator. An example of the review mode can be found
in Figure 5.</p>
        <p>At any given time exactly one annotation is in focus, namely the one that is to be assessed. This annotation
is visually contrasted by a dark overlay over the rest of the document. An information field in the side pane
displays all available information for the current annotation (the same information that appears in a tooltip when hovering
over an annotation in reading mode) and lets the user endorse or flag the annotation, so that the
automatic annotation generation can be improved based on this feedback. The user can then iterate through
the annotations in the order of their appearance in the document.</p>
        <p>Using this field it is possible to evaluate the accuracy of automated annotation systems or to ensure the
quality of existing annotations. One possible use case is as a stage in the development cycle in order to obtain
relevant feedback, as was done in the development of the declaration spotter in [Sch16].</p>
        <p>The reviews made in review mode can likewise be exported; the format is essentially a
list of (UUID, Boolean) pairs indicating which annotation was reviewed and what the binary review choice was.</p>
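        <p>A list of (UUID, Boolean) pairs is easy to process further. The following sketch shows one conceivable use, computing the share of endorsed annotations and collecting the flagged ones; the function name, the interpretation of True as "endorsed" and the UUIDs are made up for illustration and are not prescribed by KAT.</p>
        <preformat><![CDATA[
```python
def summarize_reviews(reviews):
    """Summarize a list of (uuid, endorsed) pairs from the review mode.

    Assumption: True means thumbs up (endorsed), False means thumbs down.
    """
    endorsed = [uuid for uuid, ok in reviews if ok]
    flagged = [uuid for uuid, ok in reviews if not ok]
    rate = len(endorsed) / len(reviews) if reviews else None
    return {"endorsed": endorsed, "flagged": flagged, "endorsement_rate": rate}


# Illustrative review export with made-up UUIDs.
reviews = [
    ("0b9f1c1e-5d0a-4a58-9a43-000000000001", True),
    ("0b9f1c1e-5d0a-4a58-9a43-000000000002", True),
    ("0b9f1c1e-5d0a-4a58-9a43-000000000003", False),
    ("0b9f1c1e-5d0a-4a58-9a43-000000000004", True),
]

summary = summarize_reviews(reviews)
print(summary["endorsement_rate"], summary["flagged"])
```
]]></preformat>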
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion And Future Work</title>
      <p>We have presented the KAT system, an open and browser-based annotation system for STEM documents encoded
in XHTML5. In particular we have presented the new and improved visualisation of referential annotations and
the new review mode. The code base is released freely under the terms of the GNU General Public License and is available
at [KG]. A running demo is available at [KD].</p>
      <p>While the basic functionality for annotating documents is in place, there are some aspects that we want to improve
in order to allow a smoother user experience. The most important ones are:</p>
      <p>1. Allowing annotated text fragments within elements. At the moment it is only possible to annotate text
fragments on a DOM element basis. Thus KAT works best with documents that have been TEI-tokenized into words
before processing. We are working on an XPointer implementation that would allow any XHTML5
document to be marked up immediately. This would furthermore allow a better integration with CorTeX
[CG].</p>
      <p>2. Distinction between overlapping annotations. Currently different annotation categories are visually set apart
by color. This is, however, not sufficient once there are overlapping annotations. In future versions the ranges
of overlapping annotations should be discernible by giving each of them a different height.</p>
      <p>3. Adding feedback and other improvements to the review mode. It should be possible to provide more specific
feedback in review mode, as it is currently only possible to make a binary choice. One option would be to
provide an additional input field for an explanation; another would be to pick a specific reason why
an annotation is inappropriate from a drop-down menu. Furthermore, it is an open question how to properly
export the feedback given in review mode.</p>
      <p>4. Using arrows for visualising paths. One might want to extend the visualisation of relations to use actual
arrows indicating a direction. The current paths are suitable while using the system, but insufficient
when looking at a static image of the representation.</p>
      <p>5. Allowing changing document content. Published papers are rarely changed, but one might nevertheless want
to handle a changing document structure, for example when annotating just
sections of a document and merging them later on. The current implementation does not allow this.</p>
      <p>We will also try to get more immediate user feedback by providing a version of the system to a larger audience.</p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>We thank Frederik Schaefer for his feedback on further improvements of KAT especially in terms of usability
and Deyan Ginev for his input on how to avoid the need to tokenize documents. Additionally we would like to
thank Michael Kohlhase for his supervision.</p>
        <p>[AN] Annotation tool. url: http://www.annotatiesysteem.nl (visited on 02/15/2014).</p>
        <p>Brat rapid annotation tool. url: http://brat.nlplab.org (visited on 02/15/2014).</p>
        <p>[CG] GitHub repository. url: https://github.com/dginev/CorTeX/.</p>
        <p>[CT] CorTeX framework. url: http://cortex.mathweb.org (visited on 02/14/2014).</p>
        <p>[DGKMMW14] Mircea Alex Dumitru, Deyan Ginev, Michael Kohlhase, Vlad Merticariu, Stefan Mirea, and
Tom Wiesing. System description: KAT an annotation tool for STEM documents. 2014. url:
http://kwarc.info/kohlhase/submit/cicm14-kat.pdf.</p>
        <p>[GLKW15] Deyan Ginev, Sourabh Lal, Michael Kohlhase, and Tom Wiesing. KAT: an annotation tool
for STEM documents. In Mathematical user interfaces workshop at CICM. Andrea Kohlhase
and Paul Libbrecht, editors, July 2015. url:
http://www.cermat.org/events/MathUI/15/proceedings/Lal-Kohlhase-Ginev_KAT_annotations_MathUI_15.pdf.</p>
        <p>[KG] GitHub repository. url: https://github.com/KWARC/KAT/.</p>
        <p>[Sch16] Jan Frederik Schaefer. Declaration spotting in mathematical documents. B. Sc. Thesis. Jacobs
University Bremen, 2016.</p>
        <p>[SR14] Guus Schreiber and Yves Raimond. RDF 1.1 primer. W3C Working Group Note. World Wide
Web Consortium (W3C), 2014. url: http://www.w3.org/TR/rdf-primer.</p>
        <p>[YW] Yawas - the original web highlighter. url: http://www.keeness.net/yawas/ (visited on
02/15/2014).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>GMMW03</label>
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>Grosso</surname>
          </string-name>
          , Eve Maler, Jonathan Marsh, and Norman Walsh.
          <article-title>W3C XPointer framework</article-title>
          .
          <source>W3C Recommendation. World Wide Web Consortium (W3C)</source>
          , March 25,
          <year>2003</year>
          . url: http://www.w3.org/TR/2003/REC-xptr-framework-20030325/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <label>KD</label>
        <mixed-citation>
          url: http://kwarc.github.io/KAT/ (visited on 07/18/
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>