<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Demo: Tools for Information Fragmentation in Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sandro Rama Fiorini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guilherme Ferreira Lima</string-name>
          <email>guilherme.limag@bm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcio F. Moreno</string-name>
          <email>mmoreno@br.ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research</institution>
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Integration of symbolic representations with multimodal data is an important problem in multiple domains. The Hyperknowledge Framework (HKF) is a multimodal knowledge representation framework that allows users to integrate non-graph data into knowledge graphs. In this demo, we show how the HKF API can be used to more easily integrate and consume raw data fragments in knowledge graphs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        model that is independent of a specific media type. It can also complement document
and information description models, such as [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], with a base format for constructing
information artifact identifiers based on the structure of such artifacts. HKF API
implements GFM, providing a material syntax for programmers to create references to and
dynamically extract parts of information artifacts, allowing these to be linked to other
entities in the graph. The main contribution of HKF API is that the very descriptions
of fragmentation operations serve as identifiers for the fragments themselves within the
knowledge graph. So, for example, the operation extracting a rectangular blob from an
image becomes the identifier of the blob, which can then be linked in the knowledge
graph or retrieved for further processing. In this demo, we show with a small example
how some of these operations are handled in our framework.
      </p>
    </sec>
    <sec id="sec-2">
      <title>General Fragment Model</title>
      <p>The General Fragment Model (GFM) defines a formal model for information reference. It
describes a conceptual structure that can be instantiated to create resolvable reference
names for parts of information artifacts. An information artifact is a codification of
some propositional content that is realized by some physical or virtual object. Examples
are images, text, drawings, sound files, sensor readings, databases and ontologies.</p>
      <p>GFM establishes that fragments of information artifacts are specified by anchors.
An anchor on an information object o ∈ O is defined by an indexer function
<disp-formula><tex-math>f(o, d) : O \times D \to O'</tex-math></disp-formula>
that maps an arbitrary token d ∈ D to a set of parts O′ of O. The tokens in D can be
any other information artifacts, especially vectors, dictionaries or strings. For instance,
given a text document e, the text fragment e′ between characters 10 and 20 can be denoted
by the application of an indexer function subtext to the target e with the argument
token [10, 20]. In this case, the function application subtext(e, [10, 20]) is said to be an
anchor on e, and it is a reference (or a name) to e′.</p>
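      <p>The idea that a function application is both a name and a recipe can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the Anchor class, its name and resolve methods, the artifacts dictionary, and the half-open reading of the range [10, 20] are hypothetical and are not part of GFM or of the HKF API.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


def subtext(artifact: str, token: Tuple[int, int]) -> str:
    """Indexer function: characters in the half-open range [start, end) (assumed)."""
    start, end = token
    return artifact[start:end]


@dataclass(frozen=True)
class Anchor:
    """A function application kept as data, so that it can serve as a name."""
    indexer: Callable[[Any, Any], Any]
    target_name: str          # name of the artifact the anchor points into
    token: Tuple[int, ...]

    def name(self) -> str:
        # The description of the operation doubles as the fragment identifier.
        return f"{self.indexer.__name__}({self.target_name}, {list(self.token)})"

    def resolve(self, artifacts: Dict[str, Any]) -> Any:
        # Applying the indexer to the named artifact yields the fragment itself.
        return self.indexer(artifacts[self.target_name], self.token)


# Hypothetical artifact store with a single text document named "e".
artifacts = {"e": "Hyperknowledge links raw data to knowledge graphs."}

anchor = Anchor(subtext, "e", (10, 20))
fragment = anchor.resolve(artifacts)
print(anchor.name(), "->", repr(fragment))
```
]]></preformat>
      <p>Keeping the anchor as an immutable value means it can be stored or compared like any other identifier, and resolved only when the fragment is actually needed.</p>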
      <p>Anchors can be composed: the target of an anchor can itself be a fragment. For example,
consider an indexer function rect that extracts sub-images from figures and an indexer
channel that takes a specific color channel of an image. Considering an image img, we can
define the fragment</p>
      <p>channel(rect(img, [10, 10, 20, 20]), "blue")</p>
      <p>which takes the blue channel of a 20-pixel square positioned at coordinates (10, 10) on
img. We can go even further by composing the same indexer function multiple times:</p>
      <p>rect(channel(rect(img, [10, 10, 20, 20]), "blue"), [2, 2, 4, 4])</p>
      <p>which denotes a 4-pixel square fragment of the blue channel. Specific implementations
might decide on the applicability of a given indexer function on a specific type of
information artifact.</p>
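      <p>The composition above can be tried out with plain Python lists standing in for images. The sketch below is ours and makes several assumptions: an image is a list of rows of (red, green, blue) tuples, the rect token is read as [x, y, w, h], and the synthetic test image is invented for the example. It is not the implementation used in HKF.</p>
      <preformat><![CDATA[
```python
# Assumed mapping from channel names to positions in an (R, G, B) tuple.
CHANNELS = {"red": 0, "green": 1, "blue": 2}


def rect(image, token):
    """Crop the region with top-left corner (x, y), width w and height h."""
    x, y, w, h = token
    return [row[x:x + w] for row in image[y:y + h]]


def channel(image, token):
    """Keep a single color channel, turning each pixel tuple into one value."""
    index = CHANNELS[token]
    return [[pixel[index] for pixel in row] for row in image]


# Synthetic 40x40 image whose blue value encodes the pixel position (100*y + x).
img = [[(0, 0, 100 * y + x) for x in range(40)] for y in range(40)]

# Blue channel of the 20-pixel square at (10, 10).
blue_square = channel(rect(img, [10, 10, 20, 20]), "blue")

# Same indexer applied again: a 4-pixel square inside that blue channel.
inner = rect(channel(rect(img, [10, 10, 20, 20]), "blue"), [2, 2, 4, 4])

print(len(blue_square), len(blue_square[0]), blue_square[0][0])
print(len(inner), len(inner[0]), inner[0][0])
```
]]></preformat>
      <p>Because rect only slices rows and columns, it applies equally to the full-color image and to the single-channel result, which is what makes the repeated application possible in this sketch.</p>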
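      <p>[Fig. 1: graph with the nodes john, mary and picture1; depictedBy links connect john and mary to the anchors rect({x: 10, y: 10, w: 30, h: 230}) and rect({x: 10, y: 34, w: 30, h: 230}) on picture1.]</p>
      <p>
The Hyperknowledge Base (HKB) is a knowledge graph database that is part of HKF [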
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It
has been used as a knowledge base in applications for domains ranging from Sports to
Agriculture, with particular success in Oil &amp; Gas. It is based on a hypergraph model,
where n-ary links associate multiple nodes. It is also a property graph, where links and
nodes have their own properties. Sub-graphs can be compartmentalized into contexts
in which nodes can be imported. More importantly for the discussion in this abstract,
it allows representation of raw data as part of the knowledge graph itself, in specific
nodes called content nodes. These nodes work as any other node in the graph, but can
be resolved into the media they represent, including images, text, videos and 3D models.
      </p>
      <p>Any node may have associated anchors. An anchor represents a fragment of the
node it belongs to. In our new model, an anchor is identified by what we call a Fragment
Identifier (FI), which allows the representation of GFM constructs. In the following, we
briefly describe (a) the FI language and (b) the anchor resolver API for this service.</p>
      <p>Fragment Identifiers (FI) is a concrete language for entity naming that implements
GFM. It allows the definition of fragmentation operations based on a JavaScript-like
syntax. The basic form is artifact.indexer(token). Artifacts can be any node
identifier in HK. Indexers are usually function names. Tokens can be lists or JSON-like
objects. For instance, given a content node identified by document, our previous
example of a subtext anchor can be specified as:</p>
      <p>document.subtext({start: 10, end: 20})</p>
      <p>FIs allow for anchor composition as well. Considering an information artifact picture1
representing an image, the following anchor composition is a valid FI:</p>
      <p>picture1.rect({x: 10, y: 10, w: 20, h: 20}).channel({c: "blue"})</p>
      <p>HKB currently allows users to specify and resolve FI anchors on content nodes.
In practice, these features give HKB data processing capabilities within the graph
database itself. As mentioned before, content nodes are nodes carrying some raw data.
Their fragments are represented as FI anchor strings associated with them. Fig. 1 depicts
a simple 3-node graph where two KG nodes representing John and Mary are related to
their respective depictions, which are fragments of picture1 specified as FI anchors.</p>
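      <p>To make the artifact.indexer(token) form more concrete, the following sketch splits an FI string into its artifact name and its chain of (indexer, token) pairs. It is a toy reader written for this text, not the parser used by HKB: the function name parse_fi and its helpers are hypothetical, keys are assumed to be bare identifiers, and escaped quotes or key-like text inside string values are not handled.</p>
      <preformat><![CDATA[
```python
import json
import re

_NAME = re.compile(r"[A-Za-z_]\w*")
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')


def _parse_token(text):
    """Parse a list or JSON-like object whose keys may be unquoted."""
    return json.loads(_BARE_KEY.sub(r'\1"\2":', text))


def _closing_paren(fi, start):
    """Return the index of the ')' that matches the '(' at fi[start]."""
    depth, in_string = 0, False
    for i in range(start, len(fi)):
        ch = fi[i]
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    return i
    raise ValueError("unbalanced parentheses in fragment identifier")


def parse_fi(fi):
    """Split an FI string into (artifact, [(indexer, token), ...])."""
    match = _NAME.match(fi)
    if match is None:
        raise ValueError("fragment identifier must start with an artifact name")
    artifact, pos, anchors = match.group(), match.end(), []
    while pos < len(fi):
        if fi[pos] != ".":
            raise ValueError(f"expected '.' at position {pos}")
        name = _NAME.match(fi, pos + 1)
        if name is None or name.end() >= len(fi) or fi[name.end()] != "(":
            raise ValueError(f"expected an indexer call at position {pos + 1}")
        close = _closing_paren(fi, name.end())
        anchors.append((name.group(), _parse_token(fi[name.end() + 1:close])))
        pos = close + 1
    return artifact, anchors


print(parse_fi('picture1.rect({x: 10, y: 10, w: 20, h: 20}).channel({c: "blue"})'))
```
]]></preformat>
      <p>The anchors come back in application order, so the first pair is applied to the artifact and each later pair is applied to the result of the previous one.</p>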
      <p>The semantics of these anchors is given by an FI resolution API (Fig. 2). The base
API is a REST endpoint implemented within the knowledge graph engine. Currently, we
have API bindings for JavaScript and Python; this demo shows the Python bindings.</p>
      <p>[Fig. 2: sequence diagram of FI resolution involving the Repository REST API, the Resolver, the rect indexer function and Object Storage. Calls shown: resolve('picture1.rect({x: 10, y: 34, w: 30, h: 230})'); resolve('picture1'); fetch('picture1'), returning an image blob; lookupResolver('rect'), returning the rect function; resolve(picture1, 'rect({x: 10, y: 34, w: 30, h: 230})'), returning an image blob.]</p>
      <p>The FI resolution endpoint is able to take full FIs (i.e., an artifact and multiple anchors)
and recursively resolve them to produce the actual fragment of the content nodes in the
knowledge graph. Each indexer function is coded as a simple pluggable module<sup>1</sup> using
a standard API provided by HKB. Fig. 2 shows the resolution of one of the anchors in
Fig. 1.</p>
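      <p>The recursive resolution with pluggable indexers can be sketched as follows. Everything in this sketch is an assumption made for illustration: the RESOLVERS registry, the indexer decorator, the OBJECT_STORAGE dictionary and the resolve signature are hypothetical stand-ins, and the real endpoint is a REST service whose interface is not reproduced here.</p>
      <preformat><![CDATA[
```python
from typing import Any, Callable, Dict

# Hypothetical registry of pluggable indexer modules, keyed by indexer name.
RESOLVERS: Dict[str, Callable[[Any, dict], Any]] = {}


def indexer(name):
    """Decorator that registers a function as the resolver for `name`."""
    def register(func):
        RESOLVERS[name] = func
        return func
    return register


@indexer("subtext")
def subtext(content, token):
    """Return the characters in the half-open range [start, end) (assumed)."""
    return content[token["start"]:token["end"]]


# Stand-in for the object storage that holds the raw data of content nodes.
OBJECT_STORAGE = {"document": "Hyperknowledge links raw data to knowledge graphs."}


def resolve(artifact, anchors):
    """Resolve `artifact` followed by a chain of (indexer, token) anchors."""
    if not anchors:
        return OBJECT_STORAGE[artifact]      # base case: fetch the whole content node
    *inner, (name, token) = anchors
    parent = resolve(artifact, inner)        # resolve everything before the last anchor
    return RESOLVERS[name](parent, token)    # look up the indexer and apply it


word = resolve("document", [("subtext", {"start": 0, "end": 14})])
part = resolve("document", [("subtext", {"start": 0, "end": 14}),
                            ("subtext", {"start": 5, "end": 14})])
print(word, part)
```
]]></preformat>
      <p>New indexers are added by registering another function, so the resolver itself does not need to change when a new media type is supported.</p>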
    </sec>
    <sec id="sec-3">
      <title>The Demo</title>
      <p>In our demonstration, we plan to show the creation of and access to media fragments
in a simple example using our Python API and a Jupyter notebook:</p>
      <list list-type="order">
        <list-item><p>Demonstration of basic Hyperknowledge constructs in KES (HKF’s KG UI);</p></list-item>
        <list-item><p>Ingestion of text and image files as content nodes in a knowledge graph via the Python API;</p></list-item>
        <list-item><p>Creation and resolution of simple and composed fragments on the ingested files;</p></list-item>
        <list-item><p>Demonstration of the creation of these operations in KES;</p></list-item>
        <list-item><p>Association of the created fragments with a simple domain ontology.</p></list-item>
      </list>
      <p><sup>1</sup> These modules can be implemented based on existing information and metadata extraction
toolkits, such as Apache Tika (https://tika.apache.org) and other components of Apache UIMA
(http://uima.apache.org).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fiorini</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>W.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mesquita</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lima</surname>
            ,
            <given-names>G.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          :
          <article-title>General fragment model for information artifacts</article-title>
          (
          <year>2019</year>
          ), arXiv:1909.04117
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gibson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velterop</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a nanopublication</article-title>
          .
          <source>Information Services &amp; Use</source>
          <volume>30</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cerqueira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Extending hypermedia conceptual models to support hyperknowledge specifications</article-title>
          .
          <source>Int. J. Semant. Comput</source>
          .
          <volume>11</volume>
          (
          <issue>01</issue>
          ),
          <fpage>43</fpage>
          -
          <lpage>64</lpage>
          (mar
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeiffer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Deursen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Media Fragments URI 1.0 (basic)</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <source>W3C</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>