<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Business Analytics on Knowledge Graphs for Market Trend Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jens Albrecht</string-name>
          <email>jens.albrecht@th-nuernberg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Belger</string-name>
          <email>andreas.belger@scs.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Blum</string-name>
          <email>ralph.blum@scs.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roland Zimmermann</string-name>
          <email>roland.zimmermann@th-nuernberg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer SCS</institution>
          ,
          <addr-line>Nordostpark 93, 90411 Nürnberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technische Hochschule Nürnberg Georg Simon Ohm</institution>
          ,
          <addr-line>Kesslerplatz 12, 90489 Nürnberg</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe an ongoing research project that aims at automating information retrieval for technology and innovation management. It is built around a knowledge graph that is created automatically from selected news sources. Based on the knowledge graph, quantitative measures of mentions of trend-relevant entities and changes in the knowledge graph over time are combined to offer business users insights into market trends.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Text Mining</kwd>
        <kwd>Trend Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>development over time provides the basis for trend exploration. Thus, analytic queries
on graphs can detect upcoming topics, influential players and new technologies.</p>
    </sec>
    <sec id="sec-2">
      <title>Knowledge-Graph-Centric Process</title>
      <p>The core process to support technology and innovation management (TIM) in business
aims at automating data acquisition and knowledge graph development to a large degree,
while at the same time allowing business analysts to assess the results intuitively during
trend exploration. Figure 1 gives an overview of the process, which centers on the
creation of a knowledge graph termed “Trend Graph”. The process is divided into three
main stages with corresponding research questions:</p>
      <list list-type="order">
        <list-item>
          <p>Data Acquisition: How can reliable and representative data sources be selected to
narrow in on relevant technology and market information, thus reducing noise in the
gathered information while maintaining the relevance of the data for business users?</p>
        </list-item>
        <list-item>
          <p>Knowledge Graph Development: How can market-specific entities (enterprises,
technologies, products, events, etc.) be recognized, unambiguously identified and
inserted into a knowledge graph together with the relevant relations between them?
How can the historical development of such entities be documented within a knowledge
graph to allow analysis of technology and market developments over time?</p>
        </list-item>
        <list-item>
          <p>Trend Exploration: What options are available to extract signals for market-relevant
trends from a complex knowledge graph while at the same time hiding this
complexity from business users? How can such analyses be automated and their results
visualized to give access to the relevant factors and relationships between entities?</p>
        </list-item>
      </list>
      <p>Data acquisition is currently based on manually selected RSS feeds (more than 500 are
monitored regularly), which deliver news items for selected domains. The current sample
consists of over 260,000 items in the domain of “e-mobility”. Additional channels (e.g.
Twitter, blogs, patent databases) will be incorporated to improve the representativeness
of facts and opinions. The focus of this paper, however, lies on Knowledge Graph
Development and Trend Exploration.</p>
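      <p>As an illustration of the acquisition step, the following minimal Python sketch (not the project's implementation) parses the items of an RSS 2.0 feed with the standard library and keeps only items whose link has not been collected in an earlier polling run. It assumes that the feed XML has already been fetched over HTTP; the function names and the fields of the returned records are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch (not the project's code): parse RSS 2.0 items and deduplicate them
# across polling runs. Fetching the feeds over HTTP is assumed to happen elsewhere.
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss_items(rss_xml: str) -> list[dict]:
    """Extract title, link and publication date from every item of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # RSS 2.0 uses RFC 822 dates, which email.utils can parse
            "published": parsedate_to_datetime(pub) if pub else None,
        })
    return items


def merge_new_items(seen_links: set, items: list[dict]) -> list[dict]:
    """Return only items whose link was not collected before; record their links as seen."""
    fresh = []
    for item in items:
        link = item["link"]
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append(item)
    return fresh
```
      </preformat>
      <p>In a scheduled job, <monospace>merge_new_items</monospace> would be called with the set of links stored so far, so that each news item enters the downstream pipeline only once.</p>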
    </sec>
    <sec id="sec-3">
      <title>Knowledge Graph Development</title>
      <p>
        The key challenges for knowledge graph development are the coverage of relevant
information, the correctness and consistency of the extracted information, and its
freshness, i.e. how up to date the information is [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To limit the number of concepts and relation
types in the graph, and therefore the effort for manual curation, it is helpful to define a
domain-specific ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
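      <p>To make the role of the ontology concrete, the sketch below encodes a small, hypothetical e-mobility schema as a table of permitted relation types with their domain and range, and accepts a candidate statement only if it fits this schema. The class and relation names (<monospace>Organization</monospace>, <monospace>manufactures</monospace>, etc.) are illustrative assumptions, not the project's actual ontology.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch: a hypothetical domain ontology expressed as permitted relation
# types, each with its domain (subject class) and range (object class).
ONTOLOGY = {
    "manufactures": ("Organization", "Product"),
    "usesTechnology": ("Product", "Technology"),
    "locatedIn": ("Organization", "Location"),
    "purchases": ("Organization", "Product"),
}


def validate(subject_type: str, relation: str, object_type: str) -> bool:
    """Accept a statement only if its relation type exists and the entity types match."""
    if relation not in ONTOLOGY:
        return False  # unknown relation types are left for manual curation
    domain, range_ = ONTOLOGY[relation]
    return subject_type == domain and object_type == range_
```
      </preformat>
      <p>Statements that fail this check would be candidates for manual curation rather than being inserted into the graph directly, which keeps the number of concepts and relation types small.</p>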
      <p>
        Our approach uses Semantic Web technologies to implement a business
knowledge graph, because standards such as the Resource Description Framework (RDF) and
SPARQL provide easy access to, and integration of, external knowledge from global open
data sources such as DBpedia [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or YAGO [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The graph consists of strongly typed nodes
and relationships defined in a domain-specific ontology. Contextual metadata such as
temporal validity or trustworthiness is included to support data curation and analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
During named entity recognition (NER), mentions, i.e. potential hits for named entities such as
organizations, persons, products and date/time values, are identified. The NER
modules of Flair [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and spaCy [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] are used as an ensemble to increase
the accuracy of this step. Both provide state-of-the-art deep neural language models
with pretrained word embeddings. The detected mentions must then be disambiguated and
linked to unique entities (URIs) in the knowledge base. Open frameworks such as
AGDISTIS [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] can be used to link entities to public ontologies such as DBpedia. This
makes it possible to infer further information, such as the type, size and location of a
company, and thus ensures the basic meaningfulness of the knowledge graph. For each
detected entity, the link to the originating document and the date of publication are added
to the knowledge graph as lineage information.
      </p>
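      <p>How Flair and spaCy are combined is not specified here. One plausible scheme, sketched below under that assumption, keeps mentions on which both models agree (same span and, after mapping the two tag sets onto common types, the same label) and keeps mentions found by only one model if its confidence exceeds a threshold. Model outputs are assumed to be pre-converted to <monospace>(start, end, label, score)</monospace> tuples; the label mapping and the threshold of 0.9 are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of a two-model NER ensemble (an assumption, not the project's code).
# Each model's output is pre-converted to Mention tuples; the Flair and spaCy
# calls themselves are omitted.
from typing import NamedTuple


class Mention(NamedTuple):
    start: int
    end: int
    label: str
    score: float


# Hypothetical mapping of model-specific tag sets onto common entity types
LABEL_MAP = {"PER": "PERSON", "PERSON": "PERSON", "ORG": "ORG",
             "LOC": "LOCATION", "GPE": "LOCATION"}


def ensemble(a: list, b: list, single_threshold: float = 0.9) -> list:
    """Keep mentions both models agree on; keep single-model mentions only if confident."""
    def norm(mentions):
        return {(m.start, m.end, LABEL_MAP.get(m.label, m.label)): m.score for m in mentions}

    na, nb = norm(a), norm(b)
    result = []
    for key in sorted(set(na) | set(nb)):
        if key in na and key in nb:
            score = (na[key] + nb[key]) / 2  # agreement: average the two confidences
        else:
            score = na.get(key, nb.get(key))
            if single_threshold > score:
                continue  # found by one model only and not confident enough
        result.append(Mention(*key, score))
    return result
```
      </preformat>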
      <p>Furthermore, the confidence (trustworthiness) of each detection step is evaluated and
stored in the knowledge graph. All information below a certain confidence threshold is
marked “untrusted” and excluded from analysis by default. Entities that enter the
knowledge graph with low trustworthiness (“untrusted”) must be disambiguated by human
curators as part of an active learning loop (see Figure 2). Unknown entities such as new
organizations are checked manually once and from then on used automatically to match
entity candidates in newly arriving texts.</p>
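      <p>The trust marking and the curation loop can be sketched as follows. The threshold value, the class name <monospace>Curation</monospace> and its methods are hypothetical; the sketch only mirrors the described behavior: low-confidence links are marked untrusted and queued for curators, and a surface form confirmed once is afterwards matched automatically.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of confidence-based trust marking with a manual curation loop.
# The threshold and the data structures are hypothetical.
from dataclasses import dataclass, field

TRUST_THRESHOLD = 0.8  # assumed value; no threshold is stated in the text


@dataclass
class Curation:
    confirmed: dict = field(default_factory=dict)     # surface form -> URI confirmed by a curator
    review_queue: list = field(default_factory=list)  # (surface form, candidate URI) pairs

    def classify(self, surface: str, candidate_uri: str, confidence: float) -> tuple:
        """Return (status, uri); 'untrusted' items are excluded from analysis by default."""
        if surface in self.confirmed:
            # checked manually once, now matched automatically
            return "trusted", self.confirmed[surface]
        if confidence >= TRUST_THRESHOLD:
            return "trusted", candidate_uri
        self.review_queue.append((surface, candidate_uri))
        return "untrusted", candidate_uri

    def confirm(self, surface: str, uri: str) -> None:
        """Record a curator's decision so that future mentions are linked automatically."""
        self.confirmed[surface] = uri
```
      </preformat>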
      <p>The next step extracts relations, facts about entities and events using open
information extraction algorithms. Events, i.e. expressions related to time, are particularly
interesting for trend analysis. The relations must be mapped to, or newly integrated into,
the knowledge graph in a process similar to that used for the entities.</p>
      <p>The knowledge graph is developed as an RDF data model following the W3C
specifications. All information is modeled as triples of nodes and relationships, which are
stored in an RDF graph database. As of June 2019, the knowledge graph consists of
17,791,689 RDF triples.</p>
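      <p>The following sketch illustrates, with an in-memory set of triples instead of an RDF graph database, how a detected mention together with its lineage (originating document, publication date) and its confidence could be represented as triples and retrieved with a simple triple pattern. The prefixed names and the <monospace>ex:</monospace> predicates are hypothetical and not taken from the project's ontology.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch: an in-memory set of RDF-style triples holding a mention, its lineage
# and its confidence. Prefixed names (ex:, dbr:, xsd:) and ex: predicates are hypothetical.

def add_mention(store: set, mention: str, entity: str, doc_url: str,
                pub_date: str, confidence: float) -> None:
    store.update({
        (mention, "rdf:type", "ex:Mention"),
        (mention, "ex:refersTo", entity),
        (mention, "ex:fromDocument", doc_url),                  # lineage: originating document
        (mention, "ex:publishedOn", f'"{pub_date}"^^xsd:date'),  # lineage: publication date
        (mention, "ex:confidence", f'"{confidence}"^^xsd:decimal'),
    })


def match(store: set, s=None, p=None, o=None) -> list:
    """Return all triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))
```
      </preformat>
      <p>In the actual system, the same pattern would be expressed as a SPARQL basic graph pattern against the triple store.</p>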
    </sec>
    <sec id="sec-4">
      <title>Trend Exploration</title>
      <p>Knowledge graph analysis is based on a descriptive analysis of selected mentions and
related concepts. Initial questions concern, for example, the number of announced initial
purchases by industrial or public buyers, the geographical distribution of mentions, and
the key words within selected mentions and their variation over time. Figure 3 shows an
example created with Microsoft Power BI on the current e-mobility graph, in which the
sub-domain of electric buses has been selected. Close to 400 relevant mentions are
identified. Key words in these mentions are shown in a word cloud (Fig. 3, right),
revealing the context of the selected mentions.</p>
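      <p>The key-word statistics behind such a word cloud amount to term counting over the selected mentions. A minimal sketch, assuming a naive regular-expression tokenizer and a small hypothetical stop-word list (the BI tool's own word-cloud logic may differ), is:</p>
      <preformat preformat-type="code">
```python
# Minimal sketch: count key words in the mentions selected for a sub-domain.
# The stop-word list and the tokenizer are simplistic assumptions.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "for", "in", "and", "to", "with", "by", "its", "has"}


def keyword_counts(mentions: list[str], top: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent non-stop-word tokens across all mention texts."""
    counts = Counter()
    for text in mentions:
        tokens = re.findall(r"[a-zäöüß-]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(top)
```
      </preformat>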
      <p>With cross-filtering it is possible to select key words of interest and then
characterize them by their geographical distribution, e.g. to identify hot spots for the
initial installation and use of electric buses (Fig. 3, left). Mentions of commercial
e-vehicle manufacturers are identified and counted (Fig. 3, middle), which allows their
market relevance to be inferred. In this way, end users analyze the knowledge graph and
derive knowledge about technologies and markets with a business intelligence (BI) tool.</p>
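      <p>Cross-filtering as offered by the BI tool can be mimicked in a few lines: select the mentions containing a key word and aggregate the selection by country and by manufacturer. This is only an illustration of the principle; the record fields <monospace>text</monospace>, <monospace>country</monospace> and <monospace>manufacturers</monospace> are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of cross-filtering: select mentions containing a key word of interest,
# then aggregate the selection by country and by manufacturer.
# The record fields (text, country, manufacturers) are hypothetical.
from collections import Counter


def cross_filter(mentions: list[dict], keyword: str) -> tuple[Counter, Counter]:
    selected = [m for m in mentions if keyword.lower() in m["text"].lower()]
    by_country = Counter(m["country"] for m in selected)
    by_manufacturer = Counter(name for m in selected for name in m["manufacturers"])
    return by_country, by_manufacturer
```
      </preformat>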
      <p>The visualization is based on a group of SPARQL queries that produce tables for
mentions, enterprises, geography and data sources (e.g. RSS feeds); these tables are linked
to each other within the BI tool to form a common star schema. From a general BI
perspective, the knowledge graph resembles a core data warehouse, while the frontend
uses BI self-service capabilities to realize a data mart that is optimized for a specific
group of end users.</p>
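      <p>A client that loads SPARQL results typically receives them in the W3C SPARQL 1.1 Query Results JSON format. The sketch below flattens such a result into rows and attaches dimension attributes to fact rows via a shared key, as in a star schema. It is not the BI tool's internal mechanism; the function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch: flatten SPARQL 1.1 JSON results into rows and join a fact table
# with a dimension table via a shared key (star-schema lookup).
import json


def to_rows(result_json: str) -> list[dict]:
    """Convert a SPARQL JSON result into a list of {variable: value} rows."""
    result = json.loads(result_json)
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # unbound variables (e.g. from OPTIONAL) are absent from a binding
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows


def join(facts: list[dict], dimension: list[dict], key: str) -> list[dict]:
    """Attach dimension attributes to each fact row that has a matching key."""
    index = {row[key]: row for row in dimension}
    return [{**f, **index.get(f[key], {})} for f in facts]
```
      </preformat>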
      <p>As the text sources are additionally stored as full text in the knowledge graph, a direct
reference to the original text is always available. The SPARQL queries are predefined and
can be parametrized to some degree by end users via parameter tables (e.g. to restrict the
selection to certain concepts). The queries are available via a REST API of the triple store,
through which the BI frontend accesses them.</p>
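      <p>A predefined, parametrized query could be sent to the triple store via the SPARQL 1.1 Protocol, i.e. as an HTTP GET request with a <monospace>query</monospace> parameter. In the sketch below, the endpoint URL, the query name and the concept IRIs in the parameter table are hypothetical; restricting parameters to the values listed in the table also prevents end users from injecting arbitrary SPARQL.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of a predefined, parametrized SPARQL query sent via the SPARQL 1.1
# Protocol (HTTP GET with a "query" parameter). Endpoint, query name and parameter
# table are hypothetical.
from string import Template
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://triplestore.example.org/sparql"  # hypothetical endpoint

PREDEFINED = {
    "mentions_by_concept": Template(
        'SELECT ?mention WHERE { ?mention ?p ?concept . FILTER(STR(?concept) = "$concept") }'
    ),
}

# Parameter table: the only values end users may choose (hypothetical concept IRIs)
ALLOWED_CONCEPTS = {"http://example.org/trend#ElectricBus",
                    "http://example.org/trend#ChargingStation"}


def build_request(name: str, concept: str) -> Request:
    """Fill a predefined query with an allowed parameter and build the HTTP request."""
    if concept not in ALLOWED_CONCEPTS:
        raise ValueError(f"concept not in parameter table: {concept}")
    query = PREDEFINED[name].substitute(concept=concept)
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```
      </preformat>
      <p>The request would then be executed with <monospace>urllib.request.urlopen(req)</monospace>; this step is omitted here because it requires a running triple store.</p>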
      <p>The next development steps in the Trend Exploration area focus on defining measures
of the maturity level of technologies and on identifying structural changes in selected
areas of the knowledge graph. The following example illustrates the idea of calculating
structural changes with respect to actors, technologies and application projects:</p>
      <p>The transition from time t = 1 to t = 2 involves a structural change in the knowledge
graph with respect to the competing technologies X and Z. From a social network
analysis perspective, the centrality of node Z has increased. One aim of future development
is to test whether centrality algorithms can identify structural changes (e.g. a changed
relevance of concepts) and yield quantified indicators for trend exploration.</p>
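      <p>Since the choice of centrality algorithms is still open, the following sketch uses normalized degree centrality (the number of neighbors of a node divided by the number of other nodes) as one simple candidate and computes its change between two snapshots of the graph. It is an illustration under that assumption, not the method to be adopted.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch: change in normalized degree centrality between two graph snapshots.
# Degree centrality is only one of several candidate measures.
from collections import defaultdict


def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Number of distinct neighbors of each node divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()} if n > 1 else {}


def centrality_change(before: list, after: list) -> dict[str, float]:
    """Per-node difference in centrality; nodes absent from a snapshot count as 0."""
    c1, c2 = degree_centrality(before), degree_centrality(after)
    return {node: c2.get(node, 0.0) - c1.get(node, 0.0) for node in set(c1) | set(c2)}
```
      </preformat>
      <p>For example, if between t = 1 and t = 2 several actors and projects move from technology X to technology Z, the change computed for Z is positive and the change for X is negative.</p>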
    </sec>
    <sec id="sec-5">
      <title>Insights from Pilot and Future Work</title>
      <p>
        We presented the concept of a knowledge graph for trend analysis as an innovative
approach to business intelligence. One of the key research challenges is the evolution
of information: companies split and merge, and products enter and leave the market.
Such events open up two areas of research: novelty detection and staleness
detection. Regarding novelty, it would be helpful to generate signals immediately
when interesting new information is integrated into the graph. We examined a sample
from the knowledge graph (topic e-mobility/grid) to determine whether the knowledge
claims in the graph constitute interesting new information compared with energate
messenger, a leading paid-content publisher of German energy market news [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The sample
includes 26 distinct articles published from January to April 2019; during the same period,
19 articles were published on energate. Three experts performed the comparison
independently. The results show that 10 of the 26 knowledge graph articles had no
counterpart on energate, while 16 were identical to energate articles. This indicates that
the graph contains new and relevant information for the sample topic and even goes
beyond the benchmark (a paid-content provider). Part of our further research is to define
how the significance of signals can be determined from the content of the graph.
Conversely, most information in the graph can become stale or invalid over time.
However, staleness cannot be fully determined unless further evidence is found in the
data sources. A model for data aging that depends on the kind of information would help
to generate a staleness score that influences the trustworthiness of analyses. To determine
how well novelty and staleness detection perform, we are working on extended evaluation
processes.
      </p>
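      <p>One possible data-aging model, given here only as an assumption and not as the project's approach, lets the trust of a statement decay exponentially with the time since a source last confirmed it, using a half-life that depends on the kind of information. The kinds of information and the half-life values below are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of a data-aging model: trust halves every "half-life" days since the
# statement was last confirmed by a source. Kinds and half-lives are hypothetical.
from datetime import date

HALF_LIFE_DAYS = {
    "product_price": 90,
    "partnership": 365,
    "company_location": 1825,
}


def staleness_adjusted_trust(trust: float, kind: str, last_confirmed: date, today: date) -> float:
    """Scale the stored trust by 0.5 ** (age / half-life); new evidence resets the age."""
    age = max((today - last_confirmed).days, 0)
    return trust * 0.5 ** (age / HALF_LIFE_DAYS[kind])
```
      </preformat>
      <p>A statement whose adjusted trust falls below the confidence threshold could then be treated like an untrusted one until new evidence confirms it.</p>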
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Noy</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patterson</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            <given-names>J</given-names>
          </string-name>
          (
          <year>2019</year>
          )
          <article-title>Industry-scale Knowledge Graphs: Lessons and Challenges</article-title>
          .
          <source>ACM Queue</source>
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kertkeidkachorn</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ichise</surname>
            <given-names>R</given-names>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>T2KG: An End-to-End System for Creating Knowledge Graph from Unstructured Text</article-title>
          .
          <source>The AAAI-17 Workshop on Knowledge-Based Techniques for Problem Solving and Reasoning</source>
          :
          <fpage>743</fpage>
          -
          <lpage>749</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kim</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ju</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeong</surname>
            <given-names>SR</given-names>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>Practical Text Mining for Trend Analysis: Ontology to visualization in Aerospace Technology</article-title>
          .
          <source>KSII Transactions on Internet and Information Systems (TIIS)</source>
          <volume>11</volume>
          (
          <issue>8</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wimalasuriya</surname>
            <given-names>DC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dou</surname>
            <given-names>D</given-names>
          </string-name>
          (
          <year>2010</year>
          )
          <article-title>Ontology-based information extraction: An introduction and a survey of current approaches</article-title>
          .
          <source>Journal of Information Science</source>
          <volume>36</volume>
          (
          <issue>3</issue>
          ):
          <fpage>306</fpage>
          -
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <source>DBpedia Homepage</source>
          . https://wiki.dbpedia.org/. last accessed: 2019/06/24
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <source>YAGO Homepage</source>
          . https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/. last accessed: 2019/06/24
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Krötzsch</surname>
            <given-names>M</given-names>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>Ontologies for Knowledge Graphs?</article-title>
          .
          <source>30th International Workshop on Description Logics</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Akbik</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blythe</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vollgraf</surname>
            <given-names>R</given-names>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Contextual String Embeddings for Sequence Labeling</article-title>
          .
          <source>27th International Conference on Computational Linguistics</source>
          :
          <fpage>1638</fpage>
          -
          <lpage>1649</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <source>spaCy Homepage</source>
          . https://spacy.io/models. last accessed: 2019/08/15
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Usbeck</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            <given-names>A-C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Röder</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coelho</surname>
            <given-names>SA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            <given-names>A</given-names>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>AGDISTIS - Agnostic Disambiguation of Named Entities Using Linked Open Data</article-title>
          .
          <source>ECAI</source>
          <year>2014</year>
          :
          <fpage>1113</fpage>
          -
          <lpage>1114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <source>energate Homepage</source>
          . https://www.energate.de/medien/energate-messenger.html. last accessed: 2019/08/15
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>