<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LogVis: Graph-Assisted Visual Analysis of Event Logs from Industrial Equipment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tugba Kulahcioglu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitriy Fradkin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ayse Parlak</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Belkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Siemens AG</institution>
          ,
          <addr-line>Nuernberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Siemens Corporation</institution>
          ,
          <addr-line>Princeton, NJ</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>61</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>Visual reasoning on a graph is often challenging, mainly due to the vast number of nodes and edges displayed. This is particularly true for log graph data, where thousands of events may be logged within minutes. In this study, we focus on three common log analysis tasks, namely Event Overview, Root-Cause Analysis, and Pattern Analysis, and propose visualization approaches that address the challenges specific to these tasks. We demonstrate the proposed approaches on sample use cases involving industrial equipment logs.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Log Analysis</kwd>
        <kwd>User Interfaces</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Optimization of operation and maintenance of industrial machines/equipment can lead
to significant benefits for the operators and service organizations. Traditionally, a
domain expert manually reviews the equipment logs for troubleshooting problems and
understanding patterns leading to faults. The logs are typically a combination of
structured (e.g. time, component) and unstructured (e.g. log text) data. A single machine can
easily produce thousands of log entries in minutes, resulting in a large semi-structured
dataset and making manual log analysis challenging and laborious.</p>
      <p>In this paper, we propose graph-based visualization approaches that facilitate
otherwise cumbersome log analysis tasks. The first task we discuss is Event Overview,
during which domain experts review the behavior of the equipment and acknowledge
errors or critical events. The next task is Root-Cause Analysis, which aims to
understand the sequence of events that led to the critical event being analyzed.
Finally, the domain experts look for further evidence in the dataset to verify that the
discovered sequence of events typically leads to the critical event. This is achieved through
Pattern Analysis. We propose the following visualization approaches to facilitate
the aforementioned tasks:
– A Template-based Group View, to aid in Event Overview, where log messages that share
an event template are merged into supernodes.
– A Timeline View, to aid in Root-Cause Analysis, where the nodes are aligned
based on their temporal relationships and further visual encodings give a quick
understanding of the whole temporal picture of the visualized log messages.
– A Visual Pattern Search method, which facilitates Pattern Analysis by leveraging
visualization to quickly reveal similarities and differences between
message sequences.</p>
      <p>The paper is structured as follows. Section 2 discusses related work. Section 3
describes knowledge graph construction from log data. Section 4 presents our approach,
and Section 5 discusses the impact of the tool and the lessons learned. We conclude in
Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Knowledge graphs have been adopted to cope with the challenges of log datasets in
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These methods, however, mostly focus on extracting domain-specific
concepts from the data, such as detecting an IP address in a log message text using
predefined regular expressions.
      </p>
      <p>
        Visualization is a frequently used approach to summarize and organize large log
data. Similar to our study, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] leverages event templates to extract valuable information
from log messages and applies data mining methods to make inferences. It
provides an event summarization approach that uses inter-arrival histograms to
capture the temporal relationships between events. The same study also visualizes log
data using parallel coordinates and scatter plots. A recent study [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] proposes a log
processing model that generates natural-language reports using storytelling techniques for
cyber threat intelligence.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Log-Graph Construction</title>
      <p>A graph G consists of a set of nodes N and a set of relationships R. We construct a graph from
the log data by representing each log message with a node n ∈ N. Additional entities
that log messages are associated with (e.g. customer, machine id, message category) are
also represented as nodes in the graph. The nodes are linked by edges corresponding to
the relationships r ∈ R; e.g. each log message is linked to the machine it occurs on, which
in turn is linked to a customer. Temporal relationships between messages (prev,
pointing to the previous message, and next, pointing to the next message) are also represented
as relationships r ∈ R. Each log message node also has a severity
attribute (e.g. info, warning, or error), which is used to color-code nodes (green,
yellow, and red, respectively) in the graph visualization.</p>
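      <p>A minimal sketch of this construction is shown below. It is illustrative only and rests on assumptions not stated in the text: each log record is a dict with the hypothetical keys id, time, text, severity, machine and customer; prev/next links connect consecutive messages of the same machine; and the relation names OCCURS_ON, BELONGS_TO, NEXT and PREV are made up for the example. The in-memory class merely stands in for a graph database backend.</p>
<preformat>
```python
from dataclasses import dataclass, field

# Color encoding from the text; "gray" is an assumed fallback for other levels.
SEVERITY_COLOR = {"info": "green", "warning": "yellow", "error": "red"}


@dataclass
class LogGraph:
    """Tiny in-memory property graph: node attributes plus typed edges."""
    nodes: dict = field(default_factory=dict)  # node id -> attributes
    edges: set = field(default_factory=set)    # (source, relation, target)

    def add_node(self, node_id, **attrs):
        # Entity nodes (machines, customers) are shared, so merge attributes.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, src, rel, dst):
        self.edges.add((src, rel, dst))  # a set drops duplicate edges


def build_log_graph(records):
    """Build the log graph from dicts with hypothetical keys
    id, time, text, severity, machine and customer."""
    g = LogGraph()
    per_machine = {}
    for r in records:
        g.add_node(r["id"], kind="message", text=r["text"], time=r["time"],
                   severity=r["severity"],
                   color=SEVERITY_COLOR.get(r["severity"], "gray"))
        machine = "machine:" + r["machine"]
        customer = "customer:" + r["customer"]
        g.add_node(machine, kind="machine")
        g.add_node(customer, kind="customer")
        g.add_edge(r["id"], "OCCURS_ON", machine)
        g.add_edge(machine, "BELONGS_TO", customer)
        per_machine.setdefault(machine, []).append(r)

    # Temporal next/prev edges between consecutive messages of each machine.
    for msgs in per_machine.values():
        msgs.sort(key=lambda r: r["time"])
        for a, b in zip(msgs, msgs[1:]):
            g.add_edge(a["id"], "NEXT", b["id"])
            g.add_edge(b["id"], "PREV", a["id"])
    return g


records = [
    {"id": "msg1", "time": 10, "text": "Door opened", "severity": "info",
     "machine": "M1", "customer": "C1"},
    {"id": "msg3", "time": 15, "text": "Pump failure", "severity": "error",
     "machine": "M1", "customer": "C1"},
    {"id": "msg2", "time": 12, "text": "Pressure high", "severity": "warning",
     "machine": "M1", "customer": "C1"},
]
graph = build_log_graph(records)
print(graph.nodes["msg3"]["color"])             # red
print(("msg2", "NEXT", "msg3") in graph.edges)  # True
```
</preformat>
      <p>The records are deliberately given out of order to show that the next/prev links follow timestamps rather than input order.</p>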
    </sec>
    <sec id="sec-4">
      <title>4 Graph-Assisted Visual Log Analysis</title>
      <p>In this section, we present our approach to graph-assisted visual log analysis, addressing
the typical tasks carried out on equipment log data. We do not focus on implementation
details in order to keep the discussion general and applicable to multiple domains and
systems. Our specific implementation uses a Neo4j<sup>3</sup> backend, and the UI is created using
JavaScript, in particular by adapting the libraries neovis<sup>4</sup> and vis-network<sup>5</sup>.</p>
      <sec id="sec-4-1">
        <title>4.1 Event Overview</title>
        <p>We describe the task of reviewing and summarizing log events, the associated challenges,
and our approach to overcoming these challenges. We illustrate the impact of our
approach with a real-world use case.</p>
        <p>Task Description. Domain experts often need to review event log data that can span
long time periods. Several higher-level tasks could require such analysis, for example
reporting equipment behavior for a specific time period or analyzing critical events for
maintenance purposes.</p>
        <p>
          Challenges. Typically, such analysis involves thousands of logs. As the visualization
community is well aware, displaying very large graphs can create viewability and
usability issues [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In the case of log analysis, although users could be supported with
additional statistical analysis, understanding the different relationships between the
nodes remains an issue. A solution to this problem in large graph visualization is to
cluster the nodes (which we refer to as grouping) based on a selected property that
carries meaningful information for the task at hand.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Our Approach: Template-based Group View</title>
        <p>We propose applying node grouping using Event Templates. A template represents the fixed part of a log message, which is
shared among all log messages that are produced by the same specific line of code.</p>
        <p>Consider two messages: “Error occurred reading file Input12.txt from server A”
and “Error occurred reading file Input14.txt from server B”. They share the fixed parts
“Error occurred reading file” and “from server”, while the file and server names appear
to be case-specific. A template for these messages would look like this:</p>
        <sec id="sec-4-2-1">
          <preformat>Error occurred reading file &lt;filename&gt; from server &lt;servername&gt;</preformat>
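          <p>To make the notion of a template concrete, the sketch below derives templates from raw messages. It is deliberately naive and is not Drain: it only borrows Drain's idea of bucketing messages by length and leading token, has no similarity threshold, and marks variable positions with an asterisk instead of named placeholders. The function name naive_templates is hypothetical.</p>
<preformat>
```python
from collections import defaultdict


def naive_templates(messages, wildcard="*"):
    """Map each non-empty message to a template string.

    Messages are bucketed by (token count, first token), loosely echoing the
    first layers of Drain's fixed-depth tree. Unlike Drain there is no
    similarity threshold: each bucket yields exactly one template, and every
    token position that varies within the bucket becomes a wildcard.
    """
    buckets = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        buckets[(len(tokens), tokens[0])].append(msg)

    template_of = {}
    for members in buckets.values():
        # Compare the bucket's messages column by column (token position).
        columns = zip(*(m.split() for m in members))
        template = " ".join(
            col[0] if len(set(col)) == 1 else wildcard for col in columns
        )
        for m in members:
            template_of[m] = template
    return template_of


logs = [
    "Error occurred reading file Input12.txt from server A",
    "Error occurred reading file Input14.txt from server B",
    "Pump pressure 7.1 bar",
]
templates = naive_templates(logs)
print(templates[logs[0]])  # Error occurred reading file * from server *
print(templates[logs[2]])  # Pump pressure 7.1 bar
```
</preformat>
          <p>A message without any sibling in its bucket stays fully static, since there is nothing to compare it against; this is one of the cases a full parser such as Drain handles more carefully.</p>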
          <p>
            We use the Drain method [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] to automatically extract a template set T from our data.
For each log message node n ∈ N, we add a relation (n, t) to R, where t ∈ T is the
template of the log message n. For a summary view that facilitates
event overview, for each t ∈ T we merge all nodes n with (n, t) ∈ R into a
supernode. Figure 1 shows the group view for the graph on the left. The size of a
supernode is determined by the number of individual nodes merged into it. Similarly,
the thickness of an edge between supernodes reflects the number of connections between
their individual nodes. Tooltips over nodes and edges show these counts. We use dashed lines for the
connections of supernodes to reflect the virtual nature of these nodes and connections.
          </p>
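          <p>The grouping step can be sketched as follows, assuming the template of every message is already known (e.g. from Drain) and using an asterisk for variable slots. The function names are hypothetical. The output dictionaries use the id, label, value, title, from, to and dashes fields that vis-network accepts: value lets vis-network scale node size and edge width, title supplies the tooltip, and dashes draws the dashed supernode connections.</p>
<preformat>
```python
from collections import Counter


def group_by_template(template_of, edges):
    """Collapse message nodes into one supernode per template.

    template_of: message id -> template (assumed to come from Drain).
    edges:       (source id, target id) pairs, e.g. "next" relations.
    Returns supernode sizes and aggregated edge counts; consecutive
    messages with the same template produce a self-loop on a supernode.
    """
    sizes = Counter(template_of.values())
    weights = Counter((template_of[s], template_of[t]) for s, t in edges)
    return sizes, weights


def to_vis_items(sizes, weights):
    """Shape the result as vis-network node and edge dictionaries."""
    ids = {tpl: i for i, tpl in enumerate(sorted(sizes))}
    nodes = [{"id": ids[tpl], "label": tpl, "value": n,
              "title": f"{n} messages"}          # tooltip text
             for tpl, n in sizes.items()]
    edges = [{"from": ids[a], "to": ids[b], "value": n,
              "title": f"{n} connections",
              "dashes": True}                    # virtual supernode link
             for (a, b), n in weights.items()]
    return nodes, edges


READ_ERR = "Error occurred reading file * from server *"
PRESSURE = "Pump pressure * above limit"
template_of = {"m1": READ_ERR, "m2": PRESSURE, "m3": READ_ERR, "m4": PRESSURE}
next_edges = [("m1", "m2"), ("m2", "m3"), ("m3", "m4")]

sizes, weights = group_by_template(template_of, next_edges)
nodes, edges = to_vis_items(sizes, weights)
print(sizes[READ_ERR], weights[(READ_ERR, PRESSURE)])  # 2 2
```
</preformat>
          <p>Here four message nodes collapse into two supernodes of size 2, and the three underlying edges become two aggregated edges with counts 2 and 1.</p>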
        </sec>
        <sec id="sec-4-2-2">
          <p><sup>3</sup> https://neo4j.com/
<sup>4</sup> https://github.com/neo4j-contrib/neovis.js/
<sup>5</sup> https://github.com/visjs/vis-network</p>
          <p>A larger-scale example of the proposed group view is provided in Figure 2 on the
right. Hundreds of red nodes representing error messages are merged into only five
supernodes, two of which seem to cover the majority of the messages. This summarization
was possible because all these messages shared only five distinct templates.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.2 Root-Cause Analysis</title>
        <p>The structure of this section is similar to that of Section 4.1. We analyze the challenges of the
Root-Cause Analysis task, present our temporal analysis approach to overcoming these
challenges, and demonstrate it on a real-world use case.</p>
        <p>Task Description. One of the primary purposes of equipment log analysis is to
locate and analyze equipment issues. When a problem occurs on the equipment,
domain experts look through the logs to pinpoint it and then analyze the logs from the
preceding time period to find the root cause of the issue and to develop a potential fix.</p>
        <p>Challenges. As with the previous task, this one involves large volumes of
logs. Grouping might not be the best solution here, since the specifics of
individual messages could carry important information about the error and its causes. Experts
want to filter out unrelated logs and need to see the temporal relations
between nodes. As a result of this analysis, experts come up with potential root causes,
which may need to be verified against similar past issues on the same or similar equipment.</p>
        <p>Our Approach: Timeline Analysis. To overcome the challenges of root-cause
analysis, we propose a timeline analysis method that allows the experts to focus on a subset
of logs and their temporal relationships. As the first step, we developed a Timeline View
that supports temporal analysis of the log-graph via the following:</p>
        <p>– Layout and Node Alignment. The nodes are aligned from left to right based on the
specified relationship (e.g. the “next message” relationship). For example, in
Figure 3, msg 32 is connected to msg 33 with a “next message” relationship, hence
they are aligned horizontally from left to right, showing that msg 32 occurs
before msg 33 in the timeline. Nodes that have the same timestamp are aligned
vertically, indicating their shared temporal order (e.g. the green nodes in Figure 3).
– Virtual Edges. Since the experts filter out unrelated nodes, the visualization
often includes several disconnected subgraphs whose nodes are not directly
connected by the specified relationship (in our case, the “next
message” relationship). To let the user see the temporal relationship between
these disconnected subgraphs, we join them using virtual relationships that are created
on the fly. We use dashed lines for the virtual relationships to distinguish them
from regular ones. In Figure 3, the vertically aligned green nodes are connected to
msg 31 with virtual relationships, showing that the currently displayed portion of
the data does not include the messages in between, and that the events associated
with messages 6 to 10 occur before msg 31.
– Edge Thickness. The thickness of an edge between two nodes (in our case, log
messages) indicates the time difference between neighboring log messages. As the
time between the log messages increases, the edge gets thinner, indicating a
weaker relationship. In Figure 3, the edge between msg 31 and msg 32 is thicker
than the other edges, which shows that the time between these messages is shorter
than between the others; hence these messages are more likely to be part of a
pattern.
– We note that the Timeline View is separate from the Group View. In fact, it would be
difficult to combine them, since node aggregation would interfere with the time-based
arrangement of nodes. Thus, the Timeline View does not have supernodes. The use of
edge thickness here also differs from the Group View, where edge
thickness reflects the frequency of connections of the supernode. It is possible to
include other types of nodes (e.g. Category or Actor) in the Timeline View, but that
leads to additional non-temporal edges.</p>
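        <p>The three encodings can be sketched as a single layout function. This is an illustration under stated assumptions rather than the tool's code: messages are dicts with hypothetical id and time keys, every distinct timestamp gets its own column, each node is linked to every node in the next column, an edge is virtual (dashed) unless a real next-message edge exists, and edge width falls off hyperbolically with the time gap, since the exact thickness mapping is not specified. The function name, parameters and timestamps are made up.</p>
<preformat>
```python
from itertools import groupby


def timeline_layout(messages, next_edges, max_width=8.0, min_width=1.0,
                    time_scale=5.0):
    """Sketch of a Timeline View layout.

    messages:   displayed (already filtered) messages as dicts with the
                hypothetical keys "id" and "time".
    next_edges: set of (earlier id, later id) pairs for the real
                "next message" relationship.
    time_scale: assumed time gap at which an edge is drawn at half width.
    Returns positions (id -> (column, row)) and a list of edge dicts.
    """
    ordered = sorted(messages, key=lambda m: m["time"])
    # One column per distinct timestamp, ordered left to right.
    columns = [list(g) for _, g in groupby(ordered, key=lambda m: m["time"])]

    positions = {}
    for col, group in enumerate(columns):
        for row, m in enumerate(group):
            positions[m["id"]] = (col, row)  # equal timestamps stack vertically

    edges = []
    for left, right in zip(columns, columns[1:]):
        for a in left:
            for b in right:
                gap = b["time"] - a["time"]
                # Assumed mapping: width shrinks hyperbolically with the gap.
                width = max(min_width, max_width / (1.0 + gap / time_scale))
                edges.append({
                    "from": a["id"],
                    "to": b["id"],
                    # Virtual (dashed) unless a direct "next" edge exists.
                    "dashes": (a["id"], b["id"]) not in next_edges,
                    "width": round(width, 2),
                })
    return positions, edges


shown = [{"id": "m6", "time": 100}, {"id": "m7", "time": 100},
         {"id": "m31", "time": 160}, {"id": "m32", "time": 161},
         {"id": "m33", "time": 170}]
real_next = {("m31", "m32"), ("m32", "m33")}
positions, edges = timeline_layout(shown, real_next)
print(positions["m7"])  # (0, 1)
for e in edges:
    print(e["from"], e["to"], "dashed" if e["dashes"] else "solid", e["width"])
```
</preformat>
        <p>In this toy example, m6 and m7 share a column and reach m31 only through dashed virtual edges, while the one-unit gap between m31 and m32 yields the thickest edge.</p>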
        <p>Figure 4 shows, on a larger example, how the Timeline View facilitates root-cause
analysis. On the left part of the figure, the graph is visualized using the default radial
view, a frequently used graph visualization technique that is also employed by our
Group View presented in the previous section. On the top right is our Timeline View,
which aligns the nodes based on temporal relationships. The bottom-right panel shows
a zoomed-in view of part of the graph (all our visualizations support zooming in and out).
The Timeline View makes it clear which messages precede a specific error message,
hinting at potential causes of the error.</p>
      </sec>
      <sec id="sec-4-3a">
        <title>4.3 Pattern Analysis</title>
        <p>We analyze the task of Pattern Analysis on log data and present our approach to
overcoming the challenges typically associated with it.</p>
        <p>Task Description. When a problem occurs, domain experts examine the logs
surrounding the time of the problem to try to identify its root cause. The Pattern Analysis task
aims to find out whether there is a diagnostic or predictive pattern for the problem that
repeats across time and other devices. Based on the outcome of this analysis, proactive
actions can be taken to prevent similar errors in the future.</p>
        <p>Challenges. Domain experts can use their knowledge and the logs around a specific failure to
come up with candidate patterns<sup>7</sup>. However, they need to search the data across machines
and time to make sure that a pattern is useful. This requires the ability to specify the desired
pattern and to easily visualize and review the search results.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Our Approach: Visual Pattern Search</title>
        <p>To support the users with Pattern Analysis, we developed a visual pattern search capability where users can select consecutive
messages from the displayed graph and search the database for other occurrences of
the same pattern. The search can be carried out using the original log message texts or
message templates. Figure 5 shows a sample visual pattern search (on the left) and its
results (on the right).</p>
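        <p>The matching step behind a template-based search can be sketched as follows. The sketch reads the pattern definition in footnote 7 loosely: a hit must contain the selected templates in the same order, other messages may occur in between, and the optional max_span caps the number of messages in a hit. Whether the tool itself allows intermediate messages is an assumption, as are the function names and the dict schema.</p>
<preformat>
```python
def find_pattern(sequence, pattern, key="template", max_span=None):
    """Find occurrences of an ordered pattern in one machine's messages.

    sequence: time-ordered message dicts (hypothetical schema).
    pattern:  values to match on `key` ("template" or "text").
    A hit contains the pattern's messages in order; other messages may
    appear in between. max_span optionally caps the messages in a hit.
    Returns (start index, end index, node count) tuples.
    """
    hits = []
    for start, msg in enumerate(sequence):
        if msg[key] != pattern[0]:
            continue
        matched, end = 1, start
        # Greedily take the earliest later message matching the next item.
        for idx in range(start + 1, len(sequence)):
            if matched == len(pattern):
                break
            if sequence[idx][key] == pattern[matched]:
                matched, end = matched + 1, idx
        size = end - start + 1
        if matched == len(pattern) and (max_span is None or max_span >= size):
            hits.append((start, end, size))
    return hits


def search_all(sequences, pattern, **options):
    """Search every machine; sort so hits with equal node counts cluster."""
    rows = []
    for machine, seq in sequences.items():
        for start, end, size in find_pattern(seq, pattern, **options):
            rows.append({"machine": machine, "start": seq[start]["time"],
                         "end": seq[end]["time"], "nodes": size})
    return sorted(rows, key=lambda row: row["nodes"])


def as_sequence(templates):
    """Toy helper: one message per template, timestamped by position."""
    return [{"template": t, "time": i} for i, t in enumerate(templates)]


machines = {"M1": as_sequence("A W B C A B X W C".split()),
            "M2": as_sequence("B A C B C".split())}
for row in search_all(machines, ["A", "B", "C"]):
    print(row)
```
</preformat>
        <p>Setting max_span to the pattern length restricts hits to strictly consecutive messages. Sorting the rows by node count mirrors the results-table column that domain experts requested (Section 5).</p>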
        <p>Pattern search results are displayed in a table. Each row presents an occurrence
of the input pattern, together with related information, including the start and end
timestamps of the retrieved message sequence (hidden in the figure for customer
anonymization). Each search result is visualized using the “Timeline View” introduced in Section
4.2. This allows users to quickly understand the specifics of each result and to
compare multiple results, uncovering potential patterns in the data. In the sample
results in Figure 5, three consecutive messages form the input, as seen in the
left panel. Based on the results, the second message seems to occur with a specific set of
other warning messages (yellow nodes) in both cases found in the data. Figure
6 presents search results for two selected consecutive messages; these results seem to
contain several distinct groups, unlike the single pattern observed in the example from
Figure 5.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Impact and Lessons Learned</title>
      <p>The approaches presented in this work were implemented by the authors and evaluated
by domain experts, specifically service technicians and data analysts from the business
unit. The implementation consisted of several phases, each following an Agile process.
Informal discussions with domain experts were used to identify desirable functionality
and to collect feedback. Their feedback and feature requests allowed us to compile the
following observations on impact and lessons learned:
– Domain experts found the use of automatically mined templates immediately useful
for their overview analysis. However, one of their requests, which we incorporated,
was to generalize some of the templates while specializing a few others.
Generalizing a template essentially means turning a static part of a template into a variable,
which combines several templates into a single one. Specializing a
template is the reverse operation, in which a dynamic part of a template is changed
to static, potentially resulting in multiple templates in place of a single one. These
modifications allow the domain experts to see more meaningful log groups in the
overview analysis. Out of a few hundred templates initially created, these
adjustments affected fewer than 10 templates. This suggests that while automated template
extraction can provide significant benefits, it is useful to have domain experts
review the set of templates and make any adjustments they find helpful.
– In the Timeline View, the edge thickness and virtual edge features were developed
in response to domain experts’ need to visually grasp the time difference and the
ordering information between messages.
– Visual pattern search was found to be very useful for discovering frequent patterns in the
data and for detecting outlier behavior. In some cases, the input pattern matches
hundreds of message sequences or more. To handle such cases, domain experts
suggested adding a column to the results table showing the number of
nodes in the found sequence, so that similar sequences can be clustered together by
sorting the table on that column. Another suggestion was functionality
that would allow the search results to be used to filter the dataset, giving the experts
a chance to further analyze a result (e.g. expanding the timeframe to see more
messages before and after the search result).
– Graph-assisted visualization enabled fast benchmarking or comparison of
devices based on the distributions of different types of messages shown in the
Group View.
– Overall, use of the tool reduced the time service technicians needed for
root-cause analysis from days to hours, while leading to more robust
outcomes because discovered patterns were checked with Pattern Search over long time
periods and across multiple machines and customers. The functionality of the tool also
gave users insights not only into root causes but also into customer
operations, such as problems with the environment or incorrect use/configuration of
the equipment.</p>
      <p><sup>7</sup> By a pattern we mean a set of log messages and relations between them. A sequence matches
a pattern if it is a superset of the pattern’s messages and relations.</p>
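      <p>The generalization and specialization operations described in the first item of the list above can be sketched on token tuples. The sketch assumes whitespace tokenization, an asterisk as the variable marker, and templates with the same token count as their messages; the function names are hypothetical.</p>
<preformat>
```python
WILDCARD = "*"  # assumed placeholder for a variable slot


def generalize(assignments, template, position):
    """Turn the static token at `position` of `template` into a variable.

    assignments: list of (message tokens, template tokens) pairs.
    Every template that agrees with `template` at all other positions
    is rewritten to the generalized one, so several templates merge.
    """
    target = template[:position] + (WILDCARD,) + template[position + 1:]

    def agrees_elsewhere(t):
        return len(t) == len(template) and all(
            a == b for i, (a, b) in enumerate(zip(t, template)) if i != position)

    return [(msg, target if agrees_elsewhere(t) else t) for msg, t in assignments]


def specialize(assignments, template, position):
    """Make the variable at `position` static again: each message assigned to
    `template` gets a template carrying its own token there, so one template
    may split into several."""
    result = []
    for msg, t in assignments:
        if t == template:
            t = t[:position] + (msg[position],) + t[position + 1:]
        result.append((msg, t))
    return result


def tokens(text):
    return tuple(text.split())


data = [(tokens(s), tokens(s)) for s in
        ["Valve A closed", "Valve B closed", "Valve A opened"]]
merged = generalize(data, tokens("Valve A closed"), 1)
print(len({t for _, t in merged}))  # 2
split = specialize(merged, tokens("Valve * closed"), 1)
print(split == data)                # True
```
</preformat>
      <p>Generalizing merges the two “closed” templates into one, and specializing the merged template at the same position restores the original assignment.</p>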
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>Visual log graph analysis suffers from the vast number of nodes and the lack of a
temporal perspective in the visualizations. In this paper, we proposed visualization approaches
specific to log data that can help address these issues. Our template-based
Group View allows quick analysis of large log graphs, whereas the Timeline View and
Pattern Search facilitate the discovery and validation of temporal patterns. We also presented
feedback from domain experts that should prove helpful in the design of similar tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Afzaliseresht</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michalska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>From logs to stories: Human-centred data mining for cyber threat intelligence</article-title>
          .
          <source>IEEE Access 8</source>
          ,
          <fpage>19089</fpage>
          -
          <lpage>19099</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          :
          <article-title>Drain: An online log parsing approach with fixed depth tree</article-title>
          .
          <source>In: 2017 IEEE International Conference on Web Services (ICWS)</source>
          . pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          . IEEE (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Herman</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melançon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          :
          <article-title>Graph visualization and navigation in information visualization: A survey</article-title>
          .
          <source>IEEE Transactions on Visualization and Computer Graphics 6(1)</source>
          ,
          <fpage>24</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>FLAP: An end-to-end event log analysis platform for system management</article-title>
          .
          <source>In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>1547</fpage>
          -
          <lpage>1556</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>do Nascimento</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferraz</surname>
            ,
            <given-names>F.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assad</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>e Silva</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>da Rocha</surname>
            ,
            <given-names>V.H.</given-names>
          </string-name>
          :
          <article-title>Ontolog: Using web semantic and ontology for security log analysis</article-title>
          .
          <source>In: The Sixth International Conference on Software Engineering Advances</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nimbalkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mulwad</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puranik</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Semantic interpretation of structured log files</article-title>
          .
          <source>In: 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI)</source>
          . pp.
          <fpage>549</fpage>
          -
          <lpage>555</lpage>
          . IEEE (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>