<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Conceptual Modelling of Log Files: From a UML-based Design to JSON Files</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Evelina</forename><surname>Rakhmetova</surname></persName>
							<email>evelina.rakhmetova@univr.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Verona</orgName>
								<address>
									<addrLine>Str. le Grazie, 15</addrLine>
									<postCode>37134</postCode>
									<settlement>Verona</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Carlo</forename><surname>Combi</surname></persName>
							<email>carlo.combi@univr.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Verona</orgName>
								<address>
									<addrLine>Str. le Grazie, 15</addrLine>
									<postCode>37134</postCode>
									<settlement>Verona</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrea</forename><surname>Fruggi</surname></persName>
							<email>andrea.fruggi@sia.eu</email>
							<affiliation key="aff1">
								<orgName type="institution">SIA s.r.l</orgName>
								<address>
									<settlement>Verona</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Conceptual Modelling of Log Files: From a UML-based Design to JSON Files</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">562434CA14F120586EAD935238969EB9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T05:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Conceptual Modelling</term>
					<term>UML</term>
					<term>JSON</term>
					<term>Log Files</term>
					<term>Elastic Common Schema</term>
					<term>Modelling Tool</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we describe an application of a recently proposed comprehensive UML-based (Unified Modeling Language) approach to the conceptual modelling of log files. Using a real example, we built an ad hoc UML-based (class) diagram to represent the key features of the nested structure of the logs and generated an artifact (a template in JSON) based on ECS (Elastic Common Schema). We also describe plans for designing a specialized tool by combining the already developed artifacts. The presented work is part of a broader study on a proposed initiative for the general standardization of log files. A clear structure of log data would allow more systematic development and more straightforward implementation and use of the latest information systems, and would minimize anomalies, errors, and time delays.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The stable operation of information systems of constantly increasing complexity, and the security of the tremendous amount of data they process, rely profoundly on log file management. A log message is a piece of information produced during the operation of a computer system or software, generated in response to a running process or an action. The information extracted from a log message conveys its meaning and the reason it was generated. Although modern log file management systems are powerful mechanisms for resolving issues across the IT industry, a growing number of custom solutions makes every case rather particular <ref type="bibr" target="#b0">[1]</ref>.</p><p>Nowadays, the practice of fast development and customization of applications leads to situations in which log semantics is not always clear. Such messages do not give a distinct picture of the processes and interfere with further analysis; hence, they are of limited value. It is even more challenging to keep track of different log file formats in large heterogeneous systems in which software and devices are dynamic in nature <ref type="bibr" target="#b1">[2]</ref>.</p><p>We intend to limit heterogeneity in log file management by proposing a standardization of log files through the development of a conceptual modelling approach and a suitable tool. Conceptual data modelling provides analysts and designers with a high-level representation of the real world and an efficient way to communicate with each other. Such data models promote understanding of the real-world domain and enhance the ability to meet users' requirements <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3]</ref>. A key problem in log file design is the absence of a widely accepted conceptual model.</p><p>Based on our study and a review of the scientific literature, we found a lack of studies on the conceptual modelling of log files <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b6">7]</ref>. There are standardized system log files and widely used commercial schemas, but there is no accepted, universally applicable methodology for the development of a log-based system <ref type="bibr" target="#b3">[4]</ref>.</p><p>In this paper we describe a comprehensive general approach to the conceptual modelling of log data with a UML-like <ref type="bibr" target="#b4">[5]</ref> graphical representation compatible with the ELK stack (Elasticsearch, Logstash, Kibana) <ref type="bibr" target="#b5">[6]</ref>, elaborate on the first stages of the work, and then discuss the use of the UML-based modelling approach and the developed Python script (for generating log templates and documentation). We also outline the future tasks for the tool development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Applied Approach and Features to Conceptual Modelling Log Files</head><p>Our choice to base the conceptual modelling of log files on extended UML-based (class) diagrams and on JSON rests on the following motivations:</p><p>-The UML graphical notation has been commonly used for decades; it is structured and easily understood by various users.</p><p>-Recently, the JSON format has emerged as the most convenient standardized format for structuring data such as log files.</p><p>-JSON is relatively compact (with respect to other structured formats), flexible, since almost every programming language can parse it, and human-readable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Requirements for Logs</head><p>Since log files are mostly created automatically, log data must include, as a minimum, a date/time stamp, a description of the event, and information unique to that event, in order to support further analysis, troubleshooting, or data breach investigation. The information must be structured and suitable for data analysis with various tools.</p></div>
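<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the minimum content listed above, the following is a minimal sketch of a structured log record. The function name make_log_record and the key names are assumptions made for this example only; they are not prescribed by the approach.</p>

```python
import json
from datetime import datetime, timezone


def make_log_record(description: str, details: dict) -> dict:
    """Build a minimal structured log record (illustrative sketch).

    The three keys mirror the minimum named in the text: a date/time
    stamp, a description of the event, and information unique to it.
    The key names themselves are assumptions.
    """
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": description,
        "details": dict(details),
    }


# Hypothetical batch event, used only to show the resulting structure.
record = make_log_record("Batch job finished", {"job_id": "B-001", "rows": 120})
print(json.dumps(record, indent=2))
```

<p>Because the record is a plain dictionary serialized as JSON, it can be consumed directly by analysis tools that expect structured input.</p></div>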
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">JSON Log Files Formation</head><p>JSON logging provides more flexibility to the current logging system, especially since migration from the text logging format (the most common and least standardized format) to JSON can be performed simply. There is currently a vast number of frameworks and programming language drivers that support the translation of log data into JSON format when the data are not produced in it initially. This shows the tendency of the industry towards a standardized format for structuring such kinds of data.</p><p>One advantage is that JSON is simple to implement even in languages without built-in JSON functionality. It is important to highlight that, at the metadata level, ECS (Elastic Common Schema) <ref type="bibr" target="#b5">[6]</ref> is specified through documentation in YAML format (git repository); in what follows, we transform it into JSON format for usability purposes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">An Ad Hoc UML-based Diagram Modelling Approach</head><p>An application of the recently proposed approach <ref type="bibr" target="#b6">[7]</ref> allows the data structure of log files to be represented in a more powerful way, thus providing a sound description of log-based systems. The extended UML-based modelling approach uses suitable stereotypes to extend class diagrams <ref type="bibr" target="#b4">[5]</ref> so that they represent the (mainly nested) structure of a log record.</p><p>A log record is composed of attributes and field sets, which in turn group attributes related to the same feature that the field set represents. This gives an extensive representation that is clear to any user, with the possibility of applying both top-down and bottom-up strategies to system development and/or adjustment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model Features</head><p>The concepts of the class diagram model were taken as a basis <ref type="bibr" target="#b4">[5]</ref>. Added features and extensions provide support for an ad hoc representation of log data. We explicitly highlight in the conceptual model that field sets and attributes partly come from the ECS specification <ref type="bibr" target="#b5">[6]</ref>.</p><p>Composition associations are used to represent proper nesting, where the nested parts may also appear within other parts, i.e., they are reusable. Field sets are represented through a class-like shape, in which we distinguish three different sub-boxes, for core, extended and custom fields, respectively. An ad hoc notation is also introduced for the local nesting of field sets. Other aspects considered in the conceptual data model for log files are types for attributes, associations between attributes, enumeration types, ECS metadata such as event categorization, and arrays of values.</p><p>As a demonstration, we show our extended diagram model obtained from a common log file record. It helps in understanding complex data transformations. Fig. <ref type="figure" target="#fig_0">1</ref> shows a UML-based diagram for a single batch log of a custom application in the banking domain, created by applying the proposed conceptual model.</p><p>It is fair to highlight that the application is in use and that the log files currently acquired are in text form, are not suitably structured, have numerous issues, and are completely unsuitable for proper monitoring and, especially, analysis. Applying our approach together with the application owner, we succeeded in designing a set of suitable log file records for the batch processes.</p></div>
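<div xmlns="http://www.tei-c.org/ns/1.0"><p>The structure described in this section (a log record composed of attributes and field sets, each field set split into core, extended and custom fields, with possible nesting) can be sketched as a small data structure. This is only an illustrative reading of the model, not the notation itself; the class names, field names and the assignment of fields to sub-boxes are assumptions.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldSet:
    """A named group of attributes, split into three sub-boxes."""

    name: str
    core: Dict[str, Any] = field(default_factory=dict)
    extended: Dict[str, Any] = field(default_factory=dict)
    custom: Dict[str, Any] = field(default_factory=dict)
    nested: List["FieldSet"] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # Merge the three sub-boxes, then attach nested field sets by name.
        merged: Dict[str, Any] = {**self.core, **self.extended, **self.custom}
        for child in self.nested:
            merged[child.name] = child.to_dict()
        return merged


@dataclass
class LogRecord:
    """A log record: top-level attributes plus field sets."""

    attributes: Dict[str, Any] = field(default_factory=dict)
    field_sets: List[FieldSet] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        record = dict(self.attributes)
        for field_set in self.field_sets:
            record[field_set.name] = field_set.to_dict()
        return record


# Hypothetical field names and values, for illustration only.
step = FieldSet(name="step", custom={"number": 3})
batch = FieldSet(name="batch", custom={"job_id": "B-001"}, nested=[step])
event = FieldSet(name="event", core={"kind": "event"}, extended={"reason": "scheduled"})
record = LogRecord(
    attributes={"message": "Batch job finished"},
    field_sets=[event, batch],
)
print(record.to_dict())
```

<p>Serializing the record with to_dict yields the nested layout that a JSON log record would have, with the nested field set appearing inside its parent.</p></div>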
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Towards the Implementation Process</head><p>To assess the usability of the conceptual approach, we started by considering a real-world domain, namely banking applications. Indeed, this kind of application covers a wide variety of general event logging.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Python Script for ECS</head><p>The raw data were taken from the ECS repository, which is open for contributions at https://github.com/elastic/ecs. ECS provides a very large number of fields for log records, of which only a few need to be populated in a given case. The repository collects various files and tool templates, yet they are not universally applicable to any system.</p><p>We have chosen to maintain customizations by taking into consideration the tools provided by ECS and creating our own generator (Python script, input and output files) to create relevant artifacts for the unique set of data sources. The script is run from the command line. The main steps of the working process are as follows:</p><p>-As an input file, the current version of the ECS log field set in YAML format is converted into JSON.</p><p>-Users can select the log fields from the set and, if needed, include custom fields relevant to the project.</p><p>-As an output, an artifact in the form of a JSON file is obtained; it represents a sample template for a log record for the particular system.</p><p>Although the script is still under active improvement, it has already been used for several test cases of batch log file modelling. Fig. <ref type="figure" target="#fig_1">2</ref> shows an example for the same case used in the UML-based diagram demonstration. This script is a preliminary development for the future tool and has not yet been published as open source. It is one of the components that must subsequently be linked to the UML-based graphical representation component.</p></div>
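<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection and output steps listed above can be sketched roughly as follows. This is not the script described in this section, which has not been published; it is a minimal sketch that assumes the field definitions are already available as JSON-like data. The ECS_FIELDS excerpt, the function names and the custom field batch.job_id are hypothetical.</p>

```python
import json

# Hypothetical excerpt of ECS-like field definitions, assumed to be
# already converted from YAML to JSON. Not copied from the repository.
ECS_FIELDS = {
    "@timestamp": {"type": "date"},
    "message": {"type": "text"},
    "event.kind": {"type": "keyword"},
    "host.name": {"type": "keyword"},
}


def set_nested(target: dict, dotted_name: str, value) -> None:
    """Store value under a dotted name, creating nested objects as needed."""
    parts = dotted_name.split(".")
    node = target
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def build_template(definitions: dict, selected: list, custom: dict = None) -> dict:
    """Build a nested template from selected and custom fields.

    Each leaf holds the declared type name as a placeholder value.
    """
    template: dict = {}
    for name in selected:
        if name not in definitions:
            raise KeyError(f"unknown field: {name}")
        set_nested(template, name, definitions[name]["type"])
    for name, type_name in (custom or {}).items():
        set_nested(template, name, type_name)
    return template


template = build_template(
    ECS_FIELDS,
    selected=["@timestamp", "message", "event.kind"],
    custom={"batch.job_id": "keyword"},  # hypothetical custom field
)
print(json.dumps(template, indent=2))
```

<p>Using the type name as a placeholder value is a design choice of this sketch; a template could equally carry empty values or example values for each field.</p></div>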
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Further Steps on Tool Prototype Development</head><p>At this point, the work not only proposes the tool and the steps for its development but also provides preliminary solutions. The overall organizational flow and the conceptual model of a possible architecture are shown in Fig. <ref type="figure" target="#fig_2">3</ref>.</p><p>The tool is aimed at creating artifacts: structural templates for log files (in JSON format, according to the fields defined in the YAML documentation) and related documentation (which includes extended UML-based (class) diagrams).</p><p>The tool is intended to support the conceptual modelling of log files from the beginning of system modelling, or to act as a supporting solution for redefining system logs. In the second case, however, it is necessary to integrate loggers, i.e. plugins for the logging library of the application (system), to format logs into a compatible JSON format.</p></div>
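<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the idea of a logger plugin concrete, the following is a minimal sketch based on Python's standard logging module. It is an assumption about what such a plugin could look like, not part of the proposed tool; the class name JsonFormatter and the choice of output keys are illustrative.</p>

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object (illustrative sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "@timestamp": created.isoformat(),
            "message": record.getMessage(),
            # Key layout chosen for this example only.
            "log": {"level": record.levelname.lower(), "logger": record.name},
        }
        return json.dumps(payload)


# Attach the formatter to a handler of a hypothetical "batch" logger.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("batch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Processed %d rows", 120)
```

<p>With such a formatter in place, existing logging calls in the application stay unchanged while the emitted lines become structured JSON.</p></div>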
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>As a result, we provided a comprehensive overview of our work on a modelling schema for log files (in the form of extended UML-based (class) diagrams) and developed preliminary instruments (a Python script) for log file modelling in JSON format (according to the predefined structure). In addition, we demonstrated our intention and the actual steps towards developing a comprehensive tool built on the proposed conceptual log file models, which will provide both ad hoc UML-based diagrams as documentation and JSON-formatted templates for log records.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Log file record of the bank batch application: graphical representation through the ad hoc UML-based conceptual modelling.</figDesc><graphic coords="4,134.77,115.83,345.83,189.63" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. The window with the artifact generator code (right part), the input file (upper left corner; already formatted in JSON with all available fields in accordance with the current ECS version) and the generated output file (lower left corner; the template for the custom log record in JSON format).</figDesc><graphic coords="5,152.06,232.69,311.24,170.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. An overall concept of the proposed tool architecture.</figDesc><graphic coords="6,186.64,115.83,242.09,117.88" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chuvakin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Phillips</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Semantic Interpretation of Structured Log Files</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nimbalkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mulwad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Puranik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 17th International Conference on Information Reuse and Integration (IRI)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="549" to="555" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Enabling instant- and interval-based semantics in multidimensional data models: the T+MultiDim Model</title>
		<author>
			<persName><forename type="first">C</forename><surname>Combi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Oliboni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sabaini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zimányi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Sci</title>
		<imprint>
			<biblScope unit="volume">518</biblScope>
			<biblScope unit="page" from="413" to="435" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Log clustering based problem identification for online service systems</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-G</forename><surname>Lou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">38th International Conference on Software Engineering Companion -ICSE&apos;16</title>
				<meeting><address><addrLine>Austin, Texas</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="102" to="111" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m">OMG Unified Modelling Language (OMG UML)</title>
				<imprint>
			<date type="published" when="2017-12">December 2017</date>
			<biblScope unit="volume">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Elastic Common Schema (ECS) Reference</title>
		<ptr target="https://www.elastic.co/guide/en/ecs/master/ecs-custom-fields-in-ecs.html" />
		<imprint>
			<date type="published" when="2021-06-10">10 June 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">A UML-based Approach to the Conceptual Modelling of Log Files</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rakhmetova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Combi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fruggi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science University of Verona</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
	<note>in press</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
