DELTA - Distributed Elastic Log Text Analyser
Piergiuseppe Di Pilla1 , Remo Pareschi1,2 , Francesco Salzano1 and Federico Zappone1,2
1
    Stake Lab, University of Molise, Campobasso, Italy
2
    BB-Smile Srl, Rome, Italy


Abstract
Distributed systems have become ubiquitous in recent years, with those based on distributed ledger technology (DLT), such as blockchains, gaining more and more weight. Indeed, DLT ensures strong data integrity thanks to complex cryptographic protocols and high distribution. That said, even the most powerful systems will never be perfect, and, in fact, the larger they get, the more exposed they become to threats. For traditional systems, log auditing effectively addresses the problem and makes it possible to analyze the use of applications. However, DLT systems still lack a wide range of log analyzers due to the particularities of their distribution. To help remedy this weakness, we propose here a generic auditing system called DELTA (for Distributed Elastic Log Text Analyzer). By coupling Natural Language Processing with the Docker Engine and the Filebeat, Logstash, Elasticsearch and Kibana stack, DELTA tracks, analyzes and classifies logs generated by DLT systems. Additionally, it enables real-time monitoring thanks to visual analysis and querying of structured data. DELTA is the first auditing system applicable to blockchains that can be integrated with the Docker Engine. In addition to describing its general principles and specific components, we illustrate its application to Hyperledger Fabric, the most popular of the platforms for building private blockchains.

Keywords
Distributed Ledger Technology, Log Analysis, Cybersecurity, Natural Language Processing, Blockchain, NLP, DLT.




1. Introduction
Distributed systems have been spreading rapidly in recent years [30], and the emergence of
Distributed Ledger Technologies (DLTs) such as blockchains has strongly contributed to this
trend. These technologies find a wide range of possible applications in areas such as the Internet
of Things (IoT), healthcare, supply chain management, energy, genomics, fintech, insurance,
automotive, etc. [2, 27, 21, 8, 5, 15, 13]. As a consequence, there is an ongoing strengthening
of development frameworks such as Ethereum, Hyperledger, EOSIO, Corda, Waves, Quorum etc.
which are constantly adding new features.
   The trend is explained by the ability of DLT to provide a high degree of security, compared
to classical systems, by encrypting and decentralizing data, aspects that are both paramount

DLT 2022: 4th Distributed Ledger Technology Workshop, June 20, 2022, Rome, Italy
∗ Corresponding author.
† These authors contributed equally.
Email: p.dipilla@studenti.unimol.it (P. Di Pilla); remo.pareschi@unimol.it (R. Pareschi); f.salzano1@studenti.unimol.it (F. Salzano); federico.zappone@unimol.it (F. Zappone)
Web: http://docenti.unimol.it/index.php?u=remo.pareschi (R. Pareschi); https://github.com/ZappaBoy (F. Zappone)
ORCID: 0000-0001-6455-6575 (F. Zappone)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org


for the development of decentralized applications (dApps). Data security is a crucial issue
for information systems in general, so, over time, numerous tools have been developed for
data protection and monitoring system access, including auditing, which provides a wealth
of security-related information. In particular, system log auditing extracts information about
the operations carried out and the conditions in which they took place and keeps track of
their timelines. Logs are therefore essential for analyzing the behavior of IT systems under
both normal and abnormal conditions. Indeed, while the normal case provides a history of the
operations carried out, the anomalous one helps identify system errors and detect vulnerabilities,
thus preventing cyberattacks.
   However, compared to development frameworks, distributed systems auditing tools lag,
especially when it comes to blockchains. The more established ecosystems, such as Bitcoin
and Ethereum, have their log analysis tools. Yet, there is a lack of a standardized tool that can
be integrated with most frameworks and is endowed with real-time monitoring capabilities.
Existing tools designed for cloud and decentralized systems [18] are not easy to integrate with
development frameworks for blockchain applications and are not up to the challenges regarding
complete log auditing of blockchain systems [7]. We started from these premises in carrying
out the design and development of a universal log analysis tool, which takes the name of DELTA
for Distributed Elastic Log Text Analyzer, aimed at analyzing logs of activities on most of the
existing development frameworks for distributed and non-distributed systems - a versatility
which is made possible thanks to the use of the Docker Engine and the ELK stack (Elasticsearch,
Logstash, Kibana) integrated via Filebeat.
   For this purpose, we make use of the Docker Engine as a bridge for collecting logs between the
analyzer and distributed systems. Thanks to Docker, it is, in fact, possible not only to integrate
completely different systems but also to analyze the logs produced through the ELK stack.
This stack makes it possible to efficiently control log collection methods by accessing Docker
containers. Furthermore, Filebeat takes care of managing log collection methods in real-time,
and Logstash enables automatic log insertion into the Elasticsearch database, which in turn
supplies data to Kibana for immediate viewing through a customizable graphical interface.
   The developed tool is not limited to traceability in that the textual part of the traced logs is
subjected to analysis through Natural Language Processing (NLP). NLP is used to perform, upon
the text within logs produced by the Docker containers, three types of analysis, namely: keyword
extraction, classification, and sentiment analysis. Keywords are extracted through two different
models: the more precise KeyBERT, based on BERT (Bidirectional Encoder Representations
from Transformers), and the more versatile YAKE! [6]. As regards log classification, the choice
fell on the Zero-Shot model facebook/bart-large-mnli, developed by Meta (formerly Facebook), which
works without requiring data outside the text. The idea of Zero-Shot models is to analyze
and classify data that may be completely different from those on which the model was trained,
using the methodology for statistical inference described in [20]. Finally, sentiment
analysis is performed through VADER (Valence Aware Dictionary and sEntiment Reasoner), an
open-source analysis tool based on rules for extracting sentiment using dictionaries. All log
collection, analysis, and classification processes occur in real-time, enabling interaction with
resource monitoring processes. In this way, it is possible to focus log analysis on security
problems and undertake mitigating actions as needed.
   Structure of the paper. The remaining part of the article is organized as follows: Section 2


describes the methodology underlying DELTA and the components used for its implementation;
Section 3 describes the application of DELTA to Hyperledger Fabric, the most popular platform
for private blockchains; Section 4 describes future work; Section 5 concludes the paper.


2. Methodology
The developed system is divided into two macro sections: the first relates to the collection
of logs through the Elastic components, and the second consists of the textual analysis
systems provided by the DELTA tool. The following sections describe the methodologies used
to implement the system: Section 2.1 describes the use of the Elastic stack and the log flow
within the system, while Section 2.2 illustrates the developed tool and how logs are analyzed
and re-elaborated.

2.1. Elastic stack
The acquisition of logs produced by distributed services and systems was managed through the
combined use of several Elastic components. In fact, the stack used includes four such components,
namely Filebeat, Logstash, Elasticsearch and Kibana. The Filebeat component has been added on
top of the standard ELK stack as it supports log extraction from highly heterogeneous contexts
and therefore is perfectly suited to distributed environments.
   These components fit together to obtain, manage, index, and view the generated logs auto-
matically and instantly. The logs come from the containers where the distributed services are
located and are first extracted through Filebeat, which reads the data directly by connecting to
the log files managed by the Docker engine. Then Filebeat passes the logs to Logstash, which
aggregates them and places the raw information inside Elasticsearch, creating a dedicated index
for archiving. Finally, Elasticsearch makes the indexed data available for access via Kibana,
which provides visual analysis of both the aggregate data entered through the collection
process and the analyzed data (Figure 1).




Figure 1: Structure of the system
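At the start of this pipeline, Filebeat reads the JSON-lines files that Docker's default json-file logging driver writes for each container. The sketch below illustrates the shape of one such record; the sample message is illustrative, and the helper function is ours, not part of Filebeat:

```python
import json

# Docker's json-file driver writes one JSON object per line: the raw
# message, the stream it came from, and an RFC 3339 timestamp.
raw_line = (
    '{"log":"2022-03-20 12:48:09.287 UTC ERRO Server TLS handshake failed\\n",'
    '"stream":"stderr","time":"2022-03-20T12:48:09.287Z"}'
)

def parse_docker_log_line(line: str) -> dict:
    """Parse one line of a Docker json-file container log."""
    record = json.loads(line)
    return {
        "message": record["log"].rstrip("\n"),
        "stream": record["stream"],    # "stdout" or "stderr"
        "timestamp": record["time"],   # RFC 3339 timestamp
    }

parsed = parse_docker_log_line(raw_line)
print(parsed["stream"])  # stderr
```

Filebeat ships these records onward to Logstash, which enriches them before indexing into Elasticsearch.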




2.2. DELTA Analyser
DELTA is aimed at processing the logs in the Elasticsearch database to extract relevant informa-
tion using Natural Language Processing techniques to facilitate data monitoring and analysis
operations. This occurs through continuous detection of relevant patterns, such as the
presence of IP addresses, together with processing along the following three dimensions
(Figure 2):

    • Keywords Extraction
    • Category Classification
    • Sentiment Analysis


Figure 2: Structure of the DELTA tool (DELTA Analyser: main; core: keyword-extractor, zero-shot-classifier, sentiment-analyser; elastic-client; utilities)
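The components in Figure 2 can be read as a simple enrichment pipeline over each log document pulled from Elasticsearch. The sketch below uses stub analyzers and a hypothetical document shape, so it is not DELTA's actual code; a real deployment would fetch documents through an Elasticsearch client and call the real models described in the following sections:

```python
# Stub analyzers standing in for DELTA's keyword-extractor,
# zero-shot-classifier and sentiment-analyser components.
def extract_keywords(text: str) -> list[str]:
    # Trivial stand-in: keep up to three "long" words.
    return [w for w in text.lower().split() if len(w) > 6][:3]

def classify(text: str) -> list[str]:
    # Trivial stand-in for the zero-shot classifier.
    return ["Security"] if "failed" in text else ["Operation"]

def sentiment(text: str) -> str:
    # Trivial stand-in for the sentiment analyser.
    return "negative" if ("error" in text or "failed" in text) else "neutral"

def analyse(log: dict) -> dict:
    """Enrich one raw log document with the three NLP dimensions."""
    text = log["message"]
    return {**log,
            "keywords": extract_keywords(text),
            "classification_labels": classify(text),
            "sentiment": sentiment(text)}

raw = {"message": "Server TLS handshake failed with error"}
print(analyse(raw)["classification_labels"])  # ['Security']
```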



2.3. Keywords Extraction
Keyword extraction picks out words with the highest informative impact on the text, thus
making it possible to conduct statistical analysis or associate keywords with the triggering of
events across vastly varying contexts. Moreover, keywords can be exploited to boost monitoring
effectiveness, providing access to the logs by filtering on their content.
   Many methods, techniques, and algorithms extract keywords or key phrases from texts
(e.g., TF-IDF, Rake, YAKE!). Since DELTA was designed for a generic context, we cannot make
predictions about the input to be analyzed. Conversely, given the specificity that characterizes
some application contexts, extraction models based on statistical concepts may ignore
domain-specific keywords. When dealing with a distributed system that uses specific terms
to identify its components (for example, the word Tangle in the context of the distributed
ledger IOTA), such elements would be ignored by most models based on word frequency or
dictionaries. These considerations led to implementing both a keyword extraction technique
based on the semantic similarity of the words and a method based on the statistical properties
of the texts. Therefore, for keyword extraction, we relied on two models, namely KeyBERT and
YAKE!.


2.3.1. KeyBERT
KeyBERT is based on BERT (Bidirectional Encoder Representations from Transformers) [17],
published by Google in 2018, a model that uses a transformer [28] architecture to map words,
phrases and text into number vectors that capture their meaning. BERT is a bidirectional model,
which makes it able to interpret a word based on the context of the sentence, regardless of
whether the relevant information is left or right, unlike left-to-right architectures, which look
only at the words preceding the one being processed [12]. Furthermore, BERT does not resort
to recursive levels unlike LSTM-based technologies [16], but instead exploits the Self-Attention
[9] mechanism. The KeyBERT system can be implemented with different transformer models,
with the basic model all-MiniLM-L6-v2 having limited needs for computational resources with a
trade-off in precision levels. While, according to the official documentation, the highest-quality
model is all-mpnet-base-v2, we have chosen the less powerful paraphrase-albert-large model:
log texts are generally compact, so this model achieves excellent accuracy with lower resource
consumption.

2.3.2. YAKE!
YAKE! (also written Yake! or Yake) is an automatic keyword extraction algorithm that stands out
for its simplicity and its excellent balance between computational resource requirements and
analysis quality. It is an unsupervised algorithm based on statistical textual characteristics
extracted from individual documents. Therefore, it does not need to be trained on a particular
set of documents, nor does it depend on external dictionaries and corpora, nor does it have
limitations as regards text sizes, languages, or domains. It also does not use Part-of-Speech tagging [26], which
makes it language-independent, except for the use of different but static stopword lists for each
language. This makes it easy to apply to languages other than English, particularly low-diffusion
languages for which open-source language processing tools may be underperforming.

2.3.3. KeyBERT compared to YAKE!
KeyBERT and YAKE! thus provide alternative models for keyword extraction. In testing the two
algorithms on logs, we found out that both offer high accuracy for short texts. Nevertheless, the
model suggested by default is KeyBERT, in virtue of its higher accuracy for longer texts. On the
other hand, KeyBERT turns out to be significantly more onerous in terms of performance and
waiting time. Therefore, DELTA provides both approaches to let users choose the most suitable
one for the analysis context.

2.4. Log Classification
Classifying logs according to labels (i.e., classification categories) can help analyze and monitor
distributed systems. First, it makes it possible to gauge the frequency with which logs sharing
the same label occur, facilitating the identification of related problems. Furthermore,
through the analysis of the logs of the same category, it is possible to identify the presence of
specific patterns, which can then be used to verify the system’s correct functioning or detect
anomalies attributable to errors or tampering attempts. Finally, labels offer a way to access


content in addition to keyword-based querying. The classification approach used in DELTA
is a hybrid that combines machine learning with rules, being a Zero-Shot [24] classifier. As
illustrated in [29], this methodology classifies text, documents, or sentences without resorting
to any previously tagged data by using a natural language processing model, pre-trained in an
environment or domain that may be completely different from the application domain. This
makes it possible to classify texts from heterogeneous contexts. Taking a text and a list of
possible labels as input, the classifier provides a probabilistic value for whether the text
belongs to each label. Thus, through a threshold, the text is labeled according to the categories to which it
belongs. The way Zero-Shot is used in DELTA allows multiple labels to be assigned to a
single log with different probabilistic scores, exploiting the fact that the model outputs an
independent probabilistic value for each supplied label. This gives a more detailed view of
the system behavior and the consequent possibility of monitoring logs in a more specific and
targeted way. There are five preset labels used within DELTA: Security, Connection, Communication,
Transaction and, finally, Operation. However, these can be changed according to the context of
use. The currently adopted model is facebook/bart-large-mnli, provided by Facebook [1],
but this too can be changed according to needs and preferences.
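The threshold-based multi-label assignment just described can be sketched as follows. The scores dict mimics the per-label output of a zero-shot classifier run with independent labels (for instance, the Hugging Face pipeline with multi_label=True); both the sample scores and the threshold value are illustrative, not DELTA's actual configuration:

```python
# Preset labels as used within DELTA; per-label probabilities would
# come from a zero-shot classifier, e.g. transformers'
#   pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# called with multi_label=True.
PRESET_LABELS = ["Security", "Connection", "Communication",
                 "Transaction", "Operation"]

def assign_labels(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Keep every label whose independent probability clears the threshold."""
    return sorted((label for label, p in scores.items() if p >= threshold),
                  key=scores.get, reverse=True)

# Illustrative scores for a TLS-handshake-failure log:
scores = {"Security": 0.91, "Connection": 0.84, "Communication": 0.78,
          "Transaction": 0.12, "Operation": 0.33}
print(assign_labels(scores))  # ['Security', 'Connection', 'Communication']
```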

2.5. Sentiment Analysis
In addition to keyword extraction and classification analysis, log sentiment analysis is also
performed. Sentiment analysis consists of language processing and analysis aimed at identifying
the subjectivity value of the text, with the primary goal of determining the polarity of a document,
i.e., to classify it according to the degree of positivity, negativity, or neutrality of the concepts
expressed. As surprising as it may seem, logs carry sentiment that can usefully shed light
on what is going on within the monitored system. In order to detect anomalies and errors, a
monitoring system based on DELTA will indeed benefit from the identification of logs loaded
with negative sentiment, indicative of errors or malfunctions. For the extraction of sentiment
from the logs, the VADER lexicon library is used: originally developed for sentiment analysis of
social media and customer feedback, it has valuable features that make it an excellent analyzer
for short texts and hence for logs too. Once a given text has been analyzed,
VADER responds with a polarity value called compound that indicates the degree of positivity
or negativity of the sentiment of the analyzed text. The compound is later processed to define a
sentiment evaluation label through thresholds that were tuned and set according to empirical
evidence. Within DELTA, it is also possible either to modify the values of the thresholds or to
add new ones to refine levels of positivity and/or negativity.
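A sketch of this compound-to-label mapping is given below. VADER's documentation conventionally treats compound scores above +0.05 as positive and below -0.05 as negative; the -0.5 cut-off for "very negative" and the function itself are illustrative assumptions, since the tuned threshold values are not reported in the paper:

```python
def sentiment_label(compound: float) -> str:
    """Map VADER's compound score in [-1, 1] to a coarse label.

    The -0.5 cut-off for "very negative" is an illustrative assumption;
    DELTA lets users tune thresholds and add further levels.
    """
    if compound <= -0.5:
        return "very negative"
    if compound < -0.05:
        return "negative"
    if compound <= 0.05:
        return "neutral"
    return "positive"

# The compound value itself would come from VADER, e.g.:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
print(sentiment_label(-0.74))  # very negative
```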

2.6. Additional Log Processing
In addition to the textual analysis of the logs, further processing was carried out to bring
intrinsically relevant information to the fore. First of all, elements deemed irrelevant for the
analysis were removed. We also worked on the Logstash component, which is responsible for
collecting data and creates a very detailed structure containing all the information directly
observable from the log generation sources. This structure has been stripped down and simplified
as far as possible to facilitate future analyses and speed up information sharing. Furthermore,


Figure 3: Structure of the system



regarding security aspects and control over system usage, IP addresses and connection ports,
if present in the logs, are extracted. Finally, the log output stream (stdout or stderr) is also
extracted, making it possible for the aggregation filters to operate in a simplified manner
based on the type of log produced by the systems (Listing 1).
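The IP and port extraction can be illustrated with a simple regular expression over the raw message; the pattern below is a sketch for IPv4 endpoints, not necessarily the filter actually used in DELTA's Logstash stage, and the sample message is taken from Listing 1:

```python
import re

# IPv4 address with an optional :port suffix; an illustrative pattern,
# not necessarily the one used inside DELTA/Logstash.
IP_PORT = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b")

def extract_endpoints(message: str) -> list[str]:
    """Pull IPv4[:port] endpoints out of a raw log message."""
    return IP_PORT.findall(message)

msg = ("Server TLS handshake failed in 607.238772ms with error "
       "server=Orderer remoteaddress=167.94.138.46:45236")
print(extract_endpoints(msg))  # ['167.94.138.46:45236']
```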


3. Application to Hyperledger Fabric
We briefly describe a DELTA application to Hyperledger Fabric [4, 11], the most adopted
platform for building private distributed ledgers. The challenges presented by blockchain and
DL management were, in fact, the initial motivation for DELTA, even if its construction was
then generalized to all systems that can be containerized and distributed through technologies
such as Docker and Kubernetes.
   Like any distributed ledger, Fabric’s goals include ensuring a secure environment. However,
it has known vulnerabilities [10, 22, 3] that provide attack points for malicious users. While
these can be mitigated, there is a lack of a monitoring system that can detect potential attacks
and act promptly. This problem can be addressed by using DELTA to analyze the logs produced
by the system during the attack phase. Log checking is performed to identify attack patterns,
and then these are used to develop a monitoring system to spot and mitigate threats in real-time.


Since Fabric blockchains are distributable using the Docker engine, DELTA is a close fit for log
analysis of the entire Fabric network. The monitoring system is based on the logs generated by
the Docker containers and then analyzed and processed by DELTA in real-time. Queries can
filter logs and obtain only those relevant for attack detection. To this aim, several additional
valuable data are extractable, such as the name, unique ID, image, execution status, and mounted
volumes of the Docker containers. Moreover, DELTA provides keywords, sentiment, and the type
of each log, which can be Security, Connection, Communication, Transaction, or Operation.
   The attacks on Fabric fall into two broad categories, depending on whether they target
the network or the execution of smart contracts [23]. DELTA is particularly effective
at dealing with the former. These are essentially variants of well-known attacks on distributed
systems, e.g., Distributed Denial of Service [19], Sybil [14] and Eclipse attacks [25], which exploit
some specificities of Fabric, such as its relatively lower level of decentralization, compared to
other blockchains, resulting from design choices like the use of a centralized Ordering Service
for transaction management. A monitoring system aimed at network attacks and consisting of
three microservices was consequently implemented, namely: i) a service that relies on DELTA to
detect patterns of potential danger and, once anomalous behavior is confirmed, sends a warning to
all configured addresses; ii) a service that takes care of sending warning messages via webhook
based on the detections made in i); iii) a dedicated mitigation service.
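The detection rule in service i) might look like the following sketch. The criteria (negative sentiment on a Security- or Connection-labeled log) and the document shape are hypothetical examples of the kind of pattern such a service watches for, not the deployed service's actual logic:

```python
def is_potential_attack(doc: dict) -> bool:
    """Flag an analyzed log document as potentially dangerous.

    The rule here (negative sentiment on a Security- or Connection-
    labeled log) is a hypothetical example of a detection pattern.
    """
    labels = set(doc.get("classification_labels", []))
    negative = doc.get("sentiment", "").endswith("negative")
    return negative and bool(labels & {"Security", "Connection"})

doc = {
    "sentiment": "very negative",
    "classification_labels": ["Communication", "Security", "Connection"],
    "message": "Server TLS handshake failed",
}
if is_potential_attack(doc):
    # The real service would POST a warning to each configured
    # webhook address here instead of printing.
    print("warning: potential attack pattern")
```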


4. Future Work
The DELTA tool provides an initial log auditing approach specific to distributed systems with a
focus on blockchain and DLT platforms. However, it is limited to the Docker engine. Although
widely used and suitable for distributed systems, Docker does not scale effectively to very
large systems. Consequently, the next step will be to integrate DELTA with Kubernetes, an
open-source platform, initially developed by Google, for managing workloads and orchestrating
containerized services, which simplifies both system configuration and automation of service
delivery practices in very large systems.


5. Conclusion
The exponential growth of distributed systems in recent years, mainly due to the advent of
Distributed Ledger Technology and, in particular, blockchains, has led to greater attention to
these systems’ security and analysis issues. In particular, the security of the information present
within distributed systems is paramount because these systems are being increasingly deployed
into contexts characterized by sensitive information. For this purpose, we have designed and
implemented the DELTA tool, which collects and stores logs generated by the services that make up
the system through some of the components provided by the Elastic ecosystem
for data analytics. The logs are then suitably processed to simplify their analysis. Finally, their
text content goes through Natural Language Processing to extract keywords and sentiment and
is classified according to relevant categories. Keywords enable effective log search, and their
extraction can be tailored to needs by choosing between KeyBERT, which is more precise, and
YAKE!, which is faster and lighter. Sentiment analysis is performed through the VADER algorithm to


measure the degree of text sentiment, where a significant degree of negativity warns about the
need to carry out thorough checks on what is happening to the system. Logs are classified into
categories that can be set based on the characteristics of the execution system, so that they
can be accessed according to their classification.
   The purpose of these analytical capabilities is to effectively provide the information extracted
from the logs to external monitoring processes, which can thus carry out specific and detailed
analyses based on the problems at hand and consequently mitigate them. To this end DELTA
was made customizable and is interfaceable with other platforms through REST APIs to query
the system and suitably filter content.


References
 [1] Stanislaw Adaszewski, Pascal Kuner, and Ralf J Jaeger. Automatic pharma news catego-
     rization. arXiv preprint arXiv:2201.00688, 2021.
 [2] Shiroq Al-Megren, Shada Alsalamah, Lina Altoaimy, Hessah Alsalamah, Leili Soltanisehat,
     Emad Almutairi, et al. Blockchain use cases in digital sectors: A review of the literature. In
     2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing
     and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom)
     and IEEE Smart Data (SmartData), pages 1417–1424. IEEE, 2018.
 [3] Nitish Andola, Manas Gogoi, S Venkatesan, Shekhar Verma, et al. Vulnerabilities on
     hyperledger fabric. Pervasive and Mobile Computing, 59:101050, 2019.
 [4] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis,
     Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich,
     et al. Hyperledger fabric: a distributed operating system for permissioned blockchains. In
     Proceedings of the thirteenth EuroSys conference, pages 1–15, 2018.
 [5] Paolo Bottoni, Nicola Gessa, Gilda Massa, Remo Pareschi, Hesham Selim, and Enrico
     Arcuri. Intelligent smart contracts for innovative supply chain management. Frontiers in
     Blockchain, 3:52, 2020.
 [6] Ricardo Campos, Vítor Mangaravite, Arian Pasquali, Alípio Jorge, Célia Nunes, and Adam
     Jatowt. Yake! keyword extraction from single documents using multiple local features.
     Information Sciences, 509:257–289, 2020.
 [7] Michael P Cangemi and Gerard Brennan. Blockchain auditing–accelerating the need for
     automated audits! EDPACS, 59(4):1–11, 2019.
 [8] Federico Carlini, Roberto Carlini, Stefano Dalla Palma, Remo Pareschi, and Federico
     Zappone. The genesy model for a blockchain-based fair ecosystem of genomic data.
     Frontiers in Blockchain, 3:57, 2020.
 [9] Baiyun Cui, Yingming Li, Ming Chen, and Zhongfei Zhang. Fine-tune BERT with sparse
     self-attention mechanism. In Proceedings of the 2019 Conference on Empirical Methods
     in Natural Language Processing and the 9th International Joint Conference on Natural
     Language Processing (EMNLP-IJCNLP), pages 3548–3553, Hong Kong, China, November
     2019. Association for Computational Linguistics.
[10] Ahaan Dabholkar and Vishal Saraswat. Ripping the fabric: Attacks and mitigations




     on hyperledger fabric. In International Conference on Applications and Techniques in
     Information Security, pages 300–311. Springer, 2019.
[11] Stefano Dalla Palma, Remo Pareschi, and Federico Zappone. What is your distributed
     (hyper) ledger? In 2021 IEEE/ACM 4th International Workshop on Emerging Trends in
     Software Engineering for Blockchain (WETSEB), pages 27–33. IEEE, 2021.
[12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina N. Toutanova. Bert: Pre-training
     of deep bidirectional transformers for language understanding, 2018.
[13] Ali Dorri, Marco Steger, Salil S Kanhere, and Raja Jurdak. Blockchain: A distributed solution
     to automotive security and privacy. IEEE Communications Magazine, 55(12):119–125, 2017.
[14] John R. Douceur. The sybil attack. In Peter Druschel, M. Frans Kaashoek, and Antony
     I. T. Rowstron, editors, Peer-to-Peer Systems, First International Workshop, IPTPS 2002,
     Cambridge, MA, USA, March 7-8, 2002, Revised Papers, volume 2429 of Lecture Notes in
     Computer Science, pages 251–260. Springer, 2002.
[15] Samuel Fosso Wamba, Jean Robert Kala Kamdjoug, Ransome Epie Bawack, and John G
     Keogh. Bitcoin, blockchain and fintech: a systematic review and case studies in the supply
     chain. Production Planning & Control, 31(2-3):115–142, 2020.
[16] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput.,
     9(8):1735–1780, 1997.
[17] Prashant Johri, Sunil K Khatri, Ahmad T Al-Taani, Munish Sabharwal, Shakhzod Suvanov,
     and Avneesh Kumar. Natural language processing: History, evolution, application, and
     future work. In Proceedings of 3rd International Conference on Computing Informatics and
     Networks, pages 365–375. Springer, 2021.
[18] Łukasz Kufel. Tools for distributed systems monitoring. Foundations of Computing and
     Decision Sciences, 41(4):237–260, 2016.
[19] Felix Lau, Stuart H. Rubin, Michael H. Smith, and Ljiljana Trajkovic. Distributed denial
     of service attacks. In Proceedings of the IEEE International Conference on Systems, Man &
     Cybernetics: ”Cybernetics Evolving to Systems, Humans, Organizations, and their Complex
     Interactions”, Sheraton Music City Hotel, Nashville, Tennessee, USA, 8-11 October 2000, pages
     2275–2280. IEEE, 2000.
[20] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed,
     Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. Bart: Denoising sequence-to-sequence
     pre-training for natural language generation, translation, and comprehension. arXiv
     preprint arXiv:1910.13461, 2019.
[21] Giorgio Alessandro Motta, Bedir Tekinerdogan, and Ioannis N Athanasiadis. Blockchain
     applications in the agri-food domain: the first wave. Frontiers in Blockchain, 3:6, 2020.
[22] Cathrine Paulsen. Revisiting smart contract vulnerabilities in hyperledger fabric.
     https://cse3000-research-project.github.io/static/d23cb4583e6d97a1e509eafda859c424/poster.pdf, 2021.
[23] Pierluigi Di Pilla, Remo Pareschi, Francesco Salzano, and Federico Zappone. Hyperledger
     fabric attacks mitigation (extended abstact). In FOCODILE 2022 - 3rd International Workshop
     on Foundations of Consensus and Distributed Ledgers, 2022.
[24] Amit Chaudhary. Zero shot learning for text classification. Blog post, amitness.com.
[25] Atul Singh, Miguel Castro, Peter Druschel, and Antony I. T. Rowstron. Defending against
     eclipse attacks on overlay networks. In Yolande Berbers and Miguel Castro, editors,


     Proceedings of the 11th ACM SIGOPS European Workshop, Leuven, Belgium, September 19-22,
     2004, page 21. ACM, 2004.
[26] Atro Voutilainen. Part-of-speech tagging. The Oxford handbook of computational linguistics,
     pages 219–232, 2003.
[27] Qiang Wang and Min Su. Integrating blockchain technology into the energy sector—from
     theory of blockchain to research and application of energy blockchain. Computer Science
     Review, 37:100275, 2020.
[28] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony
     Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Huggingface’s
     transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771,
     2019.
[29] Wenpeng Yin, Jamaal Hay, and Dan Roth. Benchmarking zero-shot text classification:
     Datasets, evaluation and entailment approach. In Kentaro Inui, Jing Jiang, Vincent Ng,
     and Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in
     Natural Language Processing and the 9th International Joint Conference on Natural Language
     Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pages 3912–3921.
     Association for Computational Linguistics, 2019.
[30] Zibin Zheng, Shaoan Xie, Hongning Dai, Xiangping Chen, and Huaimin Wang. An
     overview of blockchain technology: Architecture, consensus, and future trends. In 2017
     IEEE international congress on big data (BigData congress), pages 557–564. IEEE, 2017.




     A. Structure of Processed Logs

{
  "_index": "data_parsed",
  "_type": "_doc",
  "_id": "HkEy0n8BYv2pyDWhsjXR",
  "_score": 1.0,
  "_source": {
    "@timestamp": "2022-03-20T12:48:09.287Z",
    "type": "ERROR",
    "message": "[31m2022-03-20 12:48:09.287 UTC 007d ERRO [0m [core.comm] [31;1mServerHandshake [0m -> Server TLS handshake failed in 607.238772ms with error server=Orderer remoteaddress=167.94.138.46:45236",
    "sentiment": "very negative",
    "id_log": "MQhep38BYv2pyDWh9xpE",
    "container": {
      "id": "0c06a22e0a5f43b7d2ef9b6bfbfa227ae828a3fd2be0e1ab44e3f46926106640",
      "name": "orderer.example.com",
      "image": {
        "name": "hyperledger/fabric-orderer:2.4.2"
      }
    },
    "keywords": [
      [
        "1mserverhandshake",
        "remoteaddress",
        "failed"
      ]
    ],
    "classification_labels": [
      [
        "Communication",
        "Security",
        "Connection"
      ]
    ],
    "ip": [
      "167.94.138.46:45236"
    ],
    "name_image_doc": {
      "name": "hyperledger/fabric-orderer:2.4.2"
    },
    "stream": "stderr"
  }
}

        Listing 1: Simplified structure of processed logs used on Hyperledger Fabric network



