A Forensic Methodology for the Identification of Illicit Data Leakage

Alessandro Simonetta¹, Luciano Fazio² and Maria Cristina Paoletti¹

¹ Department of Enterprise Engineering, University of Rome “Tor Vergata”, Via del Politecnico n.1, 00133, Rome, Italy
² Studio Giorgio®, via Gallarate n.112, 20155, Milan, Italy

SYSTEM 2021 @ Scholar’s Yearly Symposium of Technology, Engineering and Mathematics. July 27–29, 2021, Catania, IT
alessandro.simonetta@gmail.com (A. Simonetta); lf@albertogiorgio.com (L. Fazio); mariacristina.paoletti@gmail.com (M. C. Paoletti)
https://www.albertogiorgio.com/ (L. Fazio)
ORCID: 0000-0003-2002-9815 (A. Simonetta)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract
The digital revolution has had, and is still having, profound impacts on modern society and, with it, we are witnessing the birth of new digital illicit acts, increasingly widespread both in Italy and in the USA. The most common case is the exfiltration of company data by unfaithful employees or former employees who, driven by economic interests, act imprudently in the belief that such activities are difficult to detect. This article presents a methodology that makes it possible to detect such behaviors, in compliance with existing laws, using dedicated software tools. Furthermore, this approach, which makes use of sophisticated data analysis techniques, must provide results that are immediately understandable and accessible even to someone who is not a technical expert in the field, such as a lawyer or a judge. For this reason, particular emphasis is given to the presentation of the evidence found through the formulation of a technical-legal report.

Keywords
computer forensics, digital proof, forensic tools, forensic analysis, civil illicit, civil proceeding, presentation of evidence

1. Introduction

When we use any electronic device, such as a computer or smartphone, our activities remain permanently recorded in the device’s memory. These data are commonly called "digital traces" [1], and which traces are created depends on the type of activity the user has carried out, such as:

• execution of a system program;
• reading or copying a file;
• sending or receiving files via the Internet;
• printing a file;
• access to a network resource;
• access to a remote or cloud system;
• execution of a query on a database.

However, in order for a digital trace to be used in any proceeding and, therefore, to take on probative value (thus becoming digital evidence), it must be:

• authentic: there must be absolute certainty about the authenticity of the source from which it comes;
• intact: a series of procedural precautions must be taken during its collection, so as not to alter its form or content in any way;
• truthful: obtained through the correct interpretation of the computer data;
• complete: obtained with the certainty of having analyzed all the aspects connected to it, without leaving out relevant information that could modify its status;
• forensic: obtained by respecting the laws in force [2].

In the field of private law, digital evidence is comparable to an IT document as defined in the Italian Digital Administration Code, Legislative Decree 07/03/2005 n.82. In it we find the prerequisites that a digital document should have in order to be suitable for evidential evaluation. Unfortunately, because of the continuous technological evolution of the subject, legal rules often fail to stay synchronized with changing technical standards [3].

Therefore, the lack of some key concepts, such as the criteria for the identification, collection, acquisition, storage and transport of digital evidence, has led to the use of the international standard ISO/IEC 27037:2012 (Fig. 1; see https://www.iso.org/home.html and https://www.iec.ch/homepage). For the management of digital evidence, the standard identifies four fundamental characteristics:

• verifiability: any involved party must be able to evaluate the activities carried out in each phase of the life of a piece of digital evidence;
• repeatability: any involved party must be able to reach the same results and, therefore, the same digital evidence, starting from the same conditions and following the same actions performed during the data analysis;
• reproducibility: any involved party must be able to reach the same results and, therefore, the same digital evidence, using different tools than the original ones, in order to demonstrate that under certain conditions the original result is achieved regardless of the instrument used;
• justifiability: the operator who analyzed the data that led to the digital evidence must be able to justify every action and all the methods used to arrive at the result.

Figure 1: Scheme for the treatment of digital evidence

The use of the standard makes it possible to guarantee the integrity of the digital evidence through the acquisition phase and the subsequent analysis phase and, at the same time, to make the evidence admissible in a proceeding.

It should be recalled that the GDPR (General Data Protection Regulation) [4] was adopted in 2016 and has applied in Italy since May 2018. The introduction of a strict regulation on personal data, however, had no impact on the processing of digital evidence, since art. 9 (para. 2, letter f) of the Regulation provides that the processing of personal data is lawful if it is necessary to ascertain, exercise or defend a right in court or whenever the judicial authorities exercise their judicial functions [5].

2. Data collection in the United States and in Italy

The preliminary stage of a civil proceeding in Italian law is the crystallization of the evidence in order to make it legally usable. This crystallization can take place through an acquisition carried out by bailiffs or through inclusion in the case filings directly by the parties. In the latter circumstance, the parties involved are not obliged to present all the evidence (if, for example, it is not in their favor), but they must ensure, in any case, that what they produce complies with the regulations in force on the admissibility of evidence.

In United States procedural law, on the other hand, a proceeding is preceded by a preliminary phase called discovery (known in England as disclosure). During this phase, the parties can both obtain evidence relating to their own claims (evidence gathering) and investigate the opposing side to seek new information, in the hope of obtaining further evidence admissible at the hearing (evidence seeking) [6].

In case of non-production of documents or, even worse, of incorrect or inadequate conservation of electronic documents, the consequences are serious and can compromise the subsequent procedural phase.

In the Italian legal system there is a similar but less effective mechanism (art. 210 cpc, the Italian Code of Civil Procedure) [7], which is based on a diametrically opposite principle: the investigating judge, within certain limits (art. 118 cpc) and at the request of a party, can order the other party or a third party to show in court a document or other item that the judge deems necessary for the trial. Both legal systems, however, agree on the methods for the material collection of digital data, the so-called acquisition, and for its preservation from voluntary and involuntary alterations.

In order for an acquisition to produce digital data that can be used in any type of judicial procedure, in addition to being performed according to the standards already described, it must be accompanied by a document that describes all the handovers that the medium being acquired undergoes between its identification, its possible seizure (where provided for) and the crystallization of the data within it. This document is known as the “Chain of Custody”. It is the answer, tested by practice, to a rule of the discipline of evidence acquisition: the party interested in the acquisition of an object must present sufficient elements to show that it corresponds to what it is claimed to be [8].

At this point it is necessary to identify, from a variety of candidate tools, the one suitable for physically carrying out the data acquisition [9]. Once the tool has been chosen, we move on to the extraction of the data from the digital source and to the creation of the so-called forensic image, in one of the available formats and in relation to the goal we want to achieve [10].

Before starting the analysis, it is necessary to confirm that what was collected and crystallized corresponds exactly to the original. This is possible by generating the hash code of the two objects, which must give the same result. The use of this hashing technique makes it possible to verify the exact correspondence of the two objects in any phase of the process.

3. Case study

Between 2018 and 2020, in the United States, incidents caused by (or involving) internal staff increased by 47% [11]. The frequency of incidents varies according to the type of company. The Verizon 2021 Data Breach Investigations Report [12] provides an overview of the different types of incidents in the various types of companies involved. Companies in the Health and Finance sectors recorded the largest number of incidents caused by the incorrect use of their employees’ access privileges and suffered the largest number of data thefts. According to a 2020 statistic covering 300 incidents across 8 industry sectors [13], the exfiltration of data by unfaithful employees in the United States was perpetrated in as many as 43% of the cases by forwarding to personal email accounts and in 16% of the cases through the incorrect use of cloud sharing privileges. The remaining cases involve USB devices (9%) and other techniques (see Table 1).

Table 1
Typical unfaithful employee behavior in US

Behavior                                          %
E-mail forwarding to personal e-mail account      43.75
Misusing cloud collaboration privileges           16.07
Data aggregation - downloads                      10.71
Using unauthorized/unencrypted USB devices         8.93
Data snooping using SharePoint                     8.04
Data exfiltration using external sites             6.25
E-mails sent to non-business domains               3.57
E-mails sent to competitor domains                 2.68

It is interesting to note that the main reasons that induce employees to act in this way [11, 14, 15] are economic (64%), followed by espionage (17%), entertainment (17%) and resentment (14%). So, if the economic leverage is so strong, it will be even stronger for a former employee, who will feel free from constraints after leaving the old company.

For this reason we will analyze the case study of the former employee who, after moving from one company to another, uses documents owned by the old company in the new one (e.g. customer lists, company secrets, confidential information, source code or bank data).

The former employee could be charged with various violations, for example of the contractual rules that bind him to the old company, or of the rules in force on copyright protection, company law or unfair competition (art. 2598 cc, the Italian Civil Code).

According to data kindly provided by Studio Giorgio® (https://www.albertogiorgio.com/) on over 100 cases handled in Italy, the exfiltration techniques used by former employees are the same as those used in the US (Table 2): sending emails to personal mailboxes accounts for roughly half of the cases (51.75%), while external USB devices account for over 30% (much higher than the 9% of the US statistic).

Table 2
Typical unfaithful employee behavior in Italy (IT)

Behavior                                          %
E-mail forwarding to personal e-mail account      51.75
Using unauthorized/unencrypted USB devices        30.80
Data exfiltration using external sites             9.85
Others                                             7.60

3.1. Forensic analysis software platforms

To prove wrongdoing by a former employee, the company has the right, by virtue of the clauses normally required for the use of company tools (PC, telephone, e-mail box, storage disks, ...), to access the information contained therein. There are various software platforms on the market [16] that support the digital forensic expert in all activities, starting from the creation of forensic images [17]:

• AccessData FTK (Forensic ToolKit): https://www.exterro.com/forensic-toolkit
• X-Ways Forensics: http://www.x-ways.net/forensics/
• EnCase: https://security.opentext.com/encase-forensic
• Magnet AXIOM: https://www.magnetforensics.com/products/magnet-axiom/

All data analysis platforms have peculiarities that distinguish them from each other and, therefore, pros and cons, but all strive to provide a comprehensive solution for the analysis of the most common hardware/software environments.

These tools can process a huge amount of data but, before their functions can be fully used, the computer on which they run must carry out a preliminary data processing phase.

During the pre-processing phase, the entire content of the data extracted from the original exhibits is read (in the form of a forensic image) and, by means of machine learning techniques [18][19][20], now increasingly used in various scientific contexts [21][22][23][24][25][26], the data and images are classified and indexed [27][28], thus producing the most common artifacts of the source system. This phase is typically computationally onerous and requires a fair amount of time, which often clashes with the need for a fast analysis. For this reason, new computing architectures are being studied, such as quantum computing [29] or computing solutions based on multi-valued logic (MVL) [30][31].

Once this phase has been completed, the analysis software gives access to the statistically most relevant behaviors of the former employee, such as:

1. USB devices connected in the last working period of the former employee [32];
2. files accessed and possibly copied to an external device or remotely, in the last working period;
3. cloud storage services used without authorization from the company;
4. emails containing company information sent to personal email addresses;
5. printing of company documentation.

Once this minimum set of information has been obtained, there are sufficient elements for a legitimate suspicion (if not proof) of the export of confidential and protected company data. However, this approach is not comprehensive, because it considers only the activities performed by employees on corporate devices and tools. As the internal network can be a valid vehicle for disseminating data, and can also be used from non-company workstations, the analysis can be extended to this class of activity as well.

For example, network monitoring and logging tools (if present), such as QRadar Risk Manager, CA Spectrum and Netwrix Auditor [33], make it possible to detect, even after the fact, the data read or copied by a specific user within the company storage.

It is important to underline that the initial crystallization of the entire amount of data to which the former employee had access during the employment relationship (forensic image) is fundamental both as a term of comparison in the search for the exported data and to demonstrate the origin of the data for which protection is requested.

All these analyses assume that the employee used a digital data transfer. There are also analog modes, which leave no trace and are more difficult to detect, such as a photograph of a screen.

However, it should also be considered that enterprise-level companies should adopt solutions that protect data dynamically, also based on the type of request. In fact, there is database access monitoring software that can detect "suspicious" activities that are not permitted.

Finally, the expert will draw up a technical report aimed at showing the evidence found.

3.2. Presentation of the evidence

The presentation of the results of a forensic analysis is crucial to understanding the behavior of the former employee.

The presentation takes the form of a technical-legal report, which brings together what was found with any specific violations identified.

Fig. 2 shows the structure of an expert report in its fundamental sections. The aim is to highlight the evidence found, using language that can be understood even by a non-technical reader such as a lawyer or a judge.

Figure 2: Scheme for the presentation of the evidence

The premise states who conferred the assignment, the objective of the assignment and any other element useful to motivate the choices made in the methodology adopted. Sometimes it is useful, already at this stage, to give the reader a preview of the evidence subsequently found.

The following sections are all intended for an expert in the field; therefore they describe in technical terms how the various operations were carried out.

The description of the acquisition operations section contains all the elements necessary to verify the integrity and authenticity of what has been acquired, at any time after the presentation of the technical report. It is essential to set out in detail the acquisitions performed, describing each intervention up to the creation of the forensic image.

The description of the analysis operations section describes all the technical methods implemented for the analysis of the forensic images of the acquired exhibits. In it, it is important to indicate the tools used to process the data, but also the logical process used to examine them.

We then arrive at the (technical-legal) section presenting the results. The goal is to describe, in a clear, linear and precise way, every piece of evidence found and every element useful to describe the events, creating a sort of timeline [34] of them. It is also useful to insert fragments of data, screenshots or reports extracted from forensic software, in order to match the resulting evidence with the underlying objective data.

The technical-legal suggestions section contains any proposals or indications so that whoever is entitled to bring a legal proceeding can act in the most appropriate and technically correct manner. Furthermore, it is appropriate to recall the existence of the touchstone created (in the form of a forensic image), which serves to identify the material presumably extracted in the subsequent stages of the procedure to be established.

Finally, the conclusions should contain a broad summary of the previous sections, confirming the evidence found (anticipated in the introduction) and providing a cross-section of the events and actions carried out by the former employee.

4. Conclusions

The Italian judicial system, in the civil field, does not have a specific reference standard for the management of digital evidence; for this reason the methodologies applied by professionals in the sector refer to international standards, such as ISO/IEC 27037:2012.

Furthermore, the methods of introducing digital data into civil proceedings in Italy differ considerably from what happens overseas; however, the method of acquiring and analyzing the evidence remains valid in both systems.

The statistics collected in the USA have shown an ever greater growth in data exfiltration from companies, identifying the unfaithful employee as the most frequent cause. In Italy, we observed the same trend for the former employee, who fraudulently commits the same offense using the same techniques.

This article describes the methodology to be adopted to protect the company in the event of data exfiltration by a former employee and the correct procedure that the forensic technician must follow to identify any offenses. The method of presenting the results gives a non-technical reader all the tools needed to act in the best possible way during all the subsequent phases of the procedure that will be established against the former employee.

References

[1] R. Brighi, Informatica forense, algoritmi e garanzie processuali, Ars interpretandi, Rivista di ermeneutica giuridica (2021). doi:10.7382/100798.
[2] V. G. Calabro, La fragilità delle tracce digitali, Master breve in Diritto e Tecnologie Informatiche (2009). doi:10.13140/RG.2.2.27355.62240.
[3] C. Galli, A. Giorgio, et al., L’acquisizione forense delle prove in materia di violazione dei segreti aziendali, in: Il nuovo diritto del know how e dei segreti commerciali, Wolters Kluwer, 2018.
[4] European Union, Regulation 2016/679 (General Data Protection Regulation), 2016. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679.
[5] G. Barrera, Il trattamento ai fini di ricerca dei dati personali relativi a condanne penali e reati. a proposito di gdpr, Rivista di studi e ricerche sulla criminalità organizzata (2019). doi:10.13130/cross-11272.
[6] M. Gradi, L’obbligo di verità delle parti, ISBN 9788892114036, G. Giappichelli Editore, 2018.
[7] L. Dittrich, L’esibizione delle prove, in: Diritto Processuale Civile, Utet Giuridica, 2019.
[8] L. Bartoli, La catena di custodia del materiale informatico: soluzioni a confronto, Universidad de La Laguna. Servicio de Publicaciones, España (2016). URL: http://riull.ull.es/xmlui/handle/915/6247.
[9] M. Faiz, W. Prabowo, Comparison of acquisition software for digital forensics purposes, 2018. doi:10.22219/KINETIK.V4I1.687.
[10] E. Akbal, S. Dogan, Forensics image acquisition process of digital evidence, International Journal of Computer Network & Information Security (2018).
[11] Insider threat statistics you should know, 2021. URL: https://www.tessian.com/blog/insider-threat-statistics/.
[12] Verizon 2021 breach investigations report, 2021. URL: https://www.verizon.com/business/en-sg/resources/reports/dbir/.
[13] Most common data exfiltration behaviors during insider threats in the united states in 2020, https://www.statista.com/statistics/1155846/most-common-data-exfiltration-insider-threat-types-usa/, 2020.
[14] R. Avanzato, F. Beritelli, M. Russo, S. Russo, M. Vaccaro, Yolov3-based mask and face recognition algorithm for individual protection applications, in: CEUR Workshop Proceedings, 2020, pp. 41–45.
[15] G. Capizzi, C. Napoli, S. Russo, M. Woźniak, Lessening stress and anxiety-related behaviors by means of ai-driven drones for aromatherapy, volume 2594, 2020, pp. 7–12.
[16] Popular computer forensics, 2021. URL: https://resources.infosecinstitute.com/topic/computer-forensics-tools/.
[17] K. Ghazinour, D. M. Vakharia, K. C. Kannaji, R. Satyakumar, A study on digital forensic tools, IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI) (2017) 3136–3142. doi:10.1109/ICPCSI.2017.8392304.
[18] R. M. A. Mohammad, M. Alqahtani, A comparison of machine learning techniques for file system forensics analysis, Journal of Information Security and Applications 46 (2019) 53–61. URL: https://www.sciencedirect.com/science/article/pii/S2214212618307579. doi:10.1016/j.jisa.2019.02.009.
[19] C. Napoli, G. Pappalardo, E. Tramontana, A mathematical model for file fragment diffusion and a neural predictor to manage priority queues over bittorrent, International Journal of Applied Mathematics and Computer Science 26 (2016) 147–160.
[20] S. Spanò, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Matta, A. Nannarelli, M. Re, An efficient hardware implementation of reinforcement learning: The q-learning algorithm, IEEE Access 7 (2019) 186340–186351. doi:10.1109/ACCESS.2019.2961174.
[21] A. A. Jaber, R. Bicker, Fault diagnosis of industrial robot gears based on discrete wavelet transform and artificial neural network, Insight 58 (2016) 179–186. doi:10.1784/INSI.2016.58.4.179.
[22] M. Wozniak, D. Polap, G. Borowik, C. Napoli, A first attempt to cloud-based user verification in distributed system, in: 2015 Asia-Pacific Conference on Computer Aided System Engineering, IEEE, 2015, pp. 226–231.
[23] A. A. Jaber, A. Saleh, H. F. M. Ali, Prediction of hourly cooling energy consumption of educational buildings using artificial neural network, International Journal on Advanced Science, Engineering and Information Technology 9 (2019) 159–166. doi:10.18517/IJASEIT.9.1.7351.
[24] A. A. Jaber, K. M. Ali, Artificial neural network based fault diagnosis of a pulley-belt rotating system, International Journal on Advanced Science, Engineering and Information Technology 9 (2019) 544–551. doi:10.18517/IJASEIT.9.2.7426.
[25] F. Bonanno, G. Capizzi, L. G. Sciuto, A neuro wavelet-based approach for short-term load forecasting in integrated generation systems, in: 2013 International Conference on Clean Electrical Power (ICCEP), 2013, pp. 772–776. doi:10.1109/ICCEP.2013.6586946.
[26] F. Bonanno, G. Capizzi, G. Lo Sciuto, C. Napoli, Wavelet recurrent neural network with semi-parametric input data preprocessing for micro-wind power forecasting in integrated generation systems, 2015, pp. 602–609. doi:10.1109/ICCEP.2015.7177554.
[27] G. C. Cardarilli, L. D. Nunzio, R. Fazzolari, D. Giardino, A. Nannarelli, M. Re, S. Spanò, A pseudo-softmax function for hardware-based high speed image classification, Scientific Reports 11 (2021). doi:10.1038/s41598-021-94691-7.
[28] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, M. Woźniak, A novel neural networks-based texture image processing algorithm for orange defects classification, Int. J. Comput. Sci. Appl. 13 (2016) 45–60.
[29] S. K. Sharma, M. Khaliq, The role of quantum computing in software forensics and digital evidence: Issues and challenges, in: Limitations and Future Applications of Quantum Cryptography, IGI Global, 2021, pp. 169–185. doi:10.4018/978-1-7998-6677-0.ch009.
[30] A. Simonetta, M. C. Paoletti, M. Muratore, A new approach for designing of computer architectures using multi-value logic, International Journal on Advanced Science, Engineering and Information Technology (in press).
[31] A. Simonetta, M. C. Paoletti, Designing digital circuits in multi-valued logic, International Journal on Advanced Science, Engineering and Information Technology 8 (2018) 1166–1172. URL: http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=5966. doi:10.18517/ijaseit.8.4.5966.
[32] A. Neyaz, N. Shashidhar, Usb artifact analysis using windows event viewer, registry and file system logs, Electronics (2019). doi:10.3390/electronics8111322.
[33] A. Khurat, P. Sangkhachantharanan, An automatic networking device auditing tool based on cis benchmark, 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) (2021). doi:10.1109/ECTI-CON51831.2021.9454830.
[34] V. Calabrò, P. D. Checco, B. Fiammella, La timeline: aspetti tecnici e rilevanza processuale, IISFA Memberbook (2011).
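
As an illustration of the integrity check described in Section 2 (confirming, by comparing hash codes, that the forensic image corresponds exactly to the original source), the following is a minimal sketch and not the authors' tooling. It rests on several assumptions that do not come from the paper: SHA-256 is chosen only as an example algorithm (the text does not name one), the image is assumed to be a raw bit-for-bit copy so that its bytes equal those of the source, and the function names `file_digest` and `image_matches_source` are hypothetical. Container image formats that embed their own metadata would instead need the verification offered by the acquisition tool itself.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)  # algorithm choice is an assumption, not from the paper
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def image_matches_source(source: Path, image: Path, algorithm: str = "sha256") -> bool:
    """Return True when the source and the (raw) forensic image produce the same digest."""
    return file_digest(source, algorithm) == file_digest(image, algorithm)
```

In practice the digest computed at acquisition time would be recorded (for example in the chain of custody document), so that the same comparison can be repeated by any party at any later phase, in line with the repeatability requirement discussed in the Introduction.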