=Paper=
{{Paper
|id=Vol-2621/CIRCLE20_35
|storemode=property
|title=Prediction and Visual Intelligence for Security Information: The PREVISION H2020 Project
|pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_35.pdf
|volume=Vol-2621
|authors=Konstantinos Demestichas,Thi Bich Ngoc Hoang,Josiane Mothe,Olivier Teste,Md Zia Ullah
|dblpUrl=https://dblp.org/rec/conf/circle/DemestichasHMTU20
}}
==Prediction and Visual Intelligence for Security Information: The PREVISION H2020 Project==
* Konstantinos Demestichas, Institute of Communication and Computer Systems, Athens, Greece, cdemest@cn.ntua.gr
* Thi Bich Ngoc Hoang, IRIT UMR5505 CNRS, Toulouse, France; Danang Univ. of Economics, Vietnam, Thi-Bich-Ngoc.Hoang@irit.fr
* Josiane Mothe, IRIT UMR5505 CNRS, INSPE, Univ. de Toulouse, Josiane.Mothe@irit.fr
* Olivier Teste, IRIT UMR5505 CNRS, Univ. de Toulouse, France, Olivier.Teste@irit.fr
* Md Zia Ullah, IRIT UMR5505 CNRS, Toulouse, France, mdzia.ullah@irit.fr

CIRCLE’20, July 06–09, 2020, Samatan, France. "Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

===ABSTRACT===
This paper presents the ongoing work within the PREVISION H2020 project. The mission of PREVISION is to empower the analysts and investigators of agencies with tools and solutions not commercially available today, to handle and capitalize on the massive heterogeneous data streams that must be processed during complex crime investigations and threat risk assessments.

===CCS CONCEPTS===
* Information systems → Information integration
* Computing methodologies → Machine learning algorithms

===KEYWORDS===
Data stream management, Data heterogeneity, Cybercrime, Social media analysis, Data fusion, Linguistic analysis, Machine Learning

===1 INTRODUCTION===
The emerging threats caused by terrorism, organized crime and cybercrime, as interlinked cross-border challenges, show how important a joint European answer to these threats is. In particular, the protection of so-called soft targets is a challenge for LEAs (Law Enforcement Agencies). In these complex cases, investigators are also increasingly confronted with huge amounts of data, which have to be analysed in a short time. The heterogeneous nature of these data streams forces the LEAs to link them together and prioritize them in order to understand and analyse them. PREVISION intends to improve LEAs' operational capacities and capabilities by providing a unique and innovative platform. This platform is built by 28 consortium partners (IT companies, universities, research centers...) from 13 different European countries. Moreover, some results are inherited from other projects of PREVISION partners.

===2 PREVISION OBJECTIVES===
PREVISION has seven specific and measurable objectives (SO), which are described in Figure 1 and are as follows:
* SO-1: Deliver an open, scalable and customizable toolset that provides support for extreme-scale data streams analytics,
* SO-2: Semantically integrate heterogeneous data streams, delivering powerful knowledge graphs combined with advanced reasoning and machine learning engines,
* SO-3: Configure and tailor situation awareness enabling techniques and applications to meet specific operational needs of LEAs and address human factors,
* SO-4: Integrate and deploy the developed functions and capabilities into a common platform architecture, making it available to end-users for thorough validation,
* SO-5: Demonstrate and evaluate the developed technologies in realistic cases, organize relevant training activities and create a framework for the transfer of knowledge in the use of PREVISION tools from one LEA to another,
* SO-6: Ensure compliance with the legal, ethical, privacy, societal and court-acceptance guidelines and EU best practices,
* SO-7: Ensure the high multi-dimensional impact, continuity and business perspective of project results and allow for incremental investments.

Figure 1: Overview of special objectives of PREVISION.

===3 USE CASES AND DATA===
Five use cases have been developed. In each of them, the LEAs described a typical case, according to their interests. These use cases will be the basis for any testing done within the framework of the project. When defining the initial use cases, the LEAs also described their currently implemented procedures and structures. This will be the basis for identifying problems and weaknesses of the current procedures. The reported timeline of the events will be useful to provide an overview of the particular data sources selected for the specific investigation that the PREVISION platform has to be able to analyse. LEAs also identified all end-user requirements for the development of the PREVISION platform. By distinguishing between the different requirement categories “Security”, “Functional”, “Operational” and “Communication”, a complex picture of all needs will be compiled. To ensure interoperability of the requirements, they have been prioritized using the MoSCoW methodology (Must have, Should have, Could have, and Won’t have) [5]. The initial use cases are briefly detailed in Table 1.

{| class="wikitable"
|+ Table 1: Topics of the use cases defined by LEA partners.
! Use case !! Topic
|-
| UC1 || Soft targets protection – Attempted terrorist attack at stadium
|-
| UC2 || Radicalization detection and terrorist threat prevention – Terrorist threats at EU summit
|-
| UC3 || Financial crime investigation – Detection of fraudulent companies
|-
| UC4 || Fighting cyber-enabled crime – CNP fraud as terrorist act facilitator
|-
| UC5 || Illicit markets investigation – Trafficking of cultural goods
|}

Analysing big data coming from diverse sources such as camera devices, the deep web, the dark web, etc. has become a big challenge in the field of security, and these data are indeed necessary to develop the five use cases the project targets. These data sources are of three main types: video surveillance cameras, deep/dark/shallow web, and social network data. The collection of data that will be used by PREVISION’s platform includes datasets crawled from the deep/dark/shallow web. These are text-based pseudonymized data sets. PREVISION also considers visual content generated by CCTVs or video files, as well as social network data.

===4 HANDLING HETEROGENEOUS DATA===
These heterogeneous data sources need to be managed and analysed in a short time despite the vast amount of data. NoSQL data stores are well-tailored to efficiently load and manage massive collections of heterogeneous data without any structural validation (schemaless principle). This flexibility becomes a serious challenge when querying data: users have both to build queries that take multi-structured datasets into account and to reformulate existing queries whenever new structures are introduced. This also implies setting up modules for homogenizing the data search and analysis. Among them, we will develop a component following the approach developed by Ben Hamadou et al. [1] for building schema-independent queries, which is designed to query multi-structured datasets in NoSQL document stores such as MongoDB. This component automates the process of query reformulation via a set of rules that reformulate most document store operators (select, project, unwind, aggregate and lookup). The component then produces queries across multi-structured documents, which are compatible with the native query engine (MongoDB) of the underlying document store. The schema of this component is presented in Figure 2.

Figure 2: Schema independent querying component flow

The community detection and key actor identification framework is one of the tools of the PREVISION platform; preliminary results have already been proposed [2, 4]. PREVISION linguistic analysis is based on multiple entities and multiple languages. Social analytics services could consider the proposed linguistic features, which are the outcome of the deep linguistic analysis.

===5 CONCLUSIONS===
PREVISION is a two-year project that gathers both LEAs and top-level technical partners including IT companies and universities/labs. This helps to strengthen collaborations of experts from several disciplines. However, differences among viewpoints, or difficulty in seamless integration among partners’ modules, might arise during the project implementation.

The results of the project will be an open and future-proof platform that handles and capitalizes on the massive heterogeneous data streams processed during complex crime investigations and threat risk assessments. This platform will provide cutting-edge practical support to LEAs in their fight against terrorism, organized crime and cybercrime. Results that can be released will be made publicly available, and they will also serve the LEAs in their daily work. A workshop was organized earlier this year on related topics [3].

Ethical issues. In order to achieve its purpose, PREVISION will process large amounts of heterogeneous data, including personal data, and carry out research with humans (interviews, surveys, workshops etc.). This raises key ethical issues. Moreover, there is a risk of misuse of research results for unethical purposes. These issues are carefully taken into account in the project.

Acknowledgments. This work has been performed in the context of the PREVISION project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under GA No 833115. The paper reflects the authors’ view and the Commission is not responsible for any use that may be made of the information it contains.

===REFERENCES===
* [1] Hamdi Ben Hamadou, Faiza Ghozzi, André Péninou, and Olivier Teste. 2019. Schema-independent querying for heterogeneous collections in NoSQL document stores. Inf. Syst. 85 (2019), 48–67. https://doi.org/10.1016/j.is.2019.04.005
* [2] Thi Bich Ngoc Hoang. 2020. Topical Community Detection: an Embedding User and Content Similarity Method. In [3]. 1–7.
* [3] Thi Bich Ngoc Hoang, Pascal Marchand, Béatrice Milard, and Josiane Mothe. 2020. Workshop on Machine Learning for Trend and Weak Signal Detection in Social Networks and Social Media, Toulouse, France, Feb. 27-28, 2020, Proceedings.
* [4] George Kalpakis et al. 2019. Identifying Terrorism-Related Key Actors in Multidimensional Social Networks. In MultiMedia Modeling. 93–105.
* [5] Eduardo Miranda. 2011. Time boxing planning: buffered moscow rules. ACM SIGSOFT Software Engineering Notes 36 (11 2011), 1–5.
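
As an illustration of the schema-independent querying idea discussed in Section 4 (HANDLING HETEROGENEOUS DATA), the sketch below shows one way a select or project operator written against a bare field name could be rewritten to cover every structural variant in a collection. It is a simplified sketch under stated assumptions, not the implementation of Ben Hamadou et al. [1] nor the PREVISION component: the function names (`iter_paths`, `build_dictionary`, `reformulate_select`, `reformulate_project`) and the sample documents are hypothetical, arrays are ignored, and the unwind, aggregate and lookup operators are not covered. The output is a plain Python dictionary in the shape of a MongoDB filter or projection document.

```python
from collections import defaultdict


def iter_paths(doc, prefix=""):
    """Yield the dotted path of every field in a nested document (arrays ignored)."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from iter_paths(value, path)


def build_dictionary(collection):
    """Map each bare field name to the set of absolute paths where it occurs."""
    dictionary = defaultdict(set)
    for doc in collection:
        for path in iter_paths(doc):
            field_name = path.rsplit(".", 1)[-1]
            dictionary[field_name].add(path)
    return dictionary


def reformulate_select(field, condition, dictionary):
    """Rewrite a predicate on `field` as a disjunction over all of its known paths."""
    paths = sorted(dictionary.get(field, {field}))
    clauses = [{path: condition} for path in paths]
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}


def reformulate_project(fields, dictionary):
    """Rewrite a projection so that it keeps every known path of each field."""
    projection = {}
    for field in fields:
        for path in sorted(dictionary.get(field, {field})):
            projection[path] = 1
    return projection


# Hypothetical multi-structured collection: the same information sits at
# different nesting levels depending on the document.
collection = [
    {"title": "Report A", "year": 2019},
    {"info": {"title": "Report B", "year": 2020}},
    {"meta": {"doc": {"title": "Report C"}}, "year": 2018},
]

dictionary = build_dictionary(collection)
query = reformulate_select("title", "Report B", dictionary)
projection = reformulate_project(["year"], dictionary)
print(query)
print(projection)
```

In this sketch the user only names `title`; the dictionary built from the collection supplies the three absolute paths, so the query does not need to be rewritten by hand when a new structure appears, only the dictionary has to be refreshed.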