=Paper= {{Paper |id=Vol-2621/CIRCLE20_35 |storemode=property |title=Prediction and Visual Intelligence for Security Information: The PREVISION H2020 Project |pdfUrl=https://ceur-ws.org/Vol-2621/CIRCLE20_35.pdf |volume=Vol-2621 |authors=Konstantinos Demestichas,Thi Bich Ngoc Hoang,Josiane Mothe,Olivier Teste,Md Zia Ullah |dblpUrl=https://dblp.org/rec/conf/circle/DemestichasHMTU20 }} ==Prediction and Visual Intelligence for Security Information: The PREVISION H2020 Project== https://ceur-ws.org/Vol-2621/CIRCLE20_35.pdf
Prediction and Visual Intelligence for Security Information:
The PREVISION H2020 Project

Konstantinos Demestichas
Institute of Communication and Computer Systems, Athens, Greece
cdemest@cn.ntua.gr

Thi Bich Ngoc Hoang
IRIT UMR5505 CNRS, Toulouse, France; Danang Univ. of Economics, Vietnam
Thi-Bich-Ngoc.Hoang@irit.fr

Josiane Mothe
IRIT UMR5505 CNRS, INSPE, Univ. de Toulouse
Josiane.Mothe@irit.fr

Olivier Teste
IRIT UMR5505 CNRS, Univ. de Toulouse, France
Olivier.Teste@irit.fr

Md Zia Ullah
IRIT UMR5505 CNRS, Toulouse, France
mdzia.ullah@irit.fr

ABSTRACT
This paper presents the ongoing work within the PREVISION H2020 project.
The mission of PREVISION is to empower the analysts and investigators of
agencies with tools and solutions not commercially available today, to
handle and capitalize on the massive heterogeneous data streams that must
be processed during complex crime investigations and threat risk
assessments.

CCS CONCEPTS
• Information systems → Information integration; • Computing
methodologies → Machine learning algorithms.

KEYWORDS
Data stream management, Data heterogeneity, Cybercrime, Social media
analysis, Data fusion, Linguistic analysis, Machine Learning

"Copyright © 2020 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0)."
CIRCLE’20, July 06–09, 2020, Samatan, France. Konstantinos Demestichas, Thi Bich Ngoc Hoang, Josiane Mothe, Olivier Teste, and Md Zia Ullah

1    INTRODUCTION
The emerging threats of terrorism, organized crime and cybercrime are
interlinked cross-border challenges that show how important a joint
European answer is. In particular, the protection of so-called soft
targets is a challenge for Law Enforcement Agencies (LEAs). In these
complex cases, investigators are increasingly confronted with huge
amounts of data that have to be analysed in a short time. The
heterogeneous nature of these data streams forces LEAs to link them
together and prioritise them in order to understand and analyse them.
PREVISION intends to improve LEAs’ operational capacities and
capabilities by providing a unique and innovative platform. This platform
is built by 28 consortium partners (IT companies, universities, research
centers, etc.) from 13 different European countries. Moreover, some
results are inherited from other projects of PREVISION partners.

2    PREVISION OBJECTIVES
PREVISION has seven specific and measurable objectives (SO), which are
described in Figure 1 and are as follows:
    • SO-1: Deliver an open, scalable and customizable toolset that
      provides support for extreme-scale data stream analytics,
    • SO-2: Semantically integrate heterogeneous data streams delivering
      powerful knowledge graphs combined with advanced reasoning and
      machine learning engines,
    • SO-3: Configure and tailor situation awareness enabling techniques
      and applications to meet specific operational needs of LEAs and
      address human factors,
    • SO-4: Integrate and deploy the developed functions and capabilities
      into a common platform architecture, making it available to
      end-users for thorough validation,
    • SO-5: Demonstrate and evaluate the developed technologies in
      realistic cases, organize relevant training activities and create a
      framework for the transfer of knowledge in the use of PREVISION
      tools from one LEA to another,
    • SO-6: Ensure compliance with the legal, ethical, privacy, societal
      and court-acceptance guidelines and EU best practices,
    • SO-7: Ensure the high multi-dimensional impact, continuity and
      business perspective of project results and allow for incremental
      investments.

Figure 1: Overview of specific objectives of PREVISION.

3    USE CASES AND DATA
Five use cases have been developed. In each of them, the LEAs described a
typical case, according to their interests. These use cases will be the
basis for any testing done within the framework of the project. When
defining the initial use cases, the LEAs also described their currently
implemented procedures and structures. This will be the basis for
identifying problems and weaknesses of the current procedures. The
reported timeline of the events will be useful to provide an overview of
the particular data sources selected for the specific investigation that
the PREVISION platform has to be able to analyse. LEAs also identified
all end-user requirements for the development of the PREVISION platform.
By distinguishing between the different requirement categories
“Security”, “Functional”, “Operational” and “Communication”, a complex
picture of all needs will be compiled. To ensure interoperability of the
requirements, they have been prioritized using the MoSCoW methodology
(Must have, Should have, Could have, and Won’t have) [5]. The initial use
cases are briefly detailed in Table 1.
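
As a purely illustrative sketch of the MoSCoW prioritization mentioned above, the snippet below orders a handful of requirements by priority level and groups them by requirement category. The requirement identifiers and their assigned levels are hypothetical placeholders, not actual PREVISION requirements, and the function names are assumptions made for this example.

```python
from collections import defaultdict

# MoSCoW levels in decreasing priority, as listed in the text.
MOSCOW_ORDER = ["Must have", "Should have", "Could have", "Won't have"]

# Hypothetical requirements: (identifier, category, MoSCoW level).
REQUIREMENTS = [
    ("R1", "Security", "Must have"),
    ("R2", "Functional", "Could have"),
    ("R3", "Operational", "Should have"),
    ("R4", "Communication", "Won't have"),
    ("R5", "Functional", "Must have"),
]


def prioritize(requirements):
    """Sort requirements by MoSCoW level; input order is kept within a level."""
    rank = {level: i for i, level in enumerate(MOSCOW_ORDER)}
    return sorted(requirements, key=lambda req: rank[req[2]])


def group_by_category(requirements):
    """Group prioritized requirement identifiers by requirement category."""
    groups = defaultdict(list)
    for req_id, category, _level in prioritize(requirements):
        groups[category].append(req_id)
    return dict(groups)
```

Sorting by level first and grouping afterwards keeps each category's list in priority order, which is one simple way to combine the four categories with the four MoSCoW levels.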


Table 1: Topics of the use cases defined by LEA partners.
      Topic
 UC1  Soft targets protection – Attempted terrorist attack at stadium
 UC2  Radicalization detection and terrorist threat prevention – Terrorist
      threats at EU summit
 UC3  Financial crime investigation – Detection of fraudulent companies
 UC4  Fighting cyber-enabled crime – CNP fraud as terrorist act facilitator
 UC5  Illicit markets investigation – Trafficking of cultural goods

   Analyzing big data coming from diverse sources such as camera devices,
the deep web and the dark web has become a major challenge in the field
of security, and such data are necessary to develop the five use cases
the project targets. These data sources are of three main types: video
surveillance cameras, the deep/dark/shallow web, and social network data.
The data that will be used by PREVISION’s platform include datasets
crawled from the deep/dark/shallow web. These are text-based,
pseudonymized datasets. PREVISION also considers visual content generated
by CCTVs or video files, as well as social network data.

4     HANDLING HETEROGENEOUS DATA
These heterogeneous data sources need to be managed and analyzed in a
short time despite the vast amount of data. NoSQL data stores are well
suited to efficiently loading and managing massive collections of
heterogeneous data without any structural validation (schemaless
principle). This flexibility becomes a serious challenge when querying
data: users have both to build queries that take multi-structured
datasets into account and to reformulate existing queries whenever new
structures are introduced. This also implies setting up modules for
homogenizing data search and analysis. Among them, we will develop a
component following the approach of Ben Hamadou et al. [1] for building
schema-independent queries, which is designed to query multi-structured
datasets in NoSQL document stores such as MongoDB. This component
automates query reformulation via a set of rules that reformulate most
document store operators (select, project, unwind, aggregate and lookup).
The component then produces queries across multi-structured documents
that are compatible with the native query engine of the underlying
document store (e.g., MongoDB). The schema of this component is presented
in Figure 2.

Figure 2: Schema-independent querying component flow

   A community detection and key actor identification framework is one of
the tools of the PREVISION platform; preliminary results have already
been proposed [2, 4]. PREVISION linguistic analysis is based on multiple
entities and multiple languages. Social analytics services could use the
proposed linguistic features, which are the outcome of the deep
linguistic analysis.

5    CONCLUSIONS
PREVISION is a two-year project that gathers both LEAs and top-level
technical partners, including IT companies and universities/labs. This
helps to strengthen collaboration between experts from several
disciplines. However, differences in viewpoints, or difficulties in
seamlessly integrating partners’ modules, might arise during the project
implementation.
   The result of the project will be an open and future-proof platform
which handles and capitalizes on the massive heterogeneous data streams
processed during complex crime investigations and threat risk
assessments. This platform will provide cutting-edge practical support to
LEAs in their fight against terrorism, organized crime and cybercrime.
Those results that can be released will be made publicly available, and
the results will also serve the LEAs in their daily work. A workshop was
organized earlier this year on related topics [3].
   Ethical issues. In order to achieve its purpose, PREVISION will
process large amounts of heterogeneous data, including personal data, and
carry out research with humans (interviews, surveys, workshops, etc.).
This raises key ethical issues. Moreover, there is a risk of misuse of
research results for unethical purposes. These issues are carefully taken
into account in the project.
   Acknowledgments. This work has been performed in the context of the
PREVISION project, which has received funding from the European Union’s
Horizon 2020 research and innovation programme under GA No 833115. The
paper reflects the authors’ view and the Commission is not responsible
for any use that may be made of the information it contains.

REFERENCES
[1] Hamdi Ben Hamadou, Faiza Ghozzi, André Péninou, and Olivier Teste. 2019.
    Schema-independent querying for heterogeneous collections in NoSQL document
    stores. Inf. Syst. 85 (2019), 48–67. https://doi.org/10.1016/j.is.2019.04.005
[2] Thi Bich Ngoc Hoang. 2020. Topical Community Detection: an Embedding User
    and Content Similarity Method. In [3]. 1–7.
[3] Thi Bich Ngoc Hoang, Pascal Marchand, Béatrice Milard, and Josiane Mothe. 2020.
    Workshop on Machine Learning for Trend and Weak Signal Detection in Social
    Networks and Social Media, Toulouse, France, Feb. 27-28, 2020, Proceedings.
[4] George Kalpakis et al. 2019. Identifying Terrorism-Related Key Actors in
    Multidimensional Social Networks. In MultiMedia Modeling. 93–105.
[5] Eduardo Miranda. 2011. Time boxing planning: buffered moscow rules. ACM
    SIGSOFT Software Engineering Notes 36 (11 2011), 1–5.
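
To make the query reformulation idea of Section 4 more concrete, the following is a minimal sketch, written under simplifying assumptions, of how a selection on a single attribute might be rewritten so that it matches documents with different structures. It is not the implementation of Ben Hamadou et al. [1] and not the PREVISION component: the function names (`absolute_paths`, `build_dictionary`, `reformulate_select`) and the sample documents are hypothetical, arrays are not handled, and only the select operator is covered.

```python
def absolute_paths(doc, prefix=""):
    """Yield the dotted path of every leaf field in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from absolute_paths(value, path)
        else:
            yield path


def build_dictionary(docs):
    """Map each leaf attribute name to the sorted absolute paths where it occurs."""
    paths_by_name = {}
    for doc in docs:
        for path in absolute_paths(doc):
            name = path.rsplit(".", 1)[-1]
            paths_by_name.setdefault(name, set()).add(path)
    return {name: sorted(paths) for name, paths in paths_by_name.items()}


def reformulate_select(attribute, condition, dictionary):
    """Rewrite a selection on one attribute into a $match over all its known paths."""
    paths = dictionary.get(attribute, [attribute])
    return {"$match": {"$or": [{path: condition} for path in paths]}}


# Hypothetical documents storing the same attribute at two different depths.
docs = [
    {"title": "Report A", "year": 2019},
    {"title": "Report B", "details": {"year": 2020}},
]
dictionary = build_dictionary(docs)
stage = reformulate_select("year", {"$gte": 2020}, dictionary)
```

The resulting `stage` is an ordinary MongoDB aggregation stage, so with a driver such as PyMongo it could in principle be passed to `collection.aggregate([stage])`. When a document with a new structure is loaded, only the dictionary needs to be rebuilt; the user's original selection on `year` stays unchanged.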