=Paper= {{Paper |id=Vol-1606/paper7 |storemode=property |title=NormaSearch: a Big Data Application for Financial Services. |pdfUrl=https://ceur-ws.org/Vol-1606/paper07.pdf |volume=Vol-1606 |authors=Ylenia Maruccia,Giovanni Pansini,Gloria Polimeno,Felice Vitulano |dblpUrl=https://dblp.org/rec/conf/finrec/MarucciaPPV16 }} ==NormaSearch: a Big Data Application for Financial Services.== https://ceur-ws.org/Vol-1606/paper07.pdf
NormaSearch: a Big Data application for financial services.

Ylenia Maruccia¹, Giovanni Pansini¹, Gloria Polimeno¹ and Felice Vitulano¹

¹ Exprivia Spa, Bari - Italy, email: {ylenia.maruccia, giovanni.pansini, gloria.polimeno,
  felice.vitulano}@exprivia.it

Abstract. In recent years, banking and financial markets firms have been trying to learn how
Big Data can help transform their processes and organizations, improving customer intelligence,
reducing risks, and meeting regulatory objectives. Collecting and analysing new legislation,
and understanding whether it introduces new aspects with potential impacts on different fields,
could be the basis of a system able to support the strategic decision making process and to
evaluate the potential impacts on both management and strategic activities. Here we present
NormaSearch, a Big Data application developed by Exprivia, a leading international company
based in Italy and active in process consulting, technology services and information technology
solutions. NormaSearch is able to analyse specific information taken from the web, in both
structured and unstructured form, and we describe its application in the financial field.

1   Introduction

Recent years have seen a continuous increase in the data generated in many fields, from science
to social life, passing through industry, which we refer to as “Big Data”. Each actor, e.g. an
individual, administration, organization or business, is a producer of new forms of data, in
both structured and unstructured form: personal data, conversations on social networks, medical
data, meteorological information, shared photos, and so on.
   The challenge for both public and private companies is to manage this huge amount of data
easily, with improved technologies that differ from the traditional ones, and to extract
knowledge from all this hidden information.
   The main characteristics of Big Data are that they are too big, move too fast and do not fit
the structures of traditional database architectures, so new technologies are necessary to
manage them. Moreover, one of the main difficulties, as mentioned above, is the format in which
the information is generated: it can be structured, unstructured or semi-structured. New forms
of databases, programming languages and hardware architectures are therefore used either to
store Big Data or to transform it from an unstructured or semi-structured format into a
well-structured one, with consequences in many fields of application.
   According to [1], Big Data help to listen to customers better and to understand how they use
services and hence how to shape the offer, which also simplifies the decision making process.
To this aim, an important role is played by applications that tailor information to the needs
of the customers. Technologies such as Recommender Systems are now used by many brands to
suggest products or services that a user may be interested in.
   Banking and financial markets firms are still learning how Big Data can help transform their
processes and organizations. For banks in particular, Big Data initiatives still revolve
predominantly around improving customer intelligence, reducing risk, and meeting regulatory
objectives.
   Machine learning techniques, for example, can be applied within the fraud and risk sectors,
improving models and accelerating the move towards more real-time analysis and alerting.
Finding new legislation, and understanding how it differs from the existing legislation and/or
whether new aspects have been introduced, can be a very important challenge in this field of
application. The purpose is to evaluate the potential impacts on both management and strategic
activities, and to support the strategic decisions that could minimise potential costs.
   Here we present our solution, called NormaSearch, developed at Exprivia to manage data
generated from different sources and arriving mainly in an unstructured format, with the aim of
adapting it to the banking system. Section 2 describes the scenarios in which NormaSearch could
be applied, while Sections 3 and 4 present the application and its components.

2   Scenarios

The financial crisis and the speculative use of derivative instruments have placed the reform of
the “Over The Counter” (OTC) derivative markets among the priorities of the legislature, in
terms of standard negotiation procedures as well as more stringent rules on the capitalization
of financial intermediaries:

• As for rules designed to standardise the trading of OTC derivatives, different regulations
  have been promulgated, such as EMIR (European Market Infrastructure Regulation) and the
  Dodd-Frank Act, which revived the role of the Central Counter-Parties (CCPs), with the aim of
  increasing transparency and reducing both counterparty risk and operational risk (see
  Fig. 1).
• To ensure the soundness of the banking system, the Basel agreements impose on the banks of
  the world's leading countries some limits on their operational activities, especially
  regarding the amount of assets they must hold for their clients' protection. This ensures
  the capitalization of banks (and, consequently, liquidity guarantees) needed to guarantee the
  operations (collection, financing and investment) carried out with customers.

   It can therefore be deduced that today a Financial Intermediary is called to observe the
dictates imposed by the regulations in its area of interest, which involves adjusting its
operational processes and/or IT architectures to comply with those regulations.
Figure 1.   The role of the Central Counter-Parties.

   The granularity and, at the same time, the complexity of these regulations require constant
attention and monitoring, in order to anticipate future changes, integrations or evolutions.
   This context gives rise to the idea of providing a machine learning tool which, by analysing
newly introduced legislation (or legislation in the approval process) and/or changes in
previously promulgated requirements (detectable on specific certified internet sites), may
provide guidance on the bank process involved and therefore indicate, with an almost predictive
function, the impacts on the IT applications that support those processes, in terms of changes
or new implementations.
   To this aim, we developed a Big Data application that is able to analyse specific
information, in both structured and unstructured form, taken from the web. This application,
called NormaSearch, is described in the next section.

3   NormaSearch Functionalities

As mentioned above, NormaSearch is a Big Data application that makes it possible to browse the
web and to search and analyse specific information on the basis of rules defined by the user.
The application has two distinct sections, Administration and Fruition: the first allows the
user to train the system to recognise and classify the information of interest through a series
of examples (weakly supervised training), while the second allows users to analyse the sites
collected independently by the system. Once trained, the application analyses web pages and
documents in sites, newsgroups, blogs, forums and so on, according to a process specifically
designed for the linguistic and conceptual analysis of online content.
   This process achieves an optimal precision-to-coverage ratio in the search, and significantly
limits the number of downloaded web pages and, consequently, the consumption of hardware
resources.
   The application carries out its search incrementally: pages are presented to the user only
if they have never been recognised before or if their content has changed. In detail, the
application is able to:

• Consult autonomously a set of sites, blogs, forums and so on, looking for information about a
  set of concepts of interest identified by training the machine on examples (weakly supervised
  training); the system is able to consult a set of predetermined sites (authoritative sites)
  or even the whole web.
• Identify, in every web page retrieved, the individual portions of text (HTML page sections)
  in which the sought concepts are expressed, associating with each identified section a
  percentage indicator of its relevance to those concepts.
• Automatically classify and organize web sites, and the pages that belong to them, according
  to a conceptual taxonomy that is either predetermined or derivable during the machine
  training phase.
• Filter, as needed, specific types of web sites that tend to generate noise, for example
  search engines based on search engine spamming techniques.
• Identify only the new content found on each new consultation.
• Present the results through a simple web interface or as a report directly downloadable from
  the interface; reports can also be sent from the application via e-mail.
• Independently identify potentially authoritative sites and recognise the inactivity of
  authoritative sites.

4   NormaSearch Architecture

Fig. 2 shows the architecture of the application. It is made up of two main components, a
client and a server, both described below.
   NormaSearch Client. It is specialised in the interaction with the user and in the
transmission of user requests to the server component. It is structured into two main parts:

• A fruition console (in Fig. 2, Retr.UI). Here the user can manage documents and decide which
  of them have to be processed, which could be useful for launching new experimental projects
  on specific themes, and which should be dismissed.
• An administration console (in Fig. 2, Admin. Ui). Here the user can define the security
  rules; the loader dedicated to web monitoring and to the retrieval of documents of interest;
  the categories and subcategories of the safety rules, through which the conceptual framework
  of each rule is defined in terms of topics and their organization; and the training of the
  security rules and of the related conceptual categories. Moreover, new projects can also be
  specified here, in which the user can insert additional or parallel categories to the
  security rules established
  above, in order to look for “cool stuff” that can be used as a starting point to expand the
  research in the field of banking regulation or to add/modify existing safety rules.

Figure 2.   NormaSearch Architecture.

   NormaSearch Server. It is specialised in receiving the users' requests and dispatching them
to the server components suited to handling each specific request. Like NormaSearch Client, it
has two components, the administration component and the fruition component, designed to manage
requests from the administration and fruition consoles, respectively, and to send them to the
specified server components.
   At the moment NormaSearch is in productive use at an Italian bank. Moreover, NormaSearch
Server contains an important software component, called Big Knowledge [4], developed by
Exprivia.
   Big Knowledge (BK) is able to manage both structured and unstructured data and, as can be
seen in Fig. 2, it is made up of six main components:

• DAC. The Data Access Component is the centralised component for accessing data, stored either
  in a DBMS or, in Big Data cases, in Solr.
• Document Manager. It is the component through which BK converts the document provided into
  textual form and cleans it of useless portions of text (e.g. HTML banners).
• Information Extraction Manager. It annotates the document, extracting relevant information
  such as named entities, recognised by means of Finite State Automata, and elements contained
  in custom gazetteers.
• Clusterer. It is dedicated to the extraction of conceptual groups (clusters). These are
  extracted automatically using advanced NLP (Natural Language Processing) techniques based on
  Latent Semantic Analysis [2] (LSA) and a Markov clustering algorithm [3]. The generated
  clusters are crucial for tuning the training of the system.
• Categorizer. It is intended for the automatic classification and organization of the
  documents according to a conceptual taxonomy expressed as a set of clusters and constructed
  manually, or semi-automatically by describing each category with a short text.
• Geo Recognizer. Using a gazetteer of places taken from Geonames², it annotates the documents,
  gathering nations, regions, cities, airports, ports and, more generally, geographic points of
  interest. A KML export of a single document or of multiple documents is provided when a
  WMS³-compliant system is integrated.

² http://www.geonames.org/
³ http://www.opengeospatial.org/standards/wms

5   Conclusion

Big Data are changing the industrial world and, for this reason, all kinds of companies need to
be capable of managing this huge amount of data and of extracting useful information from it.
This great interest in Big Data is also present in financial services, which can obtain
important information from the analysis of both structured and unstructured data. It is also
important to have this information in good time, in order to prevent losses and to predict
important events before they happen. In this paper we discussed a Big Data solution for the
financial services field developed by Exprivia. It is called NormaSearch and it aims at
predicting the impact of a change in legislation, or of the introduction of new legislation.
After a brief introduction to Big Data and machine learning technologies, we described
NormaSearch, its components and how it works. It can analyse newly introduced legislation and
provide guidance on the bank process involved. In particular, it can predict the impacts on the
IT application that supports that bank process. Moreover, this paper described an important Big
Data solution that is part of NormaSearch. It is called Big Knowledge and it is composed of six
components that work together in order to manage all the input documents and extract important
information that can then be classified. This paper showed how a Big Data solution can be
useful in the financial field and can predict important information in good time for the
strategic decision making process.

REFERENCES

[1] C.L.P. Chen and C.Y. Zhang, ‘Data-intensive applications, challenges, techniques and
    technologies: A survey on Big Data’, Information Sciences, 275, 314–347, 2014.
[2] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and
    R. Harshman, ‘Indexing by latent semantic analysis’, Journal of the
    American Society for Information Science, 41(6), 391–407, 1990.
[3] Stijn van Dongen, Graph Clustering by Flow Simulation, PhD thesis,
    University of Utrecht, 2000.
[4] F. Vitulano, M. Cammisa, and Y. Maruccia, ‘Unleashing big data power
    for sea emergency control’, in Proceedings Tethys 2015.
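
The incremental consultation described in Section 3, in which a page is presented only if it
has never been recognised before or if its content has changed, can be illustrated with a short
Python sketch. This is not NormaSearch's implementation, which the paper does not detail: the
use of a SHA-256 digest of the whitespace-normalised text, the function names `fingerprint` and
`pages_to_present`, and the example URLs are all assumptions made for illustration.

```python
import hashlib


def fingerprint(text):
    """Return a digest of the page text that ignores whitespace-only differences."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def pages_to_present(seen, fetched):
    """Return the URLs that are new or changed since the last consultation.

    `seen` maps URL -> digest from earlier consultations and is updated in place;
    `fetched` maps URL -> page text retrieved in the current consultation.
    """
    to_present = []
    for url, text in fetched.items():
        digest = fingerprint(text)
        if seen.get(url) != digest:  # never seen before, or content changed
            to_present.append(url)
            seen[url] = digest
    return to_present


# Hypothetical data: two consultations of the same two pages.
seen = {}
first = pages_to_present(seen, {
    "https://example.org/a": "Rule 1",
    "https://example.org/b": "Rule 2",
})
second = pages_to_present(seen, {
    "https://example.org/a": "Rule   1",          # whitespace change only
    "https://example.org/b": "Rule 2 (amended)",  # real content change
})
print(first)
print(second)
```

On the first consultation both pages are reported; on the second only the amended page is
reported, because the digest of the other page is unchanged after whitespace normalisation. A
real system would presumably compare extracted text sections rather than whole pages.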



