=Paper=
{{Paper
|id=Vol-1606/paper7
|storemode=property
|title=NormaSearch: a Big Data Application for Financial Services.
|pdfUrl=https://ceur-ws.org/Vol-1606/paper07.pdf
|volume=Vol-1606
|authors=Ylenia Maruccia,Giovanni Pansini,Gloria Polimeno,Felice Vitulano
|dblpUrl=https://dblp.org/rec/conf/finrec/MarucciaPPV16
}}
==NormaSearch: a Big Data Application for Financial Services.==
NormaSearch: a Big Data application for financial services.
Ylenia Maruccia1 and Giovanni Pansini1 and Gloria Polimeno1 and Felice Vitulano1

1 Exprivia Spa, Bari - Italy, email: {ylenia.maruccia, giovanni.pansini, gloria.polimeno, felice.vitulano}@exprivia.it

Abstract. In recent years, banking and financial markets firms have been trying to learn how Big Data can help transform their processes and organizations, improving customer intelligence, reducing risks, and meeting regulatory objectives. Collecting and analysing new legislation, and understanding whether it introduces new aspects with potential impacts on different fields, could be the basis of a system that supports the strategic decision making process and evaluates the potential impacts on both management and strategic activities. Here we present NormaSearch, a Big Data application developed by Exprivia, an international company that is a leader in Italy in process consulting, technology services and information technology solutions. NormaSearch analyses specific information taken from the web, in both structured and unstructured form, and we describe its application in the financial field.

1 Introduction

Recent years have seen a continuous increase in the data generated in many fields, from science to social life, passing through industry, which we refer to as “Big Data”. Each actor, e.g. an individual, administration, organization or business, is a producer of new forms of data, in both structured and unstructured form: personal data, conversations on social networks, medical data, meteorological information, shared photos, and so on.

The challenge for both public and private companies is to manage this huge amount of data easily with improved technologies, different from the traditional ones, and to extract knowledge from all this hidden information.

The main characteristics of Big Data are that they are too big, move too fast and do not fit the structures of traditional database architectures, so new technologies are necessary to manage them. Moreover, one of the main difficulties, as mentioned above, is the format in which the information is generated: it can be structured, unstructured or semi-structured. New forms of databases, programming languages and hardware architectures are therefore used either to store Big Data or to transform it from an unstructured or semi-structured format into a well-structured one, with consequences in many fields of application.

According to [1], Big Data help companies to listen to customers better, to understand how they use services and hence to shape the offer, also simplifying the decision making process. To this aim, an important role is played by applications that tailor information to the needs of the customers. Technologies such as Recommender Systems are now used by many brands to suggest products or services in which a user may be interested.

Banking and financial markets firms are still learning how Big Data can help transform their processes and organizations. For banks in particular, Big Data initiatives still predominantly revolve around improving customer intelligence, reducing risk, and meeting regulatory objectives.

Machine learning techniques, for example, can be applied within the fraud and risk sectors, improving models and accelerating the move towards more real-time analysis and alerting. Finding new legislation, and understanding how it differs from the existing legislation and/or whether new aspects have been introduced, can be a very important challenge in this field of application. The purpose is to evaluate the potential impacts on both management and strategic activities and to support the strategic decisions that could minimise potential costs.

Here we present our solution, called NormaSearch, developed in Exprivia to manage data that are generated from different sources and come mainly in an unstructured format, with the aim of adapting it to the banking system. Section 2 describes the scenarios in which NormaSearch could be applied, while Sections 3 and 4 present the application and its components.

2 Scenarios

The financial crisis and the speculative use of derivative instruments have placed the reform of the “Over The Counter” (OTC) derivative markets among the priorities of the legislature, in terms of standard negotiation procedures as well as more stringent rules pertaining to the capitalization of financial intermediaries:

• To standardise the trading of OTC derivatives, different regulations have been promulgated, such as EMIR (European Market Infrastructure Regulation) and the Dodd-Frank Act, which revived the role of the Central Counter-Parties (CCPs) with the aim of increasing transparency and reducing both counterparty risk and operational risk (see Fig. 1).
• To ensure the soundness of the banking system, the Basel agreements impose on the banks of the world's leading countries some limits on their operational activities, especially regarding the amount of assets they must hold to protect their clients, thus ensuring the capitalization of banks (and, consequently, liquidity guarantees) needed to back the operations (collection, financing and investment) carried out with customers.

Therefore, it can be deduced that today a financial intermediary is called to observe the dictates imposed by the regulations in its area of interest, which involves adjusting operational processes and/or IT architectures to comply with them.
Figure 1. The role of the Central Counter-Parties.

The granularity and, at the same time, the complexity of these regulations require constant attention and monitoring, in order to anticipate future changes, additions or evolutions.

This context gives rise to the idea of providing a machine learning tool which, by analysing newly introduced legislation (or legislation in the approval process) and/or changes to previously promulgated requirements (detectable from dedicated, certified internet sites), may provide guidance on the bank process involved. It could therefore indicate, in an almost predictive way, the impacts on the IT applications that support those processes, in terms of changes or new implementations.

To this aim, we developed a Big Data application that is able to analyse specific information taken from the web, in both structured and unstructured form. This application, called NormaSearch, is described in the next section.

3 NormaSearch Functionalities

As mentioned above, NormaSearch is a Big Data application that browses the web and searches and analyses specific information on the basis of rules defined by the user. The application has two distinct sections, Administration and Fruition: the first allows the user to train the system to recognise and classify the information of interest through a series of examples (weakly supervised training), while the second allows users to analyse the sites collected independently by the system. Once trained, the application analyses web pages and documents in sites, newsgroups, blogs, forums and so on, according to a process specifically designed for the linguistic and conceptual analysis of online content.

This process achieves an optimal precision-to-coverage ratio in the search and significantly limits the number of downloaded web pages and, consequently, the consumption of hardware resources.

The application carries out its searches incrementally: pages are presented to the user only if they have never been recognised before or if their content has changed. In detail, the application is able to:

• Autonomously consult a set of sites, blogs, forums and so on, looking for information about a set of concepts of interest identified by training the machine with examples (weakly supervised training); the system is able to consult a set of predetermined sites (authoritative sites) or even the whole web.
• Identify, in every web page retrieved, the individual portions of text (HTML page sections) in which the sought concepts are expressed, associating with each identified section a percentage indicator of its relevance to those concepts.
• Automatically classify and organize web sites and the pages that belong to them according to a conceptual taxonomy that is either predetermined or derived during the machine training phase.
• Filter, as needed, specific types of web sites that tend to generate noise, such as search engines based on search engine spamming techniques.
• Identify only the new content found on each new consultation.
• Present the results through a simple web interface or as a report directly downloadable from the interface; reports can also be sent from the application via e-mail.
• Independently identify potentially authoritative sites and recognise the inactivity of authoritative sites.

4 NormaSearch Architecture

Fig. 2 shows the architecture of the application. It is made up of two main components, a client and a server, both described below.

NormaSearch Client. It is specialised in the interaction with the user and the transmission of user requests to the server component. It is structured into two main parts:

• A fruition console (in Fig. 2, Retr.UI). Here the user can manage documents and decide which of them have to be processed, which could be useful for launching new experimental projects on specific themes, and which should be dismissed.
• An administration console (in Fig. 2, Admin. Ui). Here the user can define the security rules; the loader dedicated to web monitoring and to the retrieval of documents of interest; the categories and subcategories of the safety rules, through which the conceptual framework of each rule is defined in terms of topics and their organization; and the training of the security rules and of the related conceptual categories. Moreover, new projects can also be specified here, in which the user can insert additional or parallel categories to the security rules established
above, in order to look for “cool stuff” that can be used as a starting point to expand the research in the field of banking regulation or to add/modify existing safety rules.

Figure 2. NormaSearch Architecture.

NormaSearch Server. It is specialised in receiving the users' requests and dispatching them to the server components that are able to take charge of each specific request. Like NormaSearch Client, it has two components, the administration component and the fruition one, designed to manage requests from the administration and fruition consoles, respectively, and to send them to the specified server components.

At the moment NormaSearch is in productive use by an Italian bank. Moreover, NormaSearch Server contains an important software component, called Big Knowledge [4], developed by Exprivia. Big Knowledge (BK) is able to manage both structured and unstructured data and, as can be seen in Fig. 2, it is made up of six main components:

• DAC The Data Access Component is the centralized component for accessing data, either in a DBMS or, in the Big Data case, in Solr.
• Document Manager It is the component through which BK converts the provided document into textual form and cleans it of useless portions of text (e.g. HTML banners).
• Information Extraction Manager It annotates the document, extracting relevant information such as named entities, recognised by means of Finite State Automata, and elements contained in custom gazetteers.
• Clusterer It is dedicated to the extraction of conceptual groups (clusters). These are automatically extracted using advanced NLP (Natural Language Processing) techniques based on Latent Semantic Analysis (LSA) [2] and a Markov clustering algorithm [3]. The generated clusters are crucial for tuning the training of the system.
• Categorizer It is intended for the automatic classification and organization of documents according to a conceptual taxonomy expressed as a set of clusters and constructed manually or semi-automatically by describing each category with a short text.
• Geo Recognizer Using a gazetteer of places taken from Geonames2, it annotates the documents, gathering nations, regions, cities, airports, ports and, more generally, geographic points of interest. A KML export of a single document or of multiple documents is provided when a WMS3 compliant system is integrated.

2 http://www.geonames.org/
3 http://www.opengeospatial.org/standards/wms

5 Conclusion

Big Data are changing the industrial world and, for this reason, all kinds of companies need to be capable of managing this huge amount of data and of extracting useful information from it. This great interest in Big Data is also present in financial services, which can obtain important information from the analysis of both structured and unstructured data. It is also important to have this information in useful time, in order to prevent losses and to predict important events before they happen. In this paper we discussed a Big Data solution for the financial services field developed by Exprivia. It is called NormaSearch and it aims at predicting the impact of a change in legislation or of the introduction of new legislation. After a brief introduction to Big Data and machine learning technologies, we described NormaSearch, its components and how it works. It can analyse newly introduced legislation and provide guidance on the bank process involved. In particular, it can predict the impacts on the IT application that supports that bank process. Moreover, this paper described an important Big Data solution that is part of NormaSearch. It is called Big Knowledge and it is composed of six components that communicate with each other in order to manage all the input documents and extract important information that can then be classified. This paper showed how a Big Data solution can be useful in the financial field and can provide important predictive information in useful time for the strategic decision making process.

REFERENCES

[1] C.L.P. Chen and C.Y. Zhang, ‘Data-intensive applications, challenges,
techniques and technologies: A survey on Big Data’, Informatics and
Computer Science Intelligent Systems Applications, 275.
[2] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and
R. Harshman, ‘Indexing by latent semantic analysis’, Journal of the
American Society for Information Science.
[3] Stijn van Dongen, Graph Clustering by Flow Simulation, PhD thesis,
University of Utrecht, 2010.
[4] F. Vitulano, M. Cammisa, and Y. Maruccia, ‘Unleashing big data power
for sea emergency control’, in Proceedings Tethys 2015.
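
The incremental consultation described in Section 3, in which a page is presented only if it has never been seen or if its content has changed, can be illustrated with a small Python sketch. This is not NormaSearch's actual implementation, which the paper does not detail: the class name `SeenPages`, the function `pages_to_present`, the SHA-256 fingerprint, the whitespace normalisation and the example URLs are all assumptions made for illustration.

```python
import hashlib


class SeenPages:
    """Remembers a content fingerprint per URL (hypothetical helper, not from the paper)."""

    def __init__(self):
        self._fingerprints = {}  # url -> hex digest of the last content seen

    @staticmethod
    def _fingerprint(text):
        # Assumption: collapse whitespace so trivial layout changes are ignored.
        normalised = " ".join(text.split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def check(self, url, text):
        """Return 'new', 'changed' or 'unchanged' and record the latest content."""
        digest = self._fingerprint(text)
        previous = self._fingerprints.get(url)
        self._fingerprints[url] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"


def pages_to_present(store, crawled):
    """Keep only pages that are new or whose content changed since the last run."""
    return [url for url, text in crawled if store.check(url, text) != "unchanged"]


store = SeenPages()
# First consultation: every page is new.
first_run = pages_to_present(store, [("http://example.org/a", "Rule 1"),
                                     ("http://example.org/b", "Rule 2")])
# Second consultation: page a differs only in whitespace, page b was amended.
second_run = pages_to_present(store, [("http://example.org/a", "Rule  1"),
                                      ("http://example.org/b", "Rule 2, amended")])
print(first_run)
print(second_run)
```

In this sketch the second consultation reports only the amended page, since the other one differs by whitespace alone. A production system would presumably persist the fingerprints between runs and compare page sections rather than whole pages, but those choices are outside what the paper specifies.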