=Paper= {{Paper |id=Vol-2180/paper-82 |storemode=property |title=Using Knowledge Graph to Improve Enterprise Search Experience |pdfUrl=https://ceur-ws.org/Vol-2180/paper-82.pdf |volume=Vol-2180 |authors=Dmytro Dolgopolov,Elena Romanova |dblpUrl=https://dblp.org/rec/conf/semweb/DolgopolovR18 }} ==Using Knowledge Graph to Improve Enterprise Search Experience == https://ceur-ws.org/Vol-2180/paper-82.pdf
    Using Knowledge Graph to improve enterprise search
                       experience

                       Dmytro Dolgopolov1 and Elena Romanova1
       1 FINRA (Financial Industry Regulatory Authority) Rockville, MD 20850, USA




       Abstract. FINRA has many millions of documents and database records that
       staff need to search through to find information relevant to regulatory activities.
       Relevance-ranked text search across this large set of documents and structured
       database records does not group together items that users know are related.
       Relevance ranking discriminates using TF/IDF and related techniques, but it
       does not bring together items that are related in ways other than textual relevance.
       The solution was to build a structured and navigable visual representation of the
       data returned by the multiple underlying query engines. Text mining and semantic
       web techniques were used extensively to build the enhanced metadata and
       create the linkages among the data objects needed to support the visual
       navigation paradigm. The resulting knowledge graph gives users the ability to
       see semantically related items.

       Keywords: Knowledge Graph, Semantic Web, Text Mining, Enterprise Search,
       Graph Analysis, RDF store


1      Background

FINRA [1] is a not-for-profit organization authorized by the US Congress to protect
America’s investors by making sure the broker-dealer industry operates fairly and honestly.
As part of its regulatory mission, FINRA’s staff has to review millions of structured and
unstructured data elements located in numerous internal systems. This includes
information found in free-text fields as well as in various documents. Investigators,
examiners and analysts are easily overwhelmed by the amount of information they
have to deal with. These challenges are exacerbated by copies and ‘near’ duplicates.
Staff spend days collecting the information required to prepare for an exam or
investigation. FINRA needed a solution to help users navigate through internal and external
data sets collected by various systems.


2      FINRA Knowledge Graph

We combined the power of Semantic Web, Text Mining, Enterprise Search and Graph
Analysis to create the FINRA Knowledge Graph. Our solution uses Semantic Web to
connect these technologies and make the whole greater than the sum of its parts. We
implemented an ETL pipeline that leverages Spark’s DataFrames to prepare and load
millions of records into the RDF store in less than 20 minutes. We created a scalable
semantic inference engine in Spark to produce new connections across heterogeneous
data. This engine derives new facts from an existing set of data using humanlike
reasoning. Text mining enriches the data by extracting individuals, organizations and their
features from documents and free-text comments; these are then persisted to the RDF
store. We are building FINRA’s ontology as an extension of the schema.org ontology.
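
As an illustration of how records can become triples and how a rule can derive new connections, the following is a minimal sketch in plain Python rather than Spark. It is not the engine described above: the record fields, the `EX` namespace and the shared-employer rule are hypothetical, chosen only because `worksFor` and `colleague` are existing schema.org properties.

```python
from itertools import combinations

SCHEMA = "http://schema.org/"
EX = "http://example.org/finra-sketch/"  # hypothetical namespace, for illustration only


def records_to_triples(records):
    """Turn flat records (dicts) into (subject, predicate, object) triples."""
    triples = set()
    for rec in records:
        subject = EX + "person/" + rec["id"]
        triples.add((subject, SCHEMA + "name", rec["name"]))
        triples.add((subject, SCHEMA + "worksFor", EX + "org/" + rec["firm"]))
    return triples


def infer_shared_employer(triples):
    """Hypothetical rule: if A worksFor O and B worksFor O, then A colleague B."""
    by_org = {}
    for s, p, o in triples:
        if p == SCHEMA + "worksFor":
            by_org.setdefault(o, set()).add(s)
    derived = set()
    for members in by_org.values():
        for a, b in combinations(sorted(members), 2):
            # The relation is symmetric, so emit it in both directions.
            derived.add((a, SCHEMA + "colleague", b))
            derived.add((b, SCHEMA + "colleague", a))
    # Return only facts that were not already present.
    return derived - triples


# Made-up sample records; the field names are assumptions.
records = [
    {"id": "p1", "name": "Alice Example", "firm": "f1"},
    {"id": "p2", "name": "Bob Example", "firm": "f1"},
    {"id": "p3", "name": "Carol Example", "firm": "f2"},
]
triples = records_to_triples(records)
new_facts = infer_shared_employer(triples)
```

Here `new_facts` holds the two `colleague` triples linking `p1` and `p2`, which did not exist in the source records. A production engine would presumably express such rules as DataFrame joins so that they scale across millions of records.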
Like any big organization, FINRA stores multiple copies of the same information, which
makes it hard to retrieve the important items. Our proprietary logic feeds connections
between records to a machine learning model that creates clusters of related data
elements. Enterprise Search results are now enhanced with SPARQL queries, giving our
users better insight into millions of structured and unstructured data elements.
Additionally, new analytics can be produced by leveraging both the existing and the
new connections. Our community detection algorithm takes into account these new
connections stored as Semantic Web data to identify cliques of ‘bad actors’ more
accurately.
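
To make the idea of grouping connected records concrete, the sketch below clusters records with a simple union-find (connected components) over pairwise connections. This is a stand-in technique chosen for illustration, not the proprietary logic or the machine learning model mentioned above; the function name and the sample record identifiers are hypothetical.

```python
def cluster_records(record_ids, connections):
    """Group records into clusters: two records share a cluster if a chain
    of connections links them (connected components via union-find)."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(sorted(members) for members in clusters.values())


# Made-up identifiers; each pair marks two records believed to be related.
record_ids = ["doc1", "doc2", "doc3", "doc4", "doc5"]
connections = [("doc1", "doc2"), ("doc2", "doc3"), ("doc4", "doc5")]
clusters = cluster_records(record_ids, connections)
# clusters == [["doc1", "doc2", "doc3"], ["doc4", "doc5"]]
```

Connected components treat every connection as equally strong; a learned model could instead weight connections, which is one reason this sketch should not be read as equivalent to the approach in the paper.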




Fig. 1. Solution: FINRA Knowledge Graph


3      Conclusion

Combining Semantic Web [2], Enterprise Search, Text Mining [3], and Graph Analysis
has proven to improve overall data quality and to ease data discovery and navigation.
This new approach has significantly improved the effectiveness of regulatory analysis.


References
    1. FINRA (Financial Industry Regulatory Authority), http://www.finra.org
    2. Discovering The Social Connections, http://technology.finra.org/articles/discovering-social-connections.html
    3. Unlocking Unstructured Data with Text Analysis, http://technology.finra.org/articles/unlocking-unstructured-data-with-text-analysis.html