Semantic Extraction of Named Entities from Bank Wire Text

             Ritesh Ratti               Himanshu Kapoor               Shikhar Sharma
        Pitney Bowes Software         Pitney Bowes Software        Pitney Bowes Software
              Noida India                  Noida India                  Noida India
         ritesh.ratti@pb.com        himanshu.kapoor@pb.com        shikhar.sharma@pb.com
                          Anshul Solanki                Pankaj Sachdeva
                      Pitney Bowes Software          Pitney Bowes Software
                           Noida India                    Noida India
                      anshul.solanki@pb.com        pankaj.sachdeva@pb.com



Abstract

Online transactions have increased dramatically over the years due to rapid growth in digital innovation. These transactions are anonymous, so the user provides some identifying details as a comment. These comments contain information about the entities involved and the transfer details, which are later used for log analysis. Log analysis can be used for fraud analytics and to detect money laundering activities. In this paper, we discuss the challenges of entity extraction from such data. We briefly explain what wire text is, what the challenges are, and why semantic information is required for entity extraction. We explore why traditional IE approaches are insufficient to solve the problem. We tested the approach against available open source tools for entity extraction and describe how our approach solves the problem of entity identification.

Copyright (c) by the paper's authors. Copying permitted for private and academic purposes.
In: Proceedings of the IJCAI Workshop on Semantic Machine Learning (SML 2017), Aug 19-25 2017, Melbourne, Australia.

1    Introduction

Named Entity Extraction is the process of extracting entities such as Person, Location, Address, Organization etc. from natural language text. However, named entities may also exist in non-natural text such as log data, bank transfer content, transactional data etc. Hence we require a system that is robust enough to deal with issues such as degraded and unstructured text, rather than natural language text with correct spelling, punctuation and grammar. Existing information extraction methods cannot meet these requirements, since most information extraction tasks work over natural language text. Because the context of language is missing in unstructured text, it is difficult to extract entities from it, and because extraction features are based on natural language, semantic processing capabilities are required to understand the hidden meaning of the content using dictionaries, ontologies etc.

   Wire text is an example of such text: it is unformatted and non-grammatical in nature. It can contain some letters in capitals and some in lower case, and people generally write the comments in short form using multiple abbreviations. Bank wire text can look like the following:

  EVERITT 620122T NAT ABC INDIA LTD
  REF ROBERT REASON SHOP RENTAL
  REF 112233999 - REASON SPEEDING FINE
  GEM SS HEUTIGEM SCHIENDLER
  PENSION CH1234 CAB28

   There are two major challenges in creating a machine learning model for wire text:

• Non-availability of data sets due to confidentiality
• Non-contextual representation of text
   To identify the entities in such text, special pre-processing of the text using semantic information about its content is therefore required. In this paper, we discuss a solution for extracting entities from this kind of text. We evaluate our approach on bank wire transfer text and make use of the WordNet taxonomy to identify the semantics of each keyword. This paper is arranged as follows. In Section 2 we discuss available methods of entity extraction. In Section 3 we describe the algorithm and the components involved in detail. In Section 4 we show the experimentation results and a comparison with open source utilities. Section 5 presents the conclusion and future work.

2    Background

Supervised machine learning techniques are the primary solutions to the named entity recognition problem, and they require annotated data. Supervised methods either learn disambiguation rules based on discriminative features or try to learn the parameters of an assumed distribution that maximize the likelihood of the training data. Conditional Random Fields [SM12] are a discriminative approach that solves the problem using sequence tagging. Other supervised learning models such as the Hidden Markov Model (HMM) [RJ86], Decision Trees, Maximum Entropy Models (ME) and Support Vector Machines (SVM) are also used to solve the classification problem. HMM was the earliest model applied to the NER problem, by Bikel [BSW99] for English; Bikel introduced a system, IdentiFinder, that detects named entities using HMM as a generative model. Curran and Clark [CC03] applied the maximum entropy model to the named entity recognition problem, formulated with the softmax approach. McNamee and Mayfield [MMP03] tackle the problem as a binary decision problem: a word either belongs or does not belong to one of 8 classes (B- Beginning and I- Inside tags for the person, organization, location and misc types), so 8 classifiers are trained for this purpose. Because of the unavailability of wire text, it is difficult to create tagged content, hence supervised approaches are not able to solve the problem.

   Various unsupervised schemes have also been proposed to solve the entity recognition problem. A common suggestion is the gazetteer-based approach, which helps in identifying keywords from a list. KNOWITALL is such a system, domain independent and proposed by Etzioni [ECD+05], that extracts information from the web in an unsupervised, open-ended manner. It uses 8 domain-independent extraction patterns to generate candidate facts. Gupta and Manning [GM14] proposed a system that generates seed candidates through local, cross-language edit likelihood and then bootstraps to make broad predictions across two languages, optimizing combined contextual, word-shape and alignment models.

   Semantic approaches also exist for named entity extraction. [MNPT02] used the WordNet specification to identify WordClass and WordInstances lists for each word based on predefined rules, but those lists are limited. [Sie15] uses a word2vec representation of words to define the semantics between words, which enhances classification accuracy; it uses a continuous skip-gram model that requires heavy computation to learn the word vectors. [ECD+05] specify gazetteer-based features as external knowledge needed for good performance. Given these findings, several approaches have been proposed to automatically extract comprehensive gazetteers from the web and from large collections of unlabeled text [ECD+04], with limited impact on NER. Kazama and Torisawa [KT07] have successfully constructed high-quality and high-coverage gazetteers from Wikipedia.

   In this paper, we propose the semantic disambiguation of named entities using WordNet and a gazetteer. Our approach is based on pre-processing the text before passing it to a Named Entity Recognizer.
3    Algorithm

3.1    Method

Named Entity Recognition involves multiple features related to the structural representation of entities, so proper case information plays a valuable role in defining the entity type. For example, in English a Person is generally written in camel case and an Organization in capitalized form. Our approach is based on these orthographic properties of entities: the input data is converted using WordNet after looking up the semantics of each word, and the converted output is then provided to an existing NER. Once converted, the text is more likely to yield the named entities. We therefore propose an intermediate layer called the Pre-Processor, as shown in Figure 1. The Pre-Processor contains three major components, WordnetMatcher, GazetteerMatcher and CaseConverter, whose purpose is to match the text efficiently against the given content lists and to convert the text to the required case. LowerCaseConverter, CamelCaseConverter and UpperCaseConverter are instances of CaseConverter. The Tokenizer's main job is to split the sentence into tokens, and the Named Entity Recognizer is used to extract the named entities.

[Figure 1: Component Diagram]

   We used WordNet [Mil95], which provides information about synsets. The English version contains 129505 words organized into 99642 synsets. In WordNet two kinds of relations are distinguished: semantic relations (IS-A, part-of etc.), which hold among synsets, and lexical relations (synonymy, antonymy), which hold among words. Our gazetteer contains dictionaries of person names, organization names, locations etc. Our approach works according to the following algorithm.

3.2    Approach

Algorithm 1: Semantic NER
  Input : Sentence S as a collection of words W, and gazetteers List_Names, List_Organization, List_Location, List_Ignore
  Output: Set of entities e_i ∈ E
  for each w_i ∈ S do
     w_i ← LowerCaseConverter(w_i)
     if w_i ∉ List_Ignore then
        synsets[] ← WordNetMatcher(w_i)
        if synsets[] is not empty then
           if w_i ∈ List_Names then
              w_i ← CamelCaseConverter(w_i)
           end if
        else
           if w_i ∈ List_Organization or w_i ∈ List_Location then
              w_i ← UpperCaseConverter(w_i)
           else
              w_i ← CamelCaseConverter(w_i)
           end if
        end if
     end if
  end for
  (e_i) ← NamedEntityRecognizer(S)

   Our algorithm works by looking up pre-defined lists in multiple steps. Each word of the input is first converted to lower case and then checked against the ignore list, which contains pronouns, prepositions, conjunctions and determiners. If the word is found there, it is ignored. Otherwise the lower-case word is passed to the WordNet API to get its list of synsets. If the synsets are non-empty, the word is likely to have some meaning, so it is checked against the names list first and, if found, converted to camel case (e.g. John Miller, Robert Brown). If it is not found in the names list, it is checked against the organization list and the location list; if a match is found it is converted to upper case, otherwise it is converted to camel case. The pre-processed text now carries a meaningful representation of the entities and is passed to the Named Entity Recognizer to extract the entities from the converted text.
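   To make the pre-processing concrete, the listing below is a minimal, self-contained Java sketch of Algorithm 1 as reconstructed above (it follows the pseudocode). The word lists, the hasSynsets() stub that stands in for a real WordNet lookup, and the class and method names are illustrative assumptions, not the authors' implementation.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the Pre-Processor in Algorithm 1 (Semantic NER).
// The word lists and the hasSynsets() stub stand in for the paper's
// gazetteers and WordNet lookup; all names here are assumptions.
public class SemanticPreProcessor {

    private static final Set<String> IGNORE_LIST = new HashSet<>(
            Arrays.asList("the", "a", "an", "and", "or", "of", "to", "for"));
    private static final Set<String> NAMES_LIST = new HashSet<>(
            Arrays.asList("john", "miller", "robert", "brown"));
    private static final Set<String> ORGANIZATION_LIST = new HashSet<>(
            Arrays.asList("abc", "nat"));
    private static final Set<String> LOCATION_LIST = new HashSet<>(
            Arrays.asList("india", "melbourne"));
    // Stand-in for words that would return a non-empty synset list from WordNet.
    private static final Set<String> DICTIONARY_WORDS = new HashSet<>(
            Arrays.asList("reason", "shop", "rental", "pension", "fine", "speeding"));

    // WordNetMatcher stub: true if the word has at least one synset.
    private static boolean hasSynsets(String word) {
        return DICTIONARY_WORDS.contains(word) || NAMES_LIST.contains(word);
    }

    // CamelCaseConverter: first letter upper case, rest unchanged (already lower case).
    private static String camelCase(String w) {
        return Character.toUpperCase(w.charAt(0)) + w.substring(1);
    }

    // Applies the case-conversion rules of Algorithm 1 token by token;
    // the returned string is what would be handed to the Named Entity Recognizer.
    public static String preprocess(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            String w = token.toLowerCase();                  // LowerCaseConverter
            if (!IGNORE_LIST.contains(w)) {
                if (hasSynsets(w)) {                         // WordNetMatcher
                    if (NAMES_LIST.contains(w)) {
                        w = camelCase(w);                    // person-like token
                    }
                } else if (ORGANIZATION_LIST.contains(w) || LOCATION_LIST.contains(w)) {
                    w = w.toUpperCase();                     // UpperCaseConverter
                } else {
                    w = camelCase(w);
                }
            }
            out.append(w).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // e.g. "REF ROBERT REASON SHOP RENTAL" -> "Ref Robert reason shop rental"
        System.out.println(preprocess("REF ROBERT REASON SHOP RENTAL"));
    }
}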
3.3    Model Description

Our Named Entity Recognizer is based on a Conditional Random Field [SM12], which is a discriminative model. We used the ClearTK library [BOB14] for model generation, which internally uses MALLET for its implementation. Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting sequential data, based on the conditional approach.

   Lafferty et al. [LMP+01] define the probability of a particular label sequence y given an observation sequence x as a normalized product of potential functions, each of the form

   \exp\Big( \sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) + \sum_k \mu_k s_k(y_i, x, i) \Big),

where t_j(y_{i-1}, y_i, x, i) is a transition feature function of the entire observation sequence and of the labels at positions i and i-1 in the label sequence; s_k(y_i, x, i) is a state feature function of the label at position i and the observation sequence; and \lambda_j and \mu_k are parameters to be estimated from the training data.

   When defining feature functions, we construct a set of real-valued features b(x, i) of the observation that express some characteristic of the empirical distribution of the training data which should also hold in the model distribution. An example of such a feature is: b(x, i) = 1 if the observation at position i is "Person", and 0 otherwise. Each feature function takes on the value of one of these real-valued observation features b(x, i) if the current state (in the case of a state function) or the previous and current states (in the case of a transition function) take on particular values. All feature functions are therefore real-valued. For example, consider the following transition function:

   t_j(y_{i-1}, y_i, x, i) = b(x, i),

and

   F_j(y, x) = \sum_{i=1}^{n} f_j(y_{i-1}, y_i, x, i),

where each f_j(y_{i-1}, y_i, x, i) is either a state function s_k(y_i, x, i) or a transition function t_j(y_{i-1}, y_i, x, i). This allows the probability of a label sequence y given an observation sequence x to be written as

   p(y \mid x, \lambda) = \frac{1}{Z(x)} \exp\Big( \sum_j \lambda_j F_j(y, x) \Big),

where Z(x) is a normalization factor.
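   As a toy illustration of how indicator features b(x, i) and their weights \lambda_j, \mu_k enter the unnormalized CRF score, the sketch below scores two candidate label sequences for a short token sequence. The features, weights and label set are invented for illustration and are not those of the paper's ClearTK/MALLET model.

import java.util.List;

// Toy CRF scoring sketch: one state feature and one transition feature
// with hand-set weights. Everything here is illustrative only.
public class CrfScoreDemo {

    // State feature s_k(y_i, x, i): current word is capitalized and label is PER.
    static double stateFeature(String label, List<String> x, int i) {
        return Character.isUpperCase(x.get(i).charAt(0)) && label.equals("PER") ? 1.0 : 0.0;
    }

    // Transition feature t_j(y_{i-1}, y_i, x, i): previous label O, current label PER.
    static double transFeature(String prev, String cur, List<String> x, int i) {
        return prev.equals("O") && cur.equals("PER") ? 1.0 : 0.0;
    }

    // Unnormalized score exp( sum_i [ mu * s + lambda * t ] ) for a label sequence y;
    // the CRF probability p(y|x) would divide this by the normalizer Z(x).
    static double score(List<String> y, List<String> x, double lambda, double mu) {
        double sum = 0.0;
        for (int i = 0; i < x.size(); i++) {
            sum += mu * stateFeature(y.get(i), x, i);
            if (i > 0) {
                sum += lambda * transFeature(y.get(i - 1), y.get(i), x, i);
            }
        }
        return Math.exp(sum);
    }

    public static void main(String[] args) {
        List<String> x = List.of("Ref", "Robert", "Brown");
        List<String> y1 = List.of("O", "PER", "PER");
        List<String> y2 = List.of("O", "O", "O");
        System.out.println(score(y1, x, 0.8, 1.2)); // higher unnormalized score
        System.out.println(score(y2, x, 0.8, 1.2)); // baseline score of 1.0
    }
}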
3.4    Feature Extraction

We used multiple syntactic and linguistic features specific to the entities. We also used pre-defined list matches as a feature for a couple of the entity types, which improves the accuracy of our model. Our feature selection is shown in Table 1. The features are explained as follows:

Table 1: Features used for NER
  Entity Type     Features
  Person          preceding = 1, succeeding = 2, posTag, characterPattern, middleNamesList
  Location        preceding = 3, succeeding = 3, characterPattern, isCapital
  Organization    preceding = 3, succeeding = 3, posTag, characterPattern, orgSuffixList

• Preceding: number of words before the current word considered for feature generation.

• Succeeding: number of words after the current word considered for feature generation.

• posTag: part-of-speech tag, used as a linguistic feature.

• characterPattern: the character pattern of the token, such as camel case, numeric or alphanumeric.

• isCapital: true if all the letters are capitalized.

• xxxList: a specific keyword list matched against the current word; true if the word matches. For example, orgSuffixList contains the suffixes used in organization names, and middleNamesList contains the keywords used in middle names.
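   The following is a minimal sketch of how such token-level features might be computed. The regular expressions, window handling and list contents are illustrative assumptions rather than the exact feature extractors used in the authors' model.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative token-feature extraction for the features listed in Table 1.
// Window sizes, pattern names and gazetteer contents are assumptions.
public class TokenFeatures {

    // characterPattern: a coarse shape label for the token.
    static String characterPattern(String token) {
        if (token.matches("[0-9]+")) return "NUMERIC";
        if (token.matches("[A-Za-z0-9]*[0-9][A-Za-z0-9]*")) return "ALPHANUMERIC";
        if (token.matches("[A-Z][a-z]+")) return "CAMEL_CASE";
        if (token.matches("[A-Z]+")) return "ALL_CAPS";
        return "OTHER";
    }

    // isCapital: true if the token contains no lower-case letters.
    static boolean isCapital(String token) {
        return token.equals(token.toUpperCase());
    }

    // Context window: 'preceding' words before and 'succeeding' words after index i.
    static List<String> window(List<String> tokens, int i, int preceding, int succeeding) {
        List<String> ctx = new ArrayList<>();
        for (int j = Math.max(0, i - preceding); j <= Math.min(tokens.size() - 1, i + succeeding); j++) {
            if (j != i) ctx.add(tokens.get(j));
        }
        return ctx;
    }

    // xxxList feature: does the token match a given keyword list (e.g. orgSuffixList)?
    static boolean matchesList(String token, Set<String> list) {
        return list.contains(token.toUpperCase());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("ABC", "INDIA", "LTD");
        Set<String> orgSuffixes = Set.of("LTD", "INC", "LLC");
        System.out.println(characterPattern("CH1234"));              // ALPHANUMERIC
        System.out.println(isCapital("INDIA"));                      // true
        System.out.println(window(tokens, 1, 3, 3));                 // [ABC, LTD]
        System.out.println(matchesList(tokens.get(2), orgSuffixes)); // true
    }
}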

4    Experimentation Results

4.1    Dataset

We trained our NER model on the MASC (Manually Annotated Sub-Corpus) dataset [PBFI12], which contains 93232 documents with 3232 different entities. We used bank wire transfer text to verify the approach. Because bank wire text is unavailable for security reasons, we had to generate a test set based on our client experience and on an understanding of multiple user scenarios. We implemented the approach in our product [Pit], which is used by our clients.

4.2    Comparison

Our test dataset contains different types of comments which are non-natural in nature. We compare our approach with existing open source solutions, Apache OpenNLP [Apa14] and Stanford NER [MSB+14], and we attribute the better results of our approach to the semantic conversion of the text. We observed that OpenNLP is not able to detect many entities, while Stanford NER is able to detect some of them. Table 2 reports precision, recall and accuracy for the entities Person, Location and Organization.

Table 2: Comparison Results
  Entity Type     Approach        Precision   Recall   Acc.
  Person          Our Approach    0.65        0.306    0.27
                  Stanford-NER    0.23        0.175    0.12
  Location        Our Approach    0.88        0.57     0.53
                  Stanford-NER    0.71        0.58     0.51
  Organization    Our Approach    0.18        0.32     0.28
                  Stanford-NER    0.03        0.018    0.012
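   For reference, the sketch below shows one common way precision, recall and accuracy can be computed from gold and predicted labels on a token-by-token basis. The paper does not state its exact evaluation procedure (e.g. exact-span vs. token-level matching), so this is only a generic illustration.

import java.util.List;

// Generic token-level precision / recall / accuracy computation.
// The matching criterion here (per-token label comparison) is an assumption;
// the paper does not specify the evaluation script it used.
public class EntityEvaluation {

    static void evaluate(List<String> gold, List<String> predicted, String entityType) {
        int tp = 0, fp = 0, fn = 0, correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean g = gold.get(i).equals(entityType);
            boolean p = predicted.get(i).equals(entityType);
            if (g && p) tp++;
            else if (!g && p) fp++;
            else if (g && !p) fn++;
            if (g == p) correct++;
        }
        double precision = (tp + fp == 0) ? 0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0 : (double) tp / (tp + fn);
        double accuracy = (double) correct / gold.size();
        System.out.printf("%s  P=%.2f R=%.2f Acc=%.2f%n", entityType, precision, recall, accuracy);
    }

    public static void main(String[] args) {
        List<String> gold = List.of("O", "PER", "PER", "O", "LOC");
        List<String> pred = List.of("O", "PER", "O", "O", "LOC");
        evaluate(gold, pred, "PER");   // P=1.00 R=0.50 Acc=0.80
        evaluate(gold, pred, "LOC");   // P=1.00 R=1.00 Acc=1.00
    }
}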
5    Conclusion & Future Work

We have proposed an approach for the semantic conversion of bank wire text and for extracting entities from the converted text. Currently we have tested our approach for person, organization and location, but it is easily extensible to other entities such as address, contact number and email information. The approach uses semantic information from WordNet for preprocessing, which can further be used to extract entities from similar types of data sets such as weblogs, DB logs, transaction logs etc.

References

[Apa14]   Apache Software Foundation. OpenNLP Natural Language Processing Library, 2014. http://opennlp.apache.org/.

[BOB14]   Steven Bethard, Philip Ogren, and Lee Becker. ClearTK 2.0: Design patterns for machine learning in UIMA. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3289-3293, Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA). (Acceptance rate 61%).

[BSW99]   Daniel M. Bikel, Richard Schwartz, and Ralph M. Weischedel. An algorithm that learns what's in a name. Machine Learning, 34(1-3):211-231, 1999.

[CC03]    James R. Curran and Stephen Clark. Language independent NER using a maximum entropy tagger. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, CONLL '03, pages 164-167, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.

[ECD+04]  Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Methods for domain-independent information extraction from the web: An experimental comparison. In AAAI, pages 391-398, 2004.

[ECD+05]  Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Unsupervised named-entity extraction from the web: An experimental study. Artificial Intelligence, 165(1):91-134, 2005.

[GM14]    Sonal Gupta and Christopher D. Manning. Improved pattern learning for bootstrapped entity extraction. In CoNLL, pages 98-108, 2014.

[KT07]    Junichi Kazama and Kentaro Torisawa. Exploiting Wikipedia as external knowledge for named entity recognition. 2007.

[LMP+01]  John Lafferty, Andrew McCallum, Fernando Pereira, et al. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML, volume 1, pages 282-289, 2001.

[Mil95]   George A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39-41, November 1995.

[MMP03]   James Mayfield, Paul McNamee, and Christine Piatko. Named entity recognition using hundreds of thousands of features. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, CONLL '03, pages 184-187, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.

[MNPT02]  Bernardo Magnini, Matteo Negri, Roberto Prevete, and Hristo Tanev. A WordNet-based approach to named entities recognition. In Proceedings of the 2002 Workshop on Building and Using Semantic Networks - Volume 11, pages 1-7. Association for Computational Linguistics, 2002.

[MSB+14]  Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55-60, 2014.

[PBFI12]  Rebecca J. Passonneau, Collin Baker, Christiane Fellbaum, and Nancy Ide. The MASC word sense sentence corpus. In Proceedings of LREC, 2012.

[Pit]     Pitney Bowes Software CIM Suite. http://www.pitneybowes.com/us/customer-information-management.html.

[RJ86]    L. Rabiner and B. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(2):4-16, January 1986.

[Sie15]   Scharolta Katharina Sienčnik. Adapting word2vec to named entity recognition. In Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania, number 109, pages 239-243. Linköping University Electronic Press, 2015.

[SM12]    Charles Sutton and Andrew McCallum. An introduction to conditional random fields. Foundations and Trends in Machine Learning, 4(1):267-373, 2012.