Unsupervised Information Extraction using BabelNet and
                      DBpedia

                                      Amir H. Jadidinejad

                             Islamic Azad University, Qazvin Branch,
                                          Qazvin, Iran.
                                     amir@jadidi.info



         Abstract. Using linked data in real-world applications is a hot topic in the field
         of Information Retrieval. In this paper we leverage two valuable knowledge bases
         for the task of information extraction. BabelNet is used to automatically recognize
         and disambiguate the concepts mentioned in a piece of unstructured text. After
         extracting all possible concepts, DBpedia is queried via SPARQL to reason about
         the type of each concept.

         Keywords: Concept Extraction, Linked Data, BabelNet, DBpedia, SPARQL.


1        BABELNET

    BabelNet [1] is a multilingual lexicalized semantic network and ontology. It was
automatically created by linking the largest multilingual Web encyclopedia, i.e. Wikipedia
(http://www.wikipedia.org), to the most popular computational lexicon of the English
language, i.e. WordNet [2]. It provides an API for programmatic access to its 5.5 million
concepts and for multilingual knowledge-rich Word Sense Disambiguation (WSD) [3]. With
the aid of this API, we can extract all possible concepts in a piece of text. These concepts
are linked to DBpedia, one of the best-known parts of the Linked Data project.
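    As an illustration, the sketch below performs this concept-extraction step through the
public Babelfy HTTP service, which exposes BabelNet's knowledge-rich WSD as a web service.
This is not the exact Java API used in our implementation, and the endpoint, parameters and
response field names are assumptions based on the public v1 interface.

    import requests

    BABELFY_ENDPOINT = "https://babelfy.io/v1/disambiguate"   # public WSD service
    API_KEY = "YOUR_BABELNET_KEY"                              # placeholder key

    def extract_concepts(text, lang="EN"):
        """Return (surface form, BabelNet synset id, DBpedia URI) triples for text."""
        params = {"text": text, "lang": lang, "key": API_KEY}
        response = requests.get(BABELFY_ENDPOINT, params=params)
        response.raise_for_status()
        concepts = []
        for annotation in response.json():
            # Field names assumed from the Babelfy v1 JSON response;
            # charFragment offsets are inclusive character positions.
            start = annotation["charFragment"]["start"]
            end = annotation["charFragment"]["end"]
            concepts.append((text[start:end + 1],
                             annotation["babelSynsetID"],
                             annotation.get("DBpediaURL", "")))
        return concepts

    print(extract_concepts("Tehran is the capital of Iran."))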


2        DBPEDIA

    DBpedia [4] is a project that aims to extract structured content from the information
created as part of the Wikipedia project and to make it available in Semantic Web formats.
DBpedia allows users to query relationships and properties associated with Wikipedia
concepts; in this paper we use SPARQL to query it. It is possible to reason about the type
of each concept (PER, LOC, ORG, MISC) with the aid of classic deductive reasoning over
classes and subclasses. For example, "Settlement" is defined as a subclass of "Place"
(although perhaps not directly), which means that every Thing that is a "Settlement" is
also a "Place"; "Tehran" is a "Settlement", so it is also a "Place". This is expressed by
the following query:





    ASK {
       {
         ?thing a ?p .
         ?p rdfs:subClassOf dbpedia-owl:Place OPTION (transitive) .
       }
       UNION
       {
         ?thing a dbpedia-owl:Place .
       }
    }

    It is possible to reason about the type of every "?thing", such as
    http://dbpedia.org/resource/Tehran. A similar query is used for LOCATION and
    ORGANIZATION.
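
    For illustration, the following sketch runs an equivalent ASK query against the public
    DBpedia endpoint using the SPARQLWrapper library. The Virtuoso-specific OPTION
    (transitive) clause is replaced by the SPARQL 1.1 property path a/rdfs:subClassOf*,
    which computes the same transitive closure; the function name is ours.

    from SPARQLWrapper import SPARQLWrapper, JSON

    DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

    def is_instance_of(resource_uri, dbo_class):
        """ASK whether resource_uri is (transitively) an instance of dbo_class."""
        # dbo: denotes the same namespace as the dbpedia-owl: prefix used above.
        query = """
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX dbo:  <http://dbpedia.org/ontology/>
            ASK { <%s> a/rdfs:subClassOf* %s . }
        """ % (resource_uri, dbo_class)
        sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["boolean"]

    # "Tehran" is a Settlement and Settlement is a subclass of Place, so:
    print(is_instance_of("http://dbpedia.org/resource/Tehran", "dbo:Place"))  # True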


    3      IMPLEMENTATION DETAILS

    Our proposed solution is shown in Figure 1. The input text is passed to the
    "Text2Concept" module, which uses BabelNet and the knowledge-rich WSD algorithm to
    recognize a list of concepts. Finally, the "Text Reasoner" module reasons about the
    type of each concept with the aid of DBpedia, using simple deductive reasoning.



    [Figure 1 depicts the pipeline: Input Text → Text2Concept (backed by BabelNet and
    Knowledge-rich WSD) → concepts C1, C2, ..., Ck → Text Reasoner (backed by DBpedia) →
    typed concepts C1:T1, C2:T2, ..., Ck:Tk.]

                          Fig. 1. Different parts of the proposed method.
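
    As a rough end-to-end illustration, the sketch below chains the two previous sketches
    into the pipeline of Fig. 1; the type-to-class mapping and function names are
    illustrative assumptions rather than our exact implementation.

    # Hypothetical mapping from challenge types to DBpedia ontology classes.
    TYPE_CLASSES = {
        "PER": "dbo:Person",
        "LOC": "dbo:Place",
        "ORG": "dbo:Organisation",
    }

    def type_concepts(text):
        """Map each extracted concept to PER, LOC, ORG or MISC (the fallback)."""
        typed = {}
        for surface, synset_id, dbpedia_uri in extract_concepts(text):
            label = "MISC"
            for challenge_type, dbo_class in TYPE_CLASSES.items():
                if dbpedia_uri and is_instance_of(dbpedia_uri, dbo_class):
                    label = challenge_type
                    break
            typed[surface] = label
        return typed

    print(type_concepts("Tehran is the capital of Iran."))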




    Table 1 shows the performance of the proposed solution on the training data set. Our
    proposed solution achieved F1 = 0.50 on the training set and F1 = 0.52 on the testing
    set (see Fig. 2).

             Table 1. Concept extraction results of the proposed solution on the training data set

                Data Set              Precision               Recall                 F1
                  Train                 0.5099               0.5003                 0.5050




                          Fig. 2. Overall results across the different participants.


    4      REFERENCES
    [1] Navigli, R., Ponzetto, S. P. (2012). BabelNet: The Automatic Construction, Evaluation and
        Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence,
        193, 217-250.
    [2] Miller, G. A. (1995). WordNet: A Lexical Database for English. Communications of the
        ACM, 38(11), 39-41.
    [3] Navigli, R., Ponzetto, S. P. (2012). Multilingual WSD with Just a Few Lines of Code: the
        BabelNet API. In Proc. of the 50th Annual Meeting of the Association for Computational
        Linguistics (ACL 2012), Jeju, Korea, 67-72.
    [4] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.
        (2009). DBpedia – A Crystallization Point for the Web of Data. Journal of Web Semantics:
        Science, Services and Agents on the World Wide Web, 7(3), 154-165.



