=Paper= {{Paper |id=None |storemode=property |title=Semantic Tagging with Linked Open Data |pdfUrl=https://ceur-ws.org/Vol-1054/paper-13.pdf |volume=Vol-1054 |dblpUrl=https://dblp.org/rec/conf/csws/CuzzolaJBGJB13 }} ==Semantic Tagging with Linked Open Data== https://ceur-ws.org/Vol-1054/paper-13.pdf
               Semantic Tagging with Linked Open Data

    John Cuzzola, Zoran Jeremic, Ebrahim Bagheri (Ryerson University)
    Dragan Gasevic (Athabasca University)
    Jelena Jovanovic (University of Belgrade)
    Reza Bashash (SideBuy Technologies)

    Abstract: Making sense of text is a challenge for computers, particularly given the ambiguity of language. Various annotators continue to be developed using a variety of techniques in order to provide context to text. In this paper, we describe Denote, our annotator that uses a structured ontology, machine learning, and statistical analysis to perform tagging and topic discovery. A short screencast is available at http://youtu.be/espItTRQVzY, and demonstration links are provided in the conclusion.

    Keywords: semantic web, disambiguation, entity recognition, annotators, tagging, wikifying, linked-data, LOD

                         I.    INTRODUCTION
    The availability of structured linked open data, through initiatives such as the “Linked Open Data (LOD)” project¹, has given rise to a new class of annotators for unstructured text. Annotators like TagME [1], DBPedia Spotlight [2], and Alchemy² all offer such capability. In this systems paper we describe Denote, our semantic tagging platform based on Linked Open Data. In Section II, we outline Denote’s algorithm and describe its vocabulary and key features. In Section III, we demonstrate these features and compare Denote’s output with that of other annotators.

                    II.       DENOTE’S DESIGN
    Denote searches its ontology for concepts similar to the input text by performing keyword extraction and then calculating a weighted Jaccard coefficient on resource descriptions. This provides a measure of text similarity. For each resource, its known categories (defined in the ontology) are subjected to a Bayesian filter to exclude those resources and categories that do not appear relevant. This provides a measure of semantic similarity. The surviving resources are then used for the annotations. Denote’s output is in the form of a synopsis whose lexicon is given in Table I. The output is a single sentence per annotation with a set of relevant URIs sorted in order of likelihood, along with confidence and any available support statistics.

        “Text” [Is_A {}] [[[With_Value •] Of_Units •] | Acting_As {}] [Cat_Of {}]

               Fig. 1. The output of an annotated text.

    Denote uses a database of linked open data, represented in the form of n-triples (subject, predicate, object), to perform annotations, similarity identification, disambiguation and topic categorization. Denote's database is DBPedia [3], an ontology derived from Wikipedia. In this respect, it resembles DBPedia Spotlight (DBPedia) and TagME (Wikipedia). However, Denote distinguishes itself in key ways. First, it attempts to assign context to the annotations through its [Acting_As] lexicon. Second, it attempts to annotate numbers [With_Value] through statistical analysis of similar concepts whose property values are of the same data type [Of_Units]. Third, Denote has an extensive list of topic categories, made available through a DBPedia predicate, which it assigns to its annotations [Cat_Of]. These key differences were the motivation for Denote’s creation. Other annotators work in a similar manner to one another, first spotting word phrases and then linking them to the disambiguated top surface form. Denote instead attempts to find related concepts that are used to determine the properties of the spotted word phrases. This allows for role-based annotations [Acting_As]. We call this process deep tagging, as opposed to the shallow tagging of Denote’s peers.

           TABLE I.      DENOTE’S ANNOTATION LEXICON EXPLAINED

     Lexicon         Explanation
     Is_A {}         "is a", "is an", "is used by". Asks: What is it?
     Acting_As {}    Context/role. Asks: How is it used?
     With_Value •    If number, asks: What is the number value?
     Of_Units •      If number, asks: What are the units of measure?
     Cat_Of {}       Asks: What are the relevant topic categories?

                         III.    DEMONSTRATION
    In this section, we describe three core functions in Denote’s toolkit: text annotation, number annotation, and category disambiguation.

A. The Text Annotator
    Table II demonstrates Denote’s capabilities when compared to TagME and DBPedia Spotlight using the same input text: “BLT. The sub that proves great things come in threes. In this case, those three things happen to be crisp bacon, lettuce and juicy tomato. While there's no scientific way of proving it, this BLT might be the most perfect BLT sandwich in existence.” The default configurations of Denote, TagME and Spotlight were left unchanged. Spotlight does not perform category analysis. TagME gives a topic listing, but this list is simply the annotated text rather than a separate categorization. Consequently, the [Cat_Of] portion of Denote’s synopsis was omitted and left for part C.
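    Because each synopsis follows the fixed pattern of Fig. 1, it can be consumed programmatically. The Python sketch below is an illustration only and is not part of Denote: the function name parse_synopsis, the regular expressions, and the returned dictionary layout are all assumptions, and it handles only the plain-text form shown in the figures and in Table II.

```python
import re

# Keywords of Denote's lexicon (Table I). Set-valued entries use {...};
# With_Value and Of_Units are assumed to carry one whitespace-free token.
SET_KEYS = ("Is_A", "Acting_As", "Cat_Of")
SCALAR_KEYS = ("With_Value", "Of_Units")

# The annotated text comes first, in straight or curly double quotes.
_TEXT_RE = re.compile(r'^\s*["“](?P<text>[^"”]*)["”]\s*(?P<rest>.*)$', re.S)
_FIELD_RE = re.compile(
    r'(?P<key>%s)\s*\{(?P<set>[^}]*)\}|(?P<skey>%s)\s+(?P<scalar>\S+)'
    % ("|".join(SET_KEYS), "|".join(SCALAR_KEYS))
)


def parse_synopsis(line):
    """Parse one synopsis sentence into a dict (hypothetical helper)."""
    match = _TEXT_RE.match(line)
    if match is None:
        raise ValueError("synopsis must start with the quoted annotated text")
    result = {"text": match.group("text")}
    for field in _FIELD_RE.finditer(match.group("rest")):
        if field.group("key"):
            # Set-valued field: split the brace contents on commas.
            items = [s.strip() for s in field.group("set").split(",")]
            result[field.group("key")] = [s for s in items if s]
        else:
            # Scalar field: keep the raw token as a string.
            result[field.group("skey")] = field.group("scalar")
    return result


if __name__ == "__main__":
    example = ('"memory" With_Value 16 Of_Units #int '
               'Cat_Of {/Home_Computers, TRS-80_Color_Computer}')
    print(parse_synopsis(example))
```

    Values are kept as strings exactly as they appear in the synopsis; converting With_Value according to Of_Units is left to the caller, since the paper does not enumerate the possible unit types.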
    DBPedia Spotlight was the least effective, with the fewest annotations and an incorrect disambiguation of BLT as a “Bizarre Love Triangle”. TagME performed well, with numerous annotations and few mistakes (it incorrectly tagged the words “crisp” and “juicy”). Denote and TagME shared similar annotations, but it was Denote’s [Acting_As] vocabulary that provided context information. For example, both correctly annotated “lettuce” to its surface form, but only Denote identified that lettuce was acting as a main ingredient. Similarly, Denote linked the phrase “bacon, lettuce and juicy tomato” as an alias or alternate name.

 TABLE II.          ANNOTATION OF “BLT. THE […] IN EXISTENCE.” WITH DENOTE, TAGME AND DBPEDIA SPOTLIGHT (“-” = NO ANNOTATION).

Annotated Word(s)                 Denote (DBPedia)                              TagME (Wikipedia)       DBPedia Spotlight (DBPedia/Wikipedia)
BLT                               Is_A {/BLT} Acting_As {/name}                 -                       /Bizarre_Love_Triangle
BLT sandwich                      Is_A {/BLT} Acting_As {/name}                 /BLT                    -
sandwich                          -                                             -                       /Sandwich
in existence                      -                                             -                       /Existence
sub                               -                                             /Submarine_sandwich     -
crisp                             -                                             /Potato_chip            -
bacon                             Is_A {/Bacon_sandwich, Bacon, Side_bacon}     /Bacon                  -
                                  Acting_As {/mainIngredient, /ingredient}
lettuce                           Is_A {/Lettuce}                               /Lettuce                -
                                  Acting_As {/mainIngredient, /ingredient}
juicy                             -                                             /Juice                  -
tomato                            Is_A {/Tomato}                                /Tomato                 -
                                  Acting_As {/mainIngredient, /ingredient}
bacon, lettuce and juicy tomato   Acting_As {/alias, /alternateName}            -                       -
scientific way                    -                                             /Scientific_method      -

B. The Number Annotator
    The number annotator is unique with respect to other annotators in that Denote attempts to identify text that is normally associated with a numerical value. Using statistical analysis on the Jaccard/Bayes-discovered list of similar concepts, Denote attempts to match number values with annotated text. Figure 2 demonstrates this on the input text “The radio shack color computer has only 16 kb of memory”.

         “memory” With_Value 16 Of_Units #int Cat_Of {/Home_Computers, TRS-80_Color_Computer}

             Fig. 2. An example of number annotation with Denote

C. The Categorizer
    Denote has access to over 656,000 categories defined in DBPedia’s ontology. A Bayesian filter is used on each similar concept in order to determine whether the subject(s) to which the concept belongs are contextually related to the text being annotated. The DBPedia Spotlight demo does not perform topic category determination. TagME’s demo performs topic categorization by simply listing its annotated text in a tag-cloud structure rather than as a defined set of category topics. Consequently, we compare Denote’s output with Alchemy. The Alchemy annotator can perform named entity extraction from a list of 200+ defined (sub)-entities. In this comparison, the “storyline” of The Godfather movie was retrieved from the Internet Movie Database (IMDb) and annotated. Table III gives the results.

 TABLE III.         DENOTE VERSUS ALCHEMY IN CATEGORY/TOPIC TAGGING (“-” = NO ANNOTATION).

Annotated Word(s)   Denote with Category Determination                            Alchemy Entity Extraction
Corleone Family     Is_A {/The_Family_Corleone} Cat_Of                            -
                    {/Italian_American_novels,
                    /Novels_about_organized_crime_in_the_United_States,
                    /Novels_by_Mario_Puzo, /Family_saga_novels}
Don                 -                                                             TelevisionShow
Vito Corleone       Is_A {/Vito_Corleone} Cat_Of {/The_Godfather_characters}      Person
Vito                Acting_As {Person}                                            -
New York            Acting_As {Location}                                          City
Micheal             -                                                             Person
Don Vito            -                                                             Person
Don Vito Corleone   Is_A {/Don_Vito_Corleone} Cat_Of {/The_Godfather_characters}  -
Don’s               -                                                             Person
Mafia               Is_A {/Mafia_Don} Cat_Of {/The_Godfather_characters}          -
Drugs               Is_A {/Drugs} Cat_Of {/The_Godfather_characters}              -

    Alchemy's results were limited to the primitive named entity types of city and person, with the exception of an incorrect categorization of “television show”. In contrast, Denote tagged text into rich categories that include “Italian-American novels”, “organized crime novels”, and “Godfather characters”.

                         IV.     CONCLUSION
    In this paper we demonstrated Denote, a semantic annotator based on the DBPedia ontology, and compared its features with those of same-class text taggers. Denote’s middleware engine demo is available at http://ls3.rnet.ryerson.ca/annotator, while a developer-friendly demo is at http://inextweb.com/denote_demo. Denote’s annotation capabilities are wrapped in a RESTful interface, allowing third-party developers to create their own semantic-aware applications. The result, we hope, is an improvement in information search and retrieval for the end user. Our future work involves parallelisation to scale the service for a large number of concurrent clients. We are also developing proof-of-concept demonstrations, including a semantic movie recommender whose database will be contributed as a data-set to the LOD project.

    ¹ http://linkeddata.org/
    ² http://www.alchemyapi.com/

                             REFERENCES
[1]     P. Ferragina and U. Scaiella, “TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities)”, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10), 2010.
[2]     P. Mendes, M. Jakob, A. García-Silva, and C. Bizer, “DBpedia Spotlight: Shedding Light on the Web of Documents”, in Proceedings of the 7th International Conference on Semantic Systems (I-Semantics '11), 2011.
[3]     C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann, “DBpedia - A Crystallization Point for the Web of Data”, Web Semant. 7, 3 (September 2009), 154-165.
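    As a supplementary illustration of the text-similarity step in Section II, the Python sketch below ranks candidate resources by a weighted Jaccard coefficient. The paper does not specify how the coefficient is weighted, so the sketch assumes term-frequency weights and the common min/max generalization of Jaccard. The stop-word list, the function names, and the two resource descriptions are hypothetical and are not taken from Denote or DBPedia.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not Denote's actual list).
STOP_WORDS = {"a", "an", "and", "has", "in", "is", "of", "only", "the", "with"}


def keywords(text):
    """Lower-case word counts with stop words removed (simplified extraction)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def weighted_jaccard(a, b):
    """Sum of per-term minimum weights over sum of per-term maximum weights."""
    terms = set(a) | set(b)
    numerator = sum(min(a[t], b[t]) for t in terms)
    denominator = sum(max(a[t], b[t]) for t in terms)
    return numerator / denominator if denominator else 0.0


def rank_resources(text, descriptions):
    """Return (resource, score) pairs sorted by descending similarity."""
    query = keywords(text)
    scored = [(name, weighted_jaccard(query, keywords(desc)))
              for name, desc in descriptions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical resource descriptions, written for this example only.
    descriptions = {
        "/TRS-80_Color_Computer":
            "Radio Shack home computer with color graphics and 16 kb of memory",
        "/Lettuce": "Leaf vegetable used in salads and sandwiches",
    }
    text = "The radio shack color computer has only 16 kb of memory"
    for name, score in rank_resources(text, descriptions):
        print(name, round(score, 3))
```

    In this sketch the ranking would only produce the candidate list; the Bayesian category filter described in Section II would then be applied to those candidates and is not modelled here.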