=Paper=
{{Paper
|id=None
|storemode=property
|title=Semantic Tagging with Linked Open Data
|pdfUrl=https://ceur-ws.org/Vol-1054/paper-13.pdf
|volume=Vol-1054
|dblpUrl=https://dblp.org/rec/conf/csws/CuzzolaJBGJB13
}}
==Semantic Tagging with Linked Open Data==
https://ceur-ws.org/Vol-1054/paper-13.pdf
Semantic Tagging with Linked Open Data
John Cuzzola, Zoran Jeremic, Ebrahim Bagheri (Ryerson University)
Dragan Gasevic (Athabasca University)
Jelena Jovanovic (University of Belgrade)
Reza Bashash (SideBuy Technologies)
Abstract: Making sense of text is a challenge for computers, particularly because of the ambiguity associated with language. Various annotators continue to be developed using a variety of techniques in order to provide context to text. In this paper, we describe Denote, our annotator that uses a structured ontology, machine learning, and statistical analysis to perform tagging and topic discovery. A short screencast is available at http://youtu.be/espItTRQVzY, and demonstration links are provided in the conclusion.

Keywords: semantic web, disambiguation, entity recognition, annotators, tagging, wikifying, linked-data, LOD

I. INTRODUCTION

The availability of structured linked open data, through initiatives such as the "Linked Open Data (LOD)" project¹, has given rise to a new class of annotators for unstructured text. Annotators like TagME [1], DBPedia Spotlight [2], and Alchemy² all offer such capability. In this systems paper we describe Denote, our semantic tagging platform based on Linked Open Data. In Section II, we outline Denote's algorithm and describe its vocabulary and key features. In Section III, we demonstrate these features and compare Denote's output with that of other annotators.

II. DENOTE'S DESIGN

Denote searches its ontology for concepts similar to the input text by performing keyword extraction and then calculating a weighted Jaccard coefficient on resource descriptions. This provides a measure of text similarity. For each resource, its known categories (defined in the ontology) are subjected to a Bayesian filter to exclude those resources and categories that do not appear relevant. This provides a measure of semantic similarity. The surviving resources are then used for the annotations. Denote's output is in the form of a synopsis whose lexicon is given in Table I. The output is a single sentence per annotation with a set of relevant URIs sorted in order of likelihood, with confidence and available support statistics.

“Text” [Is_A {}] [[[With_Value •] Of_Units •] | Acting_As {}] [Cat_Of {}]

Fig. 1. The output of an annotated text.

Denote uses a database of linked open data, represented in the form of n-triples (subject, predicate, object), to perform annotations, similarity identification, disambiguation and topic categorization. Denote's database is DBPedia [3], an ontology derived from Wikipedia. In this respect, it resembles DBPedia Spotlight (DBPedia) and TagME (Wikipedia). However, Denote distinguishes itself in three key ways. First, it attempts to assign context to the annotations through its [Acting_As] lexicon. Second, it attempts to annotate numbers [With_Value] through statistical analysis of similar concepts whose predicate:object pairs are of the same data type [Of_Units]. Third, Denote has an extensive list of topic categories, made available through a DBPedia predicate, which it assigns to its annotations [Cat_Of]. These key differences were the motivation for Denote's creation. Other annotators work in a similar manner to one another, first spotting word phrases and then linking them to the disambiguated top surface form. Denote instead attempts to find related concepts, which are then used to determine the properties of the spotted word phrases. This allows for role-based annotations [Acting_As]. We call this process deep tagging, as opposed to the shallow tagging of Denote's peers.

TABLE I. DENOTE'S ANNOTATION LEXICON EXPLAINED

Lexicon | Explanation
Is_A {} | "is a", "is an", "is used by". Asks: What is it?
Acting_As {} | Context/role. Asks: How is it used?
With_Value • | If number, asks: What is the number value?
Of_Units • | If number, asks: What is the unit of measure?
Cat_Of {} | Asks: What are the relevant topic categories?

III. DEMONSTRATION

In this section, we describe three core functions in Denote's toolkit: text annotation, number annotation, and category disambiguation.

A. The Text Annotator

Table II demonstrates Denote's capabilities compared to TagME and DBPedia Spotlight using the same input text: "BLT. The sub that proves great things come in threes. In this case, those three things happen to be crisp bacon, lettuce and juicy tomato. While there's no scientific way of proving it, this BLT might be the most perfect BLT sandwich in existence." The default configurations for Denote, TagME and Spotlight were left unchanged. Spotlight does not perform category analysis. TagME gives a topic listing, but this list is simply the annotated text rather than a separate categorization. Consequently, the [Cat_Of] portion of Denote's synopsis was omitted and left for part C.
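Each Denote annotation in this section follows the synopsis grammar of Fig. 1, so a client that consumes Denote's output needs to split a synopsis line into its lexicon slots. The following is a minimal sketch of such a parser, not Denote's own code. The `Synopsis` class, the `parse_synopsis` function, and the regular expressions are hypothetical names introduced for illustration, and the sketch assumes the slots appear as plain text exactly as printed in Fig. 1 and Fig. 2.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical helper, not part of Denote: it only mirrors the textual
# synopsis grammar printed in Fig. 1.
# Brace slots hold URI sets; value slots hold a single token.
SLOT = re.compile(
    r"(Is_A|Acting_As|Cat_Of)\s*\{([^}]*)\}"   # e.g. Is_A {/BLT}
    r"|(With_Value|Of_Units)\s+(\S+)"          # e.g. With_Value 16
)
QUOTED = re.compile(r'\s*["\u201c](.+?)["\u201d]\s*(.*)$', re.S)


@dataclass
class Synopsis:
    text: str
    is_a: List[str] = field(default_factory=list)
    acting_as: List[str] = field(default_factory=list)
    cat_of: List[str] = field(default_factory=list)
    with_value: Optional[str] = None
    of_units: Optional[str] = None


def parse_synopsis(line: str) -> Synopsis:
    """Split one synopsis line into the annotated text and its slots."""
    match = QUOTED.match(line)
    if match is None:
        raise ValueError("synopsis must start with the quoted annotated text")
    text, rest = match.groups()
    synopsis = Synopsis(text=text)
    for slot in SLOT.finditer(rest):
        if slot.group(1):  # brace slot: comma-separated URIs
            uris = [u.strip() for u in slot.group(2).split(",") if u.strip()]
            setattr(synopsis, slot.group(1).lower(), uris)
        else:              # value slot: a single token
            setattr(synopsis, slot.group(3).lower(), slot.group(4))
    return synopsis


example = parse_synopsis(
    '"memory" With_Value 16 Of_Units #int '
    "Cat_Of {/Home_Computers, TRS-80_Color_Computer}"
)
```

The example line is the Fig. 2 synopsis. Values are kept as strings because the paper does not specify how Denote types them.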
DBPedia Spotlight was the least effective, with the fewest annotations and an incorrect disambiguation of BLT as "Bizarre Love Triangle". TagME performed well, producing numerous annotations with few mistakes (it incorrectly tagged the words "crisp" and "juicy"). Denote and TagME shared similar annotations, but only Denote, through its [Acting_As] vocabulary, provided context information. For example, both correctly annotated "lettuce" to its surface form, but it was Denote that identified that lettuce was acting as a main ingredient. Similarly, Denote linked the phrase "bacon, lettuce, and juicy tomato" as an alias or alternate name.

TABLE II. ANNOTATION OF "BLT. THE […] IN EXISTENCE." WITH DENOTE, TAGME AND DBPEDIA SPOTLIGHT.

Annotated Word(s) | Denote (DBPedia) | TagME (Wikipedia) / DBPedia Spotlight (DBPedia/Wikipedia)
BLT | Is_A {/BLT} Acting_As {/name} | /Bizarre_Love_Triangle
BLT sandwich | Is_A {/BLT} Acting_As {/name} | /BLT
sandwich | | /Sandwich
in existence | | /Existence
sub | | /Submarine_sandwich
crisp | | /Potato_chip
bacon | Is_A {/Bacon_sandwich, Bacon, Side_bacon} Acting_As {/mainIngredient, /ingredient} | /Bacon
lettuce | Is_A {/Lettuce} Acting_As {/mainIngredient, /ingredient} | /Lettuce
juicy | | /Juice
tomato | Is_A {/Tomato} Acting_As {/mainIngredient, /ingredient} | /Tomato
bacon, lettuce and juicy tomato | Acting_As {/alias, /alternateName} |
scientific way | | /Scientific_method

B. The Number Annotator

The number annotator is unique with respect to other annotators in that Denote attempts to identify text that is normally associated with a numerical value. Using statistical analysis on the Jaccard/Bayes-discovered list of similar concepts, Denote attempts to match number values with annotated text. Figure 2 demonstrates this on the input text "The radio shack color computer has only 16 kb of memory".

“memory” With_Value 16 Of_Units #int Cat_Of {/Home_Computers, TRS-80_Color_Computer}

Fig. 2. An example of number annotation with Denote

C. The Categorizer

Denote has access to over 656,000 categories defined in DBPedia's ontology. A Bayesian filter is applied to each similar concept in order to determine whether the subject(s) to which the concept belongs are contextually related to the text being annotated. The DBPedia Spotlight demo does not perform topic category determination. TagME's demo performs topic categorization by simply listing its annotated text in a tag-cloud structure rather than as a defined set of category topics. Consequently, we compare Denote's output with Alchemy. The Alchemy annotator can perform named entity extraction from a list of 200+ defined (sub)-entities. In this comparison, the "storyline" of The Godfather movie was retrieved from the Internet Movie Database (IMDb) and annotated. Table III gives the results.

TABLE III. DENOTE VERSUS ALCHEMY IN CATEGORY/TOPIC TAGGING

Annotated Word(s) | Denote with Category Determination | Alchemy Entity Extraction
Corleone Family | Is_A {/The_Family_Corleone} Cat_Of {/Italian_American_novels, /Novels_about_organized_crime_in_the_United_States, /Novels_by_Mario_Puzo, /Family_saga_novels} |
Don | | TelevisionShow
Vito Corleone | Is_A {/Vito_Corleone} Cat_Of {/The_Godfather_characters} | Person
Vito | Acting_As {Person} |
New York | Acting_As {Location} | City
Micheal | | Person
Don Vito | | Person
Don Vito Corleone | Is_A {/Don_Vito_Corleone} Cat_Of {/The_Godfather_characters} |
Don's | | Person
Mafia | Is_A {/Mafia_Don} Cat_Of {/The_Godfather_characters} |
Drugs | Is_A {/Drugs} Cat_Of {/The_Godfather_characters} |

Alchemy results were limited to the primitive named entity types of city and person, with the exception of an incorrect categorization of "television show". In contrast, Denote tagged text into rich categories that include "Italian-American novels", "organized crime novels", and "Godfather characters".

IV. CONCLUSION

In this paper we demonstrated Denote, a semantic annotator based on the DBPedia ontology, and compared its features with those of same-class text taggers. Denote's middleware engine demo is available at http://ls3.rnet.ryerson.ca/annotator while a developer-friendly demo is at http://inextweb.com/denote_demo. Denote's annotation capabilities are wrapped in a RESTful interface, allowing third-party developers to create their own semantic-aware applications. The result, we hope, is an improvement in information search and retrieval for the end user. Our future work involves parallelisation to scale the service for a large number of concurrent clients. We are also developing proof-of-concept demonstrations, including a semantic movie recommender whose database will be contributed as a data set to the LOD project.

REFERENCES

[1] P. Ferragina and U. Scaiella, "TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities)", in Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10), 2010.
[2] P. Mendes, M. Jakob, A. García-Silva, and C. Bizer, "DBpedia Spotlight: Shedding Light on the Web of Documents", in Proceedings of the 7th International Conference on Semantic Systems (I-Semantics '11), 2011.
[3] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann, "DBpedia - A crystallization point for the Web of Data", Web Semant. 7, 3 (September 2009), 154-165.

¹ http://linkeddata.org/
² http://www.alchemyapi.com/
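As a supplementary illustration of the text-similarity step in Section II, the sketch below computes a weighted Jaccard coefficient between an input text and a set of resource descriptions. It is a sketch under stated assumptions rather than Denote's implementation: the paper does not specify the keyword extractor or the weighting scheme, so this example assumes term-frequency weights, a toy stop-word list, and the common min/max form of the weighted Jaccard coefficient. The function names and the two resource descriptions are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list stands in for real keyword extraction.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "that", "be"}


def keywords(text: str) -> Counter:
    """Return term-frequency weights for the non-stop-word tokens of text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard: sum of per-term minima over sum of per-term maxima."""
    terms = set(a) | set(b)
    numerator = sum(min(a[t], b[t]) for t in terms)    # Counter gives 0 if absent
    denominator = sum(max(a[t], b[t]) for t in terms)
    return numerator / denominator if denominator else 0.0


def rank_resources(text: str, descriptions: dict) -> list:
    """Score every resource description against text, best match first."""
    query = keywords(text)
    scored = [(weighted_jaccard(query, keywords(desc)), uri)
              for uri, desc in descriptions.items()]
    return sorted(scored, reverse=True)


# Hypothetical resource descriptions, written for this example only.
toy_descriptions = {
    "/BLT": "A BLT is a sandwich of bacon, lettuce and tomato",
    "/Submarine": "A submarine is a watercraft",
}
ranking = rank_resources("crisp bacon, lettuce and juicy tomato", toy_descriptions)
```

In this toy run the BLT description shares three of seven distinct keywords with the input, so it ranks above the unrelated resource. A real system would use full DBPedia abstracts and, as Section II describes, pass the ranked resources on to the Bayesian category filter.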