<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antti Vehviläinen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eero Hyvönen</string-name>
          <email>Eero.Hyvonen@tkk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olli Alm</string-name>
          <email>Olli.Alm@tkk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Helsinki University of Technology (TKK), Laboratory of Media Technology and University of Helsinki, Semantic Computing Research Group, http://www.seco.tkk.fi</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Helsinki University of Technology (TKK), Laboratory of Media Technology, Semantic Computing Research Group, http://www.seco.tkk.fi</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Helsinki and Helsinki University of Technology (TKK), Semantic Computing Research Group, http://www.seco.tkk.fi</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper discusses how knowledge technologies can be utilized in creating help desk services on the semantic web. To ease the content indexer's work, we propose semi-automatic semantic annotation of natural language text for annotating question-answer (QA) pairs, and case-based reasoning techniques for finding similar questions. To provide answers matching the indexer's and end-user's information needs, methods for combining case-based reasoning with semantic search and browsing are proposed. We integrate different data sources by using large ontologies of upper common concepts, places, and agents, and suggest techniques for utilizing these sources in authoring answers. A prototype implementation of a real-life ontology-based help desk application is presented as a proof of concept. The system is based on a data set of over 20,000 QA pairs and on the operational principles of an existing national library help desk service in Finland.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Companies and public organizations widely use help desk
services to solve problems for their customers. The
classic example of a help desk service is a call center, where
support staff answer questions by phone or by email. As
help desk services move to the Web, customers increasingly
have the option of solving their problems themselves by using the
knowledge and content accumulated at the service,
without contacting a support person directly [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A simple
approach, for example, is to publish Frequently Asked
Questions (FAQ) lists on the web. A simple and fast
question-answer (QA) self-service is appreciated not
only by the customers but also by the authors of the answers,
whose time is saved if the QA service can
automatically provide an answer to the customer. Furthermore, the
author can use the accumulated QA knowledge of the
service herself, which helps in authoring the answers and
improves their quality.
      </p>
      <p>This paper discusses applications of semantic web
technologies to help desk services. We focus on QA help desk
services, where the database of the service is composed of
previously answered questions, i.e., QA pairs. In such a service
the user has a question in mind, and the service has two
major tasks:
1. Finding relevant previous answers. A search method is
needed to find the relevant, already answered QA pairs
in the repository.
2. Authoring a new answer. An existing QA pair may
satisfy the customer's information need, but usually
the old answer case needs some adaptation.
Answers are usually created and modified manually by
a human editor.</p>
      <p>The research problem of this paper is to investigate how to
support semi-automatic answer authoring in a QA help desk
service. Our methodology is to use semantic web
technologies in content annotation, in utilizing the QA repository,
and in integrating information available online on the web
with the authoring process and the answers.</p>
      <p>In this paper, the term indexing refers to
the traditional approach, in which index terms
are plain strings without an ontological reference. The
term annotation refers to the new approach, in which annotation
concepts have an ontological reference.</p>
    </sec>
    <sec id="sec-2">
      <title>1.1 The Existing Service</title>
      <p>The research is based on a real-life case study: we use the
data set of the operational Ask a Librarian service
(http://www.kirjastot.fi/tietopalvelu), offered nationally in Finland by the
editors of the Libraries.fi portal, which provides access to Finnish Library
Net Services under one user interface (see http://www.libraries.fi). In this
service, clients send questions to a virtual librarian by email, and a
librarian of the service provides an answer within three working days. Some
of the questions that clients send are simple, and the librarian can answer
them straight away. These include questions about the opening times of a
library, how to make an inter-library loan, etc. However, most of the
questions require the librarian to spend more time investigating the subject
of the question. These include questions like "I'm wondering where I could
find information about studies of library and information science?" or "I'm
giving a presentation on Nokia. Where could I find helpful information?"
Answers to these questions typically span a few paragraphs of text and
contain some links to useful web sites. The librarians report that on
average they spend from half an hour to an hour composing such an answer.</p>
      <sec id="sec-2-1">
        <p>Each QA pair has been indexed using the YSA thesaurus
(http://vesa.lib.helsinki.fi) of some 23,000 common Finnish terms. At present the
data set consists of over 20,000 QA pairs. A keyword-based
search service is available on the web for both end-users and
answering librarians.</p>
      </sec>
      <sec id="sec-2-2">
        <p>By interviewing the librarians employed by the service, we identified
several problems:
1. Accessing accumulated knowledge. For a newly
submitted question, the first thing to do is often to find out whether
a similar or at least related answer already exists
in the knowledge base.
2. Exploiting external resources in authoring. How can
different data sources and services, such as
library systems on the web, be integrated and then used
in authoring a new answer?
3. Semantic annotation. How can the librarian be helped in
choosing the appropriate annotation concepts for a
new QA pair? The practitioners considered this problem especially crucial.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>1.2 The Proposed Solution</title>
      <p>
        We approach the problems described above with a
prototype semantic annotation and authoring tool,
Opas [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The system is intended to be used by the
librarians when authoring answers in the Ask a Librarian service.
In the following, we first show how ontology-based semi-automatic
semantic annotation can help in choosing concepts for the
semantic annotation of QA pairs. Then
the problem of finding relevant answers for a new incoming
question is approached by using ideas of case-based
reasoning (CBR) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We also show how a common upper
ontology can be used to integrate different data sources to
help in authoring answers. We then present the results of
the early evaluations conducted with the prototype. In
conclusion, we summarize the contributions of the work, discuss related
work, and outline directions for further research.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2. SEMI-AUTOMATIC SEMANTIC ANNOTATION</title>
    </sec>
    <sec id="sec-5">
      <p>When interviewing the librarians, two problems related to
indexing the QA pairs were brought up: 1) choosing
the appropriate indexing terms for annotating a
question-answer pair is often time-consuming and difficult, and 2)
different people follow different indexing conventions,
which makes the content unbalanced. For example, one
librarian may use a few general terms to describe an answer,
whereas another uses a large number of more detailed terms.</p>
      <sec id="sec-5-1">
        <p>
          Our approach to these problems is to combine
ontology-based semi-automatic annotation [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and machine
reasoning. The idea is to create a knowledge-based system
that automatically provides the annotator with a suggestion
of potential annotation concepts based on the textual
material and other available knowledge, such as the QA database,
earlier annotations, and common knowledge about indexing
practices. The human editor then checks and edits the initial
suggestion as she likes. This strategy not only
helps the annotator find annotation terms (among tens of
thousands of choices) but also makes annotators use the
right terms, as defined by the underlying annotation ontologies.
Furthermore, content is likely to become more balanced,
because every annotator starts her job from a suggestion based
on the same logic. By encoding indexers' knowledge and
common indexing practices as rules, or by using automatic
techniques such as collaborative filtering [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], it is possible to
help novice indexers in particular even further.
        </p>
      </sec>
      <sec id="sec-5-2">
        <p>As a first step towards such a knowledge-based
semi-automatic annotation tool, we created Poka
(http://www.seco.tkk.fi/applications/poka/), an ontology-based
information extraction tool for textual data, and
integrated it with Opas (http://www.seco.tkk.fi/applications/opas/).
The following briefly describes how Poka works.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>2.1 Extracting Annotation Concepts</title>
      <p>Poka provides the QA indexer with a list of possible
annotation concepts as ontological concepts (URIs), and the
indexer chooses which concepts she wants to use. The
selection of the concepts is based on the words and expressions
found in the question and answer.</p>
      <p>The librarians currently choose the indexing terms manually
from the General Finnish Thesaurus YSA (http://www.vesa.lib.helsinki.fi).
The terms in YSA are, with some exceptions, common nouns such
as dog, astronomy, or child. In addition, the indexer may
use free indexing terms that are not explicitly listed in the
thesaurus. Free terms can be common nouns, such as names
of flowers or animals, or proper nouns, such as person names
(e.g., John F. Kennedy) or geographical places (Finland,
Beijing). Poka treats these categories of words, including free
indexing terms, as follows.</p>
      <sec id="sec-6-1">
        <sec id="sec-6-1-1">
          <title>2.1.1 Common Nouns</title>
          <p>
            To map the common nouns of YSA to corresponding
ontology concepts, YSA was transformed into the General
Finnish Upper Ontology (YSO) [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. YSO contains over
20,000 Finnish indexing concepts organized into 10 major
subsumption hierarchies. Each concept is associated with
one or more term labels, which allows mapping of words
and terms onto YSO concepts (URIs).
          </p>
          <p>
            First, the input question is analysed by FDG, a morphological
analyser and syntactic parser [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. FDG produces tokenized output of the text in XML form,
including the lemmatized form of each word, morphological
information, syntactic information, and, where one exists, the
type and target of the token's functional dependency on another
token within the same sentence.
          </p>
          <p>
            For concept matching, the labels of YSO concepts are also
lemmatized, and the lemmatized labels are indexed in a
prefix trie for efficient extraction. Lemmatizing both the text and
the concept labels improves recall in the extraction
process, because the surface forms of words vary greatly in languages
with heavy morphological affixation [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. The architecture
can be extended to other languages by plugging in different
language-dependent syntactic parsers.
          </p>
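          <p>The matching step can be illustrated with a minimal sketch, assuming the input has already been lemmatized (for example by FDG) and that each ontology label is stored as a tuple of lemmas. The token-level trie, the longest-match policy, and all URIs below are assumptions made for illustration, not details of Poka's implementation.</p>
          <preformat>
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    """One node of a token-level prefix trie over lemmatized labels."""
    children: dict = field(default_factory=dict)  # next lemma -> TrieNode
    uri: Optional[str] = None  # set when a complete label ends here


def build_trie(labels: dict) -> TrieNode:
    """Index labels, given as tuples of lemmas, mapped to concept URIs."""
    root = TrieNode()
    for lemmas, uri in labels.items():
        node = root
        for lemma in lemmas:
            node = node.children.setdefault(lemma, TrieNode())
        node.uri = uri
    return root


def extract_concepts(root: TrieNode, lemmas: list) -> list:
    """Return (start, end, uri) for the longest label match at each position."""
    found = []
    i = 0
    while len(lemmas) > i:
        node, j, best = root, i, None
        # Walk the trie as far as the input allows, remembering the
        # longest complete label seen so far.
        while len(lemmas) > j and lemmas[j] in node.children:
            node = node.children[lemmas[j]]
            j += 1
            if node.uri is not None:
                best = (i, j, node.uri)
        if best is None:
            i += 1  # no label starts at this position
        else:
            found.append(best)
            i = best[1]  # resume after the matched label
    return found


# Hypothetical lemmatized labels and URIs, for illustration only.
EX = "http://example.org/concept/"
labels = {
    ("library",): EX + "library",
    ("information",): EX + "information",
    ("library", "and", "information", "science"): EX + "lis",
}
root = build_trie(labels)
tokens = ["study", "of", "library", "and", "information", "science"]
print(extract_concepts(root, tokens))
```
          </preformat>
          <p>Here the multi-word label wins over its shorter prefix "library", so the call returns one match spanning tokens 2 to 6 rather than separate hits for library and information; whether Poka prefers the longest match is not stated in the text.</p>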
        </sec>
        <sec id="sec-6-1-2">
          <title>2.1.2 Place Names</title>
          <p>
            Place name recognition in Poka is based on the same
method as common noun recognition, but it uses the place
ontology of the MuseumFinland portal [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], as extended in the
CultureSampo project, instead of YSO.
          </p>
        </sec>
        <sec id="sec-6-1-3">
          <title>2.1.3 Person Names</title>
          <p>
            Poka's name recognition tool is a rule-based information
extraction tool that uses no initial gazetteers. The main idea
of the recognizer is first to search for full names within the
text at hand. After that, occurrences of first and last
names are mapped to full names. Simple coreference
resolution within a document is implemented by mapping
individual name occurrences to the corresponding unambiguous
full name, if one exists. Individual first names and
surnames without a corresponding full name are discarded.
A strength of Poka's extraction process is that, unlike
gazetteer-based tools such as those using the initial named
entity recognition of the GATE framework [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], it also recognizes atypical names. The search for potential names starts
from the uppercase words of the document, and
morphosyntactic clues allow some hits to be discarded. For example,
first names in Finnish rarely carry certain morphological
affixes such as -ssa (similar to the English preposition in) or -lla
(preposition on). The FDG parser's surface-syntactic
analysis is also used as a clue for identifying proper names.
Person name recognition may produce false hits: one wrong
full-name hit may cause the corresponding first and
last name occurrences to be mapped to the wrong full name. On the other hand,
all occurrences of a false name can
be corrected at once by discarding the full name.
          </p>
          <p>
            Resources referred to in Section 2.1: YSO, http://www.seco.tkk.fi/ontologies/yso/;
FDG (Connexor Machinese Syntax), http://www.connexor.com;
CultureSampo project, http://www.seco.tkk.fi/projects/kulttuurisampo/.
          </p>
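          <p>The two-pass idea (full names first, then coreference of lone first or last names) can be sketched as follows. This is a simplified illustration, not Poka's rule set: it works on plain capitalized tokens instead of FDG's surface-syntactic analysis, treats any plausible first name followed by a capitalized word as a full-name candidate, and checks only the two endings mentioned above; the function names are hypothetical.</p>
          <preformat>
```python
import re

# Only the two endings mentioned in the text; real Finnish morphology has
# many more, plus vowel-harmony variants such as -ssä and -llä.
BLOCKING_SUFFIXES = ("ssa", "lla")


def could_be_first_name(token: str) -> bool:
    """Uppercase-initial word without an ending that first names rarely take."""
    return token[:1].isupper() and not token.lower().endswith(BLOCKING_SUFFIXES)


def recognize_person_names(text: str) -> dict:
    """Map each full name to the token positions that refer to it."""
    tokens = re.findall(r"\w+", text)
    full_names = {}
    used = set()
    # Pass 1: a plausible first name followed by a capitalized word
    # is taken as a full-name candidate.
    i = 0
    while len(tokens) - 1 > i:
        first, last = tokens[i], tokens[i + 1]
        if could_be_first_name(first) and last[:1].isupper():
            full_names.setdefault(f"{first} {last}", []).extend([i, i + 1])
            used.update((i, i + 1))
            i += 2
        else:
            i += 1
    # Pass 2: simple coreference; a lone capitalized word is mapped to a
    # full name only if exactly one full name contains it.
    for pos, token in enumerate(tokens):
        if pos in used or not token[:1].isupper():
            continue
        owners = [name for name in full_names if token in name.split()]
        if len(owners) == 1:
            full_names[owners[0]].append(pos)
        # Otherwise the lone name is discarded.
    return full_names


text = "Turussa Arto Paasilinna luennoi. Paasilinna kertoi kirjoistaan."
print(recognize_person_names(text))
```
          </preformat>
          <p>For the sample text ("In Turku, Arto Paasilinna lectured. Paasilinna talked about his books."), the result maps "Arto Paasilinna" to token positions 1, 2, and 4, and "Turussa" is rejected as a first name because of its -ssa ending. Deleting the "Arto Paasilinna" key would discard all three occurrences at once, mirroring the correction step described above.</p>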
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>2.2 Free Annotation Concepts</title>
      <p>Poka does not always suggest all the annotation concepts that
the librarian wants to use, even when the corresponding word
appears in the text to be annotated and is a legal annotation concept.
This always happens with free annotation concepts, which by definition are
not explicitly included in the ontology. Obviously, human
intervention is necessary in such cases.</p>
      <p>
        Our approach to extracting free annotation
concepts is to provide a mechanism by which end-users
can define new free annotation entries in the ontology and
share them with other annotators. A new annotation
concept is defined simply by giving the system its class, label,
and an optional comment. For example, the term
"leikkiauto" (toy car) is not present in the YSO ontology, because many
things can be used as toys and it does not make much
sense to list them all in the system. On the other hand, the
concept toy car is useful from the indexing and information
retrieval points of view. In this case, the user can interactively
create a new concept as a subclass of an existing ontological
concept, here toy ("lelu"), label it, here "leikkiauto" (toy
car), and use it in the annotation. When content is later searched
using the concept toy ("lelu"), QA pairs
annotated with toy car ("leikkiauto") can also be retrieved, with
the additional information that the QA pair is
specifically about toy cars. The new concept toy car can
also be utilized in various ways in the user interface, e.g., as
a search category in view-based semantic search [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Free
indexing terms with the same name can be distinguished
by different URIs and an additional comment.
Relevant annotation concepts without a
corresponding concept in the ontologies are also frequently
encountered in name recognition, because new names (e.g.,
names of pop stars) are constantly introduced over time.
The same approach used with free annotation concepts
can be employed there, too.
      </p>
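      <p>A minimal sketch of free concepts and subsumption-aware retrieval, assuming a simple in-memory ontology in which each concept records its broader concept. The class names, the example.org URI scheme, and the search function are hypothetical; the sketch does not model how Opas actually stores its ontologies.</p>
      <preformat>
```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    uri: str
    label: str
    parent: Optional[str] = None  # URI of the broader concept
    comment: str = ""


class Ontology:
    def __init__(self) -> None:
        self.concepts = {}
        self._ids = itertools.count(1)

    def add(self, concept: Concept) -> Concept:
        self.concepts[concept.uri] = concept
        return concept

    def define_free_concept(self, parent_uri: str, label: str, comment: str = "") -> Concept:
        """Create a user-defined subclass of an existing concept.

        Every free concept gets a fresh URI, so two free terms with the same
        label stay distinct and can be told apart by their comments.
        """
        uri = f"http://example.org/free/{next(self._ids)}"
        return self.add(Concept(uri, label, parent_uri, comment))

    def ancestors_or_self(self, uri: Optional[str]):
        """Yield the concept itself, then each broader concept up to the root."""
        while uri is not None:
            yield uri
            uri = self.concepts[uri].parent


def search(ontology: Ontology, annotations: dict, query_uri: str) -> list:
    """Return (QA id, matching label) for QA pairs annotated with the query
    concept or any of its subconcepts."""
    hits = []
    for qa_id, concept_uris in annotations.items():
        for uri in concept_uris:
            if query_uri in ontology.ancestors_or_self(uri):
                hits.append((qa_id, ontology.concepts[uri].label))
    return hits


TOY = "http://example.org/yso/lelu"  # hypothetical URI for toy ("lelu")
onto = Ontology()
onto.add(Concept(TOY, "lelu"))
toy_car = onto.define_free_concept(TOY, "leikkiauto", "toy car")
annotations = {"qa-1": [toy_car.uri], "qa-2": [TOY]}
print(search(onto, annotations, TOY))
```
      </preformat>
      <p>Searching with the toy concept returns both QA pairs, and the returned label tells the caller that "qa-1" is specifically about toy cars; searching with the toy car concept returns only "qa-1".</p>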
      <p>In some cases where a word does not have an exact match
with an ontological concept, Poka is able to suggest related
annotation concepts based on the ontology. Such reasoning
can be based, for example, on the morphological structure of
a compound word or the functional dependencies produced
by the FDG-parser.</p>
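      <p>Compound-based suggestion can be sketched as below, under the assumption that a compound is split into a modifier and a head by looking both parts up in a lexicon of known lemmas. The lexicon, the label set, and the function names are hypothetical; in practice the split could instead come from the morphological analyser.</p>
      <preformat>
```python
from typing import Optional


def split_compound(word: str, lexicon: set) -> Optional[tuple]:
    """Split a word into (modifier, head) if both parts are known lemmas."""
    for cut in range(1, len(word)):
        modifier, head = word[:cut], word[cut:]
        if modifier in lexicon and head in lexicon:
            return modifier, head
    return None


def related_by_modifier(word: str, lexicon: set, labels: set) -> list:
    """Suggest ontology labels that share the compound's first part."""
    parts = split_compound(word, lexicon)
    if parts is None:
        return []
    modifier, _head = parts
    return sorted(
        label for label in labels if label.startswith(modifier) and label != word
    )


# Hypothetical lexicon and ontology labels, for illustration only.
lexicon = {"leikki", "auto", "kalu"}
labels = {"leikkikalu", "auto", "kirjailijat"}
print(related_by_modifier("leikkiauto", lexicon, labels))
```
      </preformat>
      <p>For "leikkiauto" the sketch finds the split "leikki" + "auto" and suggests "leikkikalu" (toy), the same suggestion shown for this compound in the example of Section 2.4.</p>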
    </sec>
    <sec id="sec-8">
      <title>2.3 Ranking Annotation Concepts</title>
      <p>
        The previous sections analyzed situations where a semantic
annotator produces too few relevant annotation concepts. The
reverse problem with automatic semantic annotation is that
too many irrelevant concepts are often suggested,
especially if the input text is long. In such cases
it is useful to rank the concepts according to their likely
relevance and to provide the end-user with a simple mechanism
for evaluating and deleting irrelevant annotations.
Following the idea of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Opas determines the relevance of indexing
concepts by searching the term set for semantic clusters:
terms in semantic clusters are ranked as more
relevant than semantically isolated terms. For example, the terms
doctor, sickness, and medication form a semantic cluster. For
common noun terms, we use the concept relations defined in
the YSO ontology to identify these clusters.
      </p>
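      <p>The clustering idea can be illustrated by treating the extracted concepts as nodes and the ontology relations between them as edges, and taking connected components as clusters. This is an assumption about how clusters might be computed; the relation pairs below are invented and are not taken from YSO.</p>
      <preformat>
```python
from collections import defaultdict


def semantic_clusters(concepts: list, relations: list) -> list:
    """Group extracted concepts into connected components, using ontology
    relations (concept pairs) whose both ends were extracted."""
    extracted = set(concepts)
    graph = defaultdict(set)
    for a, b in relations:
        if a in extracted and b in extracted:
            graph[a].add(b)
            graph[b].add(a)
    seen, clusters = set(), []
    for start in concepts:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def cluster_bonus(concepts: list, relations: list) -> dict:
    """1.0 for concepts in a cluster of two or more concepts, else 0.0."""
    bonus = {}
    for component in semantic_clusters(concepts, relations):
        for concept in component:
            bonus[concept] = 1.0 if len(component) > 1 else 0.0
    return bonus


# Hypothetical extracted concepts and ontology relations.
concepts = ["doctor", "sickness", "medication", "poetry"]
relations = [("doctor", "sickness"), ("sickness", "medication"), ("poetry", "literature")]
print(semantic_clusters(concepts, relations))
print(cluster_bonus(concepts, relations))
```
      </preformat>
      <p>Only relations whose both ends were extracted from the text count here, so "poetry" stays isolated even though it is related to "literature"; following multi-hop paths through concepts that were not extracted would be one possible extension.</p>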
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an ontological extension of the classic tf-idf (term
frequency-inverse document frequency) method is developed
that makes it possible to identify synonyms and to utilize the
concept hierarchies of the ontology. We apply this work so
that more weight is given to concepts that appear frequently
in the text but have not often been used as annotation
concepts for previous questions. In addition, Opas can suggest
annotation concepts that are usually used together. For
example, if the concept aviation is extracted from a question and
many questions are annotated with both aviation and
airplane, the concept airplane can be suggested as an
annotation concept, even though it is not explicitly present in the
question text.
      </p>
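      <p>The co-occurrence suggestion described above can be sketched by counting, over earlier annotations, how often two concepts annotate the same QA pair. The threshold min_count, the function names, and the sample archive are hypothetical.</p>
      <preformat>
```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotated_qas: list) -> Counter:
    """Count, for each ordered concept pair, the QA pairs annotated with both."""
    counts = Counter()
    for concepts in annotated_qas:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def suggest_cooccurring(extracted: set, counts: Counter, min_count: int = 2) -> list:
    """Suggest concepts frequently used together with the extracted ones."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in extracted and b not in extracted and n >= min_count:
            scores[b] += n
    return [concept for concept, _ in scores.most_common()]


# Hypothetical annotations of earlier QA pairs.
archive = [
    ["aviation", "airplane"],
    ["aviation", "airplane", "history"],
    ["aviation", "airport"],
    ["poetry"],
]
counts = cooccurrence_counts(archive)
print(suggest_cooccurring({"aviation"}, counts))
```
      </preformat>
      <p>With the sample archive, "airplane" co-occurs with "aviation" in two QA pairs and is suggested, while "history" and "airport" fall below the threshold.</p>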
      <p>Our preliminary experiments with annotation concept
weighting seem to suggest that relatively more weight should
be given to terms that have a high term frequency, and the
effect of inverse document frequency should be relatively
smaller. The reasoning behind this is that if, say, the
concept poetry appears in a question many times, it seems that
the concept is relevant to the question even though it has
been used frequently as an annotation concept in previous
questions. So, in Opas the main weight is determined by the
term frequency, whereas inverse document frequency and
semantic clusters have a smaller impact on the weight.</p>
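      <p>One way to let term frequency dominate is sketched below. The text does not give the exact formula used in Opas, so this weighting, the normalization of idf by its maximum, and the parameters alpha and beta are assumptions chosen only to illustrate the stated principle.</p>
      <preformat>
```python
import math


def concept_weight(tf: int, df: int, n_questions: int, in_cluster: bool,
                   alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weight dominated by term frequency; idf and clustering only adjust it.

    idf is normalized by its maximum, log(n_questions), so the idf factor can
    raise a weight by at most (1 + alpha). Requires n_questions of at least 2.
    """
    idf = math.log(n_questions / (1 + df))
    idf_norm = max(0.0, idf) / math.log(n_questions)
    return tf * (1 + alpha * idf_norm) + (beta if in_cluster else 0.0)


# Hypothetical statistics: term frequency in the question, number of earlier
# questions annotated with the concept, and semantic-cluster membership.
stats = {"poetry": (3, 400, False), "sonnet": (1, 2, False), "rhyme": (1, 50, True)}
weights = {c: concept_weight(tf, df, 1000, cl) for c, (tf, df, cl) in stats.items()}
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking)
```
      </preformat>
      <p>The ranking is poetry, rhyme, sonnet: "poetry" stays on top despite being a common annotation concept, and the cluster bonus lifts "rhyme" above "sonnet". A plain tf * idf product would instead rank "sonnet" (about 5.8) above "poetry" (about 2.7).</p>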
    </sec>
    <sec id="sec-9">
      <title>2.4 An Example</title>
      <p>Figure 1 depicts the first screen that the librarian sees when
she has decided to answer a question. The end-user has
submitted a question about the life and books of Arto Paasilinna,
a Finnish author; the question is shown on the left, in the box
"Kysymysteksti" (Question Text). On the right, in the box "Oppaan
löytämät käsitteet" (Indexing Concepts Found), there are
two common noun concepts, "teokset" (writings) and
"esitelmät" (plays). Poka has also identified the person name
"Arto Paasilinna". Below the question text is the
authoring component "Vastaajan apurit" (Authoring Tools),
discussed in detail in Section 4.</p>
      <p>Figure 2 depicts the case where the free annotation
concept "leikkiauto" (toy car) is encountered. In this case,
Poka splits the compound term into its parts and suggests
the concept "leikkikalu" (toy), because it is found in the YSO
ontology as a potentially related concept based on the first
part of the compound. The librarian can then define the
narrower concept toy car with the label "leikkiautot" (toy
cars) by clicking on the link in the middle.</p>
      <p>
        Figure 3 depicts the case where Poka is unable to make any
suggestions, and the librarian wants to add the
annotation concept writer ("kirjailijat"). As she
is typing in the word, Opas uses semantic autocompletion
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to suggest matching annotation concepts in YSO. The
floating box at the bottom right displays information about
a concept: its preferred and alternative labels, related
concepts, subconcepts, and superconcepts. This information is
displayed when the librarian points at a concept with the
mouse. The purpose of the autocompletion component is to
1) ensure that the indexer uses a concept found in the
ontology and 2) suggest semantically related indexing concepts
that the librarian might not have considered.
      </p>
    </sec>
    <sec id="sec-10">
      <title>3. UTILIZING CASE-BASED REASONING TO FIND SIMILAR QUESTIONS</title>
    </sec>
    <sec id="sec-11">
      <p>
        Case-based reasoning (CBR) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a problem-solving
paradigm in artificial intelligence in which new problems are
solved based on previously experienced similar problems, or
cases. The CBR cycle consists of four phases: 1) Retrieve
the most similar case or cases, 2) Reuse the retrieved case(s)
to solve the problem, 3) Revise the proposed solution, and
4) Retain the solution as a new case in the case base.
Since similar QA pairs recur in QA services, we decided to
investigate the usefulness of CBR in QA indexing and
information retrieval. CBR has been used in help desk
applications before. For example, Göker and Roth-Berghofer [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
argue that CBR can successfully be used in a help desk
service, and that by using it an organization
can strengthen its common knowledge and reduce the time
needed to answer a help request. Kai et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] found
that users of a CBR-based help desk system tend to
remember solutions longer, since they feel that they have solved
the problem themselves, even though the solution was
retrieved and possibly adapted from the case base.
What Opas adds to the traditional CBR approach is the
integration of semantic annotation into the steps of the CBR
cycle. For the first step, Opas contains a CBR component
that automatically searches for similar questions based on
the concepts that Poka has extracted from the question
text. The weighted annotation concept list discussed in
Section 2.3 is used as the basis for the search, with the following
modifications: 1) the concepts that the indexer has selected
are given a substantially higher weight, since their relevance
has been confirmed by the indexer; 2) the extracted places,
names, and specified concepts are given a higher weight due
to their specificity.
      </p>
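      <p>The retrieval step with these two modifications can be sketched as a weighted overlap between the query concepts and the annotations of archived QA pairs. The boost factors, the normalized-overlap score, the class and function names, and the sample data are hypothetical; the text does not specify the similarity measure used in Opas.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class QueryConcept:
    uri: str
    weight: float            # from the ranking step of Section 2.3
    confirmed: bool = False  # selected by the indexer
    specific: bool = False   # a place, a person name, or a user-specified concept


# Hypothetical boost factors: the text only says "substantially higher"
# for confirmed concepts and "higher" for specific ones.
CONFIRMED_BOOST = 3.0
SPECIFIC_BOOST = 1.5


def effective_weight(c: QueryConcept) -> float:
    w = c.weight
    if c.confirmed:
        w *= CONFIRMED_BOOST
    if c.specific:
        w *= SPECIFIC_BOOST
    return w


def retrieve_similar(query: list, case_base: dict, k: int = 3) -> list:
    """Rank archived QA pairs by the share of query weight their annotations cover."""
    weights = {c.uri: effective_weight(c) for c in query}
    total = sum(weights.values()) or 1.0
    scored = []
    for qa_id, annotations in case_base.items():
        covered = sum(w for uri, w in weights.items() if uri in annotations)
        if covered > 0:
            scored.append((round(covered / total, 3), qa_id))
    scored.sort(reverse=True)
    return scored[:k]


query = [
    QueryConcept("writings", 2.0),
    QueryConcept("plays", 1.0, confirmed=True),
    QueryConcept("Arto Paasilinna", 1.0, specific=True),
]
case_base = {
    "qa-17": {"writings", "Arto Paasilinna"},
    "qa-42": {"plays"},
    "qa-99": {"astronomy"},
}
print(retrieve_similar(query, case_base))
```
      </preformat>
      <p>Here "qa-17" (score 0.538) ranks above "qa-42" (0.462): the confirmed concept "plays" alone carries almost half of the query weight, but "qa-17" covers both the heavily weighted "writings" and the boosted person name, while "qa-99" shares no concepts and is left out.</p>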
    </sec>
    <sec id="sec-12">
      <title>4. INTEGRATING DIFFERENT DATA SOURCES IN ANSWER AUTHORING</title>
    </sec>
    <sec id="sec-13">
      <p>When we discussed the current service with the librarians, a
few things stood out about the information sources
that the librarians use when answering a question. Firstly,
nearly all of the librarians said that they use the reference
library with real books to find useful resources. Secondly,
even though nearly all the librarians agreed that
questions tend to repeat themselves, not many of them
systematically use the question archive to find old similar
questions. Thirdly, when the
librarians are unable to answer a question within three working days,
they nevertheless send an answer to the client. This answer
usually contains pointers to different information resources,
for example web sites, that might contain the answer to the
question.</p>
      <p>Based on these observations, we decided to add
an authoring component to Opas. The purpose of this
component is to help the librarian compose the answer using
different information sources. The authoring component can
be seen in Figure 1 ("Vastaajan apurit"). What its
subcomponents have in common is that each of them uses
the annotation concept suggestions produced by Poka to
query external resources. The common upper ontology YSO
acts as a "glue" between different information resources. The
subcomponents of the authoring component are explained
in the following.</p>
    </sec>
    <sec id="sec-14">
      <title>4.1 Authoring Using Existing QA Pairs</title>
      <p>Existing QA pairs can be used as a basis for composing
the new answer. In Figure 4, the librarian has opened one
of the questions to see whether it provides useful
information for answering the new question. The answer can
be used as the basis for the new answer by clicking the link
(the white paper sheet with a pen). Figure 5 depicts how
the librarian has used an existing answer as a basis for the
new answer.</p>
      <p>As the retrieval of similar QA pairs can be seen as the first
step in the CBR cycle, using them in the authoring component
can be seen as part of the second step: Reuse the retrieved
case(s) to solve the problem.</p>
    </sec>
    <sec id="sec-15">
      <title>4.2 Authoring Using a Library Classification System</title>
    </sec>
    <sec id="sec-16">
      <p>
        An ontology for a library classification system was created
for Opas, and the Helsinki City Library Classification
System (HCLCS, http://hklj.kirjastot.fi/) was then converted into this
ontologized form. The classification ontology is based on the Simple
Knowledge Organisation System (SKOS, http://www.w3.org/2004/02/skos/),
and the conversion followed the guidelines given in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In
addition to class hierarchies, the HCLCS contains index terms,
each of which is related to a library class.
For example, the term Treatment of alcoholics is
related to the library class 371.71 Alcohol policy.
Index terms in the HCLCS can also have viewpoints, as can be seen
in Figure 6. For example, the term pieces of art
("Teokset") has different viewpoints, such as bibliographies
and art collections, each of which is related to
a library class. These relations between index terms and
library classes are used to search for books that could be
relevant to the answer; the books are searched by
library class, as depicted in Figure 6. The librarian
can use the results of the book search 1) for finding an
answer to the question and 2) for enhancing the answer with
links to interesting books.
      </p>
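      <p>The lookup from index terms (and their viewpoints) to library classes and then to books can be sketched with plain data structures. Only the term Treatment of alcoholics and the class 371.71 come from the text; the viewpoint notations, the "general" viewpoint key, the catalogue, and the function names are placeholders, and the actual book search in Opas fetches and parses web pages (Section 6) rather than filtering an in-memory list.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class IndexTerm:
    label: str
    # viewpoint label -> library class notation
    classes: dict = field(default_factory=dict)


def classes_for(term: IndexTerm, viewpoint: str = "") -> list:
    """Class notations for a term, optionally restricted to one viewpoint."""
    if viewpoint:
        return [term.classes[viewpoint]] if viewpoint in term.classes else []
    return sorted(set(term.classes.values()))


def books_for(term: IndexTerm, catalogue: list, viewpoint: str = "") -> list:
    """Titles in a (title, class notation) catalogue filed under the term's classes."""
    wanted = set(classes_for(term, viewpoint))
    return [title for title, notation in catalogue if notation in wanted]


# The first term and its notation are from the text; the viewpoint
# notations "vp-1" and "vp-2" and the book titles are placeholders.
alcoholics = IndexTerm("Treatment of alcoholics", {"general": "371.71"})
art = IndexTerm("Teokset", {"bibliographies": "vp-1", "art collections": "vp-2"})
catalogue = [("Book A", "371.71"), ("Book B", "vp-2"), ("Book C", "vp-1")]
print(books_for(alcoholics, catalogue))
print(books_for(art, catalogue, "art collections"))
```
      </preformat>
      <p>Calling books_for without a viewpoint uses all of the term's classes, for example both viewpoints of "Teokset".</p>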
    </sec>
    <sec id="sec-17">
      <title>4.3 Authoring Using a Link Library</title>
      <p>The editors of Libraries.fi maintain a collection of links
to interesting web sites. This link library is categorized using
the same classification system that is used in the HCLCS.
An ontology was created, and the data was then converted
into an ontologized form in a manner similar to that described
in the previous section. Figure 7 depicts a screenshot of
this link library. The links are categorized by the HCLCS
("Henkilöbibliografiat" (personal bibliographies), "Lastenkirjastotyö"
(children's library work), etc.), and the
librarian has opened one category to see whether it contains
interesting links. These links can be added to the answer
text, as can be seen in Figure 5.</p>
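      <p>Retrieving links for a chosen class and appending them to an answer draft can be sketched as follows. The prefix rule for subclasses is a hypothetical simplification of the class hierarchy (it assumes that a subclass notation extends its parent's notation), and the function names and URLs are placeholders.</p>
      <preformat>
```python
def links_for_class(notation: str, link_library: list, include_narrower: bool = True) -> list:
    """Return URLs filed under a class and, optionally, under its subclasses.

    Assumes, hypothetically, that a subclass notation extends its parent's
    notation as a string prefix (e.g. 371.71 under 371.7).
    """
    return [
        url for url, filed_under in link_library
        if (filed_under.startswith(notation) if include_narrower else filed_under == notation)
    ]


def append_links(answer: str, urls: list) -> str:
    """Append selected links to the end of an answer draft, one per line."""
    return answer.rstrip() + "\n\n" + "\n".join(urls)


# Hypothetical link library entries: (URL, class notation).
library = [
    ("http://example.org/a", "371.71"),
    ("http://example.org/b", "371.7"),
    ("http://example.org/c", "372"),
]
links = links_for_class("371.7", library)
print(links)
print(append_links("Here are some useful sources:", links))
```
      </preformat>
      <p>With include_narrower left on, the class 371.7 returns the links filed under 371.7 and 371.71 but not the one under 372.</p>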
    </sec>
    <sec id="sec-18">
      <title>5. EVALUATION</title>
      <p>To evaluate the current version of the prototype and to find
out the librarians' initial attitudes towards the new version of
the system, a few user tests were run with real users of the
service. In each test, the librarian was
first introduced to the prototype and its features and
then asked to answer a question using the prototype.
The questions were real questions from the existing version of
the service. Finally, the librarian was interviewed about the
answering process.</p>
      <p>The results of the evaluation were encouraging. All
librarians found the features of the prototype useful and said that
they would start using the prototype if it were available.
The most impressive and useful features for the librarians
seemed to be the authoring features of the prototype,
especially the component that automatically searches for existing
similar questions. All librarians were also pleased with
the authoring features that allow resources (old
answers, links, book references) to be added to the answer by clicking a
button.</p>
      <p>The annotation concept suggestions were welcomed, but
not as eagerly as the authoring components. Some of the
librarians said that the concept suggestions were entirely
irrelevant. The semantic autocompletion component that
searches for concepts in YSO was considered useful. Based
on the tests, nothing can yet be said about how good the
ranking of the concept suggestions was.</p>
      <p>When a librarian has not selected and confirmed any of the
suggested annotation concepts, the authoring component
fetches resources based on all of the concepts in the list.
However, when the librarian had selected one or more
suggestions, it was confusing that the authoring
component still fetched resources related to unselected concepts.
Although these resources were given a smaller weight and
were thus lower in the result list, it seems that once
the librarian has selected one or more concept suggestions
or inserted a free annotation concept, the other, unselected
concepts should be ignored entirely in the result lists of the
authoring components.</p>
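      <p>The behaviour the librarians asked for can be expressed as a small selection rule. The function name and parameters are hypothetical, and the rule sketches the proposed change rather than the evaluated prototype, which only down-weighted unselected concepts.</p>
      <preformat>
```python
def concepts_for_resource_search(suggested: list, selected: list, free: list) -> list:
    """Choose the concepts the authoring components use to fetch resources.

    Once the librarian has selected a suggestion or added a free concept,
    only those concepts are used; otherwise all suggestions are used.
    """
    chosen = list(dict.fromkeys(list(selected) + list(free)))  # dedupe, keep order
    return chosen if chosen else list(suggested)


suggested = ["writings", "plays", "Arto Paasilinna"]
print(concepts_for_resource_search(suggested, [], []))
print(concepts_for_resource_search(suggested, ["Arto Paasilinna"], ["writer"]))
```
      </preformat>
      <p>With no selections, all three suggestions are used; once "Arto Paasilinna" is selected and the free concept "writer" is added, only those two drive the resource search.</p>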
    </sec>
    <sec id="sec-19">
      <title>6. DISCUSSION</title>
      <p>First experiments with combining semi-automatic
semantic annotation and authoring with the ideas of case-based
reasoning seem promising. Although the evaluation of
the prototype was not extensive, its results suggest that
Opas would be a valuable tool for librarians if put into
use. However, systematic empirical evaluations of the
application are yet to be done.</p>
      <p>
        Currently, the book search component does not use
semantically annotated content; instead, it fetches web pages and
parses the results from their HTML. Consequently, one of the
major benefits of the semantic web, the disambiguation of terms
(for example, "Nokia" as an enterprise and as a city), is not
available. Opas would benefit more from a book search system
that uses semantically annotated content.
      </p>
      <p>
        The utilization of case-based reasoning in Opas can be seen
as somewhat shallow. The ideas of CBR and the steps of the
CBR process fit Opas well, but the details of each step
could be examined more carefully. For example, the framework
for similarity assessment presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] could be used
for retrieving similar QA pairs.
      </p>
      <p>One result of the evaluation was that the annotation
concept suggestions were not optimal. More sophisticated methods for
ranking the suggestions and for determining which concepts
are actually relevant to a user query should be investigated and
developed.</p>
    </sec>
    <sec id="sec-20">
      <title>6.1 Related Work</title>
      <p>
        Other approaches could also have been used to search for
similar questions. For example, Kohonen et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
demonstrate how Self-Organizing Maps (SOM) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] can be
used to organize a vast collection of patent abstracts and
then to check, using the SOM, whether patents similar to a
new patent application already exist. A standard text search using,
for example, the Java search engine Lucene<sup>12</sup> would
probably also yield sufficient results when searching for similar
questions. However, these methods do not take the semantics of
the text into account, whereas we want to utilize the
semantic relations defined in the common upper ontology
YSO.
      </p>
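      <p>The difference between plain text matching and matching on ontology concepts can be illustrated with a toy example. The sketch below is our own illustration, not the retrieval method used in Opas: the miniature hierarchy BROADER stands in for YSO, the concept labels are hypothetical, and Jaccard similarity over ancestor-expanded concept sets is only one simple choice of measure. The two example questions share no words, so their lexical similarity is zero, but their annotations meet at the broader concepts "fiction" and "literature".</p>

```python
# Hypothetical sketch: lexical versus concept-based similarity of two
# questions. BROADER is a toy stand-in for YSO's broader-than relations.

BROADER = {
    "novels": "fiction",
    "short stories": "fiction",
    "fiction": "literature",
    "poetry": "literature",
}


def ancestors(concept):
    """Return the concept together with all of its broader concepts."""
    result = {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        result.add(concept)
    return result


def expand(concepts):
    """Union of the ancestor sets of the given annotation concepts."""
    expanded = set()
    for c in concepts:
        expanded |= ancestors(c)
    return expanded


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0


def lexical_similarity(q1, q2):
    # Bag-of-words overlap, ignoring semantics.
    return jaccard(set(q1.lower().split()), set(q2.lower().split()))


def concept_similarity(c1, c2):
    # Overlap of annotation concepts after adding broader concepts.
    return jaccard(expand(c1), expand(c2))


q1, c1 = "Recommend some good novels", {"novels"}
q2, c2 = "Looking for short stories", {"short stories"}
print(lexical_similarity(q1, q2))  # 0.0
print(concept_similarity(c1, c2))  # 0.5
```

      <p>A practical system would also need to weight the expanded concepts, for instance so that a shared specific concept counts more than a shared very general one.</p>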
      <p>
        As for semantic authoring, David Aumueller [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] presents a
technique for semantically authoring wiki pages. The technique
supports not only adding annotations to the pages but also
editing their text. His ideas could be applied to authoring the
answers.
      </p>
    </sec>
    <sec id="sec-21">
      <title>6.2 Future Work</title>
      <p>Currently, Opas focuses on the indexers' role in QA
applications, but it will also cover the end-users' side. Here
we are working on questions such as how to classify the QA pairs
for semantic view-based search, how to provide semantic
recommendations in order to show other interesting answers, and
how to integrate the system with semantic content and
services elsewhere on the web that relate to the end-user's
information needs. The CBR component that searches for
similar questions can be used on the end-users' side, too,
with few modifications.</p>
    </sec>
    <sec id="sec-22">
      <title>Acknowledgments</title>
      <p>Our work is part of the National Semantic Web Ontology
Project in Finland (FinnONTO)<sup>13</sup>, funded by the National
Funding Agency for Technology and Innovation (Tekes) and
a consortium of 36 public organizations and companies.</p>
      <p><sup>12</sup> http://lucene.apache.org</p>
      <p><sup>13</sup> http://www.seco.tkk.fi/projects/finnonto/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] <string-name><given-names>A.</given-names> <surname>Aamodt</surname></string-name> and <string-name><given-names>E.</given-names> <surname>Plaza</surname></string-name>. <article-title>Case-based reasoning: foundational issues, methodological variations, and system approaches</article-title>. <source>AI Commun.</source>, <volume>7</volume>(<issue>1</issue>):<fpage>39</fpage>–<lpage>59</lpage>, <year>1994</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] <string-name><given-names>D.</given-names> <surname>Aumueller</surname></string-name>. <article-title>Semantic authoring and retrieval within a wiki</article-title>, Aug <year>2005</year>. Demo paper, <source>2nd European Semantic Web Conference 2005 (ESWC2005)</source>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] <string-name><given-names>K.</given-names> <surname>Bontcheva</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Tablan</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Maynard</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Cunningham</surname></string-name>. <article-title>Evolving GATE to meet new challenges in language engineering</article-title>. <source>Natural Language Engineering</source>, <volume>10</volume>(<issue>3/4</issue>):<fpage>349</fpage>–<lpage>373</lpage>, <year>2004</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Falkman</surname>
          </string-name>
          .
          <article-title>Issues in Structured Knowledge Representation A De nitional Approach with Application to Case-Based Reasoning and Medical Informatics</article-title>
          .
          <source>PhD thesis</source>
          , Chalmers University of Technology, Goteborg University,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] <string-name><given-names>S.</given-names> <surname>Foo</surname></string-name>, <string-name><given-names>S. C.</given-names> <surname>Hui</surname></string-name>, <string-name><given-names>P. C.</given-names> <surname>Leong</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Liu</surname></string-name>. <article-title>An integrated help support for customer services over the world wide web: a case study</article-title>. <source>Comput. Ind.</source>, <volume>41</volume>(<issue>2</issue>):<fpage>129</fpage>–<lpage>145</lpage>, <year>2000</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] <string-name><given-names>M.</given-names> <surname>Göker</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Roth-Berghofer</surname></string-name>. <article-title>The development and utilization of the case-based help-desk support system HOMER</article-title>. <source>Engineering Applications of Artificial Intelligence</source>, <volume>12</volume>(<issue>6</issue>):<fpage>665</fpage>–<lpage>680</lpage>, Dec <year>1999</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] <string-name><given-names>J. H.</given-names> <surname>Herlocker</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>Konstan</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Riedl</surname></string-name>. <article-title>Explaining collaborative filtering recommendations</article-title>. In <source>Computer Supported Cooperative Work</source>, pages <fpage>241</fpage>–<lpage>250</lpage>. ACM, <year>2000</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] <string-name><given-names>M.</given-names> <surname>Holi</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Hyvönen</surname></string-name>, and <string-name><given-names>P.</given-names> <surname>Lindgren</surname></string-name>. <article-title>Integrating tf-idf weighting with fuzzy view-based search</article-title>. In <source>Proceedings of the ECAI Workshop on Text-Based Information Retrieval (TIR-06)</source>, Aug <year>2006</year>. To be published.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] <string-name><given-names>E.</given-names> <surname>Hyvönen</surname></string-name> and <string-name><given-names>E.</given-names> <surname>Mäkelä</surname></string-name>. <article-title>Semantic autocompletion</article-title>. In <source>Proceedings of the 1st Asian Semantic Web Conference (ASWC-2006)</source>, Beijing, Sep 4–9, <year>2006</year>. Forthcoming.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] <string-name><given-names>E.</given-names> <surname>Hyvönen</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Mäkelä</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Salminen</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Valo</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Viljanen</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Saarela</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Junnila</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Kettula</surname></string-name>. <article-title>MuseumFinland – Finnish museums on the semantic web</article-title>. <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>, <volume>3</volume>(<issue>2–3</issue>):<fpage>224</fpage>–<lpage>241</lpage>, Oct <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] <string-name><given-names>E.</given-names> <surname>Hyvönen</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Valo</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Komulainen</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Seppälä</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Kauppinen</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Ruotsalo</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Salminen</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Ylisalmi</surname></string-name>. <article-title>Finnish national ontologies for the semantic web - towards a content and service infrastructure</article-title>. In <source>Proceedings of International Conference on Dublin Core and Metadata Applications (DC 2005)</source>, Nov <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] <string-name><given-names>H.</given-names> <surname>Kai</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Raman</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Carlisle</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Cross</surname></string-name>. <article-title>A self-improving helpdesk service system using case-based reasoning techniques</article-title>. <source>Computers in Industry</source>, <volume>30</volume>(<issue>2</issue>):<fpage>113</fpage>–<lpage>125</lpage>, September <year>1996</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] <string-name><given-names>N.</given-names> <surname>Kiyavitskaya</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Zeni</surname></string-name>, <string-name><given-names>J. R.</given-names> <surname>Cordy</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Mich</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Mylopoulos</surname></string-name>. <article-title>Semi-automatic semantic annotations for web documents</article-title>. In <source>SWAP 2005, Semantic Web Applications and Perspectives, Proceedings of the 2nd Italian Semantic Web Workshop</source>, University of Trento, Trento, Italy, 14–16 December <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] <string-name><given-names>T.</given-names> <surname>Kohonen</surname></string-name>. <article-title>The self-organizing map</article-title>. <source>Proceedings of the IEEE</source>, <volume>78</volume>(<issue>9</issue>):<fpage>1464</fpage>–<lpage>1480</lpage>, Sep <year>1990</year>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] <string-name><given-names>T.</given-names> <surname>Kohonen</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Kaski</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Lagus</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Salojärvi</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Honkela</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Paatero</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Saarela</surname></string-name>. <article-title>Self organization of a massive document collection</article-title>. <source>IEEE Transactions on Neural Networks</source>, <volume>11</volume>(<issue>3</issue>):<fpage>574</fpage>–<lpage>585</lpage>, May <year>2000</year>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] <string-name><given-names>Luit</given-names> <surname>Gazendam</surname></string-name>, <string-name><given-names>Véronique</given-names> <surname>Malaisé</surname></string-name>, G. S., and <string-name><given-names>H.</given-names> <surname>Brugman</surname></string-name>. <article-title>Deriving semantic annotations of an audiovisual program from contextual texts</article-title>. In <source>Semantic Web Annotation of Multimedia (SWAMM'06) workshop</source>, <year>2006</year>. http://www.cs.vu.nl/~guus/papers/Gazendam06a.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] <string-name><given-names>L.</given-names> <surname>Löfberg</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Archer</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Piao</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Rayson</surname></string-name>, <string-name><given-names>T.</given-names> <surname>McEnery</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Varantola</surname></string-name>, and <string-name><given-names>J.-P.</given-names> <surname>Juntunen</surname></string-name>. <article-title>Porting an English semantic tagger to the Finnish language</article-title>. In <source>Proceedings of the Corpus Linguistics 2003 conference</source>, pages <fpage>457</fpage>–<lpage>464</lpage>. UCREL, Lancaster University, <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] <string-name><given-names>P.</given-names> <surname>Tapanainen</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Järvinen</surname></string-name>. <article-title>A non-projective dependency parser</article-title>. <source>Proceedings of the 5th Conference on Applied Natural Language Processing</source>, pages <fpage>64</fpage>–<lpage>71</lpage>, <year>1997</year>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] <string-name><given-names>M.</given-names> <surname>van Assem</surname></string-name>, <string-name><given-names>M. R.</given-names> <surname>Menken</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Schreiber</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wielemaker</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Wielinga</surname></string-name>. <article-title>A method for converting thesauri to RDF/OWL</article-title>. In <source>Third International Semantic Web Conference ISWC 2004</source>, volume <volume>3298</volume>, <year>2004</year>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] <string-name><given-names>A.</given-names> <surname>Vehviläinen</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Alm</surname></string-name>, and <string-name><given-names>E.</given-names> <surname>Hyvönen</surname></string-name>. <article-title>Combining case-based reasoning and semantic indexing in a question-answer service</article-title>, June 20, <year>2006</year>. Poster paper, <source>1st Asian Semantic Web Conference (ASWC2006)</source>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>