<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Frame-Semantic Web: a Case Study for Korean</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jungyeul Park</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sejin Nam</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Youngsik Kim</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Younggyun Hahm</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dosam Hwang</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Key-Sun Choi</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>FrameNet itself can serve as a resource for the Semantic Web, since it can be represented in RDF. Mapping FrameNet to other resources such as Wikipedia to build a knowledge base is also becoming common practice; through such a mapping, FrameNet provides the capability to describe semantic relations between RDF data. Because the FrameNet resource has proven very useful, multiple projects for other languages have arisen over the years in parallel with the original English FrameNet. Accordingly, significant steps have been made to develop FrameNet for Korean. This paper presents how frame semantics becomes a frame-semantic web. We also report the Wikipedia coverage of the Korean FrameNet lexicon in the context of constructing a knowledge base from Wikipedia sentences, to show the usefulness of our work on frame semantics in the Semantic Web environment.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Frame Semantics</kwd>
        <kwd>FrameNet</kwd>
        <kwd>Korean FrameNet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        FrameNet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]<sup>1</sup> is a large-scale online lexical database that is both human- and
machine-readable. It consists not only of thousands of words and sentences
but also of an extensive and complex range of semantic information.
Based on a theory of meaning called frame semantics, FrameNet builds on
the idea that the meanings of words and sentences are best understood
on the basis of a semantic frame: a coherent conceptual structure that
describes a type of event, relation, or entity and the participants in it.
Semantic frames of related concepts are considered inseparable from each other,
so that one cannot fully understand a word without knowledge
of all the semantic frames related to that word. FrameNet itself exemplifies
this principle: its 1,180 semantic frames are closely linked
by a system of semantic relations and provide a solid basis for reasoning about
the meaning of an entire text.
      </p>
      <p>This work was supported by the IT R&amp;D program of MSIP/KEIT [10044494, WiseKB: Big data based self-evolving knowledge base and reasoning platform].</p>
      <p>1 https://framenet.icsi.berkeley.edu</p>
      <p>
        FrameNet itself can become a resource for the Semantic Web when represented
in RDF/OWL [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Mapping FrameNet to other resources such as Wikipedia
to build a knowledge base can also provide the capability to
describe the semantic relations between RDF data. Since the FrameNet resource
has proven useful in the development of a number of NLP
applications, including in the Semantic Web environment [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], multiple
projects parallel to the original English FrameNet have arisen over the years
for a wide variety of languages. In addition to Brazilian
Portuguese<sup>2</sup>, French<sup>3</sup>, German (the SALSA Project)<sup>4</sup>, Japanese<sup>5</sup>, Spanish<sup>6</sup>, and
Swedish<sup>7</sup>, significant steps have been made to develop FrameNet for Korean,
and the following sections present the process and mechanisms.
With FrameNet, the Semantic Web can become a frame-semantic web, in which
frame semantics is available to Semantic Web data. We also report the Wikipedia
coverage of the Korean FrameNet lexicon in the context of constructing a
knowledge base from Wikipedia sentences, which shows how the frame-semantic
web would be useful in the Semantic Web environment.
      </p>
      <p>Building a Database of Frame Semantic Information for Korean</p>
      <p>We describe the manual construction of a FrameNet-style annotated corpus for
Korean, translated from the FrameNet corpus, and the transfer of its frame
elements (FEs) based on English-Korean alignment, using the cross-linguistic
projection proposed in [<xref ref-type="bibr" rid="ref5">5</xref>, ?].
We also explain this process using the translated Korean FrameNet corpus
and its English counterpart as our bilingual parallel corpus. We propose
a method for mapping a Korean lexical unit (LU) to an existing FrameNet-defined
frame to acquire a Korean frame-semantic lexicon. Finally, we illustrate a
self-training technique that can build a large-scale database of frame-semantic
information for Korean.</p>
      <p>Manual Construction: The development of FrameNet for Korean has been
the central goal of our project, and we chose to begin by manually
translating the existing FrameNet from English into Korean. Although
obtaining a large set of data through manual translation is difficult,
costly, and time-consuming, we expect its advantages to far outweigh the
cost in the long run. Manual translation is also the best option for our
project because only human translators bring a real understanding of the
complexities of language, together with subject knowledge, expertise,
creativity, and cultural sensitivity. Expert translators manually
translated the entire FrameNet full-text annotated corpus with the help of
a word alignment recommendation system. A guideline manual for translating
the FrameNet-style annotated corpus into Korean was prepared so that the
English FrameNet annotations transfer cleanly to the Korean sentences.</p>
      <p>2 http://www.ufjf.br/framenetbr
3 https://sites.google.com/site/anrasfalda
4 http://www.coli.uni-saarland.de/projects/salsa
5 http://jfn.st.hc.keio.ac.jp
6 http://sfn.uab.es/SFN
7 http://spraakbanken.gu.se/eng/swefn</p>
      <p>
        Automatic Construction: We also extend previous approaches described
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] using a bilingual English-Korean parallel corpus. Assuming that the same
kinds of frame elements (FEs) exist for each frame in the English and Korean
sentences, we project the English FE annotation onto Korean via an alignment
of the tokenized English and Korean sentences. An English FE realization is
projected onto a consecutive series of Korean tokens in the Korean
translation of the sentence. Since the alignment of English tokens to Korean
tokens defines this transformation, the success of token alignment is crucial
for the cross-linguistic projection process. For frame population to Korean
lexical units (LUs), we present our method for the automatic creation of the
Korean frame-semantic lexicon for verbs. We start by finding an appropriate
translation for each verb to create a mapping between a Korean LU and an
existing FrameNet-defined frame. In contrast to mapping from one sense to one
frame, mapping to more than one frame requires a further disambiguation step
to select the most probable frame for a given verb. We use maximum likelihood
estimation (MLE) over the possible frames in the existing annotated corpora
to select the frame. For the current work, we used only FrameNet's
lexicographic annotation to compute the MLE. We use the Sejong predicate
dictionary<sup>8</sup> for frame-semantic lexicon acquisition. We place 16,807 Korean
verbs in FrameNet-defined frames, which constitute 12,764 distinct
orthographic units in Korean. We assume that the FEs of the frame assigned to
a Korean LU are directly equivalent to the FEs of the corresponding English
frame. Thus, we do not consider redefining FEs specifically for Korean.
      </p>
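      <p>The frame selection step can be illustrated with a minimal sketch. It assumes that the MLE reduces to the relative frequency of each frame among the annotated occurrences of a translated lemma. The function names, the toy counts, and the one-entry translation table are hypothetical and are not taken from the Korean FrameNet data.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def frame_counts(annotated_pairs):
    """Count how often each frame is annotated for each English lemma."""
    counts = defaultdict(Counter)
    for lemma, frame in annotated_pairs:
        counts[lemma][frame] += 1
    return counts


def most_probable_frame(counts, lemma):
    """Return (frame, relative frequency) for the most frequent frame of a lemma.

    Returns (None, 0.0) when the lemma never occurs in the annotations.
    """
    frames = counts.get(lemma)
    if not frames:
        return None, 0.0
    total = sum(frames.values())
    frame, n = frames.most_common(1)[0]
    return frame, n / total


# Toy annotations (invented for illustration only).
toy_annotations = [("run", "Self_motion")] * 3 + [("run", "Operating_a_system")]
# Hypothetical translation table: Korean verb -> English lemma.
toy_translations = {"달리다": "run"}

counts = frame_counts(toy_annotations)
frame, prob = most_probable_frame(counts, toy_translations["달리다"])
print(frame, prob)  # Self_motion 0.75
```
      </preformat>
      <p>In this toy run the Korean verb inherits the frame that is annotated most often for its English translation; a verb whose translation has a single frame needs no disambiguation.</p>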
      <p>
        Bootstrapping Frame-Semantic Information: Self-training for frame-semantic
role projection consists of three steps: annotating FrameNet-style semantic
information in the source language, inducing word alignments between the two
languages, and projecting the semantic information of the source language
onto the target language. We used a bilingual parallel corpus for
self-training and a probabilistic frame-semantic parser
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to annotate the semantic information of the source language (English). We
then induced an HMM word alignment model between English and Korean with a
statistical machine translation toolkit. Finally, we projected the semantic
role information from the English sentences onto the Korean sentences. For
the experiment, we employed a large English-Korean parallel corpus of almost
100,000 sentence pairs to bootstrap the semantic information. Because
self-training amplifies errors of the original model in the new model, we
filter the output of the frame-semantic parser by using its confidence score
as a threshold. As a result, we obtained 120,621 pairs of frames with their
FEs, of which 30,149 are unique; 715 frames are used for 10,898 different
lexical items.
      </p>
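      <p>The projection and filtering steps can be sketched as follows. This is an illustration under stated assumptions, not the system used in the experiment: the word alignment is taken as given (a mapping from English token indices to Korean token indices), each projected FE becomes the smallest consecutive Korean span that covers its aligned tokens, and the threshold value 0.5, the data layout, and all function names are hypothetical.</p>
      <preformat>
```python
def project_roles(roles, alignment):
    """Project English FE spans onto Korean token spans.

    roles: {role name: (start, end)} with inclusive English token indices.
    alignment: {English token index: [Korean token indices]}.
    A role with no aligned Korean token is dropped.
    """
    projected = {}
    for role, (start, end) in roles.items():
        targets = sorted(
            k for e in range(start, end + 1) for k in alignment.get(e, [])
        )
        if targets:
            # Smallest consecutive Korean span covering all aligned tokens.
            projected[role] = (targets[0], targets[-1])
    return projected


def bootstrap(annotations, alignment, threshold):
    """Keep parser outputs at or above the confidence threshold and project them."""
    kept = []
    for ann in annotations:
        if ann["confidence"] >= threshold:
            kept.append(
                {"frame": ann["frame"],
                 "roles": project_roles(ann["roles"], alignment)}
            )
    return kept


# Toy sentence pair: "John bought a book" / "존이 책을 샀다".
toy_alignment = {0: [0], 1: [2], 3: [1]}  # "a" has no Korean counterpart
toy_annotations = [
    {"frame": "Commerce_buy", "confidence": 0.9,
     "roles": {"Buyer": (0, 0), "Goods": (2, 3)}},
    {"frame": "Getting", "confidence": 0.2,
     "roles": {"Recipient": (0, 0)}},
]

result = bootstrap(toy_annotations, toy_alignment, threshold=0.5)
print(result)
```
      </preformat>
      <p>The low-confidence annotation is discarded before projection, which is how a threshold limits the errors that self-training would otherwise carry into the new model.</p>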
    </sec>
    <sec id="sec-2">
      <p>8 http://www.sejong.or.kr</p>
      <sec id="sec-2-1">
        <title>Linking FrameNet to Wikipedia</title>
        <p>DBpedia<sup>9</sup> is a knowledge base constructed from Wikipedia based on the
DBpedia ontology (DBO). DBO can be viewed as a vocabulary for representing
knowledge in Wikipedia. However, DBO is an ontology driven by Wikipedia
infoboxes. That is, although DBO is suitable for representing the essential
information of Wikipedia, it is not guaranteed to be sufficient for
representing the knowledge that Wikipedia expresses in natural language.
To overcome this problem, FrameNet has been considered useful at the
linguistic level as a language resource that represents semantics. We
calculate the Wikipedia coverage rates of DBO and of FrameNet's LUs by
matching relation instances from DBpedia and FrameNet against Wikipedia.
Before calculating the coverage rates, we need to know which sentences in
Wikipedia actually contain knowledge. We define a sentence with extractable
knowledge as one that can be linked to DBpedia entities as a triple. From
almost three million sentences in Korean Wikipedia, we find over four million
predicates for cases where only a subject appears, only an object appears, or
both a subject and an object appear (2.11 predicates per sentence). The
coverage rates are 6.92% for DBO and 95.19% for FrameNet's LUs. The shortfall
of DBO can be explained by its size: its predefined predicates are too few to
cover the predicates that actually occur in Wikipedia. FrameNet, in contrast,
gives almost full coverage of sentences with extractable knowledge, which is
very promising for extracting and representing knowledge in Wikipedia using
FrameNet.</p>
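        <p>The coverage rate can be read as the fraction of predicate occurrences that a given vocabulary contains. The sketch below assumes exact string matching between a predicate and a vocabulary entry, which may be simpler than the matching used for the reported figures; the predicate list and both vocabularies are invented toy data.</p>
        <preformat>
```python
def coverage_rate(predicates, vocabulary):
    """Fraction of predicate occurrences that appear in the vocabulary."""
    if not predicates:
        return 0.0
    covered = sum(1 for p in predicates if p in vocabulary)
    return covered / len(predicates)


# Toy predicate occurrences extracted from sentences (invented).
toy_predicates = ["be_born", "graduate", "found", "be_born", "write"]
# Toy vocabularies standing in for DBO predicates and FrameNet LUs (invented).
toy_dbo = {"found"}
toy_framenet_lus = {"be_born", "graduate", "found"}

print(coverage_rate(toy_predicates, toy_dbo))           # 0.2
print(coverage_rate(toy_predicates, toy_framenet_lus))  # 0.8
```
        </preformat>
        <p>Counting occurrences rather than distinct predicates means that frequent predicates weigh more, so a small vocabulary can still score well only if it contains the predicates that sentences actually use.</p>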
      </sec>
      <sec id="sec-2-2">
        <title>Discussion and Conclusion</title>
        <p>In this paper, by building a database of frame-semantic information,
we showed that FrameNet can become a resource for the Semantic Web and that it
can gather lexical linked data and knowledge patterns with almost full coverage
of Wikipedia.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>9 http://dbpedia.org/About</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellsworth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petruck</surname>
            ,
            <given-names>M.R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          , Scheffczyk, J.:
          <source>FrameNet II: Extended Theory and Practice</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petruck</surname>
            ,
            <given-names>M.R.L.</given-names>
          </string-name>
          :
          <article-title>FrameNet Meets the Semantic Web: A DAML+OIL Frame Representation</article-title>
          .
          <source>In: Proc. of AAAI-02.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fillmore</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petruck</surname>
            ,
            <given-names>M.R.L.</given-names>
          </string-name>
          :
          <article-title>FrameNet Meets the Semantic Web: Lexical Semantics for the Web</article-title>
          .
          <source>In: ISWC</source>
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fossati</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Frame Semantics Annotation Made Easy with DBpedia</article-title>
          .
          <source>In: Proc. of CrowdSem2013</source>
          .
          <fpage>69</fpage>
          –
          <lpage>78</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Padó</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Cross-lingual Bootstrapping of Semantic Lexicons: The Case of FrameNet</article-title>
          .
          <source>In: Proc. of AAAI-05.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Probabilistic Frame-Semantic Parsing</article-title>
          .
          <source>In: Proc. of NAACL</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>