<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FarsBase: A Cross-Domain Farsi Knowledge Graph</article-title>
      </title-group>
      <contrib-group>
        <aff>
          <institution>Central Tehran Branch, Islamic Azad University</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
          <email>moh.sajadi.eng@iauctb.ac.ir</email>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Science and Technology</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Science and Technology</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <abstract>
        <p>This study presents a cross-domain knowledge graph in the Farsi language, consisting of more than 500K entities and 7 million relations. The data were extracted from the Farsi edition of Wikipedia, including its structured data such as infoboxes and tables. In line with Semantic Web standards, the RDF data model and an OWL 2 ontology were employed to implement the Farsi Knowledge Graph (FKG). The ontology, derived from the DBpedia ontology, was developed based on the resources of the Farsi Wikipedia. Moreover, more than 8000 Wikipedia templates and properties were mapped to the ontology, both automatically and manually. Following Linked Data principles, most entities in the FKG are connected to DBpedia and Wikidata resources by owl:sameAs. To achieve high performance and a flexible data model, a two-level storage architecture was designed to separate data from metadata. This design plays a key role in update operations and version management.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>Knowledge graph</kwd>
        <kwd>Linked Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Knowledge graphs are large collections of interconnected entities enriched with
semantic annotations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In fact, a knowledge graph is a knowledge base of
facts about entities extracted from structured and semistructured information
or obtained from the result of some crowdsourcing process [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Knowledge graphs are widely used
in search engines, Natural Language Processing (NLP), question answering, and
Information Retrieval (IR) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In this work, we present FarsBase, the most comprehensive knowledge graph in the Farsi language,
built on Semantic Web concepts.
The aim of this study is to develop a knowledge graph of Farsi entities and
relations, enriched with semantic annotations, for research and commercial
purposes. To this end, Wikipedia, one of the most important sources for knowledge bases,
is used as the input of the FKG system. The information in the encyclopedia is
converted to RDF to form a rich knowledge base, that is, a collection of triples.
Fig. 1 shows the process of developing FarsBase.</p>
      <p>Wikipedia offers a mass of information in structured and
unstructured formats. The main source in this research is the infobox, which is located at
the left side of most articles in the Farsi edition. More than 100,000 templates
are used in the Farsi edition, but only a few of them define infoboxes. Infobox
templates are defined in both Farsi and English to describe the same types of
things, so recognizing infobox templates is a challenge in this study. We experimentally obtained
a set of keywords through which most infoboxes could be extracted.</p>
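      <p>The keyword-based recognition of infobox templates can be sketched as follows. This is a minimal illustration and not the extractor's actual code: the paper does not list the keywords it obtained, so the keyword tuple, the function names, and the sample template names below are all assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical keywords; the paper does not publish its actual list.
# Infobox templates in the Farsi edition may carry English or Farsi names.
INFOBOX_KEYWORDS = ("infobox", "جعبه اطلاعات")


def is_infobox_template(template_name: str) -> bool:
    """Return True if the template name contains any infobox keyword."""
    normalized = template_name.strip().lower().replace("_", " ")
    return any(keyword in normalized for keyword in INFOBOX_KEYWORDS)


def select_infobox_templates(template_names):
    """Keep only the template names that look like infobox templates."""
    return [name for name in template_names if is_infobox_template(name)]


# Sample names are illustrative, not taken from the FarsBase data.
sample_templates = [
    "Infobox settlement",
    "جعبه اطلاعات شخص",
    "Citation needed",
    "Navbox",
]
print(select_infobox_templates(sample_templates))
```
]]></preformat>
      <p>A plain substring match like this is cheap to run over a large number of template names, but it can miss infoboxes whose names contain none of the chosen keywords, which is consistent with the text's claim that most, not all, infoboxes were extracted.</p>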
      <p>The code implemented for the extraction phase is freely available (https://github.com/IUST-DMLab/wiki-extractor), so that the
experiment can be repeated and the usability of the results explored.</p>
      <p>The FKG ontology is derived from the DBpedia ontology. Since the DBpedia ontology is
based on English, it needs to be customized for the Farsi Wikipedia.
Table 1 offers some information about the customized ontology. Some of the
added classes are Imam, Marja, County, Rural district, Qanat, and Waterfall.</p>
    </sec>
    <sec id="sec-2">
      <title>Mapping</title>
      <p>One of the most important phases in FKG is mapping infobox templates and
properties to the ontology. The aim of mapping is to integrate and organize
information in the ontology structure.</p>
      <p>The FKG extractor captures 1712 infobox templates covering more than
430K articles. Given this volume, the most frequently used templates are prioritized for mapping.
Table 2 gives a brief overview of the number of mappings in FKG and their
effect on the final output. It shows that mapping 40% of the templates and 30% of the properties
covers more than 90% of the triples.</p>
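      <p>The effect of prioritizing frequent templates can be illustrated with a small coverage calculation. The sketch below is an assumption about how such a prioritization could be computed, not the procedure used in FarsBase; the function name and the usage counts are invented for the example and are not figures from the paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def templates_needed_for_coverage(usage_counts, target):
    """Pick the most-used templates until they reach the target share of usages.

    usage_counts maps a template name to how often it is used;
    target is a fraction between 0 and 1.
    Returns the selected names and the share of usages they cover.
    """
    total = sum(usage_counts.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(usage_counts.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for name, count in ranked:
        if covered >= target * total:
            break
        selected.append(name)
        covered += count
    return selected, covered / total


# Illustrative counts only; these are not figures from FarsBase.
counts = {"settlement": 600, "person": 250, "film": 100, "river": 30, "qanat": 20}
selected, share = templates_needed_for_coverage(counts, 0.9)
print(selected, share)
```
]]></preformat>
      <p>Because template usage is typically skewed toward a few very common templates, a greedy selection by frequency reaches a high share of the data after mapping only part of the templates.</p>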
      <p>As Fig. 2a demonstrates, access to the knowledge graph is provided through a
SPARQL endpoint (http://farsbase.net/sparql). Fig. 2b shows the statements for a sample FKG entity, based
on the query result.</p>
      <p><bold>Architecture of Storing</bold></p>
      <p>
        FarsBase holds a variety of data about each triple, such as source, version,
extraction time, expert opinion, and triple status. Using the reification
technique for this purpose increases the complexity and decreases the performance of a knowledge base
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],[6]. To address this challenge, FarsBase defines a two-level architecture for
storing data:
      </p>
      <list list-type="bullet">
        <list-item><p>First level: data and metadata are stored in a NoSQL database.</p></list-item>
        <list-item><p>Second level: the final data are stored in a triplestore.</p></list-item>
      </list>
      <p>This architecture enhances the performance and flexibility of the knowledge base.
In the first level, triples are stored along with their metadata in MongoDB, which offers a
dynamic schema. In the second level, the final triples are held in OpenLink Virtuoso.</p>
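      <p>The separation between the two levels can be sketched with in-memory stand-ins. This is a simplified illustration under stated assumptions, not the FarsBase implementation: a Python list takes the place of the MongoDB collection, a list of plain triples takes the place of the data loaded into the triplestore, and the record fields, status values, prefixes, and sample values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class TripleRecord:
    """First-level record: a triple plus its metadata (field names are assumed)."""
    subject: str
    predicate: str
    obj: str
    source: str
    version: int
    status: str  # hypothetical values: "approved" or "rejected"


def export_final_triples(records, version):
    """Second level: keep approved triples of one version and drop the metadata."""
    return sorted(
        {
            (r.subject, r.predicate, r.obj)
            for r in records
            if r.status == "approved" and r.version == version
        }
    )


# Stand-in for the first-level store; prefixes and values are illustrative.
first_level = [
    TripleRecord("fkg:Damavand", "rdf:type", "fkgo:Mountain", "infobox", 2, "approved"),
    TripleRecord("fkg:Damavand", "fkgo:elevation", "5610", "infobox", 2, "approved"),
    TripleRecord("fkg:Damavand", "fkgo:elevation", "5671", "table", 2, "rejected"),
    TripleRecord("fkg:Damavand", "fkgo:elevation", "5600", "infobox", 1, "approved"),
]

final_triples = export_final_triples(first_level, version=2)
for triple in final_triples:
    print(triple)
```
]]></preformat>
      <p>Keeping source, version, and status in the first level means the second level holds only plain triples, so no reified statements are needed there, and a new version can be published by re-running the export.</p>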
    </sec>
    <sec id="sec-3">
      <title>FarsBase Compared to Farsi DBpedia</title>
      <p>Fig. 2. (a) SPARQL endpoint of FarsBase (http://farsbase.net/sparql). (b) Triples of an entity of FarsBase.</p>
      <p>DBpedia does not focus on the Farsi language, so its Farsi edition lacks
any mappings. In FarsBase, not only have mappings been created effectively, but
the ontology has also been customized according to the entities in the Farsi Wikipedia.
FKG uses page redirects to offer alternative labels of an entity to search engines
and other semantic systems. Moreover, it presents a two-level architecture for
storing data and metadata to serve NLP and IR systems. In this project, a
transformation operation converts string values to proper formats and unifies
units of measurement. FarsBase can therefore answer a question such as "What is
the highest mountain in the world?" with a SPARQL query.</p>
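      <p>The following sketch shows why unit normalization matters for such a question. It is an illustration under stated assumptions rather than the FarsBase transformation code: the unit table, the function name parse_length, and the sample values are chosen for the example, and the query uses the DBpedia ontology terms dbo:Mountain and dbo:elevation as stand-ins, since the paper does not name the FarsBase terms.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Conversion factors to metres; the unit list is an assumption for this sketch.
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048}


def parse_length(raw: str) -> float:
    """Convert a string such as '28,251 ft' to a length in metres."""
    match = re.fullmatch(r"\s*([\d,]*\.?\d+)\s*([a-zA-Z]+)\s*", raw)
    if match is None or match.group(2).lower() not in TO_METRES:
        raise ValueError(f"unrecognized length: {raw!r}")
    number = float(match.group(1).replace(",", ""))
    return number * TO_METRES[match.group(2).lower()]


# Illustrative raw infobox values in mixed units.
raw_elevations = {
    "Mount Everest": "8,848 m",
    "K2": "28,251 ft",
    "Damavand": "5.61 km",
}
elevations_m = {name: parse_length(value) for name, value in raw_elevations.items()}
highest = max(elevations_m, key=elevations_m.get)
print(highest, round(elevations_m[highest]))

# Once every elevation is stored in one unit, a query of this shape suffices.
# DBpedia ontology terms are used as stand-ins for the FarsBase ontology.
HIGHEST_MOUNTAIN_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?mountain ?elevation WHERE {
  ?mountain a dbo:Mountain ;
            dbo:elevation ?elevation .
}
ORDER BY DESC(?elevation)
LIMIT 1
"""
```
]]></preformat>
      <p>Without normalization, sorting by the raw strings or by numbers in mixed units would rank the mountains incorrectly, so the ORDER BY clause is only meaningful after the transformation step.</p>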
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marciuska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Faceted search over RDF-based knowledge graphs</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>37-38</volume>
          ,
          <fpage>55</fpage>
          –
          <lpage>74</lpage>
          (mar
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Nentwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartung</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngonga Ngomo</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A survey of current Link Discovery frameworks</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>419</fpage>
          –
          <lpage>436</lpage>
          (dec
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nuzzolese</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>From hyperlinks to Semantic Web properties using Open Knowledge Extraction</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ), 1–5 (may
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Vacura</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svatek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An ontological investigation over human relations in linked data</article-title>
          .
          <source>Applied Ontology</source>
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <fpage>227</fpage>
          –
          <lpage>254</lpage>
          (oct
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietrobon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Quality assessment for linked data: A survey</article-title>
          .
          <source>Semantic Web</source>
          <volume>1</volume>
          ,
          <fpage>1</fpage>
          –
          <lpage>5</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>