<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Saturation in Retrospective Text Document Collections</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Kosa</string-name>
          <email>victoriya1402.kosa@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alyona Chugunenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eugene Yuschenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Badenes</string-name>
          <email>cbadenes@fi.upm.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadim Ermolayev</string-name>
          <email>vadim@ermolayev.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aliaksandr Birukou</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>BWT Group</institution>
          ,
          <addr-line>Mayakovskogo st. 11, 69035, Zaporizhzhya</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Zaporizhzhya National University</institution>
          ,
          <addr-line>Zhukovskogo st. 66, 69600, Zaporizhzhya</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Springer-Verlag GmbH</institution>
          ,
          <addr-line>Tiergartenstrasse 17, 69121, Heidelberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the motivation for, the planning of, and the very first results of the PhD project by the first author. The objective of the project is to experimentally assess the representativeness (completeness), for knowledge extraction, of a retrospective textual document collection. The collection is chosen so as to describe a single, well-circumscribed subject domain. The approach to assessing completeness is based on measuring the saturation of the semantic (terminological) footprint of the collection. The goal of this experimental study is to check whether the saturation-based approach is valid. The project is performed at the Department of Computer Science of Zaporizhzhya National University in cooperation with the BWT Group, Universidad Politécnica de Madrid, and Springer-Verlag GmbH.</p>
      </abstract>
      <kwd-group>
        <kwd>KnowledgeEngineeringMethodology</kwd>
        <kwd>KnowledgeEngineeringProcess</kwd>
        <kwd>SubjectExpert</kwd>
        <kwd>KnowledgeEvolution</kwd>
        <kwd>KnowledgeRepresentation</kwd>
        <kwd>Knowledge extraction from text</kwd>
        <kwd>text mining</kwd>
        <kwd>retrospective document collection</kwd>
        <kwd>OntoElect</kwd>
        <kwd>semantic saturation</kwd>
        <kwd>completeness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>This short paper presents a PhD project aimed at developing the methodological and
instrumental components for measuring the representativeness of high-quality
collections of text documents. It is assumed that the documents in a collection cover a
single, well-circumscribed domain of discourse and carry an associated timestamp.
A typical example of such a collection is the set of full-text papers of a
professional journal or a conference proceedings series, published from the first issue
to date. The main hypothesis put forward in this work is that a collection can be
considered representative of the domain, in terms of its semantic
(terminological) footprint, if adding extra relevant documents to the collection does not
noticeably change this footprint. Such a collection could further be considered
complete and could be used for extracting domain semantic descriptions. In
effect, the approach to assessing representativeness outlined above does so by
evaluating the terminological saturation of a document collection.</p>
      <p>It is well known that extracting knowledge from texts for developing domain
ontologies is a complicated and laborious process that requires substantial effort by
highly qualified humans. Knowing the smallest possible representative
document collection for a domain is therefore very important for developing ontologies
with satisfactory domain coverage efficiently. Hence, laying out a method to determine a
saturated subset of documents within a collection is a timely problem. It is also important to
make this method as efficient and automated as possible, to lower the overhead on the
core knowledge engineering workflow.</p>
      <p>Yet another dimension of complexity in the context of knowledge extraction
from texts is terminological temporal drift. Indeed, the semantic footprint of a
retrospective collection may change over time, so it is not clear how a saturated
subset of the collection should be formed to account for this drift.</p>
      <p>
        The objective of the presented project is to develop, and evaluate in an industrial
setting, an efficient and effective experimental method, supported by an instrumental
toolset, for determining saturated subsets of high-quality, domain-bounded retrospective
textual document collections. As its theoretical background, the project uses the
OntoElect approach [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Term extraction from text is done in cooperation with the
Ontology Engineering Group of the Universidad Politécnica de Madrid (http://www.oeg-upm.net/). The instrumental
toolset is developed in cooperation with the BWT Group (http://www.groupbwt.com/). The industrial case study,
focused on the Knowledge Management domain, is performed in cooperation with the
internal LOD project of Springer-Verlag GmbH (http://www.springer.com/).
      </p>
      <p>The remainder of the paper is structured as follows. Section 2 presents the
motivation for this project based on the brief analysis of the related work. Section 3 briefly
outlines the OntoElect approach. Section 4 describes our experimental setting in terms
of objectives, instruments, datasets, and workflow. Section 5 presents our early
results. Finally, the plans for the future work are discussed in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work and Motivation</title>
      <p>
        Perhaps one of the most comprehensive sources surveying the existing approaches
and techniques for ontology learning from text is [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Another collection of research
contributions in ontology learning and population, complementary to this review, is
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This is the research area which often combines linguistic and statistical methods
to process text corpora and extract knowledge fragments in different forms: ranging
from key phrases and their importance / frequency values (e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) to simple ontology
modules e.g. specified in SKOS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The dominant approach to assess the quality of
extracted knowledge is comparing the resulting artifact to a Gold Standard [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] in the
domain. Gold Standards are, however, quite rarely available. Another way to
evaluate whether the result fits the domain requirements well is to check it against a set of
competency questions [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], provided by the knowledge stakeholders in the domain.
Unfortunately, such experts are also not readily available in the vast majority of cases.
Therefore, an objective indirect method to extract knowledge for producing ontologies
from a representative document collection for the domain is needed. An important
question to answer in this context is: what is the minimal subset of a (potentially very
big) document collection that is terminologically complete in statistical terms? The
project presented in this paper aims at developing such an experimental method, based
on the OntoElect approach for ontology development and refinement, and at a
thorough experimental evaluation of this method.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 OntoElect Saturation Metric and Measurement</title>
      <p>OntoElect, as a methodology, seeks to maximize the fitness of the developed
ontology to what the domain knowledge stakeholders think about the domain. Fitness is
measured through the stakeholders’ votes, a metric that allows assessing the stakeholders’
commitment to the ontology under development, reflecting how well their sentiment
about the requirements is met. The more votes are collected, the higher the
commitment is expected to be. If a critical mass of votes is acquired (say, 50%+1), the
ontology is considered to meet the requirements satisfactorily.</p>
      <p>It is well known that direct acquisition of requirements from domain experts is not
very realistic: their time is expensive, and they are rarely willing to do work that falls
outside their core activity. So, in this project, we focus on the indirect collection of the
stakeholders’ votes by extracting them from high-quality and reasonably high-impact
documents authored by the stakeholders.</p>
      <p>
        An important feature to be ensured for knowledge extraction from text collections
is that the dataset needs to be statistically representative, i.e. to cover the opinions of the
domain knowledge stakeholders sufficiently fully. OntoElect suggests a method to
measure the terminological completeness of a document collection by analyzing the
saturation of the terminological footprints of incremental slices of the document
collection, as e.g. reported in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The full texts of the documents from the retrospective
collection are grouped into datasets in the order of their timestamps. The first dataset
contains the first portion of documents. The second dataset contains the first dataset
plus the second portion of documents. Finally, the last dataset contains all the
documents in the collection. At the next step of the OntoElect workflow, bags of
multi-word terms are extracted from all the datasets using the TerMine software [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
together with their significance (C-value) scores, reflecting how prominently a term occurs in
the dataset. The workflow presented below in Section 4 also suggests using an
alternative way to extract the bags of multi-word terms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for comparing the quality of
term extraction. Further, the bags of terms of adjacent datasets (1st and 2nd, 2nd and 3rd,
...) are compared, and the termhood difference (thd) value is computed for each
consecutive pair of datasets. Terminological saturation is assessed by comparing the
overall thd to an individual term significance threshold. An extracted term is considered
significant if its score puts it in the upper part of the scored list: this upper part forms the
prevailing sentiment of the domain knowledge stakeholders, the majority vote, as it
accumulates 50%+1 of the stakeholder votes in terms of the sum of the normalized
scores (C-values) of the respective terms. A dataset for which stable
saturation is observed is further considered complete (and statistically representative)
for knowledge extraction.
      </p>
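      <p>To make the procedure concrete, the incremental dataset construction and saturation check can be sketched as follows. This is a minimal illustration only: the term extractor is stubbed out with word counting in place of TerMine C-values, and the thd formula shown (a sum of absolute differences of normalized scores) is our simplifying assumption, not the exact OntoElect definition.</p>

```python
from collections import Counter

def extract_terms(docs):
    """Stub term extractor returning term -> score. Stands in for
    TerMine multi-word term extraction with C-value scores."""
    bag = Counter()
    for doc in docs:
        bag.update(doc.lower().split())
    return bag

def thd(bag_prev, bag_next):
    """Termhood difference between adjacent datasets, computed here as
    the sum of absolute differences of normalized scores (an assumed
    stand-in for the OntoElect thd formula)."""
    def normalize(bag):
        total = sum(bag.values()) or 1
        return {t: v / total for t, v in bag.items()}
    p, n = normalize(bag_prev), normalize(bag_next)
    return sum(abs(p.get(t, 0.0) - n.get(t, 0.0)) for t in set(p) | set(n))

def first_saturated_dataset(collection, chunk, eps):
    """Grow datasets incrementally (dataset i = first i chunks of the
    timestamp-ordered collection) and return the index of the first
    dataset whose thd to its predecessor drops below the hypothetical
    significance threshold eps, or None if saturation is never observed."""
    bags, docs = [], []
    for start in range(0, len(collection), chunk):
        docs.extend(collection[start:start + chunk])
        bags.append(extract_terms(docs))
    for i in range(1, len(bags)):
        if thd(bags[i - 1], bags[i]) < eps:
            return i
    return None
```

      <p>In this sketch, a very repetitive collection saturates immediately, while a collection whose vocabulary keeps growing never does; the real experiments would additionally require stable saturation over several consecutive pairs rather than a single drop below the threshold.</p>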
      <p>It is also worth noting that the outlined approach is domain-independent, as long as
the term extraction solutions used are domain-independent.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Experimental Settings and Workflow</title>
      <p>The objective of the presented experimental research project is to check whether the
OntoElect approach to assessing the representativeness of a subset of a document
collection, based on measuring terminological saturation, is valid. The experimental
setting should consider several parameters that may influence the measurements
and, therefore, the results of measuring saturation. These parameters are taken into
account while answering the following research questions:</p>
      <p>Q1: Which would be the proper direction for forming the datasets to check
saturation: chronological, reverse-chronological, bidirectional, or random selection? Which
direction is the most appropriate for coping with potential terminological drift over time?</p>
      <p>Q2: Would frequently cited documents form a minimal representative subset of
documents? Do the most frequently cited documents indeed provide the biggest
terminological contribution to the document collection?</p>
      <p>Q3: Would the size of a dataset increment influence saturation measurements? Is
there an optimal size of a data chunk for the purpose?</p>
      <p>
        Q4: Which of the term extraction solutions (UPM Extractor [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or Manchester
TerMine [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) yields more adequate, higher-quality sets of terms?
      </p>
      <p>Q5: Is the method for assessing completeness based on saturation measurements
valid? Does it indeed provide a correct indication of statistical representativeness?</p>
      <p>The answers to the outlined research questions are sought by conducting
experiments in a real-world industrial setting. For that, the document collection has been
formed in cooperation with Springer. Based on the expert advice of the partner,
fifteen Springer journals have been selected that are broadly relevant to the domain of
Knowledge Management.
The list of the selected journals is available at
https://github.com/bwtgroup/SSRTDC-PaperCatalogues/blob/master/ListOfJournals.xls.
Knowledge Management has been chosen as the target domain because: (i) the methodology
developed in the presented experimental study is for knowledge engineering and management;
(ii) the partners in the presented project possess extensive expertise in Knowledge Management
and therefore could serve as subject experts; and (iii) a substantially big collection of
high-quality full-text documents broadly relevant to this domain is available at Springer.</p>
      <p>The chosen collection of journal papers appears to be well suited for attacking the
outlined research questions. Indeed, it is formed of journals covering different
subfields of Computer Science at large. The journals in the selection are nevertheless
mutually complementary in terms of providing terminology related to Knowledge
Management. So there seems to be a balance between the broadness of the overall
scope and the focus on the target domain. This balance needs to be checked
experimentally, by verifying whether the collection contains a saturated terminological footprint of the domain.
Furthermore, the individual journal collections start at very different
times and contain quite different numbers of volumes, issues, and papers. These
internal imbalances may help reveal complications such as terminological
temporal drift and the differing terminological contributions caused by varying data volumes
coming from different journals.</p>
      <p>The experimental workflow is based on the OntoElect workflow described in
Section 3 and is outlined in Fig. 1. This workflow can be applied generically (via the
Configure Experiment step) to perform all the series described below.</p>
      <p>Different kinds of experiments using this workflow are planned for the presented
study.</p>
      <p>The first series of experiments targets checking which direction of choosing
papers for the datasets yields better saturated sets of terms, and assesses terminological
temporal drift. In this series, the experimental workflow is applied to datasets
which are formed: (i) chronologically; (ii) reverse-chronologically; (iii)
bidirectionally, i.e. with data increments containing documents from both ends
of the temporal span in turns (e.g. first issue, then last issue, then second issue, etc.);
and (iv) from documents picked from the collection uniformly at random.
Saturation measures and saturated sets of terms will be compared across these
different choices. This series will allow answering Q1.
</p>
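      <p>The four dataset-formation directions of the first series can be sketched with a small helper (a hypothetical sketch: the direction names and the fixed random seed are our assumptions, introduced only for illustration):</p>

```python
import random

def order_documents(docs, direction, seed=0):
    """Order a timestamp-sorted document list before slicing it into
    incremental datasets. Directions mirror series 1 of the plan."""
    if direction == "chronological":
        return list(docs)
    if direction == "reverse":
        return list(reversed(docs))
    if direction == "bidirectional":
        # Alternate ends: first, last, second, second-to-last, ...
        out, lo, hi = [], 0, len(docs) - 1
        while lo <= hi:
            out.append(docs[lo])
            if lo != hi:
                out.append(docs[hi])
            lo, hi = lo + 1, hi - 1
        return out
    if direction == "random":
        out = list(docs)
        random.Random(seed).shuffle(out)  # seeded for reproducibility
        return out
    raise ValueError(f"unknown direction: {direction}")
```

      <p>The saturation loop itself stays unchanged across the four variants; only the ordering of documents fed into the incremental datasets differs, which is what makes the comparison in Q1 meaningful.</p>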
      <p>
        The second kind of experiment will build on the most appropriate selection
direction, determined in the first series, and investigate the terminological impact of
the frequently cited documents in the collection. For that, the impact of each
document will be computed from its citation frequency. A document with impact
equal to n will be replicated n times in the corresponding dataset. The experimental
workflow will be repeated for these “impact” datasets, and the results will be
compared to those of the first series using “flat” datasets. The comparison will be done in terms of
saturation measures and terminological contribution peaks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This experiment may
allow answering Q2 and extracting the “decisive minority vote” subset of terms for
Knowledge Management contributed by the high-impact papers, as was done in
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for the Time Representation domain.
      </p>
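      <p>The construction of an “impact” dataset described above can be sketched as follows (a minimal illustration; keeping zero-citation documents once is our assumption, since the text does not specify how they are treated):</p>

```python
def impact_dataset(docs_with_citations):
    """Build an "impact" dataset: a document with citation count n
    appears n times, so its terms weigh n times more in the term bags.
    Documents with zero citations are kept once (an assumption)."""
    dataset = []
    for doc, citations in docs_with_citations:
        dataset.extend([doc] * max(1, citations))
    return dataset
```

      <p>Feeding such a replicated dataset through the same term extraction step effectively multiplies the scores of terms from highly cited papers, which is what lets their “minority vote” dominate the upper part of the scored list.</p>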
      <p>To answer Q3, the third series will focus on finding out the optimal
size of an increment for forming experimental datasets. For this series, the datasets will be
formed following the best selection direction discovered in the first series; the size of
the increments, however, will vary. Saturation measurements will be compared
for different data increment sizes, and the optimal value will be discovered, if such an
optimum exists.</p>
      <p>
        The fourth series is planned for an experimental cross-evaluation of the available
alternative software tools for multi-word term extraction from texts. Based on the
datasets with the increments of optimal size determined in series No 3, term extraction
will be done separately using the UPM toolset and TerMine. The results will be
compared in terms of saturation measures for the flat datasets and of the decisive minority subsets of
terms extracted from the impact datasets (series No 2). This may allow answering Q4.
      </p>
      <p>
        Perhaps Q5 is the most difficult question to answer, and it still requires some thought
to offer a convincing method for assessing the adequacy and validity of the
experimental method investigated in the presented project. One possible way is to do that
based on a cross-evaluation with another method for ontology learning, e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Another possible way is to select a much smaller subset of the document collection, e.g.
only the papers with high terminological impact discovered in series No 2. The set
of terms extracted from this “decisive minority vote” subset could then be manually
checked by human experts.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5 Early Results</title>
      <p>The project started in November 2016 and is in its initial phase. Since its start,
the following steps have been accomplished: (i) the document collection
has been chosen; (ii) the catalogue of the papers in the document collection has been
created; (iii) the full texts of the papers have been downloaded and converted to plain
text format.</p>
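      <p>The cataloguing step can be sketched as follows (a hypothetical sketch: a journal/volume/issue folder layout of plain-text papers and a CSV output are assumed here purely for illustration; the actual catalogues are kept in .XLSX format):</p>

```python
import csv
from pathlib import Path

def build_catalogue(root, out_csv):
    """Walk a journal/volume/issue/paper.txt folder layout (assumed,
    the real collection may be organized differently) and write one
    catalogue row per paper. Returns the number of catalogued papers."""
    rows = []
    for paper in sorted(Path(root).glob("*/*/*/*.txt")):
        journal, volume, issue = paper.parts[-4:-1]
        rows.append({"journal": journal, "volume": volume,
                     "issue": issue, "paper": paper.stem})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["journal", "volume", "issue", "paper"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

      <p>Automating the catalogue in this spirit is what makes handling a collection of over 9,000 papers tractable.</p>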
      <p>Overall, the document collection contains more than 9,000 papers. The composition
of the document collection is shown diagrammatically in Fig. 2. Performing even
these initial steps could not be done manually, due to the volume of data and the manual
effort incurred. It has therefore been decided to develop software instruments that help
automate these routine steps.</p>
      <p>Fig. 2. Distribution of papers in the journals of the document collection. The Y-axis shows the years
of publication; the X-axis corresponds to the journals. The numbers in the bars are: the number of volumes,
the number of issues, and the total number of papers in the journal.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusive Remarks and Future Work</title>
      <p>The PhD project presented in this paper is at an early stage. It currently focuses on the
detailed planning of experiments, developing software for the instrumental support of
the experimental workflow, and preparing the data collection. The early results have
been reported in Sections 4 and 5. All the developed instrumental software modules are
available at https://github.com/bwtgroup/SSRTDC-Springer-article-parser,
https://github.com/bwtgroup/SSRTDC-Collections-Springer-PDF-Downloader, and
https://github.com/bwtgroup/SSRTDC-PDF2TXT. The catalogues of the acquired journal
papers in .XLSX format are available at https://github.com/bwtgroup/SSRTDC-PaperCatalogues/;
the data was collected on December 3-4, 2016.</p>
      <p>The short-term plans for future work include: (i) further development of the
instrumental software to support all the steps in the experimental workflow; (ii)
performing the experimental study as outlined in the presented experimental
setup; and (iii) assessing the efficiency of the developed software. The analysis of
the short-term results may further lead to a better understanding of a model and metric
for the completeness of a document collection for knowledge extraction. In the
mid-term, based on this refined understanding, the objectives of the study may be
specified in more detail, unfolding into a refined experimental setup and
possibly leading to new kinds of experiments.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The research leading to this paper has been done in part within the framework of the FP7 Marie Curie
IRSES SemData project (http://www.semdata-project.eu/), grant agreement
No PIRSES-GA-2013-612551.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Tatarintseva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ermolayev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matzke</surname>
            ,
            <given-names>W.-E.</given-names>
          </string-name>
          :
          <article-title>Quantifying Ontology Fitness in OntoElect Using Saturation- and Vote-Based Metrics</article-title>
          . In: Ermolayev, V., et al. (Eds.)
          <source>Revised Selected Papers of ICTERI 2013</source>
          , CCIS 412, pp.
          <fpage>136</fpage>
          --
          <lpage>162</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ermolayev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batsakis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keberle</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tatarintseva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Ontologies of Time: Review and Trends</article-title>
          .
          <source>Int. J. of Computer Science &amp; Applications</source>
          .
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <fpage>57</fpage>
          --
          <lpage>115</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Frantzi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mima</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Automatic Recognition of Multi-Word Terms</article-title>
          .
          <source>Int. J. of Digital Libraries</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>117</fpage>
          -
          <lpage>132</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Badenes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Repository of indexed ROs</article-title>
          .
          <source>Deliverable No. 5.4, Dr Inventor project</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Osborne</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salatino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birukou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motta</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Automatic Classification of Springer Nature Proceedings with Smart Topic Miner</article-title>
          . In:
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al. (Eds.)
          <source>ISWC</source>
          <year>2016</year>
          , LNCS 9982, pp.
          <fpage>383</fpage>
          -
          <lpage>399</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennamoun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Ontology learning from text: A look back and into the future</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>44</volume>
          (
          <issue>4</issue>
          ), Article 20, 36 pages (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (eds.):
          <article-title>Ontology Learning and Population: Bridging the Gap between Text and Knowledge</article-title>
          . IOS Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SKOS Simple Knowledge Organization System reference</article-title>
          .
          <source>Technical report, W3C</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Zavitsanos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vouros</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Paliouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Gold Standard Evaluation of Ontology Learning Methods through Ontology Transformation and Alignment</article-title>
          .
          <source>IEEE Trans. on Knowledge &amp; Data Engineering</source>
          ,
          <volume>23</volume>
          (
          <issue>11</issue>
          ),
          <fpage>1635</fpage>
          --
          <lpage>1648</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parvizi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mellish</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Deemter</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Towards Competency Question-driven Ontology Authoring</article-title>
          . In:
          <string-name>
            <surname>Presutti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et al. (Eds.)
          <source>ESWC</source>
          <year>2014</year>
          , LNCS 8465, pp.
          <fpage>752</fpage>
          --
          <lpage>767</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>