<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Characteristics and Desiderata for Competency Question Benchmarks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Reham Alharbi</string-name>
          <email>R.Alharbi@liverpool.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacopo de Berardinis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Floriana Grasso</string-name>
          <email>floriana@liverpool.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Terry R. Payne</string-name>
          <email>T.R.Payne@liverpool.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentina Tamma</string-name>
          <email>V.Tamma@liverpool.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Liverpool</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Competency Questions (CQs) are essential in ontology engineering; they express an ontology's functional requirements through natural language questions, offer crucial insights into an ontology's scope, and are pivotal for various tasks, such as ontology reuse, testing, requirement specification, and pattern definition. Various approaches have emerged that make use of LLMs for the generation of CQs from different knowledge sources. However, comparative evaluations are hindered by differences in the tasks, datasets and evaluation measures used. In this paper, we provide a set of desiderata for a benchmark of CQs, position state-of-the-art approaches with respect to a categorisation of tasks, and highlight the main challenges hindering the definition of a community-based benchmark to support comparative studies.</p>
      </abstract>
      <kwd-group>
        <kwd>Competency Questions</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Benchmark</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Dataset</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Competency Questions serve several roles across the ontology lifecycle:
• In the requirement definition stage of the ontology development process, CQs are used to suggest
possible concepts and relationships to include in the ontology [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ];
• They are used to verify and validate the knowledge encapsulated in the ontology [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ];
• They are used to support the consumption of ontology content, e.g. through the generation of
APIs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and the reuse of ontological fragments [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ].
      </p>
      <p>
        Large Language Models (LLMs) and Generative AI have recently demonstrated remarkable capabilities
in processing natural language within human-level tasks such as question generation and answering.
Consequently, a number of approaches have been proposed to automate knowledge engineering
activities (partially or in full), including the formulation of CQs [
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16">13, 14, 15, 16</xref>
        ] and that differ with
respect to the nature of the knowledge resources used. CQ generation approaches can be divided into:
1. Reverse engineering of CQs from KGs: here CQs are reverse-engineered from sources of
common-sense open data, e.g. Wikidata [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or DBpedia [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In this case, CQs are built in a
bottom-up fashion, rather than being formulated through interviews with domain experts.
2. Retrofitting CQs from ontologies: this applies to cases where an ontology exists, but no
associated CQs are publicly available. Therefore, the aim is to identify possible CQs that were
used in the development of the ontology, thus facilitating its future reuse [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
3. Generating CQs from knowledge sources: these are approaches that generate CQs from
either a set of class and property names [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], or from a corpus of text describing a domain [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>As more automatic, generative methods emerge, there is a growing need to develop resources that
can be used to validate these approaches. This is notwithstanding the challenges arising from the
relative scarcity of CQs (and requirements in general) that are: 1) recognised to be of good quality; and
2) published together with the ontology whose design they have supported.</p>
      <p>
        To date, few common resources have been used by different studies (one exception being Dem@Care
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which has been used by several studies [
        <xref ref-type="bibr" rid="ref13 ref16">13, 16</xref>
        ], and there is little consistency in the use of
evaluation measures to assess the CQs. Furthermore, different studies have addressed different phases
in the ontology development lifecycle, and thus cannot be directly compared. For example, some
approaches target the Requirement specification phase [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], whereas others address the context
where ontologies have missing or non-existent CQs [
        <xref ref-type="bibr" rid="ref13 ref16">13, 16</xref>
        ].
      </p>
      <p>In this paper, we address the problem of identifying the requirements of a multi-purpose benchmark
for competency question generation, together with its task specifications and evaluation criteria; this is
the blueprint for a benchmark generation activity at the special session.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Towards a benchmark for CQs Formulation Approaches</title>
      <p>
        The (semi-)automatic formulation of CQs should ensure that the questions generated are “good”
competency questions. While there is no accepted definition of what “good” means in this context, we can
leverage the literature on automatic question generation [
        <xref ref-type="bibr" rid="ref20">20, 21</xref>
        ] to identify desirable characteristics of
“good” CQs, considering also that the generation of questions is a form of information-seeking activity
that reveals the implicit connection between reasoning ability and language generation [22]. In the
remainder of this section, we identify the types of tasks that a CQ generation approach should support,
together with the resources needed to manage the tasks, namely: the data, the pre-processing steps, and
the evaluation measures.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Tasks definition</title>
        <p>
          There is no consensus on how to assess the quality of competency questions [23, 24], especially with
respect to their aim of identifying the purpose and the explicit concepts and relations in an ontology.
However, we can identify a set of criteria that define the tasks that the benchmark should support
[25]. We can broadly categorise CQs into the following categories: (i) Syntactically or semantically
incorrect CQs: This category addresses common issues in question formulation that can hinder effective
ontology modelling, with consequences for query processing and data retrieval [
          <xref ref-type="bibr" rid="ref13">13, 26</xref>
          ]; (ii) Scoping
CQs: Such questions may help to define the domain, but do not necessarily translate into a query that
can be automatically processed (e.g. a SPARQL query). These require specialised handling to aid
the definition of a domain; (iii) Verified CQs: These CQs can be directly queried and can serve as
benchmarks for system capabilities.
        </p>
        <p>
          For each of these categories we identify those tasks that approaches for generating CQs should be
able to manage:
Syntactically or semantically incorrect CQs:
1. Linguistic Perspectives:
a) Identify Ambiguous Questions: Create a repository of CQ examples that exhibit ambiguity in
wording or context. Example: “Which devices can I see?”,1 which is inherently ambiguous
given its subjectivity;
b) Develop Clarity Guidelines: Formulate standards/templates to help rephrase ambiguous
questions for improved clarity and specificity. Example: the CQ “What are the materials
used for a barbecue?”2 is inherently ambiguous, since materials here could be interpreted
either as the tools (e.g. spatula, tongs) or as the specific material used to make the barbecue
and its components (e.g. cast iron), and hence would benefit from being clarified.
2. Question Type Identification:
a) Classify Question Types: Systematically categorise CQs into types such as narrative, factual,
or descriptive [26, 27], and assess their suitability in different contexts. Narrative and
descriptive questions are typically questions that require a subjective view on a topic, but
could contribute to identifying relevant knowledge. Example: “What is your favourite pizza
topping?”, which might be useful to some extent in defining a domain (e.g. the concept of a
popular pizza topping).
b) Evaluate Contextual Appropriateness: Develop criteria to measure the effectiveness of
question types within their intended contexts. In some cases, this is needed to ensure that a CQ
is consistent with the original ontology requirements, especially when generating CQs from
knowledge sources (Section 1), given the potential availability of user stories, interviews, etc.
3. Domain Knowledge Relevance:
a) Align Questions with Domain Relevance: Establish a review process to ensure questions are
pertinent with respect to the relevant domain knowledge.
b) Refine Focus Through Filtering: Implement a mechanism to exclude questions that, while
correct, are irrelevant to the task at hand. This is particularly useful in cases of CQs generated
by some LLMs (e.g. LLama) that tend to formulate illustrative questions [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
4. Incorrect or inappropriate CQs detection:
a) Correct Erroneous Inputs: Introduce a correction mechanism for factually incorrect CQs, e.g.
“Which vegetarian pizza contains ham?”. This can be used as a CQ only to confirm that there
is no entity in the ontology that satisfies this question.
b) Bias: Set up a robust protocol for verifying and eliminating those CQs generated through
generative AI that propagate or reinforce bias introduced during the pre-training of Large Language
Models, which is particularly critical in domains such as healthcare [28].
        </p>
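        <p>
          A first pass at Question Type Identification (task 2a above) could be approximated with surface-level heuristics before any manual or LLM-based review. The sketch below is a toy illustration: the marker words and the function name are our own assumptions, not an established classification scheme.
        </p>
        <preformat>
```python
import re

# Illustrative heuristic: flag CQs that are likely subjective/narrative
# rather than factual, based on first/second-person and preference markers.
# The marker list is an assumption for illustration only.
SUBJECTIVE_MARKERS = re.compile(r"\b(your|my|favourite|favorite|think|feel)\b", re.I)

def classify_cq(cq: str) -> str:
    """Return a coarse question-type label for a competency question."""
    if SUBJECTIVE_MARKERS.search(cq):
        return "narrative"  # subjective view on a topic
    return "factual"

print(classify_cq("What is your favourite pizza topping?"))   # narrative
print(classify_cq("Which vegetarian pizzas contain cheese?")) # factual
```
        </preformat>
        <p>
          Such a filter would only triage candidates; borderline cases would still need review against the intended context.
        </p>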
        <p>Scoping CQs:
1. Catalogue Scoping CQs: Document all CQs that contribute to defining the scope of the information
domain [25, 24]. Example: “Which are the types of CheeseTopping?” [29]
2. Analysis of Domain Contribution: Analyse how these CQs help in shaping the understanding
of the domain. These can include definition or disambiguation questions, or questions that state
modelling choices. Example: “Is dialect a language?” [30].
3. Integration into Information Architecture: Strategies that utilise scoping CQs to enhance the
structure of information repositories should be defined.</p>
        <p>
          Verified CQs:
1. Maintaining databases of Verified CQs: An up-to-date list of CQs that can be directly transformed
into SPARQL queries needs to be maintained, together with their SPARQL formulations.
Nonetheless, as long as this is well documented, even CQs expressing requirements that are not (yet)
supported by the ontology can still be of interest for evaluation. For example, Zhang et al. define
these as adversarial CQs, and use them for ontology testing.
2. Testing and Validation: Rigorous testing is necessary to ensure that the SPARQL queries
(corresponding to the CQs) retrieve accurate and relevant data. Some of these tests can be automated
through dedicated tools, for example OWLunit3, a tool that runs unit tests for ontologies, or
OOPS!, the Ontology Pitfall Scanner, which detects common errors in ontologies [32];
3. Documentation and Examples: Create detailed documentation and examples of successful CQ
transformations for training and reference, possibly through tool support (e.g. Widoco [33]).
1This is req223 for the Vicinity Core ontology listed in the CORAL repository [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
2https://keet.wordpress.com/2022/06/08/only-answering-competency-questions-is-not-enough-to-evaluate-your-ontology/
        </p>
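        <p>
          Before running heavier tooling such as OWLunit, a database of verified CQs could apply a lightweight sanity check to each stored SPARQL formulation. The sketch below shows illustrative checks of our own devising (a query form is present, braces are balanced); it is not a full SPARQL validator.
        </p>
        <preformat>
```python
# Minimal sanity check for the SPARQL formulation stored with a verified CQ.
# These two checks are an illustrative assumption, not a SPARQL parser.
def sparql_looks_wellformed(query: str) -> bool:
    upper = query.upper()
    has_form = any(kw in upper for kw in ("SELECT", "ASK", "CONSTRUCT"))
    balanced = query.count("{") == query.count("}")
    return has_form and balanced

query = "SELECT ?t WHERE { ?t rdfs:subClassOf :CheeseTopping }"
print(sparql_looks_wellformed(query))  # True
```
        </preformat>
        <p>
          Queries that pass such a check would then proceed to unit/acceptance tests that execute them against the ontology.
        </p>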
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets available</title>
        <p>We propose a competency question benchmark, CQ-BEN4, comprising a corpus of competency questions
that have either been curated to support the validation of ontology engineering processes or have been
used to construct ontologies supporting some downstream task, e.g. the Polifonia ontology network [34].</p>
        <p>
          Collecting a suitable dataset to support the tasks defined above is not trivial: often ontologies are
published without the CQs and the requirements used to design them. As a result, open-source repository
data often lack essential components, especially those needed to support design and testing. We identify two
main implementation steps to organise the process: (i) Gathering all Published Requirements:
Collecting and documenting all existing requirements related to the tasks, in a similar effort to repositories
such as CORAL [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and the CQs dataset [23], along with individual ontologies that have published their
CQs; (ii) Categorisation According to Tasks: Organising the requirements based on the respective
tasks they support in order to streamline the benchmark design process. As part of the contribution
to this challenge, we have collected a preliminary repository of resources consisting of ontologies,
related competency questions and the relevant publications describing these resources. This resource is
contributed to the community and is open to extension and improvement.
        </p>
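        <p>
          To make the categorisation step concrete, one possible record layout for an entry in such a repository is sketched below; the field names are our own assumptions about the components listed above (ontology, its CQs, the describing publication, and the supported task).
        </p>
        <preformat>
```python
from dataclasses import dataclass, field

# Hypothetical schema for one repository entry; field names are illustrative.
@dataclass
class BenchmarkEntry:
    ontology_iri: str
    competency_questions: list = field(default_factory=list)
    publication: str = ""
    task_category: str = ""  # e.g. "retrofitting", "generation from text"

entry = BenchmarkEntry(
    ontology_iri="https://example.org/ontology/",  # placeholder IRI
    competency_questions=["Which are the types of CheeseTopping?"],
    publication="(citation of the describing paper)",
    task_category="generation from knowledge sources",
)
print(entry.task_category)
```
        </preformat>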
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Ontology pre-processing</title>
        <p>
          Depending on the static context and other information that is given as input, some ontology
pre-processing might be necessary prior to feeding all data to a computational model for CQ extraction.
Approaches that extract triples [27] need to handle the possible presence of blank nodes, e.g. by
projecting an ontology into a simplified graph representation [35]. Ontology verbalisation, the process
of translating formal ontology structures into natural language expressions [36], is often used as a
pre-processing step. As such, the verbalisation strategy impacts all pipelines that process textual/narrative
ontology descriptions. Different strategies have been used, such as triple-based verbalisation [
          <xref ref-type="bibr" rid="ref15">15, 37</xref>
          ]
or descriptive ontology verbalisation [31], while some approaches skip verbalisation and feed triples
directly [27]. Measuring the contribution of ontology verbalisation remains an open direction.
        </p>
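        <p>
          A minimal example of triple-based verbalisation, assuming camelCase local names; real pipelines use richer, property-aware templates, so this only illustrates the idea.
        </p>
        <preformat>
```python
import re

# Toy triple-based verbaliser: split camelCase local names into words and
# render each (subject, predicate, object) triple as one sentence.
def split_camel(name: str) -> str:
    return re.sub(r"(?=[A-Z])", " ", name).strip().lower()

def verbalise(triple):
    s, p, o = triple
    return f"{split_camel(s).capitalize()} {split_camel(p)} {split_camel(o)}."

print(verbalise(("Pizza", "hasTopping", "CheeseTopping")))
# Pizza has topping cheese topping.
```
        </preformat>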
        <p>In the context of the benchmark, given that ontology pre-processing impacts CQ extraction, we would
expect this to introduce an additional dimension in an experimental setup. For example, if a method uses
verbalisation, accounting for this dimension would allow us to address the following research questions:
“How sensitive is a CQ formulation pipeline to verbalisation?”, “Which verbalisation technique/methodology
yields the best performance for CQ formulation?”, “How does a CQ formulation pipeline perform without verbalisation?”.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Evaluation approaches</title>
        <p>Different evaluation approaches have been proposed for the tasks identified in Section 2.1, which
further complicates the effort of understanding the performance of CQ generation algorithms. The
evaluation measures include both CQ assessment and performance measures:
3https://github.com/luigi-asprino/owl-unit
4https://github.com/KE-UniLiv/CQ-benchmark/</p>
        <p>Table 1. Evaluation approaches — expert evaluation, similarity assessment, and testing for verified CQs — and the tasks they support.</p>
        <p>
          • Expert Evaluation: These measures typically relate to tasks that identify poor or incorrect CQs,
and assess their relevance and accuracy. This type of evaluation is generally performed with
support from domain experts and knowledge engineers and, as such, is particularly time-consuming
and prone to subjectivity and potential bias. Nonetheless, these approaches typically
provide useful insights into the behaviour of the computational models generating CQs, and often
result in data collection activities that support the use of automatic metrics [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ];
• Similarity Assessment: Techniques based on text embeddings, such as Sentence BERT
(SBERT) [38], are often employed for assessing the similarity between generated and ground-truth
CQs, and only pairs of generated and ground-truth CQs whose similarity is above a threshold
(often 70% or above) are considered similar. In turn, this enables the computation of performance
metrics such as accuracy, precision, recall, and F1-scores [
          <xref ref-type="bibr" rid="ref16">27, 16</xref>
          ] for tasks that involve identifying
scoping CQs. Computing the cosine similarity between sentence-level (text) embeddings is often
used as a proxy to detect paraphrasing [34, 27], i.e. when two CQs have the same meaning but
are formulated differently, e.g. “What is the number of the moons of Jupiter?” vs. “How many moons
does Jupiter have?”. However, as this may be prone to false positives and false negatives (high
cosine similarity, different meaning; low cosine similarity, same meaning), other approaches
determine CQ equivalence through transfer learning [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. In this case, given two questions, pairs
of sentence-level embeddings are fed to a feed-forward neural network and trained for paraphrase
detection using related corpora [39].
• Testing for Verified CQs: This involves computing the similarity between CQs and developing
unit/acceptance testing for the corresponding SPARQL queries [40]. Performance measures vary
from the ones used to assess similarity between CQs to those used in Natural Language Processing
(NLP), e.g. the BiLingual Evaluation Understudy (BLEU) score [41] used for assessing automatic
text translation [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
• Emergent approaches: Other approaches are emerging from different fields (typically question
generation for education) that aim to assess the complexity of generated questions using a
combination of predefined templates and complexity similarity [42, 43].
        </p>
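        <p>
          The threshold-based similarity matching described above can be illustrated as follows. Here plain token-count vectors stand in for sentence embeddings (e.g. SBERT) so the sketch stays self-contained, and the 0.7 threshold follows the convention mentioned above; the matching rule itself is a simplified assumption.
        </p>
        <preformat>
```python
import math
from collections import Counter

# Toy stand-in for embedding-based CQ matching: token-count vectors replace
# sentence embeddings; pairs count as matched when cosine similarity
# reaches the threshold.
def cosine(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def precision_recall_f1(generated, ground_truth, threshold=0.7):
    # Precision: share of generated CQs matching some ground-truth CQ;
    # recall: share of ground-truth CQs covered by some generated CQ.
    matched = sum(1 for g in generated
                  if any(cosine(g, t) >= threshold for t in ground_truth))
    covered = sum(1 for t in ground_truth
                  if any(cosine(g, t) >= threshold for g in generated))
    p = matched / len(generated) if generated else 0.0
    r = covered / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gen = ["How many moons does Jupiter have?"]
gold = ["How many moons does Jupiter have?", "What is the mass of Jupiter?"]
p, r, f1 = precision_recall_f1(gen, gold)
```
        </preformat>
        <p>
          With embedding-based similarity, paraphrases such as the Jupiter example above would match despite low lexical overlap, which is precisely what this token-level proxy misses.
        </p>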
        <p>Table 1 relates the evaluation approach to specific tasks. The list of approaches and tasks is not
exhaustive, and further approaches tailored to the reuse or adaptation of ontologies will be developed.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Conclusion</title>
        <p>In this paper, we identified the requirements for a multi-purpose benchmark for competency
question generation approaches, together with its task specifications and evaluation criteria, laying the
foundations for a comprehensive benchmark.</p>
        <p>engineering: A survey, in: Conceptual Modeling, Springer Nature Switzerland, 2023, pp. 45–64.
[21] N. Mulla, P. Gharpure, Automatic question generation: a review of methodologies, datasets,
evaluation metrics, and applications, Prog. Artif. Intell. 12 (2023) 1–32.
[22] L. Bertolazzi, D. Mazzaccara, F. Merlo, R. Bernardi, ChatGPT’s information seeking strategy:
Insights from the 20-questions game, in: C. M. Keet, H.-Y. Lee, S. Zarrieß (Eds.), Proceedings of
the 16th International Natural Language Generation Conference, Association for Computational
Linguistics, Prague, Czechia, 2023, pp. 153–162. URL: https://aclanthology.org/2023.inlg-main.11.
doi:10.18653/v1/2023.inlg-main.11.
[23] J. Potoniec, D. Wiśniewski, A. Ławrynowicz, C. M. Keet, Dataset of ontology competency
questions to sparql-owl queries translations, Data in Brief 29 (2020) 105098. URL: https://www.
sciencedirect.com/science/article/pii/S2352340919314544. doi:https://doi.org/10.1016/j.
dib.2019.105098.
[24] C. M. Keet, Z. C. Khan, On the roles of competency questions in ontology engineering, in:
24th International Conference on Knowledge Engineering and Knowledge Management (EKAW),
Springer, 2024 – to appear.
[25] R. Alharbi, V. Tamma, T. Payne, F. Grasso, A review and comparison of competency question
engineering approaches, in: 24th International Conference on Knowledge Engineering and
Knowledge Management (EKAW), Springer, 2024.
[26] M. Antia, C. M. Keet, Assessing and enhancing bottom-up CNL design for competency questions
for ontologies, in: Proc. of the Seventh International Workshop on Controlled Natural Language
(CNL 2020/21), Association for Computational Linguistics (ACL), 2021, pp. 1–11.
[27] R. Alharbi, V. Tamma, F. Grasso, T. Payne, The role of Generative AI in competency question
retrofitting, in: Extended Semantic Web Conference, ESWC2024, Hersonissos, Greece, 2024.
[28] S. Chen, J. Gallifant, M. Gao, P. Moreira, N. Munch, A. Muthukkumar, A. Rajan, J. Kolluri, A. Fiske,
J. Hastings, H. Aerts, B. Anthony, L. A. Celi, W. G. L. Cava, D. S. Bitterman, Cross-care: Assessing
the healthcare implications of pre-training data on language model bias, 2024. URL: https://arxiv.
org/abs/2405.05506. arXiv:2405.05506.
[29] C. Bezerra, F. Santana, F. Freitas, CQChecker: A tool to check ontologies in OWL-DL using
competency questions written in controlled natural language, Learning &amp; Nonlinear Models 12
(2014) 115–129.
[30] F. Gillis-Webber, S. Tittel, C. M. Keet, A model for language annotations on the web, in: B.
VillazónTerrazas, Y. Hidalgo-Delgado (Eds.), Knowledge Graphs and Semantic Web - First Iberoamerican
Conference, KGSWC 2019, Villa Clara, Cuba, June 23-30, 2019, Proceedings, volume 1029 of
Communications in Computer and Information Science, Springer, 2019, pp. 1–16. URL: https://doi.
org/10.1007/978-3-030-21395-4_1. doi:10.1007/978-3-030-21395-4\_1.
[31] B. Zhang, V. A. Carriero, K. Schreiberhuber, S. Tsaneva, L. S. González, J. Kim, J. de Berardinis,
Ontochat: a framework for conversational ontology engineering using language models, arXiv
preprint arXiv:2403.05921 (2024).
[32] M. Poveda-Villalón, A. Gómez-Pérez, M. C. Suárez-Figueroa, OOPS! (OntOlogy Pitfall Scanner!):
An On-line Tool for Ontology Evaluation, International Journal on Semantic Web and Information
Systems (IJSWIS) 10 (2014) 7–34.
[33] D. Garijo, Widoco: A wizard for documenting ontologies, in: C. d’Amato, M. Fernandez, V. Tamma,
F. Lecue, P. Cudré-Mauroux, J. Sequeda, C. Lange, J. Heflin (Eds.), The Semantic Web – ISWC 2017,
Springer International Publishing, Cham, 2017, pp. 94–102.
[34] J. de Berardinis, V. A. Carriero, N. Jain, N. Lazzari, A. Meroño-Peñuela, A. Poltronieri, V. Presutti,
The polifonia ontology network: Building a semantic backbone for musical heritage, in: The
Semantic Web – ISWC 2023, Springer Nature Switzerland, 2023, pp. 302–322.
[35] Y. He, J. Chen, H. Dong, I. Horrocks, C. Allocca, T. Kim, B. Sapkota, Deeponto: A python package for
ontology engineering with deep learning, Semantic Web: Interoperability, Usability, Applicability
(2024).
[36] Y. He, J. Chen, E. Jimenez-Ruiz, H. Dong, I. Horrocks, Language model analysis for ontology
subsumption inference, in: Findings of the Association for Computational Linguistics: ACL 2023,
2023, pp. 3439–3453.
[37] G. Amaral, O. Rodrigues, E. Simperl, Wdv: A broad data verbalisation dataset built from wikidata,
in: The Semantic Web – ISWC 2022, Springer International Publishing, Cham, 2022, pp. 556–574.
[38] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, in:
Proc. of the 2019 Conf. on Empirical Methods in Natural Language Proc. and the 9th International
Joint Conf. on Natural Language Proc. (EMNLP-IJCNLP), Association for Computational Linguistics,
2019, pp. 3982–3992.
[39] Z. Wang, W. Hamza, R. Florian, Bilateral multi-perspective matching for natural language sentences,
arXiv preprint arXiv:1702.03814 (2017).
[40] M. Poveda-Villalón, A. Fernández-Izquierdo, M. Fernández-López, R. García-Castro, LOT: An
industrial oriented ontology engineering framework, Engineering Applications of Artificial
Intelligence 111 (2022) 104755.
[41] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine
translation, in: P. Isabelle, E. Charniak, D. Lin (Eds.), Proceedings of the 40th Annual Meeting
of the Association for Computational Linguistics, Association for Computational Linguistics,
Philadelphia, Pennsylvania, USA, 2002, pp. 311–318. URL: https://aclanthology.org/P02-1040.
doi:10.3115/1073083.1073135.
[42] S. AlKhuzaey, F. Grasso, T. Payne, V. Tamma, A framework for assessing the complexity of auto
generated questions from ontologies, in: European Conference on e-Learning, volume 22, 2023,
pp. 17–24.
[43] S. Bi, X. Cheng, Y.-F. Li, L. Qu, S. Shen, G. Qi, L. Pan, Y. Jiang, Simple or complex?
complexity-controllable question generation with soft templates and deep mixture of experts model, in: M.-F.
Moens, X. Huang, L. Specia, S. W.-t. Yih (Eds.), Findings of the Association for Computational
Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican
Republic, 2021, pp. 4645–4654. URL: https://aclanthology.org/2021.findings-emnlp.397. doi:10.
18653/v1/2021.findings-emnlp.397.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Grüninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <source>The Role of Competency Questions in Enterprise Engineering</source>
          , Springer US,
          <year>1995</year>
          , pp.
          <fpage>22</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>Ontology development 101: A guide to creating your first ontology</article-title>
          ,
          <source>Technical Report, Stanford knowledge systems laboratory technical report KSL-01-05</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blomqvist</surname>
          </string-name>
          ,
          <article-title>Extreme design with content ontology design patterns</article-title>
          ,
          <source>in: Proc. of the 2009 International Conf. on Ontology Patterns</source>
          , volume
          <volume>516</volume>
          ,
          <year>2009</year>
          , p.
          <fpage>83</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Briggs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miranker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. P.</given-names>
            <surname>Heideman</surname>
          </string-name>
          ,
          <article-title>A pay-as-you-go methodology to design and build enterprise knowledge graphs from relational databases</article-title>
          ,
          <source>in: Proc. of the 18th International Semantic Web Conf., ISWC</source>
          <year>2019</year>
          ,
          <year>2019</year>
          , pp.
          <fpage>526</fpage>
          -
          <lpage>545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Suárez-Figueroa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gómez-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernández-López</surname>
          </string-name>
          ,
          <article-title>The neon methodology framework: A scenario-based methodology for ontology development</article-title>
          ,
          <source>Applied ontology 10</source>
          (
          <year>2015</year>
          )
          <fpage>107</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bezerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Freitas</surname>
          </string-name>
          ,
          <article-title>Verifying description logic ontologies based on competency questions and unit testing</article-title>
          ,
          <source>in: Proc. of the IX Seminar on Ontology Research and I Doctoral and Masters Consortium on Ontologies</source>
          , volume
          <year>1908</year>
          ,
          <year>2017</year>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Keet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ławrynowicz</surname>
          </string-name>
          ,
          <article-title>Test-driven development of ontologies</article-title>
          ,
          <source>in: Proc. of the 13th International Conf. on The Semantic Web, ESWC 2016</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>642</fpage>
          -
          <lpage>657</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fernández-Izquierdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poveda-Villalón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>García-Castro</surname>
          </string-name>
          ,
          <article-title>CORAL: A corpus of ontological requirements annotated with lexico-syntactic patterns</article-title>
          ,
          <source>in: Proc. of the 16th International Conf. on The Semantic Web, ESWC 2019</source>
          , Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>458</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Espinoza-Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>Extending ontology engineering practices to facilitate application development</article-title>
          ,
          <source>in: Knowledge Engineering and Knowledge Management</source>
          , Springer International Publishing,
          <year>2022</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alharbi</surname>
          </string-name>
          ,
          <article-title>Assessing candidate ontologies for reuse</article-title>
          ,
          <source>in: Proc. of the Doctoral Consortium at ISWC 2021 (ISWC-DC)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>65</fpage>
          -
          <lpage>72</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:244895203.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alharbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Grasso</surname>
          </string-name>
          ,
          <article-title>Requirement-based methodological steps to identify ontologies for reuse</article-title>
          ,
          <source>in: Intelligent Information Systems</source>
          , Springer Nature Switzerland,
          <year>2024</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Azzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Assi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gagnon</surname>
          </string-name>
          ,
          <article-title>Scoring ontologies for reuse: An approach for fitting semantic requirements</article-title>
          ,
          <source>in: Proc. of the Research Conf. on Metadata and Semantic Research</source>
          ,
          <source>MTSR 2022</source>
          , Springer Nature,
          <year>2023</year>
          , pp.
          <fpage>203</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alharbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Grasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Payne</surname>
          </string-name>
          ,
          <article-title>An experiment in retrofitting competency questions for existing ontologies</article-title>
          ,
          <source>in: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing</source>
          , SAC '24,
          Association for Computing Machinery,
          <year>2024</year>
          , pp.
          <fpage>1650</fpage>
          -
          <lpage>1658</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Antia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Keet</surname>
          </string-name>
          ,
          <article-title>Automating the generation of competency questions for ontologies with AgOCQs</article-title>
          ,
          <source>in: Knowledge Graphs and Semantic Web</source>
          , Springer Nature Switzerland,
          <year>2023</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciroku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>de Berardinis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meroño-Peñuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>RevOnt: Reverse engineering of competency questions from knowledge graphs via language models</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>82</volume>
          (
          <year>2024</year>
          )
          <fpage>100822</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rebboud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tailhardat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>Can LLMs generate competency questions?</article-title>
          ,
          <source>in: Extended Semantic Web Conf., ESWC 2024, Hersonissos, Greece</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          ,
          <source>in: International Semantic Web Conference</source>
          , Springer,
          <year>2007</year>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasiopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Meditskos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efstathiou</surname>
          </string-name>
          ,
          <article-title>Semantic Knowledge Structures and Representation</article-title>
          ,
          <source>Technical Report D5.1, FP7-288199 Dem@Care: Dementia Ambient Care: Multi-Sensing Monitoring for Intelligence Remote Management and Decision Support</source>
          ,
          <year>2012</year>
          . URL: http://www.demcare.eu/downloads/D5.1SemanticKnowledgeStructures_andRepresentation.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G. K. Q.</given-names>
            <surname>Monfardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Salamon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Barcellos</surname>
          </string-name>
          ,
          <article-title>Use of competency questions in ontology</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>