<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucas Lageweg</string-name>
          <email>l.lageweg@cbs.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonas Kouwenhoven</string-name>
          <email>jonaskouwenhoven@live.nl</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benno Kruit</string-name>
          <email>b.b.kruit@vu.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Statistics Netherlands</institution>
          ,
          <addr-line>Henri Faasdreef 312, Den Haag</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Science Park 900, 1098 XH Amsterdam</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1105, Amsterdam</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents GECKO, a knowledge graph-based statistical question answering system currently in beta deployment. GECKO aims to facilitate the retrieval of single statistical values from an extensive database containing over a billion values across more than 4,000 tables. The system integrates a comprehensive framework including data augmentation, entity retrieval, and large language model (LLM)-based query generation. A key feature of the beta deployment is the collection of user feedback, which is critical for improving system performance and accuracy. This feedback mechanism allows users to report issues directly, ensuring continuous improvement based on real-world use.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Statistics Netherlands (Centraal Bureau voor de Statistiek; CBS) is an independent administrative
body of the Dutch government tasked with the creation of statistics over a broad spectrum of
social topics and the responsibility to make them accessible to the general public. However,
in-house studies have shown that users struggle to find the correct tables for their needs in the
vast amount of data available. This research aims to develop a Question Answering (QA) system
to provide specific statistical observations from this data as responses to natural-language user questions.</p>
      <p>
        QA systems can take several forms, with most recently free-form generative Large Language
Models (LLMs) like ChatGPT and GPT4 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] receiving much attention. Owing to their nature, these
models generalize very well across a large range of topics, but have been shown to
be prone to ‘hallucinations’, where plausible but incorrect or even nonsensical answers are
generated [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Especially for official data such as governmental statistics, this is highly undesirable.
      </p>
      <p>Knowledge Graph Question Answering (KGQA) is a field where knowledge graphs (KGs)
containing real-world facts and relations in structured form are used as a basis for QA systems.
Answers of such systems should always adhere to the KG. Therefore, assuming it contains
correct information, answering by returning parts of the KG, or reasoning over it, cannot lead to
nonsensical answers. In this paper, we introduce an end-to-end pipeline for a generation-based
KGQA system of CBS data.
SEMANTICS 2024, Demo Track</p>
      <p>
        Our approach introduces a data augmentation process for enhancing model training, explores
various encoder architectures for entity retrieval, and proposes a new query-generation
mechanism enhanced by Low-Rank Adaptation (LoRA) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Additionally, we propose a new prompting
technique that utilizes dynamic prompts, constructing specific prompts based on the generation
phase. These improvements help the process of generating symbolic expressions for querying a
KG, thereby enhancing the overall performance of the QA system.
      </p>
      <p>This paper details the beta deployment of GECKO and its feedback collection mechanism,
emphasizing the role of user input in refining the system. The beta phase is critical for identifying
and addressing potential issues, ultimately enhancing the system’s robustness and reliability.
</p>
      <p>Figure 1: (a) Overview. (b) Example table fragment: dairy production in 1 000 kg (Butter / Cheese) by period: 1995: 132,300 / 682,900; 2000: 126,200 / 683,600; 2005: 118,800 / 672,200; 2010: 133,419 / 752,638; 2015: 147,577 / 844,974.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Query generation systems, particularly those involving text-to-SQL and KGQA, have made
significant strides [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Recent work [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] focuses on grounding queries in knowledge graphs
to avoid hallucinations. Recent advancements [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] highlight the use of LLMs in generating
logical forms for querying databases.
      </p>
      <p>
        Data augmentation techniques [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] are essential for creating diverse and realistic training
datasets. Entity retrieval methods including sparse and dense retrieval approaches [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ],
play a crucial role in identifying relevant data within vast datasets.
      </p>
      <p>Compared to existing KBQA or text-to-SQL systems, we provide a hybrid solution in which
statistical tabular data is represented as knowledge graphs, so that symbolic expression
generation can be applied instead of more complex query-language generation (SQL or
SPARQL). With this approach, we propose a novel system that helps users find
relevant information in official statistics and similar systems, which is vital for governmental
decision making and for all fields of research utilising and relying on these statistics.</p>
    </sec>
    <sec id="sec-4">
      <title>3. System Design</title>
      <p>
        In processing a question, GECKO performs four core steps: entity retrieval, filter retrieval,
constrained S-expression decoding (i.e. symbolic expression generation) and observation validation.
We restrict the querying space by performing entity retrieval based on the input question to
determine the closest KB nodes. This is done through either sparse retrieval using BM25+ [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
as a baseline method, using a trained dual encoder [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] or a finetuned ColBERT model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
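As an illustration of the sparse baseline, BM25+ [14] augments classic BM25 with a lower-bound δ on the term-frequency component, so that a long document containing a query term always outscores one that lacks it. The following is a minimal self-contained sketch, not the deployed retriever; tokenization and the parameter values `k1`, `b`, `delta` are illustrative:

```python
import math
from collections import Counter

def bm25_plus_scores(query_tokens, docs, k1=1.5, b=0.75, delta=1.0):
    """Score each pre-tokenized document against the query with BM25+.

    `delta` is the lower bound that BM25+ adds to the normalized term
    frequency of every matched query term.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                          # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:                  # only matched terms contribute
                continue
            idf = math.log((N + 1) / df[t])
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
            s += idf * (norm + delta)       # delta: the BM25+ lower bound
        scores.append(s)
    return scores
```

In GECKO this score would rank KB node descriptions against the user question; here it simply ranks toy token lists.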
      <p>After obtaining the closest matching entities based on the query, we retrieve all possible
filters for tables by exploding a subgraph using the entities found. The result of the subgraph
exploding is a graph containing all table nodes and their related measures and dimensions
having nodes intersecting with the retrieved entities from the previous step. The subgraph
contains all relevant nodes to the query, connected to one or more tables.</p>
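The subgraph-exploding step above can be sketched as a set intersection over a toy graph layout. The actual CBS KG schema is richer; the dict-of-lists structure here (table ID mapped to its measure and dimension node IDs) and the sample IDs are assumptions for illustration only:

```python
def explode_subgraph(tables, retrieved_entities):
    """Keep only the tables whose measure/dimension nodes intersect the
    retrieved entities, together with just those intersecting nodes.

    `tables`: dict mapping a table ID to its measure/dimension node IDs
    (an illustrative stand-in for the real KG structure).
    """
    hits = set(retrieved_entities)
    subgraph = {}
    for table_id, nodes in tables.items():
        overlap = hits & set(nodes)
        if overlap:  # table is connected to at least one retrieved entity
            subgraph[table_id] = sorted(overlap)
    return subgraph
```

The result restricts the decoding space in the next step to tables that can actually mention the entities found in the question.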
      <p>The query and subgraph are used as input for the constrained S-expression decoding. The
S-expressions are generated token-by-token such that, given the subgraph, admissible tokens
are returned at every step. A rule-based baseline was created using the entity retrieval scores to
greedily determine what token from the admissible tokens to select. The second method uses a
transformer-based decoder-only seq2seq model and dynamic prompting.</p>
      <p>When generating a token at a given timestep, the model evaluates the sequences in the list of
admissible/constrained tokens and selects the sequence with the highest assigned score. For
example, when 7425eng is given to the decoder as one of the admissible next tokens, but only a
decomposition of sub-tokens can be embedded by the model (e.g. 7425 followed by ##eng), the
summed log probability for these subtokens will determine the total probability of selecting
this identifier for generation.</p>
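The selection rule described above, summing sub-token log probabilities to score each admissible identifier, can be sketched as follows; `tokenize` and `logprob` are stand-ins for the real model's tokenizer and per-token scoring:

```python
def pick_admissible(admissible, tokenize, logprob):
    """Return the admissible candidate with the highest summed log
    probability over its sub-tokens.

    `tokenize` splits an identifier such as '7425eng' into model
    sub-tokens (e.g. '7425' followed by '##eng'); `logprob` returns the
    model's log probability for one sub-token in context. Both are
    illustrative stand-ins for the real model interfaces.
    """
    def score(candidate):
        # Summing log probabilities corresponds to multiplying the
        # probabilities of the sub-token sequence.
        return sum(logprob(tok) for tok in tokenize(candidate))
    return max(admissible, key=score)
```

Because only admissible tokens are scored, the decoder can never emit an identifier that is absent from the subgraph.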
      <p>The novelty in this constraining method is the introduction of dynamic prompting, which,
instead of calculating the likelihood of a token sequence based on a static prompt (i.e. text
input for the model), adjusts the prompts according to the generation phase. For example,
when generating a table ID, the prompt is altered to only include the most relevant table IDs
and their descriptions. Similarly, when measures are generated in the next phase, it retrieves
the measures related to the previously generated table ID, using those to construct a new
prompt. This method applies to the different dimension groups as well. Figure 2 contains a
schematic overview of the dynamic prompting technique.</p>
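A minimal sketch of dynamic prompting follows. The phase-specific wording and the option-list layout are assumptions; the deployed prompt templates are not shown in the paper:

```python
def build_prompt(phase, question, candidates, descriptions):
    """Assemble a prompt whose instruction and option list depend on the
    generation phase (table ID, then measures, then dimension groups).

    `candidates` are the admissible IDs for the current phase and
    `descriptions` maps each ID to a human-readable label; both come from
    the subgraph. The wording is illustrative.
    """
    header = {
        "table": "Select the table that answers the question.",
        "measure": "Select a measure from the chosen table.",
        "dimension": "Select a value for the current dimension group.",
    }[phase]
    options = "\n".join(f"- {c}: {descriptions.get(c, '')}" for c in candidates)
    return f"Question: {question}\n{header}\nOptions:\n{options}\nAnswer:"
```

Regenerating the prompt per phase keeps it short: only the candidates that are admissible at the current step appear in the context window.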
    </sec>
    <sec id="sec-5">
      <title>4. Model training &amp; beta deployment</title>
      <p>For creating training data, we developed a method for manual data annotation. This method
involves annotators writing queries that can be answered by a specific table cell. Annotators
were instructed to write their questions both as full sentences and in a more casual style, aiming
to simulate the formulation of questions posed by users in a search engine. The data obtained
from this manual annotation process contains queries and their corresponding S-expression,
resulting in 2300 annotated pairs.</p>
      <p>The annotated queries were distributed over random tables from the CBS datapool, and
contained a strong class imbalance towards tables that were more easily annotated. This class
imbalance and random distribution motivate extending this study with data augmentation.
In this extension, annotated S-expressions and their associated queries are used to fine-tune
a GPT-3.5 model through the OpenAI fine-tuning services. The query-expression pairs were
transformed into prompts using the descriptions of the IDs for various measures, dimensions,
and table IDs. Training such a model reduces the need for additional manual annotation, while
also significantly increasing the amount of annotated data.</p>
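The transformation of query-expression pairs into fine-tuning prompts could look as follows. The chat-message record shape matches OpenAI's fine-tuning format, but the prompt wording, the `(VALUE …)` S-expression syntax, and the direction assumed here (S-expression in, question out, for augmentation) are illustrative assumptions:

```python
def to_finetune_example(query, s_expression, id_descriptions):
    """Turn one annotated (query, S-expression) pair into a chat-format
    fine-tuning record, inlining human-readable descriptions of the IDs
    so the model learns the ID-to-meaning mapping.

    The S-expression syntax and prompt wording are hypothetical.
    """
    glossary = "\n".join(f"{i}: {d}" for i, d in id_descriptions.items())
    return {
        "messages": [
            {"role": "system",
             "content": "Write a question answerable by the given "
                        f"S-expression.\nKnown IDs:\n{glossary}"},
            {"role": "user", "content": s_expression},
            {"role": "assistant", "content": query},
        ]
    }
```

One such record per annotated pair, serialized as JSON lines, is the shape the fine-tuning service expects; the trained model can then produce fresh questions for unannotated expressions.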
      <p>The initial GECKO model (model 1) and the model containing the improvements discussed here
(model 2) were evaluated using a selected sample of this dataset. This was done by evaluating exact
table matches of generated S-expressions (model 1: 0.35; model 2: 0.63), F1-scores for selected
dimensions in said S-expressions (model 1: 0.62; model 2: 0.71), and by manually annotating answer
relevancy, as an answer can be a non-exact match but still be relevant to the question
(model 1: 0.38; model 2: 0.71).</p>
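The dimension F1 mentioned above can be computed as a set-based F1 over the selections appearing in the predicted and gold S-expressions; this is a toy stand-in for the evaluation script, with the selection encoding assumed:

```python
def dimension_f1(predicted, gold):
    """Set-based F1 between the dimension selections of a predicted and
    a gold S-expression (each given as a list of selection strings)."""
    p, g = set(predicted), set(gold)
    if not p or not g:
        return 0.0
    tp = len(p & g)                 # selections present in both
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Exact table match is then simply equality of the generated and gold table IDs, while answer relevancy requires the manual annotation described above.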
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>The beta deployment of GECKO, a generation-based KGQA system for CBS data, marks a
significant milestone in improving user interaction with governmental statistics. This phase
includes mechanisms for feedback collection, which will play a crucial role in refining and
enhancing the system based on user input. The feedback gathered during the beta deployment
will help identify and address potential issues, ensuring the system’s robustness and reliability.
This process is essential for developing a reliable QA system capable of providing accurate and
relevant statistical observations in response to natural-language user questions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] OpenAI, <article-title>GPT-4 Technical Report</article-title>, <year>2023</year>. arXiv:2303.08774.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Press</surname></string-name>,
          <string-name><given-names>W.</given-names> <surname>Merrill</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>N. A.</given-names> <surname>Smith</surname></string-name>,
          <article-title>How Language Model Hallucinations Can Snowball</article-title>,
          <year>2023</year>. arXiv:2305.13534.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name><given-names>W.</given-names> <surname>Chen</surname></string-name>,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey on text-to-SQL parsing: Concepts, methods, and future directions</article-title>,
          <year>2022</year>. arXiv:2208.13629.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>Beyond I.I.D.: Three levels of generalization for question answering on knowledge bases</article-title>,
          <source>in: Proceedings of the Web Conference 2021</source>, ACM, <year>2021</year>. doi:10.1145/3442381.3449992.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Ng,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>DecAF: Joint decoding of answers and logical forms for question answering over knowledge bases</article-title>,
          <year>2023</year>. arXiv:2210.00063.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Klettner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jokinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>Evaluating large language models in semantic parsing for conversational question answering over knowledge graphs</article-title>
          ,
          <source>arXiv preprint arXiv:2401.01711</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Radev</surname>
          </string-name>
          ,
          <article-title>Enhancing text-to-SQL capabilities of large language models: A study on prompt design strategies</article-title>
          ,
          <source>in: The 2023 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bonifacio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abonizio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fadaee</surname>
          </string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Nogueira</surname></string-name>,
          <article-title>InPars: Unsupervised dataset generation for information retrieval</article-title>,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>, SIGIR '22, Association for Computing Machinery, New York, NY, USA, <year>2022</year>, pp. <fpage>2387</fpage>-<lpage>2392</lpage>. doi:10.1145/3477495.3531863
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Jeronymo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bonifacio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abonizio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fadaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lotufo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zavrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <article-title>InPars-v2: Large language models as efficient dataset generators for information retrieval</article-title>,
          <source>arXiv preprint arXiv:2301.01820</source> (<year>2023</year>).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Recent trends in deep learning based open-domain textual question answering systems</article-title>
          ,
          <source>IEEE Access 8</source>
          (
          <year>2020</year>
          )
          <fpage>94341</fpage>
          -
          <lpage>94356</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oguz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name><given-names>W.-t.</given-names> <surname>Yih</surname></string-name>,
          <article-title>Dense passage retrieval for open-domain question answering</article-title>,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>, Association for Computational Linguistics, Online, <year>2020</year>, pp. <fpage>6769</fpage>-<lpage>6781</lpage>. doi:10.18653/v1/2020.emnlp-main.550
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaharia</surname>
          </string-name>
          ,
          <article-title>ColBERT: Efficient and effective passage search via contextualized late interaction over BERT</article-title>
          ,
          <source>in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>Lower-bounding term frequency normalization</article-title>
          ,
          <source>in: Proceedings of the 20th ACM International Conference on Information and Knowledge Management</source>
          , CIKM '11, Association for Computing Machinery, New York, NY, USA, <year>2011</year>, pp. <fpage>7</fpage>-<lpage>16</lpage>. doi:10.1145/2063576.2063584
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>