<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>LINTEXT: A Visual Tool for Exploring and Modeling Knowledge in Text Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riley Capshaw</string-name>
          <email>riley.capshaw@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva Blomqvist</string-name>
          <email>eva.blomqvist@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Knowledge Graphs, Masked Language Models, Machine Reading, Entity Embedding, Document-level Relation</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Linköping University</institution>
          ,
          <addr-line>581 83 Linköping</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>A large part of knowledge is commonly encoded into text documents. While extracting this information into a Knowledge Graph (KG) is a common approach, it sufers from challenges when texts are added, removed, or changed, or when the schema of the intended KG changes. Instead we advocate an approach where text and models evolve together in an interactive manner. We present LINTEXT, a system accompanying a published method which allows users to jointly explore and model the information held within text documents. The modeling is accomplished by specifying fill-in-the-blank prompts along with some metadata which are then recorded as specifications for simple relations that can be used to generate an ontology. The exploration aspect is accomplished by having the system complete each prompt with entities identified from the text and presenting the completions as a ranked list to the user, allowing users to verify the quality of the extracted triples. By elevating the development of the ontology to a visual and interactive level, it has an immediate text connection and users can be more certain that the documents they wish to model contain the information they wish to extract or query. Additionally, our system is designed to support the development of relation extraction (RE) pipelines underlying the document analysis, with a particular focus on supporting methods for improving vector representations of the extracted entities. To this end, users can choose to analyze documents from pre-annotated RE data sets to understand how changes in diferent elements of the pipeline afect the results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Knowledge Graphs (KGs) and their related ontologies are commonly recognized as key components
of modern knowledge-based systems. However, a large body of knowledge still exists only in text
documents. Such knowledge can be extracted and encoded into a KG, but if the document corpus
is prone to changing, such extraction must be updated or possibly redone with every change. Even
worse, when the underlying schema of the KG (e.g., the ontology) changes, the whole process may have
to be repeated from scratch. Given such requirements on flexibility and evolution of the knowledge,
hand-crafting KGs is not scalable, particularly due to the rate at which knowledge changes in many
real-world applications. As such, partially automating the process of extracting and even modeling
knowledge has been a subject of research for many years. Nevertheless, accurate and reliable automatic
KG construction from natural language documents still remains a dificult task with many challenges,
even in light of the impressive recent advances in language modeling. Therefore we advocate a diferent
approach, where text and models (KG and ontology) can evolve together in an interactive manner.</p>
      <p>
        While automated ontology extraction from text has been investigated in the area of ontology learning
for several decades, with some systems accompanied by user interfaces, these have not focused
on the combination of text exploration, fact extraction, and ontology modeling that we present
here. The approaches closest to ours include end-user-focused ontology modeling systems such as the
enterprise wiki approach in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Protégé plugins like [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], although both pre-date the recent major
advances in language modeling. Also related are systems which have recently emerged for linking
entities between text and images, connected to a knowledge base, such as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, none of these
examples support the exploration and interactive evaluation of the effects of different modern extraction
techniques and components, together with schema (ontology) updates by the user.
      </p>
      <p>
        In this paper, we present Linköping University’s Text Exploration Tool (LINTEXT), a first
demonstration of the interactive part of such a vision. LINTEXT is designed with a prompt-first approach to
text exploration in mind. This means that the act of exploring documents becomes a modeling task
that interactively defines a schema, which itself may be the starting point of an ontology. Users have
the option of specifying more information, such as what types of entities are relevant or whether a
relation can be symmetric. LINTEXT then provides visual feedback in the form of a ranked list of
statements which could possibly be extracted from the document. This allows a user to understand
whether the document they are modeling contains instances of the relations they define, and how
well a particular relation extraction (RE) pipeline extracts correct triples. In addition to supporting the
exploration and modeling of documents, LINTEXT is built on top of our recent work on zero-shot approaches for RE with
masked language models (MLMs) [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], as well as our work to appear at EKAW 2024 on improving
vector representations for entities [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As such, it is also designed to help researchers explore the effects
of changes to components of RE pipelines, such as the named entity recognition software, the MLM,
the scoring metrics, or the approach for generating vector representations for the entities. The system
is made publicly available via our GitHub repository (https://github.com/LiUSemWeb/LINTEXT).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The LINTEXT System</title>
      <p>
        LINTEXT enables users to experiment with different configurations of prompt-based zero-shot relation
extraction pipelines for short documents (around 512 tokens after tokenization). It supports reading
documents from established RE data sets (e.g., DocRED [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and BioRED [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) as well as user-provided
documents. Users can additionally load relation definitions from these data sets or define new relations.
These definitions are then used to analyze the documents, allowing users to visually confirm whether
some information in the document can be modeled by the schema. The analysis is presented to the user
as a sorted list of statements that could be extracted from the document. This aspect is discussed in
more detail in Section 2.2. Users can adjust elements of the schema and re-analyze the document until
they are satisfied with the list’s ordering, then save the results to a file to be shared.
      </p>
      <sec id="sec-2-1">
        <title>2.1. User Interface</title>
        <p>
          LINTEXT’s user interface, seen in Figure 1, is broken into two parts. On the left is the document
view, which shows the text of the current document after preprocessing. Users can either select a
document from a data set or choose “custom” to enter their own text to process. Preprocessing may involve
tokenization specific to a particular language model (LM), such as the word-piece tokens particular to
BERT-like models [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Users can hover over each word or entity in the document view to see how it
was tokenized, or they can check the “Show tokens” box to display detailed tokenization information
for the entire document. This allows users to understand when tokenization might affect their results,
so that they may choose a different MLM with a more tailored tokenizer (such as BioBERT [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for
biomedical texts). Entity mentions are color-coded, with the outline color representing the entity’s type
and the fill color indicating when mentions refer to the same entity. For example, “cardiomyopathy,”
“myocardial dysfunction,” and “myocardial necrosis” are all mentions of the same entity, each with the
type DiseaseOrPhenotypicFeature. When loading documents from a data set, these annotations are
provided as the ground truth; alternatively, they could be provided by a pre-processing step.
        </p>
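        <p>To illustrate why tokenization matters, the following is a minimal sketch of the greedy longest-match word-piece splitting used by BERT-like tokenizers. The vocabulary here is a hypothetical toy; real models ship vocabularies of roughly 30,000 pieces via the Hugging Face transformers library.</p>

```python
# Minimal sketch of greedy longest-match WordPiece tokenization, the
# scheme used by BERT-like models. The vocabulary below is a hypothetical
# toy, chosen only to show how an out-of-vocabulary word gets split.
VOCAB = {"cardio", "##myo", "##pathy", "necrosis", "myocardial", "##path", "##y"}

def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no decomposition exists in this vocabulary
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("myocardial"))      # → ['myocardial']: a full-word entry stays whole
print(wordpiece("cardiomyopathy"))  # → ['cardio', '##myo', '##pathy']: OOV word is split
```

        <p>A domain-tailored tokenizer keeps more domain terms whole, which is exactly why switching to a model such as BioBERT can change the results for biomedical texts.</p>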
        <p>
          On the right is the schema view, which shows the current schema being explored or constructed. This
can be loaded from a file, loaded along with a document from a known data set, or created from
scratch. To add a relation to the schema, a user needs, at a minimum, to provide an identifier
and a fill-in-the-blanks prompt written in natural language (with variables). The relation name is meant
to be a human-readable name (i.e., a label), while the relation identifier could simply be a code, as occurs
in many ontologies and data sets. For example, DocRED assigns all relations WikiData [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] property
identifiers. This field can be used to clarify that the relation P569 is also known as date of birth.
The domain and range fields, when not empty, limit the statements extracted for a relation to those
between entities of the listed types, essentially applying closed-world semantics to these notions
by assuming that all entity types are disjoint<sup>2</sup>. In Figure 1, Comparison therefore only applies to and
from ChemicalEntity mentions. Users can also select whether certain relation properties apply, such as
(ir)reflexivity, (anti)symmetry, and transitivity. BioRED explicitly states that its relations should be
considered undirected, which implies that they are all symmetric. Additionally, it might be useful to
mark that a ChemicalEntity cannot be compared with itself, and thus that Comparison is irreflexive.
        </p>
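        <p>The domain, range, and property restrictions described above amount to a simple filter over candidate mention pairs. The following is a minimal sketch; the dict-based mention and relation structures (and the example entity names) are hypothetical simplifications, not the system’s actual data model.</p>

```python
# Sketch of filtering candidate mention pairs by a relation's domain,
# range, and irreflexivity. The dict-based representation is a
# hypothetical simplification of what a schema entry might hold.
def candidate_pairs(mentions, relation):
    """Yield (subject, object) mention pairs allowed by the relation spec."""
    for s in mentions:
        for o in mentions:
            if s is o:
                continue  # a mention never pairs with itself
            if relation.get("domain") and s["type"] != relation["domain"]:
                continue  # subject type must match the domain, if given
            if relation.get("range") and o["type"] != relation["range"]:
                continue  # object type must match the range, if given
            if relation.get("irreflexive") and s["entity"] == o["entity"]:
                continue  # two mentions of the same entity are excluded
            yield s, o

# Illustrative mentions (entity names invented for the example).
mentions = [
    {"text": "pilocarpine", "entity": "E1", "type": "ChemicalEntity"},
    {"text": "atropine", "entity": "E2", "type": "ChemicalEntity"},
    {"text": "cardiomyopathy", "entity": "E3", "type": "DiseaseOrPhenotypicFeature"},
]
comparison = {"domain": "ChemicalEntity", "range": "ChemicalEntity", "irreflexive": True}
pairs = list(candidate_pairs(mentions, comparison))
# Only the two chemical mentions pair with each other, in both directions.
```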
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Pipeline</title>
        <p>
          The pipelines we focus on primarily support our ongoing work on smaller
MLMs [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ]. Figure 2 shows an overview of the standard pipeline. The pipeline preprocesses an input
document by recognizing all entities, their mentions, and their types, then tokenizing the text. Given
a user-defined prompt as the query ( ?x is in ?y. in the above example), all unique pairs of entity
mentions are gathered. These pairs are then filtered by user-specified restrictions, such as the domain,
range, and relation properties seen in Figure 1 (right). For example, mention pairs which do not
satisfy the domain and range are discarded, and for antisymmetric relations, mention pairs corresponding
to the same entity are discarded. Each mention pair is then used to fill in the two blanks of the
prompt to generate a candidate statement, which is then ranked using an MLM according to some scoring
metric. The default metric is pseudo-log-likelihood (PLL) [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ] as used in our earlier work [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], but
this pipeline has also been used to support work involving PLL approximations such as cosine similarity [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
In support of our work accepted to EKAW 2024, this pipeline (and user interface) additionally allows
users to experiment with advanced contextualization techniques specific to MLMs.
        </p>
        <p>
          <sup>2</sup>While this is not the semantics that would be applied in a resulting OWL ontology modeled through the interface, it has been
shown in previous work [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ] to be a useful way of improving relation extraction results, and is also somewhat more user-friendly
for users who are not as familiar with the open-world assumption.
        </p>
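        <p>The PLL metric masks each token of a candidate statement in turn and sums the model’s log-probability of the true token at that position. A minimal sketch of the metric follows, with a pluggable scoring function; the toy unigram table stands in for a real MLM, which in practice would be a Hugging Face model.</p>

```python
import math

# Sketch of pseudo-log-likelihood (PLL) scoring for a candidate statement.
# `log_prob(tokens, i)` must return the model's log-probability of tokens[i]
# given the sequence with position i masked; a toy unigram table stands in
# for a real MLM here so the sketch is self-contained.
def pll(tokens, log_prob):
    """Sum of log-probabilities of each token, masked one at a time."""
    return sum(log_prob(tokens, i) for i in range(len(tokens)))

# Hypothetical stand-in "model": a fixed unigram distribution.
TOY_PROBS = {"pilocarpine": 0.2, "is": 0.4, "in": 0.3, "heart": 0.1}

def toy_log_prob(tokens, i):
    return math.log(TOY_PROBS.get(tokens[i], 1e-6))

statement = ["pilocarpine", "is", "in", "heart"]
score = pll(statement, toy_log_prob)
# Candidate statements are ranked by descending PLL score.
```

        <p>Swapping the scoring function is what allows the pipeline to support PLL approximations such as cosine similarity without changing the surrounding machinery.</p>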
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Implementation</title>
        <p>LINTEXT is composed of two separate processes. The first is the visual front end, developed using the
cross-platform user-interface SDK Flutter (https://flutter.dev/), allowing it to run on most major operating systems and web
browsers with minimal effort. The second process is the web server back end, developed in Python and
based heavily on the tools provided by the Hugging Face transformers library (https://huggingface.co/) for easy swapping of
LMs and tokenizers. While some form of hardware acceleration is suggested (e.g., CUDA support), most
models can be run on CPU-only hardware, though with a noticeable increase in response times. The
two components communicate using a simple JSON API specification, ensuring their independence and
allowing for greater flexibility, which in turn better enables the swapping of RE pipeline components.</p>
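        <p>For illustration, an exchange between the two processes might look as follows. The field names and values are hypothetical, invented for this sketch, and are not LINTEXT’s actual API specification.</p>

```python
import json

# Hypothetical sketch of the JSON messages exchanged between the Flutter
# front end and the Python back end. All field names and the example score
# are illustrative only, not LINTEXT's actual API.
request = {
    "action": "analyze",
    "document": {"dataset": "BioRED", "split": "train", "number": 12},
    "relations": [
        {"id": "Positive_Correlation",
         "prompt": "?x is positively correlated with ?y.",
         "symmetric": True, "irreflexive": True}
    ],
    "model": "bert-base-uncased",
    "metric": "pll",
}
response = {
    "statements": [
        {"subject": "pilocarpine", "relation": "Positive_Correlation",
         "object": "cardiomyopathy", "score": -42.7}
    ]
}

# Keeping the protocol to plain JSON is what lets either side be swapped out.
wire = json.dumps(request)
assert json.loads(wire) == request  # messages survive a round-trip unchanged
```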
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Use Case: Extending the BioRED Schema</title>
      <p>Consider as a use case the example from Figure 1. To load this screen, first select ‘BioRED’ from the
‘Choose data set’ drop-down, select the ‘train’ subset in the next drop-down menu, and enter 12 as
the document number. Ensure that the ‘Fetch Schema’ checkbox is checked before clicking the ‘Fetch’
button so that the system also loads the BioRED schema and displays it in the Schema View (right).</p>
      <p>BioRED itself does not provide a file with a schema definition. We constructed one manually, based on
both the data and the annotation guidelines, and associated it with the data set in a format that the
system can handle. We provide a copy of this schema, including simple prompts for each relation, in
the GitHub repository.</p>
      <p>The BioRED schema includes eight relations, most of which encode multiple distinct sub-relations.
For example, according to the annotation guidelines, Positive Correlation between two entities can
represent Upregulation, Exhibition, Response, Sensitivity, Induce, or Increase relations, each
with a different domain and range. At the start, a user might enter a single prompt for this relation into
the ‘Prompt (Subject first)’ field, such as ?x is positively correlated with ?y. Since BioRED
explains that all relations are undirected, the user checks the checkbox for ‘symmetric’ and leaves the other
boxes unchecked. The user can additionally specify that an entity should not be considered positively
correlated with itself by checking the box for ‘irreflexive’. Next, the user checks the box beside Positive
Correlation and leaves all others unchecked to inform the system that it is the only relation to be
considered. To start analyzing documents, the user switches to the ‘Analyze’ tab (Figure 3) and, without
changing any other options, clicks the ‘Analyze’ button to run the default RE pipeline. The interface
now shows a list of candidate statements that could be extracted from the current document.</p>
      <p>The user might find that the results are not particularly good, since for many documents the
highest-ranked candidates are not supported by the text. This could be because most documents discuss the
sub-relations mentioned above rather than explicitly stating that two entities are positively correlated,
making it difficult to relate the prompt to the text. Alternatively, the user might realize that they are only
interested in finding documents which say that some ChemicalEntity (drug) will Induce a particular
DiseaseOrPhenotypicFeature (disease). In both cases, the user can improve their results by adding
a new relation to the schema. To do so, the user returns to the Schema View, presses the blue
plus button in the bottom right, and fills in the fields to define the Induce relation. The user then
has the option of either using this new relation on its own or jointly analyzing the Induce
and Positive Correlation relations. The former narrows the results to just this
sub-relation, while the latter broadens the search, using more specific evidence of the Induce relation
as evidence that a document contains instances of Positive Correlation. Finally, this
process can be repeated for all other sub-relations, resulting in a much more fine-grained schema which
better models the actual content of the documents.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>
        This demonstration will showcase the possibilities of interactive knowledge extraction from text, coupled
with ontology modeling. LINTEXT is a first attempt to develop a tool for interactive knowledge modeling
and evolution based on text documents. It relies on an approach presented in our EKAW 2024 paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and on previous work that does not require fine-tuning LMs or the use of expensive large generative LMs,
while staying modular enough to support the exploration of those as alternative components in the
pipeline. LINTEXT thus both enables end-users to model and explore texts and supports
evaluating and improving current pipelines. Future work includes conducting user studies on the
usability of the proposed system interface, as well as using it to evaluate our current pipeline with
end-users (in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] it was only evaluated on existing standard RE data sets).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kump</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lindstaedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mahbub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Pammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rospocher</surname>
          </string-name>
          , L. Serafini,
          <article-title>Moki: The enterprise modelling wiki</article-title>
          , in: L.
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Traverso</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Heath</surname>
            , E. Hyvönen,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Mizoguchi</surname>
            , E. Oren,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Sabou</surname>
          </string-name>
          , E. Simperl (Eds.),
          <source>The Semantic Web: Research and Applications</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>831</fpage>
          -
          <lpage>835</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scheir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lindstaedt</surname>
          </string-name>
          ,
          <article-title>Two protégé plug-ins for supporting document-based ontology engineering and ontological annotation at document level</article-title>
          ,
          <source>in: 10th International Protégé Conference</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rospocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ballan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sperduti</surname>
          </string-name>
          ,
          <article-title>Aligning and linking entity mentions in image, text, and knowledge base</article-title>
          ,
          <source>Data &amp; Knowledge Engineering</source>
          <volume>138</volume>
          (
          <year>2022</year>
          )
          101975. URL: https://www.sciencedirect.com/science/article/pii/S0169023X2100094X. doi:10.1016/j.datak.2021.101975.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Capshaw</surname>
          </string-name>
          , E. Blomqvist,
          <article-title>Towards tailored knowledge base modeling using masked language models</article-title>
          ,
          <source>in: Proceedings of TEXT2KG, Co-located with ESWC 2023</source>
          , CEUR-WS,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Capshaw</surname>
          </string-name>
          , E. Blomqvist,
          <article-title>Understanding and estimating pseudo-log-likelihood for zero-shot fact extraction with masked language models</article-title>
          ,
          <source>in: Proceedings of the 12th International Joint Conference on Knowledge Graphs</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Capshaw</surname>
          </string-name>
          , E. Blomqvist,
          <article-title>Contextualizing entity representations for zero-shot relation extraction with masked language models</article-title>
          , in:
          <source>International Conference on Knowledge Engineering and Knowledge Management (EKAW'24)</source>
          , to appear
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , M. Sun,
          <article-title>DocRED: A large-scale document-level relation extraction dataset</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>764</fpage>
          -
          <lpage>777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Luo</surname>
          </string-name>
          , P.-T. Lai,
          <string-name>
            <surname>C.-H. Wei</surname>
            ,
            <given-names>C. N.</given-names>
          </string-name>
          <string-name>
            <surname>Arighi</surname>
            ,
            <given-names>Z. Lu,</given-names>
          </string-name>
          <article-title>BioRED: a rich biomedical relation extraction dataset</article-title>
          ,
          <source>Briefings in Bioinformatics</source>
          <volume>23</volume>
          (
          <year>2022</year>
          )
          bbac282. URL: https://doi.org/10.1093/bib/bbac282. doi:10.1093/bib/bbac282.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Kang,</surname>
          </string-name>
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <article-title>Wikidata: A new platform for collaborative data collection</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on World Wide Web, WWW '12 Companion</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2012</year>
          , p.
          <fpage>1063</fpage>
          -
          <lpage>1064</lpage>
          . URL: https://doi.org/10.1145/2187980.2188242. doi:10.1145/2187980.2188242.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <article-title>Effective sentence scoring method using BERT for speech recognition</article-title>
          , in: W. S. Lee,
          <string-name>
            <surname>T.</surname>
          </string-name>
          Suzuki (Eds.),
          <source>Proceedings of The Eleventh Asian Conference on Machine Learning</source>
          , volume
          <volume>101</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1081</fpage>
          -
          <lpage>1093</lpage>
          . URL: https://proceedings.mlr.press/v101/shin19a.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <article-title>BERT has a mouth, and it must speak: BERT as a Markov Random Field language model</article-title>
          ,
          <source>in: Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>