<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Ontology-Driven Knowledge Extraction Tool for Pathology Record Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Laura Menotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmaria Silvello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <addr-line>Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>31</volume>
      <fpage>02</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>The information in pathology diagnostic reports is often encoded in natural language. Extracting such knowledge can be instrumental in developing clinical decision support systems. However, the digital pathology domain lacks knowledge extraction systems suited to the task. One of the few examples is the Semantic Knowledge Extractor Tool (SKET), a hybrid knowledge extraction system combining a rule-based expert system with pre-trained ML models. SKET has been designed to extract knowledge from colon, cervix, and lung cancer diagnostic reports. To do so, the system employs an ontology-driven approach, where the extracted entities are linked with concepts modeled through a reference ontology, namely, the ExaMode ontology. In this work, we adapt SKET to a newer version of the ExaMode ontology and extend the method to account for an additional use case: Celiac disease. Our experimental results show that: 1) the new version of SKET outperforms the previous one on colon, cervix, and lung cancer use cases; and 2) SKET is effective on Celiac disease, confirming the ability of the system architecture to adapt to new, unseen scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Pathology</kwd>
        <kwd>Knowledge Extraction</kwd>
        <kwd>Expert Systems</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Pathology revolves around studying the causes and effects of disease through the microscopic
examination of tissue and human cell samples placed onto glass slides [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In recent years, the
use of Whole Slide Images (WSIs) – obtained from the digital scanning of standard glass slides –
to speed up the diagnostic process on cancer and other diseases has grown significantly [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Nevertheless, analyzing slides remains a time-consuming task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Because of this, the use of
Deep Learning (DL) models to automatically classify WSIs has risen in popularity. However,
DL methods are data-hungry and require large-scale annotated datasets to be effective; such datasets
are scarce and expensive in the pathology domain. To overcome this limitation, the
information contained within diagnostic reports can be used as weak labels to train predictive
algorithms for WSI classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Beyond image classification, the extraction of structured
knowledge from diagnostic reports can support several downstream tasks, such as
clinical decision support [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], keyword search [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], visual analytics [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and automated annotation
support [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        The rapidly growing volume of clinical data makes Information Extraction
(IE) techniques valuable for reducing the burden of manual data curation [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. In this regard,
clinical IE has been applied to a variety of clinical text formats, such as radiology reports [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
discharge summaries [12], and pathology diagnoses [13, 14]. Rules remain common in clinical IE
systems, where they complement Machine Learning (ML) models, resulting in hybrid systems.
These methods leverage the strengths of both ML and rule-based architectures to achieve high
performance [14]. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the Semantic Knowledge Extractor Tool (SKET) was presented.
SKET is an unsupervised hybrid knowledge extraction system that extracts entities from text
and links them to concepts in a reference ontology, namely the ExaMode ontology [15]. SKET
has been designed to extract knowledge from diagnostic reports for three types of cancer: colon
carcinoma, uterine cervix cancer, and lung cancer.
      </p>
      <p>In this work, we adapt SKET to a newer, improved version of the ExaMode ontology and
extend the approach to an additional use case: Celiac disease. The new version of SKET is
publicly available at https://github.com/ExaNLP/sket. Due to the nature of Celiac disease, we
broaden the scope of SKET so that it not only identifies the presence of specific concepts but also extracts
additional information related to them. Furthermore, we introduce consistency checks
that ensure the identified concepts are compliant with what occurs in practice. We perform an
experimental evaluation to assess the effectiveness of the new version of SKET. On the cancer use cases, the
new version obtains an average accuracy gain of 8.17% over the old one. On
Celiac disease, it reaches an accuracy of 0.9484.</p>
      <p>The rest of the paper is organized as follows. Section 2 outlines the source data, while Section 3
describes the ExaMode ontology. Section 4 summarizes the system architecture and presents
the new version of SKET. Section 5 provides the evaluation setup and reports the effects of the
changes on the previous use cases, together with results on the Celiac use case. Finally, Section
6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Source Data</title>
      <p>
        The development and evaluation of SKET involved diagnostic reports from two European
medical centers, namely the Azienda Ospedaliera per l’Emergenza Cannizzaro (AOEC) located
in Catania, Italy, and the Radboud University Medical Center (RUMC) situated in Nijmegen,
The Netherlands. Diagnostic reports incorporate the findings of pathology tests and follow the
College of American Pathologists (CAP) international guidelines<sup>1</sup> for pathology reports [16, 17].
AOEC provided pathology reports written in Italian for all four use cases. RUMC reports, in contrast,
are written in Dutch and cover colon cancer, cervix cancer, and Celiac disease.
RUMC reports were produced using speech-to-text tools, which makes
them more verbose than AOEC reports, which are collected in the clinical workflow.
Compared to the source data used in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the two medical centers provided additional reports
for the three cancer use cases and supplied 2,576 reports about Celiac disease. In particular, the
two medical centers provided 4,016 additional colon reports, 7,017 new reports about uterine
cervix cancer, and 235 additional reports concerning lung cancer.
      </p>
      <p><sup>1</sup> https://www.cap.org/protocols-and-guidelines</p>
      <p>
        Most state-of-the-art Named Entity Recognition (NER) and Entity Linking (EL) methods
for unstructured data target English text. Thus, following [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we translate Italian and Dutch
reports into English using the open-source, pre-trained Marian Neural Machine Translation (NMT)
models [18] and clean common errors from the translated text using handcrafted
rules.
      </p>
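      <p>The rule-based clean-up of translated text can be pictured as an ordered list of regular-expression substitutions. The following is a minimal sketch under stated assumptions: the three rules and the function name sanitize are hypothetical illustrations, not the rules shipped with SKET, and the translation step itself (Marian NMT) is not shown.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Hypothetical clean-up rules (pattern, replacement); not SKET's actual rules.
RULES = [
    # Collapse runs of whitespace into a single space.
    (re.compile(r"\s+"), " "),
    # Drop immediate repetitions of the same word ("adenoma adenoma").
    (re.compile(r"\b(\w+)( \1\b)+", re.IGNORECASE), r"\1"),
    # Remove the space that translation may leave before punctuation.
    (re.compile(r"\s+([.,;:])"), r"\1"),
]


def sanitize(text: str) -> str:
    """Apply each rule in order, then strip leading/trailing spaces."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    print(sanitize("Tubular  adenoma adenoma with mild dysplasia ."))
```
]]></preformat>
      <p>Applying the rules in a fixed order keeps the behavior predictable: whitespace is normalized first, so the later patterns can assume single spaces.</p>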
    </sec>
    <sec id="sec-3">
      <title>3. The ExaMode Ontology</title>
      <p>The ExaMode ontology is a multi-lingual resource conceived to encode digital histopathology
diagnostic reports associated with WSIs. It was designed by analyzing textual records and
following an iterative process with continuous feedback and validation from pathologists and
clinicians. Specifically, the ExaMode ontology defines the relevant concepts and properties
organized into five semantic areas concerning clinical case reports (i.e., general aspects),
diagnosis results, performed tests, interventions employed to retrieve the specimen, and the
anatomical location of the findings. The new version of the ExaMode ontology<sup>2</sup> preserves the
original structure, but we heavily revised the Celiac disease part and focused on providing a
consistent ontology, where all elements comply with a set of design standards. To this end, we
established principles for referencing external taxonomies, so as to keep the creation of new classes
to a minimum and reuse existing ontologies as much as possible. In addition, we defined
a design pattern based on the Simple Knowledge Organization System (SKOS) data model, to be
applied to classes whose individuals are general concepts. Overall, we revised the previous use
cases to ensure a more comprehensive representation of diagnostic reports in histopathology.</p>
      <p>For the cancer-related use cases, the total number of elements is almost identical
between the two versions. Nevertheless, we removed some elements used in the previous version
of the ontology and included some new ones. For instance, for colon cancer, we introduced
6 new elements and removed 4 old ones. In particular, we removed some broader concepts in
favor of more specific ones, e.g., different degrees of dysplasia. This results in a wider spectrum of
concepts to be matched in the EL component of SKET. We applied the same methodology
to the cervix and lung use cases: we added elements about koilocytes and Human
Papilloma Virus (HPV) for cervix cancer, and one class to identify lung carcinoma findings for
the lung use case.</p>
      <p>Most of the changes to the ExaMode ontology involve the Celiac disease use case, which
has undergone a complete redesign to more accurately reflect the domain of interest. Due
to the availability of diagnostic reports about Celiac disease, we were able to conduct an
in-depth analysis of the domain and validate the ExaMode ontology. As a result, we integrated
37 new elements to account for the heterogeneity of symptoms and findings associated with
Celiac disease. In particular, we added several intestinal abnormalities and findings such as
malabsorption, gastric metaplasia, and erosion, together with information about villi that can be
useful when diagnosing Celiac disease. For instance, we added elements about the villi length,
their absence, their degree of atrophy, and the presence of flattened ones. We also added two
data properties to specify the severity of the duodenitis and the stage of Celiac disease based on
the classification system proposed by Marsh-Oberhuber [19, 20].</p>
      <p><sup>2</sup> The new version of the ExaMode ontology is available at http://examode.dei.unipd.it/ontology/.</p>
    </sec>
    <sec id="sec-sket2">
      <title>4. SKET 2.0</title>
      <p>
        In this work, we adapt SKET to the updated ExaMode ontology by aligning some aspects of
the NER and EL modules and by tailoring the Graph Creation module to the new data schema.
We recall that in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] only three use cases were considered; namely, colon carcinoma, uterine
cervix cancer, and lung cancer. In this work, we extend SKET to Celiac disease. In this regard,
we expand each module to account for the new use case. In particular, we add Celiac-related
rules to the NER and EL modules, and we also introduce a mapping between Celiac concepts
and annotation classes.
      </p>
      <p>[Figure: SKET architecture. Recovered labels: Clinical Case Reports; (A) Named Entity Recognition; Entity Mentions; (B) Entity Linking; ExaMode Ontology; Linked Concepts; Annotation Classes; Data Labelling; Weak Annotations; Graph Creation; Report-Level Knowledge Graphs.]</p>
      <sec id="sec-3-1">
        <title>4.1. System Architecture</title>
        <p>
          SKET adapts pre-trained NER models and employs unsupervised EL methods to extract relevant
concepts from diagnostic reports and link them to the ExaMode ontology. Extracted information
can serve as weak labels to train predictive models for image classification tasks [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] or as
nodes to build knowledge graphs based on the ontology data schema. We report the system’s
architecture in Figure 4.1. We preserve the same architecture of [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] but we adapt each module
implementation to the new version of the ontology and to a novel use case concerning Celiac
disease. SKET consists of four modules: (A) Named Entity Recognition, (B) Entity Linking, (C)
Data Labelling, and (D) Graph Creation. Note that components (A) and (B) are sequential, while
(C) and (D) can run in parallel.
        </p>
        <p>The Named Entity Recognition module identifies entities within the clinical case report
text. SKET employs a hybrid NER system, where ScispaCy models [21] and Neural Language
Models [22, 23] are combined with hand-crafted rules that refine their outputs. The rules were
developed by analyzing a set of diagnostic reports and identifying common patterns among them,
and are available in the SKET GitHub repository<sup>3</sup>.</p>
        <p>In the Entity Linking module, extracted entities are linked to the ExaMode ontology
components. SKET solves the EL task with a two-stage model, where similarity matching is
employed when rule-based, ad hoc matching fails. SKET also includes a post-processing step
that removes mentions commonly linked to unrelated ontology concepts.</p>
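        <p>The two-stage linking strategy can be sketched as follows. This is an illustration under stated assumptions, not SKET's implementation: the concept labels, the ad hoc rule, and the ignored-mention list are hypothetical, and the standard-library difflib ratio stands in for whatever similarity measure the system actually uses.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional

# Hypothetical ontology labels and rules; illustrative only.
CONCEPT_LABELS = ["villi atrophy", "crypt hyperplasia", "gastric metaplasia"]
AD_HOC_RULES = {"villous atrophy": "villi atrophy"}
IGNORED_MENTIONS = {"biopsy"}  # assumed to link to unrelated concepts


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for any similarity measure."""
    return SequenceMatcher(None, a, b).ratio()


def link(mention: str, threshold: float = 0.8) -> Optional[str]:
    """Return the linked concept label, or None if no link is made."""
    mention = mention.lower().strip()
    if mention in AD_HOC_RULES:  # stage 1: rule-based, ad hoc matching
        return AD_HOC_RULES[mention]
    # Stage 2: similarity matching against every concept label.
    best = max(CONCEPT_LABELS, key=lambda label: similarity(mention, label))
    return best if similarity(mention, best) >= threshold else None


def link_all(mentions: Iterable[str]) -> Dict[str, str]:
    """Link every mention, then drop unlinked and ignored mentions."""
    linked = {m: link(m) for m in mentions}
    return {m: c for m, c in linked.items()
            if c is not None and m.lower() not in IGNORED_MENTIONS}
```
]]></preformat>
        <p>Running the rules first means that known, high-precision mappings are never overridden by an approximate match; the threshold value is arbitrary and would need tuning on real reports.</p>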
        <p>
          The Data Labelling component produces annotations by mapping extracted concepts to a list
of annotation classes. Through this component, SKET outputs weak labels that can be used to
perform weakly supervised classification tasks [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For each use case, pathologists have been
consulted to define the most clinically relevant set of annotation classes. The Celiac disease
annotation classes are: (1) Celiac disease; (2) Non-specific duodenitis; (3) Normal.
        </p>
        <p>In the Graph Creation component, extracted concepts are used as nodes to build report-level
knowledge graphs in Resource Description Framework (RDF) format. In particular, RDF graphs
are created by following the data schema provided by the ExaMode ontology. In this way, SKET
can enhance the semantic understanding of the diagnostic reports [24].</p>
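        <p>As a rough illustration of report-level graph creation, the sketch below serializes linked concepts as N-Triples lines using only the standard library. It rests on assumptions: the base IRI is taken from the ontology URL given in this paper, while the property names (hasOutcome, hasSeverity), the report and node IRIs, and the class names are hypothetical placeholders rather than terms of the ExaMode ontology.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Optional

# Assumed namespace: the paper gives this URL as the ontology location.
BASE = "http://examode.dei.unipd.it/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical property names, chosen for illustration only.
HAS_OUTCOME = BASE + "hasOutcome"
HAS_SEVERITY = BASE + "hasSeverity"


def report_to_ntriples(report_id: str,
                       concepts: List[Dict[str, Optional[str]]]) -> List[str]:
    """Serialize one report and its linked concepts as N-Triples lines."""
    report = f"<{BASE}report/{report_id}>"
    lines = []
    for index, concept in enumerate(concepts):
        node = f"<{BASE}report/{report_id}/outcome/{index}>"
        # Connect the report to the concept node and type the node.
        lines.append(f"{report} <{HAS_OUTCOME}> {node} .")
        lines.append(f"{node} <{RDF_TYPE}> <{BASE}{concept['class']}> .")
        severity = concept.get("severity")
        if severity:  # instantiate the data property only when a value exists
            lines.append(f'{node} <{HAS_SEVERITY}> "{severity}" .')
    return lines
```
]]></preformat>
        <p>In practice an RDF library would handle escaping and serialization; plain strings are used here only to keep the sketch dependency-free.</p>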
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Celiac disease use case</title>
        <p>As opposed to the other use cases, Celiac disease is a non-cancerous disease and can manifest
itself in a large variety of symptoms [25]. The diagnosis of Celiac disease is based on expert
pathologists' description of small intestine alterations, usually detected with a duodenal
biopsy [26]. Microscopic analysis of duodenal samples for Celiac disease provides
information about villi, enterocytes, intra-epithelial lymphocytic infiltrate, and glandular crypts.
The absence or alteration of these structures is crucial for the diagnosis. Furthermore, biopsies
include characteristics that have to be well described, with particular attention to increased
intraepithelial T lymphocytes, decreased enterocyte height, crypt hyperplasia, and villous
atrophy. As a result, we must include some data properties to encompass key aspects related to
intestinal mucosa alterations.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], SKET extracts mentions from the report text and links them to relevant concepts in
the ExaMode ontology. However, this approach only indicates the presence
or absence of features valuable for the diagnosis. Thus, we broaden the scope of SKET by
not only identifying the presence of specific concepts but also extracting additional information
related to them. For instance, if a report includes “moderate villi atrophy”, the new
approach identifies the presence of the concept “villi atrophy” together with its
severity, i.e., “moderate”. To achieve this, we analyzed a restricted set of Celiac reports
to identify common patterns in phrases referring to intestinal abnormalities and diagnoses.
In particular, when we identify mentions related to concepts modeled as data properties in
the ontology, we employ rule-based techniques to identify the data property values and
append this information to the linked concepts. These additional facts are then exploited in the
Graph Creation component to instantiate the corresponding data properties.
        </p>
        <p><sup>3</sup> https://github.com/ExaNLP/sket/tree/main/sket/nerd/rules/
        </p>
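        <p>The extraction of a data property value next to a linked mention can be sketched with a small rule. This is a hedged illustration: the severity vocabulary, the three-word look-back window, and the function names are assumptions made for the example and are not taken from the SKET rule set.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
from typing import Dict, Optional

# Hypothetical severity vocabulary; the values used by SKET may differ.
SEVERITY = ("mild", "moderate", "severe")
SEVERITY_RE = re.compile(r"\b(" + "|".join(SEVERITY) + r")\b", re.IGNORECASE)


def extract_severity(text: str, mention: str, window: int = 3) -> Optional[str]:
    """Look for a severity term within `window` words before `mention`."""
    start = text.lower().find(mention.lower())
    if start == -1:
        return None
    preceding = " ".join(text[:start].split()[-window:])
    matches = SEVERITY_RE.findall(preceding)
    # When several terms precede the mention, keep the closest one.
    return matches[-1].lower() if matches else None


def annotate(text: str, mention: str, concept: str) -> Dict[str, Optional[str]]:
    """Attach the extracted value to the linked concept."""
    return {"concept": concept, "severity": extract_severity(text, mention)}
```
]]></preformat>
        <p>For the example sentence used above, annotate("Duodenal mucosa with moderate villi atrophy.", "villi atrophy", "villi atrophy") yields the concept together with the severity value "moderate".</p>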
        <p>
          In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the data labeling module performs a multi-label task, allowing for multiple annotations
on a single report. For example, in the cervix cancer use case, one report can
be assigned both the “Presence of HPV infection” and “Cancer - adenocarcinoma in situ” annotation
classes. Conversely, in the Celiac disease use case, we assume there can only be one correct
label for each report. We recall that the Celiac disease use case comprises three labels: “Celiac
disease”, “Non-specific duodenitis”, and “Normal”. Hence, the nature of the labels better suits a
multi-class classification scenario. To adapt SKET to this scenario, we include consistency checks
in the data labeling component that assess the annotation quality of Celiac reports. For cancer
use cases, if the EL component does not extract any concept from the input text, SKET labels
the report as “No Cancer” or “Non-informative”. For Celiac disease, instead, SKET
checks whether multiple annotation classes have been identified for the same report. In that case,
the output violates the multi-class assumption, so SKET removes all annotations and labels the
report as “Inconclusive”. Note that the label “Inconclusive” reflects those cases where it
is not possible to match one of the labels defined by pathologists.
        </p>
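        <p>The consistency check for Celiac reports amounts to a few lines of logic, sketched below. The three class names come from the text above; the function name is hypothetical, and labeling a report with no matched class as “Inconclusive” is an assumption of this sketch, since the text only describes the multiple-class case explicitly.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Set

# Annotation classes for the Celiac use case, as listed in the paper.
CELIAC_CLASSES = {"Celiac disease", "Non-specific duodenitis", "Normal"}
INCONCLUSIVE = "Inconclusive"


def label_celiac_report(predicted: Set[str]) -> str:
    """Return the single predicted class, or "Inconclusive" otherwise."""
    valid = predicted.intersection(CELIAC_CLASSES)
    if len(valid) == 1:
        return next(iter(valid))
    # Several classes violate the multi-class assumption; treating the
    # empty case the same way is an assumption of this sketch.
    return INCONCLUSIVE
```
]]></preformat>
        <p>For instance, a report for which both “Celiac disease” and “Normal” were predicted is labeled “Inconclusive”, whereas a report with only “Normal” keeps that label.</p>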
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Evaluation</title>
      <sec id="sec-4-1">
        <title>5.1. Experimental Setup</title>
        <p>
          We perform two evaluations: (1) we compare the two versions of SKET on cancer-related use
cases using the setup defined in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]; and (2) we evaluate the new version of SKET on the newly
introduced Celiac disease use case. Table 1 reports the number of annotated reports for each
use case and medical center. For Celiac disease, the test dataset comprises 456 reports labeled
“Celiac disease”, 102 reports labeled “Non-specific duodenitis”, and 2,018 reports labeled “Normal”.
Note that the label distribution is imbalanced because the data come
from a real-world clinical setting, where certain conditions occur more
often than others.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Experimental Results</title>
        <sec id="sec-4-2-1">
          <title>5.2.1. Cancer-related use cases</title>
          <p>
            To perform the first evaluation, we employ the same set of manually labeled reports as in [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]
and compare results with those reported in [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] for the original use cases. Table 2 reports the
results obtained by the new version of SKET for the three cancer use cases.
The new version of SKET achieves better results on all use cases and for all measures,
with an average accuracy gain of 8.17%. The performance boost for each use case reflects the
changes to the SKET architecture as well as the updates to the ExaMode ontology, which are
described in Sections 3 and 4. Indeed, the colon cancer use case, which received the largest number
of ontology updates, exhibits the peak accuracy gain of 14.87%. These results
show that SKET's performance is affected by the reference ontology used in the EL module.
Specifically, the ExaMode ontology defines the set of concepts to be matched in the report
text. This, in turn, affects the Data Labeling module, which relies on the presence or absence
of specific concepts to generate labels. For this reason, a representative ontology is
crucial to ensure good performance.
          </p>
        </sec>
        <sec id="sec-4-2-2">
          <title>5.2.2. Celiac disease use case</title>
          <p>To assess the quality of the labels extracted by SKET for the newly added use case,
the diagnostic reports listed in Table 1 have been manually labeled by experts.</p>
          <p>Table 3 reports the results on data labeling for the Celiac disease use case. We
report the performance of two runs, “Base” and “Full”, to assess the effect of the consistency checks and
post-processing step. “Base” refers to the standard configuration of SKET, without
any additional steps. “Full”, in contrast, comprises both the post-processing step and the
consistency checks. Results confirm the advantage of adding these two steps:
eliminating inaccurately matched concepts and labels improves performance.
In the “Full” run, SKET achieves the highest performance scores among all use cases,
with an accuracy of almost 0.95. This can be attributed to the smaller number of labels (i.e.,
three) and to the multi-class classification assumption. Moreover, Celiac reports usually follow a
more structured format, which limits the language ambiguities (typical of free text) that hinder IE
applications. Results demonstrate that SKET is effective on Celiac disease, confirming
that the system architecture can adapt to unseen scenarios.</p>
          <p>Table 4 reports the performance distribution across the different annotation classes for
Celiac disease. For each label, we evaluate precision, recall, and F1 score. Results
demonstrate that the new version of SKET predicts all labels with high
performance. In particular, precision is above 0.9 for all classes, while recall and F1 score are
between 0.82 and 0.97. These oscillations can be attributed to the lower number of annotated</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>
        In this work, we adapt SKET to the new version of the ExaMode ontology and we extend it
to a novel use case concerning Celiac disease. We preserve the same architecture presented
in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but we revise the NER and EL modules and tailor the Graph Creation module to
build report-level knowledge graphs based on the updated data schema. For Celiac
disease reports, we extend each module to address the novel use case. Due to the nature of
diagnostic reports about Celiac disease, we broaden the scope of SKET by not only identifying
the presence of specific concepts but also extracting additional information related to them
to populate relevant data properties. Our results outperform [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in all original use cases and on
all three performance measures, with an average accuracy gain of 8.17%. The performance boost
for each use case reflects the changes made to SKET as well as the updates to the ExaMode
ontology. On the Celiac disease use case, the system achieves the highest performance
scores among all use cases, with an accuracy of 0.9484 and a weighted F1 score of 0.9531. The
performance on the novel use case demonstrates that SKET is effective on Celiac disease and
confirms its ability to adapt to new, unseen scenarios.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the ExaMode Project, as a part of the European Union Horizon
2020 Program under grant 825292.</p>
      <ref-list>
        <ref><mixed-citation>[...] www.sciencedirect.com/science/article/pii/S0933365715001244. doi:10.1016/j.artmed.2015.09.007.</mixed-citation></ref>
        <ref><mixed-citation>[12] H. Yang, I. Spasic, J. A. Keane, G. Nenadic, A text mining approach to the prediction of disease status from clinical discharge summaries, Journal of the American Medical Informatics Association 16 (2009) 596–600. URL: https://www.sciencedirect.com/science/article/pii/S1067502709000929. doi:10.1197/jamia.M3096.</mixed-citation></ref>
        <ref><mixed-citation>[13] N. Ashish, L. Dahm, C. Boicey, University of California, Irvine–Pathology Extraction Pipeline: The pathology extraction pipeline for information extraction from pathology reports, Health Informatics Journal 20 (2014) 288–305.</mixed-citation></ref>
        <ref><mixed-citation>[14] E. Santus, T. Schuster, A. M. Tahmasebi, C. Li, A. Yala, C. R. Lanahan, P. Prinsen, S. F. Thompson, S. Coons, L. Mynderse, R. Barzilay, K. Hughes, Exploiting rules to enhance machine learning in extracting information from multi-institutional prostate pathology reports, JCO Clinical Cancer Informatics (2020) 865–874. URL: https://doi.org/10.1200/CCI.20.00028. doi:10.1200/CCI.20.00028. PMID: 33006906.</mixed-citation></ref>
        <ref><mixed-citation>[15] S. Marchesin, L. Menotti, G. Silvello, The ExaMode ontology (full version, v.2) (2.0) [data set], 2023. doi:10.5281/zenodo.7669237.</mixed-citation></ref>
        <ref><mixed-citation>[16] J. R. Srigley, T. McGowan, A. MacLean, M. Raby, J. Ross, S. Kramer, C. Sawka, Standardized synoptic cancer pathology reporting: A population-based approach, Journal of Surgical Oncology 99 (2009) 517–524.</mixed-citation></ref>
        <ref><mixed-citation>[17] D. Ellis, J. Srigley, Does standardised structured reporting contribute to quality in diagnostic pathology? The importance of evidence-based datasets, Virchows Archiv 468 (2016) 51–59.</mixed-citation></ref>
        <ref><mixed-citation>[18] M. Junczys-Dowmunt, R. Grundkiewicz, T. Dwojak, H. Hoang, K. Heafield, T. Neckermann, F. Seide, U. Germann, A. F. Aji, N. Bogoychev, A. F. T. Martins, A. Birch, Marian: Fast neural machine translation in C++, in: Proceedings of ACL 2018, System Demonstrations, Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 116–121. URL: https://aclanthology.org/P18-4020. doi:10.18653/v1/P18-4020.</mixed-citation></ref>
        <ref><mixed-citation>[19] M. N. Marsh, Gluten, major histocompatibility complex, and the small intestine: a molecular and immunobiologic approach to the spectrum of gluten sensitivity (‘celiac sprue’), Gastroenterology 102 (1992) 330–354.</mixed-citation></ref>
        <ref><mixed-citation>[20] G. Oberhuber, G. Granditsch, H. Vogelsang, The histopathology of coeliac disease: time for a standardized report scheme for pathologists, European Journal of Gastroenterology &amp; Hepatology 11 (1999) 1185–1194.</mixed-citation></ref>
        <ref><mixed-citation>[21] M. Neumann, D. King, I. Beltagy, W. Ammar, ScispaCy: Fast and robust models for biomedical natural language processing, in: Proceedings of the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics, Florence, Italy, 2019, pp. 319–327. URL: https://aclanthology.org/W19-5034. doi:10.18653/v1/W19-5034.</mixed-citation></ref>
        <ref><mixed-citation>[22] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems 26 (2013).</mixed-citation></ref>
        <ref><mixed-citation>[23] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.</mixed-citation></ref>
        <ref><mixed-citation>[24] M. Agosti, S. Marchesin, G. Silvello, Learning unsupervised knowledge-enhanced representations to reduce the semantic gap in information retrieval, ACM Trans. Inf. Syst. 38 (2020). URL: https://doi.org/10.1145/3417996. doi:10.1145/3417996.</mixed-citation></ref>
        <ref><mixed-citation>[25] V. Villanacci, A. Vanoli, G. Leoncini, G. Arpa, T. Salviato, L. R. Bonetti, C. Baronchelli, L. Saragoni, P. Parente, Celiac disease: histology-differential diagnosis-complications. A practical approach, Pathologica 112 (2020) 186.</mixed-citation></ref>
        <ref><mixed-citation>[26] V. Villanacci, P. Ceppa, E. Tavani, C. Vindigni, U. Volta, On behalf of the “Gruppo Italiano Patologi Apparato Digerente (GIPAD)” and of the “Società Italiana di Anatomia Patologica e Citopatologia Diagnostica”/International Academy of Pathology, Italian division (SIAPEC/IAP), Coeliac disease: the histology report, Digestive and Liver Disease 43 (2011) S385–S395. doi:10.1016/S1590-8658(11)60594-X.</mixed-citation></ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Cross</surname>
          </string-name>
          ,
          <article-title>Underwood's Pathology: A Clinical Approach</article-title>
          , Elsevier Health Sciences,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Hanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Sirintrapun</surname>
          </string-name>
          ,
          <article-title>Whole slide imaging: Technology and applications</article-title>
          ,
          <source>Advances in Anatomic Pathology</source>
          <volume>27</volume>
          (
          <year>2020</year>
          ). URL: https://journals.lww.com/anatomicpathology/Fulltext/2020/07000/Whole_Slide_Imaging__Technology_and_Applications.5.aspx.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Krupinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Weinstein</surname>
          </string-name>
          ,
          <article-title>Characterizing the development of visual search expertise in pathology residents viewing whole slide images</article-title>
          ,
          <source>Human Pathology</source>
          <volume>44</volume>
          (
          <year>2013</year>
          )
          <fpage>357</fpage>
          -
          <lpage>364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Marini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Otálora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wodzinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caputo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Van Rijthoven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Aswolinskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Bokhorst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Podareanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Petters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boytcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Buttafuoco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vatrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fraggetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Van der Laak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciompi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <article-title>Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations</article-title>
          ,
          <source>npj Digital Medicine</source>
          (
          <year>2022</year>
          )
          <fpage>102</fpage>
          . URL: https://doi.org/10.1038/s41746-022-00635-4. doi:https://doi.org/10.1038/s41746-022-00635-4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Glaser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Silberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Meeks</surname>
          </string-name>
          ,
          <article-title>Automated extraction of grade, stage, and quality information from transurethral resection of bladder tumor pathology reports using natural language processing</article-title>
          ,
          <source>JCO Clinical Cancer Informatics</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . URL: https://doi.org/10.1200/CCI.17.00128. doi:10.1200/CCI.17.00128. PMID: 30652586
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Benvegnù</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Biasetton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bonato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brighente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cenzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ceron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cogato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Minetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pellegrina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Purpura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Simionato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Soleti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tessarotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tonon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vendramin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>Towards Open-Source Shared Implementations of Keyword-Based Access Systems to Relational Data</article-title>
          , in:
          <source>Proc. of the Workshops of the EDBT/ICDT 2017 Joint Conference (EDBT/ICDT 2017), Venice, Italy, March 21-24, 2017</source>
          , volume
          <volume>1810</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2017</year>
          . URL: https://ceur-ws.org/Vol-1810/KARS_paper_01.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giachelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Marini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boytcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Buttafuoco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciompi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fraggetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Irrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Primov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vatrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <article-title>Empowering digital pathology applications through explainable knowledge extraction tools</article-title>
          ,
          <source>Journal of Pathology Informatics</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>100139</fpage>
          . doi:https://doi.org/10.1016/j.jpi.2022.100139.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Giachelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Irrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <article-title>MedTag: a portable and customizable annotation tool for biomedical documents</article-title>
          ,
          <source>BMC Medical Informatics Decis. Mak.</source>
          <volume>21</volume>
          (
          <year>2021</year>
          )
          <fpage>352</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kreimeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Arya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Halford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Forshee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Walderhaug</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Botsis</surname>
          </string-name>
          ,
          <article-title>Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>73</volume>
          (
          <year>2017</year>
          )
          <fpage>14</fpage>
          -
          <lpage>29</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1532046417301685. doi:https://doi.org/10.1016/j.jbi.2017.07.012.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <article-title>TBGA: a large-scale gene-disease association dataset for biomedical relation extraction</article-title>
          ,
          <source>BMC Bioinform.</source>
          <volume>23</volume>
          (
          <year>2022</year>
          )
          <fpage>111</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassanpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Langlotz</surname>
          </string-name>
          ,
          <article-title>Information extraction from multi-institutional radiology reports</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>66</volume>
          (
          <year>2016</year>
          )
          <fpage>29</fpage>
          -
          <lpage>39</lpage>
          . URL: https://
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>