<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SKET: an Unsupervised Knowledge Extraction Tool to Empower Digital Pathology Applications</article-title>
        <subtitle>Extended Abstract</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Giachelle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ornella Irrera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmaria Silvello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large volumes of medical data have been produced for decades. These data include diagnoses, which are often reported as free text, thus encoding medical knowledge that is still largely unexploited. To decode the medical knowledge present within reports, we propose the Semantic Knowledge Extractor Tool (SKET), an unsupervised knowledge extraction system combining a rule-based expert system with pretrained Machine Learning (ML) models. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques to extract critical information from cancer reports, opening opportunities such as data mining for knowledge extraction purposes, precision medicine applications, structured report creation, and multimodal learning.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Extraction</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Expert Systems</kwd>
        <kwd>Digital Pathology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Hundreds of thousands of medical reports have been used to communicate diagnoses, encoding a
vast amount of medical knowledge. In this context, free-text reporting is the de facto standard to
communicate diagnoses, guide patients’ treatment, and conduct therapies. Processing high
volumes of free-text reports to extract the crucial knowledge is usually performed manually.
However, since reports vary widely between institutions, contain noise, and lack a standard
structure, this becomes an extremely time-consuming process. To overcome this limitation,
Natural Language Processing (NLP) methods become essential [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">2, 3, 4, 5, 6, 7, 8, 9</xref>
        ] as they
enable the efficient automatic processing of thousands of reports and the extraction of
relevant information for several (downstream) tasks, such as clinical note mining [10, 11] and
structuring [12], risk prediction [13], clinical decision support [14], and precision medicine
retrieval [15].
      </p>
      <p>
        In the context of digital pathology – a field that involves the analysis of histopathology images
known as Whole Slide Images (WSIs) – this work aims to prove the viability of unsupervised
NLP techniques to automatically extract critical information from pathology reports and use it
for different applications, such as automatic report annotation and visualization [16], as well as
WSI classification [17]. To this end, we present the Semantic Knowledge Extractor Tool (SKET),
an unsupervised hybrid knowledge extraction system that combines rule-based techniques
with pre-trained Machine Learning (ML) models to extract knowledge from pathology reports.
In recent years, NLP has shifted from using rules to ML approaches [
        <xref ref-type="bibr" rid="ref9">18, 9</xref>
        ], which have the
advantage of learning regularities from data and of generalizing to previously unseen patterns.
Moreover, the advent of efficient Neural Language Models (NLMs) [19, 20, 21, 22] paved the way
for the pre-training era, where large NLMs trained in a self-supervised fashion on huge datasets
are used to develop NLP models for a number of downstream tasks. Nevertheless, similarly
to [10], we argue that rule-based techniques capture critical information that should be used
together with – and not substituted by – ML to improve performance.
      </p>
      <p>We evaluate the effectiveness of SKET on entity linking and text classification, considering three
use-cases: Colon, Cervix, and Lung cancer. We rely on diagnostic reports coming from two
medical centers based in Italy and The Netherlands. Then, we compare SKET with unsupervised
ML approaches to understand the impact that combining rule-based techniques and pre-trained
ML models has on the extraction of knowledge from diagnostic reports. The results highlight
the effectiveness of ML methods for information extraction in the pathology domain but, at
the same time, they also stress the role of expert knowledge in reaching the high levels of
accuracy required to semi-automate clinical practice. As further proof, SKET has
already been used as the core system in automatic report annotation and visualization [16], as well
as for weak supervision in WSI classification [17]. The SKET source code is publicly available at
https://github.com/ExaNLP/sket.</p>
      <p>The rest of this paper is organized as follows: Section 2 presents SKET. Section 3 describes
the experimental evaluation. Finally, Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Semantic Knowledge Extractor Tool</title>
      <p>SKET combines pre-trained Named Entity Recognition (NER) models with unsupervised Entity
Linking (EL) methods to extract relevant entities from diagnostic reports and link them to
concepts stored in a reference ontology. By relying on pre-trained NER models and unsupervised
EL methods, SKET can serve as an automated annotator in weak supervision tasks. For instance,
the concepts extracted by SKET can be used as weak labels when training ML models for image
classification [23, 24] and relation extraction [25], or as nodes to build knowledge graphs that
can be used for retrieval tasks [26].</p>
      <p>SKET consists of four main components: (1) Named Entity Recognition, (2) Entity Linking,
(3) Data Labeling, and (4) Graph Creation. Components (1) and (2) are sequential, whereas (3)
and (4) can be applied in parallel. We briefly describe each component below.</p>
      <sec id="sec-2-1">
        <title>2.1. Named Entity Recognition</title>
        <p>NER can be defined as the task of identifying and categorizing relevant information within
text. A named entity can be any word or phrase – i.e., a mention – that consistently refers
to a concept or object of the world. Once identified, mentions are classified into predefined
categories, such as disease, gene/protein, symptom, etc.</p>
        <p>To perform NER, SKET combines pre-trained neural models with rule-based techniques.
As the neural component, SKET exploits ScispaCy models [27], which provide full NER pipelines
for biomedical data, consisting of large medical vocabularies, as well as Word2Vec [19] word
vectors trained on the PubMed Central Open Access Subset [28]. Regarding the integration of
expert rules, SKET extends the ScispaCy pipeline with two more components: Entity Fusion
and Negation Detection. For Entity Fusion, SKET exploits expert rules to identify and merge
specific mentions that would otherwise be regarded as separate by ScispaCy. For example,
“high-grade” and “dysplasia” are considered as separate mentions, whereas we are interested in
“high-grade dysplasia” as a single mention. Hence, we developed regular expressions capable
of identifying trigger terms that are indicative of a set of mentions that should potentially be
combined into one. These expert rules have been developed on a holdout dataset, which is
available in the SKET GitHub repository (https://github.com/ExaNLP/sket/tree/main/sket/nerd/rules/).
The dataset consists of 50 diagnostic reports for each
use-case and medical center, for a total of 250 diagnostic reports. For Negation Detection,
SKET relies on NegEx [29], a negation detection algorithm that evaluates whether extracted
entities are negated within text. NegEx uses regular expressions to identify the scope of trigger
terms that are indicative of negation. Then, the entities extracted within the scope of a trigger
term are marked as negated and removed.</p>
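        <p>The two rule-based extensions can be illustrated with a minimal Python sketch. It is not the SKET implementation: the regular expressions, the function names, and the five-token negation window are assumptions made for illustration, and the negation step is a heavily simplified stand-in for NegEx. Mentions are represented as character spans, as a NER model might return them.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical trigger patterns; the actual SKET rules live in its repository.
FUSION_PATTERN = re.compile(r"\b(high|low)[- ]grade\s+dysplasia\b", re.IGNORECASE)
NEGATION_TRIGGERS = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def span(text, phrase):
    """Return the (start, end) character span of the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase))


def fuse_mentions(text, mentions):
    """Merge mentions covered by a fusion pattern into a single mention."""
    fused = list(mentions)
    for match in FUSION_PATTERN.finditer(text):
        # Drop every mention lying inside the matched span, then add the span itself.
        fused = [m for m in fused
                 if not (match.start() <= m[0] and m[1] <= match.end())]
        fused.append((match.start(), match.end()))
    return sorted(fused)


def drop_negated(text, mentions, window=5):
    """Remove mentions starting within `window` tokens after a negation trigger.

    The scope of a trigger is cut at the next full stop (simplified NegEx-like rule).
    """
    kept = []
    for start, end in mentions:
        preceding = text[:start]
        negated = False
        for trigger in NEGATION_TRIGGERS.finditer(preceding):
            between = preceding[trigger.end():]
            if "." not in between and len(between.split()) < window:
                negated = True
        if not negated:
            kept.append((start, end))
    return kept


text = "Tubular adenoma with high-grade dysplasia. No carcinoma."
mentions = [span(text, p) for p in ("adenoma", "high-grade", "dysplasia", "carcinoma")]
fused = fuse_mentions(text, mentions)
kept = drop_negated(text, fused)
print([text[s:e] for s, e in kept])
```
]]></preformat>
        <p>In this toy report, “high-grade” and “dysplasia” are fused into one mention, while “carcinoma” is discarded because it falls within the scope of the trigger “No”.</p>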
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Entity Linking</title>
        <p>EL can be defined as the task of assigning unique meanings to entities mentioned within text.
In a nutshell, EL aims to determine whether a target named entity refers to a specific concept
or object stored within a reference ontology.</p>
        <p>To perform EL, SKET adopts ad-hoc and similarity-based matching. Given an extracted entity,
SKET performs a two-stage matching approach. First, the system tries to link the entity using
ad-hoc matching. Then, if ad-hoc matching fails, it employs similarity-based matching.
For Ad-Hoc Matching, SKET employs regular expressions to find trigger terms indicative
of a specific concept in the ontology. Once a trigger is found, the system matches the entity
containing the trigger term with the closest ontology concept. For example, if an extracted entity
contains the (trigger) term “carcinoma”, then SKET links the entity to the “colon adenocarcinoma”
concept. Ad-hoc matching rules have also been developed on the holdout dataset and are
available on GitHub (https://github.com/ExaNLP/sket/tree/main/sket/nerd/rules/).
Regarding Similarity Matching, SKET combines string and semantic
matching techniques. For string matching, SKET adopts the Gestalt Pattern Matching (GPM)
algorithm [30]. For semantic matching, SKET exploits the word vectors provided by ScispaCy
models [27]. Specifically, it computes the cosine distance between the vector representations of
extracted entities and ontology concepts.</p>
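        <p>The two-stage approach can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SKET code: the rule table, the toy two-dimensional vectors, the 0.8 threshold, and the order in which string and semantic matching are tried are all hypothetical. Python's <monospace>difflib.SequenceMatcher</monospace> is used for string matching because it is based on the Ratcliff/Obershelp gestalt approach; the semantic stage uses cosine similarity, i.e., one minus the cosine distance.</p>
        <preformat><![CDATA[
```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical ad-hoc rules: trigger regex -> ontology concept label.
AD_HOC_RULES = [(re.compile(r"carcinoma", re.IGNORECASE), "colon adenocarcinoma")]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def string_similarity(a, b):
    """Gestalt-style similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entity(entity, concepts, vectors, threshold=0.8):
    """Return (concept, stage), or (None, None) if no stage clears the threshold."""
    # Stage 1: ad-hoc matching on trigger terms.
    for pattern, concept in AD_HOC_RULES:
        if pattern.search(entity):
            return concept, "ad-hoc"
    # Stage 2a: string matching against every ontology concept.
    best = max(concepts, key=lambda c: string_similarity(entity, c))
    if string_similarity(entity, best) >= threshold:
        return best, "string"
    # Stage 2b: semantic matching on vector representations.
    if entity in vectors:
        scored = [(cosine_similarity(vectors[entity], vectors[c]), c)
                  for c in concepts if c in vectors]
        if scored:
            score, concept = max(scored)
            if score >= threshold:
                return concept, "semantic"
    return None, None


concepts = ["colon adenocarcinoma", "hyperplastic polyp", "tubular adenoma"]
vectors = {  # toy vectors standing in for pre-trained word vectors
    "serrated lesion": [0.9, 0.1],
    "hyperplastic polyp": [1.0, 0.0],
    "tubular adenoma": [0.0, 1.0],
    "colon adenocarcinoma": [0.5, 0.5],
}
for entity in ("invasive carcinoma", "hyperplastic polyps", "serrated lesion"):
    print(entity, "->", link_entity(entity, concepts, vectors))
```
]]></preformat>
        <p>With these toy inputs, the first entity is resolved by the ad-hoc rule, the second by string similarity, and the third only by the vector-based stage, which mirrors the fallback order described above.</p>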
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data Labeling</title>
        <p>Given the set of concepts extracted from each diagnostic report, SKET maps a clinically relevant
subset of such concepts to a set of annotation classes defined by pathologists.</p>
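        <p>As a minimal illustration of this mapping step, the sketch below assumes a hand-written lookup table from ontology concepts to annotation classes. The table entries and the function name are hypothetical and do not reproduce the classes defined by the pathologists.</p>
        <preformat><![CDATA[
```python
# Hypothetical mapping from ontology concepts to annotation classes.
CONCEPT_TO_CLASS = {
    "colon adenocarcinoma": "cancer",
    "high-grade dysplasia": "high-grade dysplasia",
    "hyperplastic polyp": "hyperplastic polyp",
}


def label_report(concepts):
    """Return the sorted annotation classes triggered by a report's concepts.

    Concepts without an entry in the table (e.g., anatomical locations)
    are treated as not clinically relevant for labeling and are ignored.
    """
    return sorted({CONCEPT_TO_CLASS[c] for c in concepts if c in CONCEPT_TO_CLASS})


print(label_report(["colon", "colon adenocarcinoma", "biopsy"]))
```
]]></preformat>
        <p>Because a report can trigger several classes at once, the output is a set of labels, which is why the downstream task is treated as multi-label classification.</p>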
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Graph Creation</title>
        <p>SKET builds report-level knowledge graphs using the extracted concepts as nodes and the
semantic relations of the reference ontology as edges. The use of ontology concepts and
relations to describe diagnostic reports increases the semantic understanding of the underlying
data [31]. Once created, report-level knowledge graphs are encoded in a machine-readable
format through RDF.</p>
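        <p>A report-level graph of this kind could be serialized, for instance, as N-Triples. The sketch below is only an assumption-laden illustration: the <monospace>http://example.org/</monospace> IRIs, the <monospace>hasConcept</monospace> predicate, and the function name are hypothetical, and the RDF vocabulary actually used by SKET is not shown here.</p>
        <preformat><![CDATA[
```python
def report_to_ntriples(report_id, concepts, ontology_edges, base="http://example.org/"):
    """Serialize a report-level graph as a list of N-Triples lines.

    `concepts` is a collection of concept IRIs extracted from the report;
    `ontology_edges` is a list of (subject IRI, predicate IRI, object IRI)
    relations taken from the reference ontology.
    """
    report_iri = f"{base}report/{report_id}"
    lines = []
    for iri in sorted(concepts):
        # Attach each extracted concept to the report node.
        lines.append(f"<{report_iri}> <{base}hasConcept> <{iri}> .")
    for s, p, o in ontology_edges:
        # Keep only ontology relations whose endpoints were both extracted.
        if s in concepts and o in concepts:
            lines.append(f"<{s}> <{p}> <{o}> .")
    return lines


concepts = {"http://example.org/onto/Polyp", "http://example.org/onto/Colon"}
edges = [
    ("http://example.org/onto/Polyp", "http://example.org/onto/hasLocation",
     "http://example.org/onto/Colon"),
    ("http://example.org/onto/Polyp", "http://example.org/onto/hasDysplasia",
     "http://example.org/onto/Severe"),
]
lines = report_to_ntriples("r1", concepts, edges)
print("\n".join(lines))
```
]]></preformat>
        <p>The second ontology relation is dropped because its object was not extracted from the report, so the graph only contains edges between concepts that actually occur in it.</p>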
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Evaluation</title>
      <sec>
        <title>3.1. Setup</title>
        <p>Tasks: We evaluate SKET on Entity Linking (Task 1) and Text Classification (Task 2). Both
tasks are addressed as multi-label classification problems. Note that the number of possible
labels for entity linking is much higher than for text classification, making the task an extreme
multi-label classification problem [32, 33].</p>
        <p>
          Datasets: For Task 1, we use 1,250 annotated reports coming from both medical centers and
covering all three use-cases. For Task 2, we rely on 9,798 annotated reports, divided
among medical centers and use-cases. We refer the reader to the original publication [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for a
comprehensive description of the available data.
        </p>
        <p>Baselines: For both tasks, we compare SKET with two unsupervised approaches based on Bio
FastText [20, 34] and BioClinical BERT [22, 35]. For a fair comparison, both approaches adopt
the same ScispaCy NER pipeline used by SKET, but without the extensions introduced with it.
Then, they perform EL by computing the cosine distance between the vector representations of
the extracted entities and the ontology concepts. Both baselines are straightforward approaches
to perform entity linking and text classification without annotated data.</p>
      </sec>
      <sec id="sec-3-1">
        <title>3.2. Results</title>
        <p>Table 1 reports the results obtained by SKET and the considered baselines on Entity Linking
(left) and Text Classification (right).</p>
        <p>For entity linking (Task 1), we observe that SKET achieves high performance for both
micro- and weighted-average F1 in each considered use-case. Regarding accuracy, its performance
varies depending on the use-case – with the lowest score obtained in colon cancer with a value of
0.6280. As for the comparison of SKET with the considered baselines, we see that it outperforms
them in each use-case for all measures. This result shows the effectiveness of combining ad-hoc,
expert rules with ML models – making SKET both precise and sensitive. Specifically, ad-hoc
matching makes SKET precise, while semantic matching makes it sensitive. To support this
intuition, we observe that unsupervised baselines – which only rely on ML models and semantic
matching – have low accuracy values. Since we tackle the entity linking task as a multi-label
classification problem, we rely on subset accuracy, where the set of concepts predicted for a
report must exactly match the corresponding set of ground-truth concepts. Therefore, accuracy
values tend to decrease rapidly, and less precise models are naturally affected by this.</p>
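        <p>The difference between subset accuracy and micro-averaged F1 can be made concrete with a small sketch. The functions and the toy labels below are illustrative assumptions and are unrelated to the evaluation data; they only show why a single missed label zeroes the subset-accuracy contribution of a report while barely affecting micro-F1.</p>
        <preformat><![CDATA[
```python
def subset_accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label set exactly matches the truth."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return exact / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from label-level counts pooled over all reports."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        t, p = set(t), set(p)
        tp += len(t & p)   # labels correctly predicted
        fp += len(p - t)   # labels predicted but not in the ground truth
        fn += len(t - p)   # ground-truth labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: the first report misses one of its three labels.
y_true = [{"a", "b", "c"}, {"a"}]
y_pred = [{"a", "b"}, {"a"}]
print(subset_accuracy(y_true, y_pred))  # 0.5
print(round(micro_f1(y_true, y_pred), 3))  # 0.857
```
]]></preformat>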
        <p>For text classification (Task 2), we see that SKET performs well on colon and lung cancer
use-cases, whereas it shows lower accuracy values on cervix cancer. This result suggests that
the cervix use-case is harder than the others, as subset accuracy drops fast when a model fails
to predict all labels correctly. The higher values for micro- and weighted-average F1 – which
do not require an exact match between predicted and ground-truth labels – further support this
intuition. Compared to baselines, SKET outperforms them in colon and cervix use-cases. On
the other hand, the BERT-based approach proves more effective in lung cancer. Despite this,
the robustness of SKET across different use-cases makes it a viable solution in real scenarios,
where annotated data are hard and expensive to get.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, we presented SKET, an unsupervised hybrid knowledge extraction system that
combines rule-based techniques with pre-trained ML models to extract relevant concepts from
diagnostic reports. The experimental evaluation demonstrated the effectiveness of SKET,
making it a viable solution to reduce pathologists’ workload. Besides, the experimental results
highlighted the importance of expert knowledge in developing unsupervised systems for
specialized medicine. As a result, the extracted concepts can serve different digital pathology
applications, such as automatic report annotation, visualization, and retrieval, as well as image
classification.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work was supported by the ExaMode project, as part of the EU H2020 program under Grant
Agreement no. 825292.</p>
      <p>S. Mehrabi, S. Sohn, H. Liu, Clinical Information Extraction Applications: A Literature Review, J. Biomed. Informatics 77 (2018) 34–49. URL: https://doi.org/10.1016/j.jbi.2017.11.011. doi:10.1016/j.jbi.2017.11.011.</p>
      <p>[10] E. Santus, T. Schuster, A. M. Tahmasebi, C. Li, A. Yala, C. R. Lanahan, P. Prinsen, S. F. Thompson, S. Coons, L. Mynderse, R. Barzilay, K. Hughes, Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports, JCO Clinical Cancer Informatics (2020) 865–874. URL: https://doi.org/10.1200/CCI.20.00028. doi:10.1200/CCI.20.00028.</p>
      <p>[11] Y. Kim, J. H. Lee, S. Choi, J. M. Lee, J. H. Kim, J. Seok, H. J. Joo, Validation of Deep Learning Natural Language Processing Algorithm for Keyword Extraction from Pathology Reports in Electronic Health Records, Sci Rep 1 (2020) 1–9. URL: https://doi.org/10.1038/s41598-020-77258-w. doi:10.1038/s41598-020-77258-w.</p>
      <p>[12] P. Giannaris, Z. Al-Taie, M. Kovalenko, N. Thanintorn, O. Kholod, Y. Innokenteva, E. Coberly, S. Frazier, K. Laziuk, M. Popescu, C. R. Shyu, D. Xu, R. Hammer, D. Shin, Artificial Intelligence-Driven Structurization of Diagnostic Information in Free-Text Pathology Reports, Journal of Pathology Informatics 11 (2020) 10. URL: https://doi.org/10.4103/jpi.jpi_30_19. doi:10.4103/jpi.jpi_30_19.</p>
      <p>[13] J. R. Gregg, M. Lang, L. L. Wang, M. J. Resnick, S. K. Jain, J. L. Warner, D. A. Barocas, Automating the Determination of Prostate Cancer Risk Strata From Electronic Medical Records, JCO Clinical Cancer Informatics 1 (2017) 1–8. URL: https://doi.org/10.1200/CCI.16.00045. doi:10.1200/CCI.16.00045.</p>
      <p>[14] A. P. Glaser, B. J. Jordan, J. Cohen, A. Desai, P. Silberman, J. J. Meeks, Automated Extraction of Grade, Stage, and Quality Information From Transurethral Resection of Bladder Tumor Pathology Reports Using Natural Language Processing, JCO Clinical Cancer Informatics 1 (2018) 1–8. URL: https://doi.org/10.1200/CCI.17.00128. doi:10.1200/CCI.17.00128.</p>
      <p>[15] K. Roberts, D. Demner-Fushman, E. M. Voorhees, W. R. Hersh, S. Bedrick, A. J. Lazar, S. Pant, Benchmarking Information Retrieval for Precision Oncology: the TREC Precision Medicine Track, in: AMIA 2018, American Medical Informatics Association Annual Symposium, San Francisco, CA, November 3-7, 2018, AMIA, 2018. URL: https://knowledge.amia.org/67852-amia-1.4259402/t006-1.4263223/t006-1.4263224/2976780-1.4263306/2970178-1.4263303.</p>
      <p>[16] F. Giachelle, O. Irrera, G. Silvello, MedTAG: a portable and customizable annotation tool for biomedical documents, BMC Medical Informatics Decis. Mak. 21 (2021) 352. URL: https://doi.org/10.1186/s12911-021-01706-4. doi:10.1186/s12911-021-01706-4.</p>
      <p>[17] N. Marini, S. Marchesin, S. Otálora, M. Wodzinski, A. Caputo, M. van Rijthoven, W. Aswolinskiy, J. M. Bokhorst, D. Podareanu, E. Petters, S. Boytcheva, G. Buttafuoco, S. Vatrano, F. Fraggetta, J. der Laak, M. Agosti, F. Ciompi, G. Silvello, H. Muller, M. Atzori, Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations, npj Digital Medicine 5 (2022). URL: http://dx.doi.org/10.1038/s41746-022-00635-4. doi:10.1038/s41746-022-00635-4.</p>
      <p>[18] L. Chiticariu, Y. Li, F. R. Reiss, Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!, in: Proc. of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, ACL, 2013, pp. 827–832. URL: https://aclanthology.org/D13-1079/.</p>
      <p>[19] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed Representations of Words and Phrases and their Compositionality, in: Proc. of the 27th Annual Conference on Neural Information Processing Systems 2013, NIPS, Lake Tahoe, Nevada, United States, December 5-8, 2013, 2013, pp. 3111–3119. URL: https://proceedings.neurips.cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html.</p>
      <p>[20] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information, Trans. Assoc. Comput. Linguistics 5 (2017) 135–146. URL: https://doi.org/10.1162/tacl_a_00051. doi:10.1162/tacl_a_00051.</p>
      <p>[21] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep Contextualized Word Representations, in: Proc. of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, ACL, 2018, pp. 2227–2237. URL: https://doi.org/10.18653/v1/n18-1202. doi:10.18653/v1/n18-1202.</p>
      <p>[22] J. Devlin, M. W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805.</p>
      <p>[23] G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. W. K. Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, T. J. Fuchs, Clinical-Grade Computational Pathology using Weakly Supervised Deep Learning on Whole Slide Images, Nat Med 25 (2019) 1301–1309. URL: https://doi.org/10.1038/s41591-019-0508-1. doi:10.1038/s41591-019-0508-1.</p>
      <p>[24] M. A. Carbonneau, V. Cheplygina, E. Granger, G. Gagnon, Multiple Instance Learning: A Survey of Problem Characteristics and Applications, Pattern Recognit. 77 (2018) 329–353. URL: https://doi.org/10.1016/j.patcog.2017.10.009. doi:10.1016/j.patcog.2017.10.009.</p>
      <p>[25] S. Marchesin, G. Silvello, TBGA: a large-scale gene-disease association dataset for biomedical relation extraction, BMC Bioinform. 23 (2022) 111. URL: https://doi.org/10.1186/s12859-022-04646-6. doi:10.1186/s12859-022-04646-6.</p>
      <p>[26] S. Marchesin, Case-Based Retrieval Using Document-Level Semantic Networks, in: Proc. of the 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, ACM, 2018, p. 1451. URL: https://doi.org/10.1145/3209978.3210221. doi:10.1145/3209978.3210221.</p>
      <p>[27] M. Neumann, D. King, I. Beltagy, W. Ammar, ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing, in: Proc. of the 18th BioNLP Workshop and Shared Task, BioNLP@ACL 2019, Florence, Italy, August 1, 2019, ACL, 2019, pp. 319–327. URL: https://doi.org/10.18653/v1/w19-5034. doi:10.18653/v1/w19-5034.</p>
      <p>[28] S. Pyysalo, F. Ginter, H. Moen, T. Salakoski, S. Ananiadou, Distributional Semantics Resources for Biomedical Text Processing, Proc. of LBM (2013) 39–44. URL: https://bio.nlplab.org/pdf/pyysalo13literature.pdf.</p>
      <p>[29] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, B. G. Buchanan, A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries, J. Biomed. Informatics 34 (2001) 301–310. URL: https://doi.org/10.1006/jbin.2001.1029. doi:10.1006/jbin.2001.1029.</p>
      <p>[30] J. W. Ratcliff, D. E. Metzener, Pattern Matching: the Gestalt Approach, Dr Dobbs Journal 13 (1988) 46. URL: https://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970.</p>
      <p>[31] M. Agosti, S. Marchesin, G. Silvello, Learning Unsupervised Knowledge-Enhanced Representations to Reduce the Semantic Gap in Information Retrieval, ACM Trans. Inf. Syst. 38 (2020) 38:1–38:48. URL: https://doi.org/10.1145/3417996. doi:10.1145/3417996.</p>
      <p>[32] W. C. Chang, H. F. Yu, K. Zhong, Y. Yang, I. S. Dhillon, Taming pretrained transformers for extreme multi-label text classification, in: KDD ’20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, ACM, 2020, pp. 3163–3171. URL: https://doi.org/10.1145/3394486.3403368. doi:10.1145/3394486.3403368.</p>
      <p>[33] P. Ruas, V. D. T. Andrade, F. M. Couto, Lasige-biotm at MESINESP2: entity linking with semantic similarity and extreme multi-label classification on spanish biomedical documents, in: Proc. of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 21st - to - 24th, 2021, volume 2936 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 324–334. URL: http://ceur-ws.org/Vol-2936/paper-24.pdf.</p>
      <p>[34] Y. Zhang, Q. Chen, Z. Yang, H. Lin, Z. Lu, BioWordVec, Improving Biomedical Word Embeddings with Subword Information and MeSH, Scientific Data 6 (2019) 1–9. URL: https://doi.org/10.1038/s41597-019-0055-0. doi:10.1038/s41597-019-0055-0.</p>
      <p>[35] E. Alsentzer, J. R. Murphy, W. Boag, W. H. Weng, D. Jin, T. Naumann, M. B. A. McDermott, Publicly Available Clinical BERT Embeddings, CoRR abs/1904.03323 (2019). URL: http://arxiv.org/abs/1904.03323.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giachelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Marini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Boytcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Buttafuoco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciompi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fraggetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Irrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Primov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vatrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <article-title>Empowering digital pathology applications through explainable knowledge extraction tools</article-title>
          ,
          <source>Journal of Pathology Informatics</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <elocation-id>100139</elocation-id>
          . doi:10.1016/j.jpi.2022.100139.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Davenport</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kalakota</surname>
          </string-name>
          ,
          <article-title>The Potential for Artificial Intelligence in Healthcare</article-title>
          ,
          <source>Future Healthc J.</source>
          <volume>6</volume>
          (
          <year>2019</year>
          )
          <fpage>94</fpage>
          -
          <lpage>98</lpage>
          . URL: https://doi.org/10.7861/futurehosp.6-2-94. doi:10.7861/futurehosp.6-2-94.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Coopey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sharko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Polubriaginof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Drohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Belli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Garber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gadd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Specht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Roche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Gudewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Hughes</surname>
          </string-name>
          ,
          <article-title>The Feasibility of Using Natural Language Processing to Extract Clinical Information from Breast Pathology Reports</article-title>
          ,
          <source>J. Pathol Inform</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <elocation-id>23</elocation-id>
          . URL: https://doi.org/10.4103/2153-3539.97788. doi:10.4103/2153-3539.97788.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassanpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Langlotz</surname>
          </string-name>
          ,
          <article-title>Information Extraction from Multi-Institutional Radiology Reports</article-title>
          ,
          <source>Artif. Intell. Medicine</source>
          <volume>66</volume>
          (
          <year>2016</year>
          )
          <fpage>29</fpage>
          -
          <lpage>39</lpage>
          . URL: https://doi.org/10.1016/j.artmed.2015.09.007. doi:10.1016/j.artmed.2015.09.007.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Burger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abu-Hanna</surname>
          </string-name>
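          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>de Keizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornet</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing in Pathology: a Scoping Review</article-title>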
          ,
          <source>Journal of Clinical Pathology</source>
          <volume>69</volume>
          (
          <year>2016</year>
          )
          <fpage>949</fpage>
          -
          <lpage>955</lpage>
          . URL: https://doi.org/10.1136/jclinpath-2016-203872. doi:10.1136/jclinpath-2016-203872.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Topaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Murga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Gaddis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>McDonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bar-Bachar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Bowles</surname>
          </string-name>
          ,
          <article-title>Mining Fall-Related Information in Clinical Notes: Comparison of Rule-Based and Novel Word Embedding-Based Machine Learning Approaches</article-title>
          ,
          <source>J. Biomed. Informatics</source>
          <volume>90</volume>
          (
          <year>2019</year>
          ). URL: https://doi.org/10.1016/j.jbi.2019.103103. doi:10.1016/j.jbi.2019.103103.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Oliwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Maron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Chase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lomnicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V. T.</given-names>
            <surname>Catenacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Furner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Volchenboum</surname>
          </string-name>
          ,
          <article-title>Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics</article-title>
          ,
          <source>JCO Clinical Cancer Informatics</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . URL: https://doi.org/10.1200/CCI.19.00008. doi:10.1200/CCI.19.00008.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kreimeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Arya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Halford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Forshee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Walderhaug</surname>
          </string-name>
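          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Botsis</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing Systems for Capturing and Standardizing Unstructured Clinical Information: A Systematic Review</article-title>
          ,
          <source>J. Biomed. Informatics</source>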
          <volume>73</volume>
          (
          <year>2017</year>
          )
          <fpage>14</fpage>
          -
          <lpage>29</lpage>
          . URL: https://doi.org/10.1016/j.jbi.2017.07.012. doi:10.1016/j.jbi.2017.07.012.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rastegar-Mojarad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Afzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>