<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interpretation of Best Medical Coding Practices by Case-Based Reasoning - A User Assistance Prototype for Data Collection for Cancer Registries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Schnell</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sophie Coufignal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean Lieber</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stéphanie Saleh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Jay</string-name>
          <email>n.jay@chru-nancy.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Population Health, Luxembourg Institute of Health</institution>
          ,
          <addr-line>1A-B, rue Thomas Edison, L-1445 Strassen</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Service d'évaluation et d'information médicales, Centre Hospitalier Régional Universitaire de Nancy</institution>
          ,
          <addr-line>Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>UL</institution>
          ,
          <addr-line>CNRS, Inria, Loria, F-54000 Nancy</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Numerous cancer registries around the world collect data about cancers diagnosed and/or treated in a given area. This data is used to monitor cancer (incidence rates, survival rates, etc.) and to evaluate cancer care (diagnosis, treatment, etc.). To produce comparable data, common definitions (e.g. terminologies such as the International Classification of Diseases (ICD)) and coding practices [<xref ref-type="bibr" rid="ref5">5</xref>] have to be followed. However, the breadth and complexity of these standards make the work of the medical staff in charge of coding (operators) more difficult. The aim of this research is to address this complexity by assisting both operators and coding experts in the interpretation of coding best practices. As an illustrative example, consider the case of a particular woman. In 2016, multiple pulmonary opacities were discovered within her right lung lobe. A CT scan indicated no mediastinal adenopathy.<sup>4</sup> A histological analysis of a sample identified the morphology<sup>5</sup> of the cancer as adenocarcinoma. The TTF1 marker test was positive. After further testing, another tumor was found in the ovaries. An operator might wonder which topography<sup>6</sup> should be coded (lung or ovaries?) and request help to answer the question. For the Luxembourg National Cancer Registry (NCR), operators ask their questions using an online ticketing system. Based on the free-text description provided by operators, coding experts provide a solution, i.e. an answer together with their reasoning in the form of a motivated argument.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Section 2 describes an approach to assist the data collection process for cancer registries and how case-based reasoning (CBR [<xref ref-type="bibr" rid="ref1">1</xref>]) is applied. In Section 3, a prototype and preliminary results are discussed. Section 4 presents a conclusion and points out what further efforts need to be undertaken in the future.</p>
      <p><sup>4</sup> An adenopathy is an enlargement of lymph nodes, likely due to cancer.</p>
      <p><sup>5</sup> The morphology describes the type and behavior of the cells that compose the tumor.</p>
      <p><sup>6</sup> The topography is the location where the tumor originated.</p>
    </sec>
    <sec id="sec-2">
      <title>Case-based interpretation of best practices</title>
      <p>
        This article summarizes the work presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and adds a description of the
developed prototype and some preliminary results.
      </p>
      <p>
        Preliminaries. A case (srce, sol(srce)) is composed of two parts: 1) a patient
record and a question, and 2) a solution. The patient record represents the data
from the hospital patient record (patient features, tumors, exams, treatments,
etc.) needed to answer the question. The relevant data depends on the subject
and is defined by coding experts. The patient record is represented by an RDFS
graph [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Body parts and cancer morphologies use classes from the SNOMED
Clinical Terms<sup>7</sup> ontology. The question indicates the subject (incidence date,
topography, tumor nature, etc.). In the example, the question is about the
topography. The solution contains the answer to the question and the most important
arguments in favor of (pros) and against (cons) this answer. In the example,
the answer is to consider the topography to be the ovaries. The presence of
multiple pulmonary opacities is an argument in favor, as they are indicative of a
metastasis and thus the tumor is unlikely to have originated in the lungs.
      </p>
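      <p>The case structure described above can be sketched as follows. This is only a minimal illustration in Python, not the prototype's data model: the patient record is approximated by a set of subject-predicate-object triples standing in for the RDFS graph, and every identifier (Case, Solution, hasFinding, hasMorphology, etc.) is a hypothetical name chosen for the example.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# One (subject, predicate, object) statement; a plain stand-in for an RDF triple.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Solution:
    answer: str                    # answer to the question, e.g. a topography
    pros: Tuple[str, ...] = ()     # arguments in favor of the answer
    cons: Tuple[str, ...] = ()     # arguments against the answer


@dataclass(frozen=True)
class Case:
    record: FrozenSet[Triple]            # patient record (hypothetical vocabulary)
    question: str                        # subject of the question
    solution: Optional[Solution] = None  # None for a target problem not yet solved


# The running example: which topography should be coded, lung or ovaries?
example = Case(
    record=frozenset({
        ("patient", "hasFinding", "multiple_pulmonary_opacities"),
        ("patient", "hasFinding", "no_mediastinal_adenopathy"),
        ("tumor_sample", "hasMorphology", "adenocarcinoma"),
        ("patient", "hasMarker", "TTF1_positive"),
        ("patient", "hasTumor", "ovarian_tumor"),
    }),
    question="topography",
    solution=Solution(
        answer="ovaries",
        pros=("multiple pulmonary opacities are indicative of a metastasis",),
    ),
)

print(example.question, "->", example.solution.answer)
```
</preformat>
      <p>A target problem would be built the same way, with the solution left empty until the reuse step fills it in.</p>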
      <p>
        The arguments have two uses. They help explain the answer to operators
and serve as a reminder for coding experts. They are also used in the proposed
approach during the retrieval step. Three types of arguments are considered:
strong pros, weak pros and weak cons. The difference between a strong and
a weak argument lies in its reliability for a given conclusion. A strong
argument is considered to be a sufficient justification for an answer, unlike a
weak argument, which is more of an indication or clue. Note that there
are no strong cons in the source cases: such an argument would be an
absolute argument against the given answer. Formally, an argument is a function
a that associates a Boolean with a case and is stored as a SPARQL ASK query.
      </p>
      <p>
        Global architecture. The proposed approach uses a 4-R cycle (retrieve, reuse,
revise, retain) adapted from [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and four knowledge containers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (case base,
domain knowledge, retrieval knowledge, adaptation knowledge).
      </p>
      <p>
        Retrieve. The proposed approach relies on arguments to find similar cases.
Indeed, similar answers should have similar reasoning and thus the same arguments
should apply. Our method checks the applicability of the arguments from the
source cases on the target problem tgt and uses this to decide which source case
is the most appropriate to solve tgt. The comparison between two source cases
i and j relies on three criteria: one for strong arguments, s<sub>i,j</sub>; one for weak
arguments, w<sub>i,j</sub>; and one for patient record similarity, dist<sub>i,j</sub>.
For the strong arguments, the source case with the most applicable strong arguments
is preferred.
      </p>
      <sec id="sec-2-1">
        <title>7 https://bioportal.bioontology.org/ontologies/SNOMEDCT</title>
        <p>
          For the weak arguments, a combination of pros and cons is used. The more weak pros
and the fewer weak cons are applicable, the better suited the source case. For the
last criterion, the patient record similarity with the target problem is used (computed with
a graph edit distance [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]). The three criteria are considered lexicographically:
first s<sub>i,j</sub>, then w<sub>i,j</sub> and finally dist<sub>i,j</sub> (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]).
        </p>
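        <p>The lexicographic comparison used for retrieval can be sketched as below. This is a simplified illustration under several assumptions, not the method of [<xref ref-type="bibr" rid="ref8">8</xref>]: arguments are plain Boolean functions rather than SPARQL ASK queries, the weak criterion is taken to be the number of applicable weak pros minus the number of applicable weak cons, and the graph edit distance is replaced by the size of the symmetric difference between two sets of facts. All names and the two toy source cases are invented for the example.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Record = FrozenSet[str]              # simplified patient record: a set of facts
Argument = Callable[[Record], bool]  # Boolean function of a case


@dataclass
class SourceCase:
    name: str
    record: Record
    answer: str
    strong_pros: List[Argument]
    weak_pros: List[Argument]
    weak_cons: List[Argument]


def has(fact: str) -> Argument:
    """Build an argument that holds when the record contains `fact`."""
    return lambda record: fact in record


def count_applicable(arguments: List[Argument], target: Record) -> int:
    """Number of arguments that evaluate to True on the target problem."""
    return sum(1 for argument in arguments if argument(target))


def record_distance(a: Record, b: Record) -> int:
    # Assumed placeholder for the graph edit distance between patient records.
    return len(a ^ b)


def retrieval_key(case: SourceCase, target: Record) -> Tuple[int, int, int]:
    strong = count_applicable(case.strong_pros, target)
    # Assumed combination of weak arguments: applicable pros minus applicable cons.
    weak = count_applicable(case.weak_pros, target) - count_applicable(case.weak_cons, target)
    # The distance is negated so that a larger key is always better.
    return (strong, weak, -record_distance(case.record, target))


def retrieve(case_base: List[SourceCase], target: Record) -> SourceCase:
    # Tuples compare lexicographically: strong first, then weak, then distance.
    return max(case_base, key=lambda case: retrieval_key(case, target))


target = frozenset({"multiple_pulmonary_opacities", "ovarian_tumor", "TTF1_positive"})
case_lung = SourceCase(
    "lung", frozenset({"single_pulmonary_opacity", "TTF1_positive"}), "lung",
    strong_pros=[], weak_pros=[has("TTF1_positive")], weak_cons=[has("ovarian_tumor")],
)
case_ovary = SourceCase(
    "ovary", frozenset({"multiple_pulmonary_opacities", "ovarian_tumor"}), "ovaries",
    strong_pros=[], weak_pros=[has("multiple_pulmonary_opacities")], weak_cons=[],
)
best = retrieve([case_lung, case_ovary], target)
print(best.name, retrieval_key(best, target))
```
</preformat>
        <p>Encoding the three criteria as one tuple lets the built-in tuple ordering perform the lexicographic comparison, so a later criterion only matters when all earlier ones are tied.</p>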
        <p>Reuse. Once an appropriate source case has been found, the solution associated
with the source case is copied: sol(tgt) := sol(srce). The arguments that do not
apply to the target problem, if any, are removed.</p>
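        <p>A minimal sketch of this copy-and-filter step is given below, again with arguments modeled as Boolean functions over a simplified set-of-facts record. The types, labels and example arguments are hypothetical and serve only to illustrate the step.</p>
        <preformat>
```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Tuple

Record = FrozenSet[str]  # simplified patient record: a set of facts


@dataclass(frozen=True)
class Argument:
    label: str                       # human-readable explanation
    holds: Callable[[Record], bool]  # True when the argument applies to a record


@dataclass(frozen=True)
class Solution:
    answer: str
    pros: Tuple[Argument, ...]
    cons: Tuple[Argument, ...]


def reuse(source_solution: Solution, target: Record) -> Solution:
    """Copy the source solution, keeping only the arguments that apply to the target."""
    return replace(
        source_solution,
        pros=tuple(a for a in source_solution.pros if a.holds(target)),
        cons=tuple(a for a in source_solution.cons if a.holds(target)),
    )


source = Solution(
    answer="ovaries",
    pros=(
        Argument("multiple pulmonary opacities",
                 lambda r: "multiple_pulmonary_opacities" in r),
        Argument("no mediastinal adenopathy",
                 lambda r: "no_mediastinal_adenopathy" in r),
    ),
    cons=(),
)
target = frozenset({"multiple_pulmonary_opacities", "ovarian_tumor"})
adapted = reuse(source, target)
print(adapted.answer, [a.label for a in adapted.pros])
```
</preformat>
        <p>The answer itself is copied unchanged; only the list of supporting arguments shrinks, which matches the simple reuse method described here.</p>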
        <p>
          Revise and retain. The newly formed case (tgt, sol(tgt)) can be reviewed by a
coding expert, who may modify the answer, the arguments and/or the patient record.
A coding expert may choose to remove unnecessary information from the patient
record, removing unwanted specificity. Thus, (tgt, sol(tgt)) is replaced by
(tgt′, sol(tgt′)), where tgt′ is more general than tgt. (tgt′, sol(tgt′)) is a
generalized case that has a larger coverage than (tgt, sol(tgt)) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Prototype and preliminary results</title>
      <p>
        The prototype designed for the NCR serves as a ticketing system, where
operators ask coding questions and experts provide answers. It assists operators in
structuring questions, making it easier for the NCR and coding experts to find
similar questions later. For topography questions, it will also provide a tentative
answer. This answer is calculated using the approach described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. All the
answers are reviewed by experts. The prototype is a single-page
application built with Angular<sup>8</sup> and backed by a REST API built with Go and
the Gin framework.<sup>9</sup> The data is stored in an Apache Jena<sup>10</sup> triple store and
exposed as a SPARQL endpoint using Apache Fuseki.<sup>11</sup>
      </p>
      <p>The prototype was tested internally to perform a first assessment of its
usability and utility. Some old cases concerning the topography were formalized
and coded, together with some domain knowledge. For the arguments, great care was
taken during modeling to make them more broadly applicable. Then
new questions were presented to the system, and the proposed solutions were
compared with the expected ones. While the prototype answered every question,
not all of the answers were correct. The main reasons for the differences were the small
number of cases (15 originally; the case base will be enriched by routine
usage) and the simple reuse method used at this stage. Indeed, as the
arguments have been formalized to be more general, some of the provided answers
might be slightly incorrect (e.g. answering upper lung lobe instead of lower lung
lobe). Despite this, as the prototype displays the reused source case, an
operator should be able to make the necessary adaptation to the provided solution.</p>
      <sec id="sec-3-1">
        <title>8 https://angular.io; 9 https://golang.org, https://github.com/gin-gonic/gin; 10 https://jena.apache.org/; 11 https://jena.apache.org/documentation/fuseki2/</title>
        <p>For the questions concerning other subjects, the prototype relies entirely on the
coding experts to provide answers.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>
        Recently, there has been growing interest in case-based reasoning applications
in health sciences [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this paper, an approach to assist operators in the
interpretation of best medical coding practices has been proposed. This approach is
based on discussions with operators and coding experts on actual coding
problems. A dozen tricky problems were discussed in detail, among a hundred simpler
problems. The coding questions asked by the operators are compared to previous
questions and solved by reusing the pros and cons of previously given solutions.
The results discussed are only preliminary and a more thorough evaluation,
including the operators and coding experts, is planned.
      </p>
      <p>At the moment the reasoning process is only partial. Arguments are only a
part of a more complex reasoning process. The formalization of this process and
the eventual integration of the coding standards remain interesting avenues
for future work.</p>
      <p>After the prototype has been validated and improved by routine usage, a
second version will be designed that is less domain-dependent. The objective is
to build a generic system for argumentative case-based reasoning using semantic
web standards.</p>
      <p>Acknowledgments. The first author would like to thank the Fondation Cancer
for their financial support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aamodt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Case-based reasoning: Foundational issues, methodological variations, and system approaches</article-title>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bichindaritz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marling</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Case-based Reasoning in the Health Sciences</article-title>
          .
          <source>In: Workshop Proceedings of ICCBR</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brickley</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guha</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          :
          <source>RDF Schema 1.1</source>
          , https://www.w3.org/TR/rdf-schema/, W3C recommendation, last consultation: March 2017
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bunke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Messmer</surname>
            ,
            <given-names>B.T.</given-names>
          </string-name>
          :
          <article-title>Similarity measures for structured representations</article-title>
          .
          <source>In: European Workshop on Case-Based Reasoning</source>
          . pp.
          <fpage>106</fpage>
          -
          <lpage>118</lpage>
          . Springer (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <collab>European Network of Cancer Registries</collab>
          ,
          <string-name>
            <surname>Tyczynski</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Démaret</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parkin</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          :
          <article-title>Standards and guidelines for cancer registration in Europe: the ENCR recommendations</article-title>
          .
          <source>International Agency for Research on Cancer</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Maximini</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maximini</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergmann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>An investigation of generalized cases</article-title>
          , pp.
          <fpage>261</fpage>
          -
          <lpage>275</lpage>
          . Springer (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Richter</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>R.O.</given-names>
          </string-name>
          :
          <article-title>Case-based reasoning: a textbook</article-title>
          . Springer Science &amp; Business Media
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Schnell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coufignal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saleh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Case-Based Interpretation of Best Medical Coding Practices - Application to Data Collection for Cancer Registries</article-title>
          .
          <source>In: Conference Proceedings of ICCBR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>