<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Greg, ML: Automatic Diagnostic Suggestions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paola Lapadula</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giansalvatore Mecca</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donatello Santoro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luisa Solimando</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enzo Veltri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Svelto! Big Data Cleaning and Analytics - Potenza</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università della Basilicata - Potenza</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recently, machine-learning techniques have been applied in a variety of fields. One of the most promising and challenging is handling medical records. In this paper we present Greg, ML, a machine-learning tool for generating automatic diagnostic suggestions based on patient profiles. At the core of our system there are two machine-learning classifiers: a natural-language module that handles reports of instrumental exams, and a profile classifier that outputs diagnostic suggestions to the doctor. After discussing the architecture, we present some experimental results based on the working prototype we have developed. Finally, we examine challenges and opportunities related to the use of this kind of tool in medicine, and some important lessons learned while developing the tool. In this respect, despite the ironic title of this paper, we underline that Greg should be conceived primarily as a support for expert doctors in their diagnostic decisions, and can hardly replace humans in their judgment.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The growing availability of digital data in all sectors of our everyday lives has
created opportunities for data-based applications that would not have been conceivable a few years
ago. One example is medicine: the push for the widespread adoption of electronic
medical records [
        <xref ref-type="bibr" rid="ref5 ref9">9, 5</xref>
        ] and digital medical reports is paving the way for new applications
based on these data.
      </p>
      <p>
        Greg, ML [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is one of these applications. It is a machine-learning tool for
generating automatic diagnostic suggestions based on patient profiles. In essence, Greg takes
as input a digital profile of a patient, and suggests one or more diagnoses that, according
to its internal models, fit the profile with a given probability. We assume that a doctor
inspects these diagnostic suggestions and takes informed decisions about the patient.
      </p>
      <p>
        We note that the idea of using machine learning to examine
medical data is not new [
        <xref ref-type="bibr" rid="ref10 ref7">7, 11, 10</xref>
        ]. In fact, several efforts have been made in this
direction [
        <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
        ]. To the best of our knowledge, however, all of the existing tools
concentrate on rather specific learning tasks, for example identifying a single pathology, like
heart disease [14, 12], pneumonia [13], or cancer, where results of remarkable quality
have been reported [15]. In contrast, Greg has the distinguishing feature of being
a broad-scope diagnostic-suggestion tool. In fact, at the core of the tool stands a generic
learning model that can suggest large numbers of pathologies, currently several
dozen, and in the future several hundred.
      </p>
      <p>Copyright © 2019 for the individual papers by the papers’ authors. Copying permitted for
private and academic purposes. This volume is published and copyrighted by its editors. SEBD
2019, June 16-19, 2019, Castiglione della Pescaia, Italy.</p>
      <p>Greg is a research project developed by Svelto!, a spin-off of the data-management
group at the University of Basilicata.</p>
      <p>The rest of the paper introduces Greg as follows. We discuss the
internal architecture of the tool in Section 2. Then, we introduce the methodology and
the additional tools in Section 3. We present some experimental results based on the
current version of the tool in Section 4.</p>
      <p>Finally, in Section 5 we conclude by discussing the possible applications we
envision for Greg and a few crucial lessons learned with the tool, which, in turn,
have inspired the title of this paper.</p>
      <p>Fig. 1: [figure; labels: numerical lab results, textual reports, patient biographic data and medical history, pathology indicators, patient profile]</p>
      <p>The architecture and the overall flow of Greg are depicted in Figure 1.</p>
      <p>As we have already discussed, at the core of Greg stands a classifier for patient
profiles that provides doctors with diagnostic suggestions. Profiles are entirely anonymous,
i.e., Greg neither stores nor requires any personal data about patients, and are composed
of three main blocks:
– anonymous biographical data, mainly age and gender, and the medical history of the
patient, i.e., past medical events and pathologies, especially the chronic ones;
– results of lab exams, in numerical format;
– textual reports from instrumental exams, like X-rays (RX), ultrasounds, etc.</p>
      <p>These items compose the patient profile that is fed to the profile classifier in order to
propose diagnostic suggestions to doctors. Note that, while biographic data, medical
history and lab exam results are essentially structured data, and therefore can be easily
integrated into the profile, reports of instrumental exams are essentially unstructured.
As a consequence, Greg relies on a second learning module to extract what we call
pathology indicators, i.e., structured labels indicating anomalies in the report that may
suggest the presence of a pathology.</p>
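      <p>The three blocks of the profile can be pictured as a simple record. The following is a minimal sketch, not Greg’s actual data model: the class name <monospace>PatientProfile</monospace>, its field names, the feature-naming scheme and the sample values are all assumptions made for illustration.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PatientProfile:
    """Anonymous patient profile; all field names are hypothetical."""
    age: int
    gender: str
    medical_history: List[str] = field(default_factory=list)
    lab_results: Dict[str, float] = field(default_factory=dict)
    pathology_indicators: Set[str] = field(default_factory=set)

    def add_report_indicators(self, indicators):
        """Merge the structured labels extracted from one textual report."""
        self.pathology_indicators.update(indicators)

    def to_features(self):
        """Flatten the profile into a feature-name to value mapping."""
        features = {"age": float(self.age), f"gender={self.gender}": 1.0}
        for event in self.medical_history:
            features[f"history={event}"] = 1.0
        for exam, value in self.lab_results.items():
            features[f"lab={exam}"] = value
        for indicator in self.pathology_indicators:
            features[f"indicator={indicator}"] = 1.0
        return features


# Made-up example values, for illustration only.
profile = PatientProfile(age=67, gender="F",
                         medical_history=["hypertension"],
                         lab_results={"hemoglobin_g_dl": 9.8})
profile.add_report_indicators(["pleural_effusion"])
print(sorted(profile.to_features()))
```
      </preformat>
      <p>In this sketch the structured blocks enter the profile directly, while the output of the report classifier is merged in through <monospace>add_report_indicators</monospace>, mirroring the distinction drawn above between structured and unstructured inputs.</p>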
      <p>The report classifier is essentially a natural-language processing module. It takes the
text of the report in natural language and identifies pathology indicators that are then
integrated within the patient profile.</p>
      <p>The report classifier is, in a way, the crucial module for the construction of the
patient profile. In fact, reports of instrumental exams often carry crucial information for
the purpose of identifying the correct diagnostic suggestions. At the same time, their
treatment is language-dependent, and learning is labor-intensive, since it requires
labeling large sets of reports in order to train the classifier.</p>
      <p>Once the profile for a new patient has been built, it is fed to the profile classifier, which
outputs diagnostic suggestions to the doctor. A few important aspects should be
noted here.</p>
      <p>– First, Greg is trained to predict only a finite set of diagnoses. This means that it is
intended primarily as a tool to gain positive evidence about pathologies that might
be present, rather than as a tool to exclude pathologies that are not present. In other
terms, the fact that Greg does not suggest a specific diagnosis does not mean that
it can be excluded, since it might only be the case that Greg has not been trained for
that particular pathology. Handling a large number of diagnoses
is therefore crucial.
– Second, Greg associates a degree of probability with each diagnostic suggestion,
i.e., it ranks them with a confidence measure. This is important, since the tool may
provide several different suggestions for a given profile, and not all of them are to
be considered equally relevant.</p>
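      <p>The two aspects above can be sketched in a few lines. This is an illustrative example under stated assumptions, not Greg’s implementation: the function names, the <monospace>min_confidence</monospace> cut-off and the probability values are hypothetical.</p>
      <preformat>
```python
def rank_suggestions(probabilities, min_confidence=0.2):
    """Return (diagnosis, probability) pairs, most probable first.

    probabilities maps each diagnosis the model was trained on to a score
    in [0, 1]. Diagnoses scoring below min_confidence are not suggested.
    """
    kept = [(d, p) for d, p in probabilities.items() if p >= min_confidence]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def status(diagnosis, trained_diagnoses, suggestions):
    """Distinguish 'not suggested' from 'the model never learned this'."""
    if diagnosis not in trained_diagnoses:
        return "unknown to the model"
    if any(d == diagnosis for d, _ in suggestions):
        return "suggested"
    return "not suggested"


# Made-up scores for a model trained on three diagnoses only.
scores = {"pneumonia": 0.81, "anemia": 0.35, "cirrhosis": 0.05}
suggestions = rank_suggestions(scores)
print(suggestions)
print(status("heart disease", scores, suggestions))
```
      </preformat>
      <p>The explicit "unknown to the model" status reflects the first point: a missing suggestion carries no negative evidence when the pathology lies outside the trained set.</p>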
      <p>A tool like Greg is only as effective as its integration with the
everyday procedures of a medical institution is seamless. To foster this kind of adoption, Greg
can be used as a stand-alone tool, with its own user interface, but it has been
developed primarily as an engine-backed API, which can be easily integrated with any medical
information system that is already deployed in medical units and wards. Ideally, with
this kind of integration, accessing the medical suggestions provided by Greg should cost
no more than clicking a button, in addition to the standard procedure for patient-data
gathering and medical-record compilation.</p>
    </sec>
    <sec id="sec-2">
      <title>The Greg Workflow and Ecosystem</title>
      <p>As we have discussed in the previous sections, the effectiveness of a system like Greg is
strongly related to the number of pathologies for which it can provide suggestions. We
therefore put considerable effort into structuring the learning workflow in order to make
it lean and easily reproducible. In this section we summarize a few key findings in this
respect, which led us to the development of a number of additional tools that compose
the Greg ecosystem.</p>
      <p>
        A first important observation is that a system like Greg needs to
refer to a standardized set of diagnoses. As is common, we rely on the international
classification of diseases, ICD-10 (DRG)<sup>3</sup>. This, however, poses a challenge when
dealing with large and heterogeneous collections of medical records coming from disparate
sources, which are not necessarily associated with a DRG, and creates a
standardization problem for diagnosis labels. In fact, standardizing the vocabulary of pathologies
and pathology indicators is crucial in the early stages of data preparation. To this end,
we leveraged the consolidated suite of data-cleaning tools developed by our research
group over the years [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2–4</xref>
        ].
      </p>
      <p>A second important observation is that we need to handle large and complex
amounts of data gathered from medical information systems, including biographical
data, admissions and patient medical history, medical records, multiple lab exams, and
multiple reports. These data need to be explored, selected and prepared for the purpose
of training the learning models. To streamline the data-preparation process, we
developed a tool to explore the available data. The tool is called Caddy and is
essentially a data warehouse built on top of the transactional medical databases. This
allowed us to adopt a structured approach to data exploration and data selection, which
proved essential in the development of the tool.</p>
      <p>Fig. 2: DAIMO, the ML Labeling Tool.</p>
      <p>However, the tool that proved to be the most crucial in the development of Greg is
DAIMO, our instance labeler. DAIMO stands for Digital Annotation of Instances and
Markup of Objects. It is a tool explicitly conceived to support the labeling phase of
machine learning projects. A snapshot of the system is shown in Figure 2.</p>
      <p><sup>3</sup> http://www.who.int/classifications/icd/icdonlineversions/en/</p>
      <p>DAIMO is a semi-automated tool for data labeling. It provides a simple and effective
interface to explore pre-defined collections of samples to label. Samples may be
textual, structured (for example, in tabular format), or of mixed type.
Users that are tasked with labeling can cooperatively explore the samples, pick them,
explore existing labels and add more. Figure 2 shows the process of labeling one report.
Labels associated with the report are on the right. Each corresponds to a colored portion
of the text.</p>
      <p>We believe that the mere availability of an intuitive tool to support cooperative
labeling work significantly increases productivity. In addition, DAIMO provides
further functionalities that improve the process.</p>
      <p>First, it allows users to define label vocabularies, in order to standardize the way in which
labels are assigned to samples. Users usually search labels within the vocabulary, and
add new ones only when the ones they need are not present. When dealing with complex
labeling tasks with many different labels, such a systematic approach is crucial in order
to get good-quality results.</p>
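      <p>The search-first, add-only-if-missing discipline can be sketched as follows. This is a hedged illustration of the idea, not DAIMO’s code: the class <monospace>LabelVocabulary</monospace>, its methods and the normalization rule (case and whitespace folding) are assumptions.</p>
      <preformat>
```python
class LabelVocabulary:
    """Hypothetical label vocabulary with one canonical form per label."""

    def __init__(self):
        self._labels = {}  # normalized key -> canonical label

    @staticmethod
    def _normalize(text):
        # Fold case and collapse runs of whitespace.
        return " ".join(text.lower().split())

    def search(self, query):
        """Return canonical labels whose normalized form contains the query."""
        key = self._normalize(query)
        return sorted(label for k, label in self._labels.items() if key in k)

    def add(self, label):
        """Add a label unless an equivalent one exists; return the canonical one."""
        key = self._normalize(label)
        return self._labels.setdefault(key, label.strip())


vocab = LabelVocabulary()
vocab.add("Pleural effusion")
# A differently typed variant maps back to the label already in the vocabulary.
print(vocab.add("  pleural   EFFUSION "))
print(vocab.search("effusion"))
```
      </preformat>
      <p>Returning the existing canonical label from <monospace>add</monospace> keeps spelling variants from multiplying, which is the standardization effect described above.</p>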
      <p>Second, DAIMO is able to learn labeling strategies from examples. After some
initial training, it not only collects new labels from users, but actually suggests them,
so that users need only accept or reject DAIMO’s suggestions. This approach really
transforms the labeling process from the inside out, since after a while it is DAIMO, not
the user, that does most of the work.</p>
      <p>In fact, in our experience, working with DAIMO may lower text-labeling times by up
to one order of magnitude with respect to manual, unassisted labeling.</p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Results</title>
      <p>We developed an advanced prototype of Greg and used it to conduct a number of
experiments to assess the feasibility of the overall approach.</p>
      <p>
        We conducted a first preliminary experimental evaluation using 200 medical records
over a small set of diagnoses (pneumonia, cirrhosis, anemia, urological infection) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Later, we used a bigger dataset, containing a total of 22,160 medical records, and we
were able to learn about 50 diagnoses. We used the discharge letters for each medical
record as labels. We called this annotated dataset GT - D.L. (Ground Truth - Discharge
Letters). As usual, the data were split into training, cross-validation and test sets, and we
measured the F-Measure of Greg’s predictions. On GT - D.L. we obtained poor results,
worse than we had expected. Some of the results are shown in Figure 3,
in terms of F-Measure. Our investigation of the data, however, suggested that in many
cases the quality of the results had been underestimated. In essence, in several cases
Greg suggested a more thorough set of diagnoses than the one indicated by the doctor
in the discharge letter. As an example, this happened frequently with patients suffering
from anemia, which is often associated with cirrhosis, even though doctors had not
explicitly mentioned that specific diagnosis in the discharge letter.</p>
      <p>We therefore conducted a second experiment. We asked our team of doctors to
review the set of diagnoses associated with patient profiles used for the test. In essence,
our doctors made sure that all relevant diagnoses were appropriately mentioned,
including those that the hospital doctors had omitted in the discharge letter. We called
this manually annotated dataset GT - Doct. (Ground Truth - Doctors). Figure 3 reports
Greg’s results over this revised dataset. As can be seen, we obtained an F-Measure
always above 95% for each diagnosis.</p>
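      <p>The effect of incomplete ground-truth labels on the F-Measure can be reproduced on a toy case. The sketch below assumes the standard per-diagnosis F1 (harmonic mean of precision and recall); the four patients and their labels are invented for illustration and are not taken from the GT - D.L. or GT - Doct. datasets.</p>
      <preformat>
```python
def f_measure(true_labels, predicted_labels, diagnosis):
    """Per-diagnosis F1 over two parallel lists of label sets."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        in_truth = diagnosis in truth
        in_pred = diagnosis in predicted
        tp += in_truth and in_pred          # correctly suggested
        fp += in_pred and not in_truth      # suggested, absent from the labels
        fn += in_truth and not in_pred      # labeled, but not suggested
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: the model suggests anemia for patient 0, but the discharge
# letter omits it; the reviewed labels add it back.
predicted = [{"cirrhosis", "anemia"}, {"cirrhosis", "anemia"},
             {"pneumonia"}, {"anemia"}]
discharge = [{"cirrhosis"}, {"cirrhosis", "anemia"},
             {"pneumonia"}, {"anemia"}]
reviewed = [{"cirrhosis", "anemia"}, {"cirrhosis", "anemia"},
            {"pneumonia"}, {"anemia"}]

print(round(f_measure(discharge, predicted, "anemia"), 2))  # 0.8
print(round(f_measure(reviewed, predicted, "anemia"), 2))   # 1.0
```
      </preformat>
      <p>With the omitted label, a correct suggestion is counted as a false positive and precision drops to 2/3; once the label is restored, the same predictions score 1.0. This is the mechanism by which an incomplete ground truth can underestimate the quality of the predictions.</p>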
      <p>To summarize, our preliminary tests show that Greg can achieve high
accuracy in its predictions. In addition, it may effectively assist doctors in formulating
their diagnoses, by providing systematic suggestions.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions: Opportunities and Lessons Learned</title>
      <p>We believe that Greg can be a valid and useful tool to assist doctors in the diagnostic
process. Given its ability to learn diagnostic suggestions at scale, we envision several
main scenarios of use for the system in a medical facility:
– We believe Greg can be of particular help in the ER, during the triage and first
diagnostic phase; in particular, based on initial evidence about the patient, it may help
the ER operator to identify a few pathologies that are worth exploring, perhaps with
the help of specialized colleagues.
– Then, we envision interesting opportunities related to the use of Greg in the
diagnosis of rare pathologies; these are especially difficult to capture by a learning
algorithm, because, by definition, there are only a few training examples to use,
and therefore special treatment is required. Still, we believe that supporting
doctors in this respect, especially younger ones who might have less experience in diagnosing these
pathologies, is an important field of application.
– In medical institutions that rely on standardized clinical pathways or integrated care
pathways (ICPs) – PDTAs in Italy – Greg may be used to quickly suggest which
parts of a pathway need to be explored, and which ones can be excluded based on
the available evidence.
– Finally, Greg may be used as a second-opinion tool, i.e., after the doctor has
formulated her/his diagnosis, for the purpose of double checking that all possibilities
have been considered.</p>
      <p>While in our opinion all of these represent areas in which Greg can be a valid support
tool for the doctor, we would like to put them in context by discussing what we believe
to be the most important lessons we have learned so far.</p>
      <p>On the one hand, the development of Greg has taught us a basic and important
lesson: in many cases, probably the majority, the basic workings of the diagnostic process
employed by human doctors are indeed reproducible by an automatic algorithm.</p>
      <p>In fact, it is well known that doctors tend to follow a decision process that looks for
specific indicators within the patient profile – e.g., values of laboratory tests, or specific
symptoms – and decides to consider or exclude pathologies based on them. As fuzzy
as this process may be, like any other human thought process, to our surprise we learned
that for a large number of pathologies it provides a perfect opportunity for
the employment of a machine learning algorithm, which, in turn, may achieve very
good accuracy in mimicking the human decision process, with the additional advantage
of scale – Greg can be trained to learn very high numbers of diagnostic suggestions.
In this respect, ironically quoting Gregory House, we might be tempted to state that
“Humanity is overrated”, indeed.</p>
      <p>However, our experience also led us to find that there are facets of the diagnostic
process that are inherently related to intuition, experience, and human factors. These
are, by nature, impossible to capture by an automatic algorithm. Therefore, our
ultimate conclusion is that humanity is not overrated, and that Greg can indeed provide
useful support in the diagnostic process, but it cannot and should not be considered as
a replacement for an expert human doctor.
      </p>
      <p>11. N. Peek, C. Combi, R. Marin, and R. Bellazzi. Thirty years of artificial intelligence
in medicine (AIME) conferences: A review of research themes. Artificial Intelligence in
Medicine, 65(1):61–73, 2015.</p>
      <p>12. P. Rajpurkar, A. Y. Hannun, M. Haghpanahi, C. Bourn, and A. Y. Ng. Cardiologist-level
arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836,
2017.</p>
      <p>13. P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz,
K. Shpanskaya, et al. CheXNet: Radiologist-level pneumonia detection on chest x-rays with
deep learning. arXiv preprint arXiv:1711.05225, 2017.</p>
      <p>14. J. Soni, U. Ansari, D. Sharma, and S. Soni. Predictive data mining for medical diagnosis:
An overview of heart disease prediction. International Journal of Computer Applications,
17(8):43–48, 2011.</p>
      <p>15. I. Steadman. IBM’s Watson is better at diagnosing cancer than human doctors. WIRED,
2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Deo</surname>
          </string-name>
          .
          <article-title>Machine learning in medicine</article-title>
          .
          <source>Circulation</source>
          ,
          <volume>132</volume>
          (
          <issue>20</issue>
          ):
          <fpage>1920</fpage>
          -
          <lpage>1930</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Geerts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          .
          <article-title>Mapping and Cleaning</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Data Engineering - ICDE</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Geerts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          .
          <article-title>That's All Folks! LLUNATIC Goes Open Source</article-title>
          .
          <source>In Proceedings of the International Conference on Very Large Databases - VLDB</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <article-title>Interactive and deterministic data cleaning</article-title>
          .
          <source>In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference</source>
          <year>2016</year>
          , pages
          <fpage>893</fpage>
          -
          <lpage>907</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>T.</given-names>
            <surname>Heinis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ailamaki</surname>
          </string-name>
          , et al.
          <article-title>Data infrastructure for medical research</article-title>
          .
          <source>Foundations and Trends in Databases</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>131</fpage>
          -
          <lpage>238</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          .
          <article-title>Machine learning for health informatics</article-title>
          .
          <source>In Machine Learning for Health Informatics</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>I.</given-names>
            <surname>Kononenko</surname>
          </string-name>
          .
          <article-title>Machine learning for medical diagnosis: history, state of the art and perspective</article-title>
          .
          <source>Artificial Intelligence in medicine</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ):
          <fpage>89</fpage>
          -
          <lpage>109</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P.</given-names>
            <surname>Lapadula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Solimando</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Veltri</surname>
          </string-name>
          .
          <article-title>Humanity Is Overrated. Or Not. Automatic Diagnostic Suggestions by Greg, ML</article-title>
          .
          <source>In New Trends in Databases and Information Systems</source>
          , pages
          <fpage>305</fpage>
          -
          <lpage>313</lpage>
          . Springer International Publishing,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Miller</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Sim</surname>
          </string-name>
          .
          <article-title>Physicians' use of electronic medical records: barriers and solutions</article-title>
          .
          <source>Health affairs</source>
          ,
          <volume>23</volume>
          (
          <issue>2</issue>
          ):
          <fpage>116</fpage>
          -
          <lpage>126</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>O.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Benlamri</surname>
          </string-name>
          .
          <article-title>Developing a semantic web model for medical differential diagnosis recommendation</article-title>
          .
          <source>Journal of medical systems</source>
          ,
          <volume>38</volume>
          (
          <issue>10</issue>
          ):
          <fpage>79</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>