<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Learning Approach for Semantic Indexing of Animal Experiments Summaries in German Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Kayalvizhi</string-name>
          <email>kayalvizhi1704@cse.ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Thenmozhi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chandrabose Aravindan</string-name>
          <email>aravindanc@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of CSE, SSN College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic indexing of animal experiment summaries is the process of annotating the summaries with their medical codes. Semantic indexing saves time in understanding the context of the summaries and in finding relevant ones. Indexing the Non-Technical Summaries (NTPs) with codes from the German version of the International Classification of Diseases (ICD-10) is a challenging task. ICD-10 codes, which provide a comprehensive way of recording health conditions, are useful for identifying many disorders, diseases and other health-related problems. Annotating the NTPs with these codes therefore makes it easier to store, organise, retrieve and compare health information. In this paper, we approach the problem using a deep neural network. Our method is evaluated on the dataset provided by CLEF eHealth 2019. On the test set of the task, it attains a precision, recall and F1 score of 0.19, 0.27 and 0.23 for Run 1, 0.19, 0.27 and 0.22 for Run 2, and 0.13, 0.34 and 0.19 for Run 3, respectively. The performance of our method might be further improved by using other recurrent units.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic indexing</kwd>
        <kwd>Deep neural network</kwd>
        <kwd>Text Mining</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Non-technical summaries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Task 1 - Multilingual Information Extraction [11]</title>
      <p>
        In this task, Non-Technical Summaries (NTPs) are to be indexed with codes from the German
version of the International Classification of Diseases (ICD-10). ICD-10 codes are useful in many ways,
for example in identifying various diseases, disorders and adverse drug events [
        <xref ref-type="bibr" rid="ref1 ref3 ref4 ref5 ref6">4, 12, 6, 3, 1, 14, 5</xref>
        ]. NTPs are short summaries that are
currently publicly available in the AnimalTestInfo database (http://www.animaltestinfo.de) as part of the
approval procedure for animal experiments in Germany. The database currently
contains more than 10,000 NTPs, many of which have been manually indexed
by experts. Task 1 of CLEF eHealth 2019 focuses on the automatic
indexing of NTPs with ICD-10 codes. We have built a deep neural network with LSTM units to generate the
codes for the summaries.
      </p>
      <sec id="sec-1-1">
        <title>Dataset Description</title>
        <p>
          The dataset is provided for Task 1 of CLEF eHealth 2019 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The training
set contains 7544 document ids, of which 5854 are annotated; the
development set contains 842 ids, of which 654 are annotated; and the
test set contains 407 ids whose annotations are to be predicted. Each animal
experiment summary in the dataset consists of six lines of text with the following
information:
1. title of the document;
2. uses (goals) of the experiment;
3. possible harms caused to the animals;
4. comments about replacement (in the scope of the 3R principles);
5. comments about reduction (in the scope of the 3R principles);
6. comments about refinement (in the scope of the 3R principles).
        </p>
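        <p>The six-line layout above maps naturally onto a small record type. The following Python sketch assumes that each summary is available as plain text with exactly one field per line; the field names and the helper parse_summary are hypothetical and not part of the task data, and the German strings in the usage lines are invented placeholders.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import List

# Hypothetical names for the six lines, in the order given in the dataset description.
FIELDS = ["title", "uses", "harms", "replacement", "reduction", "refinement"]


@dataclass
class Summary:
    doc_id: str
    title: str
    uses: str          # goals of the experiment
    harms: str         # possible harms caused to the animals
    replacement: str   # 3R principle: replacement
    reduction: str     # 3R principle: reduction
    refinement: str    # 3R principle: refinement

    def sentences(self) -> List[str]:
        """Return the six lines in their original order."""
        return [getattr(self, name) for name in FIELDS]


def parse_summary(doc_id: str, text: str) -> Summary:
    """Split a raw summary into its six lines (assumes one field per line)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} lines, got {len(lines)}")
    return Summary(doc_id, *lines)


example = parse_summary("12345", "Titel\nZiele\nSchaden\nErsatz\nVerminderung\nVerfeinerung")
print(example.harms)  # Schaden
```
        </preformat>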
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Example of ICD-10 codes</title>
      <p>C50-C50|C00-C97|C00-C75|II,
where the vertical bar (|) separates one code from the next.</p>
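      <p>Because the codes are joined with a vertical bar, a label string can be turned into a list of codes with a simple split. A minimal Python sketch follows; the function name parse_codes is hypothetical, and dropping empty fields and duplicates while keeping the original order is an illustrative choice rather than something the task prescribes. The example label appears to list related codes at several levels of the ICD-10 hierarchy, from the range C50-C50 up to chapter II, which is why one summary can carry several codes.</p>
      <preformat preformat-type="code">
```python
from typing import List


def parse_codes(label: str) -> List[str]:
    """Split an ICD-10 label string on '|' into individual codes.

    Empty fields are dropped and duplicates removed, keeping first-seen order.
    """
    codes: List[str] = []
    for code in label.split("|"):
        code = code.strip()
        if code and code not in codes:
            codes.append(code)
    return codes


print(parse_codes("C50-C50|C00-C97|C00-C75|II"))
# ['C50-C50', 'C00-C97', 'C00-C75', 'II']
```
      </preformat>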
      <sec id="sec-2-1">
        <title>Proposed Methodology</title>
        <p>
          A deep neural network model [
          <xref ref-type="bibr" rid="ref9">10, 9</xref>
          ] is used in our work to generate the
ICD-10 codes for the NTPs, which are written in German. The data is first prepared as
input to a Seq2Seq deep learning model: each input document is split
into its six sentences, and the ICD-10 codes of the document are assigned
as the targets of its sentences. Vocabularies are then built for all the input
documents and for the output labels.
        </p>
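        <p>The data preparation described above can be sketched as follows, assuming documents are held in memory as lists of sentences and labels as '|'-separated strings. The names make_pairs and build_vocab are hypothetical, and whitespace tokenisation with space-separated target codes is an assumption; the preprocessing actually used for the submitted runs may differ.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from typing import Dict, List, Tuple


def make_pairs(docs: Dict[str, List[str]],
               labels: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each sentence of an annotated document with that document's codes.

    docs maps a document id to its sentences; labels maps a document id to its
    '|'-separated ICD-10 string. Documents without labels are skipped. The
    target side is written as space-separated codes (an assumption).
    """
    pairs = []
    for doc_id, sentences in docs.items():
        if doc_id not in labels:
            continue
        target = " ".join(c for c in labels[doc_id].split("|") if c)
        for sentence in sentences:
            pairs.append((sentence, target))
    return pairs


def build_vocab(texts: List[str]) -> List[str]:
    """Whitespace-tokenise the texts and return tokens, most frequent first."""
    counts = Counter(tok for text in texts for tok in text.split())
    return [tok for tok, _ in counts.most_common()]


# Toy usage (invented example data, not taken from the dataset).
docs = {"d1": ["Mäuse werden untersucht", "Ziel ist die Krebsforschung"],
        "d2": ["Ohne Annotation"]}
labels = {"d1": "C50-C50|II"}
pairs = make_pairs(docs, labels)
src_vocab = build_vocab([s for s, _ in pairs])
tgt_vocab = build_vocab([t for _, t in pairs])
print(tgt_vocab)  # ['C50-C50', 'II']
```
        </preformat>
        <p>As we understand the tensorflow/nmt tutorial code, it reads parallel source and target text files with one example per line, so pairs like these would be written out as two aligned files together with the vocabularies.</p>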
        <p>A deep neural network model was built using a multi-layer RNN (Recurrent
Neural Network) with LSTM (Long Short-Term Memory) as its recurrent unit.</p>
        <sec id="sec-2-1-1">
          <title>2 http://www.animaltestinfo.de</title>
          <p>
            The deep neural network consists of an embedding layer, an encoder-decoder layer, a projection layer
and a loss layer. In the embedding layer, the input lines of each document and their corresponding code labels
are used to learn weight vectors over their respective vocabularies. Two hidden layers
are used for encoding and decoding. Two attention mechanisms, Normed
Bahdanau (NB) and Scaled Luong (SL) [
            <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
            ], are used. Softmax is used
as the activation function in the projection layer to obtain the ICD-10 codes for
the summaries. The loss computed in the loss layer is minimised by backpropagation
during training. The ICD-10 codes are then obtained from the trained
Seq2Seq model (Fig. 1). Our deep learning approach for semantic indexing is implemented in TensorFlow,
based on the code of the Neural Machine Translation (seq2seq) tutorial (https://github.com/tensorflow/nmt) [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], which was developed based on Seq2Seq models
[
            <xref ref-type="bibr" rid="ref2">13, 2, 10</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>3 https://github.com/tensorflow/nmt</title>
          <p>We have implemented two broad variations of the Seq2Seq model
by varying the attention model, each with a batch size of 128,
two encoder-decoder layers, a dropout of 0.2 and a bi-directional encoder. Further
variations differ in the post-processing and in the type of
input (attention model) used. The variations are described in Table 1.</p>
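          <p>To make the difference between the two attention variants concrete, the plain-Python sketch below computes one decoder step's alignment weights and context vector with each score function. It follows our understanding of TensorFlow's attention scoring: Scaled Luong multiplies the dot product of the decoder state and each encoder output by a learned scalar g, while Normed Bahdanau uses an additive score whose weight vector v is rescaled to length g. The learned query and key projections are assumed to have been applied already, g and b are fixed toy values rather than trained parameters, and all function names are hypothetical.</p>
          <preformat preformat-type="code">
```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_luong_scores(query: Vector, keys: List[Vector], g: float) -> Vector:
    """Multiplicative score: g * (query . key) for every encoder output."""
    return [g * dot(query, key) for key in keys]


def normed_bahdanau_scores(query: Vector, keys: List[Vector], v: Vector,
                           g: float, b: Vector) -> Vector:
    """Additive score: (g * v / ||v||) . tanh(key + query + b)."""
    norm = math.sqrt(dot(v, v))
    normed_v = [g * x / norm for x in v]
    return [dot(normed_v, [math.tanh(k + q + c) for k, q, c in zip(key, query, b)])
            for key in keys]


def context_vector(weights: Vector, values: List[Vector]) -> Vector:
    """Attention-weighted sum of the encoder outputs."""
    return [sum(w * value[i] for w, value in zip(weights, values))
            for i in range(len(values[0]))]


# Toy step: three encoder outputs and one decoder state. For simplicity the
# same vectors serve as keys (for scoring) and values (for the context).
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
w_sl = softmax(scaled_luong_scores(query, keys, g=1.0))
w_nb = softmax(normed_bahdanau_scores(query, keys, v=[1.0, 1.0], g=1.0, b=[0.0, 0.0]))
ctx_sl = context_vector(w_sl, keys)
ctx_nb = context_vector(w_nb, keys)
print(w_sl, w_nb)
```
          </preformat>
          <p>With these toy numbers the two scores favour different encoder outputs, which illustrates why the choice of attention model can change the generated codes. In the tensorflow/nmt code, the two variants are, to our knowledge, selected with --attention=scaled_luong and --attention=normed_bahdanau.</p>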
          <p>The performance of these variations, measured with the
evaluation script provided by CLEF eHealth 2019, is shown in Table 2.</p>
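          <p>For reference, the sketch below computes precision, recall and F1 over (document, code) pairs. It assumes micro-averaging, in which true positives, false positives and false negatives are pooled over all documents; the official evaluation script may compute the scores differently, so this is only meant to make the three measures explicit. The function name micro_prf is hypothetical.</p>
          <preformat preformat-type="code">
```python
from typing import Dict, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]],
              pred: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (document, code) pairs."""
    tp = fp = fn = 0
    for doc_id in set(gold).union(pred):
        gold_codes = gold.get(doc_id, set())
        pred_codes = pred.get(doc_id, set())
        tp += len(gold_codes.intersection(pred_codes))  # predicted and correct
        fp += len(pred_codes.difference(gold_codes))    # predicted but wrong
        fn += len(gold_codes.difference(pred_codes))    # gold codes missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example (invented data): one correct code, one wrong code, two missed codes.
gold = {"d1": {"C50-C50", "II"}, "d2": {"II"}}
pred = {"d1": {"C50-C50", "C00-C75"}, "d2": set()}
p, r, f = micro_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.33 0.4
```
          </preformat>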
          <p>Of these models, models 3, 4 and 6 were submitted as Run 1, Run 2 and
Run 3, respectively, for the shared task.</p>
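          <p>The runs differ in how the codes generated for the individual sentences of a summary are merged, as summarised in the Conclusions. The sketch below shows that post-processing under our reading that "at least two occurrences" means a code is generated for at least two sentences of the same summary: Runs 1 and 2 would correspond to the default call, and Run 3 to fallback_to_all=True. The function name combine_codes and the once-per-sentence counting are assumptions.</p>
          <preformat preformat-type="code">
```python
from collections import Counter
from typing import Iterable, List


def combine_codes(sentence_codes: Iterable[List[str]], min_count: int = 2,
                  fallback_to_all: bool = False) -> List[str]:
    """Merge the codes generated for the sentences of one summary.

    A code is kept if it was generated for at least min_count sentences.
    If fallback_to_all is True and no code reaches the threshold, every
    generated code is returned instead.
    """
    counts: Counter = Counter()
    for codes in sentence_codes:
        counts.update(set(codes))  # count each code at most once per sentence
    kept = [code for code, n in counts.items() if n >= min_count]
    if not kept and fallback_to_all:
        kept = list(counts)
    return sorted(kept)


# Toy usage with invented predictions for the sentences of a summary.
print(combine_codes([["C50-C50", "II"], ["II"], ["C00-C75"]]))     # ['II']
print(combine_codes([["C50-C50"], ["II"]]))                        # []
print(combine_codes([["C50-C50"], ["II"]], fallback_to_all=True))  # ['C50-C50', 'II']
```
          </preformat>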
          <p>The evaluation script provided by CLEF eHealth 2019 (https://github.com/mariananeves/clef19ehealth-task1) is used to
evaluate our models on both the development set and the test set. The scores on the
development set are higher than those on the test set.</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>4 https://github.com/mariananeves/clef19ehealth-task1</title>
          <p>Test performance might also be improved by building the model with the test set as its development
set. As Table 3 shows, Run 3 obtains more true positives than the other runs
on both the development set and the test set.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Results</title>
        <p>The final evaluation is performed on the test set provided by CLEF eHealth 2019,
which contains 407 summaries to be annotated.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Conclusions</title>
        <p>In this work, semantic indexing means annotating animal experiment summaries with codes
from the German version of ICD-10. We used a deep neural network with two
different attention models to index the summaries with medical codes
such as C50-C50|C00-C75|II. We split the documents into sentences, generated
codes for the sentences with respect to their documents, and combined the codes
of each document. Run 1 keeps the codes that occur at least twice and uses NB attention;
it obtains a precision, recall and F1 score of 0.19, 0.27 and 0.23 on the test
set of CLEF eHealth 2019. Run 2 uses the same combination with SL attention and obtains 0.19, 0.27
and 0.22. Run 3 also keeps the codes that occur at least twice, but uses all the generated
codes when no code reaches this threshold; it obtains 0.13, 0.34 and 0.19.
Further improvements might be obtained by using the Google Neural Machine Translation (GNMT)
architecture or Gated Recurrent Units (GRUs) instead of LSTM units.</p>
        <ref-list>
          <ref id="ref10">
            <mixed-citation>10. Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based
neural machine translation. arXiv preprint arXiv:1508.04025 (2015)</mixed-citation>
          </ref>
          <ref id="ref11">
            <mixed-citation>11. Neves, M., Butzke, D., Dörendahl, A., Leich, N., Hummel, B., Schönfelder, G.,
Grune, B.: Overview of the CLEF eHealth 2019 Multilingual Information
Extraction. In: Crestani, F., Braschler, M., Savoy, J., Rauber, A., et al. (eds.)
Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the
Tenth International Conference of the CLEF Association (CLEF 2019). Lecture
Notes in Computer Science. Springer, Berlin Heidelberg, Germany (2019)</mixed-citation>
          </ref>
          <ref id="ref12">
            <mixed-citation>12. Skull, S.A., Andrews, R.M., Byrnes, G.B., Campbell, D.A., Nolan, T.M., Brown,
G.V., Kelly, H.A.: ICD-10 codes are a valid tool for identification of pneumonia
in hospitalized patients aged ≥65 years. Epidemiology &amp; Infection 136(2), 232–240
(2008)</mixed-citation>
          </ref>
          <ref id="ref13">
            <mixed-citation>13. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural
networks. In: Advances in Neural Information Processing Systems. pp. 3104–3112
(2014)</mixed-citation>
          </ref>
          <ref id="ref14">
            <mixed-citation>14. Thygesen, S.K., Christiansen, C.F., Christensen, S., Lash, T.L., Sørensen, H.T.:
The predictive value of ICD-10 diagnostic coding used to assess Charlson comorbidity
index conditions in the population-based Danish National Registry of Patients. BMC
Medical Research Methodology 11(1), 83 (2011)</mixed-citation>
          </ref>
        </ref-list>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alotaibi</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senthilselvan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McMurtry</surname>
            ,
            <given-names>M.S.:</given-names>
          </string-name>
          <article-title>The validity of ICD codes coupled with imaging procedure codes for identifying acute venous thromboembolism using administrative data</article-title>
          .
          <source>Vascular Medicine</source>
          <volume>20</volume>
          (
          <issue>4</issue>
          ),
          <fpage>364</fpage>
          -
          <lpage>368</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.:</given-names>
          </string-name>
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bergen</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beghi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medina</surname>
            ,
            <given-names>M.T.:</given-names>
          </string-name>
          <article-title>Revising the ICD-10 codes for epilepsy and seizures</article-title>
          .
          <source>Epilepsia</source>
          <volume>53</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>5</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Germaine-Smith</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Metcalfe</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pringsheim</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>J.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beck</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hemmelgarn</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McChesney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jette</surname>
            ,
            <given-names>N.:</given-names>
          </string-name>
          <article-title>Recommendations for optimal ICD codes to study neurologic conditions: a systematic review</article-title>
          .
          <source>Neurology</source>
          <volume>79</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1049</fpage>
          -
          <lpage>1055</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hohl</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddekopp</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stausberg</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>ICD-10 codes used to identify adverse drug events in administrative data: a systematic review</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>21</volume>
          (
          <issue>3</issue>
          ),
          <fpage>547</fpage>
          -
          <lpage>557</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jette</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beghi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hesdorffer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moshé</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuberi</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medina</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>ICD coding for epilepsy: past, present, and future - a report by the International League Against Epilepsy Task Force on ICD codes in epilepsy</article-title>
          .
          <source>Epilepsia</source>
          <volume>56</volume>
          (
          <issue>3</issue>
          ),
          <fpage>348</fpage>
          -
          <lpage>355</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , et al.:
          <article-title>CLEF eHealth 2019 evaluation lab</article-title>
          . In:
          <source>European Conference on Information Retrieval</source>
          . pp.
          <fpage>267</fpage>
          -
          <lpage>274</lpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scells</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Overview of the CLEF eHealth evaluation lab 2019</article-title>
          . In: Crestani, F., Braschler, M., Savoy, J., Rauber, A., et al. (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Tenth International Conference of the CLEF Association (CLEF 2019)</source>
          . Lecture Notes in Computer Science. Springer, Berlin Heidelberg, Germany (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>Neural machine translation (seq2seq) tutorial</article-title>
          . https://github.com/tensorflow/nmt (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>