<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Topic Modelling of the Czech Supreme Court Decisions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Tereza NOVOTNÁ</string-name>
          <email>tereza.novotna@mail.muni.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Jakub HARAŠTA</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Jakub KÓL</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Atlas Consulting spol. s r.o.</institution>
          ,
          <addr-line>Ostrava</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Law and Technology, Masaryk University</institution>
          ,
<addr-line>Brno, Czech Republic</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>The Czech Supreme Court has produced a significant number of decisions, totalling more than 130 000 since 1993. This volume makes it difficult for legal practitioners to research the case law. This work focuses on topic models for enhanced information retrieval through the identification of case law addressing the same or similar issues. We provide an initial quantitative evaluation of Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) models according to the CV coherence score for different numbers of modelled topics nk = {10, 20, ..., 90, 100}. Additionally, we provide a qualitative evaluation of the LDA and NMF models with nk = {20, 30} that will serve as a starting point for a subsequent expert-user evaluation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>The Czech Supreme Court has produced a significant number of decisions, totalling more than
130 000 since 1993. This volume makes it difficult for legal practitioners to
research the case law. In this paper, we apply topic modelling methods in order to make
court decision retrieval less time-consuming and more efficient.</p>
      <p>Ultimately, our aim is to provide more accurate legal information retrieval methods
that take into consideration the specifics of Czech law and the Czech language and are
extensively evaluated by lawyers knowledgeable in Czech law and practicing within
the jurisdiction.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The general purpose of topic modelling methods is to discover underlying topic structures
in a given set of documents. These topics are probability distributions over a set of
words. Topic modelling is beneficial for many information retrieval tasks. It is
fundamentally unsupervised; however, it has many supervised or semi-supervised extensions and
applications [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For our experiment, we selected two well-known topic modelling
algorithms, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization
(NMF).
      </p>
      <p>
        LDA was first introduced as an unsupervised model in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It was successfully used
for classifying journalistic texts in [
        <xref ref-type="bibr" rid="ref13 ref6">6,13</xref>
        ] and applied to Twitter posts in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
demonstrating that the method may not be the best option for short texts. Additionally,
it was used for summarizing scientific papers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In legal IR, the method is used for
topic detection and clustering of similar documents [
        <xref ref-type="bibr" rid="ref11 ref8">8,11</xref>
        ]. NMF was first introduced in
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and gained subsequent popularity through [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] as an innovative data clustering
algorithm. The NMF algorithm was used for polyphonic music transcription [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and for document
clustering in [
        <xref ref-type="bibr" rid="ref19 ref9">9,19</xref>
        ].
      </p>
      <p>
        Part of the research in topic modelling focuses on comparing different models. For
this purpose, various extrinsic and intrinsic evaluation methods have been designed [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
The coherence score, one of the intrinsic methods, is a metric expressing the logical order or
coherence of topics and thus enables automatic (quantitative) validation and comparison.
The coherence calculation is applied to the most important words of each topic, and the result is
then computed as the sum or arithmetic mean of these values. The CV (sometimes
referred to as "c_v") coherence score, used as an automatic validation measure in
this paper, was introduced in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. It is based on a sliding window, one-set segmentation
of the top words, and an indirect confirmation measure that uses normalized pointwise
mutual information (NPMI) and cosine similarity [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
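      <p>As a rough illustration of the idea behind such measures, the following sketch scores a pair of topic words by their NPMI over a toy corpus, averaged over all word pairs. This is a simplification: the actual CV measure additionally uses a boolean sliding window, one-set segmentation and cosine similarity of NPMI vectors. The function name and the toy documents are illustrative, not part of the paper's pipeline.</p>

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents, eps=1e-12):
    """Simplified topic coherence: mean pairwise NPMI of a topic's top
    words, using whole documents as the co-occurrence window."""
    n_docs = len(documents)
    doc_sets = [set(d) for d in documents]

    def p(*words):
        # fraction of documents in which all given words co-occur
        return sum(all(w in ds for w in words) for ds in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: minimal NPMI
            continue
        pmi = math.log(p12 / (p1 * p2))
        scores.append(pmi / -math.log(p12 + eps))  # normalize to [-1, 1]
    return sum(scores) / len(scores)

docs = [["court", "appeal", "decision"],
        ["court", "decision", "contract"],
        ["appeal", "court", "procedure"],
        ["inheritance", "testator", "child"]]

coherent = npmi_coherence(["court", "appeal"], docs)      # words that co-occur
incoherent = npmi_coherence(["court", "testator"], docs)  # words that never do
```

A higher score indicates that the topic's top words tend to appear in the same documents, which is the intuition the CV score builds on.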
      <p>
        It has been reported that NMF outperforms LDA in topic coherence score on
corpora of relatively short texts [
        <xref ref-type="bibr" rid="ref2 ref4">2,4</xref>
        ]. On the other hand, LDA provided more coherent
topics in the case of longer texts and outperformed NMF [
        <xref ref-type="bibr" rid="ref12 ref18">12,18</xref>
        ]. Considering that the
purpose of our research is to provide more accurate legal information retrieval, expert-user
evaluation of the relevance of topics assigned to documents is more important than the
coherence measure. However, in this phase of our research, we use the coherence measure to
compare LDA and NMF in different settings (different numbers of topics nk) to identify
the most coherent setting to be later subjected to expert-user evaluation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>
        We used the CzCDC 1.0 corpus [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], specifically its Supreme Court subset containing
111 977 decisions (published between 01/1994 and 09/2018). We removed so-called
unifying opinions due to their highly specific nature (both substantive and procedural),
which left us with 111 187 decisions. Subsequently, we removed headers (containing
names of judges, identification of parties and their representatives, etc.), numbers,
punctuation symbols and stop words based on a general list of Czech stop words.
We then used part-of-speech tagging to select nouns and adjectives (because these are
the usual bearers of meaning in legal language), applied lemmatization, and removed
all words shorter than three characters. Finally, all the remaining
text was transformed into lower case. We used the spaCy Python library with
its extension for the Czech language via ud-pipe.2
      </p>
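      <p>The normalization steps above, minus POS tagging and lemmatization (which the actual pipeline delegates to spaCy with ud-pipe), can be sketched with the standard library alone. The stop-word set and the sample sentence below are illustrative, not the actual resources used.</p>

```python
import re

CZECH_STOPWORDS = {"a", "se", "na", "je", "ze", "o"}  # illustrative subset only

def normalize(text, stopwords=CZECH_STOPWORDS, min_len=3):
    # Steps from Section 3, without POS tagging and lemmatization:
    text = re.sub(r"[0-9]+", " ", text)   # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    tokens = text.lower().split()         # lower-case and tokenize
    # drop stop words and all words shorter than three characters
    return [t for t in tokens if t not in stopwords and len(t) >= min_len]

tokens = normalize("Rozsudek c. 21 Cdo 1096/2021 se zrusuje.")
# -> ['rozsudek', 'cdo', 'zrusuje']
```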
      <sec id="sec-3-1">
        <title>2https://pypi.org/project/spacy-udpipe/.</title>
        <p>We relied on the LDA and NMF implementations in the gensim package.3 One
of the parameters to be set for both methods is extremes filtering, which removes words
that appear either very often or very rarely. We removed all tokens appearing in fewer than
5 documents and all tokens appearing in more than half of the documents. The upper
limit stems from the fact that the Czech Supreme Court serves as an apex court for civil
and criminal cases. Hence, if a token appears in more than half of the documents, we
assumed that it appears in both the civil and criminal branches as a very common term without
any specific legal importance. Furthermore, the NMF model is built on a tf-idf corpus.</p>
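        <p>The filtering rule mirrors gensim's Dictionary.filter_extremes(no_below=5, no_above=0.5). A minimal pure-Python sketch of the same rule, with a smaller no_below to suit the toy corpus (the Czech tokens are illustrative):</p>

```python
from collections import Counter

def filter_extremes(tokenized_docs, no_below=5, no_above=0.5):
    """Keep only tokens appearing in at least `no_below` documents and
    in at most a `no_above` fraction of all documents."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each token once per document
    keep = {t for t, df in doc_freq.items()
            if df >= no_below and df / n_docs <= no_above}
    return [[t for t in doc if t in keep] for doc in tokenized_docs]

docs = [["soud", "odvolani", "x"],
        ["soud", "smlouva"],
        ["soud", "odvolani"],
        ["soud", "rizeni"]]

# "soud" (court) appears in every document, so it is dropped as a very
# common term; rare one-off tokens are dropped by the lower bound.
filtered = filter_extremes(docs, no_below=2, no_above=0.5)
# -> [['odvolani'], [], ['odvolani'], []]
```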
        <p>We used the resulting dataset to train both LDA and NMF models over ten instances
with different numbers of topics nk = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}.</p>
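        <p>The models themselves were trained with gensim (see footnote 3). As a self-contained illustration of what the NMF factorization computes, the following sketch implements Lee-Seung multiplicative updates with numpy on a toy document-term matrix; the matrix, the number of topics k and the iteration count are all illustrative.</p>

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Minimal NMF via Lee-Seung multiplicative updates: factor a
    non-negative document-term matrix V (docs x terms) into topic
    weights W (docs x k) and topic-word weights H (k x terms)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k))
    H = rng.random((k, n_terms))
    for _ in range(n_iter):
        # updates keep all entries non-negative and reduce ||V - WH||
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "corpus": two civil-law documents, two criminal-law documents,
# over four terms; the matrix has an exact non-negative rank-2 factorization.
V = np.array([[2., 1., 0., 0.],
              [4., 2., 0., 0.],
              [0., 0., 1., 3.],
              [0., 0., 2., 6.]])
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)  # reconstruction error, close to zero here
```

Each row of H can then be read as a "topic" whose largest entries are the topic's most important terms, analogous to the topics inspected in Section 4.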
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>
        The CV coherence score comparing LDA and NMF models for the same nk is presented
in Figure 1. The LDA model with nk = 30 and the NMF model with nk = 20 achieved the
highest CV coherence scores, at 0.6387 and 0.7418 respectively. At the same time, all
instances of the NMF models achieved a higher CV coherence score than the LDA
models with the same nk. All models and their topics as pyLDAvis graphs are available on our
GitHub page.4 Generally, the scores for both methods in different settings are relatively high
compared to those reported in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        This means that the topics are composed of highly coherent keywords or semantically
related terms. Following the interpretation in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], the topics and their distribution in the
corpus of court decisions should be relatively similar to human evaluation. To confirm
this assumption, further qualitative analysis is necessary. As is evident from [
        <xref ref-type="bibr" rid="ref13 ref16 ref3">3,13,16</xref>
        ],
the correlation between human evaluation and the coherence score of different models varies
greatly and depends on various parameters. It is therefore not sufficient to declare success in this
experiment based on the CV coherence score of different models alone. The initial
qualitative analysis presented here is a precursor to a larger expert-user evaluation.
      </p>
      <p>Given the scope of this paper, it is impossible to provide a qualitative evaluation
of all twenty models. For a small-scale initial qualitative analysis, we selected the LDA
model with the highest CV coherence score (nk = 30) and the NMF model with the highest
CV coherence score (nk = 20), plus the corresponding NMF model with nk = 30.</p>
      <p>The list of topics for both selected models reveals interesting tendencies. First of all,
while the NMF model offers a higher CV coherence score overall, its topics are more general;
LDA topics are more specific. LDA can therefore discover less common topics
that the NMF method does not reveal at all. For example, LDA topic no. 29 contains words
related to inheritance ("child", "adult", "inheritance", "testator", etc.). The most coherent NMF
model does not contain these words at all. This may be partly explained by that model
having a third fewer topics, but even the 30-topic NMF model does not contain this topic.
Looking at the 30-topic NMF model, a higher number of topics leads to more general
or even interchangeable topics (for example, topics no. 15 and no. 16).</p>
      <sec id="sec-4-1">
        <title>3https://radimrehurek.com/gensim/.</title>
        <p>4https://github.com/tm-czech-supreme-court/lda-nmf-models</p>
        <p>[Figure 1: CV coherence score (y-axis, approx. 0.55-0.75) of the LDA and NMF models for different numbers of topics.]</p>
        <p>At the same time, the NMF and LDA models differ in their tendencies towards more substantive (LDA) or more
procedural (NMF) topics. Both of these approaches can be used in practice, as lawyers
research case law focusing on both procedural and substantive aspects. As such, these
tendencies deserve specific consideration within the subsequent expert-user evaluation.</p>
        <p>Similarly, there are differences between models related to civil law and criminal law
Supreme Court decisions. The 20- and 30-topic NMF models tend to cluster criminal
law terms into a few main topics without further distinction; the 30-topic LDA model,
on the other hand, tends to contain more criminal law topics with finer distinctions. The
20-topic NMF model contains 3 criminal law topics, the 30-topic NMF model contains
4, and the 30-topic LDA model contains 5 purely criminal topics.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>The quantitative results show that the CV coherence score for all models is relatively high and
that both methods provide relatively meaningful and coherent legal topics. NMF models
generally score higher on the CV coherence score, while LDA models appear to provide
more detailed and specific topics. The qualitative analysis also shows that NMF
models are better at identifying procedural topics, while LDA models are better at
identifying substantive topics. This suggests that the ability of these models to enable
better information retrieval is goal-specific.</p>
      <p>This short paper is part of a larger project; our future work requires extensive
expert-user evaluation to determine whether the results of our initial quantitative and qualitative
analysis of the LDA and NMF models are supported by expert-user experience. The expert-user
evaluation will include topics identified by both the LDA and NMF models, and expert users
will be tasked with identifying which topics describe specific court decisions more
accurately.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This publication was supported by Masaryk University (MUNI/A/1454/2019, Automatic
Processing of Court Decisions: User Experiment). We would like to thank Vincent Kříž
for his consultations and advice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Blei</surname>
            <given-names>DM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            <given-names>AY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            <given-names>MI</given-names>
          </string-name>
          .
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2003</year>
          ,
          <volume>3</volume>
          (
          <issue>4-5</issue>
          ), p.
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>O'Callaghan</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Green</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carthy</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
            <given-names>P</given-names>
          </string-name>
          .
          <article-title>An analysis of the coherence of descriptors in topic modeling</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <year>2015</year>
          ,
          <volume>42</volume>
          (
          <issue>13</issue>
          ), p.
          <fpage>5645</fpage>
          -
          <lpage>5657</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyd-Graber</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerrish</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blei</surname>
            <given-names>DM</given-names>
          </string-name>
          .
          <article-title>Reading Tea Leaves: How Humans Interpret Topic Models</article-title>
          .
          <source>Proceedings of Neural Information Processing Systems (NIPS)</source>
          <year>2009</year>
          , p.
          <fpage>288</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Chen</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Experimental explorations on short text topic mining between LDA and NMF based Schemes</article-title>
          .
          <source>Knowledge-Based Systems</source>
          ,
          <year>2019</year>
          ,
          <volume>163</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>He</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuge</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Exploring Differential Topic Models for Comparative Summarization of Scientific Papers</article-title>
          .
          <source>Proceedings of COLING</source>
          <year>2016</year>
          ,
          <source>the 26th International Conference on Computational Linguistics</source>
          , p.
          <fpage>1028</fpage>
          -
          <lpage>1038</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Jacobi</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Atteveldt</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welbers</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Quantitative analysis of large amounts of journalistic texts using topic modelling</article-title>
          .
          <source>Digital Journalism</source>
          ,
          <year>2016</year>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>89</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Jonsson</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stolee</surname>
            <given-names>J.</given-names>
          </string-name>
          <article-title>An Evaluation of Topic Modelling Techniques for Twitter</article-title>
          . Research paper.
          <year>2016</year>
          , https://www.cs.toronto.edu/~jstolee/projects/topic.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kumar</surname>
            <given-names>VR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghuveer</surname>
            <given-names>K.</given-names>
          </string-name>
          .
          <article-title>Legal Document Summarization using Latent Dirichlet Allocation</article-title>
          .
          <source>International Journal of Computer Science and Telecommunications</source>
          ,
          <year>2012</year>
          ,
          <volume>3</volume>
          (
          <issue>7</issue>
          ), p.
          <fpage>114</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Laxmi</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            <given-names>PK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shankar</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakshmanaprabu</surname>
            <given-names>SK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidhyavathi</surname>
            <given-names>RM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maseleno</surname>
            <given-names>A.</given-names>
          </string-name>
          .
          <article-title>Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction</article-title>
          .
          <source>International Journal of Parallel Programming</source>
          ,
          <year>2020</year>
          ,
          <volume>48</volume>
          (
          <issue>3</issue>
          ), p.
          <fpage>496</fpage>
          -
          <lpage>514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Lee</surname>
            <given-names>DD</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seung</surname>
            <given-names>HS</given-names>
          </string-name>
          .
          <article-title>Learning the parts of objects by non-negative matrix factorization</article-title>
          .
          <source>Nature</source>
          ,
          <year>1999</year>
          ,
          <volume>401</volume>
          (
          <issue>6755</issue>
          ), p.
          <fpage>788</fpage>
          -
          <lpage>791</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Lu</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conrad</surname>
            <given-names>JG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Kofahi</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keenan</surname>
            <given-names>W</given-names>
          </string-name>
          .
          <article-title>Legal Document Clustering with Built-in Topic Segmentation</article-title>
          .
          <source>Proceedings of the 20th ACM International Conference on Information and Knowledge Management</source>
          <year>2011</year>
          , p.
          <fpage>383</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Mifrah</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>Topic Modeling Coherence: A Comparative Study between LDA and NMF Models using COVID'19 Corpus</article-title>
          .
          <source>International Journal of Advanced Trends in Computer Science and Engineering</source>
          ,
          <year>2020</year>
          ,
          <volume>9</volume>
          (
          <issue>4</issue>
          ), p.
          <fpage>5756</fpage>
          -
          <lpage>5761</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Nikolenko</surname>
            <given-names>SI</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koltcov</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koltsova</surname>
            <given-names>O</given-names>
          </string-name>
          .
          <article-title>Topic modelling for qualitative studies</article-title>
          .
          <source>Journal of Information Science</source>
          ,
          <year>2017</year>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>88</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Novotná</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harašta</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>The Czech Court Decisions Corpus (CzCDC): Availability as the First Step</article-title>
          .
          <year>2019</year>
          , arXiv:1910.09513.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Paatero</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tapper</surname>
            <given-names>U</given-names>
          </string-name>
          .
          <article-title>Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values</article-title>
          .
          <source>Environmetrics</source>
          ,
          <year>1994</year>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ), p.
          <fpage>111</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
<surname>Röder</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Both</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinneburg</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Exploring the Space of Topic Coherence Measures</article-title>
          .
          <source>Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM</source>
          <year>2015</year>
          , p.
          <fpage>399</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Smaragdis</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
<surname>Brown</surname>
            <given-names>JC</given-names>
          </string-name>
          .
          <article-title>Non-negative matrix factorization for polyphonic music transcription</article-title>
          .
          <source>2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics</source>
          , p.
          <fpage>177</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Suri</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            <given-names>NR</given-names>
          </string-name>
          .
<article-title>Comparison between LDA and NMF for event-detection from large text stream data</article-title>
          .
          <source>Proceedings of the 3rd International Conference on Computational Intelligence Communication Technology (CICT</source>
          <year>2017</year>
          ), p.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Xu</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gong</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Document clustering based on non-negative matrix factorization</article-title>
          .
          <source>Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          <year>2003</year>
          , p.
          <fpage>267</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>