<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Analysis of Topic Modelling for Legislative Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>James O' Neill</string-name>
          <email>james.oneill@insight-centre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leona O' Brien</string-name>
          <email>leona.obrien@ucc.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cécile Robin</string-name>
          <email>cecile.robin@insight-centre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Buitelaar</string-name>
          <email>paul.buitelaar@insight-centre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Governance, Risk and Compliance Technology Centre, University College Cork</institution>
          ,
          <addr-line>Cork</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Insight Centre for Data Analytics</institution>
          ,
          <addr-line>IDA Business Park, Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>16</volume>
      <issue>2017</issue>
      <abstract>
        <p>The volume of legislative documents has risen dramatically within the past decade, making it difficult for law practitioners to keep up with legislation such as Statutory Instrument orders and Acts. This work focuses on the use of topic models for summarizing and visualizing British legislation, with a view toward easier browsing and identification of salient legal topics and their respective sets of topic-specific terms. We provide an initial qualitative evaluation from a legal expert on how the models have performed, by ranking them for each jurisdiction according to topic coherency and relevance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>The legal domain is experiencing a major shift towards automated
tools for tasks that are becoming increasingly difficult for legal
practitioners to carry out, owing to the rate of change in the domain.
Regulatory change (RC) is a notable area that has gained attention in
recent years because of the difficulties it creates for compliance.
To build automated solutions for compliance and verification,
automated knowledge acquisition is imperative. An initial step
towards such a system is an overview, or summarization, of the core
topics within the domain, so that salient terms potentially associated
with compliance can be identified within those topics across various
documents. Many approaches in legal systems require metadata from an
XML schema to carry out analyses such as topic modelling. This paper
analyzes the use of topic models to do this automatically from raw
text. We start with a background to the models used for testing.</p>
    </sec>
    <sec id="sec-2">
      <title>TOPIC MODELLING</title>
    </sec>
    <sec id="sec-3">
      <title>Dimensionality Reduction Approaches</title>
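      <p>A basic approach to modelling topics is to view a corpus as a set
of term frequencies (tf), where the weight of each term also depends
on its inverse document frequency (idf); e.g. “and” occurs in nearly
every document, therefore its weight is low. Formally,</p>
      <p>
        <disp-formula>
          <tex-math>\mathrm{tfidf}(t, d) = f_{t,d} \cdot \log \frac{N}{n_t}</tex-math>
        </disp-formula>
      </p>
      <p>where f<sub>t,d</sub> is the frequency of term t in document d, N represents the number of documents and n<sub>t</sub> is the number of documents that contain t.</p>
      <p>2.1.1 Non-negative Matrix Factorization. NMF approximates the tf-idf
matrix M by the product of two lower-rank non-negative matrices W and
H, chosen to minimize the squared Frobenius norm of the
reconstruction error:</p>
      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\frac{1}{2} \lVert M - WH \rVert_F^2 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( M_{ij} - (WH)_{ij} \right)^2</tex-math>
        </disp-formula>
      </p>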
      <p>
        Also described by Lee and Seung [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the multiplicative update
algorithm updates W and H in alternation. Both update rules
are outlined in Equation 2. The minimization is constrained so that
W and H remain non-negative, and the distance D between M and W H
is therefore also non-negative.
      </p>
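      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>H_{\alpha\mu} := H_{\alpha\mu} \, \frac{(W^{T} M)_{\alpha\mu}}{(W^{T} W H)_{\alpha\mu}}, \qquad W_{i\alpha} := W_{i\alpha} \, \frac{(M H^{T})_{i\alpha}}{(W H H^{T})_{i\alpha}}</tex-math>
        </disp-formula>
      </p>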
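      <p>The multiplicative update rules can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the solver used in this work (which relies on Coordinate Descent): the helper names, the toy matrix, the random initialization and the small constant eps that guards against division by zero are all hypothetical choices made for the example.</p>
      <preformat>
```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def objective(m, w, h):
    """Half the squared Frobenius norm of M - WH (the NMF objective)."""
    wh = matmul(w, h)
    return 0.5 * sum((m[i][j] - wh[i][j]) ** 2
                     for i in range(len(m)) for j in range(len(m[0])))


def nmf_multiplicative(m, k, iterations=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix M (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, cols = len(m), len(m[0])
    # Strictly positive random start so that the updates stay well defined.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(k)]
    history = [objective(m, w, h)]
    for _ in range(iterations):
        # H update: H * (W^T M) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, m)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][u] * num[a][u] / (den[a][u] + eps) for u in range(cols)]
             for a in range(k)]
        # W update: W * (M H^T) / (W H H^T), element-wise, using the new H.
        ht = transpose(h)
        num = matmul(m, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
        history.append(objective(m, w, h))
    return w, h, history


# Toy 4 x 4 matrix with two obvious blocks (hypothetical data).
M = [[3.0, 2.0, 0.0, 0.0],
     [2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 4.0],
     [0.0, 0.0, 2.0, 5.0]]
W, H, history = nmf_multiplicative(M, k=2)
print("objective before and after:", history[0], history[-1])
```
      </preformat>
      <p>Because every factor in the updates is non-negative, W and H never leave the non-negative orthant, which is how the constraint is enforced without an explicit projection step.</p>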
      <p>
        In this work, instead of using gradient descent to minimize the
sum of squared (Euclidean) distances (SSD) between M and W H, we
use the Coordinate Descent solver. Lin et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] describe the
process that builds upon the multiplicative update algorithm by applying
Alternating Non-negative Least Squares (ANLS) with projected
gradient descent, a parameter estimator with lower-bounded
constraints. Although NMF is widely used for topic modelling [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ],
it is sometimes known to produce non-meaningful topics,
particularly if the term-document matrix is relatively sparse. Therefore,
identifying both rare and non-distinct terms for removal before
factorization is an important step. Furthermore, NMF can
be prone to local minima.
      </p>
      <p>2.1.2 Singular Value Decomposition. SVD decomposes a
matrix into three parts, as shown in Equation 3, in order to find
a lower-rank<sup>1</sup> approximation of the term-document matrix.
Consider M to be a tf-idf matrix representation of the corpus, where
U diagonalizes MM<sup>T</sup> and u<sub>i</sub> represents the corresponding
eigenvectors. Similarly, V<sup>∗</sup> diagonalizes M<sup>T</sup>M and v<sub>i</sub> represents the
eigenvectors of M<sup>T</sup>M. The diagonal values of Σ are the ordered singular values<sup>2</sup>.</p>
      <p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>M = U \Sigma V^{*}</tex-math>
        </disp-formula>
      </p>
      <p>
        SVD on a term-document matrix is also referred to as Latent
Semantic Analysis (LSA), as the lower-rank approximation of M is said
to represent a latent semantic space. In information retrieval, it is
referred to as Latent Semantic Indexing (LSI), where SVD is used to
index documents by representing documents (document-document)
and terms (document-term, where terms are query terms) in a vector
space whose elements give the degree to which a term or document
is associated with a given topic. The similarity between a
query and a given set of documents can then be determined using a
term-topic matrix [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. This is particularly helpful for distinguishing
polysemous and synonymous terms.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Latent Dirichlet Allocation</title>
      <p>
        Latent Dirichlet Allocation (LDA) was first introduced by Blei et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
has since been a state-of-the-art (SoTA) topic model, shown to be
more expressive than probabilistic LSA (pLSA) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. LDA builds
a Bayesian generative model using Dirichlet priors for topic mixtures
(an assumed prior probability for each topic distribution; the Dirichlet is
a distribution over categorical distributions in this sense), in contrast to pLSA,
which can be considered to use a uniform prior distribution for the topic
mixtures. Further extensions have since been made to adapt this model
to a continuous space setting, in which continuous word embeddings
are used. Categorical distributions
are replaced with multivariate Gaussian distributions, meaning that
Gaussian LDA is capable of handling out-of-vocabulary
words in unseen text [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In LDA, the probability of a word w depends
on its topic k in z, which in turn depends on the topic mixture
θ<sub>d</sub> of the document, drawn from a Dirichlet prior α. Likewise, w also
depends on the probability ϕ that w belongs to topic k.
      </p>
      <p>
        The LDA generative process is described by Blei et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For each
document d, a parameter θ<sub>d</sub> is chosen from a Dirichlet prior
distribution; then, for each word in d, a topic z<sub>w</sub> is chosen according to
θ<sub>d</sub>. A word w is generated afterwards, given the topic z<sub>w</sub>
and β.
      </p>
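      <p>The generative story above can be simulated directly. The sketch below is illustrative only: the two-topic β matrix, the vocabulary, the value of α and the function names are hypothetical, and the Dirichlet draw is obtained by normalizing independent Gamma samples.</p>
      <preformat>
```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(alpha, beta, vocabulary, length, rng):
    """Generate one document following the LDA generative process."""
    theta_d = sample_dirichlet(alpha, rng)       # topic mixture of the document
    topics, words = [], []
    for _ in range(length):
        z = sample_categorical(theta_d, rng)     # topic for this word position
        w = sample_categorical(beta[z], rng)     # word given the topic and beta
        topics.append(z)
        words.append(vocabulary[w])
    return theta_d, topics, words


rng = random.Random(42)
vocabulary = ["biomass", "fuel", "housing", "allowance"]   # toy vocabulary
beta = [[0.45, 0.45, 0.05, 0.05],                          # topic 0: energy terms
        [0.05, 0.05, 0.45, 0.45]]                          # topic 1: benefit terms
alpha = [0.5, 0.5]                                         # symmetric Dirichlet prior
theta_d, topics, words = generate_document(alpha, beta, vocabulary, 8, rng)
print(theta_d, topics, words)
```
      </preformat>
      <p>Inference in LDA runs this process in reverse: given only the observed words, it estimates the θ<sub>d</sub> and β that most plausibly generated them.</p>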
      <p>The aforementioned Gaussian LDA represents these words as
continuous embedded vectors instead of discrete co-occurrence counts,
replacing the categorical distributions for zn and wn with Gaussians.</p>
      <p>
        The saliency of terms within a topic is considered by Chuang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
formulated in Equation 4. A distinctive word w is a word that has a
higher log-likelihood of being in a topic K compared to a random
word. Hence, if a word w occurs in many topics it is non-informative,
resulting in lower saliency. More informative topic-specific terms
and less general terms are desired by the legal practitioners, hence
we use this saliency measure in our analysis.
      </p>
      <p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>S(w) = P(w) \sum_{K} P(K \mid w) \log \frac{P(K \mid w)}{P(K)}</tex-math>
        </disp-formula>
      </p>
      <p><sup>1</sup> The rank of a matrix is the number of linearly independent column vectors in the matrix (e.g. a document-term matrix), which can be used to reconstruct all column vectors.</p>
      <p><sup>2</sup> Singular values are the square roots of the eigenvalues of M<sup>T</sup>M.</p>
      <p>
        Sievert and Shirley [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] describe the relevance measure
shown in Equation 5, where ϕ<sub>k</sub>(w) is the probability of w for topic k and p(w)
is the probability of observing w in corpus D. In this work, λ
can be chosen between 0 and 1. We set λ according to term relevance
judgments made by a legal practitioner, prior to the final analysis of
each topic model.
      </p>
      <p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>r(w, k \mid \lambda) = \lambda \log \phi_k(w) + (1 - \lambda) \log \frac{\phi_k(w)}{p(w)}</tex-math>
        </disp-formula>
      </p>
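      <p>To make the saliency and relevance measures concrete, the sketch below evaluates both on a toy two-topic model. It is only an illustration: the topic-word probabilities, the topic prior and the function names are hypothetical, and p(w) is taken to be the marginal word probability under the model, which is an assumption rather than the corpus estimate used in this work.</p>
      <preformat>
```python
import math


def word_marginals(phi, topic_prior):
    """p(w) as the topic-weighted average of the topic-word probabilities."""
    return [sum(topic_prior[k] * phi[k][w] for k in range(len(phi)))
            for w in range(len(phi[0]))]


def saliency(phi, topic_prior):
    """S(w) = P(w) * sum_K P(K|w) * log(P(K|w) / P(K))."""
    scores = []
    for w, p_w in enumerate(word_marginals(phi, topic_prior)):
        distinctiveness = 0.0
        for k, p_k in enumerate(topic_prior):
            p_k_given_w = p_k * phi[k][w] / p_w      # Bayes rule
            if p_k_given_w > 0:
                distinctiveness += p_k_given_w * math.log(p_k_given_w / p_k)
        scores.append(p_w * distinctiveness)
    return scores


def relevance(phi, topic_prior, k, lam):
    """r(w, k | lambda) for every word w; assumes phi[k][w] is positive."""
    p_w = word_marginals(phi, topic_prior)
    return [lam * math.log(phi[k][w]) + (1 - lam) * math.log(phi[k][w] / p_w[w])
            for w in range(len(p_w))]


vocabulary = ["regulation", "biomass", "housing"]   # toy vocabulary
phi = [[0.5, 0.4, 0.1],                             # topic 0
       [0.5, 0.1, 0.4]]                             # topic 1
topic_prior = [0.5, 0.5]

sal = saliency(phi, topic_prior)
rel_specific = relevance(phi, topic_prior, k=0, lam=0.25)
rel_frequent = relevance(phi, topic_prior, k=0, lam=1.0)
for word, s, r in zip(vocabulary, sal, rel_specific):
    print(word, round(s, 4), round(r, 4))
```
      </preformat>
      <p>In this toy model “regulation” is equally likely under both topics, so its saliency is zero, and with λ = 0.25 it ranks below “biomass” for topic 0 even though it is the more probable word; with λ = 1 the ordering reverses.</p>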
    </sec>
    <sec id="sec-5">
      <title>Saffron</title>
      <p>Saffron is a software tool<sup>3</sup> that can construct a model-free topic
hierarchy. It extracts terms related to the domain of expertise, establishes
semantic relations between them, and constructs a taxonomy out of
them. Saffron also deals with multi-word expressions, which can improve
topic coherency, as phrases are often necessary for better readability
and understanding.</p>
      <p>
        Saffron builds the topic hierarchy of a corpus by first capturing
the expertise domain through a domain model represented as a list of
single words. The latter is extracted using feature selection during a term
and linguistic pattern extraction phase. It uses constraints such as
limiting candidates to contentful parts of speech, to single words (in order to
target a more generic level) and to terms distributed across at least a
quarter of the corpus (for specificity to the area of expertise). Topic
coherency, which statistically driven models must achieve before
Subject Matter Experts (SMEs) can rely upon them, is
tackled here by using semantic relatedness to filter the candidate
words. It is interpreted here as a domain coherency measure using
Pointwise Mutual Information (PMI) (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for more details). The
domain model is then used as a base to measure the coherence of the
topics within the domain in the next phase.
      </p>
      <p>
        After extracting candidate terms following a standard multi-word
term extraction technique (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), the first step involves searching
for words from the domain model in the immediate context of those
candidates. This makes it possible to determine a term’s coherence within the
domain, again through PMI calculation, by using
top-level terms to extract intermediate-level terms.
      </p>
      <p>
        To create the pruned graph which represents the taxonomy, the
strength of the relationship between two research terms is measured,
defined as I<sub>ij</sub> = D<sub>ij</sub> / (D<sub>i</sub> × D<sub>j</sub>), where D<sub>i</sub> is the number of articles
that mention the term T<sub>i</sub> in our corpus, D<sub>j</sub> is the number of articles that
mention the term T<sub>j</sub>, and D<sub>ij</sub> is the number of documents where
both terms appear. Edges are added to the graph for all pairs
that appear together in at least three documents, a threshold fixed
based on the results of previous studies and tests (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for more
details). Saffron also uses a generality measure to direct edges from
generic concepts to more specific ones. This results in a dense, noisy
directed graph that is further trimmed using a specific branching
algorithm which was successfully applied for the construction of
domain taxonomies in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This yields a tree structure where the
root is the most generic term and the leaves are the most specific
terms.
      </p>
      <p><sup>3</sup> See http://saffron.insight-centre.org/</p>
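      <p>The edge-strength definition and the three-document threshold can be illustrated as follows. This is a hypothetical sketch, not Saffron's implementation: the function name, the toy documents and their terms are invented for the example, and the generality measure and the branching algorithm that prune the graph are not shown.</p>
      <preformat>
```python
from itertools import combinations


def cooccurrence_edges(doc_terms, min_shared_docs=3):
    """Return {(term_i, term_j): strength} with strength D_ij / (D_i * D_j).

    Only pairs of terms that appear together in at least
    min_shared_docs documents receive an edge.
    """
    doc_freq = {}    # D_i: number of documents mentioning each term
    pair_freq = {}   # D_ij: number of documents mentioning both terms
    for terms in doc_terms:
        unique = sorted(set(terms))
        for term in unique:
            doc_freq[term] = doc_freq.get(term, 0) + 1
        for pair in combinations(unique, 2):
            pair_freq[pair] = pair_freq.get(pair, 0) + 1
    return {
        pair: d_ij / (doc_freq[pair[0]] * doc_freq[pair[1]])
        for pair, d_ij in pair_freq.items()
        if d_ij >= min_shared_docs
    }


# Toy corpus: each document is reduced to the terms extracted from it.
docs = [
    ["housing benefit", "local housing allowance"],
    ["housing benefit", "local housing allowance", "child support"],
    ["housing benefit", "local housing allowance"],
    ["housing benefit", "child support"],
]
edges = cooccurrence_edges(docs)
print(edges)
```
      </preformat>
      <p>Only the pair (housing benefit, local housing allowance) co-occurs in three documents here, so it is the only edge kept; the resulting undirected edges would still need to be directed and pruned to obtain a tree.</p>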
    </sec>
    <sec id="sec-6">
      <title>RELATED WORK</title>
      <p>
        Wiltshire et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] introduced a large-scale machine learning
system that incorporates hierarchical topic construction after
the extraction of terms, legal phrases and case cites. Their system
allows for the ranking and classification of topics, given a legal concept
as input, according to a scoring criterion. George et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] provide
a legal system for ranking documents according to their similarity
to legal cases, by finding the similarity between documents in the
latent topic space and query terms. They then use human assistance
to annotate documents that are relevant to the query in a
semi-supervised fashion. In contrast, our work is fully unsupervised,
with no human assistance during the topic modelling process. LDA
has been used extensively on natural language texts such as social
media texts [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], publication texts and newspapers, but typically not
in formal settings such as legal texts.
      </p>
      <p>
        Raghuveer and Kumar [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] use LDA to cluster Indian legal
judgements and use cosine similarity as the distance measure between
documents for clustering. However, their evaluation does not present
the prior knowledge of a legal expert to determine if the clusters
coincide with legal knowledge within the domain.
      </p>
      <p>
        O’ Neill et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] have identified salient legal statements (in
contrast to salient topics) by extracting deontic modalities,
using a small number of labeled samples to train a recurrent neural
network.
      </p>
      <p>
        Ahmed and Xing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] use a dynamic Hierarchical Dirichlet Process (HDP) to track topics over time;
documents can be exchanged, however their ordering over time is kept intact. They
also use longitudinal NIPS papers to track emerging and
decaying topics (this is worth noting, particularly for tracking changing
topics around compliance issues).
      </p>
      <p>
        The use of the aforementioned Saffron has been previously
demonstrated through a wide range of projects from several domains and
for different tasks. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Bordea used Saffron’s topic extractor to
analyze legal documents arising around the financial crisis in 2008.
She mapped the problem as an expert finding task, which aims at
ranking people that have knowledge about a given topic. In that
particular context, the task allowed the identification of individuals
involved in defining the response of the U.S. government to the
financial crisis by searching for a topic of interest. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Saffron was
used as a tool to detect the presence of different disciplines within
the field of Web Science. Running it on over 10 years of Web
Science conference series documents resulted in the discovery of four
communities (Communication, Computer Science, Psychology, and
Sociology), as well as trends over time and across types of paper. Saffron was
also used in a demo for an Irish bookshop website<sup>4</sup> to extract topics
from book descriptions and reviews and then classify the books accordingly.
It was also used to link the books for the creation of a multi-level
browsing application for book navigation.
      </p>
    </sec>
    <sec id="sec-7">
      <title>METHODOLOGY</title>
      <p>This section outlines the steps towards creating each topic model
and the configurations used for analysis. We start with a brief
introduction to the corpora used and the preprocessing steps common
to all topic models. United Kingdom legislative texts were used for
topic modelling<sup>5</sup>. The corpus contains 41,518 documents from
2000 to 2016. However, for practical purposes the analysis is carried
out on the year 2016 only, to lessen the reading burden on the
legal practitioner. The legislative types consist of the following: 304
Northern Ireland Statutory Rules, 838 UK Statutory Instruments, 132
Welsh Statutory Instruments and 317 Scottish Statutory Instruments.</p>
      <p><sup>4</sup> See http://kennys.insight-centre.org/</p>
    </sec>
    <sec id="sec-8">
      <title>Text Preprocessing</title>
      <p>
        Corpus-specific regular expressions (RE) are used to clean legal
domain syntax (e.g. bracketed alphanumerics), followed by tokenization
and lemmatization using the WordNet lemmatizer [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The
structure usually contains nested expressions, e.g. (ii) followed by (a) and
(b) subsections. This syntax is removed using the regular
expressions, along with other standard RE for identifying references and
alphanumeric expressions, e.g. “Regulation EC No. 1370/2007 means
Regulation 1370/2007 ...”. Redundant stopwords are removed from
the corpora, as are terms with word frequency f &lt; 2. This is carried out under the
supervision of a subject expert by analysing a subsample of the terms
which are considered for removal. We assume that terms with high
frequency are not specific to a particular topic, e.g. ‘the’, ‘of’ etc. Also,
rare terms are not representative of a single
topic, since they do not appear often enough to infer that they are salient for a
topic. Each corpus (one corpus per jurisdiction) is then converted to a
term-document matrix where weights are placed on each word using
the aforementioned tf-idf weighting scheme. Furthermore, 30 terms
for all models except Saffron are listed for the SME for ranking. For
Saffron we rely on a visualization of the term hierarchy for a domain
expert to judge.
      </p>
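      <p>The preprocessing steps can be sketched as a small pipeline. The following is a simplified illustration under several assumptions: the regular expression, the stopword list and the example sentences are hypothetical, lemmatization is omitted because it needs an external WordNet resource, f is read as the corpus-wide term count, and the result is stored with one row per document (the transpose of a term-document matrix).</p>
      <preformat>
```python
import math
import re
from collections import Counter

BRACKETED = re.compile(r"\(\s*[0-9a-zA-Z]{1,4}\s*\)")    # e.g. (ii), (a), (3)
TOKEN = re.compile(r"[a-z]+")
STOPWORDS = {"the", "of", "and", "a", "to", "in"}        # illustrative list only


def tokenize(text):
    """Lower-case, strip bracketed enumerators, and drop stopwords."""
    cleaned = BRACKETED.sub(" ", text.lower())
    return [t for t in TOKEN.findall(cleaned) if t not in STOPWORDS]


def tfidf_matrix(documents, min_count=2):
    """Return (vocabulary, rows) with rows[d][t] = f_td * log(N / n_t)."""
    tokenized = [tokenize(d) for d in documents]
    corpus_counts = Counter(t for doc in tokenized for t in doc)
    # Keep only terms whose corpus frequency reaches min_count.
    vocabulary = sorted(t for t, c in corpus_counts.items() if c >= min_count)
    n_docs = len(tokenized)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] * math.log(n_docs / doc_freq[t]) for t in vocabulary])
    return vocabulary, rows


documents = [
    "(ii) The biomass fuel regulation (a) applies to fossil fuel.",
    "The housing allowance regulation (b) applies to tenants.",
    "Biomass and bioliquid fuel.",
]
vocabulary, rows = tfidf_matrix(documents)
print(vocabulary)
for row in rows:
    print([round(x, 3) for x in row])
```
      </preformat>
      <p>Terms that occur only once in this toy corpus, such as “fossil” or “tenants”, are dropped before weighting, mirroring the pruning of rare terms described above.</p>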
    </sec>
    <sec id="sec-9">
      <title>Ranking Criterion and Model Configurations</title>
      <p>In order for a legal practitioner to assess the models in a fair manner,
a set of guidelines is presented for the ranking of the models. An
important aspect of ranking is the pretuning of the term relevance
parameter λ, which determines the top 30 terms that are presented
for each topic within the jurisdiction. We also assess a
number of parameter settings for NMF, LSA, LDA and HDP before
choosing the final set of 10 topics on which the legal expert
makes their final judgment. Since the term-document matrix is quite
sparse (evident from Figure 1), NMF is initialized using Non-Negative
Singular Value Decomposition (NNSVD). The Coordinate Descent
solver is used for minimizing the reconstruction error, as mentioned
in section 2.1.1. The number of components is set to n<sub>k</sub> = 10. LSI
uses standard SVD, which requires little tuning beyond
choosing the number of singular values, also set to n<sub>k</sub> = 10. For LDA we
choose a low relevance λ = 0.25 to highlight topic-specific terms.</p>
    </sec>
    <sec id="sec-10">
      <title>RESULTS</title>
      <p>In this section we analyse the topics retrieved by each approach,
as evaluated by an SME for the regulations. Figure 1
shows the effect on dictionary size as infrequent terms are
increasingly removed. It is evident that after removing terms that
occur less than twice, the dictionary size decreases dramatically,
meaning that a significant number of terms are too specific to a particular
document. We remove these terms for subsequent analysis.</p>
      <sec id="sec-10-1">
        <title>5 Retrieved from: http://www.legislation.gov.uk/</title>
        <p>
          Latent Dirichlet Allocation Visualization. For the visualization
of LDA topics, we use the pyLDAvis [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] visualization tool.
Multidimensional scaling projects the t-dimensional space to 2
dimensions, as shown in Figure 2. Ten topics for Northern Ireland Statutory
Rules (NISR) are presented with the relevance metric set to λ = 0.25
(which decides the term-topic specificity). This is done under the
supervision of a legal practitioner to ensure that λ is tuned to a suitable
specificity and that topics are also coherent, before a final evaluation.
        </p>
        <p>Some terms such as biomass, biomaterial, bioliquid, fossil and
fuel show a clear and distinct topic and are quite topic-specific given
λ = 0.25. This is shown by the red bars, which indicate the term frequency within
the given topic, as opposed to the blue bars, which indicate the term
frequency across the whole corpus.</p>
        <p>Saffron. In Saffron’s results, a cluster is located around the
extracted topics of department of justice and support allowance, from which
the whole taxonomy for the Northern Ireland Statutory
Rules derives. This topic is thus the primary node of the 2016 corpus.
In Figure 4, we zoom in on a subset of this graph (and thus on
subdomains) which includes housing benefit, income support, social
security and personal independence payment. They are all
semantically related to the parent node support allowance, but tackle
different aspects of it. We can see the advantage of the
hierarchical structure of the graph, with semantically related topics
going from the more generic to the more specialized ones. In this
way we can identify a waterfall structure from the housing benefit
branch, logically followed by the more specific local housing
allowance, and then local housing allowance determination.
Another quite clear example can be observed from the child
support branch, related to the personal independence payment node.
From child support, the directed edge links to child support
maintenance, then maintenance calculation, and finally the three topics
child_support_maintenance_calculation_regulation, welfare service
and maintenance assessment. The police service node is at the root of
a taxonomy that includes the child nodes northern_ireland_reserve
⇒ notice_of_appeal ⇒ written_representation, avoiding service ⇒
reasonable_amount_of_duty_time. This example summary allows a
legal practitioner to identify topics surrounding certain legal issues,
or for simply summarizing a complete jurisdiction. Zooming in on
a subset of the hierarchical tree, we highlight a topic with coherent
multi-word expressions summarizing an area within the Northern
Ireland Statutory Rules in Figure 5.</p>
      </sec>
      <sec id="sec-10-2">
        <title>Table 1: SME ranking of the topic models for each jurisdiction</title>
        <p>Saffron, LDA, HDP, LSA</p>
        <p>Saffron, Saffron, HLDP/LSI, HLDP/LSI</p>
        <p>WSI: Saffron, LDA, HLDP/LSI, HLDP/LSI, NMF</p>
      </sec>
      <sec id="sec-10-3">
        <p>Ranking. Table 1 shows the results of the SME ranking after
assessing each topic model for each jurisdiction. Saffron is favored overall
for all jurisdictions, considering it is the only model that performs
multi-word expression topic extraction and weighting of descriptive
noun terms and phrases. We conjecture that the appeal of a
hierarchical structure and multi-word noun expressions has influenced the
interpretation of the salient terms in the domain, making it easier for
legal practitioners to identify important and coherent legal topics.</p>
        <p>We emphasize at this point that single-word topic models and
multi-word hierarchical models are not directly comparable, for the
reasons outlined. However, they are both included in Table 1 to highlight
the importance of longer expressions that are linked in a taxonomy,
providing more clarity on what the emerging topics are.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>This work has presented a fully automated approach for identifying
topics in regulations, which assists in easier tracking of important domain
terms that correspond to compliance-related issues. After
evaluation, Saffron has been consistently ranked as the most favourable of
all models, as the aforementioned vocabulary pruning and usage
of multi-word expressions have played a fundamental role in topic
coherency. Standard LDA has performed the best of all single-term
models, particularly when top terms are chosen according to their
topic specificity. HDP has inferred a similar number of topics to that
of LDA, according to an analysis of the log-likelihood curve and the
legal practitioner's judgment. This work is an early indication as to
how legal practitioners can identify salient and coherent topics using
automatic topic modelling tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Amr</given-names>
            <surname>Ahmed</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eric P.</given-names>
            <surname>Xing</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream</article-title>
          .
          <source>CoRR abs/1203</source>
          .3463 (
          <year>2012</year>
          ). http://arxiv.org/abs/1203.3463
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>David M</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew Y</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael I</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (Jan
          <year>2003</year>
          ),
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>David M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          <volume>3</volume>
          (March
          <year>2003</year>
          ),
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          . http://dl.acm.org/citation.cfm?id=944919.944937
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Georgeta</given-names>
            <surname>Bordea</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Domain adaptive extraction of topical hierarchies for Expertise Mining</article-title>
          .
          <source>Ph.D. Dissertation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Georgeta</given-names>
            <surname>Bordea</surname>
          </string-name>
          , Kartik Asooja, Paul Buitelaar, and
          <string-name>
            <given-names>Leona</given-names>
            <surname>O'Brien</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Gaining insights into the Global Financial Crisis using Saffron</article-title>
          .
          <source>NLP Unshared Task in PoliInformatics</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Deng</given-names>
            <surname>Cai</surname>
          </string-name>
          , Xiaofei He,
          <string-name>
            <surname>Xiaoyun Wu</surname>
          </string-name>
          , and Jiawei Han.
          <year>2008</year>
          .
          <article-title>Non-negative matrix factorization on manifold</article-title>
          .
          <source>In Data Mining</source>
          ,
          <year>2008</year>
          . ICDM'08. Eighth IEEE International Conference on. IEEE,
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Chuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Heer</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Termite: Visualization techniques for assessing textual topic models</article-title>
          .
          <source>In Proceedings of the International Working Conference on Advanced Visual Interfaces</source>
          . ACM,
          <fpage>74</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Rajarshi</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Manzil</given-names>
            <surname>Zaheer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Dyer</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Gaussian LDA for Topic Models with Word Embeddings</article-title>
          .
          <source>In ACL (1)</source>
          .
          <fpage>795</fpage>
          -
          <lpage>804</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Christiane</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <year>1998</year>
          . WordNet. Wiley Online Library.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Clint Pazhayidam</given-names>
            <surname>George</surname>
          </string-name>
          , Sahil Puri, Daisy Zhe Wang, Joseph N Wilson, and William F Hamilton.
          <year>2014</year>
          .
          <article-title>SMART Electronic Legal Discovery Via Topic Modeling</article-title>
          .
          <source>In FLAIRS Conference</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Daniel D</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>H Sebastian</given-names>
            <surname>Seung</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Learning the parts of objects by non-negative matrix factorization</article-title>
          .
          <source>Nature</source>
          <volume>401</volume>
          ,
          <issue>6755</issue>
          (
          <year>1999</year>
          ),
          <fpage>788</fpage>
          -
          <lpage>791</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Daniel D</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>H Sebastian</given-names>
            <surname>Seung</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Algorithms for non-negative matrix factorization</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>556</fpage>
          -
          <lpage>562</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Projected gradient methods for nonnegative matrix factorization</article-title>
          .
          <source>Neural computation</source>
          <volume>19</volume>
          ,
          <issue>10</issue>
          (
          <year>2007</year>
          ),
          <fpage>2756</fpage>
          -
          <lpage>2779</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          , Paola Velardi, and
          <string-name>
            <given-names>Stefano</given-names>
            <surname>Faralli</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch</article-title>
          .
          <source>In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three (IJCAI'11)</source>
          . AAAI Press,
          <fpage>1872</fpage>
          -
          <lpage>1877</lpage>
          . DOI:http://dx.doi.org/10.5591/978-1-57735-516-8/IJCAI11-313
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>James</given-names>
            <surname>O' Neill</surname>
          </string-name>
          , Paul Buitelaar, Cecile Robin, and
          <string-name>
            <given-names>Leona</given-names>
            <surname>O' Brien</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Classifying sentential modality in legal language: a use case in financial regulations, acts and directives</article-title>
          .
          <source>In Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law</source>
          . ACM,
          <fpage>159</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Pennacchiotti</surname>
          </string-name>
          and
          <string-name>
            <given-names>Siva</given-names>
            <surname>Gurumurthy</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Investigating topic models for social media user recommendation</article-title>
          .
          <source>In Proceedings of the 20th international conference companion on World wide web</source>
          . ACM,
          <fpage>101</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K</given-names>
            <surname>Raghuveer</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Legal documents clustering using latent dirichlet allocation</article-title>
          .
          <source>IAES Int. J. Artif. Intell.</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ),
          <fpage>34</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Rosario</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Latent semantic indexing: An overview</article-title>
          .
          <source>Techn. rep. INFOSYS</source>
          <volume>240</volume>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Carson</given-names>
            <surname>Sievert</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kenneth E</given-names>
            <surname>Shirley</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>LDAvis: A method for visualizing and interpreting topics</article-title>
          .
          <source>In Proceedings of the workshop on interactive language learning, visualization, and interfaces</source>
          .
          <fpage>63</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>James S</given-names>
            <surname>Wiltshire</surname>
            <suffix>Jr</suffix>
          </string-name>
          , John T Morelock,
          <string-name>
            <given-names>Timothy L</given-names>
            <surname>Humphrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X Allan</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James M</given-names>
            <surname>Peck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Salahuddin</given-names>
            <surname>Ahmed</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>System and method for classifying legal concepts using legal topic scheme</article-title>
          .
          (Dec. 31,
          <year>2002</year>
          ).
          <source>US Patent 6,502,081</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Xiaohui</given-names>
            <surname>Yan</surname>
          </string-name>
          , Jiafeng Guo, Shenghua Liu, Xueqi Cheng, and
          <string-name>
            <given-names>Yanfeng</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Learning topics in short texts by non-negative matrix factorization on term correlation matrix</article-title>
          .
          <source>In Proceedings of the 2013 SIAM International Conference on Data Mining</source>
          . SIAM,
          <fpage>749</fpage>
          -
          <lpage>757</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>