<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Exploring the Use of Topic Analysis in Latvian Legal Documents</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rinalds</forename><surname>Vīksna</surname></persName>
							<email>rinaldsviksna@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Artificial Intelligence and Systems Engineering</orgName>
								<orgName type="institution">Riga Technical University</orgName>
								<address>
									<country key="LV">Latvia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marite</forename><surname>Kirikova</surname></persName>
							<email>marite.kirikova@rtu.lv</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Artificial Intelligence and Systems Engineering</orgName>
								<orgName type="institution">Riga Technical University</orgName>
								<address>
									<country key="LV">Latvia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Daiga</forename><surname>Kiopa</surname></persName>
							<affiliation key="aff1">
								<address>
									<settlement>Lursoft</settlement>
									<country key="LV">Latvia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Exploring the Use of Topic Analysis in Latvian Legal Documents</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">99A84B19226D5BF21ECB2BB8FF43CB36</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>Topic Analysis, Legal Analysis, Information Retrieval</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The large number of legislative documents produced every day makes it difficult to follow each and every one of them. However, it is important for enterprises to comply with all current legislative acts. In this paper we demonstrate the application of different topic analysis algorithms and stop word filtering approaches to the corpus of legal texts of the Republic of Latvia. This is done to support the discovery of expressive and meaningful legal topics and the marking of documents according to those topics. The topic models produced in this work are intended as an aid for experts, enabling faster document browsing.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Every enterprise must conform to and comply with the current regulatory acts. Moreover, some types of regulations may be used as blueprints for business process models <ref type="bibr" target="#b0">[1]</ref>. Legislative documents may be laws issued by parliament; regulations issued by the Cabinet of Ministers, municipalities, or other institutions; as well as industry standards, various contracts, and other documents <ref type="bibr" target="#b1">[2]</ref>. Many regulations are related to others, being either an update of an earlier regulation or depending on, or being implemented by, other regulations. Keeping track of the changing regulatory environment requires significant time and effort.</p><p>In this paper we envision a solution that may help save this effort by providing an overview and summarization of topics within the Latvian law domain. The goal of this paper is to explore the application of different topic analysis algorithms and stop word filtering approaches to the corpus of legal texts of the Republic of Latvia. For the demonstration we use three common topic analysis algorithms, briefly introduced in Section 2. The paper presents research in progress that is part of a more extensive research activity, the aim of which is to find the core topics in Latvian legislation and to identify, for further exploration, a method for automated tagging of documents with salient topics.</p><p>The paper is organized as follows. Section 2 discusses the problem domain and the available topic analysis algorithms. Section 3 describes the data preparation steps, the results of stop word removal, and the evaluation of the resulting topic models. Section 4 provides brief conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work and Background</head><p>Topic analysis (often called "topic modeling" or "topic detection") is a text-mining <ref type="bibr" target="#b2">[3]</ref> technique for soft clustering of documents (each document receives a probability distribution over all the clusters) according to the distribution of terms that occur in the text body. O'Neill et al. <ref type="bibr" target="#b3">[4]</ref> used topic analysis to summarize and visualize British legislation and to find useful topics and terms. Wyner et al. applied topic analysis to profile legal cases and extract arguments from them <ref type="bibr" target="#b4">[5]</ref>. Soria et al. applied topic analysis to annotate each paragraph in Italian law texts with semantic information <ref type="bibr" target="#b5">[6]</ref>. Sulea et al. explored the use of text classification methods in the legal domain <ref type="bibr" target="#b6">[7]</ref>; however, text classification requires that documents have known labels. The results may also differ depending on the language of the texts. In this work we address regulatory (legal) documents in Latvian, with the purpose of supporting legal document handling activities performed by experts. Topic analysis is an unsupervised learning method that produces a number of topics, each consisting of related terms and their respective weights. Topic analysis is most often done using the Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA) <ref type="bibr" target="#b7">[8]</ref>, or Hierarchical Dirichlet Process (HDP) <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b2">[3]</ref> algorithms, the last being a nonparametric extension of LDA. These algorithms are briefly described below; being the most popular ones, they were used in the experiments reported in Section 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Latent Semantic Indexing</head><p>Latent Semantic Indexing (LSI), also known as Latent Semantic Analysis, is a method for extracting and representing the contextual meaning of words by statistical computations applied to a large corpus of text. The corpus is represented as a term-document matrix of tf-idf weights, where tf is the frequency of a term in the given text and idf is its inverse document frequency. Singular value decomposition (SVD) is then applied to this term-document matrix: the rectangular matrix is decomposed into the product of three other matrices in order to find a lower-rank approximation of the term-document matrix <ref type="bibr" target="#b9">[10]</ref>. LSI is implemented in the gensim<ref type="foot" target="#foot_0">1</ref> Python library.</p></div>
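The LSI pipeline just described (tf-idf weighting of a term-document matrix, followed by a truncated SVD) can be sketched with NumPy. This is an illustrative sketch on a made-up toy matrix, not the paper's gensim-based implementation:

```python
import numpy as np

# Tiny term-document tf matrix: 5 terms x 4 documents
# (a stand-in for the real corpus of Latvian legal texts).
tf = np.array([
    [3, 2, 0, 0],   # "nodoklis"
    [1, 2, 0, 0],   # "likums"
    [0, 0, 2, 3],   # "konvencija"
    [0, 0, 3, 1],   # "puse"
    [1, 1, 1, 1],   # "valsts" (appears in every document)
], dtype=float)

n_docs = tf.shape[1]
df = (tf > 0).sum(axis=1)          # document frequency of each term
idf = np.log(n_docs / df)          # inverse document frequency
tfidf = tf * idf[:, None]          # weighted term-document matrix

# Truncated SVD: keep the k largest singular values, i.e. a rank-k
# approximation of the term-document matrix.
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
k = 2
docs_in_topic_space = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

print(docs_in_topic_space.shape)
```

Note how the ubiquitous term ("valsts") gets idf = 0 and so contributes nothing to the decomposition, which is the same effect stop word removal aims for. In the paper the decomposition is done by gensim's LsiModel over a streamed tf-idf corpus.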
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Latent Dirichlet Allocation</head><p>Latent Dirichlet Allocation (LDA) is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. The LDA algorithm is described in <ref type="bibr" target="#b10">[11]</ref> and is implemented in the sklearn<ref type="foot" target="#foot_1">2</ref> and gensim Python libraries.</p><p>Term distinctiveness and saliency are used to evaluate the generated topics. For a given word w, the unconditional probability P(w) and the probability P(T|w) that the word was generated by latent topic T are computed, together with the probability P(T) that a randomly selected word w' was generated by topic T. The distinctiveness of word w is then calculated as follows <ref type="bibr" target="#b11">[12]</ref>:</p><formula xml:id="formula_0">distinctiveness(w) = Σ_T P(T|w) log(P(T|w) / P(T))<label>(1)</label></formula><p>This equation describes how informative word w is for determining topic T. If a word occurs with similar probability in all topics, observing it tells little about the document's topics, and the word has low distinctiveness. The saliency of a word is defined as <ref type="bibr" target="#b11">[12]</ref>:</p><formula xml:id="formula_1">saliency(w) = P(w) · distinctiveness(w)<label>(2)</label></formula></div>
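Equations (1) and (2) can be checked on a toy topic model. The sketch below assumes made-up topic-word distributions and topic priors, standing in for quantities read off a trained LDA model:

```python
import numpy as np

# Toy topic-word distributions P(w|T) for 3 topics over a 4-word vocabulary,
# plus topic priors P(T) -- assumed values for illustration only.
p_w_given_t = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
])
p_t = np.array([0.5, 0.3, 0.2])                    # P(T)

p_w = p_t @ p_w_given_t                            # P(w) = sum_T P(T) P(w|T)
p_t_given_w = (p_w_given_t * p_t[:, None]) / p_w   # Bayes' rule: P(T|w)

# Eq. (1): KL divergence between P(T|w) and the marginal P(T);
# zero when a word is spread evenly over all topics.
distinctiveness = (p_t_given_w * np.log(p_t_given_w / p_t[:, None])).sum(axis=0)

# Eq. (2): saliency weights distinctiveness by how frequent the word is.
saliency = p_w * distinctiveness

print(np.round(distinctiveness, 3))
print(np.round(saliency, 3))
```

Word 0, concentrated in topic 0, ends up far more salient than words 2 and 3, which are shared between topics; this is the ranking behaviour the evaluation relies on.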
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Hierarchical Dirichlet Process</head><p>The Hierarchical Dirichlet Process (HDP) is a Bayesian nonparametric model for the unsupervised analysis of grouped data. Documents are viewed as bags of words drawn from a number of latent clusters or "topics", where a topic is modeled as a multinomial probability distribution over the words of some basic vocabulary. Given a collection of documents, HDP finds the latent clusters without the need to specify the number of topics as a parameter <ref type="bibr" target="#b8">[9]</ref>. HDP inference requires multiple passes through all the data and is therefore poorly suited for massive and streaming data. Wang et al. proposed online variational inference for the HDP, which requires only one pass through the data and is significantly faster <ref type="bibr" target="#b8">[9]</ref>; it is implemented in the gensim library.</p></div>
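The reason HDP needs no preset topic count is its stick-breaking prior: topic weights are fractions repeatedly broken off a unit-length "stick", so they decay quickly, which matches the skewed token split later reported for the HDP model in Section 3.2. A minimal sketch of that construction (illustrative only; the paper itself uses gensim's HdpModel):

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, max_topics):
    """Draw topic weights from a stick-breaking process: each weight is a
    Beta(1, alpha) fraction of whatever remains of the unit stick."""
    betas = rng.beta(1.0, alpha, size=max_topics)
    # remaining[i] = product of (1 - beta_j) for all j < i
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

# 150 topics, the same cap the gensim HDP implementation used in Section 3.2.
weights = stick_breaking(alpha=1.0, max_topics=150)
print(weights[:3].round(3))   # the first few topics dominate
print(round(weights.sum(), 4))
```

With gensim, the corresponding model is built as models.HdpModel(corpus, id2word=dictionary), which applies the online variational inference of [9] in a single pass.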
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Corpus and Data Preparation and Analysis</head><p>In this paper we use the corpus of legal acts from http://likumi.lv/, a website of legal acts that provides free access to the systematized (consolidated) legal acts of the Republic of Latvia. The documents were downloaded as HTML and kept for later metadata extraction (document information: issuer, status, adoption date, end-of-validity date, related documents, etc.). The text contents of the downloaded HTML documents were extracted and saved as plain text in UTF-8 encoding. Some of the documents contained mostly Russian or English text and were dropped from the corpus. In total, over 50,000 documents in the Latvian language were collected. These documents were used in the experiments on stop word removal and on the different topic models.</p><p>In the remainder of this section, we first perform exploratory data analysis and assess the impact of stop word removal on the performance of clustering algorithms, using LDA as an example (sub-section 3.1), and then explore alternative clustering algorithms (sub-section 3.2).</p></div>
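The preparation steps above (stripping HTML, keeping only Latvian-language documents) can be sketched with the Python standard library. The class and heuristic below are hypothetical illustrations, not the pipeline actually used; in particular, a production pipeline would use a proper language-identification library rather than this crude diacritic count:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def looks_latvian(text):
    # Illustrative screen: reject any Cyrillic, require at least one
    # Latvian diacritic (absent from plain English text).
    cyrillic = sum("а" <= c <= "я" or "А" <= c <= "Я" for c in text)
    latvian = sum(c in "āčēģīķļņšūž" for c in text.lower())
    return cyrillic == 0 and latvian > 0

doc = "<html><body><p>Likums par nodokļiem un nodevām.</p></body></html>"
text = html_to_text(doc)
print(text, looks_latvian(text))
```

Documents failing the language screen would be dropped, mirroring the removal of mostly Russian or English documents described above.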
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Experiments with Stop Word Removal Approaches</head><p>During text preprocessing, boilerplate content (irrelevant text, ads) and generic Latvian stop words identified by Garkaje et al. <ref type="bibr" target="#b12">[13]</ref> were removed. After this step, the 10 most common words in the corpus were identified (Fig. <ref type="figure" target="#fig_0">1</ref>). As Fig. <ref type="figure" target="#fig_0">1</ref> shows, the most common words ("state", "latvian", "in force", etc.) occur multiple times in most documents of the corpus and are not very informative; in this context these words are stop words, and they were removed. To find stop words specific to this domain, a normalized tf-idf metric was used <ref type="bibr" target="#b13">[14]</ref>. It was calculated for each word as given in <ref type="bibr" target="#b13">[14]</ref>:</p><formula xml:id="formula_2">tf-idf(k) = tf_norm(k) · idf(k), tf_norm(k) = −log(TF / U)<label>(3)</label></formula><formula xml:id="formula_3">idf(k) = log(N(doc) / N(k))</formula><p>where TF (term frequency) is the number of times a given word appears in the corpus; N(doc) is the number of documents in the corpus; N(k) is the number of documents containing term k; and U is the total number of words in the corpus. The tf-idf score was calculated for each word in the corpus, and the 140 words with a score of less than 9 were selected as stop words. This custom stop word list was combined with the general Latvian stop words from Garkaje et al. <ref type="bibr" target="#b12">[13]</ref>, giving 450 stop words in total. After the additional stop word filtering, documents contained relatively more informative words (see Fig. <ref type="figure" target="#fig_1">2</ref>). To evaluate the topics created using the different stop word selections, we used the preassigned theme labels given at http://likumi.lv/ (theme pages at https://likumi.lv/ta/tema). Documents belonging to the same theme should have similar content or describe similar topics and questions. 
It should be noted that each document in likumi.lv may belong to more than one theme. We used documents from three themes: "human rights", "banks, finance, budget", and "taxes and fees". Two topic models were created (one using the generic stop word set and another using the adapted stop word set), and the selected documents were then classified by each model as belonging to particular topics. The document distribution by topics is shown in Fig. <ref type="figure" target="#fig_2">3</ref> and Fig. <ref type="figure" target="#fig_3">4</ref>.</p><p>To assess the impact of domain-specific stop word removal, we implemented the LDA model using the general list of stop words (Fig. <ref type="figure" target="#fig_2">3</ref>) and then compared it with the model generated using the domain-specific list of stop words (Fig. <ref type="figure" target="#fig_3">4</ref>). One topic is found by both models: Topic 9 in Fig. <ref type="figure" target="#fig_2">3</ref> and Topic 15 in Fig. <ref type="figure" target="#fig_3">4</ref> represent documents with significant English content, as indicated by their keywords in the English language. Other topics, although different, display some similarities (Table <ref type="table" target="#tab_0">1</ref> and Table <ref type="table" target="#tab_1">2</ref>). As Table <ref type="table" target="#tab_0">1</ref> shows, many keywords are present in multiple topics and are not representative ("year", "in law"). Topics generated by the second model (Table <ref type="table" target="#tab_1">2</ref>) contain representative words (e.g., "cadaster", "convention"), which indicate that a topic concerns real estate ("cadaster") or international treaties ("convention"). The LDA model using the adapted stop word list thus created more meaningful topics, containing more meaningful words that tell us more about their content. Furthermore, as most of the more popular words were labeled as stop words, this model was able to classify more tax-related documents into the very expressive topics 3, 5, 9, and 12 in Fig. 
<ref type="figure" target="#fig_3">4</ref>, while the model with only generic stop word filtering created broader topics, such as topics 4, 6, and 12 in Fig. <ref type="figure" target="#fig_2">3</ref>. Both models classified documents into multiple topics, some of which corresponded to the themes assigned to those documents in http://likumi.lv/. However, most topics were different: for instance, topics grouping documents written in English, or documents related to international treaties. This shows that the topics found by topic analysis give insights into the data and offer alternative classification schemes.</p></div>
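Equation (3) is straightforward to apply. The sketch below scores the words of a toy tokenized corpus (a stand-in for the real likumi.lv documents); the threshold used on the toy data is arbitrary, whereas the paper used a threshold of 9 on the full corpus:

```python
import math
from collections import Counter

# Toy tokenized corpus (assumed for illustration).
docs = [
    ["likums", "nodoklis", "valsts"],
    ["likums", "nodeva", "valsts"],
    ["likums", "konvencija", "valsts"],
    ["likums", "valsts", "valsts"],
]

U = sum(len(d) for d in docs)                   # total words in the corpus
n_doc = len(docs)                               # N(doc)
tf = Counter(w for d in docs for w in d)        # corpus-wide term frequency TF
df = Counter(w for d in docs for w in set(d))   # N(k): documents containing k

def tfidf_score(k):
    tf_norm = -math.log(tf[k] / U)              # normalized tf, Eq. (3)
    idf = math.log(n_doc / df[k])
    return tf_norm * idf

scores = {w: tfidf_score(w) for w in tf}
# Words scoring below the threshold become candidate domain stop words;
# the threshold here (1.0) only suits this toy corpus.
stopwords = [w for w, s in scores.items() if s < 1.0]
print(sorted(scores, key=scores.get))
```

On this toy corpus the ubiquitous words ("likums", "valsts") score lowest and are flagged, exactly the behaviour used to build the 140-word domain stop word list.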
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Topic Models and Their Evaluation</head><p>To evaluate the performance of the different topic analysis algorithms, we built topic models using the LDA, HDP and LSI algorithms and visualized the topics found using the gensim visualization tool. The HDP algorithm does not need the number of topics as an input, as it determines the number of topics automatically. In this case it found 150 topics, the maximum allowed by the gensim implementation. The first topic (see Fig. <ref type="figure" target="#fig_4">5</ref>, left) contains 80% of the tokens (words), the second topic 16%, the third 2.5%, and each of the remaining topics less than 1.5%. The LDA model, in comparison with the HDP model, is more balanced: its largest topic contains 11.7% of the tokens and its smallest 1.7%. It was not possible to visualize the LSI model, as it contains negative term weights, which are not supported by gensim. Therefore, the models were evaluated using the coherence metric proposed by Röder et al. <ref type="bibr" target="#b14">[15]</ref>. The results are shown in Table <ref type="table" target="#tab_2">3</ref>: the LDA 20-topic model has the highest coherence measure among the three models.</p></div>
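The u_mass coherence used in Table 3 rewards topic words that frequently co-occur in the same documents. A small re-implementation on an assumed toy corpus illustrates the measure (a sketch, not the gensim code used in the paper):

```python
import math

# Toy corpus as sets of words, for document-level co-occurrence counts.
docs = [
    {"nodoklis", "nodeva", "likums"},
    {"nodoklis", "likums"},
    {"konvencija", "puse"},
    {"konvencija", "puse", "likums"},
]

def doc_freq(*words):
    """Number of documents containing all the given words."""
    return sum(all(w in d for w in words) for d in docs)

def u_mass(topic_words):
    """UMass coherence: log co-document frequency of each later topic word
    with each earlier (presumably more probable) word, smoothed by +1."""
    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            wi, wj = topic_words[i], topic_words[j]
            score += math.log((doc_freq(wi, wj) + 1) / doc_freq(wj))
    return score

coherent = u_mass(["nodoklis", "nodeva", "likums"])     # words that co-occur
incoherent = u_mass(["nodoklis", "puse", "konvencija"]) # words that do not
print(round(coherent, 3), round(incoherent, 3))
```

On real models the same quantity is computed by gensim's CoherenceModel (with coherence='u_mass'), which produced the negative scores reported in Table 3; higher (less negative) is better, as with the LDA 20-topic model.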
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>We explored the application of different topic analysis algorithms and stop word filtering approaches to a corpus of legal texts in the Latvian language. Domain-adapted stop word filtering improves the topic models produced with LDA, yielding more expressive topics that separate documents into more distinctive groups. The stop word filtering methods explored in this work are applicable to other corpora and languages, provided that the corpus contains a sufficient number of documents. Compared with LSI and HDP, the LDA algorithm produced the topic model that performed best. However, for the topics generated by LDA to be of practical value, some further fine-tuning is needed, as topics from different dimensions are currently mixed: for instance, there was a topic for documents in English and a topic for documents that are international treaties, and both topics encompass documents that concern different themes, such as civil rights and taxes.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Most common words in corpus.</figDesc><graphic coords="4,224.67,295.40,145.90,85.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Most common words in the corpus after stop words removal.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Document distribution by topic using generic stop word list.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Document distribution by topic using adapted stop word list.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Topics found using HDP (left) and LDA (right).</figDesc><graphic coords="8,124.70,147.40,348.75,210.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Keywords of most prominent topics without filtering.</figDesc><table><row><cell>Topic</cell><cell>Keywords in Latvian</cell><cell>Keywords translated in English</cell></row><row><cell># 4</cell><cell>Latvijas, republikas, padomes, eiropas</cell><cell>Latvian, republic, councils, european</cell></row></table><note place="foot" n="3">https://likumi.lv/ta/tema</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Keywords of most prominent topics with filtering.</figDesc><table><row><cell>Topic</cell><cell>Keywords in Latvian</cell><cell>Keywords in English</cell></row><row><cell># 3</cell><cell>valstī, līgumslēdzējas, nodokļiem, state, līgumslēdzējā</cell><cell>in the country, contracting, taxes, state, contracting</cell></row><row><cell># 5</cell><cell>likumu, persona, daļā, tiesas, redakcijā</cell><cell>law, a person, part, courts, version</cell></row><row><cell># 9</cell><cell>kadastra, nekustamā, eiro, nodokļa, īpašumu</cell><cell>cadaster, real, euro, tax, property</cell></row><row><cell># 10</cell><cell>atbalsta, izmaksas, programmas, ietvaros, sadarbības</cell><cell>supports, costs, programs, within, cooperation</cell></row><row><cell># 12</cell><cell>kapitāla, ieguldījumu, tirgus, pārskata, apdrošināšanas</cell><cell>capital, investment, market, review, insurance</cell></row><row><cell># 14</cell><cell>vēlēšanu, domes, pilsētas, komisija, pārvaldes, dienesta</cell><cell>election, city council, cities, commission, administration, service</cell></row><row><cell># 15</cell><cell>or, shall, be, for, by, article</cell><cell>or, shall, be, for, by, article</cell></row><row><cell># 16</cell><cell>puses, līgumslēdzējas, puse, konvencijas, teritorijā</cell><cell>sides, contracting, sides, convention, territory</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Coherence measures of topic models.</figDesc><table><row><cell>Topic model</cell><cell>Coherence measure (u_mass)</cell></row><row><cell>HDP</cell><cell>-7.906688044302112</cell></row><row><cell>LDA (20 topics)</cell><cell>-7.7222343265180715</cell></row><row><cell>LSI</cell><cell>-9.482837750182188</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://radimrehurek.com/gensim/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://scikit-learn.org/stable/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgments. The work on this paper is supported by ERAF research 1.2.1.1/18/A/003 project No. 1.9.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Raw materials for business processes in cloud</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kirikova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Buksa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Penicina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lect. Notes Bus. Inf. Process</title>
		<imprint>
			<biblScope unit="volume">113</biblScope>
			<biblScope unit="page" from="241" to="254" />
			<date type="published" when="2012">2012</date>
			<publisher>LNBIP</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L L S</forename><surname>Washington</surname></persName>
		</author>
		<title level="m">Types of Legislative Documents</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques</title>
		<author>
			<persName><forename type="first">M</forename><surname>Allahyari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pouriyeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Assefi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Safaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Trippe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">B</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kochut</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An analysis of topic modelling for legislative texts</title>
		<author>
			<persName><forename type="first">J</forename><surname>O'neill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Robin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>O'brien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Buitelaar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proc</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">2143</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Approaches to text mining arguments from legal cases</title>
		<author>
			<persName><forename type="first">A</forename><surname>Wyner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mochales-Palau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Moens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Milward</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lect. Notes Comput. Sci</title>
		<imprint>
			<biblScope unit="volume">6036</biblScope>
			<biblScope unit="page" from="60" to="79" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>LNAI</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automatic extraction of semantics in law documents</title>
		<author>
			<persName><forename type="first">C</forename><surname>Soria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bartolini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Montemagni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pirrelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. V Legis. XML Work</title>
				<meeting>V Legis. XML Work</meeting>
		<imprint>
			<date type="published" when="2007-02">February 2007</date>
			<biblScope unit="page" from="253" to="266" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Exploring the use of text classification in the legal domain</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">M</forename><surname>Sulea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Genabith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proc</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">2143</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Topic subject creation using unsupervised learning for topic modeling</title>
		<author>
			<persName><forename type="first">R</forename><surname>Mehdiyev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Nava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sodhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Acharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Rana</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Online variational inference for the hierarchical Dirichlet process</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Paisley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="752" to="760" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">New algorithms assessing short summaries in expository texts using latent semantic analysis</title>
		<author>
			<persName><forename type="first">R</forename><surname>Olmos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>León</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Jorge-Botana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Escudero</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Behav. Res. Methods</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="944" to="950" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Latent Dirichlet allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Termite: Visualization techniques for assessing textual topic models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc. Work. Adv. Vis. Interfaces AVI</title>
		<imprint>
			<biblScope unit="page" from="74" to="77" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Normalization and Automatized Sentiment Analysis of Contemporary Online Latvian Language</title>
		<author>
			<persName><forename type="first">G</forename><surname>Garkaje</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zilgalve</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dargis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Front. Artif. Intell. Appl</title>
		<imprint>
			<biblScope unit="volume">268</biblScope>
			<biblScope unit="page" from="83" to="86" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Automatically Building a Stopword List for an Information Retrieval System</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">T.-W.</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ounis</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Exploring the space of topic coherence measures</title>
		<author>
			<persName><forename type="first">M</forename><surname>Röder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Both</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hinneburg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WSDM 2015 -Proc. 8th ACM Int. Conf. Web Search Data Min</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="399" to="408" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
