<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Investigating the Similarity of Court Decisions</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sarika</forename><surname>Jain</surname></persName>
							<email>jasarika@nitkkr.ac.in</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Applications</orgName>
								<orgName type="institution">National Institute of Technology Kurukshetra</orgName>
								<address>
									<settlement>Kurukshetra</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Deepak</forename><surname>Jaglan</surname></persName>
							<email>deepakjaglan34@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Applications</orgName>
								<orgName type="institution">National Institute of Technology Kurukshetra</orgName>
								<address>
									<settlement>Kurukshetra</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kapil</forename><surname>Gupta</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Applications</orgName>
								<orgName type="institution">National Institute of Technology Kurukshetra</orgName>
								<address>
									<settlement>Kurukshetra</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Investigating the Similarity of Court Decisions</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B35AC0C5CE11365035503388B7E52052</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Semantic Similarity</term>
					<term>Legal Document</term>
					<term>Document Embedding</term>
					<term>Cosine Similarity</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The association between words, phrases, and documents is referred to as semantic similarity. Semantic similarity plays a significant role in internet search engines for content ranking, and it has wide applications in information retrieval, artificial intelligence, and related fields. This paper comprehensively reviews the general architecture, the categorization of approaches, and the techniques and metrics for determining semantic similarity between documents. We conducted experiments with different statistical methods, viz., word vector-based techniques (TF-IDF, LDA, Word2Vec, Doc2Vec, GloVe, and fastText) and transformer-based techniques (Longformer-base, Sentence-BERT-large-nli, Sentence-BERT-large-nli-stsb, and Sentence-RoBERTa-large-nli-stsb), on Indian Supreme Court decisions and discuss the results. The Doc2Vec approach over the whole document is found to correlate the most with expert judgment.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Semantic similarity can be described as the relatedness between words, sentences, and documents. It is a quantitative measure of information that has evolved into a core technique now widely used in a variety of fields, including biological computing <ref type="bibr" target="#b0">[1]</ref>, information retrieval <ref type="bibr" target="#b1">[2]</ref>, artificial intelligence <ref type="bibr" target="#b2">[3]</ref>, geoinformation <ref type="bibr" target="#b3">[4]</ref>, and natural language processing <ref type="bibr" target="#b4">[5]</ref>, as well as other intelligent knowledge-based systems <ref type="bibr" target="#b5">[6]</ref>. As a use case, the identification of related documents assists legal professionals in retrieving relevant literature. Some authors have studied the similarity analysis of legal judgments <ref type="bibr" target="#b6">[7]</ref>. We survey the relevant literature, focusing on text-based methods and deep learning approaches such as transformer models.</p><p>Our focus in this paper is to review semantic similarity approaches exhaustively, in the context of legal case documents in particular. The approach is not restricted to legal case documents; it may be applied in various other domains. Throughout this article, however, we concentrate on the legal arena.</p><p>The need for accurate and reliable legal information retrieval is the most pressing challenge facing today's legal community. Because the Common Law System is one of the most widely followed legal systems globally, the success or failure of a case is heavily influenced by prior cases. The deluge of information on the internet has made it difficult for legal practitioners to manually discover significant earlier cases that appropriately serve their current case. 
As a result, a likely answer is found by comparing the similarity of different case documents, which various authors have recently studied <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>. Statistical methods, also known as text-based methods, utilize the textual content of legal documents. Early methods of this kind included only primitive text-based similarity measures, such as TF-IDF-based approaches. In <ref type="bibr" target="#b9">[10]</ref>, the authors improved the text-based technique with similarity measures such as topic modeling and with neural network models such as word embeddings and document embeddings. They also showed that the word vector-based approaches perform better than other approaches.</p><p>The present paper outlines a review of the different methods in the text-based category, together with an approximate validation of the experimental results. More precisely, we discuss:</p><p>1. The comprehensive details of the different semantic similarity approaches, providing insight into the generalized architecture of the various techniques used in semantic similarity. 2. In the context of the legal domain, the word vector-based and transformer-based approaches to which we confine ourselves, together with the experimental results obtained with each method.</p><p>The layout of this paper is as follows: In the next section, we present an analytical discussion comprising the general architecture for semantic similarity along with the different semantic similarity approaches in detail. Section 3 briefly discusses the similarity measures and the evaluation measures required for a comparative study, then details the experimental results obtained in the legal domain, followed by the underlying discussion. Section 4 presents the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Analytical Discussion</head><p>The main content of the present paper is depicted in the flowchart in Figure <ref type="figure" target="#fig_0">1</ref>: first, we discuss the various document representation methods and document pre-processing.</p><p>After that, we discuss the semantic similarity approaches, followed by similarity measures and evaluation measures. The meaning of these terms will become clear in their respective discussions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>General Architecture</head><p>We feed in unstructured text as input and select from it the corpus that yields the representative text. We then pre-process the representative text by first removing punctuation and stopwords and then stemming, which gives us clean text. Next, we model the clean text to extract document features, i.e., the embeddings. For that, we employ semantic similarity techniques, viz., word vector-based techniques (TF-IDF, LDA, Word2Vec, Doc2Vec, GloVe, and fastText) and transformer-based techniques (Longformer-base, Sentence-BERT-large-nli, Sentence-BERT-large-nli-stsb, and Sentence-RoBERTa-large-nli-stsb). We calculate the cosine similarity between these document feature vectors to obtain the similarity scores. Both data modeling and the similarity measures can utilize semantic information. Further, since we also have similarity scores from the experts, we evaluate the Pearson correlation coefficient between the experts' similarity scores and the ones we obtain. Hence, we quantify how well our methods perform compared to the similarity scores given by the experts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Document Representation</head><p>There are various approaches to document representation, viz., whole document, summary, paragraph, reason for citation (RFC), and thematic.</p><p>In whole-document representation, the whole document is taken into consideration, while in summary representation only the important content is considered, leaving out the redundant parts. In paragraph representation, a set of paragraphs is considered such that each paragraph of one document is compared to all the paragraphs of the other document in the corpus. The RFC method is citation-based and works similarly to the paragraph-based method. In thematic representation, the theme of the document is taken into consideration. After selecting meaningful representations from the text of the documents, their similarity is measured.</p></div>
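The paragraph-based comparison described above can be sketched as follows. This is an illustrative sketch only: Jaccard token overlap stands in for the embedding-based similarity used later in the paper, and the toy documents are invented.

```python
# Sketch of paragraph-level document comparison: every paragraph of one
# document is compared with every paragraph of the other, and the pairwise
# scores are aggregated (here, by taking the maximum). Jaccard overlap of
# token sets is a simple stand-in for an embedding-based similarity.

def jaccard(a, b):
    """Token-set overlap between two paragraphs."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa.union(sb)
    return len(sa.intersection(sb)) / len(union) if union else 0.0

def paragraph_similarity(doc_a, doc_b):
    """Max pairwise similarity over all paragraph pairs."""
    paras_a = [p for p in doc_a.split("\n\n") if p.strip()]
    paras_b = [p for p in doc_b.split("\n\n") if p.strip()]
    return max(jaccard(pa, pb) for pa in paras_a for pb in paras_b)

doc1 = "The court dismissed the appeal.\n\nCosts were awarded to the respondent."
doc2 = "The appeal was heard in May.\n\nThe court dismissed the appeal."
print(round(paragraph_similarity(doc1, doc2), 3))  # prints 1.0
```

Taking the maximum rewards documents that share even one strongly matching paragraph; averaging over all pairs is an equally plausible aggregation choice.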
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Data Pre-processing</head><p>Data preprocessing is crucial in preparing the data since we deal with unstructured text. It transforms the text into a more digestible form. The steps involved are as follows. First, all letters are converted to lowercase. Then the text is tokenized into words based on whitespace. All non-alphabetic words are filtered out, except for terms containing a hyphen, dot, or comma. After that, standard English stopwords are removed from the list of words. Finally, we stem the words using the Porter stemmer. In this way, we obtain a better representation of our text.</p></div>
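The preprocessing steps above can be sketched in Python. This is a minimal, self-contained illustration: the stopword list is a small sample rather than the standard English list, and stem() is a naive suffix stripper standing in for the Porter stemmer (a real pipeline would use, e.g., NLTK's PorterStemmer).

```python
import re

# Small sample stopword list (a real pipeline uses the standard English list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "by"}

def stem(word):
    # Naive stand-in for Porter stemming: strip a few common suffixes.
    for suffix in ("ings", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()                                  # 1. lowercase
    tokens = text.split()                                # 2. whitespace tokenization
    kept = [t for t in tokens                            # 3. keep alphabetic tokens,
            if re.fullmatch(r"[a-z.,-]+", t)]            #    plus hyphen/dot/comma
    kept = [t for t in kept if t not in STOPWORDS]       # 4. drop stopwords
    return [stem(t) for t in kept]                       # 5. stem

print(preprocess("The Court was hearing appeals filed in 1992"))
# prints ['court', 'hear', 'appeal', 'fil']
```

Note how the year "1992" is filtered out as non-alphabetic, and how the naive stemmer over-stems "filed" to "fil" where Porter would give "file".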
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Semantic Similarity Approaches and Measures</head><p>This section describes the main principles behind the existing approaches that we reproduce in this study. As previously indicated, existing approaches utilize various similarity measures that fall into three broad categories: (i) statistical similarity, (ii) graph-based similarity, and (iii) document clustering-based similarity. We present a detailed overview of each of these categories.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1.">Statistical Similarity</head><p>The statistical similarity approach is built on collections of texts in written or spoken form. There are various ways to compute statistical similarity between legal documents, viz., word vector-based, string-based, transformer-based, and hybrid. In this paper, we confine our experiments to the word vector-based and transformer-based techniques.</p><p>As its name suggests, the word vector-based method builds vector representations of documents. The methods in this family are TF-IDF, LDA, Word2Vec, Doc2Vec, GloVe, and fastText. In the TF-IDF approach, a single vector representation of a given document (e.g., a legal document) is created, and the similarity score between vectors is computed using cosine similarity (see, e.g., <ref type="bibr" target="#b7">[8]</ref>). In contrast, as depicted in <ref type="bibr" target="#b9">[10]</ref>, the LDA technique is a topic modeling algorithm that captures the semantics of documents in an appropriate way. Neural network-based models such as Word2Vec and Doc2Vec give a vector for each distinct word (see, e.g., <ref type="bibr" target="#b10">[11]</ref>) and for each document (see, e.g., <ref type="bibr" target="#b11">[12]</ref>), respectively. Similar to Word2Vec, dense vectors are constructed in both the GloVe (see, e.g., <ref type="bibr" target="#b12">[13]</ref>) and fastText (see, e.g., <ref type="bibr" target="#b13">[14]</ref>) methods.</p><p>String-based similarity includes character- and term-based similarity measures. The transformer-based similarity approach is built on language models that produce deep contextual text representations by incorporating word position. 
The transformer techniques considered here are Longformer-base, Sentence-BERT-large-nli, Sentence-BERT-large-nli-stsb, and Sentence-RoBERTa-large-nli-stsb.</p><p>To address the constraints of the statistical similarity approaches listed above, hybrid models combine some or all of them in a suitable way so as to retain the strengths of each. For more details on hybrid methods, the reader is referred to <ref type="bibr" target="#b14">[15]</ref>.</p></div>
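As a sketch of the word vector-based route, the following builds TF-IDF vectors by hand and compares them with cosine similarity. This is an illustration under simplifying assumptions: the toy documents are invented, and a real system would use a library implementation (e.g., scikit-learn's TfidfVectorizer or gensim's Doc2Vec) rather than this minimal version.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "writ petition challenging land acquisition".split(),
    "petition challenging the land acquisition order".split(),
    "appeal against conviction under criminal law".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # prints True
```

The two land-acquisition documents share four weighted terms and thus score higher than the unrelated criminal appeal, which shares none.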
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.2.">Document Clustering</head><p>Clustering is an unsupervised learning problem in which the goal is to arrange a set of objects such that the objects in the same cluster are more similar (in meaning) to each other than to the objects in other clusters. Clustering is used in various disciplines, with intelligent text clustering being one of the most common applications. Traditional text clustering algorithms grouped documents based on keyword matching, which meant that the texts were grouped without any descriptive concepts; as a result, dissimilar texts could end up in the same group. The essential answer to this challenge is to group documents based on semantic similarity, i.e., based on meaning rather than keywords.</p><p>One of the most well-known methods for producing a single grouping is 𝑘-means, wherein the number of clusters, 𝑘, must be determined beforehand. Initially, 𝑘 clusters are specified; after that, each document in the collection is reassigned based on its resemblance to the 𝑘 clusters, and the 𝑘 cluster centroids are updated. Then all documents in the set are reassigned again. This process is repeated until all 𝑘 clusters remain unchanged. Alternatively, following <ref type="bibr" target="#b15">[16]</ref>, the bisecting 𝑘-means method can be used to cluster documents. Here, all items initially belong to a single cluster, and at each step one cluster is split into two. This process continues until the desired number of clusters has been reached. The reader is referred to <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19]</ref> for more details on clustering approaches.</p></div>
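The 𝑘-means procedure described above can be sketched on toy 2-D points; document clustering would run the same loop over document vectors. This is a minimal illustration, not the implementation used in any of the cited works.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: assign to nearest centroid, update, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # k initial centroids
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                          for p in points]   # reassign every point
        if new_assignment == assignment:     # converged: clusters unchanged
            break
        assignment = new_assignment
        for c in range(k):                   # update each centroid
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return assignment

# Two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = kmeans(points, k=2)
print(labels[0] == labels[1] == labels[2], labels[3] == labels[4] == labels[5])
```

Bisecting 𝑘-means would instead start from one cluster holding all points and repeatedly run this routine with k=2 on the cluster chosen for splitting.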
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.3.">Graph Based Similarity</head><p>The graph-based similarity approach relies on graphical methods, which are further divided into ontology-based, relational, citation-based, and hybrid techniques. To compute precedent citation similarity, the prior-case citation network of the documents is constructed. The vertices of the network are the case documents, and a directed edge exists from vertex 𝑖 to vertex 𝑗 if document 𝑖 cites document 𝑗 in its text. As an example, an edge exists from vertex A to vertex E if A cites E. To build document vectors, we investigate citation-network approaches in which documents are nodes and edges correspond to citations.</p><p>The relational approach emphasizes measuring the relation between two words rather than their degree of similarity. Using a predetermined pattern of vector frequencies from a vast corpus, this approach determines the link between word pairs. It enhances existing ontologies and is utilized in semantic document annotation. The reader is referred to <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22]</ref> for more details on these approaches.</p><p>The ontology-based approach is a graph-based semantic similarity approach, and it is classified into three broad methods: single ontology-based, cross ontology-based, and lexical resource-based. The path distance between concepts determines how similar two concepts are: similarity is calculated over the ontology or taxonomy structure, in which essential linkages are type relations. The shortest path is thus used to compute similarity, and the length of the path defines the degree of similarity. 
The depth relative measure is similar to the shortest path approach, but it takes into account the depth of the edges linking the two concepts in the ontology's basic structure and determines the depth between the root and the target concept. In the information-based approach, also known as the corpus-based approach, the information already contained in the ontology or taxonomy is supplemented with the knowledge given by the corpus. For comparing concepts, the hybrid and feature-based measures consider knowledge derived from different sources and from features, respectively. We refer the reader to <ref type="bibr" target="#b22">[23]</ref> for further details on the DeepWalk algorithm. The semantic similarity measures mentioned so far are intended for a single ontology. With the expansion of online information sources, metrics are needed to calculate the similarity between concepts belonging to different ontologies; the methods that quantify the comparison of terms from various ontologies are known as cross-ontology measures.</p><p>To compute semantic similarity, one can employ WordNet and Wikipedia as lexical resources. The WordNet technique is based on the theory of Directed Acyclic Graphs (DAGs): the semantic distance and the DAG information together determine the semantic similarity between words or concepts. We refer the reader to <ref type="bibr" target="#b23">[24]</ref> and <ref type="bibr" target="#b24">[25]</ref> for further details on DAGs.</p><p>The hybrid methods can combine statistical, ontology-based, and relational approaches. We refer the reader to <ref type="bibr" target="#b25">[26]</ref> for more details on such approaches.</p></div>
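The precedent-citation idea above can be illustrated with a toy citation network, scoring two cases by the overlap of the precedents they cite. The case names and edges below are invented for illustration; this Jaccard-style overlap is one simple way to score a citation network, not the only one.

```python
# Toy citation network: a directed edge i -> j means case i cites case j,
# stored as a mapping from each case to the set of precedents it cites.
citations = {
    "A": {"E", "F", "G"},
    "B": {"E", "F", "H"},
    "C": {"X", "Y"},
}

def citation_similarity(graph, i, j):
    """Jaccard overlap of the precedents cited by cases i and j."""
    ci, cj = graph[i], graph[j]
    union = ci.union(cj)
    return len(ci.intersection(cj)) / len(union) if union else 0.0

print(citation_similarity(citations, "A", "B"))  # prints 0.5 (shares E, F)
print(citation_similarity(citations, "A", "C"))  # prints 0.0 (no shared precedents)
```

This is the out-neighbour (bibliographic coupling) view; one could analogously compare the sets of cases that cite each document.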
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experimental Results and Discussion</head><p>This section compares the similarity scores produced by our techniques to those assigned by domain experts to see whether they are consistent. For legal document similarity, we use a dataset of Indian Supreme Court case decisions with gold standard pairs (see Section 3.1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Dataset</head><p>The dataset contains all Indian Supreme Court case decisions in text format spanning 67 years (from 1950 to 2016). Each text begins with an optional headnote (a summary of the legal case that incorporates the legal concerns involved and specifies the written laws employed throughout the litigation process) and continues with the case's whole litigation procedure. We crawled the texts from the website of the Legal Information Institute of India (LIIofIndia, http://www.liiofindia.org/in/cases/cen/INSC/), which maintains several legal databases. A gold standard comprising legal expert judgments on how similar two documents are is essential for comparing and evaluating our methods. Along the lines of <ref type="bibr" target="#b7">[8]</ref> and <ref type="bibr" target="#b9">[10]</ref>, we analyzed 47 pairs of Indian Supreme Court case documents as our gold standard. Expert annotations ranging from 0 (lowest similarity) to 10 (highest similarity) were obtained for each of these pairs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Evaluation Measure</head><p>To assess our techniques, we calculate the similarity score with each technique for each of the 47 test pairs. Then, for each technique, we compute the Pearson correlation coefficient between the 47 scores obtained by the technique and those provided by the experts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Calculate Similarity between pairs</head><p>Finding similarities among documents is vital from the perspective of information retrieval and allied fields. For each of the two documents, the approaches create a vector representation whose dimensions are the terms in the documents, word embeddings, or semantic notions. As a result, we obtain the vectors of the document pairs. Finally, we apply cosine similarity to find the angle between the resulting vectors.</p><p>Cosine Similarity: It is a similarity measure of two non-zero vectors of an inner product space, given by the cosine of the angle between them. The cosine similarity of two vectors having the same orientation is 1, and vectors that are orthogonal have a similarity of 0. The cosine similarity cos(𝜃) of two vectors 𝐴 and 𝐵 is</p><formula xml:id="formula_0">cos(𝜃) = 𝐴 • 𝐵 ||𝐴|| ||𝐵|| ,</formula><p>where (•) represents the vector dot product.</p></div>
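The cosine similarity formula above translates directly into code; a minimal sketch over plain lists of floats:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| ||B||) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # same orientation: prints 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal: prints 0.0
```

The measure depends only on the angle between the vectors, so scaling a document vector (e.g., a longer document with proportionally the same term mix) leaves the score unchanged.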
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Performance</head><p>Pearson's correlation coefficient is used to measure how well our approaches work compared to the expert similarity scores. The correlation between the obtained scores and those offered by legal experts is then calculated.</p><p>Correlation coefficient (𝜌): It is the ratio of the covariance of two variables to the product of their standard deviations. Mathematically, let 𝑃 and 𝑄 be two variables; then the correlation coefficient (𝜌) is defined as</p><formula xml:id="formula_1">𝜌 = 𝑐𝑜𝑣(𝑃, 𝑄) 𝜎 𝑃 𝜎 𝑄 ,</formula><p>where 𝑐𝑜𝑣(𝑃, 𝑄) represents the covariance between 𝑃 and 𝑄, and 𝜎 𝑃 and 𝜎 𝑄 represent the standard deviations of 𝑃 and 𝑄. We have the inequality −1 ≤ 𝜌 ≤ 1: the value 𝜌 = −1 signifies that the variables are perfectly anti-correlated, whereas 𝜌 = 1 signifies that they are perfectly correlated.</p></div>
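The correlation coefficient above can be computed straight from the definition. The score lists below are invented illustrations, not values from the paper's experiments:

```python
import math

def pearson(p, q):
    """rho = cov(P, Q) / (sigma_P * sigma_Q), per the definition above."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(p, q)) / n
    sp = math.sqrt(sum((x - mp) ** 2 for x in p) / n)
    sq = math.sqrt(sum((y - mq) ** 2 for y in q) / n)
    return cov / (sp * sq)

expert = [0, 2, 5, 7, 9]              # hypothetical expert similarity scores
method = [0.1, 0.3, 0.5, 0.6, 0.9]    # hypothetical scores from one technique
print(round(pearson(expert, method), 3))
```

In practice one would use scipy.stats.pearsonr, which also returns a p-value; the hand-rolled version is shown only to mirror the formula.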
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Results and Discussion</head><p>Table <ref type="bibr" target="#b0">(1)</ref> lists, under the column headings Case Pairs, Expert Scores, and Similarity Scores, the similarity scores obtained using the word vector-based and transformer-based approaches. The table shows the similarity scores (1) given by legal experts and ( <ref type="formula">2</ref>) obtained in our experiments using word vector-based and transformer-based techniques. To find the similarity scores between the pairs, in the word vector-based case we used the TF-IDF, Doc2Vec, GloVe, and fastText methods, while in the transformer-based case we employed Longformer-base, Sentence-BERT-large-nli-stsb, and Sentence-RoBERTa-large-nli-stsb.</p><p>In Table <ref type="bibr" target="#b1">(2)</ref>, for each family of techniques, viz., word vector-based and transformer-based, we compute the Pearson correlation coefficient of each method with respect to the expert scores. The highest correlation value obtained for each family is shown in italics, i.e., Doc2Vec and Sentence-RoBERTa-large-nli-stsb.</p><p>The methods for which detailed pairwise similarity scores are reported in Table <ref type="bibr" target="#b0">(1)</ref> are shown in bold in Table <ref type="bibr" target="#b1">(2)</ref>. When the expert scores are low, the word vector-based techniques are closer to the expert scores than the transformer-based techniques, whereas when the expert scores are high, the transformer-based techniques are closer to the expert scores than the word vector-based ones. The Pearson correlation coefficient for the transformer-based methods is lower than that for the word vector-based methods. 
This trend can also be seen in <ref type="bibr" target="#b14">[15]</ref>, where the authors find that the evaluation metrics are lower for transformer-based methods than for word vector-based methods in the context of US Supreme Court decisions. The higher the correlation value, the better the corresponding method's performance. Doc2Vec obtains the highest correlation with the experts' scores among the word vector-based techniques (computed as 0.685), and Sentence-RoBERTa-large-nli-stsb the highest among the transformer-based techniques (computed as 0.401). Overall, Doc2Vec provides the highest correlation value with the experts' scores.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>This paper presents a comprehensive review of semantic similarity, i.e., the categorization of approaches and the techniques and metrics for determining semantic similarity. We then discuss the semantic similarity of legal court case documents in particular, confining our experiments to word vector-based and transformer-based techniques. Finally, we discuss the results obtained while computing semantic similarity among legal documents with the different techniques, viz., word vector-based techniques (TF-IDF, LDA, Word2Vec, Doc2Vec, GloVe, and fastText) and transformer-based techniques (Longformer-base, Sentence-BERT-large-nli, Sentence-BERT-large-nli-stsb, and Sentence-RoBERTa-large-nli-stsb). Across both families of techniques, we observed that the Doc2Vec similarity correlates the most with expert judgment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Acknowledgment</head><p>This work is supported by the IHUB-ANUBHUTI-IIITD FOUNDATION set up under the NM-ICPS scheme of the Department of Science and Technology, India. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: General Architecture for Semantic Similarity.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Semantic Similarity Approaches.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1</head><label>1</label><figDesc>Similarity scores for all the gold standard pairs of Indian Supreme Court decisions, using word vector-based and transformer-based approaches.</figDesc><table><row><cell cols="2">Sr. No. Case Pairs</cell><cell>Expert Score</cell><cell cols="7">Similarity Scores: Word Vector Based (TFIDF, Doc2vec, GloVe, fastText) and Transformer Based (Longformer, BERT, RoBERTa)</cell></row><row><cell>1</cell><cell>1992_47 &amp; 1992_76</cell><cell>0</cell><cell>0.168</cell><cell>0.160</cell><cell>0.864</cell><cell>0.347</cell><cell>0.993</cell><cell>0.443</cell><cell>0.351</cell></row><row><cell>2</cell><cell>1992_76 &amp; 1992_182</cell><cell>0</cell><cell>0.143</cell><cell>0.146</cell><cell>0.838</cell><cell>0.386</cell><cell>0.993</cell><cell>0.449</cell><cell>0.378</cell></row><row><cell>3</cell><cell>1972_11 &amp; 1984_115</cell><cell>0</cell><cell>0.127</cell><cell>0.084</cell><cell>0.838</cell><cell>0.179</cell><cell>0.986</cell><cell>0.643</cell><cell>0.466</cell></row><row><cell>4</cell><cell>1969_57 &amp; 1980_91</cell><cell>0</cell><cell>0.282</cell><cell>0.271</cell><cell>0.910</cell><cell>0.521</cell><cell>0.989</cell><cell>0.565</cell><cell>0.625</cell></row><row><cell>5</cell><cell>1959_151 &amp; 1982_28</cell><cell>0</cell><cell>0.237</cell><cell>0.238</cell><cell>0.895</cell><cell>0.527</cell><cell>0.990</cell><cell>0.677</cell><cell>0.492</cell></row><row><cell>6</cell><cell>1976_200 &amp; 1959_151</cell><cell>0</cell><cell>0.218</cell><cell>0.051</cell><cell>0.904</cell><cell>0.304</cell><cell>0.990</cell><cell>0.674</cell><cell>0.424</cell></row><row><cell>7</cell><cell>1985_114 &amp; 1959_151</cell><cell>0</cell><cell>0.291</cell><cell>0.263</cell><cell>0.899</cell><cell>0.572</cell><cell>0.990</cell><cell>0.777</cell><cell>0.683</cell></row><row><cell>8</cell><cell>1966_236 &amp; 
1967_267</cell><cell>0</cell><cell>0.236</cell><cell>0.353</cell><cell>0.903</cell><cell>0.688</cell><cell>0.983</cell><cell>0.563</cell><cell>0.611</cell></row><row><cell>9</cell><cell>1961_34 &amp; 1979_110</cell><cell>0</cell><cell>0.303</cell><cell>0.322</cell><cell>0.935</cell><cell>0.635</cell><cell>0.995</cell><cell>0.689</cell><cell>0.556</cell></row><row><cell>10</cell><cell>1961_34 &amp; 1987_37</cell><cell>0</cell><cell>0.151</cell><cell>0.193</cell><cell>0.885</cell><cell>0.447</cell><cell>0.992</cell><cell>0.671</cell><cell>0.615</cell></row><row><cell>11</cell><cell>1992_47 &amp; 1987_315</cell><cell>0</cell><cell>0.388</cell><cell>0.358</cell><cell>0.898</cell><cell>0.712</cell><cell>0.991</cell><cell>0.689</cell><cell>0.613</cell></row><row><cell>12</cell><cell>1984_115 &amp; 1987_315</cell><cell>0</cell><cell>0.489</cell><cell>0.459</cell><cell>0.960</cell><cell>0.796</cell><cell>0.991</cell><cell>0.712</cell><cell>0.498</cell></row><row><cell>13</cell><cell>1992_47 &amp; 1992_76</cell><cell>0</cell><cell>0.168</cell><cell>0.160</cell><cell>0.864</cell><cell>0.347</cell><cell>0.993</cell><cell>0.443</cell><cell>0.351</cell></row><row><cell>14</cell><cell>1984_115 &amp; 1987_315</cell><cell>0</cell><cell>0.246</cell><cell>0.238</cell><cell>0.842</cell><cell>0.502</cell><cell>0.990</cell><cell>0.708</cell><cell>0.556</cell></row><row><cell>15</cell><cell>1983_129 &amp; 1983_27</cell><cell>1</cell><cell>0.590</cell><cell>0.561</cell><cell>0.959</cell><cell>0.723</cell><cell>0.995</cell><cell>0.745</cell><cell>0.657</cell></row><row><cell>16</cell><cell>1979_110 &amp; 1953_28</cell><cell>2</cell><cell>0.481</cell><cell>0.178</cell><cell>0.957</cell><cell>0.583</cell><cell>0.989</cell><cell>0.534</cell><cell>0.458</cell></row><row><cell>17</cell><cell>1963_170 &amp; 
1979_158</cell><cell>2</cell><cell>0.512</cell><cell>0.492</cell><cell>0.951</cell><cell>0.648</cell><cell>0.993</cell><cell>0.793</cell><cell>0.744</cell></row><row><cell>18</cell><cell>1983_27 &amp; 1983_37</cell><cell>2</cell><cell>0.640</cell><cell>0.527</cell><cell>0.960</cell><cell>0.809</cell><cell>0.995</cell><cell>0.784</cell><cell>0.774</cell></row><row><cell>19</cell><cell>1983_27 &amp; 1979_33</cell><cell>2</cell><cell>0.672</cell><cell>0.581</cell><cell>0.957</cell><cell>0.685</cell><cell>0.994</cell><cell>0.739</cell><cell>0.725</cell></row><row><cell>20</cell><cell>1984_115 &amp; 1981_49</cell><cell>2</cell><cell>0.520</cell><cell>0.500</cell><cell>0.963</cell><cell>0.788</cell><cell>0.992</cell><cell>0.733</cell><cell>0.598</cell></row><row><cell>21</cell><cell>1979_110 &amp; 1989_233</cell><cell>3</cell><cell>0.368</cell><cell>0.351</cell><cell>0.935</cell><cell>0.551</cell><cell>0.991</cell><cell>0.648</cell><cell>0.663</cell></row><row><cell>22</cell><cell>1983_129 &amp; 1976_176</cell><cell>5</cell><cell>0.428</cell><cell>0.266</cell><cell>0.954</cell><cell>0.703</cell><cell>0.992</cell><cell>0.651</cell><cell>0.573</cell></row><row><cell>23</cell><cell>1971_111 &amp; 1972_291</cell><cell>5</cell><cell>0.445</cell><cell>0.393</cell><cell>0.931</cell><cell>0.566</cell><cell>0.993</cell><cell>0.656</cell><cell>0.513</cell></row><row><cell>24</cell><cell>1990_171 &amp; 1988_88</cell><cell>5</cell><cell>0.275</cell><cell>0.297</cell><cell>0.893</cell><cell>0.602</cell><cell>0.993</cell><cell>0.658</cell><cell>0.558</cell></row><row><cell>25</cell><cell>1972_31 &amp; 1984_115</cell><cell>5</cell><cell>0.533</cell><cell>0.536</cell><cell>0.947</cell><cell>0.754</cell><cell>0.991</cell><cell>0.694</cell><cell>0.581</cell></row><row><cell>26</cell><cell>1984_118 &amp; 
1971_336</cell><cell>5</cell><cell>0.479</cell><cell>0.356</cell><cell>0.960</cell><cell>0.681</cell><cell>0.991</cell><cell>0.808</cell><cell>0.609</cell></row><row><cell>27</cell><cell>1987_154 &amp; 1964_144</cell><cell>5</cell><cell>0.501</cell><cell>0.492</cell><cell>0.954</cell><cell>0.846</cell><cell>0.991</cell><cell>0.665</cell><cell>0.527</cell></row><row><cell>28</cell><cell>1973_186 &amp; 1986_218</cell><cell>5</cell><cell>0.392</cell><cell>0.393</cell><cell>0.926</cell><cell>0.586</cell><cell>0.989</cell><cell>0.645</cell><cell>0.498</cell></row><row><cell>29</cell><cell>1990_96 &amp; 1990_171</cell><cell>5</cell><cell>0.325</cell><cell>0.439</cell><cell>0.932</cell><cell>0.724</cell><cell>0.992</cell><cell>0.689</cell><cell>0.732</cell></row><row><cell>30</cell><cell>1958_3 &amp; 1992_144</cell><cell>5</cell><cell>0.399</cell><cell>0.372</cell><cell>0.909</cell><cell>0.551</cell><cell>0.992</cell><cell>0.664</cell><cell>0.476</cell></row><row><cell>31</cell><cell>1979_158 &amp; 1965_111</cell><cell>7</cell><cell>0.586</cell><cell>0.529</cell><cell>0.964</cell><cell>0.755</cell><cell>0.994</cell><cell>0.670</cell><cell>0.606</cell></row><row><cell>32</cell><cell>1962_303 &amp; 1972_291</cell><cell>7</cell><cell>0.394</cell><cell>0.540</cell><cell>0.931</cell><cell>0.672</cell><cell>0.988</cell><cell>0.745</cell><cell>0.613</cell></row><row><cell>33</cell><cell>1987_37 &amp; 1989_233</cell><cell>7</cell><cell>0.169</cell><cell>0.234</cell><cell>0.903</cell><cell>0.560</cell><cell>0.992</cell><cell>0.565</cell><cell>0.530</cell></row><row><cell>34</cell><cell>1953_40 &amp; 1953_24</cell><cell>7</cell><cell>0.867</cell><cell>0.836</cell><cell>0.989</cell><cell>0.931</cell><cell>0.996</cell><cell>0.763</cell><cell>0.700</cell></row><row><cell>35</cell><cell>1966_154 &amp; 
1976_43</cell><cell>7</cell><cell>0.434</cell><cell>0.431</cell><cell>0.947</cell><cell>0.745</cell><cell>0.989</cell><cell>0.588</cell><cell>0.663</cell></row><row><cell>36</cell><cell>1953_24 &amp; 1957_52</cell><cell>7</cell><cell>0.259</cell><cell>0.177</cell><cell>0.883</cell><cell>0.357</cell><cell>0.985</cell><cell>0.437</cell><cell>0.418</cell></row><row><cell>37</cell><cell>1984_115 &amp; 1971_49</cell><cell>7</cell><cell>0.489</cell><cell>0.482</cell><cell>0.942</cell><cell>0.817</cell><cell>0.993</cell><cell>0.714</cell><cell>0.662</cell></row><row><cell>38</cell><cell>1980_221 &amp; 1984_115</cell><cell>8</cell><cell>0.489</cell><cell>0.539</cell><cell>0.944</cell><cell>0.727</cell><cell>0.988</cell><cell>0.726</cell><cell>0.615</cell></row><row><cell>39</cell><cell>1980_39 &amp; 1969_324</cell><cell>8</cell><cell>0.663</cell><cell>0.648</cell><cell>0.973</cell><cell>0.933</cell><cell>0.992</cell><cell>0.652</cell><cell>0.575</cell></row><row><cell>40</cell><cell>1991_48 &amp; 1987_189</cell><cell>9</cell><cell>0.517</cell><cell>0.537</cell><cell>0.943</cell><cell>0.858</cell><cell>0.993</cell><cell>0.635</cell><cell>0.634</cell></row><row><cell>41</cell><cell>1979_104 &amp; 1979_110</cell><cell>9</cell><cell>0.793</cell><cell>0.695</cell><cell>0.974</cell><cell>0.922</cell><cell>0.994</cell><cell>0.831</cell><cell>0.802</cell></row><row><cell>42</cell><cell>1985_113 &amp; 1969_324</cell><cell>9</cell><cell>0.690</cell><cell>0.619</cell><cell>0.972</cell><cell>0.941</cell><cell>0.981</cell><cell>0.561</cell><cell>0.518</cell></row><row><cell>43</cell><cell>1979_33 &amp; 1979_110</cell><cell>9</cell><cell>0.815</cell><cell>0.838</cell><cell>0.990</cell><cell>0.949</cell><cell>0.995</cell><cell>0.776</cell><cell>0.799</cell></row><row><cell>44</cell><cell>1968_197 &amp; 
1972_62</cell><cell>10</cell><cell>0.425</cell><cell>0.584</cell><cell>0.914</cell><cell>0.687</cell><cell>0.993</cell><cell>0.806</cell><cell>0.764</cell></row><row><cell>45</cell><cell>1992_47 &amp; 1984_115</cell><cell>10</cell><cell>0.518</cell><cell>0.540</cell><cell>0.945</cell><cell>0.725</cell><cell>0.993</cell><cell>0.733</cell><cell>0.688</cell></row><row><cell>46</cell><cell>1991_12 &amp; 1985_113</cell><cell>10</cell><cell>0.755</cell><cell>0.725</cell><cell>0.980</cell><cell>0.952</cell><cell>0.990</cell><cell>0.615</cell><cell>0.601</cell></row><row><cell>47</cell><cell>1983_37 &amp; 1979_33</cell><cell>10</cell><cell>0.754</cell><cell>0.750</cell><cell>0.978</cell><cell>0.884</cell><cell>0.993</cell><cell>0.679</cell><cell>0.721</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>Pearson correlation coefficient for the word-vector-based and transformer-based methods on Indian Supreme Court decisions (gold standard pairs).</figDesc><table><row><cell>Sr. No.</cell><cell>Methods</cell><cell>Pearson Correlation Coefficient</cell></row><row><cell cols="3">Word Vector Based</cell></row><row><cell>1</cell><cell>TF-IDF</cell><cell>0.614</cell></row><row><cell>2</cell><cell>Word2Vec</cell><cell>0.601</cell></row><row><cell>3</cell><cell>Doc2Vec</cell><cell>0.685</cell></row><row><cell>4</cell><cell>LDA</cell><cell>0.424</cell></row><row><cell>5</cell><cell>fastText</cell><cell>0.625</cell></row><row><cell>6</cell><cell>GloVe</cell><cell>0.567</cell></row><row><cell cols="3">Transformer Based</cell></row><row><cell>7</cell><cell>Longformer-base</cell><cell>0.057</cell></row><row><cell>8</cell><cell>Sentence-BERT-large-nli</cell><cell>0.148</cell></row><row><cell>9</cell><cell>Sentence-BERT-large-nli-stsb</cell><cell>0.199</cell></row><row><cell>10</cell><cell>Sentence-RoBERTa-large-nli-stsb</cell><cell>0.401</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">OmniSearch: a semantic search system based on the ontology for microRNA target (OMIT) for microRNA-target gene interaction data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Strachan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Blake</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Eilbeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Natale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Semantics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Textpresso: an ontology-based information retrieval and extraction system for biological literature</title>
		<author>
			<persName><forename type="first">H.-M</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">E</forename><surname>Kenny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">W</forename><surname>Sternberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ashburner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLoS Biology</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page">e309</biblScope>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Measuring semantic similarity by latent relational analysis</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<idno type="arXiv">arXiv:cs/0508053</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Approaches to semantic similarity measurement for geo-spatial data: a survey</title>
		<author>
			<persName><forename type="first">A</forename><surname>Schwering</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions in GIS</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="5" to="29" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Term representation with generalized latent semantic analysis</title>
		<author>
			<persName><forename type="first">I</forename><surname>Matveeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Levow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Farahat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Royer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Amsterdam Studies in the Theory and History of Linguistic Science Series</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page">45</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Knowledge-based sentence semantic similarity: algebraical properties</title>
		<author>
			<persName><forename type="first">M</forename><surname>Oussalah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mohamed</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Progress in Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="43" to="63" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Similarity analysis of legal judgments</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">B</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fourth annual ACM Bangalore conference</title>
				<meeting>the fourth annual ACM Bangalore conference</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Finding similar legal judgements under common law system</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">B</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Suri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International workshop on databases in networked information systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="103" to="116" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Similarity analysis of legal judgments</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">K</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">B</forename><surname>Reddy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fourth annual ACM Bangalore conference</title>
				<meeting>the fourth annual ACM Bangalore conference</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Measuring similarity among legal court case documents</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mandal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ghosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghosh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th annual ACM India compute conference</title>
				<meeting>the 10th annual ACM India compute conference</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<title level="m">Efficient estimation of word representations in vector space</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Distributed representations of sentences and documents</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Enriching word vectors with subword information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="135" to="146" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Evaluating document representations for content-based legal literature recommendations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ostendorff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ruas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gipp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Moreno-Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rehm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law</title>
				<meeting>the Eighteenth International Conference on Artificial Intelligence and Law</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="109" to="118" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A novel approach to find semantic similarity measure between words</title>
		<author>
			<persName><forename type="first">L</forename><surname>Sahni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sehgal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kochar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ahmad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2014 2nd International Symposium on Computational and Business Intelligence</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="89" to="92" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A similarity measure for text classification and clustering</title>
		<author>
			<persName><forename type="first">Y.-S</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-J</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="1575" to="1590" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">An ensemble approach for text document clustering using Wikipedia concepts</title>
		<author>
			<persName><forename type="first">S</forename><surname>Nourashrafeddin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Milios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">V</forename><surname>Arnold</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 ACM symposium on Document engineering</title>
				<meeting>the 2014 ACM symposium on Document engineering</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="107" to="116" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">A comparison of document clustering techniques</title>
		<author>
			<persName><forename type="first">M</forename><surname>Steinbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Karypis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kumar</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A vector space model for semantic similarity calculation and OWL ontology alignment</title>
		<author>
			<persName><forename type="first">R</forename><surname>Tous</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Delgado</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Database and Expert Systems Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="307" to="316" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Measuring semantic similarity by latent relational analysis</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<idno type="arXiv">arXiv:cs/0508053</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Combining statistical techniques and lexicosyntactic patterns for semantic relations extraction from text</title>
		<author>
			<persName><forename type="first">E</forename><surname>Giovannetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Montemagni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SWAP, Citeseer</title>
				<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">DeepWalk: Online learning of social representations</title>
		<author>
			<persName><forename type="first">B</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Skiena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="701" to="710" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A new measure of word semantic similarity based on WordNet hierarchy and DAG theory</title>
		<author>
			<persName><forename type="first">P</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2009 International Conference on Web Information Systems and Mining</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="181" to="185" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Semantic information retrieval based on Wikipedia taxonomy</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications Technology and Research</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="77" to="80" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">A WordNet-based semantic similarity measure enhanced by Internet-based knowledge</title>
		<author>
			<persName><forename type="first">G</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SEKE</title>
		<imprint>
			<biblScope unit="page" from="175" to="178" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
