<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Legal Assistance using Word Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Kayalvizhi</string-name>
          <email>kayalvizhis@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Thenmozhi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chandrabose Aravindan</string-name>
          <email>aravindanc@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of CSE, SSN College of Engineering</institution>
          ,
          <addr-line>Chennai</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Legal counsellors always look up prior cases and statutes to ensure justice. Referring to all prior cases is a time-consuming process since they are vast in number. Artificial intelligence can be used to select the most relevant among all the documents. Existing systems have made use of different word embeddings, machine learning and deep learning methods to select the relevant ones. In our approach, different vectorization methods such as Word2Vec, GloVe, Tf-Idf and a count vectorizer are used, and then similarity is calculated using Jaccard similarity and cosine similarity to rank the prior cases and statutes. The work is evaluated on the AILA@FIRE-2019 dataset, which provides two tasks of finding relevant prior documents, namely relevant cases and relevant statutes.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Cosine similarity</kwd>
        <kwd>Vectorization</kwd>
        <kwd>Word Embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Anyone seeking legal assistance has to search the prior cases and statutes
relevant to the current case. This search is laborious since the prior documents
are innumerable, so employing artificial intelligence for search and retrieval
is an effective alternative to manual search and retrieval. Different machine
learning and deep learning methods, including Doc2Vec, Tf-Idf, LSTM, etc., have
been used for retrieving prior cases [
        <xref ref-type="bibr" rid="ref2 ref5 ref6 ref7 ref8">2, 5, 6, 7, 8</xref>
        ].
AILA@FIRE-2019 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed two tasks, namely identifying relevant prior
cases and identifying relevant statutes. Identifying relevant prior cases refers to
retrieving prior cases similar to the given query, and identifying relevant statutes
refers to retrieving the statutes relevant to the given query. For each task, the
first 10 queries are given with a gold standard set, which can be taken as the
training set, and the remaining 40 queries are considered the test set.
      </p>
      <p>For retrieving the relevant cases and statutes, the documents are first
vectorized using Word2Vec, Tf-Idf, a count vectorizer, and an ensemble of
GloVe and Word2Vec. After vectorization, the documents are ranked by
computing cosine similarity or Jaccard similarity.</p>
      <sec id="sec-1-1">
        <title>Task 1: Identifying relevant cases</title>
      </sec>
      <sec id="sec-1-2">
        <title>Word2Vec:</title>
        <p>
          The query document and the case document are first vectorized using a
Word2Vec model of dimension 300. After vectorizing the whole document, the
max-min vector is taken to represent each document. Let 'a' be the max-min
vector of the query and 'b' be the max-min vector of the case document. The
case documents are ranked by the cosine similarity [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] between a and b.
        </p>
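        <p>As a minimal sketch of this step, assuming that the "max-min vector" means the concatenation of the element-wise maximum and minimum over the word vectors, and using random 300-dimensional vectors in place of a pretrained Word2Vec model:</p>

```python
import numpy as np

# Toy stand-in for a pretrained 300-dimensional Word2Vec model: in practice
# these vectors would come from a trained model; here they are random.
rng = np.random.default_rng(0)
vocab = ["court", "appeal", "evidence", "judge", "statute"]
w2v = {w: rng.standard_normal(300) for w in vocab}

def max_min_vector(tokens):
    """Represent a document by concatenating the element-wise max and min
    of its word vectors (one plausible reading of 'max-min vector')."""
    M = np.stack([w2v[t] for t in tokens if t in w2v])
    return np.concatenate([M.max(axis=0), M.min(axis=0)])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = max_min_vector("court appeal".split())          # query document
b = max_min_vector("court evidence judge".split())  # case document
score = cosine(a, b)  # case documents are ranked by this score
```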
      </sec>
      <sec id="sec-1-3">
        <title>Ensembling Word2Vec and GloVe:</title>
        <p>
          In this method, the queries and case documents are vectorized using both
Word2Vec and GloVe, and the two representations are concatenated. For a single
case document, vectorization with GloVe forms the vector 'a1' and vectorization
with the Word2Vec model forms the vector 'a2'; concatenating a1 and a2 gives
the vector 'a'. The same process applied to the query document forms the
vector 'b'. The case documents are then ranked by the cosine similarity [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
between a and b.
        </p>
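        <p>A minimal sketch of the concatenation step; random vectors stand in for the pretrained Word2Vec and GloVe models, and mean pooling over word vectors is an assumption, since the pooling used is not specified above:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
words = ["court", "appeal", "statute", "penalty"]
# Random stand-ins for two pretrained 300-dimensional embedding models.
w2v = {w: rng.standard_normal(300) for w in words}
glove = {w: rng.standard_normal(300) for w in words}

def pool(tokens, model):
    """Mean-pool word vectors into one document vector (assumed pooling)."""
    return np.mean([model[t] for t in tokens if t in model], axis=0)

def ensemble_vector(tokens):
    a1 = pool(tokens, glove)         # GloVe representation
    a2 = pool(tokens, w2v)           # Word2Vec representation
    return np.concatenate([a1, a2])  # 600-dimensional ensemble vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = ensemble_vector("court appeal".split())      # case document
b = ensemble_vector("statute penalty".split())   # query document
score = cosine(a, b)
```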
      </sec>
      <sec id="sec-1-4">
        <title>Tf-idf vectorizer:</title>
        <p>
          In this method, the documents and queries are vectorized using a Tf-Idf
vectorizer. The entire corpus (queries and cases) is used to build the
vocabulary with which the documents are fitted. The vector of the query forms
'a' and that of a case document forms 'b'. The documents are ranked by the
cosine similarity [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] between a and
b.
        </p>
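        <p>A self-contained sketch of Tf-Idf ranking; a small hand-written Tf-Idf stands in for a library vectorizer, and the tiny corpus is purely illustrative:</p>

```python
import math
from collections import Counter

# Toy corpus: the queries and case documents together form the vocabulary.
corpus = {
    "q1": "the court allowed the appeal",
    "c1": "the appeal was allowed by the high court",
    "c2": "the statute defines the penalty",
}
docs = {name: text.split() for name, text in corpus.items()}
vocab = sorted({w for toks in docs.values() for w in toks})
n = len(docs)
df = {w: sum(w in toks for toks in docs.values()) for w in vocab}

def tfidf(tokens):
    """Term frequency times inverse document frequency over the vocabulary."""
    tf = Counter(tokens)
    return [tf[w] / len(tokens) * math.log(n / df[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

query = tfidf(docs["q1"])
ranking = sorted(["c1", "c2"],
                 key=lambda c: cosine(query, tfidf(docs[c])), reverse=True)
# c1 shares the discriminative terms 'court', 'allowed' and 'appeal' with
# the query, so it is ranked first.
```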
      </sec>
      <sec id="sec-1-5">
        <title>Task 2: Identifying relevant statutes</title>
      </sec>
      <sec id="sec-1-6">
        <title>Tf-idf vectorizer:</title>
        <p>
          In this method, the documents and queries are vectorized using a Tf-Idf
vectorizer. The entire corpus (queries and statutes) is used to build the
vocabulary with which the documents are fitted. The vector of the query forms
'a' and that of a statute forms 'b'. The statute documents are ranked by the
cosine similarity
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] between a and b.
        </p>
      </sec>
      <sec id="sec-1-7">
        <title>Jaccard Similarity:</title>
        <p>
          This method ranks the documents by Jaccard similarity. The vocabulary of
the query document forms 'a' and the vocabulary of the statute document forms
'b'. The documents are ranked by the Jaccard similarity [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] between a and b.
        </p>
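        <p>A minimal sketch of vocabulary-based Jaccard ranking; the query and statute texts are illustrative:</p>

```python
def jaccard(a_tokens, b_tokens):
    """Jaccard similarity between two vocabularies: |A & B| / |A | B|."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

query = "the court allowed the appeal".split()  # vocabulary 'a'
statutes = {                                    # vocabularies 'b'
    "s1": "appeal to the high court".split(),
    "s2": "penalty for the offence".split(),
}
ranking = sorted(statutes, key=lambda s: jaccard(query, statutes[s]),
                 reverse=True)
# s1 shares 'the', 'court' and 'appeal' with the query, so it ranks first.
```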
      </sec>
      <sec id="sec-1-8">
        <title>Count vectorizer:</title>
        <p>
          In this method, the documents and queries are vectorized using a count
vectorizer, i.e., by the count of each word in the documents (query and
statute). The frequent words of the query document form its dictionary, whose
vector forms 'a'; the frequent words of the statute document form its
dictionary, whose vector forms 'b'. The statutes are ranked by the cosine
similarity
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] between a and b.
        </p>
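        <p>A minimal sketch of count-vector ranking; a shared vocabulary built from both documents stands in for the per-document dictionaries described above:</p>

```python
import math
from collections import Counter

def count_vector(tokens, vocab):
    """Raw term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

query = "the court allowed the appeal".split()
statute = "appeal to the high court".split()
vocab = sorted(set(query) | set(statute))  # assumed shared dictionary
a = count_vector(query, vocab)
b = count_vector(statute, vocab)
score = cosine(a, b)  # statutes are ranked by this score
```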
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>Among the vectorization methods, Word2Vec seems to perform better than the
others for identifying the relevant cases, and the Tf-Idf vectorizing method
seems to be better for identifying the relevant statutes.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>Artificial Intelligence helps us in many ways to identify the relevant
documents among all the prior documents in the legal domain. Different word
embedding methods for finding similarity are experimented with on the
AILA@FIRE-2019 dataset. Various vectorizations, namely Word2Vec, GloVe, an
ensemble of Word2Vec and GloVe, a Tf-Idf vectorizer and a count vectorizer,
are used. After vectorization, cosine similarity is calculated to rank the
documents. Among these, Word2Vec seems to perform better than the other
vectorization methods for Task 1 of identifying the relevant cases, and Tf-Idf
seems to perform better than the other vectorizing methods for Task 2 of
identifying the relevant prior statutes. The performance can be further
improved by other machine learning and deep learning methods.</p>
      <table-wrap id="tab1">
        <caption>
          <p>MAP scores of the submitted runs</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Teams and Runs</th>
              <th>MAP score</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>SSN_NLP Run 1</td><td>0.077</td></tr>
            <tr><td>SSN_NLP Run 2</td><td>0.061</td></tr>
            <tr><td>SSN_NLP Run 3</td><td>0.051</td></tr>
            <tr><td>Yunqiu Shao</td><td>0.156</td></tr>
            <tr><td>UBLTM</td><td>0.102</td></tr>
            <tr><td>Sara Renjit</td><td>0.096</td></tr>
            <tr><td>Soumil Mandal - JU_SRM</td><td>0.083</td></tr>
            <tr><td>HLJIT2019</td><td>0.081</td></tr>
            <tr><td>Kavya S Ganesh</td><td>0.068</td></tr>
            <tr><td>Baban Gain - IITP</td><td>0.036</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>We would like to thank the Science and Engineering Research Board
(SERB), Department of Science and Technology for funding the GPU system
(EEQ/2018/000262) where this work has been carried out.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Achananuparp</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>The evaluation of sentence similarity measures</article-title>
          .
          <source>In: International Conference on data warehousing and knowledge discovery</source>
          . pp.
          <fpage>305</fpage>
          -
          <lpage>316</lpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barathi Ganesh</surname>
            ,
            <given-names>H.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soman</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          :
          <article-title>Distributional semantic representation for text classification and information retrieval</article-title>
          .
          <source>In: FIRE</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bhattacharya</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the FIRE 2019 AILA track: Artificial Intelligence for Legal Assistance</article-title>
          .
          <source>In: Proceedings of FIRE 2019 - Forum for Information Retrieval Evaluation (December</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Similarity measures for text document clustering</article-title>
          .
          <source>In: Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008)</source>
          , Christchurch, New Zealand. vol.
          <volume>4</volume>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Locke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Automatic cited decision retrieval: Working notes of ielab for fire legal track precedence retrieval task</article-title>
          .
          <source>In: FIRE</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sandeep</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bharadwaj</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An extraction based approach to keyword generation and precedence retrieval: Bits pilani - hyderabad</article-title>
          .
          <source>In: FIRE</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Thenmozhi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kannan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aravindan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A text similarity approach for precedence retrieval from legal documents</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Hljit2017@irled-fire2017: Information retrieval from legal documents</article-title>
          .
          <source>In: FIRE</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>