<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Legal Documents to Legal Document Management Systems; The Case of LegiCrowd</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandros Nousias</string-name>
          <email>alexandros.nousias@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alain Couillault</string-name>
          <email>alain.couillault@apoliade.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofia Almpani</string-name>
          <email>salmpani@mail.ntua.gr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Theodoros Mitsikas</string-name>
          <email>mitsikas@central.ntua.gr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petros Stefaneas</string-name>
          <email>petros@math.ntua.gr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Association des Professionnels des Industries de la Langue (APIL)</institution>
          ,
          <addr-line>Montreuil</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Future Now Business Consultants &amp; Training / MyData Greece</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Technical University of Athens</institution>
          ,
          <addr-line>Zografou</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this position paper, we argue that users' online consent to terms of service and privacy notices is inherently impaired by the imbalance of power between online service providers and their users. We argue that a full-fledged legal document management system relying on semantic representation is key to resolving this conflict and facilitating the transparency of Online Legal Documents, and we give a quick overview of the LegiCrowd project, a crowdsourced approach to legal document annotation, which paves the way towards such a solution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>As AI technology and automation permeate society horizontally, the
law and the subsequent enforcement mechanism prove incapable
of keeping pace. Concepts inherited from the past, such as consent,
tend to maintain their static properties in an increasingly complex
and dynamic space, and thus drift into obsolescence. The
law and its design and implementation properties are in need of
a radical update. The present paper argues that such an update requires
a transition from plain legal text to a full-fledged Legal Document
Management System.</p>
      <p>
        The World Wide Web today is the outcome of a three-stage
evolution. Web 1.0 refers to the so-called static Web of documents in a
unidirectional broadcasting format. Web 2.0 introduced the Web
of people, by allowing the sharing of user-generated content and
further social networking. Web 3.0, or the Web of data, is currently
evolving around the idea of defining and linking structured data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
in order to produce formal semantic representations, thus
introducing massive automation via algorithmically informed decisions.
Web 3.0 comes, however, with one major loophole: the lack of legal
knowledge modelling and representation, which exposes systemic
inadequacies in the digital design, as the always hungry-for-data
service supply side conducts a “permissionless invasion” [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
At the same time, in a complex dynamic system like the Web of data, algorithms
require huge amounts of high-quality and relevant data. We start
from the basics, revisiting the concept, role, and specifications of terms of
service and privacy policies as agents of information provision
towards systemic, human-centric, and human-friendly automation.
Terms of service and privacy policies are treated as raw data for
automated meaning extraction via information retrieval,
question answering, dialogue systems, and other Natural Language
Processing applications.
      </p>
      <p>The rest of the paper is organised as follows. Section 2
provides a brief description of information technology
advancements to date and their key characteristics. Section 3 discusses
inconsistencies and loopholes of modern legal design and argues,
on that ground, that legal representation
and modelling could be the solution for a radical update of
modern legal properties and the enforcement mechanism, if put in the
appropriate ethical context. Section 4 introduces the LegiCrowd
platform, a crowdsourced legal document annotation system.
Finally, Section 5 concludes the paper and provides some thoughts
for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>FROM LEGAL TEXT TO LEGAL INFORMATICS</title>
      <p>
        Ubiquitous automation is incompatible with the static format of
online legal documents and the associated consent models. Terms of
service and privacy policies in their present form constitute iconic
proof of the inadequacy of the digital design. Complicated legal and
technical documents that no one reads, no one understands, and no
one cares about govern the emerging data lifecycles for the benefit
of data-driven business operations, extending their unhealthy
operational patterns. A piece of information of such magnitude
turns into an irrelevant node in the data value chain, hindering the
unfolding and systemic assertion of the evolving human-centric
patterns. On top of that, modern businesses increasingly use consent
as a de facto standard for demonstrating privacy commitments and
wider legal compliance, claiming consent provisions as proxies of
informed choice. This evolution has given rise to a situation where
many technology giants, on the pretext of providing improved
services, have begun to track every action of every user with little or
no transparency [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As a result, clicking the ‘Agree’
button for consent has been dubbed “the Biggest lie on the Internet” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
and incidents of data misuse such as unsolicited calls, spam, and
deliberate manipulation have resulted in a massive trust deficit.
All of this has been formalised by the courts’ validation of the ‘I Agree’
button, which maximises the power asymmetries and the trust deficit.
      </p>
    </sec>
    <sec id="sec-4">
      <title>THE ETHICS OF LEGAL REPRESENTATION AND MODELLING</title>
      <p>Such a formatting reality, together with the data dispossession imposed by
the technology and digital service supply side, brings to the
surface the need for dynamic, data-driven, and data-relevant legal
and ethical enforcement. In the environment of Web 3.0, such
enforcement requires a data-driven solution shaped with
mathematical reasoning. It requires the transition to a ubiquitous legal
representation and modelling apparatus: an extended Legal
Document Management System comprising structured legal data,
methods, and tools for sufficient syntactic and semantic
representation, capable of generating documented, machine-readable legal
knowledge, using very different logics, norms, and languages.</p>
      <p>
        The ethical starting point lies in the axiom expressed by [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] that
“The common misconception is that language has to do with words
and what they mean. It doesn’t. It has to do with people and what they
mean”. The task is not simply to link and annotate language data,
but rather to provide accurate meaning in the appropriate
context. The aim is a virtuous cycle of legal data structuring,
modelling, representation, and context in order to: (i) provide end users
with precise, clear, and verified information on data processes and
circulation; (ii) provide the supply side with proof of concept for technical
and legal compliance throughout the data lifecycle, thus mitigating
compliance inconsistencies and the associated risks; (iii) serve as a
standard design building block; (iv) enhance platform transparency and
user confidence and trust; (v) embed ethical requirements, such as
human agency and oversight, technical robustness and
safety, privacy and data governance, fairness, and accountability,
into the increasing B2B, B2C, C2C, and Device to Device (D2D) data flows.
      </p>
    </sec>
    <sec id="sec-6">
      <title>THE LEGICROWD APPROACH</title>
      <p>The LegiCrowd project could be an answer to this need for
transparency, as it aims to create a platform that renders Online
Legal Documents (OLDs), namely Privacy Notices and Terms of
Service, in a quick and easy-to-read format, such as icons, data visualisations,
or simplified language, through a crowdsourced approach. This
first requires designing a semantically sound annotation tag set, as
an ontology of descriptors. This is the goal of the current LegiCrowd
Onto project, which relies on a number of competencies, particularly
related to natural knowledge modelling, law, and corresponding
visualisations thereof, gathered in an international consortium. Such
a platform aims at truly putting end users in the driver’s seat as it a)
provides an ethical building block in the overall design, b) empowers
end users to extract accurate legal information in context and to assess
the levels of legal compliance and the ethics standards in place, and
c) enables them to provide or reject consent on a truly informed basis.</p>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>Undoubtedly, the practice and assertion of law in the Web 3.0 era is a
combination of numerous language data inputs and outputs from
multiple workflows. In these legal workflows, the extraction,
formulation, and exploitation of related metadata and provenance
constitute a basic processing component towards Machine Learning
models or Natural Language Processing applications capable of
more efficient legal enforcement. Given the high awareness of their
potential societal impact, decisions about legal data, methods, and
tools tend to be tied in a practical way to their impact on people and
society, thus bringing ethics to the foreground of automation.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGMENTS</title>
      <p>
        The LegiCrowd Onto consortium is led by the French non-profit
organisation Association des Professionnels des Industries de la
Langue (APIL), and includes the National Technical University
of Athens (NTUA) and the Research, Consultant &amp; Training firm
‘Future Now’, backed by MyData Greece, the Greek node of MyData
Global. It has received funding from the European Union’s Horizon
2020 research and innovation programme under the NGI TRUST
grant agreement No 825618. This project has been made possible
thanks to Short Term Scientific Missions conducted within the
framework of the enetCollect COST Action ([
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Nupur</given-names>
            <surname>Choudhury</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>World Wide Web and Its Journey from Web 1.0 to Web 4.0</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Herbert H.</given-names>
            <surname>Clark</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael F.</given-names>
            <surname>Schober</surname>
          </string-name>
          .
          <year>1992</year>
          .
          <article-title>Questions about Questions: Inquiries into the Cognitive Bases of Surveys</article-title>
          . Russell Sage Foundation, New York, NY, USA, Chapter Asking questions and influencing answers,
          <fpage>15</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Alain</given-names>
            <surname>Couillault</surname>
          </string-name>
          .
          <volume>18</volume>
          /5/2018.
          <article-title>SHORT TERM SCIENTIFIC MISSION (STSM) SCIENTIFIC REPORT</article-title>
          .
          <source>Technical Report</source>
          . Apoliade. http://www.enetcollect.net/ilias/goto.php?target=file_530_download
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Alain</given-names>
            <surname>Couillault</surname>
          </string-name>
          .
          <volume>3</volume>
          /3/2019.
          <article-title>SHORT TERM SCIENTIFIC MISSION (STSM) SCIENTIFIC REPORT</article-title>
          .
          <source>Technical Report</source>
          . Apoliade. http://www.enetcollect.net/ilias/goto.php?target=file_908_download
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Alain</given-names>
            <surname>Couillault</surname>
          </string-name>
          .
          <volume>8</volume>
          /3/2020.
          <article-title>SHORT TERM SCIENTIFIC MISSION (STSM) SCIENTIFIC REPORT</article-title>
          .
          <source>Technical Report</source>
          . Apoliade. http://www.enetcollect.net/ilias/goto.php?target=file_1053_download
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Joss</given-names>
            <surname>Langford</surname>
          </string-name>
          , Antti Jogi Poikola, Wil Janssen, Viivi Lähteenoja, and
          <string-name>
            <given-names>Marlies</given-names>
            <surname>Rikken</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Understanding MyData Operators</article-title>
          .
          <source>Technical Report</source>
          . MyData.org.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jonathan A.</given-names>
            <surname>Obar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anne</given-names>
            <surname>Oeldorf-Hirsch</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>The biggest lie on the Internet: ignoring the privacy policies and terms of service policies of social networking services</article-title>
          .
          <source>Information, Communication &amp; Society</source>
          <volume>23</volume>
          ,
          <issue>1</issue>
          (
          <year>2020</year>
          ),
          <fpage>128</fpage>
          -
          <lpage>147</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Tom</given-names>
            <surname>Wheeler</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Time to Fix It: Developing Rules for Internet Capitalism</article-title>
          . Fellows Research Paper Series. Shorenstein Center on Media, Politics and Public Policy
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>