From Legal Documents to Legal Document Management Systems; The Case of LegiCrowd

Alexandros Nousias
Future Now Business Consultants & Training / MyData Greece
Athens, Greece
alexandros.nousias@gmail.com

Alain Couillault
Association des Professionnels des Industries de la Langue (APIL)
Montreuil, France
alain.couillault@apoliade.com

Sofia Almpani
National Technical University of Athens
Zografou, Greece
salmpani@mail.ntua.gr

Theodoros Mitsikas
National Technical University of Athens
Zografou, Greece
mitsikas@central.ntua.gr

Petros Stefaneas
National Technical University of Athens
Zografou, Greece
petros@math.ntua.gr

ABSTRACT
In this position paper, we argue that users' online consent to terms of service and privacy notices is inherently impaired by the unbalanced power between online service providers and their users. We argue that a full-fledged legal document management system relying on semantic representation is key to resolving this conflict and to making Online Legal Documents transparent. We also give a brief overview of the LegiCrowd project, a crowdsourced approach to legal document annotation that paves the way towards such a solution.

WAIEL2020, September 3, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
As AI technology and automation permeate society horizontally, the law and its enforcement mechanisms are proving unable to keep pace. Concepts inherited from the past, such as consent, retain their static properties in an increasingly complex and dynamic space, and they are therefore becoming obsolete. The law, along with its design and implementation, needs a radical update. This paper argues that such an update requires a transition from plain legal text to full-fledged Legal Document Management Systems.

Ubiquitous automation does not fit the static format of online legal documents and the consent models linked to them. In their present form, terms of service and privacy policies are an iconic proof of the inadequacy of the digital design. These are complicated legal and technical documents that no one reads, no one understands, and no one cares about. Yet they govern the emerging data lifecycles for the benefit of data-driven business operations and extend those operations' unhealthy patterns. Information of such importance becomes an irrelevant node in the data value chain, which hinders the unfolding and systemic assertion of emerging human-centric patterns. On top of that, modern businesses increasingly use consent as a de facto standard for demonstrating privacy commitments and wider legal compliance, and they present consent provisions as proxies for informed choice.

We start from the basics by revisiting the concept, role, and specifications of terms of service and privacy policies. We treat them as agents of information provision towards systemic, human-centric, and human-friendly automation. Terms of service and privacy policies are regarded as raw data for automated meaning extraction through information retrieval, question answering, dialogue systems, and other Natural Language Processing applications.

The rest of the paper is organised as follows. Section 2 briefly describes the information technology advances to date and their key characteristics. Section 3 discusses inconsistencies and loopholes in modern legal design. On that ground, it argues that legal representation and modelling, if placed in the appropriate ethical context, could bring about a radical update of modern legal properties and enforcement mechanisms. Section 4 introduces the LegiCrowd platform, a crowdsourced legal document annotation system. Finally, Section 5 concludes the paper and offers some thoughts on future work.

2 FROM LEGAL TEXT TO LEGAL INFORMATICS
Today's World Wide Web is the outcome of a three-stage evolution. Web 1.0 refers to the so-called static Web of documents, which works as a one-way broadcast. Web 2.0 introduced the Web of people by enabling the sharing of user-generated content and, later, social networking. Web 3.0, or the Web of data, is currently developing around the idea of defining and linking structured data [1]. The goal is to produce formal semantic representations, which introduces massive automation through algorithmically informed decisions.

Web 3.0, however, has one major loophole: the lack of legal knowledge modelling and representation. This gap causes systemic inadequacies in the digital design, as the ever data-hungry service supply side conducts a "permissionless invasion" [8]. In a complex dynamic system like the Web of data, algorithms require huge amounts of high-quality, relevant data. This evolution has led many technology giants, on the pretext of providing improved services, to track every action of every user with little or no transparency [6]. As a result, clicking the 'Agree' button to give consent has been dubbed "the biggest lie on the Internet" [7]. Incidents of data misuse, such as unsolicited calls, spam, and deliberate manipulation, have produced a massive trust deficit. Courts have formalised all of this by validating the 'I Agree' button, which maximises both the power asymmetries and the trust deficit.

3 THE ETHICS OF LEGAL REPRESENTATION AND MODELLING
This reality, together with the data dispossession imposed by the technology and digital service supply side, brings to the surface the need for dynamic, data-driven, and data-relevant legal and ethical enforcement. In the Web 3.0 environment, such enforcement requires a data-driven solution shaped by mathematical reasoning. It requires a transition to a ubiquitous legal representation and modelling apparatus. This apparatus would be an extended Legal Document Management System made up of structured legal data, methods, and tools for adequate syntactic and semantic representation. It would be capable of generating documented, machine-readable legal knowledge across very different logics, norms, and languages.

The ethical starting point is the axiom expressed in [2]: "The common misconception is that language has to do with words and what they mean. It doesn't. It has to do with people and what they mean." The task, then, is not simply linking and annotating language data, but conveying accurate meaning in the appropriate context. The aim is a virtuous cycle of legal data structuring, modelling, representation, and context, in order to:
(i) provide end users with precise, clear, and verified information on data processing and circulation;
(ii) provide the supply side with a proof of concept for technical and legal compliance throughout the data lifecycle, thus mitigating compliance inconsistencies and the related risks;
(iii) become a standard design building block;
(iv) enhance platform transparency as well as user confidence and trust;
(v) embed ethical requirements into the growing B2B, B2C, and C2C data flows, as well as Device-to-Device (D2D) flows. These requirements include human agency and oversight, technical robustness and safety, privacy and data governance, fairness, and accountability.

4 THE LEGICROWD APPROACH
The LegiCrowd project could meet this need for transparency. It aims to create a platform that renders Online Legal Documents (OLDs), namely privacy notices and terms of service, in a quick, easy-to-read format such as icons, data visualisations, or simplified language, through a crowdsourced approach. This first requires designing a semantically sound annotation tag set in the form of an ontology of descriptors. That is the goal of the current LegiCrowd Onto project. The project draws on competencies in natural language and knowledge modelling, law, and the corresponding visualisations, brought together in an international consortium. Such a platform aims to put end users truly in the driver's seat, because it:
a) provides an ethical building block in the overall design;
b) empowers end users to extract accurate legal information in context and to assess the level of legal compliance and the ethics standards in place;
c) allows them to give or refuse consent on a truly informed basis.

5 CONCLUSION
There is no doubt that the practice and assertion of law in the Web 3.0 era combine numerous language data inputs and outputs from multiple workflows. In these legal workflows, the extraction, formulation, and exploitation of the related metadata and provenance are a basic processing component. Machine Learning models and Natural Language Processing applications capable of more efficient legal enforcement build on this component. Given its potential societal impact, any decision about legal data, methods, and tools is tied in practice to its effect on people and society, which brings ethics to the foreground of automation.

ACKNOWLEDGMENTS
The LegiCrowd Onto consortium is led by the French non-profit organisation Association des Professionnels des Industries de la Langue (APIL). It includes the National Technical University of Athens (NTUA) and the research, consulting and training firm 'Future Now', backed by MyData Greece, the Greek node of MyData Global. The project has received funding from the European Union's Horizon 2020 research and innovation programme under the NGI TRUST grant agreement no 825618. It was made possible by Short Term Scientific Missions conducted within the framework of the enetCollect COST Action ([3], [4], [5]).

REFERENCES
[1] Nupur Choudhury. 2014. World Wide Web and Its Journey from Web 1.0 to Web 4.0.
[2] Herbert H. Clark and Michael F. Schober. 1992. Asking questions and influencing answers. In Questions about Questions: Inquiries into the Cognitive Bases of Surveys. Russell Sage Foundation, New York, NY, USA, 15–48.
[3] Alain Couillault. 2018. Short Term Scientific Mission (STSM) Scientific Report (18/5/2018). Technical Report. Apoliade. http://www.enetcollect.net/ilias/goto.php?target=file_530_download
[4] Alain Couillault. 2019. Short Term Scientific Mission (STSM) Scientific Report (3/3/2019). Technical Report. Apoliade. http://www.enetcollect.net/ilias/goto.php?target=file_908_download
[5] Alain Couillault. 2020. Short Term Scientific Mission (STSM) Scientific Report (8/3/2020). Technical Report. Apoliade. http://www.enetcollect.net/ilias/goto.php?target=file_1053_download
[6] Joss Langford, Antti Jogi Poikola, Wil Janssen, Viivi Lähteenoja, and Marlies Rikken. 2019. Understanding MyData Operators. Technical Report. MyData.org.
[7] Jonathan A. Obar and Anne Oeldorf-Hirsch. 2020. The biggest lie on the Internet: ignoring the privacy policies and terms of service policies of social networking services. Information, Communication & Society 23, 1 (2020), 128–147.
[8] Tom Wheeler. 2018. Time to Fix It: Developing Rules for Internet Capitalism. Fellows Research Paper Series. Shorenstein Center on Media, Politics and Public Policy.