=Paper= {{Paper |id=Vol-2941/paper12 |storemode=property |title=DALICC As A Service - A Scaleable Architecture for License Clearance |pdfUrl=https://ceur-ws.org/Vol-2941/paper12.pdf |volume=Vol-2941 |authors=Giray Havur,Sebastian Neumaier,Tassilo Pellegrini |dblpUrl=https://dblp.org/rec/conf/i-semantics/HavurNP21 }} ==DALICC As A Service - A Scaleable Architecture for License Clearance== https://ceur-ws.org/Vol-2941/paper12.pdf
DALICC As A Service - A Scaleable Architecture for
License Clearance
Giray Havur1,2, Sebastian Neumaier1 and Tassilo Pellegrini1
1 St. Pölten University of Applied Sciences, Matthias Corvinus-Straße 15, 3100 St. Pölten, Austria
2 Siemens AG Austria, Technology, Siemensstraße 90, 1210 Vienna, Austria


Abstract
DALICC stands for Data Licenses Clearance Center. It is a software framework that utilizes semantic
web standards and linked data principles for the cost-efficient clearing of rights issues in
the creation of derivative data and software works. The paper describes the service architecture of
and usage scenarios for the DALICC framework, exemplifying a scalable architecture of semantic
web-enabled compliance services.

Keywords
license clearance, legal compliance, policy aware system




1. Introduction
Modern IT applications are increasingly composed of various third party components that are
provided under various licenses. This can raise questions about the compatibility of licenses and
the application's compliance with existing law. Manual clearance of licenses can be complex
and error-prone, thus requiring a high degree of costly expert knowledge. To lower these costs
and improve the quality of license clearance, we developed the DALICC framework [1] that
supports the convenient and cost-efficient clearance of licenses in the creation of derivative
software and data works by following the semantic web and linked data standards. DALICC can
process and reason over RDF representations of licenses, identify conflicts between licenses and
support their resolution. While earlier publications on the DALICC framework were mainly
concerned with methodological issues of license modelling and reasoning [2], in this paper,
we describe the latest developments concerned with the architectural design and associated
application scenarios for the user-centric and scalable deployment of the DALICC framework.


2. Related Work
Most of the work on automated processing of licensing information is situated in the context
of Rights Expression Languages and contracting [3, 4, 5]. Early work dates back to 1989 and
has since then sparked a rich body of research [6]. One of the first foundational papers on
reasoning over licenses was published in 2002 by [7] and extended by [8, 9] towards rule-based
resolution of licensing conflicts based on OWL constructs. A proof of applicability was provided
by [10] and [11], who describe formalisations of a license clearance tool for derivative works
based on the processing of deontic clauses [12, 13]. [10] also provide a demo called Licentia
(http://licentia.inria.fr/) that exemplifies the practical value of such a service. More recent work
on rule-based conflict detection in license compatibility has been provided by [14]. Our paper
contributes to this research by describing a user-centric and scalable architecture that can be
understood as a blueprint for semantic web-enabled compliance services.

The Posters and Demos Track of the 17th International Conference on Semantic Systems co-located with the 17th
International Conference on Semantic Systems, Amsterdam, Netherlands, September 06–09, 2021
giray.havur@siemens.com (G. Havur); sebastian.neumaier@fhstp.com (S. Neumaier); tassilo.pellegrini@fhstp.com (T. Pellegrini)
ORCID: 0000-0002-6898-6166 (G. Havur); 0000-0002-9804-4882 (S. Neumaier); 0000-0002-0795-0661 (T. Pellegrini)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

[Figure 1 diagram. Components shown: API Users; Load Balancer; DALICC Services instances, each with a Web Framework, License Search, License Composer and Reasoner; License Library, Dependency Graph and User Data stores. Requests shown: search for a license, compose a license, check compatibility of licenses (optionally with a goal license).]
Figure 1: DALICC service architecture.


3. DALICC Service Architecture
The components of the DALICC service framework are shown in Figure 1. An API User’s request
is first redirected to an available DALICC instance where the web framework delivers the request
to the relevant micro-service. To handle multiple API requests, we rely on container-based
load balancing in front of the services. The license library is a repository that contains machine-
readable and human-readable representations of the licenses, the former as ODRL policies (set
of statements) and the latter as plain text. The dependency graph encodes the expert knowledge
about the implicit and explicit semantic dependencies between actions [15, 16]. Additionally, we
store user-specific data, such as licenses provided by individual users and previously generated
compatibility reports. The following APIs are available:

The License Search API queries the license library given specific text or facets.
/get/licenses/{license_identifier}: Returns the license definition in an RDF serialization
format (e.g., Turtle or JSON-LD) for a given license identifier.
/post/licenses/search/text: Given a string, returns a JSON object in which license identifiers
and descriptions are listed by their relatedness to the string.
/post/licenses/search/faceted: Given the asset type (i.e., creative work, dataset, or
software) and a list of permissions, duties and prohibitions in an RDF serialization format,
returns a JSON object in which the matching license identifiers and descriptions are listed.
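
As an illustration of how a client might address the License Search API, the following minimal sketch only builds the HTTP requests and does not send them. Several details are assumptions that the paper does not state: the base URL is a placeholder, the leading /get and /post segments are read as the HTTP method rather than as literal path segments, the RDF serialization is selected through an Accept header, and the JSON field name query is hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; the paper does not state where the service is hosted.
BASE_URL = "https://dalicc.example.org"


def get_license_request(license_identifier: str, rdf_format: str = "text/turtle") -> Request:
    """Build a GET request for one license definition.

    Assumption: the RDF serialization is chosen via the Accept header.
    """
    url = f"{BASE_URL}/licenses/{quote(license_identifier, safe='')}"
    return Request(url, headers={"Accept": rdf_format}, method="GET")


def text_search_request(query: str) -> Request:
    """Build a POST request for the free-text license search.

    Assumption: the search string is sent as JSON under a "query" field.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{BASE_URL}/licenses/search/text",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = get_license_request("CC-BY-4.0")
print(request.get_method(), request.full_url)
```

Passing such a Request object to urllib.request.urlopen would perform the call; the response format would have to be taken from the service documentation.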

The License Composer API allows customized licenses to be created from a set of ODRL
policy-permission-action-duty statements expressed in an RDF serialization format.
/get/licenses/composer/vocabulary: Returns a JSON object listing the ODRL, ccREL and
DALICC vocabularies for defining license statements.
/post/licenses/composer/check: Returns a JSON object that states whether the submitted license is
formally correct, logically coherent and thus legally valid, together with an error message if the
license is badly formed.
/post/licenses/composer/upload: Uploads the license to the license library and returns
the new license identifier.
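
To make the policy-permission-action-duty structure more concrete, the sketch below assembles a small ODRL Set policy as a JSON-LD-style Python dictionary. It is an illustration only: the helper names, the example URIs and the local sanity check are hypothetical, and the check is far weaker than the formal and logical validation attributed to the composer's check endpoint above.

```python
import json

# JSON-LD context published for the W3C ODRL Information Model 2.2.
ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"


def compose_license(uid, target, permissions, prohibitions=(), duties=()):
    """Return an ODRL Set policy as a dict (hypothetical helper).

    Every duty is attached to every permission, mirroring the
    policy -> permission -> action -> duty nesting.
    """
    duty_nodes = [{"action": duty} for duty in duties]
    policy = {
        "@context": ODRL_CONTEXT,
        "@type": "Set",
        "uid": uid,
        "permission": [
            {"target": target, "action": action, **({"duty": duty_nodes} if duty_nodes else {})}
            for action in permissions
        ],
    }
    if prohibitions:
        policy["prohibition"] = [{"target": target, "action": action} for action in prohibitions]
    return policy


def locally_inconsistent_actions(policy):
    """Actions that are both permitted and prohibited (a purely local check)."""
    permitted = {rule["action"] for rule in policy.get("permission", [])}
    prohibited = {rule["action"] for rule in policy.get("prohibition", [])}
    return sorted(permitted & prohibited)


demo = compose_license(
    uid="http://example.org/license/demo",  # placeholder identifier
    target="http://example.org/asset/1",    # placeholder asset
    permissions=["distribute", "derive"],
    prohibitions=["sell"],
    duties=["attribute"],
)
print(json.dumps(demo, indent=2))
```

A dictionary of this shape could be serialized and submitted to the check endpoint before uploading; the exact payload the service expects is not specified in this paper.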

The Compatibility Check API provides information on equivalence, similarity and compatibility
within a set of licenses. It supports two modes: a) reporting all conflicting pairs of
statements in the given set of licenses and b) additionally checking whether a defined goal license
subsumes the input set of licenses, e.g., when a software developer wants to find out whether their
project can be published under a specific license. In both modes, the reasoner supports conflict
resolution by suggesting the minimal set of statements/licenses to remove in order to reach a coherent state.
/post/licenses/compatibility: Given a list of license identifiers and optionally a goal
license identifier, returns the compatibility report.
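
The two modes can be illustrated with a deliberately simplified toy model in which each license is just a set of permitted and a set of prohibited actions. This is not the DALICC reasoner: it ignores duties and the dependency graph, the license identifiers and their contents are invented for the example, and the goal check is reduced to the assumption that a goal license must not permit anything an input license prohibits.

```python
from itertools import combinations

# Toy license library (illustrative only, not legal data).
LICENSES = {
    "lic-a": {"permission": {"distribute", "derive", "sell"}, "prohibition": set()},
    "lic-b": {"permission": {"distribute", "derive"}, "prohibition": {"sell"}},
    "lic-c": {"permission": {"distribute"}, "prohibition": {"derive"}},
}


def conflicting_pairs(license_ids, library=LICENSES):
    """Mode (a): list (permitting, prohibiting, action) triples within the set."""
    conflicts = []
    for first, second in combinations(license_ids, 2):
        # Check both directions: each license may permit what the other prohibits.
        for permitting, prohibiting in ((first, second), (second, first)):
            clash = library[permitting]["permission"] & library[prohibiting]["prohibition"]
            for action in sorted(clash):
                conflicts.append((permitting, prohibiting, action))
    return conflicts


def goal_is_admissible(goal_id, license_ids, library=LICENSES):
    """Mode (b), simplified: the goal must not permit a prohibited action."""
    goal_permissions = library[goal_id]["permission"]
    return all(not (goal_permissions & library[i]["prohibition"]) for i in license_ids)


print(conflicting_pairs(["lic-a", "lic-b", "lic-c"]))
print(goal_is_admissible("lic-b", ["lic-a", "lic-b"]))
```

In this toy setting, each reported triple names a permission/prohibition clash between two licenses, which is the kind of information a compatibility report would need in order to suggest statements or licenses for removal.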


4. Application Scenarios
In this section, we provide example application scenarios for the proposed DALICC
APIs. Large-scale software projects potentially have dozens, or even hundreds, of dependencies
on other projects, e.g., third-party software packages, libraries and modules. Example
repositories that list libraries and modules together with their licenses are the Python Package
Index (“PyPI”)1 and the Node Package Manager (“npm”)2 for the JavaScript programming
language. For instance, the popular Python module “pandas” itself already depends on over 80
other modules listed in the respective requirements file.3
   The goal of the DALICC service is to support programmers and data engineers in the process
of creating and publishing new applications that depend on third-party sources. Figure 2
displays how this is envisioned: software projects typically specify a list of dependencies (e.g.,
a requirements.txt file in Python); initially, the respective licenses get extracted from the
listed dependencies and mapped to the DALICC license identifiers. Regarding the identification
of licenses, the Software Package Data Exchange4 (SPDX) provides standardised, short identifiers
for a number of standard licenses that potentially serve as canonical permanent URLs. Once
the licenses have been extracted, a programmer can check the compatibility of the dependencies via the
DALICC API. If the project is to be published under a new license (one that exists in the DALICC license
library), the API can also check compatibility with the specified “goal license” (cf. Figure 2).

1 https://pypi.org/
2 https://www.npmjs.com/
3 https://github.com/pandas-dev/pandas/blob/master/requirements-dev.txt, last accessed 2021-06-08
4 https://spdx.org/licenses/, last accessed 2021-06-08

[Figure 2 diagram. A Software Project lists Libraries/Modules; license identifiers are extracted from them; the License Identifiers are sent to the DALICC Services for a compatibility check of the extracted licenses; the License Identifiers plus a Goal License Identifier are sent for a goal license check against the extracted licenses.]
Figure 2: The DALICC services support software projects in the process of (i) checking compatibility
of dependencies and (ii) publishing the project under a new license.
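
The workflow of Figure 2 could be sketched on the client side as follows. The sketch parses a requirements.txt-style text, maps package names to SPDX identifiers and assembles a request body for the compatibility check. The package names, the package-to-license mapping and the payload field names (licenses, goal_license) are all hypothetical placeholders; in practice the mapping would be derived from package metadata, and the payload schema from the service documentation.

```python
import re

# Hypothetical package -> SPDX identifier mapping (placeholder entries).
PACKAGE_LICENSES = {"alpha": "MIT", "beta": "Apache-2.0", "gamma": "BSD-3-Clause"}

# A requirement line starts with the distribution name.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirements(text):
    """Extract lower-cased package names, skipping comments and pip options."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1).lower())
    return names


def build_compatibility_payload(requirements_text, goal_license=None, known=PACKAGE_LICENSES):
    """Return (payload, unresolved): the request body and packages with no known license."""
    names = parse_requirements(requirements_text)
    licenses = sorted({known[name] for name in names if name in known})
    unresolved = [name for name in names if name not in known]
    payload = {"licenses": licenses}
    if goal_license is not None:
        payload["goal_license"] = goal_license
    return payload, unresolved


example = "alpha>=1.0\nBeta==2.1  # pinned\n-r other.txt\ndelta\n"
print(build_compatibility_payload(example, goal_license="MIT"))
```

Returning the unresolved packages separately keeps missing license information visible instead of silently treating those dependencies as compatible.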


5. Concluding remarks
We are currently working on FAIR documentation and will make the DALICC framework
available under a dual license by the end of 2021, thus allowing various forms of collaborative
exploitation. The framework closes the existing gap between the technological capabilities
to create and publish digital assets and the legal infrastructure necessary to provide them on
a legally secure basis for reuse. DALICC is therefore a tool that puts policies into practice and
facilitates data governance at various levels. As such, the DALICC framework should be
understood as an enabling service for the emerging data economy.


References
 [1] T. Pellegrini, et al., Automated Rights Clearance Using Semantic Web Technologies:
     The DALICC Framework, in: Semantic Applications, Springer Berlin Heidelberg, Berlin,
     Heidelberg, 2018, pp. 203–218. URL: http://link.springer.com/10.1007/978-3-662-55433-3_14.
     doi:10.1007/978-3-662-55433-3_14.
 [2] O. Panasiuk, et al., Modeling and Reasoning over Data Licenses, in: A. Gangemi, et al.
     (Eds.), The Semantic Web: ESWC 2018 Satellite Events, volume 11155, Springer International
     Publishing, Cham, 2018, pp. 218–222. URL: http://link.springer.com/10.1007/978-3-319-98192-5_41.
     doi:10.1007/978-3-319-98192-5_41.
 [3] V. Rodriguez-Doncel, J. Delgado, A Media Value Chain Ontology for MPEG-21, IEEE
     MultiMedia 16 (2009) 44–51. URL: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6214727.
     doi:10.1109/MMUL.2009.78.
 [4] J. Prenafeta, Protecting Copyright Through Semantic Technology, Publishing Research
     Quarterly 26 (2010) 249–254. URL: http://link.springer.com/10.1007/s12109-010-9182-3.
     doi:10.1007/s12109-010-9182-3.
 [5] E. Rodriguez, J. Delgado, L. Boch, V. Rodriguez-Doncel, Media Contract Formalization
     Using a Standardized Contract Expression Language, IEEE MultiMedia 22 (2015) 64–74.
     URL: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6786878.
     doi:10.1109/MMUL.2014.22.
 [6] T. Pellegrini, et al., A Genealogy and Classification of Rights Expression Languages –
     Preliminary Results, in: Data Protection / LegalTech - Proceedings of the 21st International
     Legal Informatics Symposium IRIS 2018, Colloquium, Editions Weblaw, Salzburg, Austria,
     2018, pp. 243–250.
 [7] R. Pucella, V. Weissman, A Logic for Reasoning About Digital Rights, in: Proceedings of
     the 15th IEEE Workshop on Computer Security Foundations, CSFW ’02, IEEE Computer
     Society, Washington, DC, USA, 2002, pp. 282–294. URL: http://dl.acm.org/citation.cfm?id=794201.795182.
 [8] R. García, R. Gil, J. Delgado, Intellectual Property Rights Management Using a Semantic
     Web Information System, in: D. Hutchison, et al. (Eds.), On the Move to Meaningful Internet
     Systems 2004: CoopIS, DOA, and ODBASE, volume 3290, Springer Berlin Heidelberg, Berlin,
     Heidelberg, 2004, pp. 689–704. URL: http://link.springer.com/10.1007/978-3-540-30468-5_44.
 [9] R. García, R. Gil, Copyright Licenses Reasoning Using an OWL-DL Ontology, in: Proceedings of
     the 2009 Conference on Law, Ontologies and the Semantic Web: Channelling the Legal
     Information Flood, IOS Press, Amsterdam, The Netherlands, 2009, pp.
     145–162. URL: http://dl.acm.org/citation.cfm?id=1563987.1564000.
[10] V. Rodríguez-Doncel, S. Villata, A. Gómez-Pérez, A dataset of RDF licenses,
     Frontiers in Artificial Intelligence and Applications (2014) 187–188. URL:
     http://www.medra.org/servlet/aliasResolver?alias=iospressISSNISBN&issn=0922-6389&volume=271&spage=187.
     doi:10.3233/978-1-61499-468-8-187.
[11] G. Governatori, H.-P. Lam, A. Rotolo, S. Villata, G. A. Atemezing, F. Gandon, LIVE: a tool
     for checking licenses compatibility between vocabularies and data, in: ISWC 2014,
     13th International Semantic Web Conference, 19–23 October 2014, Riva del Garda,
     Italy, 2014.
[12] A. Rotolo, S. Villata, F. Gandon, A Deontic Logic Semantics for Licenses Composition in
     the Web of Data, in: Proceedings of the Fourteenth International Conference on Artificial
     Intelligence and Law, ICAIL ’13, ACM, New York, NY, USA, 2013, pp. 111–120. URL:
     http://doi.acm.org/10.1145/2514601.2514614. doi:10.1145/2514601.2514614.
[13] E. Cabrio, A. Palmero Aprosio, S. Villata, These Are Your Rights, in: D. Hutchison, et al. (Eds.),
     The Semantic Web: Trends and Challenges, volume 8465, Springer International Publishing,
     Cham, 2014, pp. 255–269. URL: http://link.springer.com/10.1007/978-3-319-07443-6_18.
[14] B. Moreau, P. Serrano-Alvarado, M. Perrin, E. Desmontils, Modelling the Compatibility
     of Licenses, in: P. Hitzler, et al. (Eds.), The Semantic Web, volume 11503 of Lecture Notes
     in Computer Science, Springer International Publishing, Cham, 2019, pp. 255–269.
     URL: http://link.springer.com/10.1007/978-3-030-21348-0_17. doi:10.1007/978-3-030-21348-0_17.
[15] S. Steyskal, A. Polleres, Towards formal semantics for ODRL policies, in: 9th International
     Symposium RuleML, 2015, pp. 360–375.
[16] T. Pellegrini, et al., DALICC: A license management framework for digital assets,
     Internationales Rechtsinformatik Symposion (IRIS) 10 (2019).