=Paper=
{{Paper
|id=Vol-2721/paper536
|storemode=property
|title=ShEx-Lite: Automatic Generation of Domain Object Models from a Shape Expressions Subset Language
|pdfUrl=https://ceur-ws.org/Vol-2721/paper536.pdf
|volume=Vol-2721
|authors=Guillermo Facundo Colunga,Alejandro González Hevia,Emilio Rubiera Azcona,Jose Emilio Labra Gayo
|dblpUrl=https://dblp.org/rec/conf/semweb/ColungaHAG20
}}
==ShEx-Lite: Automatic Generation of Domain Object Models from a Shape Expressions Subset Language==
ShEx-Lite: Automatic Generation of Domain
Object Models from a Shape Expressions Subset
Language ?
Guillermo Facundo Colunga2[0000−0003−1283−2763] , Alejandro González
Hevia2[0000−0003−1394−5073] , Emilio Rubiera Azcona2[0000−0002−0292−9177] , and
Jose Emilio Labra Gayo1[0000−0001−8907−5348]
1
Dpt. of Computer Science, University of Oviedo, Spain labra@uniovi.es
2
WESO Research Group, University of Oviedo, Spain
{thewilly.work alejgh.weso emilio.rubiera}@gmail.com
Abstract. Shape Expressions (ShEx) was defined as a human-readable
and concise language to describe and validate RDF. In the last years,
the usage of ShEx has grown and more functionalities are being de-
manded. One such functionality is to ensure interoperability between
ShEx schemas and domain models in programming languages. In this
paper, we present ShEx-Lite, a tabular based subset of ShEx that allows
to generate domain object models in different object-oriented languages.
Although the current system generates Java and Python, it offers a public
interface so anyone can implement code generation in other programming
languages. The system has been employed in a workflow where the shape
expressions are used both to define constraints over an ontology and to
generate domain objects that will be part of a clean architecture style.
· ·
Keywords: Linked Data RDF Shape Expressions Validation.·
1 Introduction
Since the appearance of Shape Expressions Language [4] (ShEx) the demands of
the community on new tools based on ShEx have grown. One of those demands,
born during the development of the Hercules ASIO European Project3 , was the
creation of a tool that can automatically transform Shape Expressions into ob-
ject domain models represented by means of an Object Oriented Programing
Language. The object domain model generated will be part of a clean architec-
ture based solution [2]. ShEx-Lite4 was created as a tabular based subset of
ShEx which enabled the automatic generation of domain object models from the
schemas expressed with it. This paper describes how the domain object models
are generated along with the architecture of the software that implements it.
?
Copyright ©2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
3
https://www.um.es/web/hercules
4
https://www.weso.es/shex-lite/
2 Domain Object Model Generation
The ShEx language is quite powerful and allows for a high degree of expres-
siveness like disjunctions, negations, etc. which make generating domain object
models from shapes a non-trivial task. However the ShEx-lite language is sim-
pler and it permits mainly tabular based schemas based on constraints of type
PROPERTY TYPE CARDINALITY referred to as simple triple constraints. For in-
stance, Fig. 1 shows an schema example that defines the properties of a Person
model. In this particular case a Person is defined as a property :name of type
xsd:string and cardinality 1 (Default one) as well as a second property :knows
of type @:Person and cardinality 0-n, which represents the set of people known
by the current Person.
Person.shexl Person.java
1 # Prefixes ... 1 // Imports ...
2 : Person { 2 public class Person {
3 : name xsd : string ; 3 private String name ;
4 : knows @ : Person * 4 private List < Person > knows ;
5 } 5 // Constructor ...
6 // Getters and Setters ...
7 }
Fig. 1: Schema modeling a Person in ShExC syntax to the left. And the ShEx-
Lite generated code in Java to the right.
Once defined the input of ShEx-Lite it is easier to explain how the system
generated domain object models. Basically, the comunication with the system is
done through a CLI tool that is provided. In this tool the users can define several
options but the one that is in the scope of this paper is --java-pkg=STRING
which triggers the java code generation and generates the target object in the
specified package.
For example, for the input java -jar shexlc.jar --java-pkg=demo per-
son.shexl where the person.shexl file corresponds to the schema defined at
Fig. 1 ShEx-Lite generates a single java class with the code that appears at the
Person.java file, also in Fig. 1.
From this process a number of questions raise, such as how the mapping
process between the constraint types in the schemas and the object-oriented
programming language is done, or what happens if the schema expresses some-
thing that the programing language is not able to represent as fixed cardinalities
or repeated properties. The way ShEx-Lite solves this issue is by delegation [1].
It does not implicity check anything, it delegates to the specific code generators
the ability to inform about any incompatibility and perform the corresponding
mappings.
For example, by default, Java and Python code generation is built in with
ShEx-Lite but the JavaCodeGenerator runs some checks that the PythonCode-
Generator does not and viceversa. If any incompatibility between the schema
error [ E014 ]: feature not available
--> i n p u t _ i n c o r r e c t _ s c h e m a _ b i g _ s c h e m a _ 2 . shexl :15:24
|
15 | schema : name asdf : string
| ^ this prefix has no mapping in java
Fig. 2: ShEx-Lite example error caused by a prefix with no mapping in java.
and the target language is found by the corresponding code generator, ShEx-Lite
will let the user know by means of error or warning messages such as the once
shown in Fig. 2.
3 ShEx-Lite Architecture
ShEx-Lite is available as open source software at GitHub5 . It has been designed
in accordance with the concept “compiler as an API”, born with the Roslyn
compiler [3]. The feature of code generation was designed with the goal of being
flexible enough to work with different target programming languages like Java
or Python. The main components of ShEx-Lite follow a traditional compiler
architecture and are represented in Fig. 3.
Fig. 3: ShEx-Lite internal architecture. SIL stands for ShEx-Lite Intermediate
Language. IR stands for Intermediate representation.
– Parse. The components of this module aim at reading the source file and per-
forming the syntax validation. The syntax validation of the schemas checks
that the schemas defined follow the ShEx-Lite syntax. If this is not the case,
the compiler lets the user know about the problem and possible solutions.
5
https://github.com/weso/shex-lite
– Sema. At this stage the compiler checks that types are correct and that
the invocations and references that occur in the schemas are defined. Also,
during this process, if any error or warning is detected, the compiler will
notify the user about the problem and possible solutions.
– IRGen. This module is the one actually generating target code (Java,
Python, Any...). In order to allow adding other languages in the future,
ShEx-Lite delegates the specific language checks and mappings and there-
fore it offers an interface that other language generators will implement. Each
one of the code generators is responsible of checking that the constraints rep-
resented by the schemas are able to be expressed in the corresponding lan-
guage. If the schema meets the requierements of the corresponding language,
it is the specific code generator the one that produces the code.
4 Conclusions and future work
In this paper we have presented ShEx-lite, a subset of ShEx that allows to gen-
erate domain model objects in different Object-Oriented languages like Python
and Java. The system proposed is being used in the Hércules project6 , and as a
future work we’re planning to incorporate the possiblity to read shape expres-
sions from tabular formats like CSV7 .
Aknowledgements. The HÉRCULES Semantic University Research Data
Project is backed by the Ministry of Economy, Industry and Competitiveness
with a budget of 5.462.600,00 euros with an 80% of cofinancing from the 2014-
2020 ERDF Program. This work has also been partially funded by the Spanish
Ministry of Economy and Competitiveness (Society challenges: TIN2017-88877-
R).
References
1. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design patterns: Abstraction and
reuse of object-oriented design. In: European Conference on Object-Oriented Pro-
gramming. pp. 406–431. Springer (1993)
2. Martin, R.C.: Clean Architecture: A Craftsman’s Guide to Software Structure and
Design. Pearson (2017)
3. McAllister, N.: Microsoft’s roslyn: Reinventing the compiler as we know it. N. McAl-
lister//InfoWorld from IDG.–2011 (2011)
4. Prud’hommeaux, E., Labra Gayo, J.E., Solbrig, H.: Shape expressions: an RDF
validation and transformation language. In: Proceedings of the 10th International
Conference on Semantic Systems. pp. 32–40 (2014)
6
https://www.um.es/web/hercules
7
https://github.com/dcmi/dcap