<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ShEx-Lite: Automatic Generation of Domain Object Models from a Shape Expressions Subset Language</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Dpt. of Computer Science, University of Oviedo</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>WESO Research Group, University of Oviedo</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Shape Expressions (ShEx) was defined as a human-readable and concise language to describe and validate RDF. In recent years, the usage of ShEx has grown and more functionality is being demanded. One such functionality is ensuring interoperability between ShEx schemas and domain models in programming languages. In this paper, we present ShEx-Lite, a tabular subset of ShEx that allows generating domain object models in different object-oriented languages. Although the current system generates Java and Python, it offers a public interface so that anyone can implement code generation for other programming languages. The system has been employed in a workflow where the shape expressions are used both to define constraints over an ontology and to generate domain objects that will be part of a clean architecture style.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>RDF</kwd>
        <kwd>Shape Expressions</kwd>
        <kwd>Validation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Since the appearance of Shape Expressions Language [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (ShEx), the community's demand for new tools based on ShEx has grown. One of those demands,
born during the development of the Hercules ASIO European Project<sup>3</sup>, was the
creation of a tool that can automatically transform Shape Expressions into
domain object models expressed in an object-oriented programming
language. The generated domain object model will be part of a clean
architecture based solution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. ShEx-Lite<sup>4</sup> was created as a tabular subset of
ShEx that enables the automatic generation of domain object models from the
schemas expressed in it. This paper describes how the domain object models
are generated, along with the architecture of the software that implements the generation.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
3 https://www.um.es/web/hercules
4 https://www.weso.es/shex-lite/</p>
    </sec>
    <sec id="sec-2">
      <title>Domain Object Model Generation</title>
      <p>The ShEx language is quite powerful and allows for a high degree of
expressiveness (disjunctions, negations, etc.), which makes generating domain object
models from shapes a non-trivial task. The ShEx-Lite language, however, is
simpler: it mainly permits tabular schemas built from constraints of the form
PROPERTY TYPE CARDINALITY, referred to as simple triple constraints. For
instance, Fig. 1 shows an example schema that defines the properties of a Person
model. In this particular case, a Person is defined by a property :name of type
xsd:string and cardinality 1 (the default), and a second property :knows
of type @:Person and cardinality 0-n, which represents the set of people known
by the current Person.</p>
      <p>Person.shexl</p>
      <preformat># Prefixes ...
:Person {
  :name xsd:string ;
  :knows @:Person *
}</preformat>
      <p>Person.java</p>
      <preformat>// Imports ...
public class Person {
    private String name;
    private List&lt;Person&gt; knows;
    // Constructor ...
    // Getters and Setters ...
}</preformat>
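      <p>As an illustration of the mapping in Fig. 1, the following minimal Python sketch turns simple triple constraints into Java field declarations. The TripleConstraint class, the JAVA_TYPES table and both functions are hypothetical names introduced here, and the datatype table is an assumption; this is not ShEx-Lite's actual implementation.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TripleConstraint:
    """Hypothetical in-memory form of one PROPERTY TYPE CARDINALITY row."""
    prop: str         # property local name, e.g. "name"
    type_name: str    # datatype or shape reference, e.g. "xsd:string", "@:Person"
    cardinality: str  # "1" (default), "?", "*" or "+"


# Assumed datatype table for illustration; not ShEx-Lite's actual mapping.
JAVA_TYPES = {"xsd:string": "String", "xsd:integer": "Integer"}


def java_field(c: TripleConstraint) -> str:
    """Render one constraint as a private Java field declaration."""
    if c.type_name.startswith("@:"):
        base = c.type_name[2:]          # shape reference becomes a class name
    else:
        base = JAVA_TYPES[c.type_name]  # raises KeyError for unmapped datatypes
    if c.cardinality in ("*", "+"):
        base = f"List<{base}>"          # multi-valued property becomes a list
    return f"private {base} {c.prop};"


def java_class(name: str, constraints: Sequence[TripleConstraint]) -> str:
    """Render a class skeleton containing one field per constraint."""
    fields = "\n".join("    " + java_field(c) for c in constraints)
    return f"public class {name} {{\n{fields}\n}}"


person = [
    TripleConstraint("name", "xsd:string", "1"),
    TripleConstraint("knows", "@:Person", "*"),
]
print(java_class("Person", person))
```
]]></preformat>
      <p>Run on the two Person constraints, the sketch prints a class skeleton with the same two fields as Person.java in Fig. 1, without the constructor and accessors.</p>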
      <p>With the input of ShEx-Lite defined, it is easier to explain how the system
generates domain object models. Communication with the system takes place
through the provided CLI tool. This tool accepts several
options, but the one in the scope of this paper is --java-pkg=STRING,
which triggers Java code generation and places the generated classes in the
specified package.</p>
      <p>For example, given the command java -jar shexlc.jar --java-pkg=demo
person.shexl, where the person.shexl file contains the schema defined in
Fig. 1, ShEx-Lite generates a single Java class with the code shown as
Person.java, also in Fig. 1.</p>
      <p>
        From this process a number of questions arise, such as how the mapping
between the constraint types in the schemas and the object-oriented
programming language is done, or what happens if the schema expresses
something that the programming language cannot represent, such as fixed cardinalities
or repeated properties. ShEx-Lite solves this issue by delegation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
It does not check anything itself; instead, it delegates to the specific code generators
the responsibility to report any incompatibility and to perform the corresponding
mappings.
      </p>
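      <p>The delegation described above can be sketched in Python as follows. Only the names JavaCodeGenerator and PythonCodeGenerator come from this paper; the CodeGenerator interface, the check method, the Diagnostic class and the two concrete rules are assumptions made for illustration and may not match the real ShEx-Lite code.</p>
      <preformat><![CDATA[
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical constraint form: (property, type, cardinality).
Constraint = Tuple[str, str, str]


@dataclass(frozen=True)
class Diagnostic:
    severity: str  # "error" or "warning"
    message: str


class CodeGenerator(ABC):
    """Assumed interface; the real ShEx-Lite interface may look different."""

    @abstractmethod
    def check(self, constraints: List[Constraint]) -> List[Diagnostic]:
        """Report everything the target language cannot represent."""


class JavaCodeGenerator(CodeGenerator):
    MAPPED_PREFIXES = {"xsd"}  # assumption: only xsd datatypes are mapped

    def check(self, constraints):
        found = []
        for _prop, type_name, _card in constraints:
            prefix = type_name.split(":", 1)[0]
            if not type_name.startswith("@") and prefix not in self.MAPPED_PREFIXES:
                found.append(Diagnostic("error", f"prefix '{prefix}' has no mapping in java"))
        return found


class PythonCodeGenerator(CodeGenerator):
    def check(self, constraints):
        # Assumed Python-specific rule: a property may appear only once.
        seen, found = set(), []
        for prop, _type_name, _card in constraints:
            if prop in seen:
                found.append(Diagnostic("error", f"repeated property '{prop}'"))
            seen.add(prop)
        return found


def validate(constraints, generator: CodeGenerator):
    # The driver performs no language checks of its own: it only delegates.
    return generator.check(constraints)


schema = [("name", "asdf:string", "1"), ("name", "xsd:string", "1")]
for gen in (JavaCodeGenerator(), PythonCodeGenerator()):
    print(type(gen).__name__, [d.message for d in validate(schema, gen)])
```
]]></preformat>
      <p>The same schema yields a different diagnostic from each generator, while the driver stays free of any target-language knowledge.</p>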
      <p>For example, Java and Python code generation is built into ShEx-Lite by
default, but the JavaCodeGenerator runs some checks that the
PythonCodeGenerator does not, and vice versa. If the corresponding code generator
finds any incompatibility between the schema and the target language, ShEx-Lite
lets the user know by means of error or warning messages such as the one
shown in Fig. 2.</p>
      <preformat>error[E014]: feature not available
 --&gt; input_incorrect_schema_big_schema_2.shexl:15:24
   |
15 | schema:name            asdf:string
   |                        ^ this prefix has no mapping in java</preformat>
    </sec>
    <sec id="sec-3">
      <title>ShEx-Lite Architecture</title>
      <p>
        ShEx-Lite is available as open source software on GitHub<sup>5</sup>. It has been designed
in accordance with the "compiler as an API" concept, born with the Roslyn
compiler [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The code generation feature was designed to be
flexible enough to work with different target programming languages, such as Java
or Python. The main components of ShEx-Lite follow a traditional compiler
architecture and are represented in Fig. 3.
      </p>
      <p>- Parse. The components of this module read the source file and
perform syntax validation, which checks
that the schemas follow the ShEx-Lite syntax. If this is not the case,
the compiler lets the user know about the problem and possible solutions.</p>
      <sec id="sec-3-1">
        <title>5 https://github.com/weso/shex-lite</title>
        <p>- Sema. At this stage the compiler checks that types are correct and that
the invocations and references that occur in the schemas are defined.
During this process, if any error or warning is detected, the compiler
notifies the user about the problem and possible solutions.</p>
        <p>- IRGen. This module is the one that actually generates the target code (Java,
Python, or any other language). To allow other languages to be added in the future,
ShEx-Lite delegates the language-specific checks and mappings, and
therefore offers an interface that other language generators implement. Each
code generator is responsible for checking that the constraints
represented by the schemas can be expressed in the corresponding
language. If the schema meets the requirements of that language,
the specific code generator produces the code.</p>
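        <p>The Parse, Sema and IRGen stages can be sketched as a small Python pipeline. This is a simplified illustration under stated assumptions: the accepted syntax is only the fragment used in Person.shexl, and the function names parse, sema and compile_schema, as well as the generate callback, are hypothetical and not part of the ShEx-Lite API.</p>
        <preformat><![CDATA[
```python
import re
from typing import Callable, Dict, List, Tuple

Constraint = Tuple[str, str, str]     # (property, type, cardinality)
Schema = Dict[str, List[Constraint]]  # shape name -> its constraints

# Assumed line shape: ":prop prefix:type [*+?] [;]"
LINE = re.compile(r"^:(\w+)\s+(@?\w*:\w+)\s*([*+?]?)\s*;?$")


def parse(source: str) -> Schema:
    """Parse stage: syntax validation of a tiny, assumed subset."""
    schema: Schema = {}
    current = None
    for number, raw in enumerate(source.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line.endswith("{"):
            current = line[:-1].strip().lstrip(":")
            schema[current] = []
        elif line == "}":
            current = None
        else:
            match = LINE.match(line)
            if match is None or current is None:
                raise SyntaxError(f"line {number}: unexpected input {line!r}")
            prop, type_name, card = match.groups()
            schema[current].append((prop, type_name, card or "1"))
    return schema


def sema(schema: Schema) -> List[str]:
    """Sema stage: every @:Shape reference must name a defined shape."""
    errors = []
    for shape, constraints in schema.items():
        for prop, type_name, _card in constraints:
            if type_name.startswith("@:") and type_name[2:] not in schema:
                errors.append(f"{shape}.{prop}: undefined shape {type_name[2:]}")
    return errors


def compile_schema(source: str, generate: Callable[[Schema], str]):
    """Run the stages in order; IRGen is delegated to the generate callback."""
    schema = parse(source)
    errors = sema(schema)
    if errors:
        return errors, None
    return [], generate(schema)


SOURCE = """
# Prefixes ...
:Person {
  :name xsd:string ;
  :knows @:Person *
}
"""
print(compile_schema(SOURCE, lambda schema: ", ".join(sorted(schema))))
```
]]></preformat>
        <p>Passing the generator in as a callback keeps the first two stages independent of the target language, which is the property the interface-based design aims for.</p>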
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and future work</title>
      <p>In this paper we have presented ShEx-Lite, a subset of ShEx that allows
generating domain object models in different object-oriented languages, such as Python
and Java. The proposed system is being used in the Hercules project<sup>6</sup>, and as
future work we plan to incorporate the possibility of reading shape
expressions from tabular formats such as CSV<sup>7</sup>.</p>
      <p>Acknowledgements. The HERCULES Semantic University Research Data
Project is backed by the Ministry of Economy, Industry and Competitiveness
with a budget of 5,462,600.00 euros, with 80% co-financing from the
2014-2020 ERDF Program. This work has also been partially funded by the Spanish
Ministry of Economy and Competitiveness (Society challenges:
TIN2017-88877-R).</p>
      <sec id="sec-4-1">
        <title>6 https://www.um.es/web/hercules</title>
        <p>7 https://github.com/dcmi/dcap</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gamma</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Helm</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
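          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlissides</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :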
          <article-title>Design patterns: Abstraction and reuse of object-oriented design</article-title>
          .
          <source>In: European Conference on Object-Oriented Programming</source>
          . pp.
          <fpage>406</fpage>
          -
          <lpage>431</lpage>
          . Springer (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          :
          <article-title>Clean Architecture: A Craftsman's Guide to Software Structure and Design</article-title>
          .
          <source>Pearson</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>McAllister</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Microsoft's roslyn: Reinventing the compiler as we know it</article-title>
          .
          <source>InfoWorld</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Prud'hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labra Gayo</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solbrig</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Shape expressions: an RDF validation and transformation language</article-title>
          .
          <source>In: Proceedings of the 10th International Conference on Semantic Systems</source>
          . pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>