<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Designment of E-R model based on RDF(S)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiangbin Gao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dantong Ouyang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuxin Ye</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computer Science and Technology, Jilin University</institution>
          ,
          <addr-line>Changchun 130012</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education</institution>
          ,
          <addr-line>Changchun 130012</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Based on the vast domain resources described in RDF(S) on the web and on SPARQL's powerful query capabilities, this article presents a new method for designing E-R models. The method has three steps: (1) formulating SPARQL rules (resource query rules and schema query rules) by analyzing the structure of RDF(S); (2) parsing the most relevant resources returned by the query statements; and (3) completing the design by transforming the RDF(S) model into an entity-relationship model according to the queried content. The results indicate that designing E-R models from RDF(S) can recover users' real requirements with high probability and can help database designers complete designs in unfamiliar domains.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>RDF Schema</kwd>
        <kwd>E-R model</kwd>
        <kwd>SPARQL</kwd>
        <kwd>database design</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Generally speaking, database design depends to a large extent on how well the designer
understands and represents user requirements[1]. Over some 60 years of development, the
main ideas of database construction have improved considerably, and many methods have
been put forward. Kahn proposed a way of obtaining descriptive information during the
database design process[2], emphasizing that the designer should gather requirements
in the real world and judge their sufficiency and completeness during design. Blaha and
Premerlani proposed an object-oriented methodology for building relational databases[3],
which promoted adherence to normal forms and improved integration between databases and
applications. Finkelstein and Schkolnick used heuristic methods to optimize the data
dictionary and applied them to relational database design[4]. Dogac et al. designed a
generalized expert system for database design[5]; they refined the system by adding new
design approaches and modifying older ones, and introduced a mode of human interaction.
Unfortunately, with the rise of the web, traditional methods can no longer fit the needs
of some settings, and database design now faces more complicated problems.</p>
      <p>The Resource Description Framework (RDF)[6] provides a data model for describing
domain resources stored on the web and has a strong capacity for knowledge sharing.
SPARQL[7] offers powerful query capabilities over RDF(S) and can analyze RDF data in
depth. Building on the large amount of available RDF resources and on SPARQL, this
article offers a fresh perspective on E-R model design. The approach further
strengthens the database design toolkit and better serves users in specialized fields.</p>
    </sec>
    <sec id="sec-2">
      <title>Frame structure</title>
      <p>
        Because of its strong capacity for knowledge sharing, we choose RDF as the design
source. In particular, we use WordNet[8] and Swoogle[9] to obtain user requirements.
In this article, we propose a method for designing E-R models based on RDF(S), built
around the traditional design method. There are three main tasks:
(1) formulating SPARQL query rules based on the structural features of RDFS;
(2) parsing the RDF resources with these SPARQL query rules;
(3) drawing the E-R model by transforming the RDF(S) model into an E-R model.
      </p>
      <p>
        For task (1), we formulate query rules by analyzing the structure of RDF(S) and
write the corresponding query statements. Specifically, we propose resource query rules,
designed around the file's own namespace, and schema query rules, designed around the
RDF Schema namespace. For task (2), we use breadth-first traversal to walk the RDF data
with the statements obtained in task (1), collecting results such as classes, properties
and datatypes. For task (3), we analyze and organize the results of task (2) and apply
the transformation rules between the RDF(S) model (which describes the structure of
RDF(S)) and the E-R model to draw the E-R model. The frame structure is shown in Figure 1.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Formulating query rules</title>
      <p>To design an E-R model from RDF(S), we parse RDF resources with SPARQL and
transform the elements of the RDF(S) model into elements of the E-R model. Since a
SPARQL query is governed mainly by the patterns in its WHERE clause, this article
focuses on those query rules. For simplicity, the rules below show only the WHERE
clause; the PREFIX, SELECT and FROM parts are omitted.</p>
      <sec id="sec-3-1">
        <title>Resource query rules</title>
        <p>Like RDF, SPARQL provides triple patterns. Unlike RDF, however, the subject,
predicate and object of a SPARQL triple pattern can all be variables. The triple patterns
in a query are matched against the triples in the RDF data, and every successful match
yields a result. Because every RDF file has its own namespace and builds its structure
under it, no fixed set of rules can retrieve all of its information; the designer has
to write query statements for the specific namespace. For this case we propose the
resource query rules in Table 1.</p>
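        <p>To make the matching behind rules 1 to 3 concrete, the following is a minimal pure-Python emulation, not the Jena-based setup used in this article. It assumes an RDF file is held as a list of (subject, predicate, object) strings and that a pattern term starting with "?" is a variable; the sample triples and the function names are hypothetical.</p>
        <preformat>
```python
def is_var(term):
    # In this sketch a SPARQL variable is any term that starts with "?"
    return term.startswith("?")


def match_pattern(graph, pattern):
    """Return one binding dict for every triple in graph that matches pattern."""
    results = []
    for triple in graph:
        binding = {}
        ok = True
        for pat_term, value in zip(pattern, triple):
            if is_var(pat_term):
                # A variable used twice in a pattern must bind the same value
                if binding.setdefault(pat_term, value) != value:
                    ok = False
                    break
            elif pat_term != value:
                # A known (constant) term must match exactly
                ok = False
                break
        if ok:
            results.append(binding)
    return results


# Hypothetical sample data: (subject, predicate, object) strings
graph = [
    ("ex:alice", "rdf:type", "ex:Blogger"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Blogger"),
]

# All three terms unknown (rule 1): every triple is returned
print(len(match_pattern(graph, ("?s", "?p", "?o"))))  # 3
# One known item (the subject), two unknown
print(match_pattern(graph, ("ex:alice", "?p", "?o")))
# Two known items, one unknown
print(match_pattern(graph, ("?s", "rdf:type", "ex:Blogger")))
```
        </preformat>
        <p>Each result is a set of variable bindings, which is also the form in which a SPARQL engine reports its solutions.</p>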
        <p>To gain a comprehensive understanding of a file, we can retrieve all of its
triples with rule 1. In practice we often face patterns with one known data item and
two unknown, or two known and one unknown, so we design rules 2 and 3. With these rules,
a traversal can start from a single known data item. To express richer semantics, we
can add constraints and variations to the rules. In rule 4, we use UNION to combine two
triple patterns, so a solution is produced whenever at least one branch matches the RDF
triples. In rule 5, we use OPTIONAL to combine two triple patterns: the second pattern
supplements the solutions of the first when it matches, without discarding them when it
does not. In rule 6, we use FILTER so that a constraint attached to the second part
restricts the solutions of the first; the constraint can be any logical expression with
a Boolean value.</p>
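        <p>The three combinators of rules 4 to 6 can be sketched over binding lists as follows. This is a hypothetical pure-Python emulation under the same assumptions as a plain triple list (variables start with "?"); the sample data and function names are illustrative only, and the SPARQL equivalent of each call is given as a comment.</p>
        <preformat>
```python
def match(graph, pattern):
    """Bindings for one triple pattern; terms starting with "?" are variables."""
    out = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every term matched
            out.append(binding)
    return out


def compatible(a, b):
    # Two solutions can be merged when they agree on every shared variable
    return all(a[k] == b[k] for k in a if k in b)


def union(left, right):
    # UNION: the solutions of either branch
    return left + right


def optional(left, right):
    # OPTIONAL: keep every left solution, extended wherever the right side matches
    out = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extended if extended else [a])
    return out


def filter_solutions(solutions, condition):
    # FILTER: keep only the solutions for which the Boolean condition is true
    return [s for s in solutions if condition(s)]


graph = [  # hypothetical sample data
    ("ex:alice", "rdf:type", "ex:Blogger"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:homepage", "ex:alice-home"),
    ("ex:bob", "rdf:type", "ex:Blogger"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "rdf:type", "ex:Editor"),
]

# { ?x rdf:type ex:Blogger } UNION { ?x rdf:type ex:Editor }
people = union(match(graph, ("?x", "rdf:type", "ex:Blogger")),
               match(graph, ("?x", "rdf:type", "ex:Editor")))
# ?b rdf:type ex:Blogger . OPTIONAL { ?b foaf:homepage ?h }
with_pages = optional(match(graph, ("?b", "rdf:type", "ex:Blogger")),
                      match(graph, ("?b", "foaf:homepage", "?h")))
# ?b foaf:name ?n . FILTER(STRSTARTS(?n, "A"))
a_names = filter_solutions(match(graph, ("?b", "foaf:name", "?n")),
                           lambda s: s["?n"].startswith("A"))
print(people, with_pages, a_names, sep="\n")
```
        </preformat>
        <p>Note that "ex:bob" survives the OPTIONAL step without a homepage binding, which is exactly what distinguishes OPTIONAL from an ordinary join.</p>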
        <p>In particular, blank nodes can appear in the results. In a query pattern a
blank node acts like a variable whose value is not returned; in the data it has no
global identifier and carries no meaning of its own, serving only to connect two other
nodes. Blank nodes are therefore unavoidable during querying. When one is met, the
designer should skip it and continue the search with the nodes on either side of it.</p>
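        <p>Skipping a blank node amounts to joining the triple that points at it with the triples that leave it. A minimal sketch, assuming blank nodes are written with the "_:" prefix as in Turtle and that only one level of blank nodes needs to be bypassed; the data and the function name are hypothetical.</p>
        <preformat>
```python
def is_blank(term):
    # Blank nodes are written with the "_:" prefix, as in N-Triples and Turtle
    return term.startswith("_:")


def skip_blank_nodes(graph, subject):
    """Bypass one level of blank nodes hanging off subject.

    For every triple (subject, p, _:b) this returns (p, p2, o2) for each
    triple (_:b, p2, o2), so the designer sees the nodes on both sides
    of the blank node without the blank node itself.
    """
    hops = []
    for s, p, o in graph:
        if s == subject and is_blank(o):
            for s2, p2, o2 in graph:
                if s2 == o:
                    hops.append((p, p2, o2))
    return hops


graph = [  # hypothetical data shaped like a feed with an anonymous maker
    ("ex:feed", "foaf:maker", "_:b0"),
    ("_:b0", "rdf:type", "foaf:Agent"),
    ("_:b0", "foaf:name", "Example Agent"),
    ("ex:feed", "foaf:topic", "RDF"),
]

for hop in skip_blank_nodes(graph, "ex:feed"):
    print(hop)
```
        </preformat>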
      </sec>
      <sec id="sec-3-2">
        <title>Schema query rules</title>
        <p>Besides user-defined namespaces, RDF Schema provides its own namespace for
describing the structure of RDF data. The resource query rules above return results
that are often disorderly and, in extreme cases, even meaningless; resource queries are
also blind and inefficient. We therefore propose the schema query rules shown in Table 2,
in which each query statement connects one known data item with one unknown data item
through a predefined RDF Schema vocabulary term. As long as the designer understands the
semantics of these predefined terms, the unknown item can easily be obtained from the
known one. Overall, the schema query rules both improve the readability of RDF and make
queries more systematic. As with the resource query rules, constraints and variations
can be added to the schema query rules to build more complex queries; we do not describe
them further.
Thanks to the hierarchical structure of RDF Schema, we model the whole RDF file as a tree
and regard its triples as the nodes of the tree. To find the elements without redundancy
or omission, this article queries top-down with gradually increasing precision and uses
breadth-first traversal throughout. Note that as the query expands, the semantic
relevance between later nodes and the initial nodes decreases rapidly. To keep the
results useful, the designer should therefore set the number of query layers according
to the actual requirements.</p>
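        <p>The layer-limited, breadth-first strategy can be sketched as below. As a simplifying assumption, this sketch treats resources as graph nodes and triples as the edges between them (rather than triples as tree nodes), and it reports each triple with the layer at which it was reached; the graph and the function name are hypothetical.</p>
        <preformat>
```python
from collections import deque


def bfs_triples(graph, start, max_layers):
    """Collect (layer, triple) pairs reachable from start, breadth first.

    Layer 1 holds the triples whose subject is start; objects found there are
    expanded in layer 2, and so on. max_layers must be at least 1.
    """
    outgoing = {}
    for s, p, o in graph:
        outgoing.setdefault(s, []).append((s, p, o))

    found = []
    seen = {start}
    queue = deque([(start, 1)])  # (node, layer its triples belong to)
    while queue:
        node, layer = queue.popleft()
        for triple in outgoing.get(node, []):
            found.append((layer, triple))
            obj = triple[2]
            # Stop expanding once the designer-defined layer limit is reached
            if layer != max_layers and obj not in seen:
                seen.add(obj)
                queue.append((obj, layer + 1))
    return found


graph = [  # hypothetical chain ex:a, ex:b, ex:c, ex:d plus a side branch
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "ex:p", "ex:c"),
    ("ex:c", "ex:p", "ex:d"),
    ("ex:a", "ex:q", "ex:x"),
]

for layer, triple in bfs_triples(graph, "ex:a", 2):
    print(layer, triple)
```
        </preformat>
        <p>Raising max_layers widens the search, at the cost of the falling relevance described above.</p>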
      </sec>
      <sec id="sec-3-3">
        <title>The structure of RDF(S) model</title>
        <p>
          To transform one model into another, we need to analyze the structure and
characteristics of both models so as to fix the correspondence between their specific
elements. As the source, RDF has rich semantic content and can describe domain knowledge
in considerable detail. By specializing rdfs:Resource we obtain rdfs:Class, which
represents classes, and rdf:Property, which represents properties. The most important
characteristic of the RDF(S) model is its hierarchy: a class can be specialized through
rdfs:subClassOf, and a property through rdfs:subPropertyOf. Viewed as a tree,
rdfs:Resource is the root node, while rdfs:Class, rdf:Property and their subclasses are
the child nodes. Because RDF is a format for storing information, classes and properties
are stored in it in instantiated form. To specify which classes a property connects,
RDF Schema provides rdfs:domain and rdfs:range, which constrain the property's domain and
range respectively. Using these constraints, we can find the relationships between
classes and properties as well as the relationships among classes, so that the RDF
network becomes clear. The structure is shown in Figure 2.</p>
        <p>(1) Query against class. Based on the hierarchy, we use rules 7 and 8 to find
all classes and their subclasses. A subclass has its parent class's attributes as well
as its own; correspondingly, the primary key and foreign keys are also inherited. As
rule 9 shows, rdf:type connects an instance with its class. Rule 10 defines the
relationship between a subject class and an object class, linking the classes through
terms such as rdfs:seeAlso and rdfs:isDefinedBy. Finally, a documentation query with
rule 11 gives the designer a readable description of the class's specific definition.
        </p>
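        <p>The class-hierarchy queries and the inheritance of attributes can be sketched in a few lines. The sketch assumes triples are plain strings and that a class's attributes are the properties whose rdfs:domain is that class or one of its superclasses; the schema and the function names are hypothetical.</p>
        <preformat>
```python
def direct_parents(graph, cls):
    # Superclasses stated directly with rdfs:subClassOf
    return [o for s, p, o in graph if s == cls and p == "rdfs:subClassOf"]


def ancestors(graph, cls):
    """All superclasses of cls, following rdfs:subClassOf transitively."""
    result, stack = [], [cls]
    while stack:
        for parent in direct_parents(graph, stack.pop()):
            if parent not in result:
                result.append(parent)
                stack.append(parent)
    return result


def subclasses(graph, cls):
    """All direct and indirect subclasses of cls, sorted by name."""
    children = {s for s, p, o in graph if p == "rdfs:subClassOf"}
    return sorted(c for c in children if cls in ancestors(graph, c))


def inherited_attributes(graph, cls):
    """Properties whose rdfs:domain is cls or one of its superclasses."""
    owners = [cls] + ancestors(graph, cls)
    return sorted(s for s, p, o in graph if p == "rdfs:domain" and o in owners)


graph = [  # hypothetical schema
    ("ex:Person", "rdf:type", "rdfs:Class"),
    ("ex:Blogger", "rdfs:subClassOf", "ex:Person"),
    ("ex:GuestBlogger", "rdfs:subClassOf", "ex:Blogger"),
    ("foaf:name", "rdfs:domain", "ex:Person"),
    ("ex:blogUrl", "rdfs:domain", "ex:Blogger"),
]

print(subclasses(graph, "ex:Person"))
print(inherited_attributes(graph, "ex:GuestBlogger"))
```
        </preformat>
        <p>Because ancestors() only adds classes it has not seen, a cyclic rdfs:subClassOf chain in a carelessly written file does not cause an endless loop.</p>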
        <p>During traversal, the classes serve as starting points, and those found first
are stored in a queue. We then take the first class from the queue and find its child
nodes and sibling nodes until all of its child nodes have been searched. After finishing
the first class, we take the second one and repeat the process, and so on until the whole
tree has been traversed. However, because of user-defined namespaces, we sometimes cannot
obtain classes directly. In that case the designer should start the traversal from the
first record of the rule 1 result: the instance and the property mentioned in that record
lead to the related class.</p>
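        <p>A minimal sketch of this queue-based traversal, including the fallback to the first record when no class is declared. It assumes the first record is simply the first triple of the list and that its subject carries an rdf:type; the data and the function names are hypothetical.</p>
        <preformat>
```python
from collections import deque


def declared_classes(graph):
    # Resources explicitly typed as rdfs:Class
    return [s for s, p, o in graph if p == "rdf:type" and o == "rdfs:Class"]


def discover_classes(graph):
    """Declared classes, or else the rdf:type of the first record's subject."""
    classes = declared_classes(graph)
    if classes or not graph:
        return classes
    first_subject = graph[0][0]
    # Fallback: ask for the type of the instance named in the first record
    return [o for s, p, o in graph if s == first_subject and p == "rdf:type"]


def traverse_classes(graph):
    """Breadth-first walk: dequeue a class, then enqueue its direct subclasses."""
    queue = deque(discover_classes(graph))
    order = []
    while queue:
        cls = queue.popleft()
        if cls in order:
            continue
        order.append(cls)
        queue.extend(s for s, p, o in graph if p == "rdfs:subClassOf" and o == cls)
    return order


graph = [  # hypothetical file without any rdfs:Class declaration
    ("ex:feed", "rdf:type", "rss:channel"),
    ("ex:feed", "foaf:topic", "RDF"),
    ("ex:Podcast", "rdfs:subClassOf", "rss:channel"),
]

print(traverse_classes(graph))
```
        </preformat>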
        <p>
          (2) Query against property. Like classes, properties have a hierarchical
structure; rules 12 and 13 find all properties and their subproperties, respectively.
RDF Schema is a property-centric description framework: it constrains a property's
semantics with the domain class given by rdfs:domain and the range class given by
rdfs:range, and thereby links classes and properties closely. Querying the properties
and classes connected to given nodes through these constraints is therefore a major way
to traverse. Accordingly, rule 14 queries the domain classes of a property and rule 15
its range classes; both rules use the property as the predicate and specify the types of
the subject and the object.
        </p>
        <p>Because of the semantic absence of queries discussed below, some properties are
connected only with instances rather than classes. In this case we can find the
relationship between a property and a class only by querying the instances.</p>
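        <p>Rules 14 and 15, together with the instance-based fallback just described, can be sketched as one function. The sketch assumes declared rdfs:domain and rdfs:range values take priority and that, when one is missing, the classes of the linked instances (via rdf:type) are used instead; the data and the names are hypothetical.</p>
        <preformat>
```python
def domain_and_range(graph, prop):
    """Return (domains, ranges) of prop.

    Declared rdfs:domain and rdfs:range values are used when present; otherwise
    they are inferred from the rdf:type of the instances that prop links.
    """
    domains = [o for s, p, o in graph if s == prop and p == "rdfs:domain"]
    ranges = [o for s, p, o in graph if s == prop and p == "rdfs:range"]

    types = {}
    for s, p, o in graph:
        if p == "rdf:type":
            types.setdefault(s, []).append(o)

    inferred_domains, inferred_ranges = [], []
    for s, p, o in graph:
        if p != prop:
            continue
        for cls in types.get(s, []):
            if cls not in inferred_domains:
                inferred_domains.append(cls)
        for cls in types.get(o, []):
            if cls not in inferred_ranges:
                inferred_ranges.append(cls)

    return domains or inferred_domains, ranges or inferred_ranges


graph = [  # hypothetical data: ex:madeBy declares a domain but no range
    ("ex:madeBy", "rdfs:domain", "ex:Feed"),
    ("ex:feed1", "rdf:type", "ex:Feed"),
    ("ex:feed1", "ex:madeBy", "ex:agent1"),
    ("ex:agent1", "rdf:type", "ex:Agent"),
    ("ex:feed1", "ex:topic", "RDF"),
]

print(domain_and_range(graph, "ex:madeBy"))
print(domain_and_range(graph, "ex:topic"))
```
        </preformat>
        <p>An empty range list, as for ex:topic here, hints that the property is text-valued and will become an attribute rather than a relationship.</p>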
        <p>
          (3) Query on datatype. RDFS has no datatypes of its own and reuses the XML Schema
datatypes (XSD). Data are returned only when the property value is a literal (text type).
Specifically, we use rule 16 to query the datatype and rule 17 to query the value.
        </p>
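        <p>Rules 16 and 17 can be mimicked by separating literal values from resource values. In this hypothetical sketch a literal is stored as a (lexical form, datatype) tuple and a resource as a plain string, so only literal-valued properties yield data; the data and function names are illustrative.</p>
        <preformat>
```python
def is_literal(obj):
    # In this sketch literals are (lexical form, datatype) tuples and IRIs are strings
    return isinstance(obj, tuple)


def property_values(graph, prop):
    """Literal values of prop as (value, datatype) pairs; IRI values are skipped."""
    return [o for s, p, o in graph if p == prop and is_literal(o)]


def property_datatypes(graph, prop):
    """Distinct datatypes used by prop, in first-seen order (like SPARQL DATATYPE)."""
    seen = []
    for _value, dtype in property_values(graph, prop):
        if dtype not in seen:
            seen.append(dtype)
    return seen


graph = [  # hypothetical data
    ("ex:feed1", "foaf:topic", ("RDF", "xsd:string")),
    ("ex:feed1", "ex:itemCount", ("12", "xsd:integer")),
    ("ex:feed1", "foaf:maker", "ex:agent1"),
]

print(property_datatypes(graph, "ex:itemCount"))
print(property_values(graph, "foaf:maker"))
```
        </preformat>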
        <p>
          (4) Query on instance. An instance of a class is described by the values of the
corresponding properties (the ranges of those properties). Since instances can be obtained
through data queries, we give no specific rules; the designer can use the datatype query
rules to query instances.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Semantic absence of query</title>
        <p>Because RDF is an open semantic framework, authors are not bound by any specific
discipline when constructing classes, properties and instances; they may use namespaces
they have defined themselves or the vocabulary predefined by RDFS. Since RDF files often
contain few or none of the RDFS predefined terms, the designer frequently has to rely on
resource queries driven by the requirements, and some elements may therefore not be
obtainable. The E-R model, in contrast, imposes unified and precise requirements on the
form and structure of data: every element must strictly follow the rules, and nothing is
permitted by default. Semantic absence during the transformation is therefore inevitable.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Drawing E-R model</title>
      <sec id="sec-4-1">
        <title>The structure of E-R model</title>
        <p>As the outcome of design, a relational database stores data and provides a
standard way to manage resources. Entities, relationships and attributes together
constitute the E-R model, whose structure is shown in Figure 3. A relationship defines
the relation between entities and, by connecting entities, builds up a larger network
structure. An entity has several attributes, which describe it and express its features.
An attribute or set of attributes that uniquely determines an entity is called the
primary key. Relationships between entities come in several types, such as one-to-one
(1:1), one-to-many (1:n) and many-to-many (m:n), which express how the instances of the
entities correspond.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Model-to-model transformation</title>
        <p>
          (1) Transformation of class. We transform every class into an entity and use the
class name as the entity name. Using domain knowledge, the designer selects as primary
key an attribute that uniquely identifies instances of the class, such as an ID or a
name; the specific choice depends on the actual requirements. If an attribute of one
class is also the key of another class, we call it a foreign key. During the
transformation, the keys (primary and foreign) should be marked.
        </p>
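        <p>A minimal sketch of the class-to-entity step with key marking. The Entity record, the preferred-key list ("id", then "name") and the sample classes are assumptions made for illustration; in the method described here the designer, not a fixed list, chooses the primary key.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list
    primary_key: str = ""
    foreign_keys: list = field(default_factory=list)


def local_name(iri):
    # "foaf:Agent" becomes "Agent"; also strips "/" and "#" namespaces
    return iri.split(":")[-1].split("/")[-1].split("#")[-1]


def class_to_entity(cls, properties, preferred_keys=("id", "name")):
    """Map one class to an entity; the first preferred key present becomes the primary key."""
    attrs = [local_name(p) for p in properties]
    entity = Entity(name=local_name(cls), attributes=attrs)
    for key in preferred_keys:
        if key in attrs:
            entity.primary_key = key
            break
    return entity


def mark_foreign_keys(entities):
    """Mark an attribute as a foreign key when it is another entity's primary key."""
    keys = {e.name: e.primary_key for e in entities if e.primary_key}
    for e in entities:
        for attr in e.attributes:
            if any(attr == k and owner != e.name for owner, k in keys.items()):
                e.foreign_keys.append(attr)
    return entities


# Hypothetical classes and their text-valued properties
blogger = class_to_entity("ex:Blogger", ["ex:bloggerId", "foaf:name"], ("bloggerId",))
post = class_to_entity("ex:Post", ["ex:postId", "ex:title", "ex:bloggerId"], ("postId",))
mark_foreign_keys([blogger, post])
agent = class_to_entity("foaf:Agent", ["foaf:name", "foaf:interest"])
print(post)
print(agent)
```
        </preformat>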
        <p>
          (2) Transformation of property. According to their semantics, properties can be
classified as text type (rdfs:Literal) or resource type (URI). A text-type property
becomes an attribute of the entity corresponding to its domain class, with the property
name as the attribute name and the property datatype as the attribute datatype. A
resource-type property becomes a relationship between the entities corresponding to its
domain and range, with the property name as the relationship name. By querying the
instances of the domain and the range, we can determine the cardinality of the
relationship: if each instance on either side corresponds to only one instance on the
other side, the relationship is 1:1; if an instance on one side corresponds to many
instances on the other side but not vice versa, it is 1:n; if instances on both sides
correspond to many instances on the other side, it is m:n.
        </p>
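        <p>The cardinality test can be sketched from the observed instance links of one resource-type property. This hypothetical sketch counts, for each domain instance, its distinct range partners and vice versa; it also reports "n:1", the mirror case of 1:n. Observed data only show what occurs in the file, so the designer should still confirm the result against the requirements.</p>
        <preformat>
```python
def cardinality(pairs):
    """Classify observed (domain instance, range instance) links as 1:1, 1:n, n:1 or m:n."""
    forward, backward = {}, {}
    for d, r in pairs:
        forward.setdefault(d, set()).add(r)
        backward.setdefault(r, set()).add(d)
    # "Many" on a side: some instance is linked to two or more partners
    one_to_many = any(len(v) != 1 for v in forward.values())
    many_to_one = any(len(v) != 1 for v in backward.values())
    if one_to_many and many_to_one:
        return "m:n"
    if one_to_many:
        return "1:n"
    if many_to_one:
        return "n:1"
    return "1:1"


# Hypothetical feed-to-maker links
print(cardinality([("ex:feed1", "ex:alice"), ("ex:feed2", "ex:bob")]))
print(cardinality([("ex:feed1", "ex:alice"), ("ex:feed1", "ex:bob"),
                   ("ex:feed2", "ex:alice")]))
```
        </preformat>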
        <p>
          (3) Transformation of datatype. A database requires a specific datatype for each
attribute, so the datatype must be specified during the model-to-model transformation.
Taking MySQL as an example, Table 3 gives the correspondence between several main
datatypes. The comparison shows many similarities between XSD datatypes and SQL
datatypes, which makes the conversion feasible.
        </p>
        <p>Note: in this mapping, VARCHAR is used for variable-length strings of up to 255
characters; strings longer than 255 characters are mapped to TEXT.</p>
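        <p>The datatype step can be sketched as a lookup table plus the VARCHAR/TEXT rule from the note. The mapping below is an illustrative assumption based on common MySQL column types and is not necessarily identical to Table 3; unknown XSD types fall back to TEXT, which is likewise an assumption of this sketch.</p>
        <preformat>
```python
# Illustrative XSD-to-MySQL mapping; not necessarily identical to Table 3
XSD_TO_MYSQL = {
    "xsd:integer": "INT",
    "xsd:int": "INT",
    "xsd:long": "BIGINT",
    "xsd:double": "DOUBLE",
    "xsd:float": "FLOAT",
    "xsd:boolean": "BOOLEAN",
    "xsd:date": "DATE",
    "xsd:time": "TIME",
    "xsd:dateTime": "DATETIME",
}


def to_mysql_type(xsd_type, max_length=0):
    """Map an XSD datatype to a MySQL column type.

    Strings follow the VARCHAR/TEXT convention: at most 255 characters
    gives VARCHAR(255), longer strings give TEXT. Unknown types fall back
    to TEXT (an assumption of this sketch).
    """
    if xsd_type == "xsd:string":
        # range(256) covers lengths 0 to 255 inclusive
        return "VARCHAR(255)" if max_length in range(256) else "TEXT"
    return XSD_TO_MYSQL.get(xsd_type, "TEXT")


print(to_mysql_type("xsd:string", 40))
print(to_mysql_type("xsd:string", 1000))
print(to_mysql_type("xsd:dateTime"))
```
        </preformat>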
        <p>In practice, querying and transformation proceed together rather than in a fixed
order. Combining the resource query rules with the schema query rules, we parse the
tree-structured RDF and carry out the transformation from the RDF(S) model to the E-R
model. Figure 4 shows the correspondence between the RDF(S) model and the E-R model.</p>
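        <p>Putting the class and property rules together, a schema-level transformation can be sketched as follows. It assumes classes are declared with rdf:type rdfs:Class, keeps only one domain and one range per property, and treats a property without a declared range as text-valued; these simplifications, the sample schema and the function names are all hypothetical.</p>
        <preformat>
```python
def is_text_range(range_iri):
    # Text-type ranges: rdfs:Literal or any XSD datatype
    return range_iri == "rdfs:Literal" or range_iri.startswith("xsd:")


def rdfs_to_er(graph):
    """Build a minimal E-R description from rdfs:Class, rdfs:domain and rdfs:range triples.

    Returns (entities, relationships): entities maps each class to its
    attribute properties; relationships lists (domain, property, range).
    """
    entities = {s: [] for s, p, o in graph if p == "rdf:type" and o == "rdfs:Class"}
    domains = {s: o for s, p, o in graph if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in graph if p == "rdfs:range"}
    relationships = []
    for prop, cls in domains.items():
        target = ranges.get(prop, "rdfs:Literal")  # assume text when no range is given
        entities.setdefault(cls, [])
        if is_text_range(target):
            entities[cls].append(prop)  # attribute of the domain entity
        else:
            entities.setdefault(target, [])
            relationships.append((cls, prop, target))  # named after the property
    return entities, relationships


graph = [  # hypothetical schema
    ("ex:Feed", "rdf:type", "rdfs:Class"),
    ("ex:Agent", "rdf:type", "rdfs:Class"),
    ("ex:topic", "rdfs:domain", "ex:Feed"),
    ("ex:topic", "rdfs:range", "xsd:string"),
    ("ex:name", "rdfs:domain", "ex:Agent"),
    ("ex:madeBy", "rdfs:domain", "ex:Feed"),
    ("ex:madeBy", "rdfs:range", "ex:Agent"),
]

entities, relationships = rdfs_to_er(graph)
print(entities)
print(relationships)
```
        </preformat>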
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Example: E-R model design for "blogger"</title>
      <p>We use Jena[10], with its inference engine, to support SPARQL querying and
reasoning. The RDF file and the SPARQL file are the inputs, and the query results are the
output. The designer selects the desired content from the RDF resources and writes the
corresponding query statements into the SPARQL file.</p>
      <p>As a typical example, we take the "blogger" file at
"http://wiki.creativecommons.org/Special:ExportRDF/Blogger", given by IBM, as the design
source. Specifically, we parse the file with the resource query rules and the schema query
rules, with the number of query layers set to 5. Rule 1 lists all of the triples, as shown
in Figure 5. However, rule 7 finds no classes, so we traverse from the first record of the
result.</p>
      <p>As shown, the first record describes the instance
"&lt;http://blog.planetrdf.com/rss.xml&gt;". We therefore use rule 9 to query the instance's
type, and the result is "rss:channel". Unfortunately, we find no properties connected with
"rss:channel", so we query the instance "&lt;http://blog.planetrdf.com/rss.xml&gt;" again.
With rule 3, we obtain the property "foaf:topic" of "rss:channel", together with its value
and its datatype, "string", which we transform into the SQL datatype "VARCHAR". We also find
the relationship "foaf:maker", which connects "rss:channel" with a blank node, as shown in
Figure 6. With a query such as
<monospace>{ &lt;http://blog.planetrdf.com/rss.xml&gt; foaf:maker _:b0 . _:b0 ?y ?z . }</monospace>
we obtain the resource "&lt;http://blog.planetrdf.com/&gt;", which is connected with
"rss:channel" through the blank node. In addition, we find "foaf:Agent", which the blank node
represents, and its properties "foaf:name" and "foaf:interest". By convention, we take
"foaf:name" as the primary key and underline it, as shown in Figure 7. By querying the classes
that "foaf:maker" connects, we determine that the cardinality is m:n, as shown in Figure 8.</p>
      <p>Repeating these steps, we draw the E-R model for "blogger" shown in Figure 9.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Designing E-R models based on RDF(S) is a new method for database design. It
exploits RDF's capacity for knowledge sharing and makes extensive use of the resources
stored on the web. The results show that the method can recover users' real requirements
with high probability and help database designers complete designs in unfamiliar domains.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Sugumaran</surname>
            <given-names>Vijayan</given-names>
          </string-name>
          , Storey Veda C.:
          <article-title>The role of domain ontologies in database design: An ontology management and conceptual modeling environment</article-title>
          ,
          <source>ACM Transactions on Database Systems</source>
          ,
          <year>2006</year>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1064</fpage>
          -
          <lpage>1094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kahn</surname>
            <given-names>Beverly K.</given-names>
          </string-name>
          :
          <article-title>A method for describing information required by the database design process</article-title>
          ,
          <source>Proceedings of the 1976 ACM SIGMOD international conference on Management of data</source>
          ,
          <year>1976</year>
          :
          <fpage>53</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blaha</surname>
            <given-names>Michael R.</given-names>
          </string-name>
          , Premerlani William J.:
          <article-title>Relational database design using an object-oriented methodology</article-title>
          ,
          <source>Communications of the ACM</source>
          ,
          <year>1988</year>
          ,
          <volume>31</volume>
          (
          <issue>4</issue>
          ):
          <fpage>414</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Finkelstein</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schkolnick</surname>
            <given-names>M</given-names>
          </string-name>
          :
          <article-title>Physical database design for relational databases</article-title>
          ,
          <source>ACM Transactions on Database Systems</source>
          ,
          <year>1988</year>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>91</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dogac</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spaccapietra</surname>
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A generalized expert system for database design</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          ,
          <year>1989</year>
          ,
          <volume>15</volume>
          (
          <issue>4</issue>
          ):
          <fpage>479</fpage>
          -
          <lpage>491</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Čebirić</surname>
            <given-names>Šejla</given-names>
          </string-name>
          , Goasdoué François:
          <article-title>Query-oriented summarization of RDF graphs</article-title>
          ,
          <source>Proceedings of the 30th British International Conference on Databases</source>
          , Edinburgh, United Kingdom, July
          ,
          <year>2015</year>
          :
          <fpage>87</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Meehan</surname>
            <given-names>Alan</given-names>
          </string-name>
          , Brennan Rob:
          <article-title>SPARQL based mapping management</article-title>
          ,
          <source>Proceedings of the 9th International Conference on Semantic Computing</source>
          , Anaheim, CA, United States, February
          ,
          <year>2015</year>
          :
          <fpage>456</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zhang</surname>
            <given-names>Li</given-names>
          </string-name>
          , Li Jing-Jiao, Hu Ming-Han:
          <article-title>Implementation of Chinese WordNet</article-title>
          , Dongbei Daxue Xuebao/Journal of Northeastern University,
          <year>2003</year>
          ,
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <fpage>327</fpage>
          -
          <lpage>329</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ding</surname>
            <given-names>Li</given-names>
          </string-name>
          , Finin Tim:
          <article-title>Swoogle: A search and metadata engine for the semantic web</article-title>
          ,
          <source>Proceedings of the 13th ACM Conference on Information and Knowledge Management</source>
          , Washington, DC, United States, November
          ,
          <year>2004</year>
          :
          <fpage>652</fpage>
          -
          <lpage>659</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Ameen Ayesha:
          <article-title>Extracting knowledge from ontology using Jena for semantic web</article-title>
          ,
          <source>Proceedings of 2014 International Conference for Convergence of Technology</source>
          , Pune, India, April,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>