=Paper=
{{Paper
|id=Vol-1409/paper-11
|storemode=property
|title=Bringing Agility into Linked Data Development: An Industrial Use Case in Logistics Domain
|pdfUrl=https://ceur-ws.org/Vol-1409/paper-11.pdf
|volume=Vol-1409
|dblpUrl=https://dblp.org/rec/conf/www/GocebeDK15
}}
==Bringing Agility into Linked Data Development: An Industrial Use Case in Logistics Domain==

Pinar Gocebe, Ege University, gocebepinar@gmail.com
Oguz Dikenelli, Ege University, odikenelli@gmail.com
Nuri Umut Kose, BIMAR Information Technologies, umut.kose@bimar.com.tr
Abstract

Logistics is a complex industry in which many different types of companies collaborate to transport containers to their final destination. One of the most important problems in the logistics domain is the observation and monitoring of the container life cycle, where each step of the container transportation may be performed by a different company. Thus, observing and monitoring the container's life cycle in real time becomes a challenging engineering task. In this research, a Linked Data development infrastructure has been used to implement a dynamic container observation and monitoring system for ARKAS, a leading logistics company in Turkey. During the development of the system, it was observed that agile practices such as feature/story oriented development, test-first development and the Agile Architecture approach improve product and project management quality. Therefore, a new methodology based on these practices is proposed for Linked Data development.

Keywords: Linked Data Development Methodology, Agile Analytics, Agile Architecture

∗ This work is funded by the Republic of Turkey Ministry of Science, Industry and Technology.

Copyright is held by the author/owner(s).
WWW2015 Workshop: Linked Data on the Web (LDOW2015).

1.   Introduction

Logistics is a complex industry in which many different roles, such as shipping agency companies, port/ship operators, and land, rail and air transportation companies, collaborate to transport a container to a destination. These companies require a collaboration infrastructure to keep the whole transportation process running seamlessly. This collaboration infrastructure should provide an integration environment for the information systems of the companies in order to manage all sub-transportations, and it needs to be open in the sense that it should be easy to add or remove companies in the process. In addition to the integration of the information systems, monitoring of the process is also critical for its effective management. For instance, important events such as a delay in the completion of the land transport or the start of the discharging operation at the port can be critical for the roles that participate in the process.

This paper introduces an implemented architecture based on a Linked Data infrastructure for the well-known logistics problem of "Observation and Monitoring of Container Life Cycle". The application was developed within ARKAS Holding, one of Turkey's leading logistics and transportation companies, which operates in fields such as sea, land, rail and air transportation, ship operations and port operations. Solving the "Observation and Monitoring of Container Life Cycle" problem in real time is therefore a challenging engineering task, since all sub-transportations may run in parallel on different software systems of different companies. The end goal is to enable managers and customers to track the container transportation life cycle.

In the last decade, EDI-based standards (EDIFACT, RosettaNet, STEP, AnsiX12), XML standards and Service Oriented Architecture (SOA) approaches have been used to solve the integration problems of the logistics industry [1, 2]. These standards provide a common syntax for data representation, and SOA provides an application integration infrastructure between different companies via web services. In the EDI-based standards, messages are pushed among the organizations at predefined times and translated into a format suitable for the receiving or sending organization. However, these technologies are not sufficient to ultimately solve the integration challenges of large enterprises. In the SOA approaches, the most important problem is connectivity [3]: identifiers (IDs) of the data stored in a database are known only inside their own system and lose their meaning in other systems, and finding the web service operation to call for a given identifier is hard-wired into the application logic. This makes the application logic elaborate, considering the complexity of the logistics industry. EDI-based standards are not suitable for real-time applications, since data may already be outdated when an update is not immediately sent in a message; moreover, many conversions are needed when organizations use different types of EDI formats.

A Linked Data infrastructure seems like an appropriate technical solution for the requirements of the logistics industry, since it provides an integration environment that is more flexible, extensible and open to the outside when necessary. Linked Data standards and infrastructure are widely used to integrate enterprise information and business processes [4–6]. In Linked Data based integration, identifiers (URIs) are known not only system-wide but also web-wide, and the data represented by these URIs is reachable over the HTTP protocol. Thus, all data sources within the company and/or on the web can be connected with each other, creating a huge knowledge base that software systems can use independently of each other. Therefore, Linked Data technologies offer a new solution to the dynamic, distributed and complex nature of the "Observation and Monitoring of Container Life Cycle" problem specifically, and of the logistics domain in general.

During the development of the aforementioned "Observation and Monitoring of Container Life Cycle" application, the development team defined a development methodology. Since the development team has long experience in Agile Development, the proposed methodology brings agile practices into Linked Data development. The methodology, called BLOB (A Methodology to Bring Agility into Linked Open Data Development for Businesses), has evolved through the iterations of the development. Its final version, which is presented in this paper, has the following contributions:

 • Feature/story oriented Linked Data development.
 • Introducing the Agile Architecture [7] approach to Linked Data development.
 • Applying the test-first approach to Linked Data development.

At the moment, the "Observation and Monitoring of Container Life Cycle" application is operational and is being tested by the customer operation unit of ARKAS Holding. The paper describes how the application and its architecture evolved through the iterations of the proposed methodology.
2.   The Problem: Observation and Monitoring of Container Life Cycle

In the logistics industry, customers' loads are transported in containers from a start location to a final destination. During this transportation, containers are processed in a variety of work areas such as ports, warehouses, land, rail and air. For instance, consider a company that wants to send its load from Manisa/Turkey to customers in Munich/Germany. This company has agreed with a forwarder company for the transportation, which is planned in four stages: a land transportation from Manisa to Izmir Alsancak port, a maritime transportation from Izmir Alsancak port to the Piraeus port of Athens, a maritime transportation from Piraeus to the port of Venice, and a land transportation from Venice to Munich. The whole journey takes approximately 10 days, and there is no interface for getting information about the transportation.

Throughout the transportation, customers want to learn the exact status and position of the transported loads. This problem has been addressed in wireless network and RFID studies [36, 38, 39] from the hardware viewpoint, but those studies deal only with the position and status of the containers; they are not concerned with the links between the container and the other concepts of the domain. In our case, customers can only obtain information by calling the customer operation unit of ARKAS. All transportation companies use their own information technologies and infrastructures, and there is no integration environment between their systems. Any latency in the transportation process affects all other related transportations. Therefore, the transportation must be constantly monitored by directly calling the companies in charge, which puts too much operational load on the customer operation unit employees.

In this research, we aim to solve the "Observation and Monitoring of Container Life Cycle" problem of the logistics industry by using a Linked Data infrastructure. For this purpose, the following two main requirements of the industry are addressed:

1. Providing an Integration Environment: The transportation history of the container, which is distributed over different software systems, should be integrated.

2. Monitoring Containers: The events of the container transportation in the different work areas should be monitored, and customers should be informed about the transportation status.

3.   Overview of the Methodology

It is well understood that agility is a good approach for coping with changing business and architectural requirements in software development. Not only software development, but also business intelligence and analytics implementations benefit from an agile style of development, as proposed in [8]. The proposed methodology takes practices from agile software development and agile analytics, such as Scrum [9] and XP [10], including feature/story oriented development, the test-first approach [11], customer involvement and continuous integration [12]. The Scrum infrastructure is used to manage the project with self-organized team dynamics and an iterative life cycle. The Linked Data environment changes constantly through the appearance of new data sources, new links and changes in ontologies. This highly dynamic environment may cause changes in business and architectural requirements. Thus, the agile practices used within the methodology make it suitable for the highly dynamic environment of Linked Data application development.

Linked Data development is a very young domain where critical tools and architectural patterns are constantly evolving. Also, changes in business requirements and/or in the Linked Data environment may affect the initial architectural assumptions. Thus, the development team should observe the evolution and the performance of the architecture throughout the development. Observing the evaluation of the architecture throughout the methodology is clearly defined in the Agile Architecture approach [7]. Following that approach, the proposed methodology sets the architectural style and the evaluation criteria at the beginning of each iteration and validates the architectural assumptions at the end of the iteration.

Linked Data development requires some specific tasks, as defined in various Linked Data development methodologies [13–16]. Also, the Ontology Modelling literature has a long history with many proposed and used methodologies [17–21]. The proposed methodology takes the required tasks from the Linked Data development and Ontology Modelling approaches and combines them with agile practices within an iterative life cycle. The methodology evolved through the four iterations of the application development, which took more than a year; each iteration contained small sprints of two to four weeks. During the iterations, it became clear that a test-first approach had to be defined by identifying test case modelling and testing points in the life cycle. Another critical observation was the necessity of executing the Linked Data Environment Implementation and Application Development cycles in parallel. The Linked Data Environment Implementation cycle includes all activities related to the data perspective, while the Application Development cycle includes the software development activities on top of the generated Linked Data. In Figure 1, the inner cycle of the methodology represents the Application Development cycle and the outer one represents the Linked Data Environment Implementation cycle.

Figure 1. BLOB: A Methodology to Bring Agility into Linked Open Data Development for Businesses.

3.1   Analysis

In the Analysis activity, the application requirements are identified and managed by the product owner in cooperation with the necessary project stakeholders, such as customer roles and development team members. As a first step of the analysis, the main goals of the application are identified, and for each main goal new user stories are defined to satisfy the goal. The critical aspect of the analysis phase from the Linked Data perspective is the identification of the data sources required for each user story. The data sources are identified by the development team and attached to the "Application Requirement Card" (ARC). The second critical difference from the classical user story definition is the competency question section of the card. Competency questions are a well-known approach in the ontology development literature [17, 19–21] for limiting the scope of an ontology and validating the developed ontology [37]. In our case, the competency questions are derived by considering the linked view of the data sources, and the user stories are validated through this linked view. These competency questions are the main sources of the user story validation test cases. The Analysis activity is not part of the iterations and can be executed whenever necessary. The product owner observes the evolution of the implementation, collaborates constantly with the customer, and may define new goals, refine existing goals and/or define new ARC(s) for new or existing goals, as the situation requires. These ARC(s) are maintained in the ARC backlog.

The ARCs are defined for each story and include the following parts:

 • ID: Identifier of the ARC.
 • Application Goal: Describes the intended use of the application and helps to draw its boundaries.
 • Selected Data Sources for the Story: Data sources that will be converted to Linked Data and consumed by the application. They can be relational databases, documents, web pages, etc.
 • User Stories: Scenarios of use from the viewpoint of end users, defined in order to implement the feature of the goal.
 • Competency Questions: Questions used to validate the story.
3.2   Iteration Requirement Specification

Iteration Requirement Specification (IRS) is a planning activity. The ARCs maintained in the ARC backlog are prioritized according to their business value and the data sources they include. Considering the data sources in story prioritization is critical for Linked Data development: if more than one source is included in a story, this affects the Linked Data Generation, Linking and Architecture Identification activities. The properties of the core Linked Data architecture depend on the publishing and integration decisions made for the different data sources. Thus, including more than one data source is critical for establishing and evaluating the core architecture in the early iterations. At the end of the IRS activity, the development team decides on the stories for the iteration from both the business and the architectural perspective.
3.3   Linked Data Environment Implementation Cycle

3.3.1   Architecture Identification

The Architecture Identification activity is the first step of architectural agility. First, the user stories of the ARC(s) selected in the previous activity are analyzed to identify the architectural pattern(s) that fit the architectural requirements of the application. From the test-first perspective, the test plan for the architecture evaluation is also defined in this activity. Architecture evaluation is conducted at two levels of testing: the first level focuses on the retrieval performance of the generated Linked Data, and the second level focuses on the evaluation of selected quality attributes [22], such as performance, scalability and availability, for the final application. The first level is applied in the Linked Data Generation and/or Linking activities, and the second level is applied in the Validation & Verification activity. At this point, the data retrieval criteria, the quality attributes and their expected boundaries are identified and documented as the initial architecture evaluation test plan.

There are three well-known architectural patterns for consuming Linked Data, chosen according to the application requirements: the On-The-Fly Dereferencing pattern, the Crawling pattern and the Query Federation pattern [23]. The On-The-Fly Dereferencing pattern conceptualizes the web as a graph of documents containing dereferenceable URIs: an application executes a query by accessing an RDF file through dereferencing its URL, and then follows the URI links by parsing the received file on the fly [24]. In the Crawling pattern, the web of data is constantly crawled by dereferencing URLs, following links and integrating the discovered data at the local site. The Query Federation pattern is based on dividing a complex query into sub-queries and distributing the sub-queries to the relevant datasets; query federation requires accessing the datasets via SPARQL endpoints in order to execute the sub-queries on the distributed data sources.

All of these patterns have advantages and disadvantages that should be taken into account when making the architectural decision. In the On-The-Fly Dereferencing pattern, complex operations are very slow because thousands of URIs are dereferenced in the background, but stale data is never processed. The main advantage of the Crawling pattern is performance: applications can use a high volume of integrated data with much higher performance than with the other patterns. On the other hand, its main disadvantages are data staleness and the complexity of automatically linking data on the fly. The Query Federation pattern enables applications to work with current data without needing to replicate complete data sources locally; its main problem is the performance of complex queries, especially when a query needs to join data from a large number of data sources.

Also, there is a wide range of Linked Data design patterns for modelling, publishing and consuming in the literature [25]. The development team or the data publishers can use an appropriate pattern or a mixture of these patterns. However, the selection of these patterns depends on different factors that affect the architectural decisions, such as the number of data sources, the required data freshness, the application response time and the ability to discover new sources at runtime. The development team makes the architecture identification decision(s) based on the selected quality attributes and all of these factors.
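To make the Query Federation pattern concrete, the following minimal sketch (not taken from the project; the endpoint URLs, prefixes and property names are illustrative assumptions) joins booking data from two hypothetical SPARQL endpoints with a single SPARQL 1.1 query, using Python and SPARQLWrapper:

    # A minimal sketch of the Query Federation pattern: one SPARQL 1.1 query
    # joins booking data from two hypothetical endpoints (agency and port).
    # Endpoint URLs, prefixes and property names are illustrative assumptions.
    from SPARQLWrapper import SPARQLWrapper, JSON

    FEDERATED_QUERY = """
    PREFIX log: <http://data.arkas.com/ontology/logistics#>

    SELECT ?booking ?vessel ?dischargeDate WHERE {
      SERVICE <http://agency.example.org/sparql> {    # agency source
        ?booking a log:Booking ;
                 log:assignedVessel ?vessel .
      }
      SERVICE <http://port.example.org/sparql> {      # port source
        ?booking log:dischargeDate ?dischargeDate .   # joined on the shared booking URI
      }
    }
    """

    sparql = SPARQLWrapper("http://federator.example.org/sparql")  # any SPARQL 1.1 endpoint
    sparql.setQuery(FEDERATED_QUERY)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["booking"]["value"], row["vessel"]["value"], row["dischargeDate"]["value"])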
                                                                        ogy is improved based on the selected pattern(s). Finally, the ontol-
3.3.2   Ontology Modelling

In the Ontology Modelling activity, the concepts of the domain and the relationships between these concepts are modelled and implemented in a formal language. The Ontology Modelling activity is inspired by the following literature [17–21, 26–28]. The activity is composed of the Conceptualizing, Ontology Implementation, Integration/Modularization and Test Case Generation sub-activities. Figure 2 shows the ontology modelling life cycle.

Figure 2. Ontology Modelling Life Cycle.
Conceptualizing

Conceptualizing is a common activity in any ontology modelling methodology [17–19]. The main goal of the Conceptualizing activity is the construction of a conceptual model of the domain knowledge. Domain experts and ontology engineers analyze the data sources of the ARC(s) selected in the IRS phase and prepare a rough list of terms; the terms that are out of the goal scope of the selected ARC(s) are then eliminated. Ontology engineers and domain experts prepare a Lexicon of Terms document that contains a list of domain concepts, and then create a Controlled Lexicon with explanations of the concepts. These explanations help to find additional concepts. Also, Domain-Property-Range and Concept-Instance tables are prepared in order to simplify the following ontology modelling sub-activities: the former table represents the relationships between source and target terms, and the latter represents the instances of concepts.
Ontology Implementation

The Ontology Implementation activity derives a formal ontology model from the defined conceptual model and implements it in an ontology modelling language. The first sub-activity of the Ontology Implementation step is Reusability, which aims to reuse known and accepted ontologies on the web. Ontology engineers try to find an ontology in semantic web search engines such as Swoogle1, Sindice2, Watson3 and so on. If there is an ontology that overlaps with the outputs of the Conceptualizing activity, this ontology is taken as input to the following activities.

If there is no suitable ontology, the Building Concept Hierarchy sub-activity takes place. In this sub-activity, ontology engineers validate the taxonomies of terms in the Domain-Property-Range table according to the OntoClean methodology [26]. After the hierarchy validation, the Design-Pattern Implementation sub-activity uses the ontology design patterns in the literature [27], and the structure of the ontology is improved based on the selected pattern(s). Finally, the ontology is implemented in a formal language such as RDFS or OWL by using the capabilities of an ontology development environment, for example TopBraid Composer4, Protege5 or WebProtege6.

1 http://swoogle.umbc.edu/
2 http://sindice.com/search
3 http://watson.kmi.open.ac.uk/WatsonWUI/
4 http://www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition/
5 http://protege.stanford.edu/
6 http://protegewiki.stanford.edu/wiki/WebProtege
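As an illustration of the Ontology Implementation output, the following sketch defines a small OWL/RDFS fragment with rdflib; the class and property names are assumptions for the logistics domain, not the project's actual ontology:

    # A minimal sketch (not the project's actual ontology): a small OWL/RDFS
    # fragment for the logistics domain, built with rdflib. All names are
    # illustrative assumptions.
    from rdflib import Graph, Namespace, RDF, RDFS, OWL

    LOG = Namespace("http://data.arkas.com/ontology/logistics#")

    g = Graph()
    g.bind("log", LOG)

    # Classes for two core domain concepts
    g.add((LOG.Booking, RDF.type, OWL.Class))
    g.add((LOG.Container, RDF.type, OWL.Class))

    # An object property linking a booking to the containers it covers
    g.add((LOG.coversContainer, RDF.type, OWL.ObjectProperty))
    g.add((LOG.coversContainer, RDFS.domain, LOG.Booking))
    g.add((LOG.coversContainer, RDFS.range, LOG.Container))

    print(g.serialize(format="turtle"))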
Integration/Modularization

The main goal of the Integration/Modularization activity is achieving reuse, maintainability and evolution for large ontologies. Inspired by ANEMONE [28], we examine the conceptual links between the concepts generated in the previous iterations and the concepts of the current iteration. Afterwards, the ontology is either divided into modules, or the common concepts of modules are integrated into a new ontology module.

Test Case Generation

The Ontology Modelling activity focuses on the data sources defined in the ARC(s) and generates the metadata part of the ontology for each focused data source. The activity also identifies the linking requirement(s) between the ontologies at the metadata level. At this point, it is possible to define test cases at the ontological level to validate the consistency and competency of the developed ontologies and the linking requirement(s). The main sources for defining test cases are the competency questions defined in the Analysis activity. These questions are refined based on the knowledge of the developed ontologies and the linking requirement(s), and new competency questions may be added if needed. The competency questions are then translated into real SPARQL queries, in order to validate that the developed ontology supports their execution. These queries are saved as the ontological test cases for the ARC(s) at hand.
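As a minimal sketch of such an ontological test case (the vocabulary, file name and expected values are illustrative assumptions; only the booking number is taken from the example in Section 5), a competency question can be expressed as a SPARQL query and checked against expected results:

    # A minimal sketch of an ontological test case: the competency question
    # "Which containers are covered by booking B14170741?" expressed as a
    # SPARQL query and checked against expected results. The data file and
    # the expected container number are illustrative assumptions.
    from rdflib import Graph

    COMPETENCY_QUERY = """
    PREFIX log: <http://data.arkas.com/ontology/logistics#>
    SELECT ?container WHERE {
      <http://data.arkas.com/general/booking/B14170741> log:coversContainer ?container .
    }
    """

    def test_booking_covers_expected_containers():
        g = Graph()
        g.parse("warehouse_instances.ttl", format="turtle")  # generated instance data
        got = {str(row.container) for row in g.query(COMPETENCY_QUERY)}
        expected = {"http://data.arkas.com/general/container/MSKU1234565"}
        assert got == expected, f"competency question failed: {got}"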
                                                                             Visual test automation script(s) and architecture evaluation test
                                                                             plan(s) are inputs of the Validation&Verification activity. All visual
3.3.3   Linked Data Generation

The Linked Data Generation activity generates Linked Data from the selected data sources according to the ontology model(s). The data generation process differs according to whether the data sources are structured (e.g. databases), semi-structured (e.g. XML, XLS, CSV, etc.) or unstructured (e.g. HTML, XHTML, etc.). Structured data is mapped directly to the ontologies via RDB2RDF converters such as D2RQ7 or Ultrawrap8. Also, the "RDB2RDF Mapping Patterns" [29] can be used when creating the R2RML9 mappings used by the RDB2RDF converters. Semi-structured data sources are processed using toolsets such as Tripliser10 or the Google Refine RDF Extension11, which perform the conversion according to particular instructions. Beyond RDF converters, Natural Language Processing (NLP) methods such as tagging can be used to acquire data from unstructured data sources.

The test case(s) to validate the ontologies were generated in the previous activity (the Test Case Generation sub-activity of Ontology Modelling). At this point, instance data is generated for the developed ontologies, so it becomes possible to verify that the real data sources are correctly transformed into ontological instances. For this purpose, test automation script(s) that use the generated test case(s) (SPARQL queries) are written to verify that the expected results equal the results of the SPARQL queries, and these script(s) are included in the Continuous Integration (CI) infrastructure.

7 http://sw.cs.technion.ac.il/d2rq/tutorial
8 http://capsenta.com/#section-ultrawrap
9 http://www.w3.org/TR/r2rml/
10 http://daverog.github.io/tripliser/
11 http://refine.deri.ie/
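The following minimal R2RML sketch (an assumption, not the project's actual mapping) shows how rows of a hypothetical BOOKING table could be mapped to ontology instances while minting the shared booking URIs; an RDB2RDF processor such as those named above would execute it:

    # A minimal R2RML sketch (an assumption, not the project's actual mapping):
    # maps rows of a hypothetical BOOKING table to instances of log:Booking,
    # minting the shared booking URI used across the company sources.
    R2RML_MAPPING = """
    @prefix rr:  <http://www.w3.org/ns/r2rml#> .
    @prefix log: <http://data.arkas.com/ontology/logistics#> .

    <#BookingMap>
        rr:logicalTable [ rr:tableName "BOOKING" ] ;
        rr:subjectMap [
            rr:template "http://data.arkas.com/general/booking/{BOOKING_NO}" ;
            rr:class log:Booking
        ] ;
        rr:predicateObjectMap [
            rr:predicate log:bookingNumber ;
            rr:objectMap [ rr:column "BOOKING_NO" ]
        ] .
    """
    # An RDB2RDF processor (e.g. D2RQ or Ultrawrap, as named above) executes
    # this mapping against the relational source and emits the triples.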
                                                                             stood that primary customer for the system is customer operation
3.3.4   Linking

One of the most important principles of Linked Data is linking. In this activity, connections are established between the data sources, either manually or automatically. Link discovery frameworks such as SILK12 and LIMES13 can be used to automate this process; if unstructured sources such as text are to be linked, tools such as DBpedia Spotlight14 can be used. Another method is to use the same URIs at the ontological level, establishing the links between the sources manually.

As in the Linked Data Generation activity, test automation script(s) for the SPARQL queries that contain the linking requirements are written and integrated into the CI infrastructure.

12 http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
13 http://aksw.org/Projects/LIMES.html
14 https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki
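A minimal sketch of the shared-URI approach (the URI template follows the example in Section 5; the helper function itself is hypothetical):

    # A minimal sketch of linking by shared URIs (the manual approach described
    # above): both the agency and the port converters mint the same booking URI
    # from the common booking number, so their triples join without explicit
    # owl:sameAs links. The URI template follows the example in Section 5.
    def booking_uri(booking_no: str) -> str:
        return f"http://data.arkas.com/general/booking/{booking_no}"

    # Both sides mint the identical URI for the same booking number.
    assert booking_uri("B14170741") == "http://data.arkas.com/general/booking/B14170741"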
                                                                             tency questions and data sources were defined as ARC(s) and added
3.4   Application Development Cycle

3.4.1   Initial Visual Development

Linked Data visualization can be a complex and time-consuming task, depending on the size of the visualized content, so it is critical to start the Application Development cycle at the beginning of the iteration. First, a draft of the visual interface is identified based on the ARC(s) requirements. Then the team evaluates the different layouts of the selected visualization infrastructure(s) and produces new interface design examples for the working ARC(s). These examples are discussed with the customer, and the initial visual design for the iteration is then decided.

3.4.2   Mock Data Generation

The application does not only include user interface design components; it also includes a data integration layer to handle the data coming from the Linked Data side. The real Linked Data is generated at the end of the Linking sub-activity of the Linked Data Environment Implementation cycle, but application development should not wait for the Linked Data generation and should continue seamlessly. Thus, the development team participates in the Ontology Modelling activity and/or cooperates with the ontology modelling team to develop a mock Linked Data repository as the ontologies emerge. These mock repositories make it possible to work on the data integration layer of the visual design and to further improve the visual components.
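A minimal sketch of how such a mock repository might be fabricated with rdflib (the class and property names follow the earlier illustrative ontology sketch; everything here is an assumption, not the project's generator):

    # A minimal sketch of a mock Linked Data repository: fabricate a few
    # instances that conform to the draft ontology so the data integration
    # layer can be exercised before the real RDF exists. Names and values
    # are illustrative.
    from rdflib import Graph, Namespace, RDF, Literal

    LOG = Namespace("http://data.arkas.com/ontology/logistics#")
    DATA = Namespace("http://data.arkas.com/general/")

    mock = Graph()
    mock.bind("log", LOG)
    for i, booking_no in enumerate(["B00000001", "B00000002"]):
        booking = DATA[f"booking/{booking_no}"]
        container = DATA[f"container/MOCK{i:07d}"]
        mock.add((booking, RDF.type, LOG.Booking))
        mock.add((booking, LOG.bookingNumber, Literal(booking_no)))
        mock.add((booking, LOG.coversContainer, container))

    mock.serialize(destination="mock_repository.ttl", format="turtle")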
3.4.3   Integration with Mock Data

In this activity, the visual design and the data integration layer of the design are completed using the generated mock repositories. Since the whole visualization is then fully functional, the development team can start to define acceptance test scenarios. The competency questions defined in the Analysis activity and refined in the Ontology Modelling activity are used to shape the acceptance test scenarios. Since these scenarios are defined against the implemented visual design, the developer team implements visual test automation scripts for each scenario and integrates these scenarios into the CI infrastructure.

3.4.4   Integration with Real Data

In the final sub-activity of the Application Development cycle, the visual design and its data integration layer are connected to the real generated Linked Data sources. The final implementation is tested with the visual test scripts, and the whole application becomes ready for the Validation & Verification activity.

3.5   Validation & Verification

The visual test automation script(s) and the architecture evaluation test plan(s) are the inputs of the Validation & Verification activity. All visual application test scripts are evaluated together by the customer and the development team, to validate that these script(s) cover all ARC(s) requirements of the iteration. Then, the architecture evaluation test(s) are created according to the architecture evaluation test plan(s). The iteration ends when all the defined tests have passed.

4.   Overview of the Methodology Implementation

In the Analysis activity, two main goals were identified by the stakeholders: "Monitoring of the Container Transportation" (monitoring goal) and "Observation of the Container Transportation History" (observation goal). During the discussions, it was understood that the primary customers of the system are the customer operation unit employees, so the user stories were generated from their perspective. For the observation goal, the stories generally focused on knowledge about the sub-transportations in the container transportation life cycle. For instance, one user story was defined as "As a customer, I want to learn the history of a specific booking in a known port". When the stories were analyzed, the development team saw that many stories required knowledge from different data sources; for example, to derive booking knowledge in a known port, knowledge from both the agency and the port data sources was needed. For the monitoring goal, the stories generally defined monitoring rules, such as "As a customer, I want to learn that a specific container has been loaded onto a specific ship". The main scenarios of these goals, their competency questions and their data sources were defined as ARC(s) and added to the backlog. Figure 3 shows an example ARC.

Figure 3. Application Requirement Card for First Iteration.

4.1   Overview of Iteration 1

In the IRS activity, the customer indicated that the observation goal was more urgent for their daily operations, so it was understood that the ARC(s) of this goal had higher priority.
The development team noted the criticality of the agency and port data sources, since these sources contain the major part of the whole container transportation life cycle. Therefore, the ARC(s) that included these sources were selected for this iteration.

The development team proceeded to the Architecture Identification activity and decided to use the Query Federation pattern in order to avoid the stale data problem. Since the agency and port data sources are stored in different relational databases, the development team decided to use an RDB2RDF tool for the Linked Data generation, as shown in Figure 3. After the architecture evaluation test plan(s) were defined, the federated query response time (max 0.5 seconds) was taken as the base criterion for evaluating the retrieval performance of the generated Linked Data. Also, a load test (max 1 second response time for each of 20 concurrent users) was planned to evaluate the performance and scalability of the final application.
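A minimal sketch of how the first-level check might be automated (an assumption, not the team's actual harness; the endpoint and query are placeholders):

    # A minimal sketch of the retrieval-performance check: execute a query
    # against the federation endpoint and assert that the response time stays
    # within the 0.5 s budget set in the test plan. Endpoint and query are
    # illustrative assumptions.
    import time
    from SPARQLWrapper import SPARQLWrapper, JSON

    QUERY = """
    PREFIX log: <http://data.arkas.com/ontology/logistics#>
    SELECT ?b WHERE { ?b a log:Booking } LIMIT 100
    """

    def test_federated_query_within_budget(endpoint="http://federator.example.org/sparql"):
        sparql = SPARQLWrapper(endpoint)
        sparql.setQuery(QUERY)
        sparql.setReturnFormat(JSON)
        start = time.perf_counter()
        sparql.query().convert()
        elapsed = time.perf_counter() - start
        assert elapsed <= 0.5, f"query took {elapsed:.2f}s, budget is 0.5s"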
Figure 4. Architecture of First Iteration.

The Ontology Modelling activity started after the Architecture Identification. After the initial conceptualizing effort, the development team began to search for the logistics ontologies proposed in the literature. One of these studies aims to plan the military transportation domain with different transportation types [30]; however, it does not scrutinize the transportation field in depth and focuses on planning and scheduling. In [31], the events and actions occurring in the logistics processes of enterprises are defined, but the transportation process is not detailed. In [32–34], an OWL ontology for the formal representation of logistics services is defined, and the use of OWL-S is exemplified for service-oriented logistics operations. The iCargo15 and CASSANDRA16 projects try to integrate data coming from different sources, on top of which requirements such as transportation risk assessment and energy conservation are realized [35]. However, the proposed ontologies are not available for public use. Thus, the development team decided to develop the required ontologies from scratch. To this end, the ontologies representing the port and agency data sources were generated in this step by applying the previously defined ontology modelling activities.

In the Linked Data Generation activity, the data sources were converted into RDF format with an RDB2RDF tool according to the developed ontologies, and published on a local server. These conversions were defined in the R2RML language by applying the mapping patterns in [29]. After the Linked Data generation of the port and agency sources, the development team immediately proceeded to the Linking activity (without any testing of the individual sources). Since there are unique fields in the agency and port systems, such as booking, container and bill of lading numbers, no automatic link discovery tool was used; the sources were linked by making their URIs identical at the ontological level.

The response time of the federated query execution had been selected as the Linked Data generation evaluation criterion in the Architecture Identification activity. The development team therefore reworked the competency questions of the selected ARC(s), revising the existing ones and adding new questions. These questions were then converted to SPARQL queries (some minor changes to the ontology model were required) and executed over the generated Linked Data. Unfortunately, the architecture did not satisfy the performance limit identified in the Architecture Identification activity. The development team tried to improve the query response time by using different toolsets such as Ultrawrap, D2RQ and Oracle Spatial and Graph17, and the ontology model(s) were changed to simplify the complex mapping(s). However, the query performance remained far from the expected 0.5 seconds, and the ontology structure became unrealistic from the domain perspective. Thus, the development team decided to terminate the iteration and begin a new one.

4.1.1   Lessons Learned

 • Since acquiring the Linked Data development knowledge set takes time and prevents the smooth flow of activities within the iteration, plan training for the development team on Linked Data technologies, including ontology modelling, R2RML mappings and SPARQL (as a minimum set).

 • Testing of the Linked Data generation was conducted at the end of the Linking activity. Some SPARQL queries required changes in the ontologies and caused a return to the Ontology Modelling activity. So, as a general rule, develop test cases and execute tests as soon as the necessary knowledge is ready. In this case, the competency questions can be revised and the SPARQL queries generated at the end of the Ontology Modelling activity (the methodology cycle was revised according to this observation). Also, separate these test cases for single data source(s) and for federation in the Ontology Modelling activity, and apply them in the Linked Data Generation and Linking activities respectively.

 • Integrating architecture identification and the evaluation of the identified architecture within the development life cycle is a good idea.

4.2   Overview of Iteration 2

After experiencing the limitations of the Query Federation pattern, the development team started this iteration with the clear goal of changing the Linked Data architecture. The team also selected a new ARC for the observation goal, which includes the warehouse data source. This selection aimed to exercise the whole methodology life cycle, deepen the gained experience and apply the learned lessons. In the Architecture Identification activity, the development team decided to use the Crawling pattern, crawling the internal data sources of the company, to solve the observed performance problems. At this point, solving the stale data problem of the Crawling pattern became a problem of its own.

15 http://i-cargo.eu/
16 http://www.cassandra-project.eu
17 http://www.oracle.com/technetwork/database/options/spatialandgraph
To address it, the development team decided to use a Change Data Capture (CDC)18 tool and to transform each event coming from the CDC tool into RDF instances of the ontologies. The development team decided to choose a high-performance commercial CDC tool, and an Integration Module was planned for transforming the CDC events. Also, an open source queue implementation was selected to synchronize the CDC tool and the Integration Module. It was also decided to store the generated RDF in a central RDF store. The general architectural view of this iteration is shown in Figure 5. There was no change in the evaluation criteria of the architecture, so the architecture evaluation test plan was kept the same.

18 http://en.wikipedia.org/wiki/Change_data_capture

Figure 5. Architecture of Second Iteration.
In the Ontology Modelling activity, the development team started to define the warehouse ontology model, drawing on its previous experience; the test cases were therefore generated at the end of this activity. The development team revised and extended the competency questions of the selected ARC and wrote SPARQL versions of these questions in order to validate the adequacy of the generated ontology model. After successfully defining the SPARQL queries for the competency questions, the development team proceeded to the Linked Data Generation activity. In this activity, the R2RML mapping of the warehouse source was created, and all converted RDF was stored in the central RDF store. The generated warehouse data was then tested using the generated test cases. Also, the agency and port data were converted using the previously defined mappings and stored in the central RDF store.

In order to solve the stale data problem, the development team started to implement the Integration Module. In this module, the CDC tool captures the changes occurring in the internal relational data sources simultaneously and without overload, and sends each change to a Java message queue (the CDC Queue) in a predefined XML format. The Integration Module is responsible for consuming the XML change messages from the CDC Queue; the change messages are converted to RDF triples, and the RDF store is updated according to the resulting triples. However, it was realized that the Integration Module has to work concurrently in a scalable environment, since these data sources produce approximately three million messages per day. In order to handle this message traffic, the development team decided to use the AKKA infrastructure19, a highly scalable event-driven actor system for managing concurrency, parallelism and fault tolerance. Each update actor in the Integration Module is an AKKA actor that handles the XML messages from the queue, converts them to RDF triples and updates the RDF store. The AKKA infrastructure of the Integration Module is represented in Figure 6.

19 http://akka.io/

Figure 6. AKKA Infrastructure of the Integration Module.
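A minimal Python analogue of the update actors (the actual Integration Module runs on AKKA on the JVM; the message schema, vocabulary and store handling here are illustrative assumptions):

    # A minimal Python analogue of an update actor: workers consume XML change
    # messages from a queue, convert them to triples and update the store.
    # Message schema, vocabulary and store handling are illustrative assumptions.
    import queue
    import threading
    import xml.etree.ElementTree as ET
    from rdflib import Graph, URIRef, Literal, Namespace

    LOG = Namespace("http://data.arkas.com/ontology/logistics#")
    cdc_queue: "queue.Queue[str]" = queue.Queue()
    store = Graph()           # stand-in for the central RDF store
    store_lock = threading.Lock()

    def update_worker():
        while True:
            # e.g. "<change table='BOOKING' no='B14170741' status='LOADED'/>"
            change = ET.fromstring(cdc_queue.get())
            subject = URIRef(f"http://data.arkas.com/general/booking/{change.get('no')}")
            with store_lock:
                store.set((subject, LOG.status, Literal(change.get("status"))))
            cdc_queue.task_done()

    # A pool of workers plays the role of the AKKA update actors.
    for _ in range(4):
        threading.Thread(target=update_worker, daemon=True).start()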
In the Linking activity, no linking was required, because the links had been established at the ontological level. Thus, only the correctness of the links between the warehouse RDF data and the previously generated RDF data was tested in this activity.

After the successful verification of the generated Linked Data, the development team started to build a visual interface for the observation of the container transportation. Based on discussions with the customer, the development team identified a visual interface design and selected a suitable visualization technique and tool. However, the implementation of this interface took too long and was not finished within the planned iteration time limits.

4.2.1   Lessons Learned

 • Generating test cases and running tests in the Ontology Modelling, Linked Data Generation and Linking activities is a good idea.

 • Implementing the visual interface after the Linked Data generation causes delays in product delivery and decreases the motivation of the project stakeholders. Thus, application development should be a parallel process with the Linked Data generation (the Application Development cycle was added to the methodology accordingly).

4.3   Overview of Iteration 3

In this iteration, the development team selected a new ARC for the monitoring goal. An additional ARC was also added, which brings an external data source (owned by an external land transportation company) into the application, since customers wanted to observe the history of the transportation life cycle together with an external company. In the Architecture Identification activity,
the development team decided to implement a hybrid architecture that uses the Crawling pattern for the internal sources of ARKAS and the Query Federation pattern for the external land transportation company, since the customer indicated that monitoring the events of the external company is not a high-priority task from the business perspective. For handling monitoring at the architectural level, the team decided to use another AKKA actor organization with a separate queue mechanism: the change events are transferred to the new AKKA organization by the first organization, and the actors of the new organization create events by applying the monitoring rules. The team planned a new load test to verify monitoring under the maximum daily message volume, and decided to observe the real systems' CDC events in order to determine the size of the load test.

After the IRS activity, the application development team started to design a visual interface for the monitoring goal, in parallel with the Architecture Identification activity, and worked on the visual designs together with the customer. In the Ontology Modelling activity, a simple core land transportation ontology, similar to the relational database schema of the external company, was generated, together with its test case(s). The ontology modelling team also worked with the application developers in parallel on the generation of mock data. The application development team integrated this mock data with the visual interface while the Linked Data Generation activity was being carried out; the visual acceptance test(s) were also generated and added to the CI infrastructure. In the Linked Data Generation activity, the R2RML mapping of the land transportation source was defined, and the RDF data of the company was published via an endpoint. The generated Linked Data was tested by applying the previously generated test case(s), and the errors found were fixed.
fixed. After the Linked Data generation, the application develop-          ees). Observation of container life cycle features can be used by
ment team integrated visual application with real generated Linked         container, booking and bill of lading unique numbers. For instance,
Data and updated visual test(s).                                           Figure 8 shows history of a booking whose number is “B14170741”
    In this iteration, the application was extended to notify the user when related transportation events occur and to present the external land transportation history, in addition to the previously implemented user interface. In order to catch transportation events, the development team began to implement the Monitoring Module, which contains the new AKKA organization. The Monitoring Module is responsible for catching all events in the transportation life cycle according to the changed RDF triples that are discovered by the Integration Module. In this module, monitoring AKKA actors try to find the transportation rules affected by each change and serve them to the user interface, which notifies users about the transportation status in real time; each change may be matched by multiple rules. The final architecture of the application is shown in Figure 7. At the end of the iteration, the developed application was verified in the Validation & Verification activity: the acceptance tests and load tests were evaluated with the customer and the product owner, and the iteration was finished successfully.

Figure 7. Architecture of Third Iteration.
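How a transportation rule is encoded is left open in the paper. One plausible realization, sketched below with Apache Jena's ARQ API from Scala, evaluates a rule as a SPARQL ASK query against the integrated data whenever a related triple changes; the endpoint URL, the vocabulary and the rule itself are assumptions made for illustration.

    import org.apache.jena.query.QueryExecutionFactory

    object RuleCheck extends App {
      // Hypothetical SPARQL endpoint exposing the integrated RDF data.
      val endpoint = "http://data.arkas.com/sparql"
      // Illustrative rule: the booking has a container discharged at some port.
      val rule =
        """ASK WHERE {
          |  <http://data.arkas.com/general/booking/B14170741>
          |    <http://example.org/ontology/hasContainer> ?container .
          |  ?container <http://example.org/ontology/dischargedAt> ?port .
          |}""".stripMargin

      val exec = QueryExecutionFactory.sparqlService(endpoint, rule)
      try {
        // execAsk() returns true when the rule's pattern matches, i.e. the
        // subscribed users should be notified about the status change.
        if (exec.execAsk()) println("Rule matched: notify users")
      } finally exec.close()
    }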
4.3.1    Lesson Learned
 • Integration of external sources should be implemented in the early iterations of the application, since it affects architectural decisions, and introducing a Linked Data infrastructure into a new company requires considerable effort.
5.      Conclusion and Future Works
Currently, the “Observation and Monitoring of Container Life Cycle” application is used by the customer operation unit of ARKAS Holding. At the end of three iterations, the developed application has generated approximately 300 million RDF triples by integrating the agency, port and warehouse data sources of ARKAS, and it handles approximately three million change messages per day. The customer operation unit tests the observation and monitoring features of the application with 20 concurrent users (the number of customer operation unit employees). The observation features of the container life cycle can be used through container, booking and bill of lading unique numbers. For instance, Figure 8 shows the history of a booking whose number is “B14170741” in tree format (this number corresponds to the resource whose URI is “http://data.arkas.com/general/booking/B14170741”, and it is the same in the RDF data of each participant company). Moreover, the development team is now comfortable working with the BLOB methodology and is ready to use it for other Linked Data projects.

Figure 8. User Interface.

According to the feedback of the employees, it was realized that new ARCs are necessary to visually define new transportation rules for monitoring the transportation life cycle. In this interface, customers can add, remove and update the transportation rules that they want to follow. Role-based access management is also needed, since each customer is interested in different information about the transportation. These requirements will be implemented in the future of the project.
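Because the booking URI is shared by every participant company's RDF data, the history tree of Figure 8 can in principle be assembled by querying that single resource. A minimal sketch with Apache Jena, reusing the hypothetical endpoint from the previous listing:

    import org.apache.jena.query.QueryExecutionFactory
    import scala.jdk.CollectionConverters._

    object BookingHistory extends App {
      val endpoint = "http://data.arkas.com/sparql" // hypothetical endpoint
      val query =
        """SELECT ?p ?o WHERE {
          |  <http://data.arkas.com/general/booking/B14170741> ?p ?o .
          |}""".stripMargin

      val exec = QueryExecutionFactory.sparqlService(endpoint, query)
      try {
        // Each solution is one edge of the booking's history tree.
        exec.execSelect().asScala.foreach { row =>
          println(s"${row.get("p")} -> ${row.get("o")}")
        }
      } finally exec.close()
    }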
6.    Acknowledgement
The authors wish to thank Bimar Information Technologies Services, ARKAS Holding and their employees Guner Mutlu, Necmi Sentuna, Gokhan Daghan, Tugkan Tuglular, Burak Bilyay and Kubra Parkın for their help. We also wish to acknowledge Galaksiya Information Technologies and Consultancy and its manager Erdem Eser Ekinci for their contributions to the developed methodology.

References
 [1] Harleman, R. Improving the Logistic Sectors Efficiency using Service Oriented Architectures (SOA). In 17th Twente Student Conference on IT, 2012.
 [2] Nurmilaakso, J.M. Adoption of e-business functions and migration from EDI-based to XML-based e-business frameworks in supply chain integration. International Journal of Production Economics 113(2), 721-733, 2008.
 [3] Loutas, N. Case Study: How Linked Data is transforming eGovernment. European Commission ISA Programme, 2013, Available at: http://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment.
 [4] Frischmuth, P., Klímek, J., Auer, S., Tramp, S., Unbehauen, J., Holzweißig, K. and Marquardt, C.M. Linked Data in Enterprise Information Integration. Semantic Web – Interoperability, Usability, Applicability, IOS Press, 2012.
 [5] Mihindukulasooriya, N., Garcia-Castro, R. and Gutiérrez, M.E. Linked Data Platform as a novel approach for Enterprise Application Integration. In Proceedings of the 4th COLD Workshop, 2013.
 [6] Hu, B. and Svensson, G. A Case Study of Linked Enterprise Data. In Proceedings of the 9th International Conference on The Semantic Web, 2010.
 [7] Brown, N., Nord, R. and Ozkaya, I. Enabling Agility Through Architecture. Software Engineering Institute, 2010.
 [8] Collier, K.W. Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing (1st ed.). Addison-Wesley Professional, 2011.
 [9] Schwaber, K. Agile Project Management with Scrum. Microsoft Press, Redmond, WA, USA, 2004.
[10] Beck, K. and Andres, C. Extreme Programming Explained: Embrace Change (2nd Edition). Addison-Wesley Professional, 2004.
[11] Fraser, S., Beck, K., Caputo, B., Mackinnon, T., Newkirk, J. and Poole, C. Test driven development (TDD). In Proceedings of the 4th International Conference on Extreme Programming and Agile Processes in Software Engineering (XP'03), Michele Marchesi and Giancarlo Succi (Eds.). Springer-Verlag, Berlin, Heidelberg, 459-462, 2003.
[12] Campos, J., Arcuri, A., Fraser, G. and Abreu, R. Continuous test generation: enhancing continuous integration with automated test generation. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE '14). ACM, New York, NY, USA, 55-66, 2014.
[13] Auer, S., Bühmann, L., Dirschl, C., Erling, O., Hausenblas, M., Isele, R., Lehmann, J., Martin, M., Mendes, P.N., Van Nuffelen, B., Stadler, C., Tramp, S. and Williams, H. Managing the life-cycle of linked data with the LOD2 stack. In Proceedings of the 11th International Conference on The Semantic Web - Volume Part II (ISWC'12), Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tudorache, and Jérôme Euzenat (Eds.). Springer-Verlag, Berlin, Heidelberg, 1-16, 2012.
[14] Villazon-Terrazas, B., Vilches-Blazquez, L., Corcho, O. and Gomez-Perez, A. Methodological guidelines for publishing government linked data. Linking Government Data, 27-49, 2011.
[15] Hyland, B. and Wood, D. The Joy of Data - A Cookbook for Publishing Linked Government Data on the Web. Linking Government Data, 3-26, 2011.
[16] Hausenblas, M. Linked Data Life Cycles. 2011, Available at: http://www.slideshare.net/mediasemanticweb/linked-data-life-cycles.
[17] Fernandez-Lopez, M., Gomez-Perez, A. and Juristo, N. METHONTOLOGY: From Ontological Art Towards Ontological Engineering. In Proceedings of the AAAI Spring Symposium Series on Ontological Engineering, 1997.
[18] Pinto, H.S., Tempich, C. and Staab, S. DILIGENT: Towards a fine-grained methodology for distributed, loosely-controlled and evolving engineering of ontologies. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), 2004.
[19] De Nicola, A., Missikoff, M. and Navigli, R. A Software Engineering Approach to Ontology Building. Information Systems 34(2), 258-275, 2009.
[20] Noy, N.F. and McGuinness, D.L. Ontology Development 101: A Guide to Creating Your First Ontology. 2001, Available at: http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.
[21] Suarez-Figueroa, M., Gomez-Perez, A. and Fernandez-Lopez, M. The NeOn Methodology for Ontology Engineering. Ontology Engineering in a Networked World, 9-34, 2012.
[22] Meier, J.D., Homer, A., Taylor, J., Bansode, P., Wall, L., Boucher, R. and Bogawat, A. How To - Design Using Agile Architecture. 2008, Available at: http://apparch.codeplex.com/wikipage?title=How%20To%20-%20Design%20Using%20Agile%20Architecture&referringTitle=How%20Tos.
[23] Heath, T. and Bizer, C. Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool, 2011.
[24] Hartig, O., Bizer, C. and Freytag, J.C. Executing SPARQL queries over the web of linked data. In International Semantic Web Conference, 293-309, 2009.
[25] Dodds, L. and Davis, I. Linked Data Patterns. 2012, Available at: http://patterns.dataincubator.org/book/index.html.
[26] Guarino, N. and Welty, C.A. An Overview of OntoClean. Handbook on Ontologies, International Handbooks on Information Systems, 201-220, 2009.
[27] Presutti, V., Daga, E., Gangemi, A. and Blomqvist, E. eXtreme Design with Content Ontology Design Patterns. In Proceedings of the Workshop on Ontology Patterns (WOP), 2009.
[28] Ozacar, T., Ozturk, O. and Unalir, M.O. ANEMONE: An environment for modular ontology development. Data & Knowledge Engineering 70(6), 504-526, 2011.
[29] Sequeda, J., Priyatna, F. and Villazon-Terrazas, B. Relational Database to RDF Mapping Patterns. In Proceedings of the Workshop on Ontology Patterns (WOP), 2012.
[30] Becker, M. and Smith, S.F. An Ontology for Multi-Modal Transportation Planning and Scheduling. Carnegie Mellon University Technical Report, 1997.
[31] Lian, P., Park, D. and Kwon, H. Design of Logistics Ontology for Semantic Representing of Situation in Logistics. In the Second Workshop on Digital Media and its Application in Museum & Heritages, 432-437, 2007.
[32] Hoxha, J., Scheuermann, A. and Bloehdorn, S. An Approach to Formal and Semantic Representation of Logistics Services. In Proceedings of the Workshop on Artificial Intelligence and Logistics (AILog) at the 19th European Conference on Artificial Intelligence (ECAI), 2010.
[33] Scheuermann, A. and Hoxha, J. Ontologies for Intelligent Provision of Logistics Services. In Proceedings of the 7th International Conference on Internet and Web Applications and Services, 2012.
[34] Preist, C., Esplugas-Cuadrado, J., Battle, S.A., Grimm, S. and Williams, S.K. Automated Business-to-Business Integration of a Logistics Supply Chain Using Semantic Web Services Technology. In Proceedings of the 4th International Conference on the Semantic Web, 2005.
[35] Dalmolen, S., Cornelisse, E., Moonen, H. and Stoter, A. Cargo’s Digital Shadow - A Blueprint to Enable a Cargo Centric Information Architecture. In eFreight Conference, 2012.
[36] Ahn, S.B. Container tracking and tracing system to enhance global visibility. In Proceedings of the Eastern Asia Society for Transportation Studies, vol. 5, 1719-1727, 2005.
[37] Bezerra, C., Freitas, F. and Santana, F. Evaluating Ontologies with Competency Questions. In Web Intelligence (WI) and Intelligent Agent Technologies (IAT), IEEE/WIC/ACM International Joint Conferences, vol. 3, 284-285, 2013.
[38] Ting, S.L., Wang, L.X. and Ip, W.H. A study of RFID adoption for vehicle tracking in a container terminal. Journal of Industrial Engineering & Management 5(1), 22 pp., 2012.
[39] Siror, J.K., Liang, G., Pang, K., Sheng, H. and Wang, D. Impact of RFID Technology on Tracking of Export Goods in Kenya. JCIT 5(9), 190-199, 2010.