Bringing Agility into Linked Data Development: An Industrial Use Case in the Logistics Domain*

Pinar Gocebe, Oguz Dikenelli (Ege University), gocebepinar@gmail.com, odikenelli@gmail.com
Nuri Umut Kose (BIMAR Information Technologies), umut.kose@bimar.com.tr

* This work is funded by the Republic of Turkey Ministry of Science, Industry and Technology.

Copyright is held by the author/owner(s). WWW2015 Workshop: Linked Data on the Web (LDOW2015).

Abstract

Logistics is a complex industry in which many different types of companies collaborate to transport containers to their final destination. One of the most important problems in the logistics domain is the observation and monitoring of the container life cycle, where each step of the container transportation may be performed by a different company. Observing and monitoring the container's life cycle in real time therefore becomes a challenging engineering task. In this research, a Linked Data development infrastructure has been used to implement a dynamic container observation and monitoring system for ARKAS, a leading logistics company in Turkey. During the development of the system, it was observed that agile practices such as feature/story-oriented development, test-first development and the Agile Architecture approach improve product and project management quality. Based on these practices, a new methodology for Linked Data development has been proposed.

Keywords: Linked Data Development Methodology, Agile Analytics, Agile Architecture

1. Introduction

Logistics is a complex industry where many different types of roles, such as shipping agencies, port/ship operators, and land, rail and air transportation companies, collaborate to transport a container to its destination. These companies require a collaboration infrastructure in order to run the whole transportation process seamlessly. This collaboration infrastructure should provide an integration environment for the information systems of the companies in order to manage all sub-transportations, and it needs to be open in the sense that adding or removing companies from the process should be easy. In addition to the integration of the information systems, monitoring of the process is also critical for its effective management. For instance, important events such as a delay in the completion of a land transport, or the start of a discharging operation at the port, can be very critical for the roles that participate in the process.

This paper introduces an implemented architecture, based on a Linked Data infrastructure, for the well-known logistics problem of "Observation and Monitoring of the Container Life Cycle". The application was developed within ARKAS Holding, one of Turkey's leading logistics and transportation companies, which operates in fields such as sea, land, rail and air transportation, ship operations and port operations. Solving the "Observation and Monitoring of the Container Life Cycle" problem in real time is a challenging engineering task, since all sub-transportations may run in parallel on the different software systems of different companies. The end goal is to let managers and customers track the container transportation life cycle.

In the last decade, EDI-based standards (EDIFACT, RosettaNet, STEP, ANSI X12), XML standards and Service Oriented Architecture (SOA) approaches have been used to solve the integration problems of the logistics industry [1, 2]. These standards provide a common syntax for data representation, and SOA provides an application integration infrastructure between different companies via web services. With EDI-based standards, messages are pushed among the organizations at predefined times and translated into a format suitable for the receiving or sending organization. However, these technologies are not sufficient to ultimately solve the integration challenges of large enterprises. In SOA approaches, the most important problem is connectivity [3]: the identifiers (IDs) of the data stored in a database are only known inside each system and lose their meaning in other systems, so the knowledge of which web service operation to call for a given identifier has to be built into the application logic. This makes the application logic elaborate, considering the complexity of the logistics industry. EDI-based standards are not suitable for real-time applications, since data may already be outdated when an update is not immediately sent in a message. Also, many conversions are needed when organizations use different EDI formats.

A Linked Data infrastructure seems to be an appropriate technical solution for the requirements of the logistics industry, since it provides an integration environment that is more flexible, extensible and, when necessary, open to the outside. Linked Data standards and infrastructure are widely used to integrate enterprise information and business processes [4-6]. In Linked Data based integration, identifiers (URIs) are known not only system-wide but web-wide, and the data represented by these URIs is reachable over the HTTP protocol. Thus, all data sources within the company and/or on the web can be connected with each other, creating a large knowledge base that software systems can use independently of each other. Therefore, Linked Data technologies offer a new solution to the dynamic, distributed and complex nature of the "Observation and Monitoring of the Container Life Cycle" problem specifically, and of the logistics domain in general.

During the development of the aforementioned "Observation and Monitoring of the Container Life Cycle" application, the development team defined a development methodology. Since the team has long experience in agile development, the proposed methodology brings agile practices into Linked Data development. The methodology, called BLOB (A Methodology to Bring Agility into Linked Open Data Development for Businesses), has evolved through the iterations of the development. Its final version, presented in this paper, has the following contributions:
• Feature/story-oriented Linked Data development.

• Introducing the Agile Architecture approach [7] to Linked Data development.

• Applying the test-first approach to Linked Data development.

At the moment, the "Observation and Monitoring of the Container Life Cycle" application is operational and is being tested by the customer operation unit of ARKAS Holding. The paper describes how the application and its architecture evolved through the iterations of the proposed methodology.

2. The Problem: Observation and Monitoring of the Container Life Cycle

In the logistics industry, customers' loads are transported in containers from a start location to a final destination. During this transportation, containers are processed in a variety of work areas such as ports, warehouses, land, rail and air. For instance, consider a company that wants to send its load to customers in Munich, Germany from Manisa, Turkey, and agrees with a forwarder company on the transportation. The transportation is planned in four stages: a land transportation from Manisa to Izmir Alsancak port, a maritime transportation from Izmir Alsancak port to the Piraeus port of Athens, a maritime transportation from Piraeus to the port of Venice, and a land transportation from Venice to Munich. The whole journey takes approximately ten days, and there is no interface for getting information about the transportation.

Throughout the transportation, customers want to know the exact status and position of their transported loads.
This problem has been addressed in wireless network and RFID studies [36, 38, 39] from a hardware viewpoint, but these studies deal only with the position and status of the containers; they are not concerned with the links between a container and the other concepts of the domain. In our case, customers can only obtain information by calling the customer operation unit of ARKAS. All transportation companies use their own information technologies and infrastructures, and there is no integration environment between their systems. Any latency in the transportation process affects all other related transportations. Therefore, the transportation must be constantly monitored by directly calling the companies in charge, which puts a heavy operational load on the customer operation unit employees. In this research, we aim to solve the "Observation and Monitoring of the Container Life Cycle" problem of the logistics industry by using a Linked Data infrastructure. For this purpose, the following two main requirements of the industry are addressed:

1. Providing an Integration Environment. The transportation history of a container, which is distributed over different software systems, should be integrated.

2. Monitoring Containers. Events of the container transportation in the different work areas should be monitored, and customers should be informed about the transportation status.

3. Overview of the Methodology

It is well understood that agility is a good approach for coping with changing business and architectural requirements in software development. Not only software development but also business intelligence and analytics implementations benefit from an agile style of development, as proposed in [8]. The proposed methodology takes practices from agile software development and agile analytics, such as Scrum [9] and XP [10]: feature/story-oriented development, the test-first approach [11], customer involvement and continuous integration [12]. The Scrum infrastructure is used to manage the project with self-organized team dynamics and an iterative life cycle. The Linked Data environment changes constantly through the appearance of new data sources, new links and changes in ontologies. This highly dynamic environment may cause changes in business and architectural requirements, and the agile practices used within the methodology make it suitable for the highly dynamic environment of Linked Data application development.

Linked Data development is a young domain in which critical tools and architectural patterns are constantly evolving. Also, changes in business requirements and/or in the Linked Data environment may affect the initial architectural assumptions. The development team should therefore observe the evolution and the performance of the architecture throughout the development. Such observation of architecture evolution is clearly defined in the Agile Architecture approach [7]. Following that approach, the proposed methodology sets the architectural style and the evaluation criteria at the beginning of each iteration and validates the architectural assumptions at the end of the iteration.

Linked Data development requires specific tasks, as defined in various Linked Data development methodologies [13-16]. Also, the ontology modelling literature has a long history with many proposed and used methodologies [17-21]. The proposed methodology takes the required tasks from Linked Data development and ontology modelling approaches and combines them with agile practices in an iterative life cycle. The methodology evolved through the four iterations of the application development, which took more than a year; each iteration contains small sprints of two to four weeks. During the iterations, it became clear that a test-first approach was needed, with test case modelling and testing points identified in the life cycle. Another critical observation was the necessity of executing the Linked Data Environment Implementation and Application Development cycles in parallel. The Linked Data Environment Implementation cycle includes all activities related to the data perspective, while the Application Development cycle includes the software development activities on top of the generated Linked Data. In Figure 1, the inner cycle of the methodology represents the Application Development cycle and the outer one represents the Linked Data Environment Implementation cycle.

Figure 1. BLOB: A Methodology to Bring Agility into Linked Open Data Development for Businesses.

3.1 Analysis

In the Analysis activity, application requirements are identified and managed by the product owner together with the necessary project stakeholders, such as customer role(s) and development team member(s). As a first step, the main goals of the application are identified, and for each main goal user stories are defined to satisfy the goal. The critical aspect of the analysis phase from the Linked Data perspective is the identification of the data sources required for each user story. The data sources are identified by the development team and attached to an "Application Requirement Card" (ARC). The second critical difference from the classical user story definition is the competency question section of the card. Competency questions are a well-known approach in the ontology development literature [17, 19-21] for limiting the scope of an ontology and validating the developed ontology [37]. In our case, competency questions are derived by considering the linked view of the data sources and the validation of the user stories through this linked view. These competency questions are the main sources of the user story validation test cases. The Analysis activity is not part of the iterations and can be executed whenever necessary. The product owner observes the evolution of the implementation, collaborates constantly with the customer, and may define new goals, refine existing goals and/or define new ARC(s) for new or existing goals. These ARC(s) are maintained in the ARC backlog.

An ARC is defined for each story and includes the following parts:

• ID: Identifier of the ARC.

• Application Goal: The intended use of the application; it helps to draw the boundaries of the application.

• Selected Data Sources of the Story: Data sources that will be converted to Linked Data and consumed by the application. They can be relational databases, documents, web pages, etc.

• User Stories: Scenarios of use from the viewpoint of end-users, implementing the defined feature of the goal.

• Competency Questions: Questions to validate the story.

3.2 Iteration Requirement Specification

Iteration Requirement Specification (IRS) is a planning activity. The ARCs maintained in the ARC backlog are prioritized according to their business value and the data sources they include. Considering the data sources in story prioritization is critical for Linked Data development: if more than one source is included in a story, this affects the Linked Data Generation, Linking and Architecture Identification activities. The properties of the core Linked Data architecture depend on the publishing and integration decisions over the different data sources, so the inclusion of more than one data source is critical for establishing and evaluating the core architecture in the early iterations. At the end of the IRS activity, the development team decides on the stories for the iteration from both the business and the architectural perspective.
3.3 Linked Data Environment Implementation Cycle

3.3.1 Architecture Identification

The Architecture Identification activity is the first step of architectural agility. First, the user stories of the ARC(s) selected in the previous activity are analyzed to identify the architectural pattern(s) that fit the architectural requirements of the application. From the test-first perspective, the test planning for architecture evaluation is also defined in this activity. Architecture evaluation is conducted at two levels of testing: the first level focuses on the retrieval performance of the generated Linked Data, and the second on the evaluation of selected quality attributes [22], such as performance, scalability and availability, for the final application. The first level is applied in the Linked Data Generation and/or Linking activities, and the second in the Validation & Verification activity. At this point, the data retrieval criteria, the quality attributes and their expected boundaries are identified and documented as the initial architecture evaluation test plan.

There are three well-known architectural patterns for consuming Linked Data, chosen according to the application requirements: the On-The-Fly Dereferencing pattern, the Crawling pattern and the Query Federation pattern [23]. The On-The-Fly Dereferencing pattern conceptualizes the web as a graph of documents containing dereferenceable URIs: an application executes a query by dereferencing a URL to access an RDF file and then follows the URI links by parsing the received file on the fly [24]. In the Crawling pattern, the web of data is constantly crawled by dereferencing URLs, following links and integrating the discovered data at a local site. The Query Federation pattern is based on dividing a complex query into sub-queries and distributing the sub-queries to the relevant datasets; it requires that the datasets be accessible via SPARQL endpoints so that the sub-queries can be executed on the distributed data sources.

All of these patterns have advantages and disadvantages that must be taken into account when making the architectural decision. In the On-The-Fly Dereferencing pattern, complex operations are very slow because thousands of URIs are dereferenced in the background, but stale data is never processed. The main advantage of the Crawling pattern is performance: applications can use a high volume of integrated data with much higher performance than with the other patterns. On the other hand, its main disadvantages are data staleness and the complexity of automatically linking data on the fly. The Query Federation pattern enables applications to work with current data without replicating complete data sources locally; its main problem is the performance of complex queries, especially when a query needs to join data from a large number of sources.

Also, a wide range of Linked Data design patterns for modelling, publishing and consuming data exists in the literature [25]. The development team or the data publishers can use an appropriate pattern or a mixture of patterns. The selection depends on factors that affect the architectural decisions, such as the number of data sources, the required data freshness, the application response time and the ability to discover new sources at runtime. The development team makes the architecture identification decision(s) based on the selected quality attributes and all of these factors.
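As an illustration of the Query Federation pattern, the sketch below decomposes one logical question over two SPARQL endpoints using Apache Jena ARQ. This is a minimal sketch, not the paper's implementation: the endpoint URLs and the log: vocabulary are hypothetical stand-ins for the agency and port sources discussed later.

```java
import org.apache.jena.query.*;

public class FederatedQueryExample {
    public static void main(String[] args) {
        // Hypothetical endpoints and vocabulary, for illustration only.
        String query =
            "PREFIX log: <http://data.arkas.com/ontology/logistics#>\n" +
            "SELECT ?event ?time WHERE {\n" +
            "  SERVICE <http://agency.example.com/sparql> {\n" +
            "    ?booking log:bookingNumber \"B14170741\" .\n" +
            "  }\n" +
            "  SERVICE <http://port.example.com/sparql> {\n" +
            "    ?event log:ofBooking ?booking ;\n" +
            "           log:occurredAt ?time .\n" +
            "  }\n" +
            "} ORDER BY ?time";
        // ARQ sends each SERVICE block to its endpoint and joins the results locally.
        try (QueryExecution qe = QueryExecutionFactory.create(query, DatasetFactory.create())) {
            qe.execSelect().forEachRemaining(System.out::println);
        }
    }
}
```

The local join over ?booking is exactly where this pattern pays its price: the fewer the sources and the more selective the sub-queries, the better such a federation performs.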
3.3.2 Ontology Modelling

In the Ontology Modelling activity, the concepts of the domain and the relationships between them are modelled and implemented in a formal language. The activity is inspired by the literature [17-21, 26-28] and is composed of the Conceptualizing, Ontology Implementation, Integration/Modularization and Test Case Generation sub-activities. Figure 2 shows the ontology modelling life cycle.

Figure 2. Ontology Modelling Life Cycle.

Conceptualizing

Conceptualizing is a common activity in any ontology modelling methodology [17-19]. Its main goal is the construction of a conceptual model of the domain knowledge. Domain experts and ontology engineers analyze the data sources of the ARC(s) selected in the IRS phase and prepare a rough list of terms; terms that are out of the goal scope of the selected ARC(s) are then eliminated. The ontology engineers and domain experts prepare a Lexicon of Terms document containing a list of domain concepts, and then create a Controlled Lexicon with explanations of the concepts; these explanations help to find additional concepts. Domain-Property-Range and Concept-Instance tables are also prepared to simplify the subsequent ontology modelling sub-activities: the former represents the relationships between source and target terms, and the latter represents the instances of concepts.

Ontology Implementation

The Ontology Implementation activity defines a formal ontology model from the conceptual model and implements it in an ontology modelling language. Its first sub-activity is Reusability, which aims to reuse known and accepted ontologies from the web. Ontology engineers search for ontologies in semantic web search engines such as Swoogle (http://swoogle.umbc.edu/), Sindice (http://sindice.com/search) and Watson (http://watson.kmi.open.ac.uk/WatsonWUI/). If an existing ontology overlaps with the outputs of the Conceptualizing activity, it is taken as input to the following activities. If there is no suitable ontology, the Building Concept Hierarchy sub-activity takes place: ontology engineers validate the taxonomies of terms in the Domain-Property-Range table according to the OntoClean methodology [26]. After the hierarchy validation, the Design-Pattern Implementation sub-activity applies ontology design patterns from the literature [27], and the structure of the ontology is improved based on the selected pattern(s). Finally, the ontology is implemented in a formal language such as RDFS or OWL, using the capabilities of an ontology development environment such as TopBraid Composer (http://www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition/), Protégé (http://protege.stanford.edu/) or WebProtégé (http://protegewiki.stanford.edu/wiki/WebProtege).

Integration/Modularization

The main goal of the Integration/Modularization activity is to achieve reuse, maintainability and evolution for large ontologies. Inspired by ANEMONE [28], we examine the conceptual links between the concepts generated in the previous iterations and the concepts of the current iteration. The ontology is then divided into modules, or the common concepts of several modules are integrated into a new ontology module.

Test Case Generation

The Ontology Modelling activity focuses on the data sources defined in the ARC(s) and generates the metadata part of the ontology for each data source. It also identifies the linking requirement(s) between the ontologies at the metadata level. At this point, it is possible to define test cases at the ontological level to validate the consistency and competency of the developed ontologies and the linking requirement(s). The main sources for these test cases are the competency questions defined in the Analysis activity. These questions are refined based on the knowledge of the developed ontologies and the linking requirement(s), and new competency questions may be added if needed. The competency questions are then translated into real SPARQL queries in order to validate that the developed ontology supports their execution. These queries are saved as the ontological test cases of the ARC(s) at hand.
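To make this concrete, the following sketch shows how a competency question might be turned into an executable ontological test case with Apache Jena. The file names, namespace and property names are hypothetical; the competency question ("does every port event belong to a booking?") is phrased as a SPARQL ASK that searches for violations, so the test passes when the query returns false.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class CompetencyQuestionTest {
    public static void main(String[] args) {
        // Hypothetical files: the developed ontology plus a handful of sample instances.
        Model model = RDFDataMgr.loadModel("port-ontology.ttl");
        model.add(RDFDataMgr.loadModel("sample-instances.ttl"));

        // Competency question: "Does every port event belong to a booking?"
        // Expressed as a violation query: the test passes when the ASK returns false.
        String ask =
            "PREFIX log: <http://data.arkas.com/ontology/logistics#>\n" +
            "ASK {\n" +
            "  ?e a log:PortEvent .\n" +
            "  FILTER NOT EXISTS { ?e log:ofBooking ?b }\n" +
            "}";
        try (QueryExecution qe = QueryExecutionFactory.create(ask, model)) {
            System.out.println(qe.execAsk() ? "FAIL: event without booking" : "PASS");
        }
    }
}
```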
3.3.3 Linked Data Generation

The Linked Data Generation activity generates Linked Data from the selected data sources according to the ontology model(s). The generation process differs depending on whether a data source is structured (e.g. databases), semi-structured (e.g. XML, XLS, CSV) or unstructured (e.g. HTML, XHTML). Structured data is mapped directly to the ontologies via RDB2RDF converters such as D2RQ (http://sw.cs.technion.ac.il/d2rq/tutorial) or Ultrawrap (http://capsenta.com/#section-ultrawrap). The "RDB2RDF Mapping Patterns" [29] can be used when creating the R2RML (http://www.w3.org/TR/r2rml/) mappings used by these converters. Semi-structured data sources are processed with toolsets such as Tripliser (http://daverog.github.io/tripliser/) or the Google Refine RDF Extension (http://refine.deri.ie/), which convert data according to given instructions. Beyond RDF converters, Natural Language Processing (NLP) methods such as tagging can be used to acquire data from unstructured sources.

The test case(s) that validate the ontologies were generated in the previous activity (the Test Case Generation sub-activity of Ontology Modelling). At this point, instance data is generated for the developed ontologies, so it becomes possible to verify that the real data sources are correctly transformed into ontological instances. For this purpose, test automation script(s) are written that run the generated test cases (SPARQL queries) and verify that their results equal the expected results; these scripts are included in the Continuous Integration (CI) infrastructure.

3.3.4 Linking

One of the most important principles of Linked Data is linking. In this activity, connections are established between the data sources, either manually or automatically. The SILK (http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/) and LIMES (http://aksw.org/Projects/LIMES.html) link discovery frameworks can be used to automate this process, and tools such as DBpedia Spotlight (https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki) can be used when unstructured sources such as text have to be linked. Another method is to use the same URIs at the ontological level, establishing the links between the sources manually. As in the Linked Data Generation activity, test automation script(s) for the SPARQL queries that contain the linking requirements are written and integrated into the CI infrastructure.
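The "same URIs" option can be as simple as a shared URI-minting rule that every conversion pipeline applies to its natural keys. The helper below is a minimal sketch of that idea; the namespace follows the booking URI shown in the conclusion of this paper, while the class and method names are our own.

```java
public final class ArkasUris {
    private static final String BASE = "http://data.arkas.com/general/";

    private ArkasUris() {}

    // Every source system (agency, port, warehouse, ...) mints booking URIs
    // with the same rule, so their RDF graphs join without owl:sameAs links.
    public static String booking(String bookingNo) {
        return BASE + "booking/" + bookingNo;
    }

    public static String container(String containerNo) {
        return BASE + "container/" + containerNo;
    }
}
```

With such a convention, the agency triples about booking B14170741 and the port triples about the same booking share one subject URI, so a plain SPARQL join is enough to traverse both sources.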
3.4 Application Development Cycle

3.4.1 Initial Visual Development

Linked Data visualization can be a complex and time-consuming task, depending on the size of the visualized content, so it is critical to start the application development cycle at the beginning of the iteration. First, a draft of the visual interface is identified from the ARC(s) requirements. The team then evaluates the different layouts of the selected visualization infrastructure(s) and produces new interface design examples for the ARC(s) at hand. These examples are discussed with the customer, and the initial visual design for the iteration is decided.

3.4.2 Mock Data Generation

The application does not only include user interface design components; it also includes a data integration layer that handles the data coming from the Linked Data side. The real Linked Data is generated at the end of the Linking sub-activity of the Linked Data Environment Implementation cycle, but application development should not wait until then and should continue seamlessly. The development team therefore participates in the Ontology Modelling activity and/or cooperates with the ontology modelling team to develop a mock Linked Data repository as the ontologies emerge. These mock repositories make it possible to work on the data integration layer of the visual design and to further improve the visual components.
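A mock repository can be produced directly from the draft ontology. The sketch below, again a hypothetical illustration rather than the project's actual tooling, generates a small set of mock booking instances with Jena; the vocabulary mirrors the assumed agency ontology.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class MockBookingGenerator {
    public static void main(String[] args) {
        // Hypothetical vocabulary, mirroring the draft agency ontology.
        String ns = "http://data.arkas.com/ontology/logistics#";
        Model mock = ModelFactory.createDefaultModel();
        Resource bookingClass = mock.createResource(ns + "Booking");
        Property bookingNumber = mock.createProperty(ns, "bookingNumber");

        for (int i = 1; i <= 50; i++) {
            Resource booking = mock.createResource(
                "http://data.arkas.com/general/booking/MOCK" + i);
            booking.addProperty(RDF.type, bookingClass);
            booking.addProperty(bookingNumber, "MOCK" + i);
        }
        // The dump is loaded into the mock repository consumed by the UI layer.
        mock.write(System.out, "TURTLE");
    }
}
```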
3.4.3 Integration with Mock Data

In this activity, the visual design and its data integration layer are completed using the generated mock repositories. Since the visualization is now fully functional, the development team can start to define acceptance test scenarios. The competency questions defined in the Analysis activity and refined in the Ontology Modelling activity are used to shape these scenarios. Since the scenarios are defined against the implemented visual design, the developers implement visual test automation scripts for each scenario and integrate them into the CI infrastructure.

3.4.4 Integration with Real Data

In the final sub-activity of the application development cycle, the visual design and its data integration layer are connected to the real generated Linked Data sources. The final implementation is tested with the visual test scripts, and the whole application becomes ready for the Validation & Verification activity.

3.5 Validation & Verification

The visual test automation script(s) and the architecture evaluation test plan(s) are the inputs of the Validation & Verification activity. All visual application test scripts are evaluated by the customer and the development team together, to validate that they cover all ARC requirements of the iteration. Then, the architecture evaluation test(s) are created according to the architecture evaluation test plan(s). The iteration ends when all the defined tests pass.

4. Overview of the Methodology Implementation

In the Analysis activity, two main goals were identified by the stakeholders: "Monitoring of the Container Transportation" (the monitoring goal) and "Observation of the Container Transportation History" (the observation goal). During the discussions it became clear that the primary customers of the system are the employees of the customer operation unit, so the user stories were written from their perspective. For the observation goal, the stories generally focused on knowledge about the sub-transportations in the container transportation life cycle; for instance, one user story was "As a customer, I want to learn the history of a specific booking in a known port". When the stories were analyzed, the development team saw that many of them required knowledge from different data sources. For example, to derive booking knowledge for a known port, knowledge from both the agency and the port data sources was needed. For the monitoring goal, the stories generally define monitoring rules, such as "As a customer, I want to learn when a specific container is loaded onto a specific ship". The main scenarios of these goals, their competency questions and their data sources were defined as ARC(s) and added to the backlog. Figure 3 shows an example ARC.

Figure 3. Application Requirement Card for First Iteration.

4.1 Overview of Iteration 1

In the IRS activity, the customer indicated that the observation goal was more urgent for their daily operations, so its ARC(s) were given higher priority. The development team highlighted the criticality of the agency and port data sources, since these sources contain the major part of the whole container transportation life cycle; therefore, the ARC(s) that include these sources were selected for this iteration.

The development team then proceeded to the Architecture Identification activity and decided to use the Query Federation pattern in order to avoid the stale data problem. Since the agency and port data are stored in different relational databases, the team decided to use an RDB2RDF tool for Linked Data generation, as shown in Figure 4.
After the architecture evaluation test plan(s) were defined, the federated query response time (at most 0.5 seconds) was taken as the base criterion for evaluating the retrieval performance of the generated Linked Data. A load test (a response time of at most 1 second for each of 20 concurrent users) was also planned to evaluate the performance and scalability of the final application.

The Ontology Modelling activity started after Architecture Identification. After the initial conceptualizing effort, the development team searched for logistics ontologies proposed in the literature. One of these studies aims at planning military transportation with different transportation types [30]; however, it does not scrutinize the transportation field deeply and focuses on planning and scheduling. In [31], the events and actions occurring in the logistics processes of enterprises are defined, but the transportation process is not detailed. In [32-34], an OWL ontology for the formal representation of logistics services is defined, and the use of OWL-S for service-oriented logistics operations is exemplified. The iCargo (http://i-cargo.eu/) and CASSANDRA (http://www.cassandra-project.eu) projects try to integrate data coming from different sources, on top of which requirements such as transportation risk assessment and energy conservation are addressed [35]. However, the proposed ontologies are not available for public use, so the development team decided to develop the required ontologies from scratch. To this end, the ontologies representing the port and agency data sources were generated by applying the previously defined ontology modelling activities.

Figure 4. Architecture of First Iteration.
In the Linked Data Generation activity, the data sources were converted into RDF with an RDB2RDF tool according to the developed ontologies and published on a local server. The conversions were defined in the R2RML language by applying the mapping patterns of [29]. After the Linked Data generation of the port and agency sources, the development team immediately started the Linking activity (without any testing of the individual sources). Since there are unique fields in the agency and port systems, such as the booking, container and bill of lading numbers, no automatic link discovery tool was used: the sources were linked by making their URIs the same at the ontological level.

The response time of federated query execution had been selected as the Linked Data generation evaluation criterion in the Architecture Identification activity. The development team reworked the competency questions of the selected ARC(s), revised the existing ones and added new questions. These questions were then converted into SPARQL queries (which required some minor changes in the ontology model) and executed over the generated Linked Data. Unfortunately, the architecture did not satisfy the performance limit identified in the Architecture Identification activity. The development team tried to improve the query response time by using different toolsets, such as Ultrawrap, D2RQ and Oracle Spatial and Graph (http://www.oracle.com/technetwork/database/options/spatialandgraph), and the ontology model(s) were changed to simplify the complex mapping(s). However, the query performance remained far from the expected 0.5 seconds, and the ontology structure became unrealistic from the domain perspective. The development team therefore decided to terminate the iteration and begin a new one.

4.1.1 Lessons Learned

• Acquiring the Linked Data development knowledge set takes time and can prevent the smooth flow of activities within an iteration, so plan training for the development team on Linked Data technologies, covering at least ontology modelling, R2RML mappings and SPARQL.

• Testing of the Linked Data generation was conducted at the end of the Linking activity. Some SPARQL queries required changes in the ontologies, forcing a return to the Ontology Modelling activity. As a general rule, develop test cases and execute tests as soon as the necessary knowledge is ready. In this case, the competency questions can be revised and the SPARQL queries generated at the end of the Ontology Modelling activity (the methodology cycle was revised according to this observation). Also, separate the test cases for single data sources from those for federation in the Ontology Modelling activity, and apply them in the Linked Data Generation and Linking activities respectively.

• Integrating architecture identification, and the evaluation of the identified architecture, into the development life cycle is a good idea.

4.2 Overview of Iteration 2

After experiencing the limitations of the Query Federation pattern, the development team started this iteration with the clear goal of changing the Linked Data architecture. The team also selected a new ARC of the observation goal, which includes the warehouse data source; this selection aimed at exercising the whole methodology life cycle, building on the gained experience and applying the learned lessons. In the Architecture Identification activity, the development team decided to use the Crawling pattern, crawling the internal data sources of the company, to solve the observed performance problems. At this point, solving the stale data problem of the Crawling pattern became an issue. For this purpose, the team decided to use a Change Data Capture (CDC, http://en.wikipedia.org/wiki/Change_data_capture) tool and to transform each event coming from the CDC tool into RDF instances of the ontologies. A high-performance commercial CDC tool was chosen, and an Integration Module was planned for transforming the CDC events. An open source queue implementation was selected to synchronize the CDC tool and the Integration Module, and it was decided to store the generated RDF in a central RDF store. The general architectural view of this iteration is shown in Figure 5. There was no change in the evaluation criteria of the architecture, so the architecture evaluation test plan was kept the same.

Figure 5. Architecture of Second Iteration.

In the Ontology Modelling activity, the development team started to define the warehouse ontology model, taking the previous experience into account; thus, the test cases were generated at the end of this activity. The team revised and extended the competency questions of the selected ARC and wrote the SPARQL versions of these questions in order to validate the adequacy of the generated ontology model. After the SPARQL queries of the competency questions were successfully defined, the development team started the Linked Data Generation activity. In this activity, the R2RML mapping of the warehouse source was created, and all converted RDF was stored in the central RDF store. The generated warehouse data was then tested using the generated test cases. The agency and port data were also converted, using the previously defined mappings, and stored in the central RDF store.

In order to solve the stale data problem, the development team started to implement the Integration Module. In this setup, the CDC tool captures the changes occurring in the internal relational data sources simultaneously and without overload, and sends each change to a Java message queue (the CDC Queue) in a predefined XML format. The Integration Module is responsible for consuming the XML change messages from the CDC Queue: the change messages are converted into RDF triples, and the RDF store is updated accordingly. However, it was realized that the Integration Module has to work concurrently in a scalable environment, since the data sources produce approximately three million messages per day. To handle this message traffic, the development team decided to use the AKKA infrastructure (http://akka.io/), a highly scalable event-driven actor system that manages concurrency, parallelism and fault tolerance. Each update actor in the Integration Module is an AKKA actor that handles the XML messages in the queue, converts them into RDF triples and updates the RDF store. The AKKA infrastructure of the Integration Module is shown in Figure 6.

Figure 6. AKKA Infrastructure of the Integration Module.
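An update actor of the kind described above might look like the following minimal sketch, using the Akka classic Java API. The message format, the triple conversion and the store update are reduced to stubs, since the paper does not show the real module's XML schema or store access.

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class UpdateActor extends AbstractActor {

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            // Each CDC change message arrives as an XML string from the CDC Queue.
            .match(String.class, xmlChange -> {
                String triples = toRdfTriples(xmlChange); // stub: XML -> RDF conversion
                updateStore(triples);                     // stub: update on the RDF store
            })
            .build();
    }

    private String toRdfTriples(String xml) { return ""; } // placeholder
    private void updateStore(String triples) { }           // placeholder

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("integration-module");
        ActorRef updater = system.actorOf(Props.create(UpdateActor.class), "update-actor");
        updater.tell("<change table=\"BOOKING\" op=\"UPDATE\">...</change>",
                     ActorRef.noSender());
    }
}
```

Because each message is handled by an independent actor, the module can be scaled out by routing the queue over a pool of such actors, which is what makes the three-million-messages-per-day load tractable.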
In the Linking activity, no linking was required, because the links had been established at the ontological level; only the correctness of the links between the warehouse RDF data and the previously generated RDF data was tested in this activity.

After the successful verification of the generated Linked Data, the development team started to build a visual interface for the observation of the container transportation. Based on discussions with the customer, the team identified a visual interface design and selected a suitable visualization technique and tool. However, the implementation of this interface took too long and was not finished within the planned iteration time limits.

4.2.1 Lessons Learned

• Generating test cases and running tests in the Ontology Modelling, Linked Data Generation and Linking activities is a good idea.

• Implementing the visual interface after the Linked Data generation delays product delivery and decreases the motivation of the project stakeholders. Application development should therefore run in parallel with Linked Data generation (the Application Development cycle was added to the methodology).

4.3 Overview of Iteration 3

In this iteration, the development team selected a new ARC for the monitoring goal. An additional ARC, which includes an external data source owned by an external land transportation company, was also added, since the customers wanted to observe the history of the transportation life cycle together with an external company. In the Architecture Identification activity, the development team decided to implement a hybrid architecture that uses the Crawling pattern for the internal sources of ARKAS and the Query Federation pattern for the external land transportation company, since the customer indicated that monitoring the events of the external company is not a high-priority task from the business perspective. To handle monitoring at the architectural level, the team planned a second AKKA actor organization with a separate queue mechanism: change events are transferred to the new organization by the first one, and the actors of the new organization create events by applying the monitoring rules. The team planned a new load test to verify monitoring under the maximum daily message volume, and decided to observe the real systems' CDC events in order to size this load test.

After the IRS activity, the application development team started to design a visual interface for the monitoring goal, in parallel with the Architecture Identification activity, and worked on the visual designs with the customer.
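Under this hybrid architecture, an observation query would run mostly against the central store, reaching out to the external company only for its leg of the journey. The following is a hedged sketch of that shape of query, with hypothetical endpoint URLs and vocabulary:

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;

public class HybridHistoryQuery {
    public static void main(String[] args) {
        // Hypothetical endpoints: the crawled central store plus the external company.
        String query =
            "PREFIX log: <http://data.arkas.com/ontology/logistics#>\n" +
            "SELECT ?event ?time WHERE {\n" +
            "  { ?event log:ofBooking <http://data.arkas.com/general/booking/B14170741> ;\n" +
            "           log:occurredAt ?time }\n" +
            "  UNION\n" +
            "  { SERVICE <http://landco.example.com/sparql> {\n" +
            "      ?event log:ofBooking <http://data.arkas.com/general/booking/B14170741> ;\n" +
            "             log:occurredAt ?time } }\n" +
            "} ORDER BY ?time";
        // The plain pattern is answered from the crawled central store; the SERVICE
        // block is forwarded to the external land transportation company's endpoint.
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://central.example.com/sparql", query)) {
            qe.execSelect().forEachRemaining(System.out::println);
        }
    }
}
```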
In the Ontology Modelling activity, a simple core land transportation ontology, similar to the relational database schema of the external company, was generated, and its test case(s) were created. The ontology modelling team also worked in parallel with the application developers on the generation of mock data. The application development team integrated this mock data with the visual interface while the Linked Data Generation activity was being carried out; visual acceptance test(s) were also created and added to the CI infrastructure. In the Linked Data Generation activity, the R2RML mapping of the land transportation source was defined, and the RDF data of the company was published via an endpoint. The generated Linked Data was tested by applying the previously created test case(s), and the errors found were fixed. After the Linked Data generation, the application development team integrated the visual application with the real generated Linked Data and updated the visual test(s).

In this iteration, the application was shaped around notifying the user when related transportation events occur, and around presenting the external land transportation history in addition to the previously implemented user interface. In order to catch the transportation events, the development team began to implement the Monitoring Module, which contains the new AKKA organization. The Monitoring Module is responsible for catching all events in the transportation life cycle, based on the changed RDF triples discovered by the Integration Module. In this module, the monitoring AKKA actors find the transportation rules affected by each change and serve them to the user interface, which notifies the users about the transportation status in real time. Each change may match multiple rules.
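The rule matching could be organized as in the sketch below, where a monitoring actor receives a triple-level change event and checks it against the registered transportation rules. The event and rule classes are hypothetical simplifications of the module described above, not the project's actual types.

```java
import akka.actor.AbstractActor;
import java.util.List;

public class MonitoringActor extends AbstractActor {

    // Hypothetical change event forwarded by the Integration Module.
    public record TripleChange(String subject, String predicate, String object) {}

    // Hypothetical monitoring rule, e.g. "container X loaded onto ship Y".
    public record TransportationRule(String id, String predicate, String object) {
        boolean matches(TripleChange c) {
            return predicate.equals(c.predicate()) && object.equals(c.object());
        }
    }

    private final List<TransportationRule> rules;

    public MonitoringActor(List<TransportationRule> rules) { this.rules = rules; }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(TripleChange.class, change -> {
                // A single change may match multiple rules; notify the UI for each.
                rules.stream()
                     .filter(r -> r.matches(change))
                     .forEach(r -> notifyUserInterface(r, change));
            })
            .build();
    }

    private void notifyUserInterface(TransportationRule rule, TripleChange change) {
        System.out.println("Rule " + rule.id() + " fired for " + change.subject()); // stub
    }
}
```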
The final architecture of the application is shown in Figure 7. At the end of the iteration, the developed application was verified in the Validation & Verification activity: the acceptance test(s) and load test(s) were evaluated with the customer and the product owner, and the iteration finished successfully.

Figure 7. Architecture of Third Iteration.

4.3.1 Lessons Learned

• The integration of external source(s) should be implemented in the early iterations of the application, since it affects the architectural decisions, and introducing a Linked Data infrastructure into a new company requires a lot of effort.

5. Conclusion and Future Works

Currently, the "Observation and Monitoring of Container Life Cycle" application is used by the customer operation unit of ARKAS Holding. At the end of three iterations, the developed application has generated approximately 300 million RDF triples by integrating the agency, port and warehouse data sources of ARKAS, and it handles approximately three million change messages per day. The customer operation unit tests the observation and monitoring features of the application with 20 concurrent users (the number of customer operation unit employees). The observation features can be used via the unique container, booking and bill of lading numbers. For instance, Figure 8 shows the history of the booking with number "B14170741" in tree format; this number corresponds to the resource whose URI is "http://data.arkas.com/general/booking/B14170741", and the URI is the same in the RDF data of each participating company. Moreover, the development team is comfortable working with the BLOB methodology and is ready to use it in other Linked Data project(s).

Figure 8. User Interface.

According to the feedback of the employees, new ARCs are needed that allow new transportation rules for monitoring the transportation life cycle to be defined visually; in such an interface, customers could add, remove and update the transportation rules they want to follow. Role-based access management is also needed, since each customer is interested in different information about the transportation. These requirements will be implemented in the future of the project.

6. Acknowledgements

The authors wish to thank Bimar Information Technologies Services, ARKAS Holding and their employees Guner Mutlu, Necmi Sentuna, Gokhan Daghan, Tugkan Tuglular, Burak Bilyay and Kubra Parkın for their help. We also wish to acknowledge Galaksiya Information Technologies and Consultancy and its manager Erdem Eser Ekinci for their contributions to the developed methodology.

References

[1] Harleman, R. Improving the Logistic Sector's Efficiency using Service Oriented Architectures (SOA). In 17th Twente Student Conference on IT, 2012.

[2] Nurmilaakso, J.M. Adoption of e-business functions and migration from EDI-based to XML-based e-business frameworks in supply chain integration. International Journal of Production Economics 113(2), 721-733, 2008.

[3] Loutas, N. Case Study: How Linked Data is transforming eGovernment. European Commission ISA Programme, 2013. Available at: http://joinup.ec.europa.eu/community/semic/document/case-study-how-linked-data-transforming-egovernment.

[4] Frischmuth, P., Klímek, J., Auer, S., Tramp, S., Unbehauen, J., Holzweißig, K. and Marquardt, C.M. Linked Data in Enterprise Information Integration. Semantic Web – Interoperability, Usability, Applicability, IOS Press, 2012.

[5] Mihindukulasooriya, N., Garcia-Castro, R. and Gutiérrez, M.E. Linked Data Platform as a novel approach for Enterprise Application Integration. In Proceedings of the 4th COLD Workshop, 2013.

[6] Hu, B. and Svensson, G. A Case Study of Linked Enterprise Data. In Proceedings of the 9th International Conference on The Semantic Web, 2010.

[7] Brown, N., Nord, R. and Ozkaya, I. Enabling Agility Through Architecture. Software Engineering Institute, 2010.

[8] Collier, K.W. Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing (1st ed.). Addison-Wesley Professional, 2011.

[9] Schwaber, K. Agile Project Management with Scrum. Microsoft Press, Redmond, WA, USA, 2004.

[10] Beck, K. and Andres, C. Extreme Programming Explained: Embrace Change (2nd Edition). Addison-Wesley Professional, 2004.

[11] Fraser, S., Beck, K., Caputo, B., Mackinnon, T., Newkirk, J. and Poole, C. Test driven development (TDD). In Proceedings of the 4th International Conference on Extreme Programming and Agile Processes in Software Engineering (XP'03), Springer-Verlag, Berlin, Heidelberg, 459-462, 2003.

[12] Campos, J., Arcuri, A., Fraser, G. and Abreu, R. Continuous test generation: enhancing continuous integration with automated test generation. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE '14), ACM, New York, NY, USA, 55-66, 2014.

[13] Auer, S., Bühmann, L., Dirschl, C., Erling, O., Hausenblas, M., Isele, R., Lehmann, J., Martin, M., Mendes, P.N., Van Nuffelen, B., Stadler, C., Tramp, S. and Williams, H. Managing the life-cycle of linked data with the LOD2 stack. In Proceedings of the 11th International Semantic Web Conference (ISWC'12), Part II, Springer-Verlag, Berlin, Heidelberg, 1-16, 2012.

[14] Villazon-Terrazas, B., Vilches-Blazquez, L., Corcho, O. and Gomez-Perez, A. Methodological guidelines for publishing government linked data. Linking Government Data, 27-49, 2011.

[15] Hyland, B. and Wood, D. The Joy of Data - A Cookbook for Publishing Linked Government Data on the Web. Linking Government Data, 3-26, 2011.

[16] Hausenblas, M. Linked Data Life Cycles. 2011. Available at: http://www.slideshare.net/mediasemanticweb/linked-data-life-cycles.

[17] Fernandez-Lopez, M., Gomez-Perez, A. and Juristo, N. METHONTOLOGY: From Ontological Art Towards Ontological Engineering. In Proceedings of the AAAI Spring Symposium Series on Ontological Engineering, 1997.

[18] Pinto, H.S., Tempich, C. and Staab, S. DILIGENT: Towards a fine-grained methodology for distributed, loosely-controlled and evolving engineering of ontologies. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), 2004.

[19] De Nicola, A., Missikoff, M. and Navigli, R. A Software Engineering Approach to Ontology Building. Information Systems 34(2), 258-275, 2009.

[20] Noy, N.F. and McGuinness, D.L. Ontology Development 101: A Guide to Creating Your First Ontology. 2001. Available at: http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html.

[21] Suarez-Figueroa, M., Gomez-Perez, A. and Fernandez-Lopez, M. The NeOn Methodology for Ontology Engineering. Ontology Engineering in a Networked World, 9-34, 2012.

[22] Meier, J.D., Homer, A., Taylor, J., Bansode, P., Wall, L., Boucher, R. and Bogawat, A. How To - Design Using Agile Architecture. 2008. Available at: http://apparch.codeplex.com/wikipage?title=How%20To%20-%20Design%20Using%20Agile%20Architecture&referringTitle=How%20Tos.

[23] Heath, T. and Bizer, C. Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool, 2011.

[24] Hartig, O., Bizer, C. and Freytag, J.C. Executing SPARQL queries over the web of linked data. In International Semantic Web Conference, 293-309, 2009.
[25] Dodds, L. and Davis, I. Linked Data Patterns. 2012. Available at: http://patterns.dataincubator.org/book/index.html.

[26] Guarino, N. and Welty, C.A. An Overview of OntoClean. Handbook on Ontologies, International Handbooks on Information Systems, 201-220, 2009.

[27] Presutti, V., Daga, E., Gangemi, A. and Blomqvist, E. eXtreme Design with Content Ontology Design Patterns. In Proceedings of the Workshop on Ontology Patterns (WOP), 2009.

[28] Ozacar, T., Ozturk, O. and Unalir, M.O. ANEMONE: An environment for modular ontology development. Data & Knowledge Engineering 70(6), 504-526, 2011.

[29] Sequeda, J., Priyatna, F. and Villazon-Terrazas, B. Relational Database to RDF Mapping Patterns. In Proceedings of the Workshop on Ontology Patterns (WOP), 2012.

[30] Becker, M. and Smith, S.F. An Ontology for Multi-Modal Transportation Planning and Scheduling. Carnegie Mellon University Technical Report, 1997.

[31] Lian, P., Park, D. and Kwon, H. Design of Logistics Ontology for Semantic Representing of Situation in Logistics. In Second Workshop on Digital Media and its Application in Museum & Heritages, 432-437, 2007.

[32] Hoxha, J., Scheuermann, A. and Bloehdorn, S. An Approach to Formal and Semantic Representation of Logistics Services. In Proceedings of the Workshop on Artificial Intelligence and Logistics (AILog) at the 19th European Conference on Artificial Intelligence (ECAI), 2010.

[33] Scheuermann, A. and Hoxha, J. Ontologies for Intelligent Provision of Logistics Services. In Proceedings of the 7th International Conference on Internet and Web Applications and Services, 2012.
[34] Preist, C., Esplugas-Cuadrado, J., Battle, S.A., Grimm, S. and Williams, S.K. Automated Business-to-Business Integration of a Logistics Supply Chain Using Semantic Web Services Technology. In Proceedings of the 4th International Semantic Web Conference, 2005.

[35] Dalmolen, S., Cornelisse, E., Moonen, H. and Stoter, A. Cargo's Digital Shadow - A Blueprint to Enable a Cargo Centric Information Architecture. In eFreight Conference, 2012.

[36] Ahn, S.B. Container tracking and tracing system to enhance global visibility. In Proceedings of the Eastern Asia Society for Transportation Studies, vol. 5, 1719-1727, 2005.

[37] Bezerra, C., Freitas, F. and Santana, F. Evaluating Ontologies with Competency Questions. In IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), vol. 3, 284-285, 2013.

[38] Ting, S.L., Wang, L.X. and Ip, W.H. A study of RFID adoption for vehicle tracking in a container terminal. Journal of Industrial Engineering & Management 5(1), 22, 2012.

[39] Siror, J.K., Liang, G., Pang, K., Sheng, H. and Wang, D. Impact of RFID Technology on Tracking of Export Goods in Kenya. JCIT 5(9), 190-199, 2010.