Product Search using Linked Data

John Walker1 and Herman Elenbaas2

1 Semaku, Torenallee 20, 5617 BC Eindhoven, Netherlands
john.walker@semaku.com
2 Nexperia, Jonkerbosplein 52, 6534 AB Nijmegen, Netherlands
herman.elenbaas@nexperia.com

Abstract. Nexperia is a semiconductor manufacturer with a rich heritage from Philips and NXP. With such a long history, the company has built an impressive product portfolio of over ten thousand distinct product models. With this many products, finding the relevant data across the various systems can be an onerous task. To that end Nexperia have adopted and extended the award-winning Enterprise Data Hub [1] solution originally developed in cooperation with NXP and publish Linked Data on the Nexperia Data Portal [2]. One of the key new developments has been the Product Search application which enables users to quickly find answers to the most frequently asked questions from customers in a fast, responsive and intuitive interface. Typically, this might be a Nexperia sales representative talking to a customer who needs information about what products to purchase. Being able to provide fast and accurate product information can help Nexperia to win more sales. To ensure a fast search experience, we use an Elasticsearch cluster for the search index. In this paper we explain the Extract, Transform, Load (ETL) approach used to populate the search index and how Linked Data adds value in the Product Search application.

Keywords: RDF, SPARQL, JSON-LD, ETL, search.

1 ETL process

From previous work on the Product Data Hub, almost all the required data is available as RDF and can be queried via SPARQL. The approach taken in the Enterprise Data Hub is to maintain a clear relationship and provenance of data from the source systems across various domains and disciplines. To this end, there is a repository per source where the data is modeled in RDF maintaining a structure close to the representation in the source system.
Where possible, common keys are mapped to standard URI templates so that data is already interlinked across sources, but this is not always possible. The use of RDF provides a normalized data layer where data can easily be blended and mashed up.

A schema-on-read approach is applied, where SPARQL 1.1 Query and SPARQL 1.1 Federated Query are used to gather and integrate data from several repositories. The CONSTRUCT form of query is used to map onto the target model.

In earlier versions of the application we used monolithic SPARQL queries to extract a complete description of a product in a single request. As the complexity of the queries increased, we saw a degradation in both the performance and the maintainability of the queries. To remedy this, we decided to split the monolithic queries into several smaller CONSTRUCT queries whose results can be combined into a single graph using an RDF merge operation.

Not only is the performance of these smaller queries much more predictable, the approach also reduces the work that the SPARQL endpoint must perform for a single request and therefore reduces the stress on the server. The queries are also significantly easier for developers to understand and therefore easier to develop, debug and maintain. An additional benefit is that it is now possible to test and measure the performance of the individual queries in order to identify bottlenecks and take action to improve performance. This also allowed us to change the ETL approach, which was previously incremental and worked product-by-product, so that it can also perform a full reload of the search index in a performant manner.

The results from the queries are merged in a Jena in-memory model. From this model we then extract an unbounded sub-model for each product resource. The sub-model is then framed using JSON-LD Framing to coerce the graph into a hierarchical, idiomatic JSON structure and to alias URIs to developer-friendly 'local' names.
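The merge, sub-model and framing steps can be illustrated with a minimal Python sketch. It is not the Jena or JSON-LD Framing implementation used in the application: triples are plain tuples, the merge is a set union (which is only a valid RDF merge when the graphs share no blank nodes), "unbounded" is read as following outgoing links with no depth limit, and the nesting function merely imitates the effect of a frame. All URIs, property names and function names below are hypothetical.

```python
import json

# Hypothetical URIs, used only for illustration.
EX = "http://example.org/"
PRODUCT = EX + "product/P1"
ALIASES = {EX + "name": "name", EX + "package": "package"}

# Each set stands in for the result of one small CONSTRUCT query.
basic_info = {
    (PRODUCT, EX + "name", "P1 diode"),
    (PRODUCT, EX + "package", EX + "package/SOT23"),
}
package_info = {
    (EX + "package/SOT23", EX + "name", "SOT23"),
    (EX + "product/P2", EX + "name", "P2 transistor"),
}


def merge(*graphs):
    """Union the triple sets (assumes the graphs share no blank nodes)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def sub_model(graph, root):
    """Collect every triple reachable from root via object links, no depth limit."""
    subjects = {s for s, _, _ in graph}
    seen, stack, result = set(), [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in graph:
            if s == node:
                result.add((s, p, o))
                if o in subjects:  # object is itself described, so follow it
                    stack.append(o)
    return result


def to_document(graph, root, aliases, _path=()):
    """Nest linked resources under root and alias property URIs to short names."""
    doc = {"id": root}
    for s, p, o in sorted(graph):
        if s != root:
            continue
        key = aliases.get(p, p)
        is_resource = any(s2 == o for s2, _, _ in graph)
        if is_resource and o != root and o not in _path:
            # Embed the linked resource; ancestors are kept as plain references
            # so that cycles in the graph cannot cause endless recursion.
            value = to_document(graph, o, aliases, _path + (root,))
        else:
            value = o
        doc.setdefault(key, []).append(value)
    return doc


merged = merge(basic_info, package_info)
product_graph = sub_model(merged, PRODUCT)
document = to_document(product_graph, PRODUCT, ALIASES)
print(json.dumps(document, indent=2))
```

In this sketch the triple describing the second product is present in the merged graph but absent from the sub-model of the first product, which mirrors the idea of producing one self-contained document per product.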
These framed documents are then added to the Elasticsearch index. On a full reload of the index, we create a new index alongside the current index and swap the two once the new index is populated. This gives zero downtime for users. A full refresh takes under 5 minutes to complete.

2 Product Search application

The Product Search application was developed in close cooperation with business users. We followed a user-oriented approach with a design sprint to make sure we truly understood the user needs and requirements, resulting in a validated design. This was followed by several implementation sprints to turn the design into a working application.

The application consists of a Java Spring Boot server side and a Vue.js client side running in the web browser. The server mediates requests to the backend services by exposing an API consumed by the client. The backend services include calls to the Elasticsearch index, SPARQL queries and a third-party API for external stock and pricing data.

The application leverages the linked nature of the data by displaying links to other resources and by using the property URIs to look up the labels and definitions of those properties for display in the application.

References

1. Enterprise Linked Data award 2015, https://2015.semantics.cc/eldc-awards-given, last accessed 2018/06/04
2. Nexperia Data Portal, http://www.data.nexperia.com/