Semantic Web in the Fog of Browsers Pascal Molli and Hala Skaf-Molli University of Nantes, LS2N, France {pascal.molli,hala.skaf}@univ-nantes.fr Abstract. Imagine connecting thousands of web browsers with browser-to-browser connections, sharing storage, bandwidth, and CPU. This builds a fog of browsers where end-user devices are ready to collaborate. Imagine semantic fog applications running in fogs of browsers, querying the linked data servers hosted in the cloud and data hosted in the fog. Fogs of browsers running semantic fog applications create a new massively decentralized infrastructure where RDF data and SPARQL query processing are available both on web servers and on browsers. In this paper, we explore new opportunities and research challenges opened by a fog of browsers for the semantic web. 1 Introduction Fog computing relies on the collaboration of a multitude of devices located near end-users to provide new services or improve cloud services [14]. There are interdependencies between fog computing and with cloud computing. Fog can act as a proxy to improve the quality of services of cloud services. Moreover, fog can be the beachhead of the cloud to collect and aggregate data [5]. Cloudlets [12], Cisco IOx, or Paradrop are good examples of how the fog nodes can be implemented and deployed near end-user devices [15]. In the context of the semantic web, fog computing is able to improve the availability of semantic data without increasing the cost of hosting data for data providers. It could also greatly help the semantic web to aggregate new data collected from end-users or from the web of things. Traditionally, fog computing uses network gateways to run fog nodes. In the context of the semantic web, we believe that web browsers also meet naturally most of the criteria for fog computing. Web Browsers are located near end-users, they have storage, CPU, communica- tion, and most of all, they are de facto the most widely deployed execution environments in the world. The recent introduction of WebRTC 1 has further extended the capabilities of browsers by introducing support for browser-to-browser communications. This turns browsers into a decentralized execution environment for running semantic web applications. As browsers have already the ability to locally store RDF data and in some context to run SPARQL queries, semantic fog applications running in the fogs of browsers create a new massively distributed infrastructure where RDF data and SPARQL query processing are available both on the cloud and the fog. Therefore, the main challenge is to decide how to locate RDF data and process SPARQL queries over this massively distributed infrastructure to deliver services and quality of services required for a given semantic web application. In this paper, we explore new opportunities and research challenges opened by a fog of browsers for the semantic web. 1 https://webrtc.org/ 2 Pascal Molli and Hala Skaf-Molli The paper is organized as follows. Section 2 defines semantic fog applications in the fog of browsers. Section 3 presents new opportunities opened by semantic fog applications. Sec- tion 4 highlights new research challenges for semantic fog applications. Finally, conclusions are outlined in section 5. 2 Semantic Fog Applications in the Fog of Browsers GeoData DBpedia Web Server http://myfog b6 b3 b1 b9 b5 b7 b8 b2 b4 b0 Fig. 1. A semantic fog application running in a fog of browsers A fog of browsers is a set of interconnected browsers with browser-to-browser connections. Such connections are now supported thanks to the WebRTC standard in Firefox, Chrome, Microsoft Edge and IOS. A browser can participate to one or several fogs. A fog of browsers is accessible through one or several URIs hosted on regular web server. The web server dereferences this address to a JavaScript application bootstrapped with a sample of already connected browsers2 . This JavaScript application represents a semantic fog application with its own logic. The application is able to manage RDF data and runs SPARQL queries over linked data and/or over data hosted in the fog. We assume that all RDF data are managed following the linked data principles [3]. Once downloaded in the browser, the semantic fog application joins the network of browsers by connecting the browser to at least one of the already connected browsers. Following this approach, at a given time, there is potentially a high number of browsers, running the same application, and all these browsers are connected together. We do not make any assump- tion about the topology of the network, i.e. hierarchical, structured, unstructured, hybrid or multi-layer. Topology depends on the objective of the semantic fog application. In figure 1, the browsers are connected in an unstructured network, they execute SPARQL queries over data in the fog and 2 datasources hosted in the cloud. The browser b0 contacts the web server hosting the semantic fog application in order to join the network. The web server is returning the semantic fog application and references to two browsers: b1 and b2. b0 contacts one of them to join this fog. To be usable, a semantic fog application must meet the following requirements inspired from P2P data management [11]: 2 As has been already done in [9] and [7]. Semantic Web in the Fog of Browsers 3 autonomy Each browser participating in a fog of browsers is free to join and leave at any time. It owns its data and have a full control on it. query expressiveness A semantic fog application runs SPARQL queries or a subset of SPARQL. The scope of the query can refer traditional linked data providers and/or fog participants. efficiency A fog of browsers is composed by the resources of fog participants and the re- sources of cloud providers involved in the semantic fog application. The efficient uses of all resources should result in higher throughput of queries. quality of service The fog has to improve the user-perceived efficiency of the system. fault tolerance Quality of service can be maintained for a period of time even in presence of failures of browsers or failures of linked data providers. security As an open system, a fog of browsers can be used to steal personal data, attack other browsers in the fog or attack servers. Access control and resistance to malicious are crucial for semantic fog applications. 3 Semantic Fog Applications Deploying semantic fog applications over a fog of browsers raises several opportunities. In this section, we present several semantic fog applications illustrating different usages. 3.1 Queries in the fog 1 v o i d main ( ) { 2 3 /∗ Connect b r o w s e r s g e o l o c a l i z e d i n Nantes ∗/ 4 Overlay . c o n f i g u r e ( G e o l o c a t i o n =’ Nantes ’ ) ; 5 6 /∗ Q1 : Get nearby p o i n t o f i n t e r e s t ∗/ 7 S t r i n g query= ’ s e l e c t ? p l a c e where ? p l a c e nearby ( ’+ m y p o s i t i o n +’ 100m) ’ ; 8 ResultSet l o c = q u e r y E x e c u t i o n . e x e S e l e c t ( query , model ) 9 10 /∗ Q2 : C o l l e c t u s e r f e e d b a c k ∗/ 11 answer=a s k U s e r ( ’ Do you l i k e your l o c a t i o n ? : ’ + l o c ) 12 UpdateAction . e x e c u t e ( ’ INSERT DATA’+ ’me’+op : l i k e s+ l o c ) ; 13 14 /∗ Q3 : D i s p l a y most l i k e d p l a c e s ∗/ 15 queryExecution . e x e S e l e c t ( 16 ’SELECT ? p l a c e COUNT( ? l i k e s ) { ? p l a c e l i k e d ? o } groupby ? p l a c e ’ ) ) 17 } Fig. 2. The ”tourism in Nantes” semantic fog application Consider a simple semantic web application where people visiting a city have access to point of interests around them, can rate these points, and can list top-ranked point of interests. This application can be written with queries like Q1, Q2, and Q3 presented in figure 2 and can be deployed in the cloud. Consequently, queries are executed in the cloud with data stored in the cloud datastore. In this case, the cost of running this application relies entirely on the 4 Pascal Molli and Hala Skaf-Molli application provider, the availability and the performances of the application relies on the cloud provider. Now consider that the code of the Figure 2 is a semantic fog application that is loaded and run in each browser visiting the web page of this application. The line 4, the semantic fog application connects the browser of the visitor to a fog of browsers where browsers are now located in the city of Nantes. Concerning query Q1, data are still located in the cloud, but now the fog could provide data caching and consequently could improve data availability and reduce the cost of data providers. Under certain conditions, the application could continue to run even if cloud services are unavailable. Concerning query Q2, the semantic fog application can be configured to store data locally in the browser, in the fog, or in the cloud. Suppose user feedback is stored locally in the browser. Then the cost of executing Q2 is no more on the charge of the application provider and furthermore, user ratings do not leave browsers. Query Q3 is dependent on data location. Different situations are possibles: if users ratings are stored in the cloud, then executing Q3 in the fog is similar to Q1. If users ratings are stored in their own browsers, then Q3 execution requires to contact every browsers. If users ratings are stored somewhere in the fog, then data can be smartly located and aggregated to answer Q3 efficiently without contacting every browsers. As we can see, running semantic fog application opens different trade-off concerning what can be done in the fog and what can be done in the cloud. This trade-off impacts the cost of running an application, the performance of the application, and the availability of the application. It can also impact the privacy of personal data and the quality of collected data. 3.2 Semantic Collaborative caching DrugBank DBpedia DS x ... DS y Serveur TPF 1 ... Serveur TPF n Cache HTTP 1 ... Cache HTTP n C6 C3 C1 C9 C5 Similarity network C7 C8 C2 C4 C6 C3 C1 C9 C5 Random Network C7 C8 C2 C4 Fig. 3. Semantic Collaborative caching Semantic Web in the Fog of Browsers 5 Cyclades [6] is a collaborative caching system that can be used by a semantic fog applica- tion as the one presented in figure 2. Cyclades connect similar browsers by assuming that users with similar queries in the past will certainly perform similar queries in the future. Therefore, data cached at similar nodes could be used to answer queries without using resources of linked data servers. Cyclades is based on a double overlay networks; the first one builds a random network providing connectivity while the second one incorporates a similarity metric. The similarity metric is able to detect users performing similar queries based on the analysis on their local caches. The two-level network topology of Cyclades is described in figure 3. In this scenario, the fog is able to reduce the number of calls to data providers. Conse- quently, this improves data availability and reduces the cost of providing data. 3.3 Queries with the fog Fig. 4. Queries with the fog Ladda [7] is a semantic fog application that allows participants to delegate their SPARQL queries to their neighbors in the fog. For example, one can want to execute: 1 f o r each $ c o u n t r y i n c o u n t r i e s 2 query . e x e c u t e ( ”SELECT ? s o f t w a r e ?company WHERE { 3 ? s o f t w a r e dbpedia−owl : d e v e l o p e r ? company . 4 ?company dbpedia−owl : l o c a t i o n C o u n t r y 5 [ r d f s : l a b e l ” $ c o u n t r y ”@en ] . 6 }’ By parallelizing the execution of queries over different browsers, the execution time of this workloads can be significantly reduced. Figure 4 illustrates a Ladda’s query execution. 6 Pascal Molli and Hala Skaf-Molli In this execution, a browser executes 1509 queries with the help of 6 neighbors in a network composed of 50 participants. Each square represents the execution time of a query on the swim lane of a browser. On this run, the execution time of the workload is 2m37s instead of 3m32s if the workload was executed by one browser. In this scenario, the semantic fog application allows to share the CPU and bandwidth of browsers for SPARQL query processing. 4 Research challenges Deploying semantic fog applications on a fog of browsers opens many opportunities for seman- tic web application developers. They can optimize financial costs, availability, performances, privacy . . . However, the programming model has to remain simple as the one depicted in Figure 2. The configuration of the semantic fog application has to determine how queries and data are deployed in the cloud and in the fog to reach developper expectations. The fog of browsers can reuse some scientific results from P2P data management sys- tems [11, chapter 16]. Many works demonstrated how data can be efficiently stored and accessed on structured, unstructured, and hybrid P2P networks such as Edutella [10], RDF- Peers [4], PierDB [8], GridVine [1] etc. However, the context and objectives of fog of browsers are slightly different: – Fog and cloud are interdependent. Cloud services can be used to manage the fog. The fog can just improve the efficiency and the quality of services of data providers without managing data as demonstrated in [6] and [7]. – Most of work on P2P data management have been done on TCP/IP networks. How- ever, WebRTC networks used by browsers have several major differences with traditional TCP/IP networks: 1. A WebRTC network is not addressable and basically has no routing. Consequently, contacting a particular browser can be costly. 2. Establishing a WebRTC connection between 2 browsers requires a third party to ex- change tokens. Once tokens exchanged, a complex negotiation protocol starts to allow NAT traversal. So, establishing a WebRTC connection can be more costly than a TCP/IP connection. The constraints of WebRTC change the cost of communications and potentially impact all existing algorithms. Customized overlay networks for a fog of browsers. A fog of browsers connects thou- sands of browsers over WebRTC. The nature of WebRTC networks and the objective of the semantic fog application can lead to different design choices. As routing is costly in WebRTC, keeping useful neighbors around us in one hop, can be a good strategy for efficiency and qual- ity of service. Indeed, direct neighbors can be contacted at low cost. ’useful neighbors’ can have different meanings according the application. Many similarity metrics can be defined and many overlay can be combined in the same fog as proposed in [6]. Finding the best similarity metrics, topologies and combinations of topologies for query efficiency and quality of services is clearly an important research direction. Dynamic replication and consistency in a fog of browsers. Data replication is a fundamental concept for improving data availability and performances of query processing. In the context of a fog of browsers, replication contributes to query efficiency, quality of service Semantic Web in the Fog of Browsers 7 and fault-tolerance requirements. A replication strategy has to decide what data to replicate, where to replicate and when to replicate. Such decisions are complex in a fog of browsers: the participants are autonomous, the data storage is limited, the communication costs constrained by network topology. Adaptivity of replication to queries seems a good strategy. Materializing data fragments that are frequently retrieved from data providers and spreading them within the fog can have a significant impact on performances. Defining these fragments, deciding when to replicate them and where to locate them is clearly challenging. Another challenge strongly related to data replication is consistency management. Data needs to be up-to-date. Maintaining consistent data fragments at low-cost in a fog of browsers is clearly challenging. Crowdsourcing with a fog of browsers A browser is not just an execution environment for JavaScript programs. It could also involve humans with their Web of Things devices. Fog computing allows a collaboration between man and machines to collect, curate and aggregate data. Consequently, a fog of browsers can be seen as a distributed crowdsourcing platform where data are collected, semantified and verified within the fog, before saved to the cloud. How the functionalities of a crowdsourcing platform can be distributed among the fog and cloud providers is an interesting challenge. Federated query engines for a fog of browsers. Federated SPARQL query engines [13, 2] allow to query several data sources in a transparent way. In the context of a fog a browsers, the fog itself could be considered as a new data source that cloud be combined with traditional data providers. However, each fog participant has a fragment of data and has to be contacted to answer queries. Such problems have been partially addressed by P2P data management systems. The challenge is to build a distributed federated query engine running in the fog, able to query data in the cloud and in the fog. Security for semantic fog application If a fog of browsers opens many opportunities, it also brings new threats: A fog of browsers can be used to perform DDOS attacks, to steal personal information from browsers, and to watch people. A semantic fog application has to protect participants and data providers against malicious users. Semantic fog applications require appropriate security models. 5 Conclusions In this paper, we presented how semantic fog applications running in the fog of browsers creates a massively decentralized infrastructure that extends the semantic web to the browsers of end-users. By this way, the semantic web can take advantage of resources of browsers, including end-users and IoT devices. Semantic fog applications can improve the efficiency and quality of service of linked data providers. It can also enhance the linked data with data provided by end-users. If some semantic fog applications are already there, more research efforts are needed to fully exploit all the potential of semantic fog applications: pertinent network topologies, dynamic replication, efficient query processing, data quality and security. Another interesting research questions have not been discussed in this paper: the dynam- icity of the fog of browsers and how fog of browsers can be combined with distributed ledgers for commercial query processing in the fog of browsers. 8 Pascal Molli and Hala Skaf-Molli References 1. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth, and Tim Van Pelt. Gridvine: Building internet-scale semantic overlay networks. In International semantic web conference, volume 3298, pages 107–121. Springer, 2004. 2. Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo, and Edna Ruckhaus. Anapsid: an adaptive query processing engine for sparql endpoints. The Semantic Web–ISWC 2011, pages 18–34, 2011. 3. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data - The Story So Far. International Journal of Semantic Web and Information Syststems, 5(3):1–22, 2009. 4. Min Cai and Martin Frank. Rdfpeers: a scalable distributed rdf repository based on a structured peer- to-peer network. In Proceedings of the 13th international conference on World Wide Web, pages 650–657. ACM, 2004. 5. Mung Chiang and Tao Zhang. Fog and iot: An overview of research opportunities. IEEE Internet of Things Journal, 3(6):854–864, 2016. 6. Pauline Folz, Hala Skaf-Molli, and Pascal Molli. CyCLaDEs: a decentralized cache for Linked Data Fragments. In ESWC: Extended Semantic Web Conference, 2016. 7. Arnaud Grall, Pauline Folz, Gabriela Montoya, Halla Skaf-Molli, Pascal Molli, Miel Vander Sande, and Ruben Verborgh. Ladda: SPARQL queries in the fog of browsers. In Proceedings of the 14th ESWC: Posters and Demos, May 2017. 8. Ryan Huebsch, Joseph M Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, and Ion Stoica. Querying the internet with pier. In Proceedings of the 29th international conference on Very large data bases-Volume 29, pages 321–332. VLDB Endowment, 2003. 9. Brice Nédelec, Pascal Molli, and Achour Mostefaoui. Crate: Writing stories together with our browsers. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 231–234. International World Wide Web Conferences Steering Committee, 2016. 10. Wolfgang Nejdl, Boris Wolf, Changtao Qu, Stefan Decker, Michael Sintek, Ambjörn Naeve, Mikael Nilsson, Matthias Palmér, and Tore Risch. Edutella: a p2p networking infrastructure based on rdf. In Proceedings of the 11th international conference on World Wide Web, pages 604–615. ACM, 2002. 11. M Tamer Özsu and Patrick Valduriez. Principles of distributed database systems -. Springer, 2011. 12. M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies. The case for vm-based cloudlets in mobile computing. IEEE Pervasive Computing, 8(4):14–23, Oct 2009. 13. Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. Fedx: Optimization techniques for federated query processing on linked data. In International Semantic Web Conference, pages 601–616. Springer, 2011. 14. Luis M Vaquero and Luis Rodero-Merino. Finding your way in the fog: Towards a comprehensive definition of fog computing. ACM SIGCOMM Computer Communication Review, 44(5):27–32, 2014. 15. Shanhe Yi, Zijiang Hao, Zhengrui Qin, and Qun Li. Fog computing: Platform and applications. In Hot Topics in Web Systems and Technologies (HotWeb), 2015 Third IEEE Workshop on, pages 73–78. IEEE, 2015.