=Paper=
{{Paper
|id=Vol-1666/paper-01
|storemode=property
|title=Distributed Linked Data as a Framework for Human-Machine Collaboration
|pdfUrl=https://ceur-ws.org/Vol-1666/paper-01.pdf
|volume=Vol-1666
|authors=Paolo Pareti
|dblpUrl=https://dblp.org/rec/conf/semweb/Pareti16
}}
==Distributed Linked Data as a Framework for Human-Machine Collaboration==
Distributed Linked Data as a Framework for Human-Machine Collaboration

Paolo Pareti
University of Edinburgh, Edinburgh, United Kingdom
Taiger, Madrid, Spain
paolo.pareti@taiger.com

Abstract. This paper presents a novel application of Linked Data as an indirect communication framework for human-machine collaboration. In a decentralised fashion, agents interact by publishing Linked Data resources without having access to a centralised knowledge base. This framework provides an initial set of solutions to the problems of dynamic Linked Data discovery, of querying frequently-updated distributed datasets and of guaranteeing consistency in the case of concurrent updates. As a motivation for this framework we take the use-case of human and machine agents collaborating on the execution of tasks. This use-case is based on existing real-world Linked Data representations of human instructions and on research on their integration with machine functionalities.

1 Introduction

On the web, the amount of structured information available to machines is steeply increasing. This information can be used by computer systems to answer complex queries about factual knowledge, for example about the population of cities or the dates of birth of notable people. However, machines still have little or no understanding of human activities. For example, a user who is in the process of performing a complex task might use the functionalities offered by multiple software tools. These systems might nevertheless operate in isolation, not knowing what the user is trying to achieve or how. This lack of understanding limits human-machine collaboration, as machines cannot predict when and how their functionalities might be needed.

A typical approach to describing activities is writing instructions. Previous research has demonstrated how human instructions can be converted into Linked Data and how related tasks and entities can be automatically interlinked [5]. While certain steps of the instructions can only be executed by humans, others, such as sending emails or modifying files, can be automated. Such steps can be linked to machine functionalities and executed at the right time when a user is performing a related activity [4]. This paper generalises this approach to automation to the case of multiple agents and decentralised resources. As its main contribution, this paper presents a novel application of Linked Data as an indirect collaboration framework for humans and machines. The main benefits and limitations of this framework are discussed, and initial solutions are proposed to the problems of dynamic Linked Data discovery, of querying frequently-updated distributed datasets and of guaranteeing consistency in the case of concurrent updates by multiple agents.

(This research has been funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 607062 ESSENCE: Evolution of Shared Semantics in Computational Environments.)

Fig. 1. A multi-agent environment with no shared editable resources. Dashed and solid lines represent, respectively, the ability to read and write Linked Data resources.
2 Problem Description

The proposed framework addresses the problem of allowing a collaborative set of human and machine agents, who can publish and access web resources but cannot directly interact with each other, to communicate and coordinate their actions so as to collaboratively achieve tasks in the absence of a centralised system. To achieve decentralisation, no shared resource editable by multiple agents is assumed to be available. As depicted in Figure 1, each agent can modify the resources in its own repository, while it can read the resources in all of the repositories of the other agents. In this context, communication refers to the process by which agents propagate information (i.e. RDF triples, http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/) to the other agents by modifying the collective knowledge, namely the resources that all agents can access. Coordination instead refers to the ability to guarantee certain conditions across all datasets. For example, coordination might be required to ensure that no agent can declare its intention to execute a task which is already being executed by another agent.

This framework is intended to address dynamic partially-automatable problems. A partially-automatable problem requires some human intervention, as it cannot be entirely automated. At the same time, it cannot be optimally solved by humans alone, as some part of it could be automated. Dynamic problems, instead, are those that cannot be predicted in advance, making the development of dedicated software solutions impractical. To simplify the problem at hand, it is assumed that the agents involved already know and trust each other. Issues such as agent discovery, coalition formation and trust are considered outside the scope of this project.

3 Motivational Use-Case

As a running example we take the use-case of John, an employee of a company working on a project alongside Ann. Every week, John and Ann write a report about the status of the project, which then gets sent to the other members of the company. To do so, they agree on the following procedure:

Weekly report procedure:
Step 1: John writes a draft of the report
Step 2: The report is uploaded to an online repository
Step 3: Ann corrects the report
Step 4: The report is sent to all the colleagues

While this procedure can be completed manually, it can also be made more efficient with automation. For example, as soon as John finishes writing the draft of the report, a machine agent could upload it to the correct repository. Ann could then be automatically notified that she can start correcting the report, as step 3 is ready to be executed. Step 4 might also be a good candidate for automation. Different colleagues might have different preferences on how to be contacted. For example, some might prefer to receive an email, others a message on an instant messaging system or a social networking platform. A machine agent could deal with these diverse preferences automatically, provided it is given the necessary contact details.

4 Methodology

Lacking any central system, agents are the only components of the proposed framework. At a functional level of abstraction, all agents can be seen as entities capable of reading and writing Linked Data resources. Agents also have individual repositories where they can publish and modify Linked Data. More specifically, agents routinely perform three phases. During the first phase, agents sense the environment by accessing the Linked Data resources of the other agents.
This phase is described in Section 4.1. During the second phase, agents can perform actions and decide what to communicate to the other agents by reasoning over the state of the environment and their own capabilities. Human agents perform this phase by interpreting the state of the environment through intuitive interfaces, while machine agents use a logical formalism. This phase is described in Section 4.2. During the third phase, agents update their own repository with the information they intend to communicate to the other agents, such as the outcome of their actions. This phase is described in Section 4.3.

Table 1. The RDF namespaces used in this document. All other namespace prefixes can be considered to be associated with generic URIs such as http://example.org/

Prefix | Namespace URI
prohow | http://w3id.org/prohow#
rdfs   | http://www.w3.org/2000/01/rdf-schema#

4.1 Knowledge Retrieval

While direct communication requires knowledge of the recipients of the messages, indirect communication requires a common environment that all agents can observe [2]. An agent wanting to use direct communication must know how to transmit information to specific recipients. This would be impractical in our scenario, as agents might be humans or machines and might need to agree on different communication protocols. On the other hand, agents using indirect communication mechanisms only need to know how to store and retrieve information from the shared environment. Using indirect communication, agents can be oblivious to the characteristics, or even the very existence, of the other agents.

In this framework the shared environment that enables indirect communication is a set of Linked Data resources available online, here called collective knowledge. To communicate, agents therefore only need the ability to read and write Linked Data resources. It is important to notice that while all agents can access and modify the overall collective knowledge, they cannot modify all parts of it, but only the resources that they have direct control of.

Although agents do not need to have explicit knowledge of each other, they need to know where to access the resources that make up the collective knowledge. Implicitly, this involves either knowing or being able to retrieve the addresses of the resources of the other agents. One possible approach is to agree on a URI for the collaboration. This URI should be dereferenceable to an RDF resource containing the addresses of the other repositories in the collective knowledge. For example, we can imagine agent a joining coalition x:coalition. By dereferencing this URI, an agent might retrieve the following triples:

x:coalition prohow:includes a:resource, b:resource, c:resource .

RDF triples are here represented in Turtle format (http://www.w3.org/TR/turtle/) following the namespace prefixes defined in Table 1. The relation prohow:includes links the identifier of a coalition with the resources included in the collective knowledge of the coalition. These resources are here called seed URIs. If none of the seed URIs points to a resource editable by an agent a, then it can be inferred that agent a does not belong to this coalition, as it would not be possible for this agent to communicate information to the other agents. We will assume here that resource a:resource is editable by agent a. Agents do not necessarily know how many other agents are involved in the coalition, if any, as Linked Data resources might not belong to any agent, or multiple resources might belong to the same agent.
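As a concrete illustration, the following Python sketch shows how an agent might dereference the coalition URI and collect the seed URIs. It is a minimal sketch using the rdflib library; the coalition URI follows the generic example.org convention of Table 1, and the helper name is ours, not part of the framework.

from rdflib import Graph, Namespace, URIRef

PROHOW = Namespace("http://w3id.org/prohow#")
# Hypothetical coalition URI, using the generic example.org convention
# of Table 1.
COALITION = URIRef("http://example.org/x/coalition")

def discover_seed_uris(coalition_uri):
    """Dereference the coalition URI and return the seed URIs it lists."""
    g = Graph()
    g.parse(coalition_uri)  # rdflib fetches the URI and negotiates an RDF serialisation
    return set(g.objects(coalition_uri, PROHOW.includes))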
Agents wanting to agree on a coalition URI and on the list of seed URIs might have to resort to direct communication or to centralised systems, such as matchmaking systems. This process, however, is considered part of coalition formation and therefore outside the scope of this work. In this paper we assume that agents already have access to the list of seed URIs.

Fig. 2. Dereferencing URIs might lead to the discovery of a large amount of resources. Adding the dashed link reduces the link distance from the seed URIs to ex6:r to 1.

Having obtained the list of seed URIs, agents can now discover the collective knowledge of the coalition. This is done by dereferencing the URIs of the entities they are interested in, starting from the seed URIs. Agents should attempt to request machine-readable versions of the resources first, using content negotiation. The retrieved resources should then be correctly interpreted if encoded in one of the standard formats, such as Turtle, RDF/XML (http://www.w3.org/TR/rdf-syntax-grammar/) or RDFa (http://www.w3.org/TR/rdfa-syntax/). After retrieving all the data in the collective knowledge, agents can proceed to reason about it by querying it. This process of querying distributed Linked Data resources by retrieving them locally is called a materialisation-based approach.

Dereferencing URIs allows agents to dynamically discover Linked Data resources at runtime. However, as depicted in Figure 2, the discovered resources might contain more URIs, which might lead to the discovery of even more resources, leading to a virtually unconstrained set of resources. Similarly to web crawling, if no limit is imposed on the depth of the exploration, an excessively large number of resources could be discovered and included in the collective knowledge. Considering that agents need to frequently access all the resources in the collective knowledge, unconstrained exploration is impractical. One possible solution to this problem is to limit the exploration of URIs to one level from the seed URIs. In other words, only URIs directly retrieved from the seed URIs would be considered as potential resources in the collective knowledge. Agents discovering relevant resources by exploring beyond this limit can include them in the collective knowledge by adding their URIs to one of the seed URI resources. In Figure 2, for example, agent a can add ex6:r to the collective knowledge by adding a link to it in resource a:resource. A sketch of this one-level retrieval strategy follows.
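The sketch below is a minimal materialisation routine, assuming the discover_seed_uris helper above: it retrieves the seed resources, dereferences only the URIs found directly in them, and merges everything into a single local graph that the agent can then query.

from rdflib import Graph, URIRef

def materialise(seed_uris):
    """Retrieve the seed resources plus the resources one link away,
    merging everything into a single local graph for querying."""
    collective = Graph()
    frontier = set()
    for seed in seed_uris:
        g = Graph()
        g.parse(seed)
        for triple in g:
            collective.add(triple)
            # URIs mentioned in a seed resource qualify for one further
            # (and final) level of dereferencing.
            frontier.update(t for t in triple if isinstance(t, URIRef))
    for uri in frontier - set(seed_uris):
        try:
            collective.parse(uri)
        except Exception:
            pass  # skip resources that fail to resolve or to parse
    return collective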
An alternative to dynamic Linked Data discovery is the addition of relevant data into a single repository. For example, agent a could copy the RDF data from resource ex6:r into its resource a:resource. Doing this would give agent a more flexibility to decide which triples to include in the collective knowledge. By linking to resource ex6:r instead, agent a must commit to including all data from this resource in the collective knowledge. Moreover, copying data into the agent's repository could be considered a more stable approach in case the external resource has a high risk of becoming unavailable. If the external resource is sufficiently stable and does not contain a significant amount of superfluous information, however, the dynamic Linked Data discovery approach could be considered more advantageous. Linking to external resources and allowing their discovery avoids the problem of data duplication. Not only does data duplication increase the amount of repository space required by the agents, but it also makes updates more difficult. In fact, if the data duplication approach is followed, an update to repository ex6:r might take a long time to propagate to all the resources that contain its copy, or might not propagate at all. For these reasons, dynamic Linked Data discovery at runtime can be a practical solution if the discovered resources are stable and self-contained.

4.2 Knowledge Representation and Reasoning

In order to communicate meaningfully, a shared knowledge representation format needs to be established. Following the Linked Data principles, agents represent their knowledge according to the RDF data model. Shared semantics is achieved by the use of URIs as global identifiers to unambiguously identify entities and concepts, and by the adoption of a shared vocabulary. In the human-machine collaboration scenario the prohow vocabulary (http://w3id.org/prohow#) is chosen. For the remainder of this paper we will assume that data is represented in RDF and that the agents agree on a common vocabulary. However, several approaches could be used in scenarios where these assumptions do not hold. For example, ontology alignment techniques could be used to create semantic interoperability between different conceptualisations. Also, existing systems have been developed to translate non-RDF data into RDF, or to mine RDF models from unstructured resources, such as natural language text. One such system has been developed to automatically convert human-generated instructions into Linked Data [5] using the prohow vocabulary. The feasibility of this process was demonstrated by the creation of an RDF dataset (http://w3id.org/knowhow/dataset/) of over 200,000 interlinked procedures extracted from the wikiHow (http://www.wikihow.com/) and Snapguide (http://snapguide.com/) websites. The ability to integrate a framework with existing resources is particularly important to mitigate the cold start problem. This problem occurs when the low value offered by a system discourages its adoption, while at the same time adoption is needed to increase its value. The value of Linked Data applications, which typically focus on integration and standardisation, often depends on the number of people and organisations adopting them. Applications which can be integrated with existing systems and resources can provide a higher initial value to the early adopters.

The prohow vocabulary has been used to represent both real-world human instructions and machine functionalities [4]. This vocabulary represents tasks in terms of instructions and of their execution. For instance, the procedure in the example introduced in Section 3 can be translated into the following prohow representation using an existing parser (http://w3id.org/prohow/r1/interfaces):

:t rdfs:label "Weekly report procedure" .
:t prohow:has_step :t1 , :t2 , :t3 , :t4 .
:t1 rdfs:label "John writes a draft of the report" .
:t2 prohow:requires :t1 .
:t2 rdfs:label "The report is uploaded to an online repository" .
:t3 prohow:requires :t2 .
:t3 rdfs:label "Ann corrects the report" .
:t4 prohow:requires :t3 .
:t4 rdfs:label "The report is sent to all the colleagues" .

In this example, the main task :t is connected to the four steps :t1 to :t4 using the prohow:has_step relation. The order between the steps is given by the prohow:requires relation. Step :t2 could be linked to a machine functionality :t5 as follows:

:t2 prohow:has_method :t5 .
:t5 rdfs:label "Upload a document to a repository" .
:t5 prohow:requires :t6 , :t7 .
:t6 rdfs:label "The document to upload" .
:t7 rdfs:label "The repository where to upload it to" .

The prohow:has_method relation indicates that task :t5 is one possible way to achieve task :t2. A machine agent might then be able to complete this task automatically given the required inputs :t6, "The document to upload", and :t7, "The repository where to upload it to".

One of the main challenges in enabling communication between human and machine agents is the representation of knowledge in a format which is both human and machine understandable. It is therefore important to map such a representation both to a logical formalism, to allow machine reasoning, and to an intuitive representation, such as natural language, which can be understood by humans. Data modelled with the prohow vocabulary can be translated both into a natural language representation and into logical statements [4]. Logical statements allow a machine to infer, for example, that a task is ready to be executed because all of its requirements are complete, or that it has been accomplished because one of its methods has been completed. While machines rely on logic, human reasoning and actions are mediated through interfaces. In this context, an interface is defined as a software agent capable of translating Linked Data into a human-readable representation and of translating human interactions into Linked Data. prohow processes can be represented in different ways, including as a list of steps, methods and requirements, a common format in human-written instructions. Interfaces also allow human users to translate their actions and decisions into Linked Data. An existing JavaScript implementation (http://w3id.org/prohow/r1/interfaces), for example, presents prohow tasks in an HTML document and displays buttons to provide additional functionalities. A user interested in starting task :t can click on a button next to the description of the task to publish the following information on the web:

:en prohow:has_goal :t .

This triple declares a new attempt (or environment) :en to accomplish task :t. This environment :en can be dereferenced to another HTML document where the user can tick off the steps that have been completed, similarly to a to-do check-list. If the user checks the first step :t1, the interface will publish the following triples online:

:ex prohow:has_environment :en .
:ex prohow:has_task :t1 .
:ex prohow:has_result prohow:complete .

These triples state that, within environment :en, the execution :ex of task :t1 was successful. This information could then be used by a machine agent to infer that step :t2 is ready to be executed, and consequently to attempt to accomplish its method :t5. A sketch of this inference follows.
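The following self-contained Python sketch shows one plausible reading of this inference using rdflib and SPARQL: a step is ready when none of its requirements lacks a completed execution and the step itself has not yet been completed. The readiness rule is our paraphrase of the prohow semantics in [4], not a normative definition; the data reuses the running example above.

from rdflib import Graph

DATA = """
@prefix prohow: <http://w3id.org/prohow#> .
@prefix : <http://example.org/> .
:t prohow:has_step :t1 , :t2 , :t3 , :t4 .
:t2 prohow:requires :t1 .
:t3 prohow:requires :t2 .
:t4 prohow:requires :t3 .
:ex prohow:has_environment :en ;
    prohow:has_task :t1 ;
    prohow:has_result prohow:complete .
"""

READY = """
PREFIX prohow: <http://w3id.org/prohow#>
SELECT ?task WHERE {
  ?proc prohow:has_step ?task .
  # every requirement of ?task has a completed execution ...
  FILTER NOT EXISTS {
    ?task prohow:requires ?req .
    FILTER NOT EXISTS {
      ?e prohow:has_task ?req ; prohow:has_result prohow:complete .
    }
  }
  # ... and ?task itself has not been completed yet
  FILTER NOT EXISTS {
    ?e2 prohow:has_task ?task ; prohow:has_result prohow:complete .
  }
}
"""

g = Graph()
g.parse(data=DATA, format="turtle")
for row in g.query(READY):
    print(row.task)  # prints http://example.org/t2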
4.3 Knowledge Update and Coordination

An agent wanting to communicate a set of triples to the other agents does so by publishing them in a repository which belongs to the collective knowledge. It is the other agents' responsibility to retrieve these triples and to consider them in their reasoning process. This approach is straightforward in case the decision to upload certain triples is independent of the rest of the collective knowledge. However, coordination typically requires inferences based on the whole collective knowledge, and changes to the collective knowledge could potentially invalidate such inferences. For example, an agent's decision to complete a task might be overridden by the knowledge that the task has already been completed.

In Linked Data, changes to a knowledge base can be seen in terms of additions and deletions of triples. The problem of unexpected deletions of triples can be avoided by choosing a collaboration model that does not require triple deletion. The chosen collaboration model, prohow, is based on a monotonic increase of knowledge, and no facts need to be retracted. To avoid the problem of unexpected triple additions, agents should write potentially conflicting statements as "candidate" additions which are confirmed on a first-come-first-served basis only after verifying that no other conflicting statement exists.

Candidate sets of triples can be written using RDF reification (http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#Reif). Reification provides a unique identifier for each triple, which can be used to attach meta-information about the triple, such as the timestamp at which the reified triple was created. Moreover, a reified triple does not entail the triple itself. This gives agents the flexibility to limit their reasoning to confirmed facts only, or to also consider the candidate statements that other agents intend to assert.

Fig. 3. A simple coordination mechanism for distributed datasets.

Going back to the example presented in Section 3, we can imagine both John and a machine agent a to be capable of completing step :t2, "The report is uploaded to an online repository". In this scenario, it is desirable that only one of the agents accomplishes task :t2. This problem can be solved by coordination, and a simple way of implementing it is on a first-come-first-served basis, as depicted in Figure 3. After retrieving the collective knowledge (timestamps t1 to t3) and discovering that task :t2 needs to be completed, John publishes on his individual repository LD1 the intention to complete it, using timestamped reified statements, at time t4. However, agent a might have also published its intention to complete task :t2 on repository LD3 at an earlier time tx (t3 < tx < t4). Before considering his intention final, John's interface will access the other agents' repositories to verify whether new conflicting statements, whether reified or not, have been added (timestamps t5 and t6). When accessing resource LD3 at time t6, John's interface will discover a's intention to accomplish task :t2. Since a's statement was created before John's, the interface will warn John that task :t2 is already in the process of being automated, and he will be asked not to complete it. Agent a, instead, not having found any conflicting statements published before its own, will publish its intention to complete :t2 in a non-reified format and proceed to accomplish the task. A sketch of this mechanism follows.
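The first-come-first-served check can be sketched in Python as follows, under strong simplifying assumptions: each repository is modelled as a local rdflib graph, agent clocks are synchronised, and the intention to complete a task is represented as the single triple <:en prohow:has_goal :t2>, which is our simplification of the paper's example. The timestamp property ex:timestamp is a hypothetical annotation, not part of the prohow vocabulary.

import time
from rdflib import BNode, Literal, Namespace, URIRef
from rdflib.namespace import RDF

PROHOW = Namespace("http://w3id.org/prohow#")
TS = URIRef("http://example.org/timestamp")  # hypothetical metadata property

def publish_candidate(repo, env, task):
    """Write the intention <env prohow:has_goal task> as a timestamped
    reified statement in the agent's own repository (t4 in Fig. 3)."""
    stmt = BNode()
    repo.add((stmt, RDF.type, RDF.Statement))
    repo.add((stmt, RDF.subject, env))
    repo.add((stmt, RDF.predicate, PROHOW.has_goal))
    repo.add((stmt, RDF.object, task))
    repo.add((stmt, TS, Literal(time.time())))
    return stmt

def confirm(repo, stmt, other_repos, task):
    """Re-read the other repositories (t5, t6 in Fig. 3) and assert the
    plain triple only if no earlier conflicting statement exists."""
    mine = repo.value(stmt, TS).toPython()
    for other in other_repos:
        # a non-reified intention means another agent already committed
        if (None, PROHOW.has_goal, task) in other:
            return False
        # an earlier reified candidate for the same task also wins
        for cand in other.subjects(RDF.object, task):
            ts = other.value(cand, TS)
            if ts is not None and ts.toPython() < mine:
                return False
    env = repo.value(stmt, RDF.subject)
    repo.add((env, PROHOW.has_goal, task))  # confirmed, non-reified
    return True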
5 Complexity Analysis

The agent behaviour described in Section 4 can be seen as a loop whose iterations are repeated with a certain frequency. We define the frequency f as the number of times the iteration is repeated within a fixed time interval. During each iteration, agents need to retrieve the collective knowledge locally to reason over it. This process of retrieving data locally might need to be repeated a certain number of times k. The coordination algorithm described in Section 4.3 requires retrieving the data twice (k = 2) per iteration.

Within a set of agents A, the average amount of data that an agent a needs to retrieve to access the collective knowledge is d_A · (|A| − 1), where d_A is the average number of triples in the repositories of the agents in A, and |A| is the number of agents in A. Therefore, within a fixed time interval, the average number of triples an agent has to retrieve from the web is f · k · d_A · (|A| − 1). An agent collaborating with multiple sets of agents Ā only needs to retrieve the resources of each distinct agent once. Agents can merge the data of each distinct agent they collaborate with (Ȧ = ⋃_{A ∈ Ā} A) and use it to collaborate with all the sets of agents in Ā. Using this approach, the average number of triples an agent needs to retrieve is:

f · k · d_Ȧ · (|Ȧ| − 1)    (1)

From this formula it can be observed that the required web traffic increases linearly with the frequency f at which the agent checks and computes updates and with the number of times k the same resources need to be accessed per iteration. In a concrete implementation of the system, both k and f can be considered constant. The remaining variables determining the complexity are therefore the average number of triples in the knowledge bases of all the other collaborating agents (d_Ȧ) and the number of those agents (|Ȧ|). The web traffic generated by a single agent can then be considered as having complexity O(d_Ȧ · |Ȧ|).

The space and computational complexity of the implementation of an agent strongly depends on the number of triples that need to be stored and processed at each iteration. This number equals the number of triples stored by all the agents that are being interacted with, d_Ȧ · |Ȧ|, and it is therefore comparable to the web traffic complexity. The space and time complexity is also affected by other factors which depend on the specific implementation of an agent: the number and type of SPARQL queries that need to be executed, and the efficiency of the particular RDF storage and SPARQL engine used.

From this analysis we can observe that the complexity of each agent can be reduced by having agents collaborate with fewer other agents at a time and by having them reason over smaller datasets. It should be noted that, thanks to decentralisation, this complexity depends neither on the total number of agents in the system nor on the total number of triples involved. This complexity can therefore be kept constant as the overall number of agents and amount of knowledge in the system increases. This analysis suggests that this framework could be applied to a large number of lightweight agents, namely agents that interact with a small number of other agents and that do not publish a large volume of data.
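As a worked instance of Formula (1), with illustrative numbers only (ten collaborating agents, an average of 500 triples per repository, k = 2 as in Section 4.3, and six iterations per minute):

f, k, d, n = 6, 2, 500, 10   # iterations per minute, retrievals per
                             # iteration, avg triples per agent, |A-dot|
triples_per_interval = f * k * d * (n - 1)
print(triples_per_interval)  # 54000 triples retrieved per minute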
6 Related Work

With respect to human-machine collaboration, two main characteristics distinguish the proposed framework from existing approaches. Firstly, the control given to the users over the whole computation allows them to define how tasks should be accomplished. Secondly, automation does not rely on any central system and can happen opportunistically. In other words, in different environments, or depending on agent availability, the same task could be automated to a lesser or greater degree. In the field of Human Computation (HC) [7], human and machine efforts are combined to solve tasks that neither humans nor machines alone could solve efficiently. However, in typical HC systems, such as Galaxy Zoo (http://www.galaxyzoo.org/), humans play a subordinate role, and their collaboration relies on an existing centralised software system. In Human-Provided Services [9], users can define and advertise the services they want to offer, although they are not in charge of the overall computation. Related to HC are the fields of Social Machines (SM) and Social Computation (SC) [10]. While HC systems might not have a social dimension, SM and SC focus on how to support social interactions between users. LSCitter [3], for example, is a system supporting collaborative tasks, such as organising a meal or sharing a taxi. While these collaborations are initiated by the users, they must currently follow pre-defined protocols.

The proposed approach also shares similarities with blackboard systems [1], where a central knowledge base is used by multiple agents to collaborate, but it differs in two main respects. First of all, in the proposed approach the shared environment, or collective knowledge, is fragmented into multiple resources which cannot be accessed simultaneously. A perfectly up-to-date view of the collective knowledge is therefore impossible, as during the time required by an agent to retrieve a single resource, all the other resources might have changed. Secondly, agents can only modify their own resources, and no resource can be modified by more than one agent. This means that agents cannot agree on a single resource to be used for coordination. The indirect collaboration mechanism of the proposed framework can also be seen as a form of stigmergy. In the context of multi-agent systems, the term stigmergy traditionally refers to complex collaborations emerging from simple agents through indirect interactions mediated by an environment. In the case of rational agents, the concept of cognitive stigmergy has been proposed [8].

Given the frequent updates to the collective knowledge of all the agents, we choose to query the distributed Linked Data resources using a materialisation-based approach. Given no assumptions on which resources are likely to be updated, and how frequently, this approach guarantees that any update to the collective knowledge will propagate to all agents after at most one iteration. Under different assumptions, other distributed query strategies could be more efficient. One class of approaches assumes that query processing capabilities, such as SPARQL endpoints, are available remotely [6]. Under this assumption, queries can be split into several sub-queries that are evaluated against the remote sources. The other class of approaches, to which materialisation-based approaches belong, does not make this assumption and requires remote resources to be retrieved locally before they can be queried. This process can be optimised by creating local indexes of the remote resources based on their schema [11], on the URIs they contain, or on both of those properties [12]. This information is used to decide which resources to access when computing a query, potentially avoiding the retrieval of irrelevant resources.

7 Conclusion

This paper proposed a novel framework for decentralised human-machine collaboration. To avoid the complexity of direct interactions between distributed human and machine agents, Linked Data is used as an indirect communication mechanism. The problem of coordination is thereby translated into a knowledge-sharing problem, where the only requirement for agent participation is the ability to retrieve and publish Linked Data.
Linked Data used for collaboration is frequently updated, and agents therefore need to constantly retrieve the most up-to-date version. The necessity to regularly retrieve distributed resources imposes practical constraints on their size, which results in Linked Data being divided into small and self-contained resources. Thanks to the small size of those resources and the necessity to retrieve them frequently, this framework can make efficient use of URI dereferencing to discover Linked Data resources at runtime. To query distributed Linked Data, materialisation-based approaches are chosen over index-based ones due to the high frequency of updates and the need to compute exact answers. An algorithm is proposed to coordinate the potentially concurrent updates of distributed resources by multiple agents. A complexity analysis of this system shows that this framework can scale to a large number of agents as long as the resources shared by the agents are kept small in size and agents collaborate with only a limited number of other agents at a time.

References

1. D. D. Corkill. Blackboard Systems. AI Expert, 6(9):40–47, 1991.
2. D. Keil and D. Goldin. Indirect Interaction in Environments for Multi-agent Systems, pages 68–87. 2006.
3. D. Murray-Rust and D. Robertson. LSCitter: Building Social Machines by Augmenting Existing Social Networks with Interaction Models. In Proc. of the 23rd Int. Conf. on World Wide Web, WWW '14 Companion, pages 875–880, 2014.
4. P. Pareti, E. Klein, and A. Barker. Linking Data, Services and Human Know-How. In The Semantic Web. Latest Advances and New Domains, volume 9678 of LNCS, pages 505–520. 2016.
5. P. Pareti, B. Testu, R. Ichise, E. Klein, and A. Barker. Integrating Know-How into the Linked Data Cloud. In Knowledge Engineering and Knowledge Management, volume 8876 of LNCS, pages 385–396. 2014.
6. B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In The Semantic Web: Research and Applications, volume 5021 of LNCS, pages 524–538. 2008.
7. A. J. Quinn and B. B. Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1403–1412, 2011.
8. A. Ricci, A. Omicini, M. Viroli, L. Gardelli, and E. Oliva. Cognitive Stigmergy: Towards a Framework Based on Agents and Artifacts. In Environments for Multi-Agent Systems III, volume 4389 of LNCS, pages 124–140. 2007.
9. D. Schall. Service-Oriented Crowdsourcing: Architecture, Protocols and Algorithms, chapter Human-Provided Services, pages 31–58. 2012.
10. N. R. Shadbolt, D. A. Smith, E. Simperl, M. Van Kleek, Y. Yang, and W. Hall. Towards a Classification Framework for Social Machines. In Proc. of the 22nd Int. Conf. on World Wide Web, WWW '13 Companion, pages 905–912, 2013.
11. H. Stuckenschmidt, R. Vdovjak, J. Broekstra, and G. Houben. Towards Distributed Processing of RDF Path Queries. International Journal of Web Engineering and Technology, 2(2/3):207–230, 2005.
12. J. Umbrich, K. Hose, M. Karnstedt, A. Harth, and A. Polleres. Comparing Data Summaries for Processing Live Queries over Linked Data. World Wide Web, 14(5):495–544, 2011.