=Paper=
{{Paper
|id=Vol-2975/paper3
|storemode=property
|title=Towards Traceability in Data Ecosystems using a Bill of Materials Model
|pdfUrl=https://ceur-ws.org/Vol-2975/paper3.pdf
|volume=Vol-2975
|authors=Iain Barclay,Alun Preece,Ian Taylor,Dinesh Verma
}}
==Towards Traceability in Data Ecosystems using a Bill of Materials Model==
11th International Workshop on Science Gateways (IWSG 2019), 12-14 June 2019

Iain Barclay, Alun Preece, Ian Taylor (Crime and Security Research Institute, Cardiff University, Cardiff, UK; Email: BarclayIS@cardiff.ac.uk) and Dinesh Verma (IBM TJ Watson Research Center, 1110 Kitchawan Road, Yorktown Heights, NY 10598, USA)

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract—Researchers and scientists use aggregations of data from a diverse combination of sources, including partners, open data providers and commercial data suppliers. As the complexity of such data ecosystems increases, and in turn leads to the generation of new reusable assets, it becomes ever more difficult to track data usage, and to maintain a clear view on where data in a system has originated and makes onward contributions. Reliable traceability on data usage is needed for accountability, both in demonstrating the right to use data, and in having assurance that the data is as it is claimed to be. Society is demanding more accountability in data-driven and artificial intelligence systems deployed and used commercially and in the public sector. This paper introduces the conceptual design of a model for data traceability based on a Bill of Materials scheme, widely used for supply chain traceability in manufacturing industries, and presents details of the architecture and implementation of a gateway built upon the model. Use of the gateway is illustrated through a case study, which demonstrates how data and artifacts used in an experiment would be defined and instantiated to achieve the desired traceability goals, and how blockchain technology can facilitate accurate recording of transactions between contributors.

I. INTRODUCTION

Scientists and researchers increasingly assemble and use rich data ecosystems[1] in their experimentation. As these ecosystems expand in capability and leverage data from a diverse combination of internal sources, partners and third-party data suppliers, it is becoming necessary for users and curators of data to have reliable traceability on its origins and uses. This can be important to provide accountability[2], such as proving ownership or legitimate usage of the source data, as well as being able to identify quality or supply problems and alert users, or to seek redress when things go awry.

Using a gateway to provide traceability on data used within experiments offers mechanisms for demonstrating where data and assets derived from the data are used, as well as aiding understanding of where data contributing to a system has come from. By coupling the traceability trail with distributed ledger or blockchain technology, it is possible to provide a distributed store that can record digital data or events in a way that makes them immutable, non-repudiable and identifiable, thereby leading to a trustworthy record of fact.

Research into manufacturing, agricultural and food industries, where the need for traceability of products and their component parts is well-established, has informed the design and development of a gateway which enables data ecosystems to be described in terms of sub-assemblies of their constituent data components and supporting artifacts, in a Bill of Materials (BoM) format. Artifacts in a BoM might include data licenses, software descriptions and versions, and lists of staff or other human resources involved in producing the outputs. When the system described by the BoM is run, the BoM is instantiated, queried for the locations of data sources and populated with any dynamic values for the data or artifacts of each run, generating a Bill of Lots (BoL). The BoM and BoL together provide a record of the static and dynamic elements of the system for an invocation at a particular point in time. This allows for later inspection of the data and the supporting environment, and provides a means for scientists to trace data and artifact usage through and across experiments - for example, identifying all uses of a particular IoT sensor, all runs using a particular version of a machine learning model, or all uses of data generated by a particular researcher.

A pilot gateway, dataBoM, has been developed to allow scientists to describe data ecosystems as a Bill of Materials, containing pipelines of assemblies detailing sets of data sources and artifacts, and to instantiate the BoM into a BoL for each run of an experiment. The dataBoM gateway has been developed using GraphQL[3], which facilitates the rapid development of cross-platform applications and web services which scientists can use to generate and query BoMs and populate and store BoL records. Integration of the dataBoM gateway with blockchain or distributed ledger technologies can provide dynamic behaviour in data acquisition, as well as providing a permanent audit trail of both the data used and its supporting environment.

The remainder of this paper is structured as follows: Section II discusses the context in which the BoM model for data ecosystem traceability has been derived; the architecture and implementation of the dataBoM gateway is discussed in Section III, with Section IV describing a case study illustrating how a scientist could use the pilot gateway to conduct research using data from several sources to identify traffic congestion. Section V considers areas for future work.
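The BoM/BoL life-cycle described above can be illustrated with a minimal Python model. This is an editorial sketch, not part of the dataBoM implementation: the class and method names (`BoM`, `BoL`, `instantiate`, `record`) are hypothetical, chosen only to mirror the concepts introduced in this paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical, minimal model of the BoM/BoL life-cycle: a BoM lists the
# static components of a system; instantiating it for a run yields a BoL
# holding one "shadow" slot per component for dynamic, per-run values.

@dataclass
class DataSource:
    name: str
    data_access: str = ""  # static access location, e.g. a DOI or API URL

@dataclass
class Artifact:
    name: str
    metadata: str = ""     # e.g. a license, staff list, or model version

@dataclass
class BoL:
    bom_name: str
    run_at: datetime
    shadows: dict

    def record(self, component_name, dynamic_value):
        """Store a dynamic value (e.g. the data actually fetched) for this run."""
        self.shadows[component_name] = dynamic_value

@dataclass
class BoM:
    name: str
    data_sources: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)

    def instantiate(self):
        """Create a BoL for one run, with an empty shadow per component."""
        shadows = {c.name: None for c in self.data_sources + self.artifacts}
        return BoL(bom_name=self.name,
                   run_at=datetime.now(timezone.utc),
                   shadows=shadows)

bom = BoM("HPC Congestion",
          data_sources=[DataSource("Traffic Scene",
                                   "https://xyz.com/00001.06514.jpg")],
          artifacts=[Artifact("Congestion Model")])
bol = bom.instantiate()
bol.record("Traffic Scene", {"archived_copy": "s3://archive/run-001.jpg"})
```

Note the separation this sketch illustrates: the BoM carries only static facts (names, access locations), while each per-run BoL holds the dynamic values in its shadow slots, which is what permits later inspection of an individual invocation.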
II. REQUIREMENTS

In manufacturing industries it has been standard practice since the late twentieth century to track product through the life-cycle from its origin as raw materials, through component assembly to finished goods in a store, with the relationships and information flows between suppliers and customers recorded and tracked using supply chain management (SCM) processes[4]. In agri-food industries, traceability through the supply chain is necessary to give visibility from a product on a supermarket shelf, back to the farm and to the batch of foodstuff, as well as to other products in which the same batch has been used.

Describing data ecosystems in terms of the data supply chain provides a mechanism to identify data sources and the assets which contribute to the development of the data components, or which are produced as the results of intermediate processes. As new assets are created and used in other systems - perhaps by other parties - the supply chain mapping can be extended to give traceability on the extended data ecosystem.

A definition for traceability is provided by Opara[5], as "the collection, documentation, maintenance, and application of information related to all processes in the supply chain in a manner that provides guarantee to the consumer and other stakeholders on the origin, location and life history of a product as well as assisting in crises management in the event of a safety and quality breach."

Further helpful terminology is provided by Kelepouris, Pramatari and Doukidis[6] when discussing the traceability of information in terms of the direction of analysis of the supply chain. Petroff and Hill[7] define Tracing as the ability to work backwards from any point in the supply chain to find the origin of a product (i.e., 'where-from' relationships) and Tracking as the ability to work forwards, finding products made up of given constituents (i.e., 'where-used' relationships). Thus, an effective traceability solution should support both tracing and tracking; providing effectiveness in one direction does not necessarily deliver effectiveness in the other[6].

Jansen-Vullers, van Dorp, and Beulens[8] and van Dorp[9] discuss the composition of products in terms of a Bill of Materials (BoM) and a Bill of Lots (BoL). The BoM is the list of types of component needed to make a finished item of a certain type, whereas the BoL lists the actual components used to create an instance of the item. In other words, the BoM might specify a sub-assembly to be used, and the BoL would identify which exact batch the sub-assembly used in the building of a particular instance of a product was part of. Furthermore, a BoM can be multi-level, wherein components can be used to create sub-assemblies which are subsequently used in several different product types.

The notion of using a BoM to identify and record component parts of assets in an IT context is already established, with the US Department of Commerce working on the NTIA Software Component Transparency initiative (https://www.ntia.doc.gov/SoftwareTransparency) to provide a standardised Software BoM format to detail the sub-components in software applications. The intent is to give visibility on the underlying components used in software applications and processes such that vulnerable out-of-date modules can easily be identified and replaced. Tools such as CycloneDX (https://cyclonedx.org), SPDX (https://spdx.org), and SWID (https://www.iso.org/standard/65666.html) are defining formats for identifying and tracking such sub-components.

As well as the data used, and the efforts through standards[10] and research[11] to secure its provenance in workflows, there are many supporting assets which can be considered useful supplementary information when recording the characteristics of a data ecosystem, which Singh, Cobbe and Norval[12] have described as providing decision provenance. Hind et al. describe a document based on a Supplier's Declaration of Conformity[13] as a suitable vehicle for providing an overview of an AI system, detailing the purpose, performance, safety, security, and provenance characteristics of the overall system. At the component level, Gebru et al. explore the benefits of developing and maintaining Datasheets for Datasets[14], which replicates the specification documents that often accompany physical components, and Mitchell et al. propose a document format for AI model specifications and benchmarks[15]. Schelter, Böse, Kirschnick, Klein and Seufert[16] describe a system to automatically document the parameters of machine learning experiments by extracting and archiving the metadata from the model generation process, which would be appropriate information to store alongside the data used in a system.

Members of the scientific community are familiar with the use of workflow systems, such as Node-RED[17] and Pegasus WMS[18], to define and execute the processes for their experiments. The BoM model proposed herein is intended to augment a workflow by providing a means to add contextual traceability as the workflow progresses, such that it can be archived, and the supporting conditions retrieved and inspected later. Workflow blocks typically describe a job or a service, and do not allow other contributing artifacts to be described. The proposed BoM model describes a rich set of information per node, which can better represent the data supply chain and associated documents and payloads that are contained at each stage. By maintaining a BoM model alongside a workflow, researchers can populate and capture a record of the data for each run, as well as the supporting artifacts for each run, giving traceability of the data and the circumstances in which it was obtained and used. In practical terms, a function could be written to populate the BoL with dynamic data, and invoked at appropriate points in the workflow.

Distributed ledger technologies, such as those afforded by blockchain platforms[19], [20], provide a means of recording information and transactions between parties who do not have formal trust relationships[21], such as inter-organisational or commercial data sharing entities. The design of a blockchain system ensures that data written cannot be changed, providing a level of immutability and non-repudiation which is well suited to keeping an auditable record of events and transactions which occur between parties. Furthermore, the use of a public blockchain platform, such as the Ethereum Project[20], provides an archival resource which remains in existence long after the resources of a project have been retired. State-of-the-art blockchain platforms, including the Ethereum Project, allow for the deployment of so-called smart contracts, which can be considered to be "autonomous agents stored in the blockchain, encoded as part of a creation transaction that introduces a contract to the blockchain"[22]. Such smart contracts enable blockchain platforms to facilitate non-repudiable dynamic behaviours alongside their immutable storage capabilities.

Fig. 1. Assemblies can be chained in a BoM
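The tracing/tracking distinction above can be made concrete with a small traversal over a 'where-from' relation. The sketch below is illustrative only: the item names and the `made_from` mapping are invented for the example, not drawn from the systems cited above.

```python
# Illustrative supply-chain graph: made_from[x] lists the direct
# constituents of x ('where-from'); inverting it gives 'where-used'.
made_from = {
    "product": ["sub_assembly"],
    "sub_assembly": ["batch_42"],
    "other_product": ["batch_42"],
}

def trace(item):
    """Tracing: work backwards to find everything an item was made from."""
    found = set()
    for part in made_from.get(item, []):
        found |= {part} | trace(part)
    return found

def track(item):
    """Tracking: work forwards to find everything an item contributed to."""
    found = set()
    for parent, parts in made_from.items():
        if item in parts:
            found |= {parent} | track(parent)
    return found

print(trace("product"))   # constituents of the finished product
print(track("batch_42"))  # every item and product the batch ended up in
```

A single stored relation supports both directions of query, but only if it can be traversed both ways, which is why a traceability store that is effective for tracing is not automatically effective for tracking.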
III. A DATA TRACEABILITY GATEWAY

In this section the design and implementation of dataBoM, a gateway capable of supporting levels of tracking and tracing appropriate for providing traceability in multi-party decentralised data ecosystems, is described. The solution uses a model based on a Bill of Materials scheme, where data and supporting materials are treated as constituent components of a deployed system, which is instantiated into a unique Bill of Lots each time the deployment is run.

A. Conceptual Model

The dataBoM gateway employs a BoM model, such that each experiment utilising the system is described in terms of its data supply chain. The BoM consists of a collection of assemblies, with each assembly being an aggregation of contributing input components and an output component. An assembly will typically have at least one data input, and can produce new data as its output. Data output from one assembly can be used as a data input in a subsequent assembly within the current BoM, or used in other systems by being referenced in their BoM. To reflect this, data inputs and outputs are defined as data sources.

Assemblies can also contain artifacts, which are pertinent software components, ML models, and documentation such as licenses, staff lists, policy documentation, etc. Including artifacts in assemblies in the BoM definition ensures that each BoL retains a full record of its heritage and dependencies. An assembly can produce a new artifact as its output; for example, an assembly which describes the training of an AI model would produce the trained model as its output. The trained model would then be considered an artifact, which could be used as an input to other assemblies.

Figure 1 shows two assemblies that are chained to produce a data component (Data 1') and an artifact (Artifact 2') as outputs. Such a BoM could be used by a scientist to describe a simple AI model training process containing two assemblies. Assembly 1 represents the data labelling process, and Assembly 2 the model training process. Data 1 is an input data source, which could be training data. Artifact 1 might be a roster of the staff employed to label the data, and the central data source, Data 1' (which, as illustrated, is both the output of the data labelling assembly and the input to the model training assembly) could be a labelled data set. In the second assembly, Artifact 2 would be relevant to the model training process, for example the parameters used in training. The output artifact, Artifact 2', would be the trained model. Note that both the intermediate output, Data 1', and the final output, Artifact 2', could be further used as inputs by other processes and specified as inputs to subsequent assemblies.

The BoM defines a map of the structure of the system by providing a record of the connections between the assemblies, and provides a framework to enumerate a system's data sources and artifacts as well as any static data that applies to the contained data sources or artifacts. This static information could include a location for access to the data, for example, a Digital Object Identifier (DOI) or an API URL, and metadata specifying acceptable data threshold levels or response requirements for active quality of service (QoS) monitoring.

Each time the process described by the BoM is run, the application code for the process will instantiate a new BoL for the given BoM. In order to provide on-going traceability, a shadow data item is created for each data source and artifact in the BoM when it is instantiated in a BoL. The shadow items in the BoL are used to maintain a record of the dynamic elements of each run.

By storing and then later referencing the assemblies, data sources and artifacts in a BoM, and all the instantiations of the BoM in each BoL, along with the shadow data, it is possible to derive an overview of the history of the data lifecycle of the system, such that any item can be traced back to its origins or tracked forward to find all its consumers.

One of the roles for the data source elements specified in the BoM is to store the means to access the data when the experiment is run. In many cases this will be via a URL parameterised dynamically at runtime - the static entities of the URL could be stored in the data source as part of the BoM, with the dynamic parameters and the results stored in the shadow data item of the BoL. The intent of the design is that there is flexibility of type, so any metadata could be stored in the BoM and retrieved and interpreted in the application process. Uses of this metadata could include storing encrypted information, which is unencrypted and subsequently used by the client application. Further, the metadata could include information to initiate an asynchronous data request and an endpoint to which the data should be delivered. The intent is to provide a flexible storage slot for static data about the data, which can be retrieved, interpreted and used by the client application.

Experimentally, it has been possible to use the dataBoM gateway pilot to store and retrieve an encoded blockchain contract address and function interface from a data source, and use this information to initiate a blockchain transaction from the client application to retrieve data at runtime. Such a transaction could be used to provide immutable proof of a data request, or for gateway users to have a means to access third-party data on a pay-per-use basis, which is discussed further in Section VI.

B. The dataBoM Gateway

The dataBoM gateway provides a working implementation of the conceptual data ecosystem BoM model[23] and enables researchers to declare BoMs to describe the data components of their experiments, and instantiate BoLs to preserve contextual records for each run to provide traceability.

The gateway server is written in Node.js (https://nodejs.org/en/), using Apollo GraphQL Server (https://www.apollographql.com/docs/apollo-server/), which acts as an abstraction layer above the gateway's MongoDB database store.

GraphQL allows developers to specify a data schema, and define queries and mutations, which are interfaces to allow reading and writing of the data, respectively. The GraphQL schema, queries and mutations are public interfaces, which hide the details of the underlying data storage from users of the interfaces. The server's data store does not have to match the GraphQL schema, as the server code which implements the queries and mutations performs the mapping to read and write the correct data to its database. GraphQL is intended to provide an efficient transfer of data between client and server, as queries can be written to request only the data needed. Furthermore, the gateway's API can be enhanced by extending the queries and mutations offered, without implications for existing users.

The GraphQL interface is self-documenting, and can be queried by client application developers to find out the data structures, queries and mutations available to them. The dataBoM gateway offers access to its GraphQL server via an https end-point for API access.
Fig. 2. The dataBoM Pilot Gateway

The architecture of the dataBoM gateway is shown in Figure 2. The gateway is to be offered as a web service, with interactions between the gateway and researchers conducted through a web interface or via an API.

The pilot version of the dataBoM gateway stores data in a MongoDB (https://www.mongodb.com) database, such that queries can be written to provide traceability on data sourcing and data use for any BoM. Further development of the gateway will explore the off-loading of the archival of the BoMs and BoLs to commons-based decentralised storage, such as IPFS[24], with indexing secured on a public ledger or blockchain. This will serve to preserve records beyond the lifetime of the gateway, and provide an immutable record of events, suitable for later audit or inspection.

The dataBoM gateway is initially hosted on an intranet, and it is envisaged that future versions of the gateway will be migrated to public-facing web services, or serverless[25] environments, such as AWS AppSync (https://aws.amazon.com/appsync/), to provide a robust and reliable service.

C. Integration with Client Applications

To take advantage of the traceability capabilities provided by the dataBoM gateway, scientists should use the supplied API to define a BoM for their experiments, detailing the assemblies, data sources and artifacts required in their processes, passing the desired parameters and retaining the identifiers which are returned by the API calls in order to chain entities together - for example, when creating a data source item, the identifier that is returned should be retained so that it can be used as a parameter when creating an assembly.

Once the BoM is defined, the researcher should instantiate the BoM whenever they run their experiment, and then use the API from their application code to query the experiment's BoM for static factors such as the locations of data assets, with any dynamic state arising during experimentation (e.g. data values) being written to the BoL via the API as the experiment progresses.

Use of the API requires the researcher to integrate a GraphQL client library with their application code or workflow scripts, and support is available for popular web and mobile platforms, including Python, Node.js, iOS and Android.

The steps in the integration would typically include:
• Define data sources, artifacts and assemblies in the BoM
• Use the BoM's ID to instantiate a new BoL for a new run
• Access data source metadata for the data location or endpoint
• On receipt of data, populate the data source shadow in the BoL

In this way, the BoM and the BoL can combine to generate an evidence trail of the dynamic data values and the static components of the data and supporting artifacts which contributed to each run of an experiment. Section IV, below, describes a case study implementation, to provide further insight and explanation of dataBoM integration and usage.

IV. CASE STUDY

By way of illustration of the use of the dataBoM gateway, consider a simple software application which serves to provide a 'traffic congestion score' for a fixed location, e.g., Hyde Park Corner, depending on how much traffic the application determines is currently at the location. This simple process has a single assembly, Traffic Scene Analysis, an input data source Location Photo, an ML model artifact Congestion Model and an output data source Congestion Score (Figure 3).

Fig. 3. The components of a simple traffic congestion system

In defining the BoM for the Hyde Park Corner (HPC) congestion rating process, the scientist should give each element a name and an optional description, and declare static elements, such as the URL to be used to retrieve a live photo from the location of interest. Encoding this simple single-assembly process as a BoM through the gateway's API gives a data model as shown in Listing 1, which is the result of a GraphQL query on the BoM's entry.

  "bom": {
    "name": "HPC Congestion",
    "description": "Determine congestion levels on Hyde Park Corner",
    "assemblies": [
      {
        "name": "Traffic Scene Analysis",
        "description": "Determine congestion at Hyde Park Corner",
        "inputData": [
          {
            "name": "Traffic Scene",
            "dataAccess": "https://xyz.com/00001.06514.jpg"
          }
        ],
        "outputData": [
          {
            "name": "Result"
          }
        ],
        "inputArtifacts": [
          {
            "name": "Congestion Model"
          }
        ]
      }
    ]
  }

Listing 1. GraphQL data schema for HPC Congestion BoM

In the application code for the experiment, the BoM should be instantiated via its identifier to generate a new BoL for the run. As the code runs, it should refer to its BoM (via the instantiated BoL) to get locations for data it needs to access, and write any dynamic information to its BoL for permanent archival.

In the HPC congestion scoring example, the data source for the traffic scene holds a static URL for a live camera. The scientist's code would retrieve this information through the dataBoM API and access the photo, and (if desired) store a permanent copy of the photo in its own archives, writing a reference to the location of the archived copy to the shadow data item, such that it will be saved as part of the archival of the BoL. The resultant congestion score should also be written to the BoL, by referencing the appropriate data source item. Thus, each data source and artifact in every BoL would have any dynamic values recorded and stored in a database as a persistent record of the run, so that each of the assemblies in the BoL would have traceable input and output data values which could be accessed at a later date.

V. DISCUSSION

There are a number of interesting directions in which future development of the dataBoM gateway could be taken. Interaction with the gateway is currently provided by a GraphQL API, which provides good integration with the application code at runtime; however, initial definition of the BoM and its elements would be more intuitive if it were facilitated through a visual UI. Thus, the BoM could be authored using a visual interface via a web browser, with the runtime invocation and interaction with the BoL remaining an API-driven task. There is a similar opportunity to add a visual interface to the overview of each experiment logged by the gateway. Such an interface would provide a means to explore the composition of the data and artifact elements of each experiment, and help to satisfy the traceability goals of the gateway, by providing a convenient means of exploring the nodes in the BoM and each BoL.

Integration of the dataBoM gateway with the workflow manager systems that are popular in the research community will facilitate smoother integration of the gateway into experiment workflows, and help to foster acceptance of the benefits of the BoM model in providing traceability in scientific data ecosystems.

There is scope to extend and deepen the integration of the gateway and its BoM and BoL models with blockchain technologies, such as the programmable smart contracts provided by the Ethereum blockchain platform. By associating smart contracts with the data sources and artifacts from the BoM model, novel dynamic behaviour in data ecosystems can be explored. Such dynamic behaviours might include runtime selection of the most appropriate data source sets, along with automatic remuneration and sanctioning, based on dynamic measures of data quality. Further development of the dataBoM gateway could provide a means by which scientists are able
to share data and artifacts with their peers, and a blockchain platform might underpin this. Related to blockchain integration is motivation to explore traceability on the human side of the experimental process, using Decentralised Identifiers (DIDs, https://w3c-ccg.github.io/did-spec/) to associate researchers or crowd-workers with components of the system and to provide a means to trace their activity and the data and artifacts they are associated with.

VI. CONCLUSION

The dataBoM gateway provides scientists and developers with a means to map the overall structure of the components that make up complex data ecosystems used in their experiments. By going beyond the data, and considering other contributing factors such as the software and hardware which produces or manages the data, licenses which govern the use and sharing of the data, and policies which contributed to the generation of the data, the development of a BoM for each system provides a mechanism to archive the ecosystem for each experiment. Instantiating the BoM into a BoL each time the system runs augments the static parts list with a dynamic traceable view into every invocation of the system, such that the data inputs, data outputs and any artifacts which are used or produced by the system can be archived, readily identified and traced back to their source. Similarly, future users of produced data and artifacts, such as models, can be identified, which could prove to be very important if errors are later found and are notifiable. Storing metadata capable of identifying smart contracts on the blockchain further enables immutable recording of the action and timing of requests for data provision, along with the potential for encoding quality of service requirements, and providing automatic payment for services.

ACKNOWLEDGMENT

This research was sponsored by the U.S. Army Research Laboratory and the UK Ministry of Defence under Agreement Number W911NF-16-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the UK Ministry of Defence or the UK Government. The U.S. and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES

[1] M. I. S. Oliveira, G. d. F. B. Lima, and B. F. Lóscio, "Investigations into data ecosystems: a systematic mapping study," Knowledge and Information Systems, pp. 1–42, 2019.
[2] N. Diakopoulos, "Accountability in algorithmic decision making," Communications of the ACM, vol. 59, no. 2, pp. 56–62, 2016.
[3] L. Byron, "GraphQL: A data query language." [Online]. Available: https://code.facebook.com/posts/1691455094417024/graphql-a-data-query-language
[4] D. M. Lambert, M. C. Cooper, and J. D. Pagh, "Supply chain management: implementation issues and research opportunities," The International Journal of Logistics Management, vol. 9, no. 2, pp. 1–20, 1998.
[5] L. U. Opara, "Traceability in agriculture and food supply chain: a review of basic concepts, technological implications, and future prospects," Journal of Food Agriculture and Environment, vol. 1, pp. 101–106, 2003.
[6] T. Kelepouris, K. Pramatari, and G. Doukidis, "RFID-enabled traceability in the food supply chain," Industrial Management & Data Systems, vol. 107, no. 2, pp. 183–200, 2007.
[7] J. N. Petroff and A. V. Hill, "A framework for the design of lot-tracing systems for the 1990s," Production and Inventory Management Journal, vol. 32, no. 2, p. 55, 1991.
[8] M. H. Jansen-Vullers, C. A. van Dorp, and A. J. Beulens, "Managing traceability information in manufacture," International Journal of Information Management, vol. 23, no. 5, pp. 395–413, 2003.
[9] C. van Dorp, "A traceability application based on gozinto graphs," in Proceedings of EFITA 2003 Conference, 2003, pp. 280–285.
[10] P. Missier, K. Belhajjame, and J. Cheney, "The W3C PROV family of specifications for modelling provenance metadata," in Proceedings of the 16th International Conference on Extending Database Technology. ACM, 2013, pp. 773–776.
[11] S. B. Davidson and J. Freire, "Provenance and scientific workflows: challenges and opportunities," in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008, pp. 1345–1350.
[12] J. Singh, J. Cobbe, and C. Norval, "Decision provenance: Harnessing data flow for accountable systems," IEEE Access, vol. 7, pp. 6562–6574, 2019.
[13] M. Hind, S. Mehta, A. Mojsilovic, R. Nair, K. N. Ramamurthy, A. Olteanu, and K. R. Varshney, "Increasing trust in AI services through supplier's declarations of conformity," arXiv preprint arXiv:1808.07261, 2018. [Online]. Available: https://arxiv.org/pdf/1808.07261.pdf
[14] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, and K. Crawford, "Datasheets for datasets," arXiv preprint arXiv:1803.09010, 2018. [Online]. Available: https://arxiv.org/abs/1803.09010
[15] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, "Model cards for model reporting," in Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 2019, pp. 220–229.
[16] S. Schelter, J.-H. Böse, J. Kirschnick, T. Klein, and S. Seufert, "Automatically tracking metadata and provenance of machine learning experiments," in Machine Learning Systems Workshop at NIPS, 2017.
[17] "Node-RED: Flow-based programming for the internet of things." [Online]. Available: https://nodered.org/
[18] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. F. Da Silva, M. Livny et al., "Pegasus, a workflow management system for science automation," Future Generation Computer Systems, vol. 46, pp. 17–35, 2015.
[19] S. Nakamoto et al., "Bitcoin: A peer-to-peer electronic cash system," 2008.
[20] G. Wood, "Ethereum: A secure decentralised generalised transaction ledger," Ethereum Project Yellow Paper, vol. 151, pp. 1–32, 2014.
[21] D. Tapscott and A. Tapscott, "How blockchain will change organizations," MIT Sloan Management Review, vol. 58, no. 2, p. 10, 2017.
[22] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, "Making smart contracts smarter," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 254–269.
[23] I. Barclay, A. Preece, I. Taylor, and D. Verma, "A conceptual architecture for contractual data sharing in a decentralised environment," arXiv preprint arXiv:1904.03045, 2019.
[24] J. Benet, "IPFS - content addressed, versioned, P2P file system," arXiv preprint arXiv:1407.3561, 2014.
[25] E. Jonas, J. Schleier-Smith, V. Sreekanti, C.-C. Tsai, A. Khandelwal, Q. Pu, V. Shankar, J. Carreira, K. Krauth, N. Yadwadkar et al., "Cloud programming simplified: A Berkeley view on serverless computing," arXiv preprint arXiv:1902.03383, 2019.