A personal agent-based approach for API evolution

Cristian Vasquez

IDLab, Department of Electronics and Information Systems, Ghent University – imec, cristian.vasquezpaulus@ugent.be

Abstract. Personal agents and APIs can enable personalized applications that facilitate the daily processes a person uses to gather, classify, persist and retrieve information. General-purpose APIs will cover part of the functionality, but other functionality needs a higher degree of customization. This thesis focuses on systems where relying on common models is difficult, and investigates a bottom-up approach in which personal agents expose hypermedia APIs and copy, transform and combine API features from others, as a mechanism to improve their means of interaction.

Keywords: agents, hypermedia, emergent semantics, semantic web service discovery, personal dataspace

1 Introduction

We refer to a personal dataspace as the grouping of all kinds of digital information that is gathered and stored by an individual over long periods of time, under that person’s control (but not necessarily exclusively so) [1]. In recent years, several decentralized social networks have been proposed [2], in which applications can access a unified, user-controlled dataspace. In such a decentralized setup, one would want intelligent agents to assist users in their activities while having services tailored to their needs. Furthermore, the agents might interact with other agents through APIs, requiring (i) syntactic interoperability, i.e., establishing the basic symbols to exchange a representation without losing structure or information, and (ii) semantic interoperability, i.e., reducing interpretation gaps to trigger acceptable behaviors through interactions. Interoperability is traditionally achieved when peers can refer to a common reference model or ontology, but this becomes increasingly difficult in decentralized scenarios.
The representations that agents exchange depend on how their schemas are aligned, but in some cases direct mappings to common models are not available. In these setups, an agent would prefer to gather, infer or compose ad-hoc mappings while interacting with its peers, in order to interpret, transform or negotiate structured content. Also, in a growing ecosystem of applications that are separated from storage, we can expect a significant number of models to change over time. In this context, provenance is desirable because it helps to recover the originally intended meaning in the long term.

This work focuses on cases where reaching global interoperability is difficult, and investigates an approach where APIs are decomposed into features that are exchanged in a network through clone, mutation, divergence and convergence mechanisms, while persisting provenance trails of the API evolution. The utility of such an approach is three-fold: (i) to promote API evolution through agents that collect API functionality from others, (ii) to generate useful artifacts for service discovery, and (iii) to incrementally build mappings between the underlying models of each agent, to improve their means of interaction.

2 Relevancy

Fig. 1. Personal agent exposing APIs generated by means of reasoning and resources in the web

There is a growing interest in APIs that adapt to the data of a user, as they show significant improvements over fixed designs according to a specific user profile. One use case is health-care applications, where care plans need to be tailored to medical patient profiles and coordinated across several care providers. A centralized approach has drawbacks, since it is difficult to acquire and continuously update the user models required to provide sophisticated tailoring [3]. One approach is to have applications that consume APIs exposed by a personal agent with access to a personal dataspace and external sources.
This was the case of the GPS4IntegratedCare¹ project, which developed a Smart Workflow Engine to suggest and execute care paths tailored to complex medical patient profiles. One of the outputs was an agent designed according to the following requirements: (i) simplicity of a uniform interface, (ii) visibility of features to other service agents, and (iii) portability of components by moving from program code to data. The result was a modular design that enables building APIs from a stock of building blocks, or features, that are added to the agent dataspace. A feature is defined by patterns that specify which RDF triples to use as data, which N3 documents to use as rules, and which query to evaluate. The approach extensively applies hypermedia notions to Semantic Web Services and will be further explored in this work.

3 Related work

To create the proposed system I will use existing research from two fields, which I categorize as Hypermedia-driven APIs and Bottom-up Interoperability Elicitation.

3.1 Hypermedia-driven APIs

Extensive research has been done on the discovery, composition, and orchestration of Semantic Web Services [4], which traditionally make use of central registries where developers advertise their services. More recent and complementary approaches are based on hypermedia. Hydra [5] is a lightweight vocabulary to create hypermedia-driven Web APIs, whereas RESTdesc [6] focuses on capturing the functionality of APIs to be used by agents. Discovery of linked data interfaces has been proposed in [7], employing hypermedia links and controls to facilitate source selection.

3.2 Bottom-up interoperability elicitation

Since this study focuses on cases where reaching global interoperability is difficult, of particular interest are the approaches that do not aim to reach it.
One is the FRISCO report [8], which considers constructivist notions essential to building information systems: shared knowledge can be seen as the sum of all individual conceptions of the community, whereas personal knowledge is closer to the individual’s inner reflections. Taking this into consideration, we can say that models need to co-evolve along with their communities of use, disclosing multiple perspectives. A community can engineer vocabularies and ontologies through distinct strategies or ontology-engineering methodologies [9] that address the inherent difficulty of managing a dynamic artifact that reflects our gradual understanding of reality. How to characterize ontology change was studied through layered change operators [10]. The contrast between classical data integration techniques, such as ontology matching [11], and bottom-up interoperability elicitation is studied in [12] and summarized in [13]. Of particular relevance is the bottom-up approach presented by Karl Aberer et al. [14], which aims to incrementally develop a global agreement in a network of peers that gossip queries while building semantic neighborhoods, a notion adopted in this approach.

¹ https://www.imec-int.com/en/what-we-offer/research-portfolio/gps4integratedcare

4 Research questions and hypotheses

The following questions (Q) and related hypotheses (H) investigate how hypermedia APIs can be applied in the context of personal agents interacting on the Web, and how provenance trails can be used for service discovery in a decentralized setup. This work refers to provenance trails as a record of how API features are copied, transformed and exchanged between agents. A trail corresponds to a sequence of operators that transform API features into others, and it can be collected by following links.

– (Q1) Can agents expose resources and service descriptions from their underlying dataspaces as Linked Data to be discoverable by others?
– (Q2) How can trails be represented so that they are suitable to characterize API feature change?
– (Q3) To what extent can information derived from provenance trails influence the relevance and precision of API feature search and discovery?
– (H1) An agent can discover and clone services from other agents by relying solely on Linked Data principles.
– (H2) It is possible to gather provenance trails and discover Semantic Web Services of high relevance without the need for a central service registry.

5 Preliminary results

The model is used in the GPS4IC agent, which is part of the GPS4IntegratedCare² project, to provide RESTful APIs to be used by integrated health-care applications. The agent is a Node.js application that interprets JSON-LD³ files containing sets of API features, which use patient data, domain-specific knowledge rules, and other external data sources to build APIs at runtime for the health-care domain.

² https://www.imec-int.com/en/what-we-offer/research-portfolio/gps4integratedcare
³ https://json-ld.org/spec/latest/json-ld

The GPS4IC agent organizes data in private and public dataspaces, namely (1) patient dataspaces, (2) service definition dataspaces and (3) knowledge dataspaces, which are logical constructs that group resources in a directory structure for simple storage, navigation, and access. Within each directory node, one can define inference operations O with links to input data and queries. Data can be N3 files, responses from a SPARQL endpoint, invocations of other operations, and so on. An API is generated at runtime and exposed at a path P using the set of operations O that reside in the directory path equivalent to P.

Fig. 2. The current agent uses an interpreter of file-based service descriptions to generate a feature dependency graph that is used by a hypermedia controller to expose APIs matching a request

All the core functionality, such as generating dynamic workflows for the patient and detecting medication conflicts, as well as domain-specific knowledge like the definition of diseases, does not reside in the agent. Functionality exists as resources that can be linked, cloned and shared using change operators such as (i) add/remove an inference feature, (ii) import a feature, and (iii) add/remove a resource. Operations often need definitions of resource sets, which in this case are matches of URI template patterns⁴ applied to the corresponding directory structure. Agents do not maintain explicit internal schemas but commit to (believe in) a set of health-care knowledge resources.

6 Approach

This approach is framed in Web environments, specifically in the domain of Web API elicitation and intelligent agent support.

6.1 Feature-based

From an API perspective, this work refers to agents that can play client and server roles interchangeably, consuming APIs and also exposing them to others. Each API consists of an interface implemented by components called features, logical constructs that afford interaction or functionality to be reused [15]. A feature can be exposed through hypermedia to be discovered, copied, modified and so on. It can be self-contained, aggregated, or linked to the outputs of other features. For instance, a “my contacts” list feature can be used by a “today’s birthdays” list feature by filtering it, or a dynamic workflow can depend on features that expose execution steps. Having API interfaces separated from features gives flexibility to agents, enabling them to choose the features that best match their models [7] or to perform modifications at the implementation level to minimize changes to their API interfaces.
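The linking of features to the outputs of other features, as in the contacts and birthdays example, can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `Feature` record and an `evaluate` helper; these names, the feature identifiers `my-contacts` and `today-birthdays`, and the sample contact data are illustrative assumptions and are not part of the GPS4IC agent, which interprets N3 rules, RDF data and queries rather than Python functions.

```python
from dataclasses import dataclass, field
from datetime import date
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class Feature:
    """Hypothetical stand-in for an API feature: a name, the features it is
    linked to, and a function deriving its output from their outputs."""
    name: str
    compute: Callable[[Dict[str, Any]], Any]
    depends_on: List[str] = field(default_factory=list)


def evaluate(features: Dict[str, Feature], target: str) -> Any:
    """Return the output of `target`, evaluating linked features first.

    For simplicity every registered feature is evaluated, not only the
    ones that `target` actually needs.
    """
    graph = {name: set(f.depends_on) for name, f in features.items()}
    outputs: Dict[str, Any] = {}
    # static_order() yields dependencies before their dependents and
    # raises graphlib.CycleError if the links form a cycle.
    for name in TopologicalSorter(graph).static_order():
        feature = features[name]
        inputs = {dep: outputs[dep] for dep in feature.depends_on}
        outputs[name] = feature.compute(inputs)
    return outputs[target]


# Illustrative data only.
CONTACTS = [
    {"name": "Ada", "birthday": date(1990, 5, 17)},
    {"name": "Bob", "birthday": date(1985, 11, 2)},
]


def todays_birthdays(today: date) -> Feature:
    """Build a feature that filters the output of the contacts feature."""
    def compute(inputs: Dict[str, Any]) -> List[str]:
        return [
            c["name"]
            for c in inputs["my-contacts"]
            if (c["birthday"].month, c["birthday"].day) == (today.month, today.day)
        ]
    return Feature("today-birthdays", compute, depends_on=["my-contacts"])


features = {
    "my-contacts": Feature("my-contacts", lambda _inputs: CONTACTS),
    "today-birthdays": todays_birthdays(date(2024, 5, 17)),
}
result = evaluate(features, "today-birthdays")
```

Because the interface of `today-birthdays` only names the feature it links to, the contacts feature could in principle be swapped for one cloned from another agent without changing the dependent feature, which is the kind of flexibility this section argues for.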
This approach requires features to be transferred between agents. Although this can be done in several ways, I have already obtained results with an agent design where service descriptors are interpreted to expose APIs using only N3 rules, RDF data, and queries. In this way, the API functionality can be directly exchanged through the Web using simple dereferencing or linking. API responses also expose links to functional descriptions [6] of the underlying features, with metadata about dependencies such as knowledge bases, external APIs or other features.

6.2 Provenance trails

Each time a feature changes within an agent or is cloned from another agent, new data is registered in what I call provenance trails, which record the history of feature change and exchange between agents. To characterize and keep track of feature change, I adopt an approach used in ontology engineering based on layered change operators [10]. In this way, a feature is the result of applying an operator to the previous one, which makes it possible to represent provenance as chains of operations.

⁴ https://tools.ietf.org/html/rfc6570

Layers are used to support distinct degrees of granularity, categorizing change operators from atomic (e.g., adding a triple) to high-level (e.g., adding a resource collection such as a photo gallery). The use of layered change operators allows characterizing feature change as a complete sequence of steps and makes it easier to understand API evolution through inspection of sequences of high-level operators or their equivalent atomic ones. Other uses are to extract new high-level operators from atomic ones through pattern mining [16], or to enable feature discovery by computing the similarity between their respective operator sequences [17]. I expect these trails to be valuable data to support bottom-up interoperability through data mining.
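The idea of a trail as a chain of layered operators can be sketched as follows. This is only an illustration under stated assumptions: the `ChangeOperator` class, the operator names and the two sample trails are hypothetical, and the similarity measure is Python's generic `difflib.SequenceMatcher` used as a stand-in, not the graph-based technique of [17].

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass(frozen=True)
class ChangeOperator:
    """Hypothetical change operator. `expansion` lists the atomic operators
    that a high-level operator stands for; it is empty for atomic ones."""
    name: str
    expansion: Tuple["ChangeOperator", ...] = ()

    @property
    def layer(self) -> str:
        return "high-level" if self.expansion else "atomic"


def to_atomic(trail: List[ChangeOperator]) -> List[str]:
    """Rewrite a trail as the equivalent sequence of atomic operator names."""
    names: List[str] = []
    for op in trail:
        if op.expansion:
            names.extend(to_atomic(list(op.expansion)))
        else:
            names.append(op.name)
    return names


def trail_similarity(a: List[ChangeOperator], b: List[ChangeOperator]) -> float:
    """Similarity in [0, 1] between two trails, compared at the atomic layer."""
    return SequenceMatcher(None, to_atomic(a), to_atomic(b)).ratio()


# Illustrative operators: two atomic ones and one high-level operator
# defined in terms of them.
add_triple = ChangeOperator("add-triple")
add_rule = ChangeOperator("add-rule")
add_gallery = ChangeOperator(
    "add-resource-collection", (add_triple, add_triple, add_rule)
)

trail_a = [add_gallery, add_triple]            # recorded with a high-level operator
trail_b = [add_triple, add_triple, add_rule]   # recorded with atomic operators only

atomic_a = to_atomic(trail_a)
similarity = trail_similarity(trail_a, trail_b)
```

Comparing trails at the atomic layer means that two features recorded at different granularities can still be recognized as having a similar history, at the cost of ignoring the intent captured by the high-level operators.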
As an example of mining these trails, if the transitive closure of same-functionality links forms a cycle within a network of features, we can suspect that those features can be marked for reconciliation through a single interface.

6.3 Mappings between personal dataspaces

In the network, agents might have joined semantic communities and peers providing similar contents (semantic neighborhoods) [18]. If two agents want to interact using features with models located in two disjoint neighborhoods, they might exchange their features and models and provide mappings between them. This results in directed graphs of mappings that allow querying between neighborhoods, translating queries expressed in the terms of their originator and possibly providing access to the data of a target dataspace. In this sense, trails can be valuable to form and find semantic neighborhoods in the network by means of following links.

6.4 Metrics of feature evolution

Data about how API features change can be useful to learn about the dynamics of agents and their underlying models. For example, operator sequences allow generating quantitative indicators of feature articulation effort, speed of change, or reusability. The data can then be analyzed with tools and techniques for mining software repositories, such as dependency network analysis [19]. On the other hand, feature exchange graphs can be used to analyze the dynamics of the agent network, identifying the most transferred features or dominant API interfaces, or observing how semantic neighborhoods converge or diverge when the semantics of underlying models change. Metrics will be used in a trade-off analysis to determine under what conditions this approach is effective.

7 Evaluation plan

The hypotheses will be tested through two research aspects:

1. To evaluate the effects of provenance trails via simulations of agents that exchange features through hypermedia in a decentralized environment. These effects will be measured through API evolution metrics and benchmarked against current semantic web service discovery approaches.
2. To evaluate the impact that keeping such trails has on users while they customize their agents, assisted by service discovery mechanisms without a central service registry.

H1 will be validated if the instruments of (1) demonstrate automatic feature discovery and exchange using hypermedia APIs while collecting trails. H2 will be validated if (2) demonstrates lower API articulation effort on the part of users. Several tasks still need to be done to perform the studies and evaluate the results, answering the research questions.

– To develop vocabularies or ontologies to represent (i) feature exposition, to expose and transfer functional descriptions and their dependencies, (ii) feature change operators, with representative instances organized in layers, and (iii) feature exchange (Q1), (Q2).
– To perform an extensive trade-off analysis of the approach (Q3).

8 Conclusion

This work presents an approach where a personal agent collects functionality from other agents, improving their means of interaction while keeping model autonomy. It also introduces provenance trails to support automated API evolution when possible and, when not, to diminish the costs that API composition has for humans. Considerable work remains regarding the implementation of ontologies and experiments, which I believe will contribute to the state of the art. Since this approach is specific, I understand it as complementary to existing approaches such as automatic schema mapping algorithms [11], mapping gossiping [14] between peers and the current Semantic Web Service techniques [4]. There is room for innovation regarding highly customizable personal agents that assemble functionality around personal data or even, in more abstract terms, around any relevant life event of a user. I believe that feature reuse is a good compromise between a one-size-fits-all solution and complex API programming.
In the same way, personal agents can help users reach a certain degree of alignment and coherence between personal dataspaces, something essential if users want to interact with each other and build something valuable with the data they own.

9 Acknowledgments

I would like to thank Miel Vander Sande for his useful advice.

References

[1] G.-J. Houben, “Linking Personal Data: towards a web of digital memories,” Building, no. 342, p. 4, 2010.
[2] T. Paul, A. Famulari, and T. Strufe, “A survey on decentralized online social networks,” Computer Networks, vol. 75, pp. 437–452, 2014.
[3] A. Cawsey, F. Grasso, and C. Paris, “The adaptive web,” pp. 465–484, 2007.
[4] D. Fensel, F. M. Facca, E. Simperl, and I. Toma, Semantic web services. Springer Science & Business Media, 2011.
[5] M. Lanthaler, “Creating 3rd generation web APIs with Hydra,” in Proceedings of the 22nd international conference on world wide web, 2013, pp. 35–38.
[6] R. Verborgh, D. Arndt, S. Van Hoecke, J. De Roo, G. Mels, T. Steiner, and J. Gabarro, “The Pragmatic Proof: Hypermedia API Composition and Execution,” 2015.
[7] M. Vander Sande, R. Verborgh, A. Dimou, P. Colpaert, and E. Mannens, “Hypermedia-based Discovery for Source Selection using Low-Cost Linked Data Interfaces,” International Journal on Semantic Web and Information Systems, vol. 12, no. 3, pp. 79–110, 2016.
[8] E. D. Falkenberg, W. Hesse, P. Lindgreen, B. E. Nilsson, J. L. H. Oei, C. Rolland, R. K. Stamper, F. J. M. Van Assche, A. A. Verrijn-Stuart, and K. Voss, A framework of information system concepts - The FRISCO Report, vol. 4. International Federation for Information Processing WG 8.1, 1998, pp. 282–7.
[9] A. Zouaq, “A Survey of Domain Ontology Engineering: Methods and Tools.”
[10] M. Javed, Y. M. Abgaz, and C. Pahl, “A pattern-based framework of change operators for ontology evolution,” On the Move to Meaningful Internet Systems: OTM Workshops, vol. 5872, pp. 544–553, Jun. 2009.
[11] J. Euzenat, J. Barrasa, P. Bouquet, and R. Dieng, “D2.2.3: State of the art on ontology alignment,” The Contributor, 2004.
[12] P. Cudré-Mauroux, “Emergent semantics,” in Encyclopedia of database systems, Springer, 2009, pp. 982–985.
[13] K. Aberer, T. Catarci, P. Cudré-Mauroux, T. Dillon, S. Grimm, M.-S. Hacid, A. Illarramendi, M. Jarrar, V. Kashyap, M. Mecella, E. Mena, E. J. Neuhold, A. M. Ouksel, T. Risse, M. Scannapieco, F. Saltor, L. de Santis, S. Spaccapietra, S. Staab, R. Studer, and O. De Troyer, “Emergent Semantics Systems,” in Semantics of a networked world. Semantics for grid databases, 2004, pp. 14–43.
[14] K. Aberer, P. Cudré-Mauroux, and M. Hauswirth, “The chatty web: emergent semantics through gossiping,” Proceedings of the 12th international conference on World Wide Web, pp. 197–206, 2003.
[15] R. Verborgh and M. Dumontier, “A Web API ecosystem through feature-based reuse,” pp. 1–12, 2016.
[16] M. Javed, Y. M. Abgaz, and C. Pahl, “Layered Change Log Model: Bridging between Ontology Change Representation and Pattern Mining,” International Journal of Metadata Semantics and Ontologies, vol. 9, no. 3, pp. 184–192, 2014.
[17] M. Javed and Y. Abgaz, “Graph-based discovery of ontology change patterns,” Change, pp. 1–16, 2011.
[18] D. Bianchini, S. Montanelli, C. Aiello, R. Baldoni, C. Bolchini, S. Bonomi, S. Castano, T. Catarci, V. De Antonellis, A. Ferrara, and others, “Emergent semantics and cooperation in multi-knowledge communities: The esteem approach,” World Wide Web, vol. 13, nos. 1-2, pp. 3–31, 2010.
[19] R. Kikas, G. Gousios, M. Dumas, and D. Pfahl, “Structure and evolution of package dependency networks,” in Proceedings of the 14th international conference on mining software repositories, 2017, pp. 102–112.