Preface of SeWeBMeDA 2018: Semantic Web solutions for large-scale biomedical data analytics

Ali Hasnain1, Oya Beyan2, Stefan Decker2, and Dietrich Rebholz-Schuhmann1

1 Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
{ali.hasnain,rebholz}@insight-centre.org
2 Fraunhofer FIT, RWTH Aachen University
{beyan,decker}@dbis.rwth-aachen.de

The second edition of the SeWeBMeDA workshop (SeWeBMeDA-2018) invited papers on life sciences and biomedical data processing, and on combining it with Linked Data and Semantic Web technologies for better data analytics, knowledge discovery and user-targeted applications. This workshop at the Extended Semantic Web Conference (ESWC) targeted original contributions describing theoretical and practical methods and techniques that present the anatomy of large-scale linked data infrastructure. This covers the distributed infrastructure to consume, store and query large volumes of heterogeneous linked data; the use of indexes and graph aggregation to better understand large linked data graphs; query federation to mix internal and external data sources; and linked data visualisation tools for health care and life sciences. The workshop further covered data integration, data profiling, data curation, querying, knowledge discovery, ontology mapping/matching/reconciliation, data/ontology visualisation, and applications, tools, technologies and techniques for the life sciences and biomedical domain. SeWeBMeDA aims to give researchers in the biomedical and life sciences insight into, and awareness of, large-scale data technologies for linked data, which are becoming increasingly important for knowledge discovery in the life sciences domain. This year, we accepted three papers, invited a keynote speaker, organised a short hackathon, and discussed current issues along with future steps for large-scale data in the biomedical domain.
The keynote talk was given by Maria-Esther Vidal, head of the Scientific Data Management group at TIB Leibniz Information Centre for Science and Technology, Germany, and full professor (on leave) at Universidad Simón Bolívar (USB), Venezuela. Her research interests include big data and knowledge management, knowledge representation, and the Semantic Web. She has published more than 130 peer-reviewed papers in Semantic Web, Databases, Bioinformatics, and Artificial Intelligence. Her talk, titled "Synthesizing Big Data into Actionable Knowledge", discussed how Big data promotes emerging scientific and interdisciplinary research by enabling decision-making. She described how a knowledge-driven approach can ingest Big data sources and integrate them into a knowledge graph. This graph not only represents the meaning of the entities published by these data sources but also provides the basis for discovering unknown patterns and associations between them. The features of this knowledge-driven framework were shown in the context of the EU-funded project iASiS (http://project-iasis.eu/), where it is used to pave the way for personalized diagnosis and treatments. The presentation slides are available at https://goo.gl/aH92pM.

The joint proceedings are publicly available in [1]. As mentioned, three papers were presented.

Gleim et al. [3] propose an automated schema extraction approach compatible with existing Semantic Web-based technologies. The extracted schema enables ad-hoc query formulation against privacy-sensitive data sources without requiring data access. The resulting query is then executed in a secure enclave under the data provider's control. The approach lets users extract structural information from non-uniform resources and merge it into a single schema while preserving the privacy of each data source.
Initial experiments show that this approach removes the need, present in previous approaches, to agree on a shared schema and encoding a priori. It favours more flexible schema extraction and introspection instead.

Hasnain et al. [2] assess the FAIR principles against the LOD principles. They determine to what degree the FAIR principles reuse the LOD principles and to what degree they extend them. This assessment helps to clarify the relationship between both schemes and gives a better understanding of what FAIR adds in comparison to LOD. The paper concludes that LOD gives a clear mandate for the openness of data. FAIR, in contrast, asks for a stated license for access and thus includes reusability under the terms of the license agreement. Furthermore, FAIR makes strong reference to the contextual information required to improve reuse of the data, e.g., provenance information. Under the LOD principles, such metadata would also count as interoperable data. However, the requirement to extend data with metadata indicates that FAIR is an extension of LOD (and not the inverse).

Nayak et al. [4] propose topic modeling, specifically non-negative matrix factorization (NMF), as a first step towards dimensionality reduction when dealing with large amounts of data. In this position paper, the authors applied NMF to the BioSamples metadata as a use case and presented preliminary results.

At the end of the workshop, we organised a short hackathon titled "Privacy-Preserving Information Extraction with Bloom Filters". At the beginning of the hackathon, we gave a short introduction to the prerequisites: Bloom filters, general privacy issues, and the frameworks that could be used (Python or KNIME). Then, each team was given a unique knowledge graph. The teams applied information retrieval techniques to it to build up some experience with the given framework.
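Bloom filters let a team share a compact, privacy-preserving summary of its knowledge graph instead of the graph itself. A membership query never returns a false negative but may return a false positive. The Python sketch below illustrates this under stated assumptions and is not the hackathon's actual material. The class, the SHA-256 double-hashing scheme, the `estimated_overlap` metric and the example IRIs are all hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed at k hash positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                            # number of bits
        self.k = k                            # number of hash positions per item
        self.bits = bytearray((m + 7) // 8)   # all bits start at 0
        self.n = 0                            # number of add() calls so far

    def _positions(self, item: str):
        # Derive k bit positions by double hashing a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.n += 1

    def __contains__(self, item: str) -> bool:
        # False: definitely absent. True: possibly present (may be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def false_positive_rate(self) -> float:
        # Standard approximation (1 - e^(-k*n/m))^k for n inserted items.
        return (1.0 - math.exp(-self.k * self.n / self.m)) ** self.k


def estimated_overlap(own_entities: list, other: BloomFilter) -> float:
    """Hypothetical metric: share of our entities that the other graph may contain."""
    if not own_entities:
        return 0.0
    hits = sum(1 for e in own_entities if e in other)
    return hits / len(own_entities)


# Hypothetical example: another team publishes only a filter over its entity IRIs.
other_team = BloomFilter(m=1024, k=3)
for iri in ["ex:Aspirin", "ex:Ibuprofen", "ex:Headache"]:
    other_team.add(iri)

our_entities = ["ex:Aspirin", "ex:Headache", "ex:Insulin", "ex:Diabetes"]
overlap = estimated_overlap(our_entities, other_team)
print(f"estimated overlap: {overlap:.2f}, "
      f"approx. false-positive rate: {other_team.false_positive_rate():.2e}")
```

Bloom filters have no false negatives, so the overlap here is at least 0.5 because the two shared IRIs are always found. Any value above that comes from false positives. One possible reading of "complementarity" is that a low overlap means the other graph adds more new entities. The metrics actually used in the hackathon are not specified here.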
Next, the teams applied Bloom filters and discussed suitable metrics for valuing an unseen knowledge graph based on a query response that may contain false positives. Finally, each team formulated queries for estimating the worth of an unseen knowledge graph. Each team then decided which other team's knowledge graph best complemented its own.

Acknowledgments

We would like to thank the authors for their contributions and active participation in the workshop. We also thank all the program committee members for reviewing the submissions and providing valuable feedback. We are grateful to the organisers of the ESWC 2018 conference for their support, and to our keynote speaker, Maria-Esther Vidal, head of the Scientific Data Management group at TIB Leibniz Information Centre for Science and Technology, Germany, and full professor (on leave) at Universidad Simón Bolívar (USB), Venezuela. The SeWeBMeDA-2018 workshop was co-organised by the Insight Centre for Data Analytics, NUI Galway, and Fraunhofer FIT, RWTH Aachen University. This workshop has been supported in part by Science Foundation Ireland under Grant Number SFI/12/RC/2289.

References

1. O. Beyan, J. Debattista, S. Decker, J. D. Fernández, A. Hasnain, M. I. Ali, P. Patel, D. Rebholz-Schuhmann, A. Sheth, D. Thakker, J. Umbrich, and M.-E. Vidal, editors. Joint proceedings of the 4th Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW), the 2nd Workshop on Semantic Web solutions for large-scale biomedical data analytics (SeWeBMeDA), and the Workshop on Semantic Web of Things for Industry 4.0 (SWeTI), number 2112 in CEUR Workshop Proceedings, Aachen, 2018.
2. A. Hasnain and D. Rebholz-Schuhmann. Assessing FAIR data principles against the 5-star open data principle. In 2nd Workshop on Semantic Web solutions for large-scale biomedical data analytics (SeWeBMeDA), 2018.
3. L. C. Gleim, Md. R. Karim, L. Zimmermann, O. Kohlbacher, H. Stenzhorn, S. Decker, and O. Beyan. Using schema extraction for query design without data access to enable privacy maintaining processing of sensitive data. In 2nd Workshop on Semantic Web solutions for large-scale biomedical data analytics (SeWeBMeDA), 2018.
4. S. Nayak, A. Zaveri, and M. Dumontier. Quality assessment of biomedical metadata using topic modeling. In 2nd Workshop on Semantic Web solutions for large-scale biomedical data analytics (SeWeBMeDA), 2018.