Workshops and Tutorials at K-CAP2017
Proceedings Preface

Giuseppe Rizzo
ISMB
Turin, Italy
giuseppe.rizzo@ismb.it

The International Conference on Knowledge Capture (K-CAP) provides a forum that brings together members of diverse research communities who are interested in efficiently capturing knowledge from a vast range of sources and in creating representations that can be useful for building knowledge-intensive autonomous applications. Numerous research fields investigate and apply these research lines, including natural language processing, machine learning, knowledge management, and semantic web.

Besides the traditional research track, K-CAP usually hosts workshops and tutorials on topics related to the theme of the conference. In particular, workshops aim to provide opportunities for exchanging views, advancing ideas, and discussing preliminary results in an atmosphere that fosters the active exchange of ideas. Workshops are usually held before the conference and prepare the attendees for the discussions during the conference. Tutorials enable attendees to fully appreciate current research trends, main schools of thought, and possible application areas.

The 2017 conference, also known as the Ninth International Conference on Knowledge Capture,1 aimed at attracting researchers from diverse areas of Artificial Intelligence, including knowledge representation, knowledge acquisition, intelligent user interfaces, problem-solving and reasoning, planning, agents, text extraction, machine learning, and information enrichment and visualization, as well as researchers interested in cyber-infrastructures to foster the publication, retrieval, reuse, and integration of data. Today these data come from an increasingly heterogeneous set of resources that differ with regard to their domain, media format, quality, coverage, viewpoint, and bias.
More than the sheer amount of these data, their heterogeneity allows us to arrive at better models and answer complex questions that cannot be addressed in isolation but require the interaction of different scientific fields or perspectives. In most cases, knowledge is not captured as an end in itself but, for instance, to enable better user interfaces or to improve retrieval beyond simple keyword search. For K-CAP 2017, we focused on the creation, enrichment, querying, and maintenance of knowledge graphs out of heterogeneous data sources.

1 http://k-cap2017.org

The 2017 conference welcomed a total of two workshops and three tutorials, scheduled the day before the conference started. Workshops and tutorials opened the discussions: the workshops covered the crucial task of capturing knowledge from scientific content and investigated the need to go beyond the traditional macro-reading processes for extracting knowledge from documents. In detail:

Second International Workshop on Capturing Scientific Knowledge 2
From the early days of Artificial Intelligence, researchers have been interested in capturing scientific knowledge to develop intelligent systems. There are a variety of formalisms used today in different areas of science. Ontologies are widely used for organizing knowledge, particularly in biology and medicine. Process representations are used to do qualitative reasoning in areas such as physics and chemistry. Probabilistic graphical models are used by machine learning researchers, e.g., in climate modeling. In addition to enabling more advanced capabilities for intelligent systems in science, capturing scientific knowledge enables knowledge dissemination and open science practices. This is increasingly important to enable the reuse of scientific knowledge across scientific disciplines, businesses, and the public. Although great advances have been made, scientific knowledge is complex and poses great challenges for knowledge capture. This workshop provided a forum to discuss existing forms of scientific knowledge representation and existing systems that use them, and to envision major areas to augment and expand this important field of research. The increasing emphasis on open science has had a major focus on data sharing, but it needs to encompass knowledge as well. There are many research challenges in open sharing and reuse of scientific knowledge that need to be addressed in future research. The workshop opened with an invited talk by Suzanne Pierce, followed by the presentation of seven papers.

Machine Reading 3
Machine reading holds significant potential for automating knowledge capture, especially given the continuing improvements in natural language processing technologies. Macro-reading techniques (skimming many documents) now enable collecting large databases of facts, while modern micro-reading techniques (comprehension of individual paragraphs) have proven effective at factoid question answering. In this workshop, participants discussed ways to develop new capabilities in macro- and micro-reading to take these to the next level, in particular to extract useful representations of text (be they symbolic, neural, or a hybrid) that enable, for example, automated reasoning to answer non-trivial questions. This workshop provided a forum for researchers to discuss themes related to knowledge-based approaches applied to deep processing of content. It also addressed the topic of assessing the quality of knowledge graphs at large scale. Five papers were presented at the workshop.

2 https://sciknow.github.io/sciknow2017
3 http://www.cs.utexas.edu/users/porter/kcap-machinereadingworkshop.php

In addition to these workshops, three tutorials were included in the program. The tutorials also attracted a lot of interest; they all shared the same format, alternating in-depth analyses of topics with practical demonstrations.
Three main topics were covered: representation learning, knowledge graphs, and deep learning. In detail:

Semantic data mining for knowledge acquisition 4
The tutorial provided a synthetic, unifying view on semantic data mining and its application to knowledge acquisition. Semantic data mining is a data mining approach where domain ontologies are used as background knowledge. The challenge is to mine knowledge encoded in domain ontologies and knowledge graphs in addition to purely empirical data. The tutorial aimed to present major research challenges arising from peculiarities of semantic data mining, such as proper consideration of the semantics of background knowledge, dealing with the Open World Assumption, and semantic similarity measures. In addition, it also covered some of the recent advances in the area, namely semantic embeddings (embedding ontological background knowledge into neural networks).

DOing REusable MUSical data 5
This tutorial first provided an in-depth explanation of the DOREMUS model (and its underlying foundations, CIDOC-CRM and FRBRoo) as well as the necessary controlled vocabularies. It then discussed and demonstrated the process that leads to the creation of a knowledge base of musical content, starting from real data coming from musical libraries, and the transformation of these data to be compliant with Schema.org for various consumption scenarios. The entire DOREMUS tool chain was presented (e.g., tools for reconciling large multilingual knowledge graphs); the tutorial also covered how the DOREMUS data can be consumed through various applications, including an exploratory search engine and music recommender systems.

Hybrid techniques for knowledge-based NLP. Knowledge graphs meet machine learning and all their friends 6
Many different artificial intelligence techniques can be used to explore and exploit large document corpora that are available inside organizations and on the Web.
While natural language is symbolic in nature and first approaches were based on symbolic and rule-based methods (e.g., ontologies and knowledge bases), the most widely used methods have been based on statistical approaches (e.g., linear methods such as support vector machines, probabilistic topic models, and non-linear methods such as neural networks). These two approaches, knowledge-based and statistical methods, have their limitations and strengths; there is an increasing trend that seeks to combine them to get the best of both worlds. This tutorial covered the foundations and modern practical applications of knowledge-based and statistical methods, techniques, and models, and their combination for exploiting large document corpora. The tutorial first focused on the foundations of many of the techniques that can be used for this purpose, including knowledge graphs, word embeddings, neural network methods, and probabilistic topic models, and then demonstrated how a combination of these techniques is being used in practical applications and commercial projects where the instructors are currently involved.

4 http://www.cs.put.poznan.pl/alawrynowicz/wordpress/?page_id=662
5 https://doremus-anr.github.io/kcap17_tutorial
6 http://expertsystemlab.com/kcap2017

These five co-located events attracted a large audience, who shared insights and fostered discussions with instructors and organizers. The overall take-home message was in line with the conference scope, i.e., better understanding and framing the research on knowledge-based approaches to creating autonomous and intelligent systems.