=Paper= {{Paper |id=Vol-2065/paper15 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2065/paper15.pdf |volume=Vol-2065 }} ==None== https://ceur-ws.org/Vol-2065/paper15.pdf
                         Hybrid techniques for knowledge-based NLP
                                Knowledge graphs meet machine learning and all their friends

                      Jose Manuel Gomez-Perez                                                Ronald Denaux
                            Expert System                                                    Expert System
                             Madrid, Spain                                                   Madrid, Spain
                       jmgomez@expertsystem.com                                        rdenaux@expertsystem.com

                                 Daniel Vila                                                 Carlos Badenes
                                  Recogn AI                                         Universidad Politecnica de Madrid
                                Madrid, Spain                                                Madrid, Spain
                               daniel@recogn.ai                                           cbadenes@fi.upm.es

ABSTRACT
Many different artificial intelligence techniques can be used to explore and exploit large document corpora that are available inside organizations and on the Web. While natural language is symbolic in nature and the first approaches in the field were based on symbolic and rule-based methods, like ontologies, semantic networks and knowledge bases, the most widely used methods are currently based on statistical approaches, including linear methods, such as support vector machines or probabilistic topic models, and nonlinear ones such as neural networks. Each of these two main schools of thought in natural language processing, knowledge-based and statistical, has its limitations and strengths, and there is an increasing trend that seeks to combine them in complementary ways to get the best of both worlds. This tutorial will cover the foundations and modern practical applications of knowledge-based and statistical methods, techniques and models and their combination for exploiting large document corpora. The tutorial will first focus on the foundations of many of the techniques that can be used for this purpose, including knowledge graphs, word embeddings, neural network methods, and probabilistic topic models, and will then show how these techniques are being effectively combined in practical applications, including commercial projects where the instructors currently participate.

KEYWORDS
Knowledge graphs, Hybrid natural language processing, embeddings, vecsigrafo, topic models

K-CAP2017 Workshops and Tutorials Proceedings, 2017
©2017 Copyright held by the owner/author(s).

1    MOTIVATION
For several decades, semantic systems were predominantly developed around knowledge graphs at different degrees of expressivity. Through the explicit representation of knowledge in well-formed, logically sound ways, knowledge graphs provide knowledge-based text analytics with rich, expressive and actionable descriptions of the domain of interest and support logical explanations of reasoning outcomes. On the downside, knowledge graphs can be costly to produce since they require a considerable amount of human effort to manually encode knowledge in the required formats. Additionally, such knowledge representations can sometimes be excessively rigid and brittle in the face of different natural language processing applications, such as question answering.
   In parallel, the last decade has witnessed a shift towards statistical methods for text understanding due to the increasing availability of raw data and cheaper computing power. Such methods have proved to be powerful and convenient in many linguistic tasks. In particular, recent results in the field of distributional semantics have shown promising ways to learn language models from text, encoding the meaning of each word in the corpus as a vector in dense, low-dimensional spaces. Among their applications, word embeddings have proved to be useful in term similarity, analogy and relatedness, as well as many downstream tasks in natural language processing.
   Aimed at Semantic Web researchers and practitioners, this tutorial elaborates on the idea introduced in [1] and shows how it is possible to bridge the gap between knowledge-based and statistical approaches to further knowledge-based natural language processing. Following a practical and hands-on approach, the tutorial tries to address a number of fundamental questions to achieve this goal, including: How can Machine Learning techniques be used to complement the knowledge already captured explicitly in knowledge graphs, extending and curating them in cost-efficient and practical ways? What are the main building blocks and techniques enabling such a hybrid approach to natural language processing? How can structured and statistical knowledge representations be seamlessly integrated? How can the quality of the resulting hybrid representations be inspected and evaluated? And how can this improve the overall quality and coverage of our knowledge graphs?

2    DESCRIPTION OF THE TUTORIAL
This half-day tutorial provides plenty of practical content, real-life examples and applications, and exercises. We offer an interactive session where both instructors and participants can engage in rich discussions on the topic. The agenda addresses the following points.
     • Probabilistic topic models and topic-based semantic similarity.
     • Creating a language model through word embeddings.
     • Extending word embeddings with structured knowledge.
     • Creating knowledge graph embeddings.
     • Building a vecsigrafo: bringing knowledge from text into knowledge graphs.
     • Evaluating vecsigrafos beyond visual inspection and intrinsic methods.
     • Applications in cross-lingual natural language processing.
     • Putting it all together in a real-life system.
     • Beyond text understanding: Cross-modal extensions.

3    MATERIALS
The tutorial follows a highly practical approach. The teaching material consists mainly of Jupyter notebooks that participants can install locally through Docker images with all the necessary software to run the examples and exercises on their own machines. The materials of the K-CAP 2017 tutorial can be found on GitLab at https://gitlab.com/rdenaux/kcap17-tutorial

4    AUDIENCE
This tutorial seeks to be of special value to members of the Semantic Web community, although it is also useful for related communities, e.g. Machine Learning and Computational Linguistics. We welcome researchers and practitioners both from industry and academia, as well as other participants with an interest in hybrid approaches to knowledge-based natural language processing.

5    PRESENTERS
The tutorial is offered by the following instructors.
   Jose Manuel Gomez-Perez works at the intersection of several areas of Artificial Intelligence, including Natural Language Processing, Knowledge Discovery, Representation and Reasoning. His long-term vision is to enable machines to understand text in a way similar to how humans read, bridging the gap between both through semantically rich knowledge representations and user interfaces. At Expert System, Jose Manuel leads the Research Lab in Madrid, where he focuses on the combination of structured knowledge graphs and probabilistic methods. Before Expert System, he worked at iSOCO, one of the first European companies to deliver semantic and natural language processing solutions on the Web. He consults for companies like Coca-Cola or ING. Also active as an entrepreneur, he co-founded a startup and advised another. An ACM member and Marie Curie fellow, Jose Manuel holds a Ph.D. in Computer Science and AI from UPM and regularly publishes in top scientific conferences and journals. His views on AI and applications have appeared in magazines like Nature and Scientific American. In 2015, he was the program chair of the International Conference on Knowledge Capture (K-CAP).
   Ronald Denaux is a senior researcher at Expert System. Ronald obtained his MSc in Computer Science from the Technical University Eindhoven, The Netherlands. After a couple of years working in industry as a software developer for a large IT company in The Netherlands, Ronald decided to go back to academia. He obtained a PhD, again in Computer Science, from the University of Leeds, UK. Ronald’s research interests have revolved around making semantic web technologies more usable for end users, which has required research into (and resulted in various research publications in) the areas of Ontology Authoring and Reasoning, Natural Language Interfaces, Dialogue Systems, Intelligent User Interfaces and User Modelling. Besides research, Ronald also participates in knowledge transfer and product development.
   Daniel Vila is co-founder of recogn.ai, a Madrid-based startup and spin-off from UPM, building next-generation solutions for text analytics and content management using AI methods. Daniel holds a PhD in Artificial Intelligence from Universidad Politécnica de Madrid (2016), where he worked at the Ontology Engineering Group and developed the solution supporting a large knowledge graph combining NLP and semantic technologies: the datos.bne.es data service from the National Library of Spain.
   Carlos Badenes: After more than 8 years working in the M2M world, Carlos began researching text mining within the context of the Semantic Web. Since then, he has moved more deeply into the study of topic modeling techniques to analyze large collections of documents, incorporating semantic resources and working on multilingual domains. He currently works as an associate researcher at the Ontology Engineering Group while doing a PhD at UPM.
   Oscar Corcho is Full Professor at Departamento de Inteligencia Artificial, UPM, and belongs to the Ontology Engineering Group. His research is focused on Semantic e-Science and Real World Internet, although he also works in more general areas of Semantic Web and Ontological Engineering. He has participated in numerous EU and Spanish R&D projects as well as privately-funded projects like ICPS (International Classification of Patient Safety), funded by the World Health Organisation, and HALO, funded by Vulcan Inc. Previously, he worked as a Marie Curie research fellow at the University of Manchester, and was a research manager at iSOCO. He holds a PhD in Computer Science and AI from UPM. He was awarded the Third National Award by the Spanish Ministry of Education in 2001. He has published several books, among which "Ontological Engineering" can be highlighted, as it is used as a reference book in a good number of university lectures worldwide, and more than 100 papers in journals, conferences and workshops. He usually participates in the organization or in the program committees of relevant international conferences and workshops.

ACKNOWLEDGMENTS
Partially funded by the EU H2020 project DANTE (700367) and the national Spanish project GRESLADIX (20160805).

REFERENCES
[1] Ronald Denaux and Jose Manuel Gomez-Perez. 2017. Towards a vecsigrafo: Portable semantics in knowledge-based text analytics. In Proceedings of the 2017 workshop on Hybrid Statistical Semantic Understanding and Emerging Semantic (HSSUES ’17). Held in conjunction with the 16th Intl. Semantic Web Conference, CEUR Workshop Proceedings.