MultiLingMine 2016: Modeling, Learning and
          Mining for Cross/Multilinguality

              Salvatore Romeo1 , Andrea Tagarelli2 , Dino Ienco3 ,
                      Mathieu Roche4 , and Paolo Rosso5
               1
                 Qatar Computing Research Institute, Doha, Qatar
                  2
                    DIMES, University of Calabria, Rende, Italy
                    3
                       IRSTEA, LIRMM, Montpellier, France
                     4
                        CIRAD, LIRMM, Montpellier, France
               5
                 Universitat Politecnica de Valencia, Valencia, Spain


      Abstract. The increasing availability of text information coded in many
      different languages poses new challenges to modern information retrieval
      and mining systems in order to discover and exchange knowledge at a
      larger world-wide scale. The 1st International Workshop on Modeling,
      Learning and Mining for Cross/Multilinguality (dubbed MultiLingMine
      2016) provides a venue to discuss research advances in cross-/multilingual
      related topics, focusing on new multidisciplinary research questions that
      have not been deeply investigated so far (e.g., in CLEF and related
      events relevant to CLIR). This includes theoretical and experimental
      on-going works about novel representation models, learning algorithms,
      and knowledge-based methodologies for emerging trends and applica-
      tions, such as, e.g., cross-view cross-/multilingual information retrieval
      and document mining, (knowledge-based) translation-independent cross-
      /multilingual corpora, applications in social network contexts, and more.


1    Motivations

In the last few years the phenomenon of multilingual information overload has
received significant attention due to the huge availability of information coded
in many different languages. We have in fact witnessed a growing popularity of
tools that are designed for collaboratively editing through contributors across
the world, which has led to an increased demand for methods capable of ef-
fectively and efficiently searching, retrieving, managing and mining different
language-written document collections. The multilingual information overload
phenomenon introduces new challenges to modern information retrieval systems.
By better searching, indexing, and organizing such rich and heterogeneous infor-
mation, we can discover and exchange knowledge at a larger world-wide scale.
However, since research on multilingual information is relatively young, impor-
tant issues still remain uncovered:

 – how to define a translation-independent representation of the documents
   across many languages;
2         Romeo et Al.

    – whether existing solutions for comparable corpora can be enhanced to gen-
      eralize to multiple languages without depending on bilingual dictionaries or
      incurring bias in merging language-specific results;
    – how to profitably exploit knowledge bases to enable translation-independent
      preserving and unveiling of content semantics;
    – how to define proper indexing and multidimensional data structures to bet-
      ter capture the multi-topic and/or multi-aspect nature of multi-lingual doc-
      uments;
    – how to detect duplicate or redundant information among different languages
      or, conversely, novelty in the produced information;
    – how to enrich and update multi-lingual knowledge bases from documents;
    – how to exploit multi-lingual knowledge bases for question answering;
    – how to efficiently extend topic modeling to deal with multi/cross-lingual
      documents in many languages;
    – how to evaluate and visualize retrieval and mining results.


2      Objectives, topics, and outcomes

The aim of the 1st International Workshop on Modeling, Learning and Min-
ing for Cross/Multilinguality (dubbed MultiLingMine 2016 ),6 held in conjunc-
tion with the 2016 ECIR Conference, is to establish a venue to discuss research
advances in cross-/multilingual related topics. MultiLingMine 2016 has been
structured as a full-day workshop. Its program schedule includes invited talks
as well as a panel discussion among the participants. It is mainly geared to-
wards students, researchers and practitioners actively working on topics related
to information retrieval, classification, clustering, indexing and modeling of mul-
tilingual corpora collections. A major objective of this workshop is to focus on
research questions that have not been deeply investigated so far. Special interest
is devoted to contributions that aim to consider the following aspects:

    – Modeling: methods to develop suitable representations for multilingual cor-
      pora, possibly embedding information from different views/aspects, such as,
      e.g., tensor models and decompositions, word-to-vector models, statistical
      topic models, representational learning, etc.
    – Learning: any unsupervised, supervised, and semi-supervised approach in
      cross/multilingual contexts.
    – The use of knowledge bases to support the modeling, learning, or both stages
      of multilingual corpora analysis.
    – Emerging trends and applications, such as, e.g., cross-view cross-/multilingual
      IR, multilingual text mining in social networks, etc.

Main research topics of interest in MultiLingMine 2016 include the following:

    – Multilingual/cross-lingual information access, web search, and ranking
6
    http://events.dimes.unical.it/multilingmine/
                                                       MultiLingMine 2016       3

 – Multilingual/cross-lingual relevance feedback
 – Multilingual/cross-lingual text summarization
 – Multilingual/cross-lingual question answering
 – Multilingual/cross-lingual information extraction
 – Multilingual/cross-lingual document indexing
 – Multilingual/cross-lingual topic modeling
 – Multi-view/Multimodal representation models for multilingual corpora and cross-
   lingual applications
 – Cross-view multi/cross-lingual information retrieval and document mining
 – Multilingual/cross-lingual classification and clustering
 – Knowledge-based approaches to model and mine multilingual corpora
 – Social network analysis and mining for multilinguality/cross-linguality
 – Plagiarism detection for multilinguality/cross-linguality
 – Sentiment analysis for multilinguality/cross-linguality
 – Deep learning for multilinguality/cross-linguality
 – Novel validity criteria for cross-/multilingual retrieval and learning tasks
 – Novel paradigms for visualization of patterns mined in multilingual corpora
 – Emerging applications for multilingual/cross-lingual domains

    The ultimate goal of the MultiLingMine workshop is to increase the visi-
bility of the above research themes, and also to bridge closely related research
fields such as information access, searching and ranking, information extraction,
feature engineering, text mining and machine learning.


3   Advisory board
The scientific significance of the workshop is assured by a Program Committee
which includes 20 research scholars, coming from different countries and widely
recognized as experts in cross/multi-lingual information retrieval:
Ahmet Aker, Univ. Sheffield, United Kingdom
Rafael Banchs, I2R Singapore
Martin Braschler, Zurich Univ. of Applied Sciences, Switzerland
Philipp Cimiano, Bielefeld University, Germany
Paul Clough, Univ. Sheffield, United Kingdom
Andrea Esuli, ISTI-CNR, Italy
Wei Gao, QCRI, Qatar
Cyril Goutte, National Research Council, Canada
Parth Gupta, Universitat Politcnica de Valncia, Spain
Dunja Mladenic, Jozef Stefan International Postgraduate school, Slovenia
Alejandro Moreo, ISTI-CNR, Italy
Alessandro Moschitti, Univ. Trento, Italy; QCRI, Qatar
Matteo Negri, FBK - Fondazione Bruno Kessler, Italy
Simone Paolo Ponzetto, Univ. Mannheim, Germany
Achim Rettinger, Institute AIFB, Germany
Philipp Sorg, Institute AIFB, Germany
Ralf Steinberger, JRC in Ispra, Italy
Marco Turchi, FBK - Fondazione Bruno Kessler, Italy
4       Romeo et Al.

Vasudeva Varma, IIIT Hyderabad, India
Ivan Vulic, KU Leuven, Belgium


4    Related events

A COLING’08 workshop [1] was one of the earliest events that emphasized the
importance of analyzing multilingual document collections for information ex-
traction and summarization purposes. The topic also attracted attention from
the semantic web community: in 2014, [2] solicited works to discuss principles
on how to publish, link and access mono and multilingual knowledge data col-
lections; in 2015, another workshop [3] took place on similar topics in order to
allow researchers continue to address multilingual knowledge management prob-
lems. A tutorial on Multilingual Topic Models was presented at WSDM 2014 [4]
focusing on how statistically model document collections written in different lan-
guages. In 2015, a WWW workshop aimed at advancing the state-of-the-art in
Multilingual Web Access [5]: the contributing papers covered different aspects
of multilingual information analysis, leveraging attention on the lack of current
information retrieval techniques and the necessity of new techniques especially
tailored to manage, search, analysis and mine multilingual textual information.
    The main event related to our workshop is the CLEF initiative [6] which
has long provided a premier forum for the development of new information ac-
cess and evaluation strategies in multilingual contexts. However, differently from
MultiLingMine, it does not have emphasized research contributions on tasks such
as searching, indexing, mining and modeling of multilingual corpora.
    Our intention is to continue the lead of previous events about multilingual
related topics, however from a broader perspective which is relevant to various
information retrieval and document mining fields. We aim at soliciting contribu-
tions from scholars and practitioners in information retrieval that are interested
in Multi/Cross-lingual document management, search, mining, and evaluation
tasks. Moreover, differently from previous workshops, we would emphasize some
specific trends, such as cross-view cross/multilingual IR, as well as the grow-
ing tightly interaction between knowledge-based and statistical/algorithmic ap-
proaches in order to deal with multilingual information overload.


References
1. Bandyopadhyay, S., Poibeau, T., Saggion, H., Yangarber, R. (2008). Procs. of the
   Workshop on Multi-source Multilingual Information Extraction and Summarization
   (MMIES). ACL.
2. Chiarcos, C., McCrae J. P., Montiel E., Simov, K., Branco, A., Calzolari, N., Osen-
   ova, P., Slavcheva, M., Vertan, C. (2014). Procs. of the 3rd Workshop on Linked
   Data in Linguistics: Multilingual Knowledge Resources and NLP (LDL).
3. McCrae, J. P., Vulcu G. (2015). CEUR Procs. of the 4th Workshop on the Multi-
   lingual Semantic Web (MSW4), Vol. 1532.
                                                        MultiLingMine 2016         5

4. Moens, M.-F., Vulié, I. (2014). Multilingual Probabilistic Topic Modeling and Its
   Applications in Web Mining and Search. In Procs. of the 7th ACM WSDM Conf.
5. Steichen, B., Ferro, N., Lewis, D., Chi, E. E. (2015). Procs. of the Int. Workshop
   on Multilingual Web Access (MWA).
6. The CLEF Initiative. http://www.clef-initiative.eu/.