Introduction

Various methods and techniques have been explored with the aim of automatically generating new bilingual (and multilingual) dictionaries from existing ones, so that, given L1>L2 and L2>L3 sets, a new L1>L3 dictionary is produced. The intermediate language used in the process is called a pivot, and multiple pivots can be used for this purpose. Although considerable work has been done to this end, it has usually been conducted on different types of datasets, evaluated in different ways, and based on various algorithms, so the results are often not comparable.
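
As a minimal sketch of the pivot idea described above, the following Python snippet composes an L1>L2 and an L2>L3 dictionary into L1>L3 candidates by joining on the pivot words. The function names `compose` and `compose_path` and the toy entries are illustrative assumptions, not part of the task data or of any participating system.

```python
from collections import defaultdict
from functools import reduce


def compose(first, second):
    """Infer A>C candidates from A>B and B>C by joining on the pivot words in B."""
    inferred = defaultdict(set)
    for headword, pivot_words in first.items():
        for pivot_word in pivot_words:
            # Every translation of the pivot word becomes a candidate.
            inferred[headword] |= second.get(pivot_word, set())
    # Drop headwords for which no pivot word had a translation.
    return {head: targets for head, targets in inferred.items() if targets}


def compose_path(dictionaries):
    """Chain several dictionaries, i.e. use more than one pivot language."""
    return reduce(compose, dictionaries)


# Toy, hand-made entries for illustration only (hypothetical, not task data).
de_en = {"Bank": {"bank", "bench"}, "Haus": {"house"}}
en_pt = {"bank": {"banco", "margem"}, "bench": {"banco"}, "house": {"casa"}}

de_pt = compose(de_en, en_pt)
```

In this toy example the naive join also proposes "margem" (a river bank) for German "Bank", because the English pivot word is ambiguous. Filtering such spurious candidates is the kind of problem that translation inference methods need to address.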

The Translation Inference Across Dictionaries shared task (TIAD-2017) was launched with the intention of offering quality lexical resources within the context of a single coherent experiment, which enables reliable validation of results and solid comparison of methods and techniques for auto-generating translation equivalents across languages, as well as stimulating and enhancing further research.

The data for TIAD-2017 was released by K Dictionaries Ltd and extracted from its Global Series, which includes monolingual, bilingual and multilingual lexical sets for 24 languages (Kernerman, 2011). Each language core is compiled independently within a common overall framework and shares the same technical infrastructure with all other languages. It consists of detailed, well-structured lexicographic information on the core language, which serves as a base for translation into any other language. The language pairs were thus created directly between each other, with no intermediate language bias (e.g. English, in WordNet), and joining the different pairs of each language core forms a multilingual network.

All the evaluation data and results are freely available on the Web (see https://tiad2017.wordpress.com/).

The shared task

The objective of the shared task was to indirectly generate translations for three language pairs, based on the available translations among eight languages altogether, in 14 bilingual dictionaries, involving four possible paths (all from German to Brazilian Portuguese) that feature between 1 and 4 pivot languages.

The test dataset consisted of 100 randomly selected German dictionary entries with their translations into a second language, together with the further translations reached by recursively exploring the chained dictionaries, amounting to up to 817 entries with 1,532 translation equivalents in the largest language pair provided. Besides the headwords and translations, the data includes information about parts of speech, subject domains and synonyms, as well as examples of usage and their translations.

The following language pairs were provided for the four different paths:

1. German>English | English>Portuguese

The three new language pairs that were generated are:

• German>Portuguese
• Danish>Spanish
• Dutch>French

The Portuguese>German sets were provided for the sole purpose of closing the loop as an aid for validating the results, and were not allowed to be reversed to improve results. However, it was allowed to reverse the Spanish>Danish dataset, which was provided as part of path (4), to help improve the results.

Evaluation of the results of each system was carried out against KD's manually compiled dictionaries for these pairs from the Global Series and other resources, as well as by human translators.
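
To illustrate how system output might be compared against a manually compiled dictionary, the sketch below scores predicted translation pairs against a gold standard. The choice of precision and recall is an assumption made here for illustration, since the metrics used are not stated in this section. The function name `precision_recall` and the toy entries are likewise hypothetical.

```python
def precision_recall(predicted, gold):
    """Score predicted headword>translation pairs against a gold dictionary.

    Assumption: pairs are compared by exact string match. The actual task
    evaluation, which also involved human translators, may have differed.
    """
    predicted_pairs = {(h, t) for h, ts in predicted.items() for t in ts}
    gold_pairs = {(h, t) for h, ts in gold.items() for t in ts}
    correct = predicted_pairs & gold_pairs
    precision = len(correct) / len(predicted_pairs) if predicted_pairs else 0.0
    recall = len(correct) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy, hand-made entries for illustration only (hypothetical, not task data).
predicted = {"Bank": {"banco", "margem"}, "Haus": {"casa"}}
gold = {"Bank": {"banco"}, "Haus": {"casa", "lar"}}

precision, recall = precision_recall(predicted, gold)
```

In this toy case two of the three predicted pairs appear in the gold dictionary, and two of the three gold pairs are recovered.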

Participants were invited to contribute to either or both of the following tracks:

• Systems that use only the KD data released for the task
• Systems that exploit, in addition to the KD data, other freely available sources of background knowledge (e.g., lexical linked open data and parallel corpora) to improve performance

Beyond performance, participants were encouraged to consider the following issues in particular:

• The role of the language family with respect to the newly generated pairs
• The asymmetry of pairs, and how the translation direction affected the results
• The behavior of different parts of speech among different languages
• The role that the number of pivots played in the process