Preliminary Experiments on an Improved Artificial Player for a Word Association Game

Alberto Coffrini¹, Stefania Monica², and Federico Bergenti¹

¹ Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università degli Studi di Parma, 43124 Parma, Italy
alberto.coffrini@studenti.unipr.it, federico.bergenti@unipr.it
² Dipartimento di Scienze e Metodi dell’Ingegneria, Università degli Studi di Modena e Reggio Emilia, 42122 Reggio Emilia, Italy
stefania.monica@unimore.it

Abstract. This paper presents recent developments of a software system that acts as an artificial player for a popular word association game. The game was proposed for the Evaluation Campaign of Natural Language Processing and Speech Tools for Italian in 2020, and it attracted the interest of various researchers. Several aspects of the recent developments of the artificial player are discussed, from the collection of the texts used to acquire sufficient linguistic knowledge, to the improvements of the algorithm employed to play the game. Preliminary, but encouraging, experimental results are also discussed in comparison with other artificial players for the same game.

Keywords: Word association games · Lexical semantics · Natural language processing · Artificial intelligence

1 Introduction

Natural Language Processing (NLP) is a broad research field that studies the interactions between computers and human languages in the attempt to make computers speak and understand human languages (e.g., [13]). By its nature, NLP is an interdisciplinary field located at the intersection of linguistics, computer science, and artificial intelligence. The history of NLP includes a long list of particular problems that were addressed and effectively solved, but many other problems are still open and challenging.
Among the traditional problems of NLP, it is worth mentioning automatic translation (e.g., [8]), which is the problem of generating a fluent text in a target language preserving the meaning of the original text written in a source language. A second traditional problem of NLP is text classification (e.g., [11]), which is the task of categorizing texts on the basis of their contents. Finally, a third traditional problem of NLP is information retrieval (e.g., [12]), which is the problem of automatically obtaining relevant information from texts.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Besides these traditional problems, the recent advent of new technologies promoted the interest in new application contexts for NLP. For example, the increasing pervasiveness of personal assistants such as Microsoft Cortana, Apple Siri, Amazon Alexa, and Google Home renewed the interest in tasks related to automatic speech recognition (e.g., [10]). In addition, the diffusion of chatbots accelerated the research on question answering (e.g., [1]). Finally, the massive use of social networking services helped spread the interest in tasks related to sentiment analysis (e.g., [2]).

A plethora of approaches have been tried over the years to effectively solve NLP problems. For example, logic programming has been playing a crucial role in NLP since the very first studies on computational linguistics (e.g., [7]). Logic programming is based on facts and rules, which is a feature shared with the ordinary approach to describe the surface grammars of human languages. This shared feature makes the use of logic programming particularly well suited to accomplish NLP tasks. Note that inductive logic programming (e.g., [14]) and probabilistic logic programming (e.g., [15, 16]) have also been successfully applied to accomplish NLP tasks.
In addition to logic-based methods and techniques, statistical methods have been extensively used in the context of NLP (e.g., [5]). Such methods are typically based on decision trees and hidden Markov models. More recently, several approaches based on neural networks (e.g., [4]) and deep learning (e.g., [18]) have been successfully applied to solve NLP problems.

The analysis of the specific application context is crucial to design and implement effective NLP systems. Actually, the common approach to design NLP systems is based on the identification of the relevant NLP problems to be solved. Such problems are then addressed using specific methods that are often designed for the purpose. It is a common opinion among researchers interested in NLP that the use of methods specifically designed to target the problems at hand is the only viable approach to accomplish complex NLP tasks.

The NLP problem discussed in this paper was proposed for the Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA) in 2020, and it is called Ghigliottin-AI [3]. The challenge is to build a software system that can play a word association game called La Ghigliottina (Italian for The Guillotine), which is the closing game of a popular Italian television show. The rules of the game are simple, and they can be summarized as follows. Given five words in Italian, the player needs to guess a sixth word that must be related to each one of the five words. Various relationships among words are acceptable. For instance, two words can be related because they are synonyms, antonyms, or because they form a compound word. Similarly, two words can be related because they are included in a proverb or a movie title.

The software system discussed in this paper is an artificial player for this game. A description of the initial design and implementation of the player was presented in [6], and this paper focuses on recent developments of the player.
In particular, this paper outlines an enhanced algorithm for the player, and it shows preliminary experimental results that document the improved performance of the player with respect to the performance discussed in [6].

This paper is organized as follows. Section 2 discusses the collection and the processing of the texts used to acquire the needed linguistic knowledge. Section 3 outlines the main characteristics of the algorithm used by the artificial player. Section 4 examines the metrics used by the artificial player and outlines relevant improvements. Section 5 shows preliminary experimental results based on real instances of the game. Section 6 concludes the paper and outlines possible future research directions.

2 Collection and Processing of Texts

Various steps were involved in the design and implementation of the artificial player. The first step concerned the collection of a large number of texts from various Web sources. Then, the collected texts were processed to remove punctuation marks and words that are not used in the game, such as articles and prepositions. The obtained cleaned texts were then further processed to identify pairs of related words by simply grouping words that are close to each other. Each pair of related words was then associated with the number of its occurrences in the cleaned texts. The collection of the pairs of related words is particularly relevant for the construction of the artificial player because the game is based on finding relationships among words. Therefore, the obtained pairs of related words were stored and used as the knowledge base of the artificial player, as discussed in Section 3.

2.1 Collection of Texts

The words needed to play the considered word association game are from the Italian lexicon, and therefore all collected texts are written in Italian.
The collected texts are taken from diverse Web sources, and they concern various topics. The diversification of the collected texts ensures that a large number of diverse pairs of related words can be retrieved and used to play the game. Note that some of the collected texts were authored in Italian, while the others are professional translations from other languages, such as English, French, and Spanish.

The set of collected texts includes thirteen books that were downloaded free of charge from e-book platforms. The set of books includes some books from Italian authors, such as Pinocchio and Zeno’s Conscience, and some books professionally translated from other languages, such as Alice’s Adventures in Wonderland, The Little Prince, and Don Quixote.

The set of collected texts also includes a large assortment of Italian texts called Corpus Paisà (www.corpusitaliano.it), which is a free corpus of approximately 1.5 GB of text files. Originally, Corpus Paisà was created with the aim of providing authentic and freely available texts to learn Italian. Today, Corpus Paisà is mostly used as a resource for research activities related to the Italian language, and it is commonly considered a valuable resource to acquire linguistic knowledge for the Italian language.

Besides the books and Corpus Paisà, some texts from the Italian edition of Wikipedia were also downloaded. These texts include the titles of all the Italian articles of Wikipedia (corresponding to nearly 100 MB of text) and 150 full articles written in Italian. The chosen articles concern various topics, such as cooking, science, and music.

Finally, a long list of proverbs, compound words, and idiomatic phrases was also collected from various Web sources. This list is particularly useful to successfully play the game because proverbs, compound words, and idiomatic phrases are often used in the game.
As a matter of fact, preliminary empirical observations of game instances aired on television confirm that at least one of the given five words is often related to the correct sixth word through proverbs, compound words, or idiomatic phrases.

2.2 Processing of Texts

The collected texts are processed to remove punctuation marks and words that are not used in the game. Then, the obtained cleaned texts are used to create a set of pairs of related words to form the knowledge base of the artificial player.

First, all punctuation marks are replaced with an uncommon symbol, namely $, which is used to break sentences. Then, the words that are not useful to play the game are removed. For example, articles and prepositions can be removed from the collected texts without affecting the performance of the artificial player. Actually, articles and prepositions are so common in Italian that the rules of the game prohibit their use. Finally, conjugated verbs are removed from the collected texts because they are also prohibited by the rules of the game.

After the elimination of punctuation marks and prohibited words, cleaned texts are further processed to obtain the set of pairs of related words used by the artificial player. A word pair is an ordered pair of two consecutive words in the same sentence. The identification of word pairs is particularly relevant because the game is based on finding relationships among words. Actually, every instance of the game implicitly involves five word pairs because each one of the given five words must be related to the sixth word. This is the reason why all cleaned texts are parsed to extract word pairs. Note that, while parsing cleaned texts, the number of times that each word pair appears in cleaned texts is stored together with the word pair.

The adopted nomenclature to refer to word pairs and related metrics is as follows. Given a word pair, the first word of the pair is called token and the second word is called related token.
Therefore, a generic word pair is a couple

$$\langle \text{token}, \text{related token} \rangle. \tag{1}$$

Each word pair is associated with its occurrence, which is the number of times that the pair is found in cleaned texts. For each (direct) word pair, its inverse word pair is formed by exchanging the token with the related token. Note that the occurrence of the inverse word pair is set equal to the occurrence of the direct word pair.

The reason for considering inverse word pairs is as follows. As outlined in Section 3, the artificial player considers the given five words as tokens, and it searches for the sixth word among the corresponding related tokens. Since every word can be either included in the set of five words or it can be the sixth word, inverse pairs are needed to ensure that words can be equally considered as tokens and as related tokens.

The following example is shown to explain the cleaning process and the nomenclature of tokens, related tokens, and occurrences. Let us assume that the sequence of words acquistare un computer (Italian for buy a computer) is found in the collected texts. The word un (Italian for a) is an article and, as explained earlier in this section, it is removed during the cleaning process. After removing the article, the two words acquistare and computer are close to each other in the cleaned text and, therefore, the word pair

$$\langle \text{acquistare}, \text{computer} \rangle \tag{2}$$

is created. In this case, acquistare is considered as the token of the pair and computer is considered as the related token of the pair. Let us assume that the word pair (2) is found eight times in all cleaned texts, then the occurrence of the word pair (2) is set equal to eight. As discussed in Section 3, the word pair (2) ensures that if acquistare is one of the given five words, then computer is considered as a candidate for the sixth word.
However, if computer is one of the given five words, then the artificial player should consider acquistare as a candidate for the sixth word. Since the two words should be considered as related regardless of which one is the token and which one is the related token, the inverse word pair of (2) is also created. The occurrence of the inverse pair is set equal to eight because it equals the occurrence of the direct pair. The inverse pair allows finding acquistare as a candidate for the sixth word when computer is one of the given five words.

The word pairs obtained from the collected texts are stored together with their respective occurrences, and they are used by the artificial player to play the game, as discussed in the following section. Currently, the set of word pairs used by the artificial player comprises more than 34,000 tokens, and every token is associated with between 100 and 1,000 related tokens.

3 The Algorithm of the Artificial Player

This section outlines the algorithm used by the artificial player to play the game. The input of the algorithm is a set of five words, and the computed output is a sixth word that is related to each one of the given five words. An enhanced variation of the algorithm is presented in Section 4.

The algorithm starts by searching each one of the given five words as a token of a word pair in the set of word pairs obtained using the collected texts. Then, assuming that all the given five words are actually found as tokens of word pairs, the algorithm considers the set of the five tokens

$$T = \{t_i\}_{i=1}^{5}. \tag{3}$$

For each token $t_i$, with $1 \le i \le 5$, the set $R_i$ of its related tokens is also considered. All the related tokens of the five tokens are treated as valid candidates for the sixth word, and therefore, the algorithm searches the sixth word in the set $R$ obtained as the union of the five sets $R_i$, with $1 \le i \le 5$. In other words, the sixth word is searched in

$$R = \bigcup_{i=1}^{5} R_i. \tag{4}$$

Note that if some of the given five words are not found as tokens of the available word pairs, then $R$ is computed as the union of the sets of related tokens of the words that were actually found as tokens of word pairs.

Let us denote a generic element of the set $R$ as $r_j$. Assuming that the word pair $\langle t_i, r_j \rangle$ is found in the set of word pairs, the occurrence $o_{i,j}$ of the pair is immediately available from the collected texts as discussed in the previous section. On the contrary, if the pair $\langle t_i, r_j \rangle$ is not found in the set of word pairs, the occurrence of the pair is conventionally set to zero. The conventional extension of the occurrence of a word pair is used to define the frequency $f_j$ of the generic related token $r_j$ as

$$f_j = \sum_{i=1}^{5} o_{i,j}, \tag{5}$$

which is the sum of the occurrences $\{o_{i,j}\}_{i=1}^{5}$ of all word pairs that include the considered related token.

Note that, according to the definition of frequency, the higher the occurrences of word pairs, the higher the frequency. This means that if the word pair $\langle t_i, r_j \rangle$ for a generic related token $r_j$ is found frequently in the cleaned texts, then the value of the frequency $f_j$ is expected to be high. Since the sixth word is expected to appear often as a related token of the given five words, it can be concluded that the sixth word is also expected to have a high frequency.

Besides the frequency, each word in the set of related tokens $R$ is also associated with a second metric. The match of a generic related token $r_j$, denoted as $m_j$, is defined as the number of tokens for which $r_j$ is a related token in the available word pairs. In other words, the match of a generic related token $r_j$ is equal to the number of word pairs $\langle t_i, r_j \rangle$ in the set of word pairs that have different $t_i$. Possible values for the match of a generic related token $r_j$ are integer numbers from 1 to 5. In particular, if $r_j$ is related to only one of the given five words, then $m_j$ is equal to 1.
On the contrary, if $r_j$ is related to all the given five words, then $m_j$ is equal to 5.

The values of frequency and match are evaluated for each related token $r_j$ in $R$, and they are used together to find the best candidate for the sixth word. First, the set of candidates for the sixth word is restricted to the related tokens with the largest match. This guarantees that the sixth word is related to as many words as possible. Then, the sixth word is chosen in the restricted set as the one with the highest frequency. If two or more related tokens share the same frequency, then the sixth word is randomly chosen among them.

In order to clarify the ideas behind the algorithm used by the artificial player, and to exemplify the computation of the values of frequency and match, let us consider the five tokens in $T = \{t_i\}_{i=1}^{5}$ and the following simple set of word pairs that include nine related tokens

$$\begin{array}{l}
\langle t_1, r_1 \rangle,\ \langle t_1, r_2 \rangle,\ \langle t_1, r_4 \rangle \\
\langle t_2, r_1 \rangle,\ \langle t_2, r_3 \rangle,\ \langle t_2, r_4 \rangle,\ \langle t_2, r_7 \rangle \\
\langle t_3, r_1 \rangle,\ \langle t_3, r_3 \rangle,\ \langle t_3, r_4 \rangle \\
\langle t_4, r_1 \rangle,\ \langle t_4, r_4 \rangle,\ \langle t_4, r_5 \rangle,\ \langle t_4, r_8 \rangle,\ \langle t_4, r_9 \rangle \\
\langle t_5, r_1 \rangle,\ \langle t_5, r_4 \rangle,\ \langle t_5, r_6 \rangle
\end{array} \tag{6}$$

For each pair, consider the following values of their occurrences

$$\begin{array}{l}
o_{1,1} = 3,\ o_{1,2} = 4,\ o_{1,4} = 4 \\
o_{2,1} = 2,\ o_{2,3} = 3,\ o_{2,4} = 2,\ o_{2,7} = 8 \\
o_{3,1} = 7,\ o_{3,3} = 5,\ o_{3,4} = 2 \\
o_{4,1} = 4,\ o_{4,4} = 2,\ o_{4,5} = 7,\ o_{4,8} = 6,\ o_{4,9} = 6 \\
o_{5,1} = 10,\ o_{5,4} = 1,\ o_{5,6} = 9
\end{array} \tag{7}$$

Let us now consider some of the nine related tokens, and let us evaluate the values of their frequencies and matches. Consider, for example, the related token $r_1$. Since $r_1$ is a related token of all the five tokens, its match is $m_1 = 5$. The frequency $f_1$ of the related token $r_1$ is

$$f_1 = \sum_{i=1}^{5} o_{i,1} = 26. \tag{8}$$

Let us now consider the related token $r_3$, which is related only to $t_2$ and $t_3$. In this case, the match is $m_3 = 2$, and the frequency is

$$f_3 = \sum_{i=1}^{5} o_{i,3} = 8. \tag{9}$$

Note that the value of $f_3$ is obtained recalling that $o_{1,3}$, $o_{4,3}$, and $o_{5,3}$ are conventionally set to 0 because $r_3$ is not related to any of the tokens $t_1$, $t_4$, and $t_5$. Finally, let us consider the related token $r_6$. Since $r_6$ is a related token only for the token $t_5$, its match is $m_6 = 1$ and its frequency is simply $f_6 = o_{5,6} = 9$.

Among the nine related tokens considered in the example, $r_1$ and $r_4$ are those with the largest match. As a matter of fact, $r_1$ and $r_4$ are related tokens of each one of the five tokens $\{t_i\}_{i=1}^{5}$, so that $m_1 = m_4 = 5$. Since the frequency of $r_1$ is $f_1 = 26$, and the frequency of $r_4$ is $f_4 = 11$, the sixth word proposed by the artificial player for this example is the related token $r_1$.

As discussed in [6], in order to test the validity of the proposed algorithm, 100 random instances of the game were taken from the instances that actually aired on television. The artificial player was able to find the correct sixth word for 24 of the considered game instances. Even if this success rate may seem low, it is worth noting that human players often fail to find the correct sixth word, and the expected success rate of human players is low. Regarding other artificial players, to the best of our knowledge, only two other players have been proposed to play the considered game, namely Il Mago della Ghigliottina [17] and GUiLlotine gLovE replayer (GUL.LE.VER.) [9]. The success rate of the first player is 68.6%, and the success rate of the second player is 26%. Hence, the success rate of the first player is significantly higher than the success rate obtained using the algorithm outlined in this section, while the success rate of the second player is comparable with the success rate obtained using the proposed algorithm. In order to improve the performance of the proposed algorithm, further refinements are discussed in the next section, and improved results are presented in Section 5.
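The selection rule of this section (largest match first, then highest frequency, with random tie-breaking) can be illustrated with a minimal Python sketch that replays the worked example in (6) and (7). The function names `score` and `guess`, and the dictionary layout keyed by (token, related token), are assumptions made for illustration only; they are not taken from the actual implementation of the player.

```python
import random
from collections import defaultdict

# Occurrences o_{i,j} of the worked example (7), keyed by (token, related token).
OCCURRENCES = {
    ("t1", "r1"): 3, ("t1", "r2"): 4, ("t1", "r4"): 4,
    ("t2", "r1"): 2, ("t2", "r3"): 3, ("t2", "r4"): 2, ("t2", "r7"): 8,
    ("t3", "r1"): 7, ("t3", "r3"): 5, ("t3", "r4"): 2,
    ("t4", "r1"): 4, ("t4", "r4"): 2, ("t4", "r5"): 7,
    ("t4", "r8"): 6, ("t4", "r9"): 6,
    ("t5", "r1"): 10, ("t5", "r4"): 1, ("t5", "r6"): 9,
}


def score(tokens, occurrences):
    """Return {related token: (match, frequency)} for the given tokens."""
    given = set(tokens)
    frequency = defaultdict(int)
    match = defaultdict(int)
    for (token, related), count in occurrences.items():
        if token in given:
            # Pairs that are absent simply contribute nothing, which
            # mirrors the convention of setting their occurrence to zero.
            frequency[related] += count
            match[related] += 1
    return {r: (match[r], frequency[r]) for r in frequency}


def guess(tokens, occurrences, rng=random):
    """Pick the candidate with the largest match, then the highest frequency."""
    scores = score(tokens, occurrences)
    if not scores:
        return None  # none of the given words is a known token
    # Tuples compare element by element: match first, then frequency.
    best = max(scores.values())
    candidates = sorted(r for r, s in scores.items() if s == best)
    return rng.choice(candidates)  # random choice among exact ties


if __name__ == "__main__":
    tokens = ["t1", "t2", "t3", "t4", "t5"]
    print(score(tokens, OCCURRENCES)["r1"])
    print(guess(tokens, OCCURRENCES))
```

Comparing (match, frequency) tuples is one compact way to express the two-stage restriction described in the text; an implementation that first filters by match and then sorts by frequency would behave identically.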
4 The Improved Algorithm

Various tests on the algorithm described in the previous section were performed to possibly identify relevant improvements. During such tests, it was noticed that word pairs associated with high occurrences can have a negative impact on the performance of the artificial player. In order to better understand the role of these word pairs, let us consider the following instance of the game:

– Punto (Italian for point)
– Saggio (Italian for essay)
– Arte (Italian for art)
– Occhio (Italian for eye)
– Giudizio (Italian for judgment)

The correct sixth word of this instance of the game is critico (Italian for critical). As a matter of fact, one can say in Italian: punto critico (Italian for critical point); saggio critico (Italian for critical essay); critico d’arte (Italian for art critic); occhio critico (Italian for critical look); and giudizio critico (Italian for critical assessment). However, the sixth word proposed by the artificial player using the algorithm described in the previous section is riferimento (Italian for reference). Note that, in Italian, the first word of the previous list is commonly used together with the proposed sixth word. As a matter of fact, punto di riferimento (Italian for point of reference) is a common phrase in Italian.

In order to understand the reason why the artificial player fails to find the correct sixth word, let us denote the given five words as $\{t_i\}_{i=1}^{5}$, and let us denote the word critico as $r_1$ and the word riferimento as $r_2$. The occurrences for the word pairs that include the related token $r_1$ (namely, the word critico) are:

$$o_{1,1} = 121,\ o_{2,1} = 20,\ o_{3,1} = 178,\ o_{4,1} = 31,\ o_{5,1} = 50. \tag{10}$$

Instead, the occurrences for the word pairs that include the related token $r_2$ (namely, the word riferimento) are:

$$o_{1,2} = 652,\ o_{2,2} = 3,\ o_{3,2} = 14,\ o_{4,2} = 1,\ o_{5,2} = 3. \tag{11}$$

Note that all the occurrences $\{o_{i,1}\}_{i=1}^{5}$ of the word pairs that include the related token $r_1$ are greater than 0.
Therefore, the match $m_1$ is equal to 5. Similarly, all the occurrences $\{o_{i,2}\}_{i=1}^{5}$ of the word pairs that include the related token $r_2$ are greater than 0. Therefore, the match $m_2$ is also equal to 5. Since both words have the same match, let us consider their frequencies. The frequency $f_1$ of the related token $r_1$ is

$$f_1 = \sum_{i=1}^{5} o_{i,1} = 400, \tag{12}$$

while the frequency $f_2$ of the related token $r_2$ is

$$f_2 = \sum_{i=1}^{5} o_{i,2} = 673. \tag{13}$$

Since the frequency of the related token $r_2$ is higher than the frequency of the related token $r_1$, the player chooses the wrong sixth word, which is the related token $r_2$ (namely, the word riferimento).

Let us analyze in detail the occurrences that contribute to the frequencies to better understand the reasons for the failure. For each one of the given five words, let us compare the values of the occurrences

$$\begin{array}{l}
o_{1,1} = 121 < 652 = o_{1,2} \\
o_{2,1} = 20 > 3 = o_{2,2} \\
o_{3,1} = 178 > 14 = o_{3,2} \\
o_{4,1} = 31 > 1 = o_{4,2} \\
o_{5,1} = 50 > 3 = o_{5,2}
\end{array} \tag{14}$$

Note that token $t_1$ is more often paired with the related token $r_2$ than with the related token $r_1$. As a matter of fact, the occurrence of the pair $\langle t_1, r_2 \rangle$ is $o_{1,2} = 652$, while the occurrence of the pair $\langle t_1, r_1 \rangle$ is $o_{1,1} = 121$. On the contrary, the remaining four tokens are more often paired with related token $r_1$ than with related token $r_2$. Moreover, the word pairs that include these four tokens and the related token $r_2$ have very low occurrences, which suggests that these four tokens are rarely used together with the related token $r_2$. Conversely, the word pairs that include the same four tokens and the related token $r_1$ have occurrences of at least 20, which suggests that these four tokens are used quite often together with the related token $r_1$.

These considerations hint that the correct sixth word of the considered instance of the game is the related token $r_1$, as it is indeed the case, since it is used often with all the given five words.
However, the single high occurrence among the word pairs that include the related token $r_2$ (namely, $o_{1,2}$) causes the frequency $f_2$ to be greater than the frequency $f_1$, which causes the player to choose the wrong sixth word.

In order to overcome the problems that caused the player to fail in this game instance, a threshold on the occurrences of word pairs is introduced to lower the impact of high occurrences on frequencies. The occurrences that exceed the threshold are set equal to the threshold, so that frequencies are kept within a known range.

The threshold was set empirically in the current implementation of the artificial player. Threshold values between 10 and 30 were considered and, after an extensive experimental campaign, the threshold was set to 13. As a matter of fact, this value corresponds to the maximum success rate of the artificial player for the considered game instances.

In order to better understand how the threshold is used, let us reconsider the example discussed earlier in this section. Let us first reconsider the occurrences of the word pairs that include the related token $r_1$ after the introduction of the threshold. Since all the occurrences for the related token $r_1$ shown in (10) are greater than the threshold, they are all set equal to the threshold

$$o_{1,1} = o_{2,1} = o_{3,1} = o_{4,1} = o_{5,1} = 13. \tag{15}$$

Let us then reconsider the occurrences for the word pairs that contain the related token $r_2$ after the introduction of the threshold. From (11) it can be observed that only $o_{1,2}$ and $o_{3,2}$ are greater than the threshold and, therefore, they are set equal to the threshold. The occurrences for the word pairs that include the related token $r_2$ after the introduction of the threshold are

$$o_{1,2} = o_{3,2} = 13,\ o_{2,2} = 3,\ o_{4,2} = 1,\ o_{5,2} = 3. \tag{16}$$

These changes to the occurrences do not influence the values of the match of the two related tokens $r_1$ and $r_2$, and they both remain equal to 5.
Instead, the new occurrences have an impact on the frequencies $f_1$ and $f_2$. The frequency $f_1$ of the related token $r_1$ evaluated after the introduction of the threshold is

$$f_1 = \sum_{i=1}^{5} o_{i,1} = 65. \tag{17}$$

The frequency $f_2$ of the related token $r_2$ evaluated after the introduction of the threshold is

$$f_2 = \sum_{i=1}^{5} o_{i,2} = 33. \tag{18}$$

Observe that, according to these new frequencies, the player proposes the correct sixth word, namely the related token $r_1$. As a matter of fact, the new frequency of the related token $r_1$ is now higher than the frequency of the related token $r_2$.

5 Experimental Results

The current version of the artificial player uses the modified algorithm that employs a threshold on occurrences to limit the problems caused by related tokens with high frequencies, as discussed in the previous section. It is expected that the adoption of the modified algorithm can improve the success rate of the player with respect to its initial performance because a preliminary informal analysis of the game instances in which the player failed suggests that wrong sixth words were often caused by the presence of related tokens with high frequencies.

The following is an example of an instance of the game in which the current version of the player found the correct sixth word:

– Originale (Italian for original)
– Mattino (Italian for morning)
– Segretaria (Italian for secretary)
– Curare (Italian for treat)
– Straordinaria (Italian for extraordinary)

The player was able to identify the correct sixth word, which is edizione (Italian for edition). As a matter of fact, one can say in Italian: edizione originale (Italian for original edition); edizione del mattino (Italian for morning edition); segretaria di edizione (Italian for script girl); curare un’edizione (Italian for edit a publication); and edizione straordinaria (Italian for special edition).
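The effect of the threshold on occurrences introduced in Section 4 can be reproduced with a minimal Python sketch that uses the occurrences reported in (10) and (11) for critico and riferimento. The helper names `frequency`, `match`, and `best`, as well as the list-based layout of the occurrences, are illustrative assumptions and do not describe the actual implementation of the player; only the threshold value 13 and the occurrences are taken from the text.

```python
THRESHOLD = 13  # value reported in the text for the current implementation

# Occurrences from (10) and (11), listed in the order of the five tokens
# punto, saggio, arte, occhio, giudizio.
CANDIDATES = {
    "critico": [121, 20, 178, 31, 50],
    "riferimento": [652, 3, 14, 1, 3],
}


def frequency(occurrences, threshold=None):
    """Sum the occurrences, clipping each one at the threshold if given."""
    if threshold is None:
        return sum(occurrences)
    return sum(min(o, threshold) for o in occurrences)


def match(occurrences):
    """Count the tokens that are actually paired with the candidate."""
    return sum(1 for o in occurrences if o > 0)


def best(candidates, threshold=None):
    """Return the candidate with the largest match, then highest frequency."""
    return max(
        candidates,
        key=lambda w: (match(candidates[w]), frequency(candidates[w], threshold)),
    )


if __name__ == "__main__":
    for word, occ in CANDIDATES.items():
        print(word, frequency(occ), frequency(occ, THRESHOLD))
    print(best(CANDIDATES), best(CANDIDATES, THRESHOLD))
```

Clipping with `min` leaves the match unchanged because a positive occurrence stays positive, which agrees with the observation that both matches remain equal to 5 after the threshold is applied.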
In some cases, the player cannot find the correct sixth word. However, it is worth noting that, in such cases, it may still return a word that is related to the given five words. For example, consider the following instance of the game:

– Volo (Italian for flight)
– Dare (Italian for give)
– Mezzi (Italian for means)
– Ente (Italian for society)
– Intervento (Italian for intervention)

The correct sixth word is assistenza (Italian for assistance). As a matter of fact, one can say in Italian: assistenza di volo (Italian for flight assistance); dare assistenza (Italian for give assistance); mezzi di assistenza (Italian for means of assistance); ente di assistenza (Italian for rescue society); and intervento di assistenza (Italian for assistance intervention). In this example, the player does not return the correct sixth word, and it returns the word controllo (Italian for control). This word is not the correct sixth word, but it is closely related to all the given five words. As a matter of fact, one can say in Italian: controllo del volo (Italian for flight control); dare il controllo (Italian for give the control); mezzi di controllo (Italian for means of control); ente di controllo (Italian for control unity); and intervento di controllo (Italian for control intervention). Note that the identified relationships that link the word controllo with each one of the given five words are all correct, and they are commonly used in Italian.

The current version of the player was tested using the same 100 instances of the game that were considered in [6] to compare the performance of the original algorithm, as described in Section 3, with the performance of the modified algorithm that uses the threshold. Using the modified version of the algorithm, the new success rate of the player is 47%, which means that for nearly half of the considered instances of the game the correct sixth word is proposed.
Therefore, it can be concluded that the use of the threshold for the occurrences leads to a significant increase of the success rate, which is almost doubled (from 24% to 47%) with respect to the success rate of the previous version of the player.

In addition, note that the sixth word proposed by the current version of the artificial player is strongly related to at least four (and sometimes five) of the given five words in 28 of the 100 instances of the game, even if the sixth word is not actually correct. Therefore, in 75 of the 100 instances of the game, the player returns a sixth word that is correct (47 cases) or that is strongly related to the given five words (28 cases). This result is encouraging and further improvements are planned to increase the success rate of the player.

Finally, it is worth noting that the adoption of the modified algorithm ensures that the current version of the artificial player can outperform GUL.LE.VER., which exhibits a success rate equal to 26%. On the contrary, the adoption of the modified algorithm is not sufficient to obtain a success rate better than the success rate of Il Mago della Ghigliottina, which equals 68.6%. Note, however, that the mentioned success rates were not obtained using a common set of game instances, and therefore their relevance to compare the players is limited.

6 Conclusion

This paper discussed the design of an artificial player for a specific word association game. The first step in the construction of the artificial player involved the collection of sufficient texts to acquire the needed linguistic knowledge. The collected texts were processed to extract word pairs, their occurrences, and two other metrics called frequency and match. The collected word pairs and their metrics form the knowledge base used by the player. Note that a suitable threshold on the values of the occurrences was defined to improve the success rate of the player.
The player was tested on 100 instances of the game, and its success rate was 47%. Future developments of the player include extending the collected texts to cover new word pairs. In addition, the use of additional metrics is planned to increase the success rate of the player.

References

1. Abacha, A.B., Zweigenbaum, P.: MEANS: A medical question-answering system combining NLP techniques and Semantic Web technologies. Information Processing & Management 51(5), 570–594 (2015)
2. Ahmed, K., Tazi, N., Hossny, A.H.: Sentiment analysis over social networks: An overview. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics. pp. 2174–2179 (2015)
3. Basile, P., Lovetere, M., Monti, J., Pascucci, A., Sangati, F., Siciliani, L.: Ghigliottin-AI @ EVALITA 2020: Evaluating artificial players for the language game “La Ghigliottina”. In: Basile, V., Croce, D., Maro, M.D., Passaro, L.C. (eds.) Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020). CEUR Workshop Proceedings, vol. 2765. RWTH Aachen (2020)
4. Belinkov, Y., Glass, J.: Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics 7, 49–72 (2019)
5. Chater, N., Manning, C.D.: Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences 10(7), 335–344 (2006)
6. Coffrini, A., Monica, S., Bergenti, F.: On the design of an artificial player for a popular word game. In: Italian Conference on Computational Logic (CILC 2021). CEUR Workshop Proceedings, vol. 3002, pp. 122–132. RWTH Aachen (2021)
7. Dahl, V.: Natural language processing and logic programming. The Journal of Logic Programming 19-20, 681–714 (1994)
8. Dorr, B.J., Jordan, P.W., Benoit, J.W.: A survey of current paradigms in machine translation. Advances in Computers 49, 1–68 (1999)
9. de Francesco, N.: GUL.LE.VER @ GhigliottinAI: A glove based artificial player to solve the language game “La Ghigliottina”. In: Basile, V., Croce, D., Maro, M.D., Passaro, L.C. (eds.) Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020). CEUR Workshop Proceedings, vol. 2765. RWTH Aachen (2020)
10. Këpuska, V., Bohouta, G.: Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home). In: IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC). pp. 99–103 (2018)
11. Kowsari, K., Jafari, M., Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown, D.: Text classification algorithms: A survey. Information 10(4) (2019)
12. Lewis, D.D., Jones, K.: Natural language processing for information retrieval. Communications of the ACM 39(1), 92–101 (1996)
13. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press (1999)
14. Mooney, R.J.: Inductive logic programming for natural language processing. In: Muggleton, S. (ed.) Inductive Logic Programming. pp. 1–22. Springer (1997)
15. Riguzzi, F., Bellodi, E., Lamma, E., Zese, R., Cota, G.: Probabilistic logic programming on the Web. Software Practice and Experience 46(10), 1381–1396 (2016)
16. Riguzzi, F., Lamma, E., Alberti, M., Bellodi, E., Zese, R., Cota, G.: Probabilistic logic programming for natural language processing. In: Chesani, F., Mello, P., Milano, M. (eds.) AI*IA Workshop on Deep Understanding and Reasoning: A Challenge for Next-generation Intelligent Agents. CEUR Workshop Proceedings, vol. 1802, pp. 30–37. RWTH Aachen (2016)
17. Sangati, F., Pascucci, A., Monti, J.: “Il Mago della Ghigliottina” @ GhigliottinAI: When linguistics meets artificial intelligence. In: Basile, V., Croce, D., Maro, M.D., Passaro, L.C. (eds.) Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020). CEUR Workshop Proceedings, vol. 2765. RWTH Aachen (2020)
18. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine 13(3), 55–75 (2018)