UNED-NLP at eRisk 2022: Analyzing gambling disorders in Social Media using Approximate Nearest Neighbors

Hermenegildo Fabregat gildo.fabregat@lsi.uned.es Dpto. Lenguajes y Sistemas Informáticos NLP & IR Group Universidad Nacional de Educación a Distancia (UNED)

Juan del Rosal 16 28040 Madrid Spain

Andres Duque aduque@lsi.uned.es Dpto. Lenguajes y Sistemas Informáticos NLP & IR Group Universidad Nacional de Educación a Distancia (UNED)

Juan del Rosal 16 28040 Madrid Spain

IMIENS: Instituto Mixto de Investigación Escuela Nacional de Sanidad

Monforte de Lemos 5 28019 Madrid Spain

Lourdes Araujo Dpto. Lenguajes y Sistemas Informáticos NLP & IR Group Universidad Nacional de Educación a Distancia (UNED)

Juan del Rosal 16 28040 Madrid Spain

IMIENS: Instituto Mixto de Investigación Escuela Nacional de Sanidad

Monforte de Lemos 5 28019 Madrid Spain

Juan Martinez-Romo juaner@lsi.uned.es Dpto. Lenguajes y Sistemas Informáticos NLP & IR Group Universidad Nacional de Educación a Distancia (UNED)

Juan del Rosal 16 28040 Madrid Spain

IMIENS: Instituto Mixto de Investigación Escuela Nacional de Sanidad

Monforte de Lemos 5 28019 Madrid Spain

CLEF 2022: Conference and Labs of the Evaluation Forum

September 5-8, 2022, Bologna, Italy

Keywords: Pathological gambling detection, Approximate Nearest Neighbors, Vector representations, Relabeling

This paper describes our proposal for tackling Task 1 (Early Detection of Signs of Pathological Gambling) from the CLEF 2022 eRisk Workshop. The challenge consists of processing messages written by Social Media users in order to detect early signs of pathological gambling. Our proposal is based on the calculation of Approximate Nearest Neighbors (ANN) performed on vectorial representations of the given messages. We introduce a relabeling process to modify the granularity of the labeling schema in the training dataset, thus converting it from the original user-based annotation to a message-based one. Our approach achieves the best average performance in the decision-based evaluation, as well as in the ranking-based evaluation. In addition, our system proves to be the fastest one in terms of the time needed to process the whole test dataset. This indicates that the proposed relabeling scheme allows us to capture more easily the textual information that leads to a correct detection of pathological gambling.

Introduction

In the Internet era, social media analysis for the early detection of potential health risks is a particularly interesting research area. Among the efforts carried out by the scientific community in this context are the successive editions of the eRisk workshop, held within the Conference and Labs of the Evaluation Forum (CLEF) since 2017. This workshop serves as a meeting point in which both methodologies and practical approaches have been developed for the early detection of different types of health risks, such as eating disorders, self-harm or depression, through the textual analysis of posts and messages of social media users.

In this paper we present a system for tackling Task 1 of the eRisk 2022 Workshop: Early Detection of Signs of Pathological Gambling [1]. The approach first generates vector-based representations of users' messages through sentence embeddings, and subsequently detects positive messages using methods based on Approximate Nearest Neighbors (ANN) techniques. Although ANN search can be seen as a simple machine learning technique, we show in the paper how an adequate pre-processing of the training dataset, based on changing the original label granularity from users to messages, allows us to obtain the best overall results in the competition.

The rest of the paper is structured as follows: Section 2 gives an overview of previous work related to the task and to the techniques used in this work. Section 3 describes the addressed task, including the available dataset and evaluation metrics, while the developed system is presented in Section 4. The achieved results are shown, compared to other participating systems, and discussed in Section 5. Finally, Section 6 presents the main conclusions and future lines of work.

Related Work

Gambling disorder [2] (GD) is characterized by a persistent and recurrent pattern of gambling that is associated with significant distress or substantial upset. The prevalence of GD has been estimated at 0.5% of the adult population in the United States, with comparable or even higher estimates in other countries.

People with GD are often not treated or even recognized as such. GD often co-occurs with other psychiatric disorders. High rates of mood, anxiety, attention deficit disorders and substance use disorders have been reported [3] in people with GD. It is also often accompanied by a higher rate of unemployment, economic difficulties, divorce, and poorer health. In addition, GD is closely related to other addictive disorders, being the first non-substance addictive behavior to be recognized [4].

Social networks are an excellent source of information where studies can be carried out for the early detection of people with gambling problems. In this line, the eRisk competition considered the problem of pathological gambling for the first time in 2021 [5]. Several systems participated in the shared task with different approaches: RELAI [6], UPV-Symanto [7], BLUE [8], UNSL [9], and CEDRI [10]. Considering the "test-only" nature of this first version of the task, several of these participating systems [6,7,8,10] used external resources, such as posts from Reddit crawled by themselves, for training their systems. Most of them applied Transformer-based architectures [11], as well as other types of neural networks. The UNSL team obtained the best results using the Early Risk Detection Framework (ERD).

This year we participated for the first time in the competition on gambling disorder. Our system is based on a simple approach that has proven to be very effective. The idea is to carry out a relabeling of users' messages using a method based on Approximate Nearest Neighbor (ANN) search. The exact nearest neighbor search (NNS) for a given query returns the point of the dataset that lies at the shortest distance from the query. A generalization of the nearest neighbor search is the k-nearest neighbor search (k-NNS), which returns the k vectors nearest to the query. Because exact search becomes very costly as dimensionality grows, many proposals have focused on the approximate solution of the NNS and k-NNS problems.

A recent work [12] has presented a comparison and evaluation of different approaches to the problem. According to this work, state-of-the-art ANN methods can be classified into three types: Hashing-based, Partition-based and Graph-based. Hashing-based methods transform data points to a low-dimensional representation, where each point is represented by a short code (hash code). Partition-based methods divide the high-dimensional space into multiple disjoint regions. The partitioning process is usually done recursively, hence these methods often use a tree- or forest-based representation. We have used one of these methods in this work, Annoy [13], a hyperplane partitioning method that recursively divides the space with hyperplanes of random direction. Graph-based methods construct a proximity graph in which each datum corresponds to a node and the edges connecting nodes define the neighborhood relationship. The main idea of these methods is that a neighbor's neighbor is likely to also be a neighbor. The search can be performed efficiently by iteratively extending neighbors of neighbors in a best-first search strategy. Depending on the structure of the graph, different graph-based methods can be distinguished. In this work we have used a method based on Hierarchical Navigable Small World graphs [14].
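The best-first strategy mentioned above for graph-based methods can be illustrated with a minimal sketch. It is a toy, single-layer search over an exact k-NN graph, not the hierarchical HNSW structure of [14] and not the code used in our runs; the function names and the parameters `m` (links per node) and `ef` (size of the result set kept during the search) are illustrative assumptions.

```python
import heapq
import math


def build_knn_graph(points, m):
    """Link every point to its m exact nearest neighbors (toy construction)."""
    graph = {}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        graph[i] = heapq.nsmallest(m, others, key=lambda j: math.dist(p, points[j]))
    return graph


def best_first_search(points, graph, query, entry, k, ef):
    """Approximate k-NN: expand neighbors of neighbors, closest candidate first."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap (negated): worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                # closest candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(query, points[nb])
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]]


points = [(float(x),) for x in range(10)]   # ten points on a line
graph = build_knn_graph(points, m=2)
approx = best_first_search(points, graph, query=(7.2,), entry=0, k=2, ef=3)
exact = heapq.nsmallest(2, range(len(points)),
                        key=lambda j: math.dist((7.2,), points[j]))
```

On this toy data the search walks along the line from the entry point and ends with the same two neighbors as the exhaustive search, while only ever computing distances to nodes reachable through the graph.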

Task 1: Early Detection of Signs of Pathological Gambling

Task 1 of eRisk 2022 [1] is denoted "Early detection of signs of pathological gambling". This is the second edition of the task, which was first introduced in the CLEF 2021 eRisk Workshop [5]. In this task, participating systems are asked to determine whether an individual can be classified as a pathological gambler (positive users) or a non-pathological gambler (negative users) based on the user's Social Media messages. Systems must analyze each user's posts sequentially, in chronological order, to detect early traces of pathological gambling.

Dataset

The dataset used in the task is composed of a set of XML documents, each of them containing chronologically ordered Social Media posts belonging to a particular user. The training dataset contains a total of 2,348 documents, each of them annotated as "1" (positive) if the user is labeled as a pathological gambler, and "0" (negative) otherwise.

The test dataset is provided through a server to which participants must connect to iteratively receive user writings. The total number of test users is 2,079 (81 pathological gamblers and 1,998 control users), with a maximum number of user writings of 2,001, while the average number of user writings is 495.

Metrics

System evaluation is twofold:

• Decision-based evaluation: This first type of evaluation aims to analyze the performance of the participating systems in terms of standard measures such as Precision, Recall and F-Measure. However, other metrics are also introduced in this evaluation that take into account the delay incurred by a system before it detects a true positive. Two of these metrics, denoted $ERDE$ and $ERDE_o$, consider the number or the percentage of messages that have to be processed before emitting an alert of positive user. In order to overcome the low interpretability of these latter metrics, a latency-weighted F-Score is also introduced by multiplying the standard F-Measure by a penalty factor based on the median delay of true positive detection.

• Ranking-based evaluation: The second type of evaluation is a complementary approach that requires the systems to provide a score indicating the risk of pathological gambling of a user every time a new message is analyzed. Users are then ranked using this score and standard ranking metrics such as $P@k$ or $NDCG@k$ can be applied, with the parameter $k$ being the number of analyzed messages before evaluating the ranking.
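As an illustration of the latency-weighted F-Score, the sketch below follows the formulation given in the eRisk overviews [15,5] rather than anything defined in this paper: each true positive detected after reading $k$ writings receives a penalty, and the F-Measure is multiplied by one minus the median penalty (the "speed"). The growth rate `P = 0.0078` is the value we recall from those overviews and should be treated as an assumption, and the delays are hypothetical.

```python
import math
from statistics import median

P = 0.0078  # assumed penalty growth rate, as recalled from the eRisk overviews


def penalty(k, p=P):
    """Penalty for a true positive detected after reading k writings."""
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def speed(delays, p=P):
    """One minus the median penalty over the delays of all true positives."""
    return 1.0 - median(penalty(k, p) for k in delays)


def latency_weighted_f1(f1, delays, p=P):
    """Standard F1 scaled by the speed factor."""
    return f1 * speed(delays, p)


# Hypothetical delays (in writings) whose median is 3
delays = [1, 3, 40]
s = speed(delays)
f_lat = latency_weighted_f1(0.869, delays)
```

With a median delay of 3 writings and an F1 of 0.869, this yields a speed of about 0.992 and a latency-weighted F1 of about 0.862, which is consistent with the corresponding columns of Table 3.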

More information about the complete set of metrics employed in the evaluation can be found in previous overviews of eRisk competitions [15,5].
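For reference, the two ranking metrics can be sketched in their standard textbook form with binary relevance (1 for a pathological gambler, 0 for a control user). The exact variant used by the organizers may differ in details such as the discount function, so this is an assumption-laden illustration; the ranking itself is hypothetical.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k ranked users that are true positives."""
    return sum(relevance[:k]) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranking, highest risk score first
ranking = [1, 0, 1, 0, 0]
p2 = precision_at_k(ranking, 2)
n2 = ndcg_at_k(ranking, 2)
```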

Proposed Model

Due to the large amount of information available in social networks, we propose an approach based on Approximate Nearest Neighbors (ANN), whose main benefit is its efficiency in processing large data collections. The following sections describe the main components of the proposed model and the configurations that have been explored.

Data representation

We use Universal Sentence Encoder [16] to encode each user's messages. Such models are trained and optimized for encoding texts longer than single words, e.g., sentences, phrases or short paragraphs. The model we use is trained with a deep averaging network [17] (DAN) using data from different sources in English. Although DAN approaches produce unordered representations of the information by averaging the terms in a given text, these models are able to capture subtle differences between similar texts. In short, for each message encoded by this model, a 512-dimensional vector is generated.

Approximate Nearest Neighbors

Although nearest neighbor retrieval is a conceptually simple procedure, it is a difficult problem to address in domains such as social networks, where a large amount of information is available. In this domain, brute-force search is typically replaced by approximate techniques built on more complex structures, e.g., graphs and trees. Currently there are different tools and approaches that have proven to be very successful in terms of recall and queries per second [18]. Due to their popularity and performance we have explored the use of Annoy (see footnote 1) and Non-Metric Space Library [14] (NMSLIB):

• Annoy: This library uses tree-like structures for the representation of nodes and random projections for the division of the subspace between adjacent nodes. To explore this library, we have used a space defined by the inner (dot) product of the $L_2$-normalized vectors generated by the Universal Sentence Encoder.

• NMSLIB: Library for approximate k-nearest neighbor search based on navigable small-world graphs with controllable hierarchy (Hierarchical NSW, HNSW). For the calculation of similarity between instances, NMSLIB supports the use of different metrics and data formats. In this sense, we explored a dense $L_2$ space.
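The two spaces listed above are closely related whenever the vectors are $L_2$-normalized: the dot product of unit vectors equals their cosine similarity, and their squared Euclidean distance equals $2 - 2\cos$, so both induce the same neighbor ranking. The short sketch below checks this numerically with hypothetical 3-dimensional stand-ins for the 512-dimensional sentence vectors. Whether the vectors indexed in the NMSLIB space were normalized is not stated here, so the equivalence is only guaranteed under that assumption.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical low-dimensional stand-ins for encoded messages
a, b = [1.0, 2.0, 2.0], [2.0, 0.0, 1.0]
ua, ub = l2_normalize(a), l2_normalize(b)

cos_ab = cosine_similarity(a, b)                         # on the raw vectors
dot_unit = dot(ua, ub)                                   # on the unit vectors
sq_l2_unit = sum((x - y) ** 2 for x, y in zip(ua, ub))   # squared L2 on unit vectors
```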

Tag and scoring function

Once the training set was transformed using Universal Sentence Encoder, and after generating the nearest neighbor index using the Annoy or NMSLIB libraries, we propose a labeling and scoring approach based on the classes of the neighbors retrieved for each message in the test set. Given a message $M$ from a user $U$, we classify $U_M$ as positive if the 20 nearest neighbors retrieved correspond to messages from positive users. Following the same idea, we considered as scoring function the distance of $U_M$ from the nearest retrieved neighbors, $1 - \sum_{x=1}^{20} \mathrm{cosine}(U_M, M_x)$. This number of $k = 20$ nearest neighbors was set from a previous parameter tuning evaluation in which several values of $k$ were explored.
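The labeling rule and the scoring function can be sketched as follows. Several points are assumptions made for illustration: an exhaustive search stands in for the Annoy or NMSLIB index, $\mathrm{cosine}(\cdot,\cdot)$ is read as cosine distance, the rule is read as requiring all $k$ neighbors to be positive, and the toy vectors and the name `tag_and_score` are hypothetical. How the score is combined with the predicted class is not detailed in the text, so both values are returned separately.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity (our reading of cosine(.,.) in the formula)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def tag_and_score(message, train_vectors, train_labels, k=20):
    """Return (is_positive, score) for one encoded message."""
    dists = [(cosine_distance(message, v), i) for i, v in enumerate(train_vectors)]
    nearest = heapq.nsmallest(k, dists)        # exact search stands in for the ANN index
    is_positive = all(train_labels[i] == 1 for _, i in nearest)
    score = 1.0 - sum(d for d, _ in nearest)   # formula as printed in the text
    return is_positive, score


# Hypothetical 2-dimensional training messages: two positive, two negative
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = [1, 1, 0, 0]

pos_a, score_a = tag_and_score([1.0, 0.05], train_vectors, train_labels, k=2)
pos_b, score_b = tag_and_score([0.2, 1.0], train_vectors, train_labels, k=2)
```

In this toy setting the first message is tagged as positive because both of its neighbors are positive, while the second one is not.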

Relabeling process

The corpus provided by the organizers presents a user-based labeling, i.e., each user is labeled as positive if at least one positive message can be found within his/her posts, and negative otherwise. However, positive/negative annotations for each message in the corpus are not provided. We consider that the correct classification of positive and negative messages is crucial for achieving a good performance in this task. Hence, we propose an approach to re-annotate the training corpus in order to generate a message-level labeling. For this purpose, we first consider all messages of a positive user to be positive, and all messages of a negative user to be negative. Once the k-nearest neighbor query index is generated, we iteratively process each message from each positive user of the training set, and re-annotate its class according to the above-mentioned labeling algorithm. We assume that only the initial labels of positive users' messages may be wrong: a negative user cannot have written positive messages, since he/she would otherwise have been labeled as positive. Hence, in each iteration of the algorithm, the number of positive messages can only decrease, as some of them are relabeled as negative. After processing the training set, if modifications have been made, the same method is applied again until convergence is reached, that is, until there are no changes in the training set labels.
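A minimal sketch of this iterative relabeling is given below. It is not the code used in our runs: an exhaustive Euclidean search replaces the ANN index, a message is excluded from its own neighbor list, labels are updated in place within a pass, and the toy vectors are hypothetical. All of these are assumptions, since the description above does not fix them.

```python
import heapq
import math


def k_nearest(idx, vectors, k):
    """Indices of the k vectors closest to vectors[idx], excluding idx itself."""
    others = (j for j in range(len(vectors)) if j != idx)
    return heapq.nsmallest(k, others,
                           key=lambda j: math.dist(vectors[idx], vectors[j]))


def relabel(vectors, user_labels, k):
    """Iteratively turn user-level labels into message-level labels."""
    labels = list(user_labels)        # start: each message inherits its user's label
    changed = True
    while changed:                    # repeat until a full pass changes nothing
        changed = False
        for i, user_label in enumerate(user_labels):
            if user_label != 1 or labels[i] == 0:
                continue              # only still-positive messages of positive users
            if not all(labels[j] == 1 for j in k_nearest(i, vectors, k)):
                labels[i] = 0         # positive -> negative, never the reverse
                changed = True
    return labels


# Hypothetical data: message 3 comes from a positive user but lies
# among messages written by negative users.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.0),
           (0.1, 0.9), (0.0, 1.0), (0.2, 1.0), (0.1, 1.1)]
user_labels = [1, 1, 1, 1, 0, 0, 0]
message_labels = relabel(vectors, user_labels, k=2)
```

Because labels can only change from positive to negative, the loop is guaranteed to terminate. In the toy example only message 3 is relabeled, and a second pass confirms that no further change occurs.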

Crawling new positive instances

In order to reduce the impact on recall that the relabeling algorithm could have, the following data were collected from gamblers' help associations:

• Testimonial facts: A total of 234 testimonials were collected from websites (see footnote 2) containing information about pathological gamblers and their friends and family. Unlike the Reddit posts, these new data are more carefully structured and contain longer texts.

• Forums: Messages from a forum devoted to helping players (see footnote 3) were automatically collected, and the proposed system was used to select the potentially positive ones. Only the messages classified as positive by the system were included in the training set. In short, a total of 232 new instances were added.

The instances extracted from the forums present a format and structure similar to those of the corpus texts. No specific pre-processing, such as text size limitation or language control, was applied to the new data.

As shown in Table 1, we submitted 5 different configurations, in which we tried to explore combinations of the previously mentioned different aspects of the proposed approach.

Results and Discussion

The results obtained by our approach are shown and discussed below.

Execution time:

In order to avoid possible errors during the test phase due to power or network failures, we processed the test data on a shared server with two Intel(R) Xeon(R) E5-2630 v4 CPUs @ 2.20GHz and 64 GB of RAM. As can be seen in Table 2, the proposed batch of experiments achieved the best execution times among the systems that processed the whole test set. These results are largely due to the use of non-exhaustive nearest-neighbor retrieval algorithms. Although we presented runs using different algorithms, all of them are oriented to the processing of large datasets and include optimizations for this purpose. While Annoy uses tree-like structures for the representation of nodes and random projections for the division of the subspace between adjacent nodes, NMSLIB uses a graph-based structure and the projection of the different nodes onto a skip-list. Both algorithms include customizable parameters to optimize their performance, e.g. number of trees (Annoy) or number of Zero node links (NMSLIB). Although we do not perform an exhaustive study of these parameters, we try to limit their growth. The final configuration for each of the algorithms is as follows:

• Annoy
  - Trees: 24

• NMSLIB
  - index_params: {'M': 200, 'efConstruction': 1000, 'post': 2}
  - method: 'hnsw'
  - efSearch: 100

Finally, although they are not included in this comparison, our system also achieved lower execution times than many of the systems that processed the test set only partially.

Decision-based performance:

Table 3 shows the results obtained during the decision-based evaluation. This table shows the set of metrics analyzed by the task organizers: Precision, Recall, $F_1$, $ERDE_5$, $ERDE_{50}$, latency, speed and latency-weighted $F_1$. In addition to the results of our runs, the best run of each team participating in the competition is shown. As can be seen in the table, considering the latency-weighted $F_1$ metric as the summary metric, our R4 configuration obtained the best results, achieving the best trade-off between precision and recall. If we analyze the achieved results in terms of latency, i.e., the delay shown by the system expressed as the median number of messages that need to be processed before detecting a positive case, no great differences can be found between the submitted runs, since we used the same inference process in all of them. However, if we compare runs R0 and R1, which differ only in the application of the relabeling process in R1, precision improves by around 27 percentage points (from 0.285 to 0.555) with no excessive penalization of other metrics such as recall. The relabeling process has a high impact on the corpus, since the label of more than 90% of the positive instances is modified after applying it. Considering the amount of discarded information and the improvements obtained through this approach, the analysis of the filtered messages can be of great value to achieve a better understanding of the problem.

On the other hand, seeking to reduce the effect on recall produced by the relabeling process, the inclusion of new automatically collected data was considered in the R2 and R3 runs. The obtained results indicate that our approach to collecting and processing the new data was not the most effective one. Finally, R1 and R4 differ in the algorithm used for nearest neighbor retrieval (R1: Annoy, R4: NMSLIB). These algorithms include a parameter space that has not been studied in depth. For this reason, and although the NMSLIB algorithm performs significantly better than Annoy, we consider that a more thorough study of the parameters of the latter technique should be performed before discarding its use.

Ranking-based performance:

Table 4 shows the results obtained in the ranking-based evaluation. During this evaluation, the performance of the system is measured after processing 1, 100, 500 and 1000 messages. As shown in the table, the R4 run obtains the best results during this evaluation for all metrics in almost all stages. Compared with the best runs presented by BLUE and UNSL, our R4 run performs better in most aspects, the exception being NDCG@100 after 1 and 100 writings. These results indicate that the scoring function described in Section 4.3 is an effective heuristic for assessing the risk of pathological gambling after processing each user message.

Table 4

Test results: Results of the ranking-based evaluation for task T1. For the models included in the comparison, the best results are shown in bold.

        1 writing                 100 writings              500 writings              1000 writings
        P@10  NDCG@10  NDCG@100   P@10  NDCG@10  NDCG@100   P@10  NDCG@10  NDCG@100   P@10  NDCG@10  NDCG@100
Run 0   0.9   0.88     0.75       0.4   0.29     0.7        0.

Conclusions and Future Work

This article describes our proposed approach for early detection of signs of pathological gambling addressed in Task 1 of eRisk 2022 [1]. The main contributions presented in this work include the use of Approximate Nearest Neighbor algorithms for retrieving subsets of similar messages previously transformed into a vectorial space using sentence embeddings, as well as the development of a relabeling technique successfully applied to the training set. The use of algorithms such as Annoy or NMSLIB for large scale nearest neighbor retrieval has been of great help for the fast processing of the data. As shown in Table 2, our system obtained the best execution times among the systems that processed all the messages from the test set. On the other hand, as shown in Tables 3 and 4, our model has obtained the best results for the $F_1$, $ERDE_{50}$ and latency-weighted $F_1$ metrics in the decision-based evaluation, as well as the best overall results in the ranking-based evaluation. Most of these results are due to the application of the iterative relabeling process of the corpus described in Section 4.4, which is based on the use of the system itself. Through this process we have also validated the use of the vector space generated by Universal Sentence Encoder to analyze the similarity between messages of different classes.

The following lines of future work are currently being considered: study of encoders based on more complex approaches such as BERT [19], or trained with in-domain information; deeper exploration of the parameters used for the construction of the ANN index; analysis of the impact of different thresholds within the scoring function in the ranking-based evaluation (e.g. distance of retrieved neighbors); and application of the proposed system to similar tasks.

Finally, we believe that an analysis of the identified positive messages would be of great value. Theoretically, these messages should exhibit easily identifiable features and characteristics that can help in the profiling of this type of pathology.

Table 1
Submitted Runs: Description of the configurations explored in the test phase. Universal Sentence Encoder has been used as encoder while Annoy and Non-Metric Space Library (NMSLIB) have been explored as methods for k-nearest neighbor retrieval. On the other hand, we studied a relabeling process of the training set and the consideration of new data collected automatically.

Run              ANN Library   Relabeling   New data
UNED-NLP Run 0   Annoy         No           No
UNED-NLP Run 1   Annoy         Yes          No
UNED-NLP Run 2   Annoy         No           Yes
UNED-NLP Run 3   Annoy         Yes          Yes
UNED-NLP Run 4   NMSLIB        Yes          No

Table 2
Test results: Comparison of the execution times required by those systems that processed the whole test set.

Team       #runs   #user writings processed   lapse of time (from 1st to last response)
UNED-NLP   5       2001                       17:58:48
BLUE       3       2001                       3 days 13:15:25
UNSL       5       2001                       1 day 21:53:51

Table 3
Test results: Results of the decision-based evaluation for task T1. For the models included in the comparison, the best results are shown in bold.

Run                  Prec    Rec     F1      ERDE5   ERDE50   latency   speed   latency-weighted F1
UNED-NLP R0          0.285   0.975   0.441   0.019   0.010    2.0       0.996   0.4405
UNED-NLP R1          0.555   0.938   0.697   0.019   0.009    2.5       0.994   0.693
UNED-NLP R2          0.296   0.988   0.456   0.019   0.009    2.0       0.996   0.454
UNED-NLP R3          0.536   0.926   0.679   0.019   0.009    3.0       0.992   0.673
UNED-NLP R4          0.809   0.938   0.869   0.020   0.008    3.0       0.992   0.862
SINAI R2             0.908   0.728   0.808   0.016   0.011    1.0       1.000   0.808
BioInfo_UAVR R1      0.067   1.000   0.126   0.047   0.024    5.0       0.984   0.124
RELAI R2             0.052   0.963   0.099   0.036   0.029    1.0       1.000   0.099
BLUE R0              0.260   0.975   0.410   0.015   0.009    1.0       1.000   0.410
BioNLP_UniBuc R4     0.046   1.000   0.089   0.032   0.031    1.0       1.000   0.089
UNSL R1              0.461   0.938   0.618   0.041   0.008    11        0.961   0.594
NLPGroup-IISERB R3   0.140   1.000   0.246   0.025   0.014    2.0       0.996   0.245
stezmo3 R4           0.160   0.901   0.271   0.043   0.011    7.0       0.977   0.265

Footnotes

1 https://github.com/spotify/annoy
2 https://gamblershelp.com.au/learn-about-gambling/personal-stories/; http://getgamblingfacts.ca/personalstories/; https://www.gamtalk.org/stories-of-hope/; https://www.gamcare.org.uk/understanding-gamblingproblems/people-weve-helped/
3 https://www.gamtalk.org/groups/community/

Acknowledgments

This work has been partially supported by the Spanish Ministry of Science and Innovation within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32, as well as project RAICES (IMIENS 2022) and the research network AEI RED2018-102312-T (IA-Biomed).

References

[1] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, Overview of eRisk 2022: Early risk prediction on the internet, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, 2022.
[2] M. N. Potenza, I. M. Balodis, J. Derevensky, J. E. Grant, N. M. Petry, A. Verdejo-Garcia, S. W. Yip, Gambling disorder, Nature Reviews Disease Primers 5 (2019).
[3] M. N. Potenza, T. R. Kosten, B. J. Rounsaville, Pathological gambling, JAMA 286 (2001).
[4] C. J. Rash, J. Weinstock, R. Van Patten, A review of gambling disorder and substance use disorders, Substance Abuse and Rehabilitation 7 (2016) 3.
[5] J. Parapar, P. Martín-Rodilla, D. E. Losada, F. Crestani, Overview of eRisk at CLEF 2021: Early risk prediction on the internet (extended overview), in: Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, volume 2936, Bucharest, Romania, 2021.
[6] D. Maupomé, M. D. Armstrong, F. Rancourt, T. Soulas, M.-J. Meurs, Early detection of signs of pathological gambling, self-harm and depression through topic extraction and neural networks, in: Proceedings of the Working Notes of CLEF 2021, 2021.
[7] A. Basile, M. Chinea-Rios, A.-S. Uban, T. Müller, L. Rössler, S. Yenikent, B. Chulví, P. Rosso, M. Franco-Salvador, UPV-Symanto at eRisk 2021: Mental health author profiling for early risk prediction on the internet, in: Working Notes of CLEF, 2021.
[8] A.-M. Bucur, A. Cosma, L. P. Dinu, Early risk detection of pathological gambling, self-harm and depression using BERT, in: Working Notes of CLEF, 2021.
[9] J. M. Loyola, S. Burdisso, H. Thompson, L. Cagnina, M. Errecalde, UNSL at eRisk 2021: A comparison of three early alert policies for early risk detection, in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, 2021.
[10] R. P. Lopes, CeDRI at eRisk 2021: A naive approach to early detection of psychological disorders in social media, in: CEUR Workshop Proceedings, 2021.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, CoRR abs/1706.03762 (2017).
[12] W. Li, Y. Zhang, Y. Sun, W. Wang, M. Li, W. Zhang, X. Lin, Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement, IEEE Transactions on Knowledge and Data Engineering 32 (2019).
[13] E. Bernhardsson, Annoy: Approximate Nearest Neighbors in C++/Python, 2018.
[14] Y. A. Malkov, D. A. Yashunin, Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs, CoRR abs/1603.09320 (2016).
[15] D. E. Losada, F. Crestani, J. Parapar, Overview of eRisk at CLEF 2020: Early risk prediction on the internet (extended overview), in: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, volume 2696, Thessaloniki, Greece, 2020.
[16] D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y. Sung, B. Strope, R. Kurzweil, Universal sentence encoder, CoRR abs/1803.11175 (2018).
[17] M. Iyyer, V. Manjunatha, J. Boyd-Graber, H. Daumé III, Deep unordered composition rivals syntactic methods for text classification, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Beijing, China, 2015. doi:10.3115/v1/P15-1162.
[18] M. Aumüller, E. Bernhardsson, A. J. Faithfull, ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms, CoRR abs/1807.05614 (2018).
[19] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Volume 1, Association for Computational Linguistics, Minneapolis, MN, USA, June 2-7, 2019. doi:10.18653/v1/n19-1423.