Can Readability Enhance Recommendations on Community Question Answering Sites?

Oghenemaro Anuyah, Ion Madrazo Azpiazu, David McNeill, Maria Soledad Pera
People and Information Research Team
Department of Computer Science, Boise State University
Boise, Idaho, USA
{oghenemaroanuyah,ionmadrazo,davidmcneill,solepera}@boisestate.edu

RecSys 2017 Poster Proceedings, August 27–31, Como, Italy. © 2017 Copyright held by the owner/author(s).

ABSTRACT
We present an initial examination of the impact text complexity has when incorporated into the recommendation process on community question answering sites. We use Read2Vec, a readability assessment tool designed to measure the readability level of short documents, to inform a traditional content-based recommendation strategy. The results highlight the benefits of incorporating readability information into this process.

CCS CONCEPTS
• Information systems → Social recommendation; Question answering;

KEYWORDS
Community question answering, Readability, Recommender

1 INTRODUCTION
Community question answering (CQA) sites allow users to submit questions on various domains so that they can be answered by the community. Sites like Yahoo! Answers, StackExchange, or StackOverflow are becoming increasingly popular, with thousands of new questions posted daily. One of the main concerns on such sites, however, is the amount of time a user has to wait before his question is answered. For this reason, CQA sites depend upon knowledge already available and refer users to older answers, i.e., answers provided for previously-posted questions and archived on the site, so that users can get a more immediate response to their inquiries. This recommendation process has been extensively studied by researchers using a wide range of content similarity measures, from the basic bag-of-words model to semantically-informed models such as RankSLDA [6].

We argue that the recommendation process within CQA sites needs to go beyond content matching and answer-feature analysis and consider that not every user has similar capabilities, in terms of both reading skills and domain expertise. A user's reading skills can be measured by readability, which refers to the ease with which a reader can comprehend a given text. This information has been applied in the past with great success to inform tasks such as K-12 book recommendation [5], Twitter hashtag recommendation [1], and review rating prediction [3]. Yet, it has not made its way to CQA recommendations, where we hypothesize it can have a significant impact, given that whether the user understands the answer provided by a recommender can highly condition the value the user gives to the answer.

In this paper, we present an initial analysis that explores the influence of incorporating reading level information into the CQA recommendation process. With this objective in mind, we consider the answer recommendation task, where a user generates a query that needs to be matched with an existing question and its corresponding answer. We address this task by ranking question-answer pairs and selecting the top-ranked pair to recommend to the user. To do so, we build upon a basic content-based recommendation strategy, which we enhance using readability estimations. Using a recent Yahoo! question-answering dataset, we measure the performance of the basic recommender and the one informed by text complexity, and demonstrate that readability indeed has an impact on user satisfaction.

2 READABILITY-BASED RECOMMENDATION
We describe below the strategy we use for conducting our analysis. Given a query q, generated by a user U, we locate each candidate answer Ca (along with the question Qa associated with Ca) that potentially addresses the needs of U expressed in q. Thereafter, the highest-ranked Ca-Qa pair is recommended to U.

2.1 Examining Content
To perform content matching, we use an existing WordNet-based semantic similarity algorithm described by Li et al. in [4]. We use this strategy to compute the degree of similarity between q and Ca, denoted Sim(q, Ca), as well as the similarity between q and Qa, denoted Sim(q, Qa). We depend upon these similarity scores to ensure that the recommended Ca-Qa pair matches U's intent expressed in q. We use a semantic strategy, as opposed to the well-known bag-of-words approach, to better capture sentence resemblance when sentences include similar, yet not exact-matching, words, e.g., "ice cream" and "frozen yogurt".
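To illustrate the intuition behind this content-matching step, the sketch below implements a heavily simplified WordNet-based sentence similarity in Python. It is not the exact algorithm of Li et al. [4], which additionally weighs word order and corpus statistics; it only shows the core idea of aligning each word with its best-matching WordNet counterpart and averaging.

```python
# A minimal, simplified sketch of WordNet-based sentence similarity.
# NOT the exact algorithm of Li et al. [4]; it only illustrates the
# core idea: align each word with its closest counterpart in WordNet
# and average the best-match scores in both directions.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def word_sim(w1: str, w2: str) -> float:
    """Best WordNet path similarity between any synsets of w1 and w2."""
    if w1 == w2:
        return 1.0
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def sim(sent_a: str, sent_b: str) -> float:
    """Symmetric sentence similarity: average best-match word score."""
    words_a, words_b = sent_a.lower().split(), sent_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    best_a = sum(max(word_sim(a, b) for b in words_b) for a in words_a)
    best_b = sum(max(word_sim(a, b) for a in words_a) for b in words_b)
    return 0.5 * (best_a / len(words_a) + best_b / len(words_b))

# e.g., sim("ice cream", "frozen yogurt") can exceed a bag-of-words
# overlap of 0, since WordNet links the near-synonymous terms.
```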
2.2 Estimating Text Complexity
To estimate the reading levels of Ca and U (the latter inferred indirectly through q), we first considered traditional readability formulas, such as Flesch-Kincaid [2]. However, we observed that these formulas are better suited for scoring long texts. Consequently, we instead use Read2Vec, a deep neural network-based readability model tailored to estimating the complexity of short texts. The network is composed of two fully-connected layers and a recurrent layer. Read2Vec was trained on documents from Wikipedia and Simple Wikipedia, and obtained a statistically significant improvement (72% for Flesch vs. 81% for Read2Vec) when predicting the readability level of short texts, compared to traditional formulas including Flesch, SMOG, and Dale-Chall [2]. Given that the answer recommended to U should match U's reading ability to ensure comprehension, we compute the Euclidean distance between the corresponding estimations, using Equation 1:

d(q, Ca) = |R2V(q) - R2V(Ca)|    (1)

where R2V(q) and R2V(Ca) are the readability levels of q and Ca, respectively, estimated using Read2Vec.

2.3 Integrating Text Complexity with Content
We use a linear regression model to combine the scores computed for each Ca-Qa pair; we empirically verified that, among well-known learning models, linear regression was the one best suited to our task, which we attribute to its simplicity, allowing it to generalize better from few training instances than more sophisticated models. This yields a score, Rel(Ca, Qa), which we use for ranking purposes, i.e., the pair with the highest score is the one recommended to U:

Rel(Ca, Qa) = β0 + β1 Sim(q, Ca) + β2 Sim(q, Qa) + β3 d(q, Ca)    (2)

where β0 is the bias weight, and β1, β2, and β3 are weights that capture the importance of the scores defined in Sections 2.1 and 2.2. This model was trained using least squares optimization.
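As a minimal sketch of how Equations 1 and 2 combine into a ranking pipeline, the snippet below uses a hypothetical read2vec_score function standing in for the Read2Vec model (which is not reproduced here), reuses the sim function from the sketch in Section 2.1, and plugs in the learned weights reported in Section 3 as defaults.

```python
# Sketch of the ranking pipeline of Sections 2.2-2.3. `read2vec_score`
# is a hypothetical placeholder for the Read2Vec model; `sim` is the
# simplified sentence similarity sketched in Section 2.1.
from typing import Callable, List, Tuple

def readability_distance(q: str, c_a: str,
                         read2vec_score: Callable[[str], float]) -> float:
    """Equation 1: distance between readability estimates of q and Ca."""
    return abs(read2vec_score(q) - read2vec_score(c_a))

def rel(q: str, c_a: str, q_a: str,
        read2vec_score: Callable[[str], float],
        betas: Tuple[float, float, float, float] = (2.26, 0.58, 0.20, 0.12)):
    """Equation 2, defaulting to the learned weights the paper reports
    (beta_0..beta_3 = 2.26, 0.58, 0.20, 0.12)."""
    b0, b1, b2, b3 = betas
    return (b0
            + b1 * sim(q, c_a)   # query-answer content match
            + b2 * sim(q, q_a)   # query-question content match
            + b3 * readability_distance(q, c_a, read2vec_score))

def recommend(q: str, pairs: List[Tuple[str, str]],
              read2vec_score: Callable[[str], float]) -> Tuple[str, str]:
    """Return the highest-scoring (answer, question) pair for query q."""
    return max(pairs, key=lambda p: rel(q, p[0], p[1], read2vec_score))
```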
3 INITIAL ANALYSIS
For analysis purposes, we use the L16 Yahoo! Answers Query to Questions dataset [7], which consists of 438 unique queries. Each query is associated with related question-answer pairs, as well as a user rating that reflects query-answer satisfaction on a 1-3 scale, where 1 indicates "highly satisfied", i.e., the answer addresses the information needs of the corresponding query. This yields 1,571 instances, 15% of which we use for training purposes; we use the remaining 1,326 instances for testing.

In addition to our Similarity+Readability recommendation strategy (presented in Section 2), we consider two baselines: Random, which recommends question-answer pairs for each test query in an arbitrary manner; and Similarity, which recommends question-answer pairs for each test query based on the content similarity between the answer and the query, computed as in Section 2.1.

An initial experiment revealed that regardless of the metric, i.e., Mean Reciprocal Rank (MRR) or Normalized Discounted Cumulative Gain (NDCG), the strategies exhibit similar behavior; thus we report our results using MRR.

[Figure 1: Performance assessment based on MRR using the Yahoo! Answers Query to Questions dataset.]

As shown in Figure 1, recommendations generated using the semantic similarity strategy discussed in Section 2.1 yield a higher MRR than the one computed for the random strategy. This is anticipated, as Similarity explicitly captures the query-question and query-answer closeness. More importantly, as depicted in Figure 1, integrating readability with a content-based approach for suggesting question-answer pairs in the CQA domain is effective in terms of enhancing the overall recommendation process (the weights learned by the model: β0 = 2.26, β1 = 0.58, β2 = 0.20, β3 = 0.12). In fact, as per its reported MRR, Similarity+Readability positions suitable question-answer pairs high in the recommendation list, which is a non-trivial task given that, for the majority of the test queries (i.e., 83%), there are between 5 and 23 candidate question-answer pairs.
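For reference, MRR follows its standard definition: each test query contributes the reciprocal rank of its first relevant recommendation. The minimal sketch below assumes a hypothetical data layout of per-query ranked relevance flags.

```python
# Minimal sketch of Mean Reciprocal Rank (MRR), the metric reported
# in Figure 1. Each query contributes 1/rank of its first relevant
# (e.g., "highly satisfied") question-answer pair, or 0 if none is.
from typing import List

def mean_reciprocal_rank(relevance_per_query: List[List[bool]]) -> float:
    """relevance_per_query[i] lists, in ranked order, whether each
    recommended pair for query i is relevant."""
    total = 0.0
    for ranked in relevance_per_query:
        for rank, is_relevant in enumerate(ranked, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(relevance_per_query)

# e.g., two queries whose first relevant pairs sit at ranks 1 and 3:
# mean_reciprocal_rank([[True], [False, False, True]]) == (1 + 1/3) / 2
```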
4 CONCLUSIONS AND FUTURE WORK
In this study, we analyzed the importance of incorporating readability level information into the recommendation process in the community-based question answering domain. We treat the reading level as a personalization signal and compare the readability level of an answer with respect to the reading abilities of a user, inferred through his query. We demonstrated that reading level can be an influential factor in assessing answer quality and can be used to improve user satisfaction in a recommendation process.

In the future, we plan to conduct a deeper study using other community question answering sites, such as Quora or StackExchange. We also plan to analyze queries for additional factors, such as relative content-area expertise, to better predict a user's familiarity with the content-specific vocabulary used in the archived answers to be recommended. We suspect that readability and domain-knowledge expertise will be highly influential when the recommendation occurs on CQA sites like StackExchange, given the educational orientation of questions posted on the site.

ACKNOWLEDGMENTS
This work has been partially funded by NSF Award 1565937.

REFERENCES
[1] I. M. Azpiazu and M. S. Pera. Is readability a valuable signal for hashtag recommendations? 2016.
[2] R. G. Benjamin. Reconstructing readability: Recent developments and recommendations in the analysis of text difficulty. Educational Psychology Review, 24(1):63-88, 2012.
[3] B. Fang, Q. Ye, D. Kucukusta, and R. Law. Analysis of the perceived value of online tourism reviews: Influence of readability and reviewer characteristics. Tourism Management, 52:498-506, 2016.
[4] Y. Li, D. McLean, Z. A. Bandar, J. D. O'Shea, and K. Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE TKDE, 18(8):1138-1150, 2006.
[5] M. S. Pera and Y.-K. Ng. Automating readers' advisory to make book recommendations for K-12 readers. In ACM RecSys, pages 9-16, 2014.
[6] J. San Pedro and A. Karatzoglou. Question recommendation for collaborative question answering systems with RankSLDA. In ACM RecSys, pages 193-200, 2014.
[7] Webscope. L16 Yahoo! Answers Query to Questions dataset. Yahoo!, 2016. [Online; accessed 17-June-2017].