Can Readability Enhance Recommendations on Community Question Answering Sites?

Oghenemaro Anuyah, Ion Madrazo Azpiazu, David McNeill, Maria Soledad Pera
People and Information Research Team
Department of Computer Science, Boise State University
Boise, Idaho, USA
{oghenemaroanuyah,ionmadrazo,davidmcneill,solepera}@boisestate.edu

RecSys 2017 Poster Proceedings, August 27–31, Como, Italy. © 2017 Copyright held by the owner/author(s).

ABSTRACT
We present an initial examination of the impact text complexity has when incorporated into the recommendation process on community question answering sites. We use Read2Vec, a readability assessment tool designed to measure the readability level of short documents, to inform a traditional content-based recommendation strategy. The results highlight the benefits of incorporating readability information into this process.

CCS CONCEPTS
• Information systems → Social recommendation; Question answering;

KEYWORDS
Community question answering, Readability, Recommender

1 INTRODUCTION
Community question answering (CQA) sites allow users to submit questions on various domains so that they can be answered by the community. Sites like Yahoo! Answers, StackExchange, or StackOverflow are becoming increasingly popular, with thousands of new questions posted daily. One of the main concerns on such sites, however, is the amount of time a user has to wait before his question is answered. For this reason, CQA sites depend upon knowledge already available and refer users to older answers, i.e., answers provided for previously-posted questions and archived on the site, so that users can get a more immediate response to their inquiries. This recommendation process has been extensively studied by researchers using a wide range of content similarity measures, from the basic bag-of-words model to semantically-informed models such as RankSLDA [6].

We argue that the recommendation process within CQA sites needs to go beyond content matching and answer-feature analysis and consider that not every user has similar capabilities, in terms of both reading skills and domain expertise. A user's reading skills can be measured by readability, which refers to the ease with which a reader can comprehend a given text. This information has been applied in the past with great success to inform tasks such as K-12 book recommendation [5], Twitter hashtag recommendation [1], and review rating prediction [3]. Yet, it has not made its way to CQA recommendations, where we hypothesize it can have a significant impact, given that whether the user understands the answer provided by a recommender can highly condition the value the user gives to the answer.

In this paper, we present an initial analysis that explores the influence of incorporating reading level information into the CQA recommendation process. With this objective in mind, we consider the answer recommendation task, where a user generates a query that needs to be matched with an existing question and its corresponding answer. We address this task by ranking question-answer pairs and selecting the top-ranked pair to recommend to the user. To do so, we build upon a basic content-based recommendation strategy, which we enhance using readability estimations. Using a recent Yahoo! question-answering dataset, we measure the performance of the basic recommender and the one informed by text complexity, and demonstrate that readability indeed has an impact on user satisfaction.

2 READABILITY-BASED RECOMMENDATION
We describe below the strategy we use for conducting our analysis. Given a query q, generated by a user U, we locate each candidate answer Ca (along with the question Qa associated with Ca) that potentially addresses the needs of U expressed in q. Thereafter, the highest-ranked Ca-Qa pair is recommended to U.

2.1 Examining Content
To perform content matching, we use an existing WordNet-based semantic similarity algorithm described by Li et al. in [4]. We use this strategy to compute the degree of similarity between q and Ca, denoted Sim(q, Ca), as well as the similarity between q and Qa, denoted Sim(q, Qa). We depend upon these similarity scores to ensure that the recommended Ca-Qa pair matches U's intent expressed in q. We use a semantic strategy, as opposed to the well-known bag-of-words approach, to better capture sentence resemblance when sentences include similar, yet not exact-matching, words, e.g., "ice cream" and "frozen yogurt".
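To illustrate the intuition behind this content-matching step, the sketch below implements a heavily simplified WordNet-based sentence similarity in Python. It is not the exact algorithm of Li et al. [4], which additionally weighs word order and corpus statistics; it only shows the core idea of aligning each word with its best-matching WordNet counterpart and averaging.

```python
# A minimal, simplified sketch of WordNet-based sentence similarity.
# NOT the exact algorithm of Li et al. [4]; it only illustrates the
# core idea: align each word with its closest counterpart in WordNet
# and average the best-match scores in both directions.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def word_sim(w1: str, w2: str) -> float:
    """Best WordNet path similarity between any synsets of w1 and w2."""
    if w1 == w2:
        return 1.0
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def sim(sent_a: str, sent_b: str) -> float:
    """Symmetric sentence similarity: average best-match word score."""
    words_a, words_b = sent_a.lower().split(), sent_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    best_a = sum(max(word_sim(a, b) for b in words_b) for a in words_a)
    best_b = sum(max(word_sim(a, b) for a in words_a) for b in words_b)
    return 0.5 * (best_a / len(words_a) + best_b / len(words_b))

# e.g., sim("ice cream", "frozen yogurt") can exceed a bag-of-words
# overlap of 0, since WordNet links the near-synonymous terms.
```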
2.2 Estimating Text Complexity
To estimate the reading levels of Ca and U (the latter inferred indirectly through q), we first considered traditional readability formulas, such as Flesch-Kincaid [2]. However, we observed that these formulas are better suited for scoring long texts. Consequently, we instead use Read2Vec, a deep neural network-based readability model tailored to estimating the complexity of short texts. The network is composed of two fully-connected layers and a recurrent layer. Read2Vec was trained on documents from Wikipedia and Simple Wikipedia, and obtained a statistically significant improvement (72% for Flesch vs. 81% for Read2Vec) when predicting the readability level of short texts, compared to traditional formulas including Flesch, SMOG, and Dale-Chall [2]. Given that the answer recommended to U should match U's reading ability to ensure comprehension, we compute the Euclidean distance between the corresponding estimations, using Equation 1:

d(q, Ca) = |R2V(q) - R2V(Ca)|    (1)

where R2V(q) and R2V(Ca) are the readability levels of q and Ca, respectively, estimated using Read2Vec.

2.3 Integrating Text Complexity with Content
We use a linear regression model to combine the scores computed for each Ca-Qa pair; we empirically verified that, among well-known learning models, linear regression was the one best suited to our task, which we attribute to its simplicity, allowing it to generalize better from few training instances than more sophisticated models. This yields a score, Rel(Ca, Qa), which we use for ranking purposes, i.e., the pair with the highest score is the one recommended to U:

Rel(Ca, Qa) = β0 + β1 Sim(q, Ca) + β2 Sim(q, Qa) + β3 d(q, Ca)    (2)

where β0 is the bias weight, and β1, β2, and β3 are weights that capture the importance of the scores defined in Sections 2.1 and 2.2. This model was trained using least squares optimization.
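As a minimal sketch of how Equations 1 and 2 combine into a ranking pipeline, the snippet below uses a hypothetical read2vec_score function standing in for the Read2Vec model (which is not reproduced here), reuses the sim function from the sketch in Section 2.1, and plugs in the learned weights reported in Section 3 as defaults.

```python
# Sketch of the ranking pipeline of Sections 2.2-2.3. `read2vec_score`
# is a hypothetical placeholder for the Read2Vec model; `sim` is the
# simplified sentence similarity sketched in Section 2.1.
from typing import Callable, List, Tuple

def readability_distance(q: str, c_a: str,
                         read2vec_score: Callable[[str], float]) -> float:
    """Equation 1: distance between readability estimates of q and Ca."""
    return abs(read2vec_score(q) - read2vec_score(c_a))

def rel(q: str, c_a: str, q_a: str,
        read2vec_score: Callable[[str], float],
        betas: Tuple[float, float, float, float] = (2.26, 0.58, 0.20, 0.12)):
    """Equation 2, defaulting to the learned weights the paper reports
    (beta_0..beta_3 = 2.26, 0.58, 0.20, 0.12)."""
    b0, b1, b2, b3 = betas
    return (b0
            + b1 * sim(q, c_a)   # query-answer content match
            + b2 * sim(q, q_a)   # query-question content match
            + b3 * readability_distance(q, c_a, read2vec_score))

def recommend(q: str, pairs: List[Tuple[str, str]],
              read2vec_score: Callable[[str], float]) -> Tuple[str, str]:
    """Return the highest-scoring (answer, question) pair for query q."""
    return max(pairs, key=lambda p: rel(q, p[0], p[1], read2vec_score))
```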
3 INITIAL ANALYSIS
For analysis purposes, we use the L16 Yahoo! Answers Query to Questions dataset [7], which consists of 438 unique queries. Each query is associated with related question-answer pairs, as well as a user rating that reflects query-answer satisfaction on a 1-3 scale, where 1 indicates "highly satisfied", i.e., the answer addresses the information needs of the corresponding query. This yields 1,571 instances, 15% of which we use for training purposes; we use the remaining 1,326 instances for testing.

In addition to our Similarity+Readability recommendation strategy (presented in Section 2), we consider two baselines: Random, which recommends question-answer pairs for each test query in an arbitrary manner; and Similarity, which recommends question-answer pairs for each test query based on the content similarity between the answer and the query, computed as in Section 2.1.

An initial experiment revealed that regardless of the metric, i.e., Mean Reciprocal Rank (MRR) or Normalized Discounted Cumulative Gain (NDCG), the strategies exhibit similar behavior; thus we report our results using MRR.

[Figure 1: Performance assessment based on MRR using the Yahoo! Answers Query to Questions dataset.]

As shown in Figure 1, recommendations generated using the semantic similarity strategy discussed in Section 2.1 yield a higher MRR than the one computed for the random strategy. This is anticipated, as Similarity explicitly captures the query-question and query-answer closeness. More importantly, as depicted in Figure 1, integrating readability with a content-based approach for suggesting question-answer pairs in the CQA domain is effective in terms of enhancing the overall recommendation process (the weights learned by the model: β0 = 2.26, β1 = 0.58, β2 = 0.20, β3 = 0.12). In fact, as per its reported MRR, Similarity+Readability positions suitable question-answer pairs high in the recommendation list, which is a non-trivial task given that, for the majority of the test queries (i.e., 83%), there are between 5 and 23 candidate question-answer pairs.
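For reference, MRR follows its standard definition: each test query contributes the reciprocal rank of its first relevant recommendation. The minimal sketch below assumes a hypothetical data layout of per-query ranked relevance flags.

```python
# Minimal sketch of Mean Reciprocal Rank (MRR), the metric reported
# in Figure 1. Each query contributes 1/rank of its first relevant
# (e.g., "highly satisfied") question-answer pair, or 0 if none is.
from typing import List

def mean_reciprocal_rank(relevance_per_query: List[List[bool]]) -> float:
    """relevance_per_query[i] lists, in ranked order, whether each
    recommended pair for query i is relevant."""
    total = 0.0
    for ranked in relevance_per_query:
        for rank, is_relevant in enumerate(ranked, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(relevance_per_query)

# e.g., two queries whose first relevant pairs sit at ranks 1 and 3:
# mean_reciprocal_rank([[True], [False, False, True]]) == (1 + 1/3) / 2
```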
4 CONCLUSIONS AND FUTURE WORK
In this study, we analyzed the importance of incorporating readability level information into the recommendation process in the community-based question answering domain. We treat the reading level as a personalization signal and compare the readability level of an answer with respect to the reading abilities of a user, inferred through his query. We demonstrated that reading level can be an influential factor in assessing answer quality and can be used to improve user satisfaction in a recommendation process.

In the future, we plan to conduct a deeper study using other community question answering sites, such as Quora or StackExchange. We also plan to analyze queries for additional factors, such as relative content-area expertise, to better predict a user's familiarity with the content-specific vocabulary used in the archived answers to be recommended. We suspect that readability and domain-knowledge expertise will be highly influential when the recommendation occurs on CQA sites like StackExchange, given the educational orientation of questions posted on the site.

ACKNOWLEDGMENTS
This work has been partially funded by NSF Award 1565937.

REFERENCES
[1] I. M. Azpiazu and M. S. Pera. Is readability a valuable signal for hashtag recommendations? 2016.
[2] R. G. Benjamin. Reconstructing readability: Recent developments and recommendations in the analysis of text difficulty. Educational Psychology Review, 24(1):63-88, 2012.
[3] B. Fang, Q. Ye, D. Kucukusta, and R. Law. Analysis of the perceived value of online tourism reviews: Influence of readability and reviewer characteristics. Tourism Management, 52:498-506, 2016.
[4] Y. Li, D. McLean, Z. A. Bandar, J. D. O'Shea, and K. Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE TKDE, 18(8):1138-1150, 2006.
[5] M. S. Pera and Y.-K. Ng. Automating readers' advisory to make book recommendations for K-12 readers. In ACM RecSys, pages 9-16, 2014.
[6] J. San Pedro and A. Karatzoglou. Question recommendation for collaborative question answering systems with RankSLDA. In ACM RecSys, pages 193-200, 2014.
[7] Webscope. L16 Yahoo! Answers Query to Questions dataset. Yahoo!, 2016. [Online; accessed 17-June-2017].