     Probabilistic Expert Knowledge Elicitation of
    Feature Relevances in Sparse Linear Regression

          Pedram Daee∗, Tomi Peltola∗, Marta Soare∗, and Samuel Kaski

                Helsinki Institute for Information Technology HIIT and
              Department of Computer Science, Aalto University, Finland,
                            firstname.lastname@aalto.fi
                              ∗ Authors contributed equally.



1     Introduction

In this extended abstract, adapted from [3], we consider the “small n, large p” prediction
problem, where the number of available samples n is much smaller than the number
of covariates p. This challenging setting is common in many applications,
such as precision medicine, where obtaining additional samples can be extremely
costly or even impossible. Extensive research effort has recently been dedicated
to finding principled solutions for accurate prediction. However, a valuable source
of additional information, domain experts, has not yet been efficiently exploited.
    We propose to integrate expert knowledge as an additional source of infor-
mation in high-dimensional sparse linear regression. We assume that the expert
has knowledge on the relevance of the features in the regression and formulate
the knowledge elicitation as a sequential probabilistic inference process with
the aim of improving predictions. We introduce a strategy that uses Bayesian
experimental design [2] to sequentially identify the most informative features on
which to query the expert knowledge. By interactively eliciting and incorporating
expert knowledge, our approach fits into the interactive learning literature [1, 8].
The ultimate goal is to make the interaction as effortless as possible for the
expert. This is achieved by identifying the most informative features on which to
query expert feedback and asking about them first.


2     Method

We introduce a probabilistic model that subsumes both a sparse regression model,
which predicts the external targets, and a model for encoding expert knowledge. We
then present a method to query the expert knowledge sequentially (one feature at a
time), with the aim of quickly improving the predictive accuracy of the regression
with a small number of queries.
    For the regression, a Gaussian observation model with a spike-and-slab
sparsity-inducing prior [5] on the regression coefficients is used: $y \sim \mathcal{N}(Xw, \sigma^2 I)$,
$w_j \sim \gamma_j \mathcal{N}(0, \psi^2) + (1 - \gamma_j)\,\delta_0$, $\gamma_j \sim \mathrm{Bernoulli}(\rho)$, $j = 1, \ldots, p$, where $y \in \mathbb{R}^n$ are
the output values and $X \in \mathbb{R}^{n \times p}$ the matrix of covariate values. The regression
coefficients are denoted by $w_1, \ldots, w_p$, and $\sigma^2$ is the residual variance. The $\gamma_j$
indicate inclusion ($\gamma_j = 1$) or exclusion ($\gamma_j = 0$) of the covariates in the regression
($\delta_0$ is a point mass at zero). The prior expected sparsity is controlled by $\rho$. The
expert knowledge on the relevance of the features for the regression is encoded
by a feedback model: $f_j \sim \gamma_j\,\mathrm{Bernoulli}(\pi) + (1 - \gamma_j)\,\mathrm{Bernoulli}(1 - \pi)$, where
$f_j = 1$ indicates that feature $j$ is relevant and $f_j = 0$ not relevant, and $\pi$ is
the probability that the expert feedback is correct relative to the state of the
covariate inclusion indicator $\gamma_j$.
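    To make the generative assumptions concrete, the following is a minimal sketch of the model in Python/NumPy. The dimensions and hyperparameter values ($n$, $p$, $\rho$, $\psi^2$, $\sigma^2$, $\pi$) are illustrative assumptions, not settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100           # "small n, large p" (illustrative sizes)
rho, psi2 = 0.05, 1.0    # prior inclusion probability and slab variance
sigma2 = 0.5             # residual variance
pi_fb = 0.9              # probability that the expert feedback is correct

# Spike-and-slab prior: gamma_j ~ Bernoulli(rho),
# w_j ~ N(0, psi2) if gamma_j = 1, else w_j = 0 (point mass at zero).
gamma = rng.binomial(1, rho, size=p)
w = np.where(gamma == 1, rng.normal(0.0, np.sqrt(psi2), size=p), 0.0)

# Gaussian observation model: y ~ N(X w, sigma2 I).
X = rng.normal(size=(n, p))
y = X @ w + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Feedback model: the expert marks feature j relevant (f_j = 1) with
# probability pi_fb when gamma_j = 1, and with probability 1 - pi_fb
# when gamma_j = 0.
f = rng.binomial(1, np.where(gamma == 1, pi_fb, 1.0 - pi_fb))
```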
    As the number of covariates p can be large, we assume that it is infeasible,
or at least unnecessarily burdensome, to ask the expert about each feature.
Instead, we aim to ask first about the features that are estimated to be the most
informative given the (small) training data, and frame this problem as a Bayesian
experimental design task [2, 9]. We prioritize features based on their expected
information gain for the predictive distribution of the regression. As the expert
is queried for feedback sequentially, the posterior distribution of the model
and the prioritization are recomputed after each feedback in order to use the
latest knowledge. At iteration $t$, the expected information gain for feature $j$ is
$$
\mathrm{E}_{p(\tilde{f}_j \mid \mathcal{D}_t)}\!\left[ \sum_i \mathrm{KL}\!\left[\, p(\tilde{y} \mid \mathcal{D}_t, x_i, \tilde{f}_j) \,\big\|\, p(\tilde{y} \mid \mathcal{D}_t, x_i) \,\right] \right],
$$
where $\mathcal{D}_t = \{(y_i, x_i) : i = 1, \ldots, n\} \cup \{f_{j_1}, \ldots, f_{j_{t-1}}\}$ denotes the training data
together with the feedback that has been given at previous iterations, and $p(\tilde{f}_j \mid \mathcal{D}_t)$
is the posterior predictive distribution of the feedback for the $j$th feature. The
summation over $i$ goes over the training dataset. This query scheme goes beyond
pure prior elicitation [4, 6, 7], as the training data are used to facilitate efficient
expert knowledge elicitation. This aspect is crucial for making the elicitation
feasible in high-dimensional regression.
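    The query selection can be sketched as follows. This is a schematic outline, not the paper's implementation: it assumes Gaussian posterior predictives and a hypothetical helper fit_predictive(X, y, feedback) that returns the predictive means and variances at the training inputs together with the posterior probabilities $p(f_j = 1 \mid \mathcal{D}_t)$; in practice these quantities come from the posterior approximation of the spike-and-slab model described in [3].

```python
import numpy as np

def kl_gaussian(m1, v1, m0, v0):
    """KL divergence KL[N(m1, v1) || N(m0, v0)] for univariate Gaussians."""
    return 0.5 * (np.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)

def expected_information_gain(j, X, y, feedback, fit_predictive):
    """Expected KL gain in the predictive distribution from querying feature j."""
    # Current predictive p(y~ | D_t, x_i) at the training points, and p(f_j = 1 | D_t).
    m0, v0, p_fj = fit_predictive(X, y, feedback)
    gain = 0.0
    for fj, prob in ((1, p_fj[j]), (0, 1.0 - p_fj[j])):
        # Predictive after hypothetically observing feedback f_j = fj.
        m1, v1, _ = fit_predictive(X, y, {**feedback, j: fj})
        gain += prob * np.sum(kl_gaussian(m1, v1, m0, v0))
    return gain

def next_query(X, y, feedback, fit_predictive):
    """Pick the not-yet-queried feature with the largest expected information gain."""
    candidates = [j for j in range(X.shape[1]) if j not in feedback]
    gains = [expected_information_gain(j, X, y, feedback, fit_predictive)
             for j in candidates]
    return candidates[int(np.argmax(gains))]
```

In the sequential loop, next_query would be called after each posterior update, the expert's answer for the selected feature added to feedback, and the posterior refitted before selecting the next query.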


3    Discussion

The proposed method was tested in several “small n, large p” scenarios on synthetic
and real data, with simulated and real users [3]. For the task of predicting product
ratings based on the relevance of some of the words used in textual reviews, the
results confirm that prediction accuracy can already be improved with a small number
of user interactions. Our method can naturally be used in many other applications
where expert feedback is needed; its main advantage is that it reduces the burden
on the expert by asking the most informative queries first. However, the amount of
improvement depends on the type of feedback requested and on the willingness and
confidence of the experts to provide it. In addition, appropriate interface and
visualization techniques are required for a complete and effective interactive
elicitation. These considerations are left for future work.



Acknowledgements This work was financially supported by the Academy of
Finland (Finnish Center of Excellence in Computational Inference Research
COIN; grants 295503, 294238, 292334, and 284642), Re:Know funded by TEKES,
and MindSee (FP7–ICT; Grant Agreement no 611570).


References
1. Amershi, S.: Designing for Effective End-User Interaction with Machine Learning.
   Ph.D. thesis, University of Washington (2012)
2. Chaloner, K., Verdinelli, I.: Bayesian experimental design: A review. Statistical
   Science 10(3), 273–304 (1995)
3. Daee, P., Peltola, T., Soare, M., Kaski, S.: Knowledge elicitation via sequential
   probabilistic inference for high-dimensional prediction. Machine Learning (Jul 2017),
   https://doi.org/10.1007/s10994-017-5651-7
4. Garthwaite, P.H., Dickey, J.M.: Quantifying expert opinion in linear regression
   problems. Journal of the Royal Statistical Society. Series B (Methodological) pp.
   462–474 (1988)
5. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. Journal of the
   American Statistical Association 88(423), 881–889 (1993)
6. Kadane, J.B., Dickey, J.M., Winkler, R.L., Smith, W.S., Peters, S.C.: Interactive
   elicitation of opinion for a normal linear model. Journal of the American Statistical
   Association 75(372), 845–854 (1980)
7. O’Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson,
   D.J., Oakley, J.E., Rakow, T.: Uncertain Judgements: Eliciting Experts’
   Probabilities. Wiley, Chichester, England (2006)
8. Porter, R., Theiler, J., Hush, D.: Interactive machine learning in data exploitation.
   Computing in Science & Engineering 15(5), 12–20 (2013)
9. Seeger, M.W.: Bayesian inference and optimal design for the sparse linear model.
   Journal of Machine Learning Research 9, 759–813 (2008)



