<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Expert Knowledge Elicitation of Feature Relevances in Sparse Linear Regression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedram Daee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomi Peltola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Soare</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Kaski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Helsinki Institute for Information Technology HIIT and Department of Computer Science, Aalto University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <fpage>64</fpage>
      <lpage>66</lpage>
      <abstract>
        <p>In this extended abstract, we consider the “small n, large p” prediction problem, where the number of available samples n is much smaller than the number of covariates p. This challenging setting is common in applications, such as precision medicine, where obtaining additional samples can be extremely costly or even impossible. Extensive research effort has recently been dedicated to finding principled solutions for accurate prediction. However, a valuable source of additional information, domain experts, has not yet been efficiently exploited. We propose to integrate expert knowledge as an additional source of information in high-dimensional sparse linear regression. We assume that the expert has knowledge on the relevance of the features in the regression and formulate the knowledge elicitation as a sequential probabilistic inference process with the aim of improving predictions. We introduce a strategy that uses Bayesian experimental design [2] to sequentially identify the most informative features on which to query the expert knowledge. By interactively eliciting and incorporating expert knowledge, our approach fits into the interactive learning literature [1, 8]. The ultimate goal is to make the interaction as effortless as possible for the expert. This is achieved by identifying the most informative features on which to query expert feedback and asking about them first.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We introduce a probabilistic model that subsumes both a sparse regression model
which predicts external targets, and a model for encoding expert knowledge. We
then present a method to query expert knowledge sequentially (one feature at a
time), with the aim of getting fast improvement in the predictive accuracy of the
regression with a small number of queries.</p>
      <p>
        For the regression, a Gaussian observation model with a spike-and-slab
sparsity-inducing prior [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] on the regression coefficients is used: y ∼ N(Xw, σ²I),
wj ∼ γj N(0, ψ²) + (1 − γj)δ0, γj ∼ Bernoulli(ρ), j = 1, . . . , p, where y ∈ Rⁿ is
the vector of output values and X ∈ Rⁿ×ᵖ is the matrix of covariate values. The regression
coefficients are denoted by w1, . . . , wp, and σ² is the residual variance. The γj
indicate inclusion (γj = 1) or exclusion (γj = 0) of the covariates in the regression
(δ0 is a point mass at zero). The prior expected sparsity is controlled by ρ. The
expert knowledge on the relevance of the features for the regression is encoded
by a feedback model: fj ∼ γj Bernoulli(π) + (1 − γj) Bernoulli(1 − π), where
fj = 1 indicates that feature j is relevant and fj = 0 that it is not, and π is
the probability that the expert feedback is correct relative to the state of the
covariate inclusion indicator γj. (This extended abstract is adapted from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].)
      </p>
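      <p>To make the generative assumptions concrete, the observation and feedback models above can be written as a forward sampler. The following is a minimal sketch in Python/NumPy; the dimensions and hyperparameter values (n, p, σ², ψ², ρ, π) are chosen purely for illustration and are not taken from the paper.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 20, 100                      # "small n, large p"
sigma2, psi2, rho = 1.0, 1.0, 0.05  # residual variance, slab variance, prior inclusion prob.
pi = 0.9                            # probability that the expert feedback is correct

# Inclusion indicators and coefficients: w_j ~ gamma_j N(0, psi2) + (1 - gamma_j) delta_0
gamma = rng.random(p) < rho
w = np.where(gamma, rng.normal(0.0, np.sqrt(psi2), p), 0.0)

# Observations: y ~ N(Xw, sigma2 I)
X = rng.normal(size=(n, p))
y = X @ w + rng.normal(0.0, np.sqrt(sigma2), n)

# Expert feedback model: f_j ~ Bernoulli(pi) if gamma_j = 1, else Bernoulli(1 - pi)
f = rng.random(p) < np.where(gamma, pi, 1.0 - pi)
```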
      <p>
        As the number of covariates p can be large, we assume that it is infeasible,
or at least unnecessarily burdensome, to ask the expert about every feature.
Instead, we aim to ask first about the features that are estimated to be the most
informative given the (small) training data, and frame this problem as a Bayesian
experimental design task [
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ]. We prioritize features based on their expected
information gain for the predictive distribution of the regression. As the expert
is queried for the feedbacks sequentially, the posterior distribution of the model
and the prioritization are recomputed after each feedback in order to use the
latest knowledge. At iteration t, the expected information gain for feature j is
E_{p(f̃j|Dt)} [ Σi KL( p(ỹ|Dt, xi, f̃j) ‖ p(ỹ|Dt, xi) ) ],
where Dt = {(yi, xi) : i = 1, . . . , n} ∪ {fj1 , . . . , fjt−1 } denotes the training data
together with the feedback given at previous iterations, and p(f̃j|Dt)
is the posterior predictive distribution of the feedback for the jth feature. The
summation over i goes over the training dataset. This query scheme goes beyond
pure prior elicitation [
        <xref ref-type="bibr" rid="ref4 ref6 ref7">4, 6, 7</xref>
        ], as the training data is used to facilitate efficient
expert knowledge elicitation. This is a crucial aspect that enables elicitation
in high-dimensional regression.
      </p>
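      <p>For univariate Gaussian predictive distributions, the KL divergence in the criterion above has a closed form, so scoring a candidate feature reduces to a feedback-probability-weighted sum of per-sample KL terms. The following Python sketch illustrates this; the predictive summaries (means, variances, and p(f̃j = 1 | Dt)) are made up for the toy run rather than computed from a fitted spike-and-slab posterior.</p>

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    # KL( N(m1, v1) || N(m2, v2) ) for univariate Gaussians, elementwise
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def expected_information_gain(pf1, mean_f, var_f, mean_marg, var_marg):
    """Expected information gain for one candidate feature j.

    pf1:                 posterior predictive probability p(f_j = 1 | D_t)
    mean_f, var_f:       arrays (2, n) -- predictive means/variances at the
                         training inputs x_i, conditioned on f_j = 0 and f_j = 1
    mean_marg, var_marg: arrays (n,) -- marginal predictive means/variances
    """
    pf = np.array([1.0 - pf1, pf1])
    kl = kl_gauss(mean_f, var_f, mean_marg, var_marg)  # shape (2, n)
    return float(np.sum(pf[:, None] * kl))

# Toy illustration with made-up predictive summaries for 3 candidate features:
rng = np.random.default_rng(1)
n = 5
gains = []
for j in range(3):
    pf1 = rng.random()
    mean_f = rng.normal(size=(2, n))
    var_f = np.full((2, n), 1.0)
    mean_marg = pf1 * mean_f[1] + (1.0 - pf1) * mean_f[0]  # mixture mean
    var_marg = np.full(n, 1.5)
    gains.append(expected_information_gain(pf1, mean_f, var_f, mean_marg, var_marg))

next_query = int(np.argmax(gains))  # ask the expert about this feature first
```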
    </sec>
    <sec id="sec-2">
      <title>Discussion</title>
      <p>
        The proposed method was tested in several “small n, large p” scenarios on synthetic
and real data with simulated and real users [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The results confirm that improved
prediction accuracy is possible already with a small number of user interactions,
for the task of predicting product ratings based on the relevance of some of the
words used in textual reviews. Our method can naturally be used in many other
applications where expert feedback is needed; its main advantage is that it
efficiently reduces the burden on the expert by asking the most informative
queries first. However, the amount of improvement in different applications depends
on the type of feedback requested and on the willingness and confidence of the experts
to provide it. In addition, appropriate interface and visualization
techniques are required for complete and effective interactive elicitation.
These considerations are left for future work.
      </p>
      <p>
        Acknowledgements This work was financially supported by the Academy of
Finland (Finnish Center of Excellence in Computational Inference Research
COIN; grants 295503, 294238, 292334, and 284642), Re:Know funded by TEKES,
and MindSee (FP7–ICT; Grant Agreement no 611570).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amershi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Designing for Effective End-User Interaction with Machine Learning</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Washington (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chaloner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdinelli</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Bayesian experimental design: A review</article-title>
          .
          <source>Statistical Science</source>
          <volume>10</volume>
          (
          <issue>3</issue>
          ),
          <fpage>273</fpage>
          -
          <lpage>304</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Daee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peltola</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soare</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Knowledge elicitation via sequential probabilistic inference for high-dimensional prediction</article-title>
          .
          <source>Machine Learning</source>
          (
          <year>2017</year>
          ), https://doi.org/10.1007/s10994-017-5651-7
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Garthwaite</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dickey</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Quantifying expert opinion in linear regression problems</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          . Series B (Methodological) pp.
          <fpage>462</fpage>
          -
          <lpage>474</lpage>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>George</surname>
            ,
            <given-names>E.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCulloch</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          :
          <article-title>Variable selection via Gibbs sampling</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>88</volume>
          (
          <issue>423</issue>
          ),
          <fpage>881</fpage>
          -
          <lpage>889</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kadane</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dickey</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winkler</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>W.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          :
          <article-title>Interactive elicitation of opinion for a normal linear model</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>75</volume>
          (
          <issue>372</issue>
          ),
          <fpage>845</fpage>
          -
          <lpage>854</lpage>
          (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>O'Hagan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buck</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daneshkhah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eiser</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garthwaite</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenkinson</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oakley</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rakow</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Uncertain Judgements: Eliciting Experts' Probabilities</article-title>
          . Wiley, Chichester, England (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Theiler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hush</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Interactive machine learning in data exploitation</article-title>
          .
          <source>Computing in Science &amp; Engineering</source>
          <volume>15</volume>
          (
          <issue>5</issue>
          ),
          <fpage>12</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Seeger</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          :
          <article-title>Bayesian inference and optimal design for the sparse linear model</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          ,
          <fpage>759</fpage>
          -
          <lpage>813</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>