<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.18287/1613-0073-2016-1638-902-908</article-id>
      <title-group>
        <article-title>FEATURE SELECTION IN THE EFFECTIVENESS RESEARCH OF A TRAINING PROGRAM FOR PATIENTS WITH THE ATRIAL FIBRILLATION</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>V.V. Kutikova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A.V. Gaidel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A.G. Khramov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute, Russian Academy of Sciences</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1638</volume>
      <fpage>902</fpage>
      <lpage>908</lpage>
      <abstract>
        <p>We investigated the effectiveness of the therapeutic training program of the school “Stop a Stroke”, which is aimed at reducing the risk of stroke for patients with atrial fibrillation. On the basis of two feature selection methods that use a discriminant analysis criterion to determine the best feature subset, we concluded that the patients who trained at the school, in contrast to the patients who did not receive training, take anticoagulants for a longer time and have a higher level of knowledge about atrial fibrillation.</p>
      </abstract>
      <kwd-group>
        <kwd>data mining</kwd>
        <kwd>feature selection</kwd>
        <kwd>discriminant analysis criterion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Reducing the dimensionality of a feature space is one of the central issues in data
mining. For most classification and regression problems it is necessary to
select the best subset from a given feature set. This is because using a large
number of features is not only computationally expensive but can also degrade recognition
accuracy, since irrelevant and redundant features complicate the decision-making
process.</p>
      <p>
        Feature selection methods are commonly used in biomedical data analysis. In
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a method based on ANCOVA was used to identify an 80-gene biomarker of
lung cancer in smokers from 22216 features describing the expression
levels of different genes. The accuracy, sensitivity and specificity of this biomarker
were 83%, 80% and 84%, respectively. For selecting a small number of
features one can use brute force [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Sequential search
algorithms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and genetic algorithms [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are also used in biomedical data mining. Feature selection methods
based on a discriminant analysis criterion have shown their effectiveness in [
        <xref ref-type="bibr" rid="ref5 ref6">5,
6</xref>
        ] for the analysis of biomedical images.
The aim of this work is to study the effectiveness of the therapeutic training
program of the school "Stop a Stroke", which is aimed at reducing the risk of stroke for
patients with atrial fibrillation. The dataset contains observations of 12
features, and the classes of the observations are patient groups: a main group and a
comparison group. Patients of the main group were trained at the school, while
patients of the comparison group visited a doctor but did not receive training.
The data were obtained during two visits of the patients to the doctor: before (the
first visit) and after (the second visit) the training course.
      </p>
      <p>In order to evaluate the effectiveness of the training course, first, a feature
subset that distinguishes the two patient groups in the best way is selected on
the basis of data from the second visit; then, the performance of the obtained feature
subset is compared between the two visits; finally, conclusions about the effectiveness of the
training course are drawn.</p>
      <p>Two feature selection methods are used as the main research tools. The first method
estimates the quality of individual features using the discriminant analysis
criterion, and the second method estimates the quality of feature subsets based
on the same criterion.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Feature ordering in correspondence with the discriminant analysis criterion</title>
        <p>
          According to the method described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], features are ordered in correspondence with
the discriminant analysis criterion [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]:
J = tr S_m / tr S_w , (1)
where tr S_w is the trace of the within-class scatter matrix and tr S_m is the trace of the mixture
scatter matrix.
        </p>
        <p>The within-class scatter matrix shows the scatter of samples around their respective
class expected vectors:
S_w = Σ_i p_i E{(X^(i) - M_i)(X^(i) - M_i)^T},
where p_i is the prior probability of the i-th class, X^(i) is a feature vector from the i-th class, and M_i
is the expected vector of the i-th class.</p>
        <p>The mixture scatter matrix is the correlation matrix of all feature vectors regardless of
their class:
S_m = E{(X - M_0)(X - M_0)^T},
where M_0 = p_1 M_1 + p_2 M_2 is the expected vector of the mixture distribution.
The higher the value of criterion (1), the better a feature distinguishes the samples from
different classes.</p>
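        <p>As an illustration of criterion (1), the following Python sketch estimates the traces of the scatter matrices from per-class samples and returns their ratio. The function name criterion_j, the NumPy-based estimators, and the usage with equal priors are illustrative assumptions, not part of the original study.</p>
        <preformat>
```python
import numpy as np

def criterion_j(class_samples, priors):
    """Estimate criterion (1), J = tr(S_m) / tr(S_w), from data.

    class_samples: list of (n_i, d) arrays, one array of samples per class.
    priors: list of class prior probabilities p_i.
    """
    d = class_samples[0].shape[1]
    means = [x.mean(axis=0) for x in class_samples]
    # expected vector of the mixture distribution: M_0 = sum_i p_i M_i
    m0 = sum(p * m for p, m in zip(priors, means))
    s_w = np.zeros((d, d))  # within-class scatter matrix
    s_m = np.zeros((d, d))  # mixture scatter matrix
    for x, p, m in zip(class_samples, priors, means):
        cw = x - m   # deviations from the class mean
        cm = x - m0  # deviations from the mixture mean
        s_w += p * (cw.T @ cw) / len(x)
        s_m += p * (cm.T @ cm) / len(x)
    return np.trace(s_m) / np.trace(s_w)
```
        </preformat>
        <p>Ordering the features by criterion (1) then amounts to evaluating criterion_j on each single-feature subset and sorting the features by the resulting values.</p>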
      </sec>
      <sec id="sec-2-2">
        <title>Sequential search of the best feature subset</title>
        <p>The previously described approach to feature selection makes it possible to assess the
performance of each feature separately, but it does not take into account dependencies
between the features. Some features can be useless by themselves but effective when
combined with other features.</p>
        <p>A sequential algorithm searches for the best features over the space of feature
subsets. The basic idea is that, starting from some initial subset, at each step we move
to the next state, in which one element is included into the current feature subset or
excluded from it.</p>
        <p>Let F be the set of all features participating in the selection, X be the current best
feature subset of F, and Y be the subset of the remaining features, that is, Y = F \ X. The
sequential algorithm consists of the following steps:
1. Choose an evaluation function to measure the performance of a feature
subset (in this work, criterion (1)), a stopping criterion (the dimension of the
current set X is equal to the dimension of the set F) and some initial subset (here,
the empty set).
2. Choose "the search direction". We step forward when we search for the best
subset among new subsets formed by including one feature into the current best subset:
X = X ∪ {y}, where y = arg max_{c ∈ Y} J(X ∪ {c}).
We step back when we search for the best subset among
subsets formed by excluding one feature from the current best subset:
X = X \ {x}, where x = arg max_{c ∈ X} J(X \ {c}).</p>
        <p>3. If, after finding the next subset, the stopping criterion is fulfilled, the search
process is stopped; otherwise, go to step 2.</p>
        <p>In this paper, we propose the "two steps forward, one step back" approach, and the
subset with the highest value of criterion (1) is chosen as the best feature subset among
subsets of the same dimension.</p>
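        <p>The steps above, together with the "two steps forward, one step back" modification, can be sketched in Python as follows. This is a minimal illustration under our own naming assumptions: the score argument stands in for criterion (1), and ties are resolved arbitrarily.</p>
        <preformat>
```python
def sequential_search(features, score):
    """Sketch of a sequential "two steps forward, one step back" search.

    features: list of feature indices (the set F).
    score: function mapping a sorted tuple of features to a quality value,
           standing in for criterion (1).
    Returns a dict: dimension k -> best subset of size k found during the search.
    """
    current = set()
    best_by_dim = {}

    def record(subset):
        # remember the best subset seen for each dimension
        k = len(subset)
        key = tuple(sorted(subset))
        if k and (k not in best_by_dim or score(key) > score(best_by_dim[k])):
            best_by_dim[k] = key

    while len(current) != len(features):
        # two steps forward: twice include the single best remaining feature
        for _ in range(2):
            rest = [f for f in features if f not in current]
            if not rest:
                break
            y = max(rest, key=lambda c: score(tuple(sorted(current | {c}))))
            current |= {y}
            record(current)
        # one step back: exclude the feature whose removal maximizes the score
        if len(current) > 1 and len(current) != len(features):
            x = max(current, key=lambda c: score(tuple(sorted(current - {c}))))
            current -= {x}
            record(current)
    return best_by_dim
```
        </preformat>
        <p>The dictionary returned for each dimension corresponds to choosing, among subsets of the same dimension, the one with the highest value of criterion (1).</p>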
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation of the training program effectiveness</title>
        <p>Let J_0 be the value of criterion (1) for some feature subset, computed from the data of the
first visit to the doctor, and let J_1 be the criterion value computed from the data of the
second visit. The effectiveness of the training program for that feature subset is evaluated as
E = J_1 / J_0. (2)
The school has a "positive" effect on the feature subsets for which E is greater than 1, a
"negative" effect on the subsets for which E is less than 1, and no effect on the subsets
for which E is equal to 1.</p>
        <p>A conclusion about the effectiveness of the training program is drawn on the basis of the
obtained effectiveness values and the within-group mean values of the features that are
most affected by the school.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental results</title>
      <p>The study of the training program effectiveness was carried out on the basis of 12 features,
namely: answers to a questionnaire, which was filled in by patients before and after the
training course; blood pressure (systolic, diastolic); and hemostasis parameters
(prothrombin time, prothrombin, fibrinogen, partial thromboplastin time). The dataset included
69 observations (36 from the main group and 33 from the comparison
group) for each of the two visits.</p>
      <sec id="sec-3-1">
        <title>Effectiveness of the training course for separate features</title>
        <p>For some features the effectiveness value E turned out to be less than 1, meaning that such a
feature better distinguished the patient groups before the training course than after it.
This effect is explained by the fact that the within-class mean values improved only
slightly from the first visit to the second (the patients began to realize the importance of
drug intake) while the within-class variance increased.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Features</title>
        <p>The 12 features used in the study were:
1. Anticoagulant therapy (0 – don’t take, 1 – take for less than a year, 2 – for 1 to 5 years, 3 – for more than 5 years).
2. How do you assess your level of knowledge about the atrial fibrillation? (1 – low, 5 – high).
3. How important is it to regularly take a drug for stroke prevention in accordance with a prescription? (1 – not important, 5 – very important).
4. How do you assess your knowledge about the risk of stroke as the main complication of the atrial fibrillation? (1 – low, 5 – high).
5. Prothrombin (per cent).
6. Systolic blood pressure (mm Hg).
7. Prothrombin time (seconds).
8. Partial thromboplastin time (seconds).
9. Fibrinogen (g/l).
10. Aspirin (1 – take, 0 – don’t take).
11. Diastolic blood pressure (mm Hg).
12. How much has the atrial fibrillation changed your daily life? (1 – hasn’t changed, 5 – has changed greatly).</p>
      </sec>
      <sec id="sec-3-3">
        <title>Effectiveness of the training course for feature subsets</title>
        <p>One can see that all 10 subsets distinguish the patient groups better after the training
course than before it. However, the first 3 subsets, which included features 1, 2 and
10, have larger effectiveness values E than the remaining subsets. In addition, the
mean values of features 1 and 2 within the main group increased from the first
visit to the second and did not change significantly within the comparison group.
Taking these facts into account, we can conclude that the therapeutic training program
is effective for features 1 and 2.</p>
        <p>In this paper the results of the research of the effectiveness of the therapeutic training
program of the school "Stop a Stroke" were presented. Using two feature selection methods, the
feature subsets that distinguish patients of the main and comparison groups in the
best way were found on the basis of the patient data obtained after the training course.
Among the chosen best subsets, the feature subsets with the largest effectiveness values
were selected. Taking into account the within-class mean values, we concluded
that, in general, the program turned out to be effective.</p>
        <p>In particular, the research of the training program effectiveness for separate features
has shown that training at the school “Stop a Stroke” is effective for the features
“Anticoagulant therapy” (E = 9.62), “How do you assess your level of knowledge about the atrial
fibrillation?” (E = 5.33) and “How do you assess your knowledge about the risk of a
stroke as the main complication of the atrial fibrillation?” (E = 2.50). Evaluating the
effectiveness of the training course for feature subsets, we obtained similar results: the
training program of the school turned out to be effective for the pair of
features “Anticoagulant therapy” and “How do you assess your level of knowledge
about the atrial fibrillation?” (E = 6.14). Thus, the patients who trained
at the school, in contrast to the patients who did not receive training, take
anticoagulants for a longer time and have a higher level of knowledge about atrial fibrillation
and about stroke risk as the main complication of atrial fibrillation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>The work was partially funded by the Russian Science Foundation (RSF), grant No.
1407-97040-р_поволжье_а, the Russian Federation Ministry of Education and Science,
and the Fundamental Research Program of NITD RAS “Bioinformatics, modern
information technologies and mathematical methods in medicine”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Spira</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beane</surname>
            <given-names>JE</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiling</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schembri</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilman</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumas</surname>
            <given-names>Y-M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calner</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sridhar</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beamis</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamb</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerry</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keane</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenburg</surname>
            <given-names>ME</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brody</surname>
            <given-names>JS</given-names>
          </string-name>
          .
          <article-title>Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer</article-title>
          .
          <source>Nature Medicine</source>
          ,
          <year>2007</year>
          ;
          <volume>13</volume>
          (
          <issue>3</issue>
          ):
          <fpage>361</fpage>
          -
          <lpage>366</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ilyasova</surname>
            <given-names>NY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kupriyanov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paringer</surname>
            <given-names>RA</given-names>
          </string-name>
          .
          <article-title>Formation of features for improving the quality of medical diagnosis based on discriminant analysis methods</article-title>
          .
          <source>Computer Optics</source>
          ,
          <year>2014</year>
          ;
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>851</fpage>
          -
          <lpage>855</lpage>
          . [In Russian]
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Peng</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            <given-names>J.</given-names>
          </string-name>
          <article-title>A novel feature selection approach for biomedical data classification</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <year>2010</year>
          ;
          <volume>43</volume>
          :
          <fpage>15</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tsai</surname>
            <given-names>C-F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eberle</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chu</surname>
            <given-names>CY</given-names>
          </string-name>
          .
          <article-title>Genetic algorithms in feature and instance selection</article-title>
          .
          <source>Knowledge-Based Systems</source>
          ,
          <year>2013</year>
          ;
          <volume>39</volume>
          :
          <fpage>240</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gaidel</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pervushkin</surname>
            <given-names>SS</given-names>
          </string-name>
          .
          <article-title>Research of the textural features for the bony tissue diseases diagnostics using the roentgenograms</article-title>
          .
          <source>Computer Optics</source>
          ,
          <year>2013</year>
          ;
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>113</fpage>
          -
          <lpage>119</lpage>
          . [In Russian]
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kutikova</surname>
            <given-names>VV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaidel</surname>
            <given-names>AV</given-names>
          </string-name>
          .
          <article-title>Study of informative feature selection approaches for the texture image recognition problem using the Laws' masks</article-title>
          .
          <source>Computer Optics</source>
          ,
          <year>2015</year>
          ;
          <volume>39</volume>
          (
          <issue>5</issue>
          ):
          <fpage>744</fpage>
          -
          <lpage>750</lpage>
          . DOI: 10.18287/0134-2452-2015-39-5-744-750.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fukunaga</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Introduction to statistical pattern recognition</article-title>
          . San Diego: Academic Press,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>