<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Dimensionality Reduction Approach for Semantic Document Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oskar Ahlgren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pekka Malo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ankur Sinha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pekka Korhonen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jyrki Wallenius</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalto University School of Economics</institution>
          <addr-line>P.O. Box 21210, FI-00076 AALTO</addr-line>
          ,
          <country country="FI">FINLAND</country>
        </aff>
      </contrib-group>
      <fpage>114</fpage>
      <lpage>121</lpage>
      <abstract>
        <p>The curse of dimensionality is a well-recognized problem in the field of document filtering. In particular, this concerns methods where vector space models are utilized to describe the document-concept space. When performing content classification across a variety of topics, the number of different concepts (dimensions) rapidly explodes, and as a result many techniques are rendered inapplicable. Furthermore, the extent of information represented by each of the concepts may vary significantly. In this paper, we present a dimensionality reduction approach which approximates the user's preferences in the form of a value function and leads to a quick and efficient filtering procedure. The proposed system requires the user to provide preference information in the form of a training set in order to generate a search rule. Each document in the training set is profiled into a vector of concepts. The document profiling is accomplished by utilizing Wikipedia articles to define the semantic information contained in words, which allows them to be perceived as concepts. Once the set of concepts contained in the training set is known, a modified Wilks' lambda approach is used for dimensionality reduction while ensuring minimal loss of semantic information.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Most information retrieval systems are based on free language searching, where
the user can compose any ad hoc query by presenting a list of keywords or a
short phrase to describe a topic. The popularity of phrase-based methods can
largely be explained by their convenience for the users. However, the ease of usage
comes with a few drawbacks as well. While exploring a new topic or searching
within an expert domain with specialized terminology it can be surprisingly
hard to find the right words for getting the relevant content. To cope with the
ambiguity of the vocabulary, concept-based document classification techniques
have been proposed, as concepts by definition cannot be ambiguous. However,
the use of concepts instead of keywords is only part of the solution. If the filtering
methods rely on vector-space models of documents and concepts, a dimension
reduction technique comes in handy. Instead of training the classifiers using the
entire concept-base, the learning of filtering models is improved by restricting
the space to those concepts that are most relevant for the given task.</p>
      <p>
        In this paper, we introduce Wilks-VF, a light-weight concept selection method
inspired by Wilks’ lambda to reduce the curse of dimensionality. In Wilks-VF the
document classification task is carried out in the following three stages: 1) Once
the user has supplied a training sample of relevant and irrelevant documents, a
semantic profiler is applied to build a document-concept space representation.
The semantic knowledge is drawn from Wikipedia, which provides the semantic
relatedness information. 2) Next, the Wilks’ lambda based dimension
reduction method is used to select concepts that provide the best separation between
relevant and irrelevant documents. 3) Finally, the value function framework
proposed by Malo et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is employed to learn a classification rule for the given
topic.
      </p>
      <p>The main contribution of Wilks-VF, as compared to the existing
literature, is a light-weight concept selection method, where a clustering-based Wilks'
lambda approach is used to make the methodology suitable for on-line use.
Evaluation of the framework's classification performance is carried out using the Reuters
TREC-11 corpus. The results are then benchmarked against other well-known feature
selection methods. As primary performance measures we use F-Score, precision
and recall. The obtained results are promising, but the work is still preliminary
and further evaluation with other corpora needs to be carried out.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        During the last decade, the role of document content descriptors (words/phrases
vs. categories/concepts) in the performance of information retrieval systems has
piqued considerable interest [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Consequently, a number of studies have
examined the benefits of using concept hierarchies or controlled vocabularies derived
from ontologies and folksonomies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In particular, the use of Wikipedia
as a source of semantic knowledge has turned out to be an increasingly popular
choice, see e.g. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The use of value functions in preference
modeling is well founded in the fields of operations research and management
science [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. There the purpose is to develop interactive
methods for helping the users to find preferred solutions for complex decision-making
problems with several competing objectives. The existence of a value function
which imitates a decision maker's choices makes the two problems very similar.
The essential difference is the high dimensionality of document classification
problems, which raises concerns about the ability of value function based
methods to deal with a large number of attributes.
      </p>
      <p>
        To alleviate the curse of dimensionality, which is often
encountered in classification tasks, such as document filtering, a number of feature
selection techniques have been proposed [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. For an
extensive overview of the various methods, see e.g. Fodor [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. However, most
of these techniques are designed for general purposes, whereas the approach
suggested in this paper is tailored to the concept-selection task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Wilks-VF Framework</title>
      <p>This section describes the steps of the procedure and the implementation of the
framework. First, we describe the Wikipedia based document indexing
procedure, where every document is transformed into a stream of concepts. Next, we
present the dimensionality reduction approach utilizing Wilks’ lambda, and
finally we describe an efficient linear optimization method for learning the value
function based document classifier.</p>
      <sec id="sec-3-1">
        <title>Document profiling</title>
        <p>
          The document profiling approach used in this paper is similar to the technique
adopted by Malo et al.[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], where each document is profiled into a collection of
concepts. To illustrate this idea, consider the example in Fig. 1. The text in the
figure on the left-hand-side is transformed into a vector of concepts by the
profiler. The profiler is implemented as a two-stage classifier, where disambiguation
and link recognition are accomplished jointly to detect Wikipedia-concepts in the
documents and each concept corresponds to a Wikipedia article. On the
righthand-side (under concept space), a small network is shown, corresponding to
the central concepts found in the document. In addition to the concepts directly
present in the document, the network also displays some other concepts that
are specified in the Wikipedia link structure. As discussed by [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the link structure can be used for mining semantic relatedness information,
which is useful for constructing concept-based classification models.
        </p>
        <sec id="sec-3-1-1">
          <title>Original Text</title>
          <p>[Fig. 1, left: example document. "Even after the financial crisis of 2008 just six megabanks - Bank of America, Morgan Stanley, Citigroup, Goldman Sachs, Wells Fargo and JPMorgan Chase - are as strong as ever. Together they control assets adding up to more than 60 percent of the US gross domestic product."]</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Concept space</title>
          <p>[Fig. 1, right: concepts detected for the example, including Financial crisis of 2008, Goldman Sachs, Bill Clinton, Investment banking and Leveraged buyout.]</p>
          <p>
            In this paper, we used the concept-relatedness measure described by Malo
et al.[
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], which in turn is inspired by the Normalized Google Distance approach
proposed by Cilibrasi and Vitanyi [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ]. In the following definition we introduce
the concept-relatedness measure; its usage is discussed in Sect. 3.3.
Definition 1. Concept relatedness: Let C denote the concept space and c1 and c2
be an arbitrary pair of Wikipedia concepts. If C1, C2 ⊂ C denote the sets of all
articles that link to c1 and c2, respectively, the concept-relatedness measure is
given by the mapping c-rel: C × C → [0, 1],
where the ND measure is ND(c1, c2) = (log(max(|C1|, |C2|)) − log(|C1 ∩ C2|)) / (log(|C|) − log(min(|C1|, |C2|))).
          </p>
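          <p>To illustrate Definition 1, the following sketch computes relatedness from concept in-link sets. It is a sketch only: mapping the ND distance into [0, 1] by clipping 1 − ND is an assumption here, and the exact mapping used by Malo et al. may differ.</p>

```python
import math

def concept_relatedness(links_to_c1, links_to_c2, total_articles):
    """Relatedness of two Wikipedia concepts from their in-link sets.

    links_to_c1, links_to_c2: ids of articles linking to each concept.
    total_articles: |C|, the number of articles in the concept space.
    Mapping ND into [0, 1] by clipping 1 - ND is an assumption here.
    """
    c1, c2 = set(links_to_c1), set(links_to_c2)
    common = c1.intersection(c2)
    if not common:
        return 0.0  # no shared in-links: treat the concepts as unrelated
    nd = ((math.log(max(len(c1), len(c2))) - math.log(len(common)))
          / (math.log(total_articles) - math.log(min(len(c1), len(c2)))))
    return min(max(1.0 - nd, 0.0), 1.0)
```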
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Dimension Reduction with Wilks’ lambda</title>
        <p>
          Wilks’ lambda is used to identify the concepts which best separate the relevant
documents from the irrelevant ones. Once the documents have been profiled into
a matrix where each row represents a document and each column represents a
concept, Wilks’ lambda tests whether there are differences between the means
of the two identified groups of subjects (relevant and irrelevant documents) on
a number of dependent variables (concepts). A large difference in the means
indicates that the chosen concept can be used to distinguish a relevant document
from an irrelevant one [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
        <p>Wilks’ lambda statistic. Let X ∈ IR^(N×|Ĉ|) denote the document-concept
matrix, where N is the number of documents in the training set and |Ĉ| is
the number of different concepts found in the documents. The matrix can be
decomposed into two parts according to the relevance of the documents,
X = [XR; XIR],
where XR and XIR are the collections of profiles corresponding to the relevant
documents and the irrelevant ones, respectively. The dimensionality reduction
procedure is based on the assumption that the profile means, x̄R and x̄IR, in
the two document groups are different. If x̄R = x̄IR, none of the concepts is
able to differentiate between relevant and irrelevant documents. The hypothesis
H0: x̄R = x̄IR can be tested by the principle of maximum likelihood, using the
Wilks’ lambda statistic
Λ = |W| / |T| = |XR^T H XR + XIR^T H XIR| / |X^T H X|,
where T denotes the total centered cross product, W denotes the within-groups
cross product, and H is the corresponding centering matrix. The statistic Λ can be
tested with the F-distribution approximation developed by Rao [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
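        <p>For illustration, the Λ statistic can be computed directly from the document-concept matrix. A minimal sketch assuming nonsingular cross-product matrices (i.e. clearly more documents than concepts):</p>

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = |W| / |T| for a document-concept matrix X.

    X: (N, p) array of document profiles; y: boolean array, True = relevant.
    Assumes T and W are nonsingular (many more documents than concepts).
    """
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                          # total centered cross product
    W = np.zeros_like(T)
    for group in (y, ~y):
        Xg = X[group] - X[group].mean(axis=0)
        W += Xg.T @ Xg                     # within-groups cross product
    return np.linalg.det(W) / np.linalg.det(T)
```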
        <p>Additional information. In order to choose the concepts that provide the
best separation between the two groups, we employ Wilks’ lambda to evaluate
the information content of the concepts. Let
T = [T11 T12; T21 T22] and W = [W11 W12; W21 W22]
be the block representations of the total and within-groups matrices, where
T11 = T11(q,q) and W11 = W11(q,q) refer to those variables, q, that are included
in the model. For simplicity, we can assume that the variables in the model
have indices 1, 2, . . . , q. For this purpose, we introduce the decomposition of
Wilks’ lambda into two parts, the information content of the selected concepts
Λ1 and the information content of the remaining concepts Λ2,1:
Λ = Λ1 Λ2,1 = (|W11| / |T11|) × (|W22 − W21 W11^−1 W12| / |T22 − T21 T11^−1 T12|).</p>
        <p>The parts of the decomposition can be interpreted as follows:
1. if Λ1 ≈ 1, then variables i = 1, 2, . . . , q are not able to separate the groups
2. if Λ1 &lt;&lt; 1, then variables i = 1, 2, . . . , q separate the groups very well
3. if Λ2,1 ≈ 1, then variables i = q + 1, q + 2, . . . , p are not able to provide
additional information
4. if Λ2,1 &lt;&lt; 1, then variables i = q + 1, q + 2, . . . , p contain at least some
additional information</p>
        <p>Selection heuristic. Motivated by the Wilks’ lambda statistic, we now
introduce the following heuristic for concept selection:
1. Initiation: Let M = {1, 2, . . . , |C|} denote the index set of all concepts and
let N = ∅ be the collection of selected concept indices.
2. Ranking: For every concept index i ∈ M, compute λi = wii/tii and sort
the index set M in ascending order according to the (λi)i∈M values. The smaller
the λi, the better the separation power of the concept.
3. Selection: From the sorted index set M, choose the q concepts with the smallest
λ-values. Denote this set as Mq and write M = M \ Mq and N = N ∪
Mq. Construct the cross-product matrices W11 and T11 in such a way that they
correspond to the selected concept indices N.
4. Evaluation: Test Λ1 and Λ2,1. If the test based on Λ2,1 indicates no
remaining information, then stop.
5. Update: For the remaining indices j ∈ M, compute λj = (wjj − wj1 W11^−1 w1j)
/ (tjj − tj1 T11^−1 t1j). This step removes the effect of the already
selected variables. Then the execution returns to Step 2 and the process is
repeated. In practice, choosing a large q (i.e. selecting several concepts at
once) leads to a quick termination of the algorithm, which is preferable for
on-line use; for most topics a single iteration should give sufficiently
good results.
6. Output: The collection of selected concepts corresponding to the index set N.</p>
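        <p>Step 2 of the heuristic, the λi = wii/tii ranking, can be sketched as follows; the Step 5 update that removes the effect of already selected concepts is omitted for brevity. The sketch assumes no concept column is constant, so that every tii is nonzero.</p>

```python
import numpy as np

def wilks_rank(X, y):
    """Rank concepts by the univariate ratio lambda_i = w_ii / t_ii.

    X: (N, p) document-concept matrix; y: boolean array, True = relevant.
    Returns concept indices sorted by increasing lambda_i (best first).
    Assumes no concept column is constant, so every t_ii is nonzero.
    """
    XR, XIR = X[y], X[~y]
    # diagonal of the total centered cross-product matrix
    t = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    # diagonal of the within-groups cross-product matrix
    w = (((XR - XR.mean(axis=0)) ** 2).sum(axis=0)
         + ((XIR - XIR.mean(axis=0)) ** 2).sum(axis=0))
    return np.argsort(w / t)
```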
      </sec>
      <sec id="sec-3-3">
        <title>Value function based classification</title>
        <p>In the Wilks-VF framework, the utility or value of each document is obtained
as a combination of the individual attributes (i.e. concepts). Learning a
filtering rule in this system is in essence equal to finding the optimal parameters
of the user’s value function. The process used in this paper is formalized as
follows.</p>
        <p>Definition 2. Value Function: Let D denote the space of profiled documents,
where each d ∈ D is a vector of Wikipedia concepts. A value function representing
the user’s preference information is defined as a mapping V: D → IR, given by
V(d) = Σ_{c ∈ CN} μ(c, d) w(c),
where w(c) ∈ [−1, 1] denotes the weight of concept c and CN is the set of
concepts selected in the dimension reduction step. The function μ: CN × D → {0, 1}
determines the presence of a concept in a document by the rule
μ(c, d) = 1 if d-rel(c, d) ≥ α and μ(c, d) = 0 otherwise,
where d-rel(c, d) = max_{c̄ ∈ d} c-rel(c, c̄) is a document-concept relatedness measure.
        </p>
        <p>
          Let D(R) and D(IR) denote the sets of relevant and irrelevant documents
originally supplied by the user. The parameters of the value function are determined
by solving the following linear optimization problem: maximize ε subject to
V(d(R)) − V(d(IR)) ≥ ε
∀ d(R) ∈ D(R), d(IR) ∈ D(IR).
A positive weight indicates a relevant concept and a negative weight an irrelevant one.
When ε &gt; 0, the obtained value function is consistent with the user’s preferences.
Based on the Wilks-VF value function, a simple document classification rule is obtained by
choosing a suitable cutoff [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. If a document’s value is above the cutoff, it is
considered relevant.
        </p>
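        <p>The linear program above can be handed to an off-the-shelf LP solver. A minimal sketch using scipy, with one constraint per relevant/irrelevant pair as in the text; function and variable names are illustrative, not from the original system:</p>

```python
import numpy as np
from scipy.optimize import linprog

def fit_value_function(M_rel, M_irr):
    """Learn value-function weights by maximizing the separation margin eps.

    M_rel, M_irr: arrays of mu(c, d) indicators (rows = documents,
    columns = selected concepts). We maximize eps subject to
    V(d_rel) - V(d_irr) being at least eps for every pair, with concept
    weights restricted to [-1, 1]. Names here are illustrative.
    """
    n = M_rel.shape[1]
    # One inequality per (relevant, irrelevant) pair, rewritten as
    # -(m_rel - m_irr) . w + eps less-or-equal 0 for linprog's A_ub form.
    pairs = np.array([mr - mi for mr in M_rel for mi in M_irr])
    A_ub = np.hstack([-pairs, np.ones((len(pairs), 1))])
    b_ub = np.zeros(len(pairs))
    c = np.zeros(n + 1)
    c[-1] = -1.0                      # maximize eps = minimize -eps
    bounds = [(-1, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[-1]
```

A positive learned weight then marks a relevant concept and a negative one an irrelevant concept, matching the interpretation above.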
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>In this section, we present the results of the Wilks-VF method. The method
has been tested on the Reuters TREC-11 newswire documents, a collection
of news stories from 1996 to 1997. The collection is divided into 100 subsets
or topics. Each document belonging to a topic is classified as either relevant or
irrelevant to the given topic. The documents of each topic are further partitioned into
a training set and an evaluation set. The purpose of the training set is to generate
a search query, which is then applied to the evaluation set in order to evaluate its
performance.</p>
      <p>
        The results from all 100 topics are reported together with five benchmark
methods in Table 1. As benchmarks, we consider the following commonly
applied feature selection techniques: Gain ratio [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Kullback−Leibler divergence
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Symmetric uncertainty [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], SVM based feature selection [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and
Relief [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The performance is recorded in terms of precision and recall.
F-Score is used to combine the two measures as: F-Score = (2 × Precision ×
Recall)/(Precision + Recall). The reported performance measures are calculated
as averages over all topics. In the experiment we set q = 10.
      </p>
      <p>[Table 1: F-Score, recall and precision of Wilks-VF and the five benchmark feature selection methods.]</p>
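      <p>The F-Score combination defined above can be computed as, for example:</p>

```python
def f_score(precision, recall):
    """F-Score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```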
      <p>As can be seen from the table, the Wilks-VF method is competitive in terms
of F-Score. The performance differences are largely explained by recall levels:
the recall of Wilks-VF is considerably better than that of the other methods, as
can be observed in Table 1. Differences in precision between the methods are,
however, smaller. This means that the main advantage of Wilks-VF is its ability
to retrieve relevant instances.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>The paper discusses an important aspect of document classification, namely
dimensionality reduction. Dimensionality reduction is ubiquitous in various fields and
has been widely studied. In this paper, we have specialized the well-known Wilks’
lambda procedure for document classification. The novelty introduced in the
approach is a cluster-based concept selection procedure which ensures that all the
concepts which are significant for classification are selected. The dimensionality
reduction procedure has been integrated with a recently suggested value
function approach, which makes the overall system computationally inexpensive to
the extent that the methodology can be developed for on-line usage. The
empirical results computed on the Reuters TREC-11 corpus show that the Wilks-VF
approach is efficient compared with other widely used methods for
dimensionality reduction.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Malo</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Korhonen</surname>
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Concept-based Document Classification Using Wikipedia and Value Function</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          (
          <year>2011</year>
          ) to appear
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rajapakse</surname>
            <given-names>R.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Denham</surname>
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Text retrieval with more realistic concept matching and reinforcement learning</article-title>
          .
          <source>Information Processing and Management</source>
          (
          <year>2006</year>
          ) vol.
          <volume>42</volume>
          ,
          <fpage>1260</fpage>
          -
          <lpage>1275</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kim</surname>
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>ONTOWEB: Implementing an ontology-based web retrieval system</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          (
          <year>2005</year>
          ) vol.
          <volume>56</volume>
          , no.
          <issue>11</issue>
          .
          <fpage>1167</fpage>
          -
          <lpage>1176</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kim</surname>
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Toward Video Semantic Search Based on a Structured Folksonomy</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          (
          <year>2011</year>
          )
          <fpage>478</fpage>
          -
          <lpage>492</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rao</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <article-title>Linear Statistical Inference</article-title>
          . Wiley, NYC, NY (
          <year>1973</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pera</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lund</surname>
            <given-names>W.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ng</surname>
            <given-names>Y.-K.</given-names>
          </string-name>
          :
          <article-title>A Sophisticated Library Search Strategy Using Folksonomies and Similarity Matching</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          (
          <year>2009</year>
          ) vol.
          <volume>60</volume>
          ,
          <fpage>1392</fpage>
          -
          <lpage>1406</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Malo</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siitari</surname>
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Sinha</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automated Query Learning with Wikipedia and Genetic Programming</article-title>
          .
          <source>Artificial Intelligence</source>
          (
          <year>2010</year>
          ) conditionally accepted
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gabrilovich</surname>
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Markovitch</surname>
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Computing Semantic Relatedness Using Wikipediabased Explicit Semantic Analysis</article-title>
          .
          <source>In Proc. IJCAI-07</source>
          (
          <year>2007</year>
          )
          <fpage>1606</fpage>
          -
          <lpage>1611</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Malo</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siitari</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahlgren</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Korhonen</surname>
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Semantic Content Filtering with Wikipedia and Ontologies</article-title>
          ,
          <source>10th IEEE International Conference on Data Mining Workshops</source>
          <year>2010</year>
          , Los Alamitos, CA, USA: IEEE Computer Society (
          <year>2010</year>
          )
          <fpage>518</fpage>
          -
          <lpage>526</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ponzetto</surname>
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Strube</surname>
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Knowledge Derived From Wikipedia For Computing Semantic Relatedness</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          (
          <year>2007</year>
          ) vol.
          <volume>30</volume>
          ,
          <fpage>181</fpage>
          -
          <lpage>212</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Milne</surname>
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Witten</surname>
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Learning to link with Wikipedia</article-title>
          .
          <source>Proc. CIKM</source>
          , (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Medelyan</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Milne</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Legg</surname>
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Witten</surname>
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Mining meaning from Wikipedia</article-title>
          .
          <source>International Journal of Human-Computer Studies</source>
          (
          <year>2009</year>
          ) vol.
          <volume>67</volume>
          ,
          <fpage>716</fpage>
          -
          <lpage>754</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Korhonen</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moskowitz</surname>
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A progressive algorithm for modeling and solving multiple-criteria decision problems</article-title>
          , Operations Research (
          <year>1986</year>
          ) vol.
          <volume>34</volume>
          , no.
          <issue>5</issue>
          ,
          <fpage>726</fpage>
          -
          <lpage>731</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Korhonen</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moskowitz</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salminen</surname>
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Further developments and tests of a progressive algorithm for multiple criteria decision making</article-title>
          .
          <source>Operations Research</source>
          (
          <year>1993</year>
          ) vol.
          <volume>41</volume>
          , no.
          <issue>6</issue>
          ,
          <fpage>1033</fpage>
          -
          <lpage>1045</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Deb</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korhonen</surname>
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>An interactive evolutionary multiobjective optimization method based on progressively approximated value functions</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          (
          <year>2010</year>
          ), vol.
          <volume>14</volume>
          , no.
          <issue>5</issue>
          ,
          <fpage>723</fpage>
          -
          <lpage>739</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Zionts</surname>
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>An interactive programming method for solving the multiple criteria problem</article-title>
          .
          <source>Management Science</source>
          (
          <year>1976</year>
          ) vol.
          <volume>22</volume>
          ,
          <fpage>656</fpage>
          -
          <lpage>663</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Roy</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mackin</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallenius</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corner</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keith</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmick</surname>
            <given-names>G.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Arora</surname>
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>An interactive search method based on user preferences</article-title>
          .
          <source>Decision Analysis</source>
          (
          <year>2009</year>
          ) vol.
          <volume>5</volume>
          ,
          <fpage>203</fpage>
          -
          <lpage>229</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Abeel</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van de Peer</surname>
            <given-names>Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Saeys</surname>
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Java-ML: A Machine Learning Library</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>10</volume>
          (
          <year>2009</year>
          )
          <fpage>931</fpage>
          -
          <lpage>934</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Yu</surname>
            <given-names>L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Liu</surname>
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution</article-title>
          .
          <source>Proceedings of the Twentieth International Conference on Machine Learning</source>
          (
          <year>2003</year>
          )
          <fpage>856</fpage>
          -
          <lpage>863</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Guyon</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnhill</surname>
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Vapnik</surname>
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Gene selection for cancer classification using support vector machines</article-title>
          .
          <source>Machine Learning</source>
          . (
          <year>2002</year>
          )
          <volume>46</volume>
          :
          <fpage>389</fpage>
          -
          <lpage>422</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Burges</surname>
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A Tutorial on Support Vector Machines for Pattern Recognition</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          , Kluwer Academic Publishers, Boston (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Kira</surname>
            <given-names>K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Rendell</surname>
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A Practical Approach to Feature Selection</article-title>
          .
          <source>Ninth International Workshop on Machine Learning</source>
          (
          <year>1992</year>
          )
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Robnik-Sikonja</surname>
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kononenko</surname>
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>An adaptation of Relief for attribute estimation in regression</article-title>
          .
          <source>Fourteenth International Conference on Machine Learning</source>
          (
          <year>1997</year>
          )
          <fpage>296</fpage>
          -
          <lpage>304</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Harris</surname>
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Information Gain Versus Gain Ratio: A Study of Split Method Biases</article-title>
          .
          <source>The MITRE Corporation</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Fodor</surname>
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>A Survey of Dimension Reduction Techniques</article-title>
          . U.S. Department of Energy (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Cilibrasi</surname>
            <given-names>R.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Vitányi</surname>
            <given-names>P. M. B.</given-names>
          </string-name>
          :
          <article-title>The Google Similarity Distance</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          . (
          <year>2007</year>
          )
          <volume>19</volume>
          (
          <issue>3</issue>
          ):
          <fpage>370</fpage>
          -
          <lpage>383</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Friedman</surname>
            <given-names>H. P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Rubin</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>On Some Invariant Criteria for Grouping Data</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          (
          <year>1967</year>
          ) vol.
          <volume>62</volume>
          , no.
          <issue>320</issue>
          ,
          <fpage>1159</fpage>
          -
          <lpage>1178</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>