<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IIS at ImageCLEF 2015: Multi-label classification task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio J. Rodríguez-Sanchez</string-name>
          <email>antonio.rodriguez-sanchez@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabrina Fontanella</string-name>
          <email>fontanellasabrina@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Justus Piater</string-name>
          <email>justus.piater@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandor Szedmak</string-name>
          <email>sandor.szedmak@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Salerno</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Austria https://iis.uibk.ac.at/</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose an image decomposition technique that captures the structure of a scene. An image is decomposed into a matrix that represents the adjacency between the elements of the image and their distance. Images decomposed this way are then classified using a maximum margin regression (MMR) approach, in which the normal vector of the separating hyperplane maps the input feature vectors into the output vectors. Unlike more classical maximum margin approaches such as the SVM, MMR handles multiclass and multi-label classification natively. We have tested our approach on the ImageCLEF 2015 multi-label classification task, where it obtained high rankings.</p>
      </abstract>
      <kwd-group>
        <kwd>ImageCLEF</kwd>
        <kwd>Kronecker decomposition</kwd>
        <kwd>Maximum Margin</kwd>
        <kwd>MMR</kwd>
        <kwd>SVM</kwd>
        <kwd>multi-label classification</kwd>
        <kwd>medical images</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Automatic image classification is a fundamental part of computer vision. An
image is classified according to the visual content it contains. Image classification
tasks include determining whether an image contains a certain object, person, animal or
plant; whether it shows a street or an indoor scene; or, in the case that applies to
this paper, whether it is a medical figure and which types of medical images and/or graphs
it contains. Image classification spans several decades, from the first character
and digit recognition challenges (still used today), such as the MNIST dataset
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], to more recent and more challenging image classification tasks, such as the Pascal [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
ImageNet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or ImageCLEF [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] challenges.
      </p>
      <p>Image classification algorithms usually start by finding keypoints
or regions, each of which is then assigned a representation in terms of a
feature vector. During training, the feature vectors extracted from a set of training
images are usually grouped into histograms that approximate the distribution
of the features for the different types of images. These histograms form the
input to a classification algorithm, such as an SVM (discriminative) or Naive
Bayes (generative).</p>
      <p>One of the central problems in exploring the general structure of an image is
to recognize the relations between the objects appearing in it. The task
is not the recognition of the objects themselves but rather building a model of the
structure: what belongs to what, and how the parts are related. This resembles
the way an animal might observe the world: without labels attached to
objects, relying only on the relations among them in a certain environment.
Those relations could provide the knowledge needed to identify scenes.</p>
      <p>One of the most popular streams of machine learning research is finding
efficient methods for learning structured outputs. Several researchers have introduced
similar approaches to this kind of problem [6-10]. These methods directly
incorporate the structural learning into a specially chosen optimization framework.</p>
      <p>It is generally assumed that learning a discriminating function when the
output space is a labeled hierarchy is a much more complex problem than binary
classification. In this paper we show that the complexity of this kind of problem
can be detached from the optimization model and expressed by an
embedding into a Hilbert space. This allows us to apply a universal optimization model
that processes inputs and outputs represented in properly chosen Hilbert spaces
and solves the corresponding optimization task without having to deal with the
underlying structural complexity. The optimization model is an implementation
of a certain type of maximum margin regression, an algebraic generalization
of the well-known Support Vector Machine. The computational complexity of
the optimization scales only with the number of input-output pairs and is
independent of the dimensions of both spaces. Furthermore, its overall
complexity is equal to that of a binary classification. Our approach can easily be extended
to other structural learning problems without giving up the efficiency of the
basic optimization framework.</p>
      <p>Three fundamental steps are needed for structural learning:</p>
      <p>Embedding: The structures of the input and output objects are represented as
abstract vectors in properly chosen Hilbert spaces, reflecting the similarity
and the dissimilarity of the objects.</p>
      <p>Optimization: The optimization phase is implemented via a universal solver
that tries to find the best similarity-based matching between the input
and the output representations. Since these representations are expressed
as general vectors, the optimizer need not directly tackle the underlying
structural complexity.</p>
      <p>
        Inversion: The optimizer provides a decision function that emits a vector. The
inversion phase has to find the best-fitting output structure by projecting
this image vector back into the output space. This is often referred to as the pre-image problem;
some alternatives are presented in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. If the embedding is realized as a
bijective mapping, the inversion task is well defined.
      </p>
      <p>We use the following notation in the rest of the paper: $\mathcal{X}$ stands for the space of the input objects and $\mathcal{Y}$ for the space of the outputs. $H_\phi$ is a Hilbert space comprising the feature vectors, the images of the input vectors under the embedding $\phi(\cdot)$. $H_\psi$ is a Hilbert space comprising the images of the label vectors under the embedding $\psi(\cdot)$. $W$ is a matrix representing the linear operator projecting the feature space $H_\phi$ into $H_\psi$. $\langle \cdot, \cdot \rangle_{H_z}$ denotes the inner product in the Hilbert space $H_z$, and $\|\cdot\|_{H_z}$ is the norm defined in $H_z$. $\mathrm{tr}(W)$ is the trace of the matrix $W$, and $\dim(H)$ is the dimension of the space $H$. $x_1 \otimes x_2$ denotes the tensor product of the vectors $x_1 \in H_1$ and $x_2 \in H_2$; it represents a linear operator $A: H_2 \to H_1$ that acts on a vector $z \in H_2$ as $(x_1 \otimes x_2)z \stackrel{\mathrm{def}}{=} (x_1 x_2')z = x_1 \langle x_2, z \rangle_{H_2}$. $\langle A, B \rangle_F$ is the Frobenius inner product of the matrices representing the linear operators $A$ and $B$, defined by $\mathrm{tr}(A'B)$. $\|A\|_F$ stands for the Frobenius norm of the matrix representing the linear operator $A$, defined by $\sqrt{\langle A, A \rangle_F}$. $A \circ B$ is the element-wise (Schur) product of the matrices $A$ and $B$. $A'$ and $a'$ denote the transpose of a matrix $A$ and of a vector $a$, respectively.</p>
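      <p>As a quick numerical check of this notation, the following Python/NumPy sketch (an illustration of ours, not part of the method) verifies that $x_1 \otimes x_2 = x_1 x_2'$ acts on $z$ as $x_1 \langle x_2, z \rangle$, that $\langle A, B \rangle_F = \mathrm{tr}(A'B)$ equals the sum of the entries of the Schur product $A \circ B$, and that $\|A\|_F = \sqrt{\langle A, A \rangle_F}$. The dimensions and the random test data are arbitrary assumptions.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

# Tensor product x1 (x) x2 = x1 x2', a linear operator from H2 = R^4 to H1 = R^3
x1 = rng.standard_normal(3)
x2 = rng.standard_normal(4)
z = rng.standard_normal(4)

T = np.outer(x1, x2)                  # 3 x 4 matrix x1 x2'
lhs = T @ z                           # (x1 (x) x2) z
rhs = x1 * np.dot(x2, z)              # x1 times the inner product of x2 and z
tensor_ok = np.allclose(lhs, rhs)

# Frobenius inner product tr(A'B) equals the sum of the Schur product A o B
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))
frob_ok = np.isclose(np.trace(A.T @ B), np.sum(A * B))

# Frobenius norm ||A||_F = sqrt(tr(A'A))
norm_ok = np.isclose(np.linalg.norm(A, "fro"), np.sqrt(np.trace(A.T @ A)))

print(tensor_ok, frob_ok, norm_ok)    # True True True
```
      </preformat>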
    </sec>
    <sec id="sec-2">
      <title>Image feature generation via decomposition</title>
      <p>Let us consider the decomposition of a real 2D image. We can expect that
points close to each other within contiguous 2D blocks are more strongly related
than points that are connected only along 1D rows and columns. To
represent the image decomposition, we apply the Kronecker product: the image matrix X
is expressed as</p>
      <p>$$X = A \otimes B. \qquad (1)$$</p>
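      <p>A minimal NumPy sketch of Eq. (1), assuming a $3 \times 3$ factor $A$ and a $2 \times 2$ factor $B$ (the sizes of the example used below) with random entries: <monospace>np.kron</monospace> builds $X$ block by block, and the block in position $(i, j)$ equals $a_{ij}B$.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))   # first Kronecker factor
B = rng.standard_normal((2, 2))   # second Kronecker factor
X = np.kron(A, B)                 # X = A (x) B, a 6 x 6 matrix

# Every p x q block of X is a scaled copy of B
p, q = B.shape
blocks_ok = all(
    np.allclose(X[i * p:(i + 1) * p, j * q:(j + 1) * q], A[i, j] * B)
    for i in range(A.shape[0])
    for j in range(A.shape[1])
)
print(X.shape, blocks_ok)         # (6, 6) True
```
      </preformat>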
      <p>The question is: given X, how do we compute A and B? It turns out
that the Kronecker decomposition can be carried out by applying the Singular Value
Decomposition (SVD) to a reordered representation of the matrix X.</p>
      <p>For an arbitrary matrix $X$ of size $m \times n$, the SVD is given by $X = USV'$,
where $U \in \mathbb{R}^{m \times m}$ is an orthogonal matrix ($UU' = I_m$) of left singular vectors,
$V \in \mathbb{R}^{n \times n}$ is an orthogonal matrix ($VV' = I_n$) of right singular vectors, and
$S \in \mathbb{R}^{m \times n}$ is a diagonal matrix containing the nonnegative singular values
on its diagonal.</p>
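      <p>For illustration, NumPy's <monospace>np.linalg.svd</monospace> returns $U$, the singular values as a one-dimensional array, and $V'$ rather than $V$. The sketch below, on an arbitrary random $4 \times 6$ matrix, rebuilds the $m \times n$ diagonal matrix $S$ and checks the properties listed above.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 6
X = rng.standard_normal((m, n))

# full_matrices=True gives a square U (m x m) and a square V' (n x n)
U, s, Vt = np.linalg.svd(X, full_matrices=True)
S = np.zeros((m, n))
S[:len(s), :len(s)] = np.diag(s)       # m x n diagonal matrix of singular values

checks = (
    np.allclose(U @ U.T, np.eye(m)),     # U U' = I_m
    np.allclose(Vt.T @ Vt, np.eye(n)),   # V V' = I_n (Vt holds V')
    bool(np.all(s >= 0)),                # singular values are nonnegative
    np.allclose(U @ S @ Vt, X),          # X = U S V'
)
print(checks)                            # (True, True, True, True)
```
      </preformat>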
      <sec id="sec-2-1">
        <title>Reordering of the matrix</title>
        <p>
          Since the algorithm solving the SVD problem does not depend directly on the
order of the elements of the matrix [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], any permutation of the indices, i.e. reordering the rows and/or the columns,
preserves the solution: the singular values are unchanged, and the singular vectors are permuted accordingly.
        </p>
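        <p>The invariance can be checked numerically. In the sketch below (random matrix and permutations chosen arbitrarily), permuting rows and columns leaves the singular values unchanged, because permutation matrices are orthogonal: if $P$ and $Q$ are permutation matrices and $X = USV'$, then $PXQ = (PU)S(Q'V)'$ is again an SVD.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 7))

row_perm = rng.permutation(5)
col_perm = rng.permutation(7)
X_perm = X[row_perm][:, col_perm]     # reorder the rows, then the columns

s = np.linalg.svd(X, compute_uv=False)           # sorted in decreasing order
s_perm = np.linalg.svd(X_perm, compute_uv=False)
same = np.allclose(s, s_perm)
print(same)                           # True
```
        </preformat>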
      </sec>
      <sec id="sec-2-2">
        <title>Kronecker decomposition as SVD</title>
        <p>
          The solution to the Kronecker decomposition via the SVD can be found in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
This approach exploits the invariance of the SVD under reordering noted above.
        </p>
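        <p>To show how reordering the matrix $X$ helps to solve the Kronecker decomposition problem, consider the following example. The matrices in the Kronecker product
$$X = A \otimes B: \quad
\begin{bmatrix}
x_{11} &amp; x_{12} &amp; x_{13} &amp; x_{14} &amp; x_{15} &amp; x_{16} \\
x_{21} &amp; x_{22} &amp; x_{23} &amp; x_{24} &amp; x_{25} &amp; x_{26} \\
x_{31} &amp; x_{32} &amp; x_{33} &amp; x_{34} &amp; x_{35} &amp; x_{36} \\
x_{41} &amp; x_{42} &amp; x_{43} &amp; x_{44} &amp; x_{45} &amp; x_{46} \\
x_{51} &amp; x_{52} &amp; x_{53} &amp; x_{54} &amp; x_{55} &amp; x_{56} \\
x_{61} &amp; x_{62} &amp; x_{63} &amp; x_{64} &amp; x_{65} &amp; x_{66}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} &amp; a_{12} &amp; a_{13} \\
a_{21} &amp; a_{22} &amp; a_{23} \\
a_{31} &amp; a_{32} &amp; a_{33}
\end{bmatrix}
\otimes
\begin{bmatrix}
b_{11} &amp; b_{12} \\
b_{21} &amp; b_{22}
\end{bmatrix}$$
can be reordered into
$$\tilde{X} =
\begin{bmatrix}
x_{11} &amp; x_{13} &amp; x_{15} &amp; x_{31} &amp; x_{33} &amp; x_{35} &amp; x_{51} &amp; x_{53} &amp; x_{55} \\
x_{12} &amp; x_{14} &amp; x_{16} &amp; x_{32} &amp; x_{34} &amp; x_{36} &amp; x_{52} &amp; x_{54} &amp; x_{56} \\
x_{21} &amp; x_{23} &amp; x_{25} &amp; x_{41} &amp; x_{43} &amp; x_{45} &amp; x_{61} &amp; x_{63} &amp; x_{65} \\
x_{22} &amp; x_{24} &amp; x_{26} &amp; x_{42} &amp; x_{44} &amp; x_{46} &amp; x_{62} &amp; x_{64} &amp; x_{66}
\end{bmatrix}
=
\begin{bmatrix} b_{11} \\ b_{12} \\ b_{21} \\ b_{22} \end{bmatrix}
\begin{bmatrix} a_{11} &amp; a_{12} &amp; a_{13} &amp; a_{21} &amp; a_{22} &amp; a_{23} &amp; a_{31} &amp; a_{32} &amp; a_{33} \end{bmatrix},$$
where each column of $\tilde{X}$ holds one $2 \times 2$ block of $X$, the blocks are taken in row-wise order, and the blocks of $X$ as well as the matrices $A$ and $B$ are vectorized in row-wise order. This vectorization follows the order used in most well-known programming languages, such as C, Java and Python, instead of the column-wise order used, e.g., in Fortran and MATLAB.</p>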
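        <p>To make the index bookkeeping concrete, the sketch below builds the $6 \times 6$ matrix of symbolic entries x11, ..., x66, cuts it into $2 \times 2$ blocks taken in row-wise order, and stores each block, flattened row-wise, as one column of $\tilde{X}$. The helper name <monospace>reorder</monospace> is our own (hypothetical), not taken from the paper or from a library. A numerical check then confirms that the reordered Kronecker product is the outer product of the row-wise vectorized $B$ and $A$.</p>
        <preformat>
```python
import numpy as np

def reorder(X, p, q):
    """Rearrange X (made of p x q blocks) so that column k of the result is
    block k (blocks taken row by row), flattened in row-wise (C) order."""
    m, n = X.shape[0] // p, X.shape[1] // q   # block-grid size (= shape of A)
    cols = [X[i * p:(i + 1) * p, j * q:(j + 1) * q].reshape(-1)
            for i in range(m) for j in range(n)]
    return np.stack(cols, axis=1)             # shape (p*q, m*n)

# Symbolic 6 x 6 matrix with entries 'x11', ..., 'x66'
labels = np.array([[f"x{r}{c}" for c in range(1, 7)] for r in range(1, 7)])
Xt = reorder(labels, 2, 2)
print(Xt.shape)            # (4, 9)
print(" ".join(Xt[0]))     # x11 x13 x15 x31 x33 x35 x51 x53 x55

# Numerical check: reorder(A (x) B) equals vec(B) vec(A)'
rng = np.random.default_rng(4)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))
kron_ok = np.allclose(reorder(np.kron(A, B), 2, 2), np.outer(B.ravel(), A.ravel()))
print(kron_ok)             # True
```
        </preformat>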
        <p>
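          The reordered matrix is therefore a rank-one outer product: with $\tilde{B} = (b_{11}, b_{12}, b_{21}, b_{22})'$ and $\tilde{A} = (a_{11}, a_{12}, \ldots, a_{33})'$ we have $\tilde{X} = \tilde{B} \otimes \tilde{A} = \tilde{B}\tilde{A}'$. This can be interpreted as the first step of the SVD algorithm: with the largest singular value $s$ and the corresponding left and right singular vectors $u$ and $v$, we may apply the substitution $\sqrt{s}\,u = \tilde{B}$ and $\sqrt{s}\,v = \tilde{A}$.
The proof that this reordering generally provides the correct solution to the
Kronecker decomposition can be found in [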
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
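        <p>The main steps of the Kronecker decomposition can be summarized as follows:</p>
        <list list-type="order">
          <list-item><p>Reorder (reshape) the matrix $X$ into $\tilde{X}$.</p></list-item>
          <list-item><p>Compute the SVD of $\tilde{X}$.</p></list-item>
          <list-item><p>Approximate $\tilde{X}$ by the rank-one product of the leading left and right singular vectors, each scaled by the square root of the largest singular value.</p></list-item>
          <list-item><p>Invert the reordering to obtain $A$ and $B$.</p></list-item>
        </list>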
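        <p>The four steps can be combined into a short NumPy sketch, assuming the size $p \times q$ of $B$ is known in advance; the function name <monospace>kron_decompose</monospace> is hypothetical. Since the reordering only permutes the entries, $\|X - A \otimes B\|_F = \|\tilde{X} - \mathrm{vec}(B)\mathrm{vec}(A)'\|_F$, so the rank-one truncated SVD gives the best approximation in the Frobenius norm (Eckart-Young theorem); for an exact Kronecker product it reproduces $X$.</p>
        <preformat>
```python
import numpy as np

def kron_decompose(X, p, q):
    """Approximate X by A (x) B, with B of size p x q, via the SVD of the
    reordered matrix (steps 1-4 in the text)."""
    m, n = X.shape[0] // p, X.shape[1] // q
    # 1. Reorder: column k of Xt is block k of X, flattened row-wise
    Xt = np.stack([X[i * p:(i + 1) * p, j * q:(j + 1) * q].reshape(-1)
                   for i in range(m) for j in range(n)], axis=1)
    # 2. SVD of the reordered matrix
    U, s, Vt = np.linalg.svd(Xt, full_matrices=False)
    # 3. Rank-one approximation: Xt ~ (sqrt(s1) u1)(sqrt(s1) v1)'
    b_vec = np.sqrt(s[0]) * U[:, 0]
    a_vec = np.sqrt(s[0]) * Vt[0, :]
    # 4. Invert the reordering: reshape the vectors back into matrices
    return a_vec.reshape(m, n), b_vec.reshape(p, q)

rng = np.random.default_rng(5)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))
X = np.kron(A, B)
A_hat, B_hat = kron_decompose(X, 2, 2)
print(np.allclose(np.kron(A_hat, B_hat), X))   # True
```
        </preformat>
        <p>The factors are determined only up to a scalar, since $(cA) \otimes (B/c) = A \otimes B$ for any $c \neq 0$; this is why the check compares the Kronecker products rather than the factors themselves.</p>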
        <p>
          This kind of Kronecker decomposition is often also referred to as the Nearest Orthogonal
Kronecker Product [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Learning task</title>
      <p>The learning task that we are going to solve is the following. There is a set, called the
sample, of pairs of output and input objects $\{(y_i, x_i) : y_i \in \mathcal{Y}, x_i \in \mathcal{X}, i = 1, \ldots, m\}$, independently and identically drawn from an unknown multivariate
distribution $P(Y, X)$. We emphasize that the input and the
output objects can be arbitrary, e.g. graphs, matrices, functions,
probability distributions, etc. For these objects, consider two functions $\phi: \mathcal{X} \to H_\phi$ and $\psi: \mathcal{Y} \to H_\psi$ mapping the input and output objects, respectively,
into linear vector spaces, called from now on the feature space (for the
inputs) and the label space (for the outputs).</p>
      <p>The objective is to find a linear function acting on the feature space,
$$f(\phi(x)) = W\phi(x) + b, \qquad (2)$$
that produces a prediction of every input object in the label space and in this
way can implicitly give back a corresponding output object. Formally, we have
$$y = \psi^{-1}(\psi(y)) = \psi^{-1}(f(\phi(x))). \qquad (3)$$
The learning procedure can be summarized as follows:</p>
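      <sec id="sec-3-1">
        <title>Embedding</title>
        <p>$$\phi: \underbrace{\mathcal{X}}_{\text{input space}} \to \underbrace{H_\phi}_{\text{feature space}}, \qquad \psi: \underbrace{\mathcal{Y}}_{\text{output space}} \to \underbrace{H_\psi}_{\text{label space}};$$</p>
      </sec>
      <sec id="sec-3-2">
        <title>Similarity transformation and inversion</title>
        <p>Similarity transformation: $$W_f = (W, b) \;\Rightarrow\; \psi(y) \sim W_f\,\phi(x);$$</p>
        <p>Inversion: $$\psi^{-1}: \underbrace{H_\psi}_{\text{label space}} \to \underbrace{\mathcal{Y}}_{\text{output space}}.$$</p>
      </sec>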
    </sec>
    <sec id="sec-4">
      <title>Optimization model</title>
      <sec id="sec-4-1">
        <title>The “Classical” scheme of the Support Vector Machine (SVM)</title>
        <p>In the framework of the Support Vector Machine, the outputs represent two
classes, and the labels are chosen from the set $y_i \in \{-1, +1\}$. The aim is to find
a separating hyperplane, via its normal vector, such that the distance between
the elements of the two classes, called the margin, measured in the
direction of this normal vector, is as large as possible. This basic scheme can be extended by allowing some
sample items to fall closer to the separating hyperplane than the margin.</p>
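        <p>This learning scenario can be formulated as an optimization problem:
$$\begin{array}{ll}
\min &amp; \frac{1}{2}\|w\|_2^2 + C\mathbf{1}'\xi \\
\text{w.r.t.} &amp; w: H_\phi \to \mathbb{R} \text{ (normal vector)}, \quad b \in \mathbb{R} \text{ (bias)}, \quad \xi \in \mathbb{R}^m \text{ (error vector)}, \\
\text{s.t.} &amp; y_i\left(w'\phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, m.
\end{array}$$</p>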
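        <p>For fixed $w$ and $b$, the smallest feasible error vector is $\xi_i = \max(0, 1 - y_i(w'\phi(x_i) + b))$, so the objective can be evaluated in the hinge-loss form below. The sketch takes $\phi$ as the identity map and uses three made-up points; it only evaluates the primal objective and does not solve the optimization problem.</p>
        <preformat>
```python
import numpy as np

def svm_primal_objective(w, b, X, y, C):
    """1/2 ||w||^2 + C * sum(xi), with xi_i = max(0, 1 - y_i (w'x_i + b))."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)   # smallest feasible error vector
    return 0.5 * np.dot(w, w) + C * xi.sum(), xi

# Toy data: the second point lies inside the margin
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
obj, xi = svm_primal_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0)
print(obj)            # 1.0
print(xi.tolist())    # [0.0, 0.5, 0.0]
```
        </preformat>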
      </sec>
      <sec id="sec-4-2">
        <title>Reinterpretation of the normal vector w</title>
        <p>The normal vector $w$ formally behaves as a linear transformation acting on the
feature vectors, and its capabilities can be extended even further. This extension
can be briefly characterized as follows:</p>
      </sec>
      <sec id="sec-4-3">
        <title>SVM and extended view</title>
        <table-wrap>
          <table>
            <thead>
              <tr>
                <th>SVM</th>
                <th>Extended view</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>$w$ is the normal vector of the separating hyperplane.</td>
                <td>$W$ is a linear operator projecting the feature space into the label space.</td>
              </tr>
              <tr>
                <td>$y_i \in \{-1, +1\}$ are binary outputs.</td>
                <td>$y_i \in \mathcal{Y}$ are arbitrary outputs.</td>
              </tr>
              <tr>
                <td>The labels are equal to the binary objects.</td>
                <td>$\psi(y_i) \in H_\psi$ are the labels, the outputs embedded in a linear vector space.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>If we apply a one-dimensional normalized label space with binary labels
$\{-1, +1\}$ in the general framework, the original scenario of the
SVM is restored, and the normal vector is a projection into the one-dimensional label space.</p>
        <p>To summarize the learning task, compared with the original primal form of the SVM
we end up with the following optimization problem:</p>
      </sec>
      <sec id="sec-4-4">
        <title>Primal problems for maximum margin learning</title>
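        <p>Binary class learning with the Support Vector Machine (SVM):
$$\begin{array}{ll}
\min &amp; \frac{1}{2}\|w\|_2^2 + C\mathbf{1}'\xi \\
\text{w.r.t.} &amp; w: H_\phi \to \mathbb{R} \text{ (normal vector)}, \quad b \in \mathbb{R} \text{ (bias)}, \quad \xi \in \mathbb{R}^m \text{ (error vector)}, \\
\text{s.t.} &amp; y_i\left(w'\phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, m.
\end{array}$$</p>
        <p>Vector label learning with Maximum Margin Regression (MMR):
$$\begin{array}{ll}
\min &amp; \frac{1}{2}\|W\|_F^2 + C\mathbf{1}'\xi \\
\text{w.r.t.} &amp; W: H_\phi \to H_\psi \text{ (linear operator)}, \quad b \in H_\psi \text{ (translation, bias)}, \quad \xi \in \mathbb{R}^m \text{ (error vector)}, \\
\text{s.t.} &amp; \langle \psi(y_i), W\phi(x_i) + b \rangle_{H_\psi} \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, m.
\end{array}$$</p>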
        <p>In the extended formulation we exploit the fact that, equipped with the Frobenius norm and
inner product, the matrices form a linear vector space whose dimension
equals the number of matrix elements. This gives an isomorphism
between the space spanned by the normal vectors of the hyperplanes occurring in
the SVM and the space spanned by the linear transformations.</p>
        <p>One can see that if no bias term is included in the MMR problem, then
there is a completely symmetric relationship between the label space and the feature
space via the representations of the input and the output items, namely
$$\langle \psi(y_i), W\phi(x_i) \rangle_{H_\psi} = \langle W'\psi(y_i), \phi(x_i) \rangle_{H_\phi}.$$
After solving the dual problem, the optimal dual variables $\alpha_i$ give
the optimal linear operator
$$W = \sum_{i=1}^{m} \alpha_i \psi(y_i)\phi(x_i)'.$$</p>
        <p>We can interpret this expression by comparing it with the corresponding formula
that gives the optimal solution of the SVM, i.e. $w = \sum_{i=1}^{m} \alpha_i y_i \phi(x_i)$. The
new part is the vectors representing the output items, which in the SVM
were only scalar values; in the new interpretation we could say that they are
one-dimensional vectors. With the expression of the linear operator $W$ at hand,
the prediction for a new input item $x$ can be written as
$$\psi(y) = W\phi(x) = \sum_{i=1}^{m} \alpha_i \psi(y_i) \underbrace{\langle \phi(x_i), \phi(x) \rangle}_{\kappa_\phi(x_i, x)},$$
which involves only the input kernel and provides the implicit representation $\psi(y)$
of the prediction of the corresponding output $y$.</p>
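        <p>With an explicit feature map, here assumed to be the identity (so that $\kappa_\phi$ is the linear kernel), $W$ can be formed directly and compared with the kernel expansion. The sketch below uses random placeholders for the features, the embedded labels and the dual variables $\alpha_i$; they are not the solution of the dual problem, since the identity holds for any $\alpha$.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(6)
m, d_in, d_out = 5, 3, 4
Phi = rng.standard_normal((m, d_in))    # rows: phi(x_i), explicit features
Psi = rng.standard_normal((m, d_out))   # rows: psi(y_i), embedded labels
alpha = rng.uniform(0.0, 1.0, m)        # placeholder dual variables
phi_x = rng.standard_normal(d_in)       # feature vector of a new input x

# Explicit operator W = sum_i alpha_i psi(y_i) phi(x_i)'  (d_out x d_in)
W = (Psi * alpha[:, None]).T @ Phi
pred_explicit = W @ phi_x

# Kernel form: sum_i alpha_i psi(y_i) kappa(x_i, x), with the linear kernel
kappa = Phi @ phi_x
pred_kernel = Psi.T @ (alpha * kappa)

same = np.allclose(pred_explicit, pred_kernel)
print(same)                             # True
```
        </preformat>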
        <p>Because only the implicit image of the output is given, we need to invert the
function $\psi$ to obtain the corresponding $y$. This inversion problem is called the
pre-image problem. Unfortunately, there is no general procedure for solving it. We
describe here a scheme that can be applied when the set of all possible outputs is
finite with a reasonably small cardinality. What counts as a reasonably small
cardinality depends on the given problem, e.g. on how expensive it is to compute
the inner product between the output items in the label space where they are
represented.</p>
        <p>Under these conditions we can follow this scheme:
$$y = \arg\max_{y \in \tilde{\mathcal{Y}}} \psi(y)' W \phi(x) = \arg\max_{y \in \tilde{\mathcal{Y}}} \sum_{i=1}^{m} \alpha_i \underbrace{\langle \psi(y), \psi(y_i) \rangle}_{\kappa_\psi(y, y_i)} \underbrace{\langle \phi(x_i), \phi(x) \rangle}_{\kappa_\phi(x_i, x)},$$
where $\tilde{\mathcal{Y}} = \{y_1, \ldots, y_N\} \subseteq \mathcal{Y}$ is the set of possible outputs.
The main advantage of this approach is that it requires only the inner
products in the label space. In addition, it is independent of the representation
of the output items and can be applied to any complex structural learning
problem, e.g. on graphs. The training set is probably the best candidate for $\tilde{\mathcal{Y}}$.</p>
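        <p>A minimal end-to-end sketch of bias-free MMR, under stated assumptions. Eliminating $W = \sum_i \alpha_i \psi(y_i)\phi(x_i)'$ from the primal without bias gives the dual $\max_\alpha \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i\alpha_j \kappa_\phi(x_i, x_j)\kappa_\psi(y_i, y_j)$ subject to $0 \le \alpha_i \le C$, which we solve here by plain coordinate ascent; the paper does not state which solver was used. Prediction takes the arg max over the candidate set $\tilde{\mathcal{Y}}$, chosen as the distinct training labels. The Gaussian input kernel, the one-hot output embedding and the toy data are assumptions for illustration.</p>
        <preformat>
```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """kappa(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmr_train(Kx, Ky, C=1.0, n_iter=200, tol=1e-8):
    """Coordinate ascent on the bias-free MMR dual
    max 1'alpha - 1/2 alpha'(Kx o Ky)alpha, with every alpha_i in [0, C]."""
    G = Kx * Ky                        # Schur product of the two kernels
    alpha = np.zeros(len(G))
    for _ in range(n_iter):
        max_step = 0.0
        for i in range(len(G)):
            grad_i = 1.0 - G[i] @ alpha                 # partial derivative
            new = np.clip(alpha[i] + grad_i / G[i, i], 0.0, C)
            max_step = max(max_step, abs(new - alpha[i]))
            alpha[i] = new
        if tol > max_step:             # no coordinate moved noticeably
            break
    return alpha

def mmr_predict(alpha, Kx_new, Ky_cand):
    """Index of the best candidate output for each new input.
    Kx_new[i, t] = kappa_phi(x_i, x_t), Ky_cand[c, i] = kappa_psi(y_c, y_i)."""
    scores = Ky_cand @ (alpha[:, None] * Kx_new)    # (n_cand, n_new)
    return np.argmax(scores, axis=0)

# Toy data: two clusters with one-hot label vectors psi(y_i)
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
Kx = gaussian_kernel(X, X, sigma=1.0)
Ky = Y @ Y.T                           # output kernel: inner products of psi(y_i)
alpha = mmr_train(Kx, Ky, C=10.0)

X_new = np.array([[0.0, 0.5], [5.0, 5.5]])
cand = np.unique(Y, axis=0)            # candidate set: distinct training labels
pred = mmr_predict(alpha, gaussian_kernel(X, X_new, 1.0), cand @ Y.T)
print(cand[pred].tolist())             # [[1.0, 0.0], [0.0, 1.0]]
```
        </preformat>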
      </sec>
      <sec id="sec-4-5">
        <title>Hierarchy learning</title>
        <p>As mentioned above, in this paper we focus on the case where the output space
is a labeled hierarchy (Figure 1a). Hierarchy learning is realized via an
embedding of each path going from a node to the root of the tree. Let $V$ be
the set of nodes in the tree. A path $p(v) \subseteq V$ is defined as the shortest path
from the node $v$ to the root of the tree, and its length is $|p(v)|$. The set
$I = \{1, \ldots, |V|\}$ gives an indexing of the nodes. The embedding is realized by a
vector-valued function $\psi: V \to \mathbb{R}^{|V|}$, whose components are given by
$$\psi(v)_i = \begin{cases} r &amp; \text{if } v_i \notin p(v), \\ s\,q^k &amp; \text{if } v_i \in p(v) \text{ and } k = |p(v)| - |p(v_i)|, \end{cases} \qquad (4)$$
where $r$, $q$ and $s$ are the parameters of the embedding. The parameter $q$ expresses
the diminishing weight of the nodes closer to the root. If $q = 0$, assuming
$0^0 = 1$, the intermediate nodes and the root are disregarded, and we obtain
a simple multiclass classification problem. The value of $r$ can be 0, but some
experiments suggest that a nonzero value may help to improve the classification performance. We
conjecture that the best choice of the parameters is the one that minimizes the
correlation between all pairs of label vectors.</p>
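        <p>A minimal sketch of Eq. (4), assuming the tree is stored as a dictionary mapping each node to its parent (with <monospace>None</monospace> for the root) and that $|p(v)|$ counts the nodes on the path; only differences of path lengths enter Eq. (4), so counting edges gives the same result. The node names and parameter values are hypothetical. The output kernel is then the Gram matrix of these vectors.</p>
        <preformat>
```python
import numpy as np

def path_to_root(v, parent):
    """Nodes on the shortest path from v to the root, v first."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def embed_node(v, nodes, parent, r=0.0, s=1.0, q=0.5):
    """psi(v)_i = s * q**k if node i lies on p(v), with k = |p(v)| - |p(v_i)|;
    otherwise psi(v)_i = r."""
    p_v = path_to_root(v, parent)
    vec = np.full(len(nodes), r)
    for i, vi in enumerate(nodes):
        if vi in p_v:
            k = len(p_v) - len(path_to_root(vi, parent))
            vec[i] = s * q**k
    return vec

# Hypothetical hierarchy: root -> A -> {A1, A2}, root -> B
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
nodes = list(parent)                                  # root, A, B, A1, A2
print(embed_node("A1", nodes, parent).tolist())       # [0.25, 0.5, 0.0, 1.0, 0.0]
```
        </preformat>
        <p>Setting $q = 0$ reproduces the multiclass case described above, because Python evaluates <monospace>0.0 ** 0</monospace> as 1.0, matching the convention $0^0 = 1$.</p>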
      </sec>
      <sec id="sec-4-6">
        <title>Input and output kernels</title>
        <p>For the concrete learning task we need to construct the input and output kernels.
To build the input kernel, the second component of the Kronecker decomposition
of each image, the matrix $B$ in (1), is used. The inner product between these
matrices is computed with the Frobenius inner product. The output
kernel is built from the inner products of the vectors representing the paths in the
hierarchy, as defined in (4).</p>
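        <p>A sketch of the two Gram matrices, assuming each image has already been reduced to its factor $B$ (random $8 \times 8$ placeholders here) and that the path vectors of Eq. (4) are stacked as rows (random placeholders as well). How the polynomial and Gaussian kernels of the experimental section are built on top of these inner products is not spelled out in the text; the form $(\langle B_i, B_j \rangle_F + c)^d$ used below is our assumption.</p>
        <preformat>
```python
import numpy as np

def frobenius_gram(Bs):
    """K[i, j] = tr(B_i' B_j) for a list of equally sized matrices."""
    F = np.stack([B.reshape(-1) for B in Bs])   # one flattened matrix per row
    return F @ F.T

def polynomial_kernel(K, degree=3, c=1.0):
    """Hypothetical polynomial kernel applied to a linear Gram matrix."""
    return (K + c) ** degree

rng = np.random.default_rng(7)
Bs = [rng.standard_normal((8, 8)) for _ in range(5)]   # placeholder B factors
Kx = frobenius_gram(Bs)
ok = np.isclose(Kx[0, 1], np.trace(Bs[0].T @ Bs[1]))
print(ok)                              # True

Psi = rng.uniform(size=(5, 6))         # placeholder path vectors psi(y_i)
Ky = Psi @ Psi.T                       # output kernel: inner products of paths
Kx3 = polynomial_kernel(Kx, degree=3)  # cubic kernel on the Frobenius products
```
        </preformat>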
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental evaluation</title>
      <sec id="sec-5-1">
        <title>ImageCLEF multi-label classification [4, 5]</title>
        <p>The challenge we participated in was the characterization of compound figures.
These figures contain subfigures of different types and from different sources (see Figure 1b,c
for two examples). The task consists of labeling each compound figure with those
of the 30 classes in the hierarchy of Figure 1a that appear in it, without knowing where
the separation lines between the subfigures are. The training set consists of 1,071 figures and the test set
of 927 figures.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Results on the challenge</title>
        <p>To compute the prediction results, a 5-fold cross-validation procedure
is applied. The original dataset is split uniformly at random into 5 equal
parts. Each part is then used in turn as test data, while the remaining four
parts are used for training. In the learning procedure, a kernel is first computed
from the corresponding features. The parameters of each kernel are
found by cross-validation restricted to the training data: the training data is divided
into validation-test and validation-training parts, the learner is trained
only on the validation-training items, and the parameter values that
maximize the F1 score on the validation-test part are chosen. We report here on two
types of kernels: polynomial and Gaussian.</p>
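        <p>The evaluation protocol can be sketched as follows. The outer loop uses each of the five random folds once as test data; the remaining data is split again into validation-training and validation-test parts for parameter selection. The 80/20 inner split, the seeds and the function names are our assumptions, since the text does not give them; kernel training and the F1-based parameter search are left out.</p>
        <preformat>
```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Split the indices 0..n-1 uniformly at random into k nearly equal folds."""
    perm = np.random.default_rng(seed).permutation(n)
    return np.array_split(perm, k)

def outer_splits(n, k=5, seed=0, inner_frac=0.8):
    """Yield (val_train, val_test, test) index arrays. Each fold is the test
    set once; the other folds are re-split for parameter selection."""
    folds = kfold_indices(n, k, seed)
    rng = np.random.default_rng(seed + 1)
    for t in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != t])
        train = rng.permutation(train)
        cut = int(inner_frac * len(train))
        yield train[:cut], train[cut:], folds[t]

n = 1071   # e.g. the number of training figures
splits = list(outer_splits(n))
for val_train, val_test, test in splits:
    print(len(val_train), len(val_test), len(test))
```
        </preformat>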
        <p>We submitted ten runs to the challenge, providing our predictions of the
labels for the test set (before it was made public). For all ten runs we used
a third-degree polynomial kernel; the only factor that changed between runs was the
random selection in the 5-fold cross-validation. Generating the feature vectors
for the training set took around 60 minutes. Training MMR took
around one minute, and obtaining the labels for the test set took less than a
minute for each run. The ImageCLEF organizers reported the Hamming loss,
a classical measure for multi-label classification tasks that evaluates the
fraction of wrong labels over the total number of labels. The perfect case is
a Hamming loss of 0. Our Hamming loss values in the challenge
were close to 0, ranging from 0.0671 to 0.0817.</p>
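        <p>For reference, a minimal sketch of the Hamming loss on binary label matrices (rows: figures, columns: classes); the toy matrices are made up.</p>
        <preformat>
```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of label positions where prediction and ground truth differ."""
    Y_true = np.asarray(Y_true, bool)
    Y_pred = np.asarray(Y_pred, bool)
    return np.mean(Y_true != Y_pred)

Y_true = [[1, 0, 0, 1], [0, 1, 0, 0]]
Y_pred = [[1, 0, 1, 1], [0, 1, 0, 0]]
print(hamming_loss(Y_true, Y_pred))   # 0.125 (1 wrong label out of 8)
```
        </preformat>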
        <p>Once the test set was available, we performed additional evaluation using three
other measures that are popular in multi-label classification:
precision, recall, and their combination into the F1 score. They are computed
from the true positives $T_p$, false positives $F_p$ and false negatives
$F_n$:</p>
        <p>$$P = \frac{T_p}{T_p + F_p}, \qquad R = \frac{T_p}{T_p + F_n}, \qquad F_1 = \frac{2PR}{P + R}, \qquad (5)$$
where $P$ is the precision and $R$ is the recall. Here, the perfect case would have both
precision and recall equal to 1. The F1 measure combines both values into one,
so that false positives and false negatives are both taken into account in a single value.
Precision-Recall curves for six different Kronecker 2D filter sizes (4, 8, 12, 20, 28,
34) are given in Figure 2a for polynomial kernels of different degrees and in Figure
3a for Gaussian kernels with different standard deviations. The corresponding
F1 scores are shown in Figures 2b and 3b. For the polynomial kernel, the parameter varied along the Precision-Recall curve
(and the F1 plot) was the degree of the polynomial,
from 1 to 10. For the Gaussian kernel, it was the standard deviation of the
Gaussian: 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 5 and 10.</p>
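        <p>Eq. (5) can be computed from binary label matrices as in the sketch below. The counts are pooled over all figures and classes (micro-averaging); the text does not state whether micro- or macro-averaging was used, so this is an assumption, as are the toy matrices.</p>
        <preformat>
```python
import numpy as np

def precision_recall_f1(Y_true, Y_pred):
    """Micro-averaged P, R and F1 of Eq. (5), counting Tp, Fp, Fn over all labels."""
    Y_true = np.asarray(Y_true, bool)
    Y_pred = np.asarray(Y_pred, bool)
    tp = np.sum(np.logical_and(Y_true, Y_pred))    # predicted and present
    fp = np.sum(np.logical_and(~Y_true, Y_pred))   # predicted but absent
    fn = np.sum(np.logical_and(Y_true, ~Y_pred))   # present but missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

Y_true = [[1, 0, 0, 1], [0, 1, 0, 0]]
Y_pred = [[1, 0, 1, 1], [0, 0, 0, 0]]
p, r, f1 = precision_recall_f1(Y_true, Y_pred)   # Tp = 2, Fp = 1, Fn = 1
```
        </preformat>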
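        <p>[Figure 2: polynomial kernel, (a) Precision-Recall curves, (b) F1 score versus degree of polynomial. Figure 3: Gaussian kernel, (a) Precision-Recall curves, (b) F1 score versus standard deviation.]</p>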
        <p>These results show that larger filter sizes provide better results, although at
the largest filter sizes Precision, Recall and F1 score are very similar. Regarding
kernels, with a polynomial kernel there is a dramatic increase in F1
score when using a cubic kernel compared with a linear or quadratic one,
while for degrees 4 and larger the F1 scores are very similar. For the
Gaussian kernel, the best scores occur at standard deviations smaller
than 1, although intermediate values (e.g. 0.5, 0.6) provide better results than
very small ones (e.g. 0.01, 0.05). The best F1 score was 0.38 with a polynomial kernel
and 0.43 with a Gaussian kernel.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>
        We have presented an approach based on a structured decomposition of the
environment. For example, elements appearing in a scene can be incorporated into
a graph in which the objects play the role of the vertices and the edges are related to
the distances between those objects. The knowledge about the environment
can then be represented by the adjacency matrix of the graph. By decomposing the
image matrix into a similar structure, e.g. into a sequence of Kronecker
products, the structure behind the scene can be captured. For classification we have
applied a version of the maximum margin regression (MMR) technique [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
MMR relies on the fact that the normal vector of the separating hyperplane
can be interpreted as a linear operator mapping the feature vectors of input
items into the space of the feature vectors of the outputs. The evaluation of
our methodology in the ImageCLEF 2015 [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] multi-label challenge provided
promising results, with Hamming losses between 0.0671 and 0.0817 in the submitted runs
and a best F1 score of 0.43 in our additional evaluation.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>The research leading to these results received funding from the EU 7th
Framework Programme FP7/2007-2013 under grant agreement no. 270273, Xperience.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haffner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          (
          <issue>11</issue>
          ) (
          <year>1998</year>
          )
          <fpage>2278</fpage>
          –
          <lpage>2324</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Everingham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eslami</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The pascal visual object classes challenge: A retrospective</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>111</volume>
          (
          <issue>1</issue>
          ) (
          <year>2014</year>
          )
          <fpage>98</fpage>
          –
          <lpage>136</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Russakovsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Satheesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpathy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khosla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>International Journal of Computer Vision</source>
          (IJCV) (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolajczyk</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bromuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amin</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uskudarli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marvasti</surname>
            ,
            <given-names>N.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aldana</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>del Mar Roldán García</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>General Overview of ImageCLEF at the CLEF 2015 Labs</article-title>
          . Lecture Notes in Computer Science. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
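          5.
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bromuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :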
          <article-title>Overview of the ImageCLEF 2015 medical classification task</article-title>
          . In:
          <source>Working Notes of CLEF 2015 (Cross Language Evaluation Forum)</source>
          . CEUR Workshop Proceedings (September
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Taskar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Max-margin Markov networks</article-title>
          . In:
          <source>NIPS</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Altun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsochantaridis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Hidden Markov support vector machines</article-title>
          . In:
          <source>ICML'03</source>
          (
          <year>2003</year>
          )
          <fpage>3</fpage>
          –
          <lpage>10</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Tsochantaridis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Large margin methods for structured and interdependent output variables</article-title>
          .
          <source>Journal of Machine Learning Research (JMLR)</source>
          <volume>6</volume>
          (Sep) (
          <year>2005</year>
          )
          <fpage>1453</fpage>
          –
          <lpage>1484</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rousu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saunders</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szedmak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shawe-Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Learning hierarchical multi-category text classification models</article-title>
          . In: ICML. (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bakir</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
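          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taskar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishwanathan</surname>
            ,
            <given-names>S.V.N.</given-names>
          </string-name>
          , eds.: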
          <article-title>Predicting Structured Data</article-title>
          . MIT Press (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Van Loan</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          :
          <article-title>The ubiquitous Kronecker product</article-title>
          .
          <source>Journal of Computational and Applied Mathematics</source>
          <volume>123</volume>
          (
          <year>2000</year>
          )
          <fpage>85</fpage>
          –
          <lpage>100</lpage>
          . The nearest Kronecker product.
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szedmak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piater</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Scalable, Accurate Image Annotation with Joint SVMs and Output Kernels</article-title>
          .
          <source>Neurocomputing</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>