<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>in journal Machine Learning</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
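      <title-group>
        <article-title>ADAPTIVE USER-DEFINED SIMILARITY MEASURE</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>J. Philippeau</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Joly</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Pinquier</string-name>
        </contrib>
        <aff>
          <institution>IRIT UMR 5505 CNRS-INP-UT1-UPS</institution>
          <addr-line>118 Route de Narbonne, 31062 Toulouse Cedex 9, FRANCE</addr-line>
        </aff>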
        <aff id="aff0">
          <label>0</label>
          <institution>Fig. 1. Annotated screenshot of the Graphical User Interface</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1995</year>
      </pub-date>
      <volume>20</volume>
      <abstract>
        <p>For consulting audiovisual databases, an interactive visual organization tool is an attractive prospect. We consider a document as a visual entity on a computer screen. Using a deliberately simple GUI, a user organizes a few documents. The user's notion of audiovisual similarity is expressed through the Euclidean distance between entities: the closer they are, the more similar they are. The system infers a similarity measure that relies on the analysis of low-level features, and reorganizes the rest of the database. This measure, based on support vector regression, reconciles human perception of audiovisual similarity with automatically extracted low-level data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>We aim to develop a system that respects several
principles:</p>
    </sec>
    <sec id="sec-2">
      <p>1. Context-free user behavior: We do not want to focus
on a particular profession (documentalist, editing specialist, etc.).
This point encouraged us to design an uncluttered
dynamic interface, free of profession-specific information, that
is used for both similarity learning and result presentation.</p>
      <p>2. Document heterogeneity: Any kind of mono-media
or multi-media document should be accessible in the same
way (the textual modality has not been explored yet).</p>
      <p>3. Lack of information about descriptive values: The field
of semi-supervised audiovisual analysis we are interested in
relies on low-level feature extraction and modeling. We
assume that our descriptors are independent (and therefore so are
the descriptive values computed on documents), and that we do not know
their behavior: Are they linear, logarithmic, etc.? What are
their minimum and maximum values? Are they continuous
or not?</p>
      <p>4. An application that is as generic as possible:
Similarity is used in classification, identification and
characterization tasks [2]. Our application has to support
any task relying on those three aspects of
the expression of similarity. We use the global term
organization to refer to them. We want to create an interactive
system that allows a user-defined similarity to be learned and
a database to be organized through the same Graphical User Interface
(GUI).</p>
      <p>This problem leads us to consider two points of view: a
subjective one relying on user-defined audiovisual
similarities, and an objective one that depends on low-level
descriptive values.</p>
      <p>2. GUI</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Functionalities</title>
      <p>- All functionalities usually implemented in a WIMP-like
(Window, Icon, Menu, Pointing device) interface are present:
single or multiple selection of elements, “drag and drop”,
etc.</p>
      <p>We added some other functionalities: zoom and global
translation of the graphical environment; content
consultation for video or audio media; the possibility to change
entities’ representations (texture, color) for the sake of
readability; and the possibility to save and load entity positions and
the computed similarity measure.</p>
      <p>- The user can “anchor” entities (drawn as diamonds
in figure 1), as opposed to free elements (drawn as spheres):
anchored items can no longer move in space.</p>
      <p>- The user selects several documents and clicks on the
“Learn” button. Those documents (composing the training
set) are anchored to symbolize the user's trust in their placement.
The learning process generates a similarity measure based on
a set of descriptive values.</p>
      <p>- When the user clicks on the “Reorganize” button, the system
directly applies the learned similarity measure to a set of selected
entities. During the reorganization phase, the chosen entities
move in the space until they reach a stable state.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Dynamic visual engine</title>
      <p>We decided to consider visual entities as physical particles
and to implement a “mass-spring” dynamic physical model.
We implemented this model with the fourth-order Runge-Kutta
algorithm, an explicit time-integration scheme known to be
very accurate and well-behaved for a wide range of problems.
The global set of documents is considered as a complete graph
whose nodes are weighted particles and whose edges are springs.</p>
      <p>Given an arbitrary time step, this algorithm computes the
new position of a point from an approximation of its velocity,
taking into account all the implemented physical constraints:
the weight of a particle; an attraction force
proportional to the ratio of the expected spring length to the
actual spring length; a damping force applied to the particle
(proportional to its velocity), which forces the system to settle
into a stable state even if no true equilibrium exists; and the
strains of the springs.</p>
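      <p>The following is a minimal sketch of such an engine, not the authors' implementation. It assumes a 2D space, unit masses, and a Hooke's-law spring force standing in for the length-ratio attraction force, whose exact form is not given here. The names accelerations, rk4_step and relax, and the parameters k, damping, dt and tol, are hypothetical.</p>
      <preformat>
```python
import math

def accelerations(pos, vel, rest, mass, k, damping, anchored):
    """Acceleration of every particle from spring and damping forces."""
    acc = []
    for i in range(len(pos)):
        if i in anchored:                      # anchored entities never move
            acc.append((0.0, 0.0))
            continue
        fx = -damping * vel[i][0]              # damping, proportional to velocity
        fy = -damping * vel[i][1]
        for j in range(len(pos)):              # complete graph: one spring per pair
            if j == i:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy) or 1e-9
            stretch = dist - rest[i][j]        # positive: spring pulls i toward j
            fx += k * stretch * dx / dist      # assumed Hooke's-law force
            fy += k * stretch * dy / dist
        acc.append((fx / mass[i], fy / mass[i]))
    return acc

def rk4_step(pos, vel, dt, **params):
    """One fourth-order Runge-Kutta step on the state (positions, velocities)."""
    def shifted(base, slope, h):
        return [(b[0] + h * s[0], b[1] + h * s[1]) for b, s in zip(base, slope)]

    # Each stage evaluates (d pos/dt, d vel/dt) = (velocity, acceleration).
    v1, a1 = vel, accelerations(pos, vel, **params)
    v2 = shifted(vel, a1, dt / 2)
    a2 = accelerations(shifted(pos, v1, dt / 2), v2, **params)
    v3 = shifted(vel, a2, dt / 2)
    a3 = accelerations(shifted(pos, v2, dt / 2), v3, **params)
    v4 = shifted(vel, a3, dt)
    a4 = accelerations(shifted(pos, v3, dt), v4, **params)

    def blend(base, s1, s2, s3, s4):
        return [tuple(b[c] + dt / 6 * (p[c] + 2 * q[c] + 2 * r[c] + s[c])
                      for c in (0, 1))
                for b, p, q, r, s in zip(base, s1, s2, s3, s4)]

    return blend(pos, v1, v2, v3, v4), blend(vel, a1, a2, a3, a4)

def relax(pos, rest, anchored=frozenset(), k=1.0, damping=2.0, dt=0.05,
          tol=1e-4, max_steps=5000):
    """Integrate until speeds and accelerations are negligible (stable state)."""
    n = len(pos)
    params = dict(rest=rest, mass=[1.0] * n, k=k, damping=damping,
                  anchored=anchored)
    vel = [(0.0, 0.0)] * n
    for _ in range(max_steps):
        pos, vel = rk4_step(pos, vel, dt, **params)
        acc = accelerations(pos, vel, **params)
        residual = max(math.hypot(*u) for u in vel + acc)
        if tol > residual:
            break
    return pos

# One anchored entity and one free entity joined by a spring of rest length 1.
layout = relax([(0.0, 0.0), (3.0, 0.0)], [[0.0, 1.0], [1.0, 0.0]], anchored={0})
```
      </preformat>
      <p>In this sketch the rest lengths play the role of the expected distances between documents, and the max_steps cap guarantees termination even when the springs cannot all be satisfied at once.</p>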
    </sec>
    <sec id="sec-5">
      <title>3. LEARNING ENGINE</title>
      <p>The main idea is to generate a behavioral model of a set of
low-level descriptors. This model has to be representative
enough of the arrangement made by the user in the visual
interface.</p>
      <p>We chose an early fusion strategy because of the
heterogeneity of our features and the lack of information we have
about them. We apply it both within a modality (audio or video)
and across modalities.</p>
      <p>We use the following Min-Max normalization method:
each feature is scaled to the [0, 1] range before being
concatenated.</p>
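      <p>As an illustration, Min-Max scaling followed by concatenation could be sketched as below. This assumes one scalar value per descriptor and per document; the function names min_max_scale and early_fusion and the descriptor names are hypothetical.</p>
      <preformat>
```python
def min_max_scale(values):
    """Scale one descriptor (one value per document) to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant descriptor: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def early_fusion(feature_columns):
    """Normalize every descriptor, then concatenate them per document.

    feature_columns maps a descriptor name to its list of values,
    one value per document, whatever the modality (audio or video).
    """
    scaled = [min_max_scale(column) for column in feature_columns.values()]
    return [list(row) for row in zip(*scaled)]

# Three documents described by two descriptors with very different ranges.
vectors = early_fusion({
    "audio_energy": [2.0, 4.0, 6.0],
    "mean_hue": [100.0, 300.0, 200.0],
})
```
      </preformat>
      <p>Scaling each descriptor separately means that no knowledge of its unit or distribution is needed before fusion, which matches the lack of prior information stated above.</p>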
      <p>The Mean Square Error (MSE) has been chosen as the
indicator used to evaluate our similarity measure.</p>
      <p>To summarize, each pair of documents placed by the user
in the visual space generates a normalized distance value and
a vector composed of normalized, concatenated descriptive
values. We chose a regression model, based on ε-Support
Vector Regression [3], to bind them.</p>
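      <p>The construction of the training pairs can be sketched as follows. How the two documents' vectors are combined into a single input vector is not specified here, so the element-wise absolute difference used below is an assumption, as are the function name pairwise_training_set and the normalization of distances by the largest one.</p>
      <preformat>
```python
import math
from itertools import combinations

def pairwise_training_set(positions, features):
    """Build regression inputs and targets from the user's arrangement.

    positions: screen coordinates of each anchored document.
    features:  normalized, concatenated descriptive values of each document.
    Returns (X, y): one input vector and one normalized distance per pair.
    """
    pairs = list(combinations(range(len(positions)), 2))
    dists = [math.dist(positions[i], positions[j]) for i, j in pairs]
    longest = max(dists) or 1.0        # all documents stacked: avoid dividing by zero
    # Assumed pair descriptor: element-wise absolute difference of the two vectors.
    X = [[abs(a - b) for a, b in zip(features[i], features[j])] for i, j in pairs]
    y = [d / longest for d in dists]
    return X, y

# Three documents placed on a line, each described by a single value.
X, y = pairwise_training_set(
    positions=[(0, 0), (3, 4), (6, 8)],
    features=[[0.0], [0.5], [1.0]],
)
```
      </preformat>
      <p>The resulting X and y could then be passed to any ε-SVR implementation to fit the mapping from descriptive values to user-defined distance.</p>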
      <p>Furthermore, an iterative concatenation process determines
which descriptors to keep in order to build a good model. Here
is a short explanation of the general idea:</p>
      <p>* After the user's dynamic organization phase, regression
models are created (one model for each feature vector) by
iteratively concatenating descriptive values, in the manner of a Sequential
Forward Selection algorithm [1].</p>
      <p>* We use the MSE to evaluate whether a regression is
better than another.</p>
      <p>* At each step of the algorithm, the descriptive values
that give the best results (with respect to the MSE) are
concatenated. This continues until the algorithm reaches a Loss of
Performance (i.e. the best MSE obtained at step p is lower
than the one obtained at step (p+1)) or an Information
Redundancy (i.e. the best descriptor found at the current step has
already been chosen during previous iterations).</p>
      <p>* The reorganization phase consists of applying
the previously computed regression to a new subset of
documents to generate a similarity matrix.</p>
      <p>* Finally, the dynamic visualization engine generates at
each time step a new distance matrix from the similarity
matrix, until the overall layout reaches a stable state.</p>
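      <p>The selection loop described in the steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the callable mse_of stands in for training an ε-SVR on the chosen descriptors and measuring its MSE, and the toy scores, the descriptor names and the function name forward_selection are all hypothetical.</p>
      <preformat>
```python
def forward_selection(descriptors, mse_of):
    """Greedy descriptor selection driven by the MSE of a regression model.

    descriptors: names of all available descriptors.
    mse_of:      callable taking a list of descriptor names and returning
                 the MSE of a model trained on their concatenation.
    """
    selected, best_mse = [], float("inf")
    while True:
        # Try concatenating every descriptor to the current selection.
        scores = {d: mse_of(selected + [d]) for d in descriptors}
        candidate = min(scores, key=scores.get)
        if candidate in selected:              # Information Redundancy
            break
        if scores[candidate] > best_mse:       # Loss of Performance
            break
        selected.append(candidate)
        best_mse = scores[candidate]
    return selected, best_mse

def toy_mse(subset):
    """Hypothetical scores replacing a real train-and-evaluate step."""
    table = {
        frozenset({"pitch"}): 0.30,
        frozenset({"energy"}): 0.20,
        frozenset({"color"}): 0.25,
        frozenset({"energy", "pitch"}): 0.22,
        frozenset({"energy", "color"}): 0.12,
        frozenset({"energy", "color", "pitch"}): 0.15,
    }
    return table[frozenset(subset)]

result = forward_selection(["pitch", "energy", "color"], toy_mse)
```
      </preformat>
      <p>With these toy scores the loop keeps "energy", then "color", and stops at the third step because the best candidate has already been chosen. The loop always terminates, since every iteration either stops or adds a descriptor that was not yet selected.</p>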
    </sec>
    <sec id="sec-6">
      <title>4. CONCLUSION</title>
      <p>Appropriate evaluation tasks are currently
being designed. Furthermore, the extension of those tests to
larger corpora is in progress, and we plan to integrate the textual
modality as soon as possible.</p>
      <p>Moreover, a very interesting aspect we are working on
is the possibility of using our system to navigate hierarchically
inside a database (from collections of TV broadcasts to TV
news, then to reports and lastly to video shots, for example):
because each level has its own set of specific descriptive
values, it should be possible to automatically reorganize lower
hierarchical levels by projecting onto them the user-defined
similarity expressed at an upper one.</p>
      <p>5. REFERENCES</p>
      <p>[2] G. Bisson, “Why and How to Define a Similarity Measure
for Object Based Representation Systems”, in Proceedings
of the 2nd International Conference on Building and Sharing
Very Large-Scale Knowledge Bases (KBKS), IOS Press,
pp. 236-246.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>