<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UAIC: Participation in VideoCLEF Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tudor-Alexandru Dobrilă</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihail-Ciprian Diaconaşu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina-Diana Lungu</string-name>
          <email>diana.lungu@info.uaic.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Iftene</string-name>
          <email>adiftene@info.uaic.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UAIC: Faculty of Computer Science, “Alexandru Ioan Cuza” University</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This year marked UAIC's first participation in the VideoCLEF competition. Our group built two separate systems for the “Subject Classification” and “Affect Detection” tasks. For the first task, we created two resources, one from Wikipedia pages and one from pages identified with Google, and used two tools for classification: Lucene and Weka. For the second task, we extracted the audio component from a given video file using FFmpeg. We then computed the average intensity of each word in the transcript, using the Fast Fourier Transform to analyze the sound. This paper gives a brief description of our system components.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>VideoCLEF offers cross-language classification, retrieval and analysis tasks on a
video collection containing documentaries and talk shows.</p>
      <p>In 2009, the collection extended the corpus used for the 2008 VideoCLEF pilot
track. Task participants were provided with video data along with speech recognition
transcripts, archival metadata, shot segmentation and shot-level keyframes. Two
classification tasks were evaluated: “Subject Classification”, which involves
automatically tagging videos with subject labels, and “Affect and Appeal”, which
involves classifying videos according to characteristics beyond their semantic content.
The track was coordinated by Dublin City University (IE) and Delft University of
Technology (NL).</p>
      <p>Our team participated in two tasks: Subject Classification, in which
participants must automatically tag videos with subject labels such as
‘Archeology’, ‘Dance’, ‘History’, ‘Music’, etc., and Affect Detection, in which
participants must select keyframes using a combination of video and speech/audio
features, such that the selected keyframes represent the semantic content of the
video (e.g., an episode of a documentary).</p>
      <p>Section 2 describes how we classified a video according to its transcript, and
Section 3 presents details related to the extraction of keyframes. The last section
presents conclusions regarding our participation in VideoCLEF 2009.</p>
      <p>In order to classify a video according to its transcript, we perform the following steps:</p>
      <list list-type="bullet">
        <list-item><p>Step 1: For every category, we extract related web pages from Wikipedia and
Google;</p></list-item>
        <list-item><p>Step 2: From the documents extracted at Step 1, we keep only the relevant words
and count their occurrences. For every category, we build a resource containing the
relevant words and their counts, and normalize the counts so that they sum to 1000
per category;</p></list-item>
        <list-item><p>Step 3: As in Step 2, we extract and count the relevant words from the
video transcripts;</p></list-item>
        <list-item><p>Step 4: We classify each video by matching the words extracted at Step 3
against the category resources built at Step 2. For classification, we combined the
results of the Lucene and Weka tools, using the resources obtained from Google,
from Wikipedia, or from both.</p></list-item>
      </list>
      <p>Details of these steps are presented below.</p>
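      <p>As a rough illustration of Step 4, the sketch below scores a transcript against each category resource with a simple weighted overlap. This is only a stand-in under stated assumptions: the actual system relied on Lucene and Weka, and the function name, the scoring rule and the toy values are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def classify(transcript_words, category_profiles):
    """Pick the category whose word weights best overlap the transcript.

    category_profiles maps a category name to {word: normalized weight}.
    The score is sum(count_in_transcript * weight), a simplified stand-in
    for the Lucene/Weka classifiers used in the paper.
    """
    counts = Counter(transcript_words)
    scores = {
        category: sum(n * profile.get(word, 0.0) for word, n in counts.items())
        for category, profile in category_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Toy profiles with hypothetical weights (each sums to 1000).
profiles = {
    "Music": {"song": 400.0, "band": 350.0, "concert": 250.0},
    "History": {"war": 500.0, "king": 300.0, "century": 200.0},
}
best, scores = classify(["king", "war", "war", "song"], profiles)
print(best, scores)
```
]]></preformat>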
      <sec id="sec-1-1">
        <title>2.1 Extract Relevant Words from Wikipedia</title>
        <p>First, we found a URL pattern for the relevant articles of each category. Using
this pattern, we identified the most important pages from Wikipedia. The sources of
these pages were retrieved directly from the Wikipedia server, opening a direct
connection for each page, and were saved into a single file
per category.</p>
        <p>Second, for each such file, the XHTML/HTML tags were removed
and only the paragraphs were kept (the content of the &lt;p&gt;&lt;/p&gt;
tags).</p>
        <p>Third, stop words and punctuation marks were removed, and all words were
converted to lower case.</p>
        <p>Next, we lemmatized all remaining words and counted the number
of occurrences of each lemma.</p>
        <p>Finally, for each category we normalized the counts so that they sum to 1000.
This step was necessary because for some categories, such as Music,
the number of relevant pages was very high, and so the total number of occurrences
was also very high in comparison with other categories. Without normalization, a
transcript containing a word from the Music category would automatically be
classified in the Music category.</p>
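        <p>The cleaning, counting and normalization steps can be sketched as follows. This is a minimal sketch, not the original implementation: lemmatization is omitted, and the stop-word list and function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def relevant_words(text, stop_words=STOP_WORDS):
    """Lower-case the text, drop punctuation and digits, remove stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in stop_words]


def normalized_profile(words, total=1000.0):
    """Count the words and rescale the counts so that they sum to `total`."""
    counts = Counter(words)
    n = sum(counts.values())
    if n == 0:
        return {}
    return {word: total * count / n for word, count in counts.items()}


profile = normalized_profile(relevant_words("The song of the band, the song!"))
print(profile)
```
]]></preformat>
        <p>Because every category profile sums to the same total, a category with many source pages no longer dominates simply by having larger raw counts.</p>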
      </sec>
      <sec id="sec-1-2">
        <title>2.2 Extract Relevant Words from Google</title>
        <p>This part is similar to the processing performed on Wikipedia, with a few differences. The first
difference is that from the relevant pages we extract only the words in the
&lt;keywords&gt; tag. The second main difference is that we split the content of
the &lt;keywords&gt; tag at commas, so that a sequence of a few words can be treated as a
single important item for a category. In this way, we capture the context in which
relevant words for a category appear.</p>
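        <p>A minimal sketch of the keyword extraction is given below. It assumes that the keywords are stored in an HTML &lt;meta name="keywords" content="..."&gt; element, which the text does not state explicitly; the class and function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from html.parser import HTMLParser


class KeywordsParser(HTMLParser):
    """Collect comma-separated phrases from <meta name="keywords"> (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Split at commas so that multi-word phrases stay together.
            self.phrases.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def keyword_phrases(html):
    parser = KeywordsParser()
    parser.feed(html)
    return parser.phrases


page = (
    '<html><head><meta name="keywords" '
    'content="Classical Music, opera, string quartet"></head></html>'
)
phrases = keyword_phrases(page)
print(phrases)
```
]]></preformat>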
      </sec>
      <sec id="sec-1-3">
        <title>2.3 Lucene</title>
        <p>
          Lucene is a high-performance, scalable Information Retrieval (IR) library. It allows
indexing and searching capabilities to be added to applications. Lucene is a mature, free,
open-source project implemented in Java
          <xref ref-type="bibr" rid="ref1">(Hatcher and Gospodnetic, 2005)</xref>
          .
        </p>
        <p>
          Instead of directly indexing the category files created in the previous steps from
Google and Wikipedia, we derived new files from them in which every word
is repeated a number of times proportional to its count in the corresponding file. In
this way, the Lucene score is higher when a word has a higher number of occurrences
in the file associated with a category.
        </p>
      </sec>
      <sec id="sec-1-3b">
        <title>2.4 Weka</title>
        <p>
          Weka<sup>3</sup> (Waikato Environment for Knowledge Analysis) is a popular suite of machine
learning software written in Java, developed at the University of Waikato. The Weka
workbench
          <xref ref-type="bibr" rid="ref2">(Witten and Frank, 2005)</xref>
          contains a collection of visualization tools and
algorithms for data analysis and predictive modeling, together with graphical user
interfaces for easy access to this functionality.
        </p>
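        <p>The files prepared for Lucene indexing, in which each word is repeated in proportion to its count, might be generated along the following lines. This is only a sketch: the function name and the scale parameter are hypothetical, and the indexing itself (done with Lucene in Java) is not shown.</p>
        <preformat><![CDATA[
```python
def expand_profile(profile, scale=1.0):
    """Turn {word: weight} into text where each word is repeated ~weight*scale times.

    Repeating a word raises its term frequency in the indexed document, which
    is what lets a term-frequency-based score reflect the original counts.
    """
    parts = []
    for word, weight in sorted(profile.items()):
        repeats = max(1, round(weight * scale))
        parts.extend([word] * repeats)
    return " ".join(parts)


text = expand_profile({"song": 3.0, "band": 1.0})
print(text)
```
]]></preformat>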
      </sec>
      <sec id="sec-1-4">
        <title>2.5 Submitted Runs</title>
        <sec id="sec-1-4-1">
          <p>We submitted four runs. The tools and resources used in each run were:</p>
          <list list-type="bullet">
            <list-item><p>Lucene only for classification, with the resources obtained
from both Wikipedia and Google;</p></list-item>
            <list-item><p>Weka only for classification, with the resources obtained
from Google only;</p></list-item>
            <list-item><p>Weka only for classification, with the resources obtained
from Wikipedia only;</p></list-item>
            <list-item><p>Weka and Lucene for classification, with the resources obtained
from both Wikipedia and Google.</p></list-item>
          </list>
        </sec>
        <sec id="sec-1-4-2">
          <p><sup>3</sup> Weka: http://www.cs.waikato.ac.nz/ml/weka/</p>
          <p>For this task, our work is based on the assumption that a narrative peak is a point in the
movie where the narrator raises his or her voice within a given phrase in order to emphasize
a certain idea. This means that a group of words is spoken more intensely than the
preceding words and, since this holds in any language, we were able to
develop a language-independent application.</p>
          <p>Our approach is therefore based on two aspects of the video: the sound and the
ASR transcript.</p>
          <p>The first step is to extract the audio from a given video file, which we
accomplished using FFmpeg<sup>4</sup>. We then computed the average intensity of
each word in the transcript, using the Fast Fourier Transform (FFT<sup>5</sup>) to
analyze the sound.</p>
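          <p>The per-word intensity computation can be illustrated as follows. The paper does not specify the exact intensity measure, so this sketch assumes a root-mean-square value derived from the spectrum (via Parseval's theorem) and assumes that each transcript word comes with start and end times in seconds. A naive discrete Fourier transform stands in for a real FFT library to keep the example dependency-free; it yields the same values, only more slowly. All names and the synthetic signal are hypothetical.</p>
          <preformat><![CDATA[
```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (same output as an FFT)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def word_intensity(samples, sample_rate, start, end):
    """RMS intensity of the audio between start and end (in seconds)."""
    window = samples[int(start * sample_rate):int(end * sample_rate)]
    if not window:
        return 0.0
    n = len(window)
    # Parseval: sum |X_k|^2 = n * sum x_t^2, so sqrt(energy) / n is the RMS.
    energy = sum(abs(c) ** 2 for c in dft(window))
    return math.sqrt(energy) / n


RATE = 100  # hypothetical sample rate in Hz, tiny so the naive DFT stays fast


def tone(amplitude, n):
    """Synthetic 10 Hz sine wave standing in for speech."""
    return [amplitude * math.sin(2 * math.pi * 10 * t / RATE) for t in range(n)]


samples = tone(0.1, RATE) + tone(0.5, RATE)
words = [("quiet", 0.0, 1.0), ("loud", 1.0, 2.0)]  # (word, start, end)
intensities = {w: word_intensity(samples, RATE, s, e) for w, s, e in words}
print(intensities)
```
]]></preformat>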
        </sec>
        <sec id="sec-1-4-3">
          <p><sup>4</sup> FFmpeg: http://ffmpeg.org/</p>
          <p><sup>5</sup> FFT: http://en.wikipedia.org/wiki/Fast_Fourier_transform</p>
          <p>We then computed a score for each group of words (spanning between 5 and
10 seconds) relative to the previous group of words. The score is a weighted mean of
several metrics.</p>
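          <p>The scoring of word groups against their predecessors, followed by keeping the highest-scoring groups (the system keeps the top 3), can be sketched as below. The paper does not list the metrics or their weights, so the two metrics used here (ratio of mean intensities and ratio of peak intensities) and the equal weights are purely hypothetical placeholders.</p>
          <preformat><![CDATA[
```python
import heapq

EPS = 1e-12  # guards against division by zero for silent groups


def group_score(current, previous, weights=(0.5, 0.5)):
    """Weighted mean of hypothetical metrics comparing a group to the previous one.

    `current` and `previous` are lists of per-word intensities.
    """
    mean_ratio = (sum(current) / len(current)) / max(sum(previous) / len(previous), EPS)
    peak_ratio = max(current) / max(max(previous), EPS)
    metrics = (mean_ratio, peak_ratio)
    return sum(w * m for w, m in zip(weights, metrics)) / sum(weights)


def top_peaks(groups, k=3):
    """groups: list of (start, end, intensities); returns the k best (score, start, end)."""
    scored = [
        (group_score(cur[2], prev[2]), cur[0], cur[1])
        for prev, cur in zip(groups, groups[1:])
    ]
    return heapq.nlargest(k, scored)


# Toy data: intervals in seconds with made-up per-word intensities.
groups = [
    (0, 6, [1.0, 1.0]),
    (6, 12, [2.0, 4.0]),
    (12, 18, [1.0, 1.0]),
    (18, 25, [2.0, 2.0]),
    (25, 31, [2.0, 2.0]),
]
peaks = top_peaks(groups)
print(peaks)
```
]]></preformat>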
        </sec>
      </sec>
      <sec id="sec-1-5">
        <title>Example of results for BG_36926.cinepak.avi</title>
        <p>[Table of example results not recovered; only the column header “Interval” survives.]</p>
      </sec>
      <sec id="sec-1-6">
        <p>We then kept only the top 3 scores, which were exported in .anvil format for
later use in Anvil Player. An example of the output is given in the next table:</p>
        <sec id="sec-1-6-1">
          <title>We submitted three runs with the following characteristics:</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4 Conclusions</title>
      <p>This paper presents the UAIC system which took part in the VideoCLEF 2009
competition. Our group built two separate systems for the “Subject Classification”
and “Affect Detection” tasks.</p>
      <p>For the Subject Classification task, we created two resources, one from Wikipedia
pages and one from pages identified with the Google search engine. These resources were then
used by the Lucene and Weka tools for classification.</p>
      <p>For the Affect Detection task, we extracted the audio component from a given video file
using FFmpeg. We then computed the average intensity of each word
in the transcript, using the Fast Fourier Transform to analyze the sound. As the
final result for this task, we kept only the top 3 values obtained in the
previous step.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
      <p>We would also like to give a special “thank you” to those who helped from the very
beginning of the project: our colleagues from second year group 1 B.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hatcher</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gospodnetic</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Lucene in action</article-title>
          .
          <source>Manning Publications Co</source>
          .
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Data Mining: Practical machine learning tools and techniques, 2nd Edition</article-title>
          . Morgan Kaufmann, San Francisco (
          <year>2005</year>
          ). http://www.cs.waikato.ac.nz/~ml/weka/book.html. Retrieved 2007-06-25.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>