<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Experimenting Text Summarization on Multimodal Aggregation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuliano Armano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Giuliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Messina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Montagnuolo</string-name>
          <email>maurizio.montagnuolo@rai.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eloisa Vargiu</string-name>
          <email>vargiu@diee.unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RAI Centre for Research and Technological Innovation</institution>
          ,
          <addr-line>C.so Giambone 68, I-10135 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Cagliari, Dept. of Electrical and Electronic Engineering</institution>
          ,
          <addr-line>Piazza d'Armi, I-09123 Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, the Web is characterized by a growing availability of multimedia data, together with a strong need for integrating different media and modalities of interaction. Hence, a main goal is to bring into the Web data conceived and produced for different media, such as TV or radio content. In this scenario, we focus on multimodal news aggregation, retrieval, and fusion. In particular, we present preliminary experiments aimed at automatically suggesting keywords for news and news aggregations. The proposed solution is based on the adoption of extraction-based text summarization techniques. The experiments compare the selected text summarization techniques with a simple technique based on part-of-speech tagging. Results show that the proposed solution performs better than the baseline in terms of precision, recall, and F1.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Modern broadcasters are facing an unprecedented technological revolution, from
traditional dedicated equipment to commodity hardware and software components, and
from yesterday's one-to-many delivery paradigms to today's Internet-based
interactive platforms. In this challenging scenario, information engineering and
integration play a vital role in optimizing the cost and quality of the provided services, and
in reducing the “time to market” of data.</p>
      <p>In this scenario, this paper focuses on multimodal news aggregation, retrieval,
and fruition. Multimodality is intended here as the capability of processing, gathering,
manipulating, and organizing data that come from multiple media (e.g., television, radio, the
Internet) and consist of different modalities, such as audio, speech, text, image, and
video. In particular, we present a preliminary study aimed at automatically
generating tag clouds for representing the content of multimodal aggregations (MMAs) of
news information from television and from the Internet. To this end, we propose a
solution based on Text Summarization (TS), and we perform experiments comparing
classical extraction-based TS techniques with a simple technique based
on part-of-speech (POS) tagging.</p>
      <p>
        The rest of the paper is organized as follows. Section 2 recalls relevant work
on multimedia semantics, information fusion, heterogeneous data clustering, and
text summarization. In Section 3, we recall the model for multimodal aggregation
previously presented in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and we illustrate how news items are stored according to that
model. Section 4 focuses on the problem addressed in this paper by describing the
adopted extraction-based TS techniques. In Section 5, we illustrate our experiments
aimed at exploiting TS in MMAs. Section 6 ends the paper by reporting conclusions and
future research directions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Background</title>
      <sec id="sec-2-1">
        <title>2.1 Multimedia Semantics</title>
        <p>
          Recently, several research activities have attempted to provide the state-of-the-art of
content-analysis-based extraction of multimedia semantics, with the stated intention
to provide a unified perspective to the field [
          <xref ref-type="bibr" rid="ref10 ref12 ref19 ref31">10, 12, 19, 31</xref>
          ]. These works mostly
succeed in giving a complete and up-to-date overview of the existing techniques based on
content analysis for multimedia knowledge representation. In our opinion, the work
done so far has only partially achieved the objective of giving a deep understanding
of the problems related to multimedia semantics. This statement comes from the
observation that only very few research solutions and tools end up being useful for
practical purposes in the media industry. In our opinion, this is due to a significant
lack of precision in the definition of the relevant problems, which has led to huge research
efforts, but only seldom in directions exploitable by the media industry (e.g.,
broadcasters, publishers, producers) in a straightforward way. Emergent technologies like
Omni-Directional Video [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] increase the urgency of a high-level re-elaboration of
the discipline.
        </p>
        <p>
          Modern research efforts in multimedia information retrieval (MIR) have been
recently summarized by Lew et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. One of the key issues pointed out by the
authors is the lack of a common, accepted test set for researchers conducting
experiments in the field of MIR. The central claim of Lew et al. is that
published test sets are typically of scarce relevance for real-world applications, and that
this situation risks leaving the MIR research community
“isolated from real-world interests” in the near future. This claim sounds like a
serious alarm bell for researchers and practitioners in the field. Let us also consider
that in concrete scenarios, such as the one proposed in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], the accuracy figures obtained
by state-of-the-art tools may not be fully satisfactory for industrial exploitation
[
          <xref ref-type="bibr" rid="ref17 ref23">17, 23</xref>
          ]. Lew et al. also give an interesting hint at some future research directions,
including human-centered methods, multimedia collaboration, the exploitation of
neuroscience methods, and folksonomies.
        </p>
        <p>
          Integration between semantic Web technologies and multimedia retrieval
techniques is considered a future challenge by many researchers [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. In this field, the
task of concept detection is concerned with identifying instances of semantically
evocative language terms through the numerical analysis of multimedia items. The
work of Bertini et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] proposes a solution in the domain of television sport
programmes. Their approach uses a static hierarchy of classes (named pictorially
enriched ontology) to describe the prototypical situations found in football matches
and associates them with low-level visual descriptors. In [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], the authors present
a complete system for creating multimedia ontologies, automatic annotation, and
retrieval of video sequences based on ontology reasoning.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Information Fusion and Heterogeneous Data Clustering</title>
        <p>Information (or data) fusion can be defined as the set of methods that combine
data from multiple sources and use the obtained information to discover additional
knowledge, potentially not discoverable by the analysis of the individual sources.</p>
        <p>
          First attempts at organizing a theory were made in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], in which the author
proposes a cross-document structure theory and a taxonomy of cross-document
relationships. Recently, some proposals have been made to provide a unifying view.
The work in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] classifies information fusion systems in terms of the underlying
theory and formal languages. Moreover, in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], the author describes a method
(Finite Set Statistics) which unifies most of the research on information fusion under a
Bayesian paradigm.
        </p>
        <p>
          Information fusion approaches currently exist in many areas of research,
e.g., multi-sensor information fusion, notably related to military and security
applications, and multimedia information fusion. In the latter branch, the closest to the
present research, the work in [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] analyses best practices for selection and
optimization of multimodal features for semantic information extraction from multimedia
data. More recent relevant works are [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], the authors present a
self-organizing network model for the fusion of multimedia information. In [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ],
the authors implement and evaluate a fusion platform within a
recommendation system for smart television, in which TV programme
descriptions coming from different sources of information are fused.
        </p>
        <p>
          Heterogeneous data clustering refers to the use of techniques and methods to
aggregate data objects that are different in nature, for example video clips and textual
documents. A type of heterogeneous data clustering is co-clustering, which allows
simultaneous clustering of the rows and columns of a matrix. Given a matrix with m rows
and n columns, a co-clustering algorithm generates co-clusters, i.e., subsets of rows
which exhibit similar behavior across a subset of columns, or vice versa. One of
the first methods conceived to co-cluster documents using word sets
as features is represented by [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], where RSS items are aggregated according to a
taxonomy of topics. More challenging approaches are those employing both
cross-modal information channels, such as radio, TV, and the Internet, and multimedia data
[
          <xref ref-type="bibr" rid="ref34 ref9">9, 34</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Text Summarization</title>
        <p>
          Radev et al. [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] define a summary as “a text that is produced from one or more
texts, that conveys important information in the original text(s), and that is no longer
than half of the original text(s) and usually significantly less than that”. This
simple definition highlights three important aspects that characterize the research on
automatic summarization: (i) summaries may be produced from a single document
or multiple documents; (ii) summaries should preserve important information; and
(iii) summaries should be short. Unfortunately, attempts to provide a more elaborate
definition for this task are in disagreement within the community [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>
          Summarization techniques can be divided into two groups [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]: those that extract
information from the source documents (extraction-based approaches) and those
that abstract from the source documents (abstraction-based approaches). The
former impose the constraint that a summary use only components extracted from
the source document; such approaches are mainly concerned with what the summary
content should be, and usually rely solely on the extraction of sentences. The latter
relax this constraint and put a strong emphasis on form, aiming to
produce a grammatical summary, which usually requires advanced language
generation techniques.
        </p>
        <p>Although potentially more powerful, abstraction-based approaches have been far
less popular than their extraction-based counterparts, mainly because generating the
latter is easier. When focusing on information retrieval, one can also consider
topic-driven summarization, which assumes that the summary content depends on the
preferences of the user and can be assessed via a query, making the final summary
focused on a particular topic. Since in this paper we are interested in extracting
suitable keywords, we exclusively focus on extraction-based methods.</p>
        <p>An extraction-based summary consists of a subset of words from the original
document, and its bag of words (BoW ) representation can be created by selectively
removing a number of features from the original term set. Typically, an
extraction-based summary whose length is only 10-15% of the original is likely to lead to a
significant feature reduction as well. Many studies suggest that even simple summaries
are quite effective in carrying over the relevant information about a document. From
a text categorization perspective, their advantage over specialized feature selection
methods lies in their reliance on a single document (the one that is being
summarized) without computing the statistics for all documents sharing the same category
label, or even for all documents in a collection. Moreover, various forms of
summaries are becoming ubiquitous on the Web, and in certain cases their accessibility may
grow faster than that of full documents.</p>
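        <p>For instance, the feature reduction described above amounts to keeping only the terms that survive in the summary (a toy sketch with naive whitespace tokenization, a simplification of the pipeline used in the experiments):</p>

```python
def bow(text):
    # naive bag of words: term frequencies over whitespace tokens
    counts = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def summary_features(document, summary):
    """The BoW of an extraction-based summary is a subset of the
    document's term set: features absent from the summary are dropped,
    while the surviving terms keep their document-level frequencies."""
    doc_bow = bow(document)
    return {t: doc_bow[t] for t in bow(summary) if t in doc_bow}
```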
        <p>
          The earliest research on the summarization of scientific documents extracted
salient sentences from text using features like word and phrase frequency [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ],
position in the text [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and key phrases [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Various works published since then have
concentrated on other domains, mostly on newswire data [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Many approaches
addressed the problem by building systems dependent on the type of the required
summary.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Multimodal Aggregation</title>
      <p>
        Multimodal aggregation of heterogeneous data, also known as information mash-up,
is a hot topic in the World Wide Web community. A multimodal aggregator is a
system that merges content from different data sources (e.g., Web portals, IPTV, etc.)
to produce new, hybrid data that was not originally provided. Here, the challenge
lies in the ability to combine and present heterogeneous data coming from
multiple information sources, i.e., multimedia, and consisting of multiple types of
content, i.e., cross-modal. As a result of this technological breakthrough, the
content of modern Web is characterized by an impressive growth of multimedia data,
together with a strong trend towards integration of different media and modalities
of interaction. The mainstream paradigm consists in bringing into the Web what
was conceived (and produced) for different media, like TV content (acquired and
published on websites and then made available for indexing, tagging, and browsing).
This gives rise to the so-called Web Sink Effect, which has recently started
to drive an ineluctable evolution from the original concept of the Web as
a resource on which to publish things produced in various forms outside the Web, to
a world where things are born and live on the Web. In this paper, we adopt Web
newspaper articles and TV newscasts as information sources to produce multimodal
aggregations of informative content, integrating items coming from both
contributions. In the remainder of this section, we briefly overview the main ideas behind this
task, and we point the interested reader to [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] for details.
      </p>
      <p>
        The corresponding system can be thought of as a processing machine with two
inputs, i.e., digitized broadcast news streams (DTV) and online newspaper feeds
(RSSF), and one output, i.e., the multimodal aggregations that are automatically
determined from the semantic aggregation of the input streams by applying a
co-clustering algorithm whose kernel is an asymmetric relevance function between
information items [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Television news items are automatically extracted from the daily programming
of several national TV channels. The digital television stream is acquired and
partitioned into single programmes. On such programmes, newscast detection and
segmentation into elementary news stories are performed. The audio track of each story
is finally transcribed by a speech-to-text engine and indexed for storage and
retrieval. Further details can be found in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>The RSSF stream consists of RSS feeds from several major online newspapers
and press agencies. Each published article is downloaded, analyzed, and indexed for
search purposes. The first step of the procedure consists in cleaning the downloaded
article Web pages of boilerplate content, i.e., HTML markup, links, scripts, and
styles. Linguistic analysis, i.e., sentence boundary detection, sentence tokenization,
word lemmatization, and POS tagging, is then performed on the extracted
contents. The output of this analysis is then used to transform the RSS content into a
query to access the audio transcriptions of the DTV news stories, thus allowing text
and multimedia to be combined in an easy way.</p>
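      <p>A rough sketch of this text-to-query step follows (illustrative only: the stop-word list is a toy subset, and the lemmatization and POS-tagging stages of the real pipeline are omitted):</p>

```python
import re

STOP_WORDS = {"the", "a", "of", "in", "and", "to"}  # toy subset for illustration

def article_to_query(text):
    """Turn the cleaned text of an RSS article into a disjunctive query
    for retrieving matching audio transcriptions of DTV news stories."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    terms = [t for t in tokens if t not in STOP_WORDS]
    # deduplicate while preserving the order of first occurrence
    seen, query_terms = set(), []
    for t in terms:
        if t not in seen:
            seen.add(t)
            query_terms.append(t)
    return " OR ".join(query_terms)
```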
      <p>The output of the clustering process is a set of multimodal aggregations of
broadcast news stories and newspaper articles related to the same topic. TV news stories
and Web newspaper articles are fully cross-referenced and indexed. For each
multimodal aggregation, users can use automatically extracted tag clouds to perform
local or Web searches. Local searches can be performed either on the specific
aggregation the tags belong to or on the global set of discovered multimodal aggregations.
Tag clouds are automatically extracted from each thread topic as follows: (i) each
word classified as a proper noun by the linguistic analysis is a tag; (ii) a tag belongs to
a multimodal aggregation if it is present in at least one aggregated news article; and
(iii) the size of a tag is proportional to the cumulative duration of the television news
items which are semantically relevant to the aggregated news article to which the
tag belongs. In so doing, each news aggregation, also called subject, is described by
a set of attributes, the main ones being:
info, the general information, including title and description;
categories, the set of most relevant categories to which the news aggregation
belongs. These are automatically assigned by AI:Categorizer1, trained with radio
programme transcriptions, according to a set of journalistic categories (e.g.,
Politics, Current Affairs, Sports);
tagclouds, a set of automatically generated keywords;
items, the set of Web articles that compose the aggregation;
videonews, the collection of relevant newscast stories that compose the news
aggregation.</p>
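      <p>Rules (i)-(iii) above can be sketched as follows (a minimal sketch; the field names and data layout are hypothetical, since the real system derives proper nouns and semantic relevance from its linguistic and clustering components):</p>

```python
def build_tag_cloud(articles, video_durations, relevance):
    """articles: list of sets of proper nouns found in each aggregated article.
    video_durations: duration (in seconds) of each TV news item.
    relevance: dict mapping an article index to the set of indices of the
    TV items semantically relevant to it.
    Returns a dict mapping each tag to its size (cumulative duration)."""
    sizes = {}
    for i, nouns in enumerate(articles):
        # rule (iii): tag size accumulates the duration of relevant TV items
        weight = sum(video_durations[j] for j in relevance.get(i, set()))
        for tag in nouns:  # rules (i)-(ii): every proper noun is a tag
            sizes[tag] = sizes.get(tag, 0) + weight
    return sizes
```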
      <p>Hence, a news aggregation is composed of online articles (items) and parts of
newscasts (videonews). In this paper, we concentrate only on the former. Each item
is described by a set of attributes, such as:
pubdate, the timestamp of first publication;
lastupdate, the timestamp at which the item was last updated;
link, the URL of the news Web page;
feed, the RSS feed link that includes the item;
title, the title;
description, the content;</p>
      <sec id="sec-3-1">
        <title>1 http://search.cpan.org/~kwilliams/AI-Categorizer-0.09/lib/AI/Categorizer.pm</title>
        <p>category, the category to which the news item belongs (according to the previously
mentioned classification procedure);
keywords, the keywords automatically extracted as described above.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Text Summarization in Multimodal Aggregation</title>
      <p>
        In this paper, we are interested in automatically suggesting keywords for news and
news aggregations in the area of news distribution and retrieval. In particular, we
aim at selecting keywords relevant to the news and news aggregations. Among
other solutions, we decided to use suitable extraction-based TS techniques. To this
end, we first consider six straightforward but effective extraction-based text
summarization techniques proposed and compared in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] (in all cases, a word occurring
at least three times in the body of a document is a keyword, while a word occurring
at least once in the title of a document is a title-word):
      </p>
      <sec id="sec-4-1">
        <title>Title (T), the title of a document;</title>
        <p>First Paragraph (FP), the first paragraph of a document;
First Two Paragraphs (F2P), the first two paragraphs of a document;
First and Last Paragraphs (FLP), the first and the last paragraphs of a document;
Paragraph with most keywords (MK), the paragraph that has the highest number
of keywords;
Paragraph with Most Title-words (MT), the paragraph that has the highest
number of title-words.</p>
        <p>Let us note that we decided not to consider the Best Sentence technique, i.e., the
technique that takes into account the sentences in the document that contain at least 3
title-words and at least 4 keywords. This method was defined to extract summaries
from textual documents such as articles, scientific papers, and books. News items, in
fact, are often too short to contain meaningful sentences with at least 3 title-words
and 4 keywords.</p>
        <p>
          Furthermore, we consider the enriched techniques proposed in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]:
Title and First Paragraph (TFP), the title of a document and its first paragraph;
Title and First Two Paragraphs (TF2P), the title of a document and its first two
paragraphs;
Title, First and Last Paragraphs (TFLP), the title of a document and its first and
last paragraphs;
Most Title-words and Keywords (MTK), the paragraph with the highest number
of title-words and the one with the highest number of keywords.
        </p>
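        <p>As an illustration, the techniques listed above can be sketched in a few lines of Python (a minimal sketch under stated assumptions: a document is given as a title plus a list of paragraphs, and tokenization is reduced to lowercase word matching, a simplification of the full linguistic pipeline; the thresholds follow the definitions above):</p>

```python
import re

def tokenize(text):
    # lowercase word tokens; a simplification of the paper's full pipeline
    return re.findall(r"[a-z]+", text.lower())

def summarize(title, paragraphs, technique):
    """Return the extracted summary for one of the ten techniques.

    Keywords: words occurring at least 3 times in the body.
    Title-words: words occurring at least once in the title."""
    body = tokenize(" ".join(paragraphs))
    counts = {}
    for w in body:
        counts[w] = counts.get(w, 0) + 1
    keywords = {w for w, c in counts.items() if c >= 3}
    title_words = set(tokenize(title))

    def kw_score(p):
        return sum(1 for w in tokenize(p) if w in keywords)

    def tw_score(p):
        return sum(1 for w in tokenize(p) if w in title_words)

    mk = max(paragraphs, key=kw_score)   # paragraph with most keywords
    mt = max(paragraphs, key=tw_score)   # paragraph with most title-words
    techniques = {
        "T": title,
        "FP": paragraphs[0],
        "F2P": " ".join(paragraphs[:2]),
        "FLP": paragraphs[0] + " " + paragraphs[-1],
        "MK": mk,
        "MT": mt,
        "TFP": title + " " + paragraphs[0],
        "TF2P": title + " " + " ".join(paragraphs[:2]),
        "TFLP": title + " " + paragraphs[0] + " " + paragraphs[-1],
        "MTK": mt + " " + mk,
    }
    return techniques[technique]
```

        <p>For instance, TF2P simply concatenates the title with the first two paragraphs, which is the configuration that performs best in the experiments of Section 5.</p>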
        <p>
          One may argue that the above methods are too simple. However, as shown in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ],
extraction-based summaries of news articles can be more informative than those
resulting from more complex approaches. Also, headline-based article descriptors
have proved effective in determining users' interests [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Moreover, these
techniques have been successfully applied in the contextual advertising field [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experiments and Results</title>
      <p>To assess the effectiveness of TS in the task of suggesting relevant keywords for news
and news aggregations, we performed some comparative experiments. In particular,
we performed two sets of experiments: (i) experiments on the sole news, comparing
the performance with that obtained by adopting the keywords provided
in the keywords attribute, and (ii) experiments on news aggregations, comparing the
performance with that obtained by adopting the keywords provided in
the tagclouds attribute. Results have been calculated in terms of precision, recall,
and F1 by exploiting a suitable classifier.</p>
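      <p>For reference, the reported measures can be computed as follows (a standard macro-averaged sketch over the categories, not necessarily the authors' exact evaluation code):</p>

```python
def macro_scores(true_labels, predicted_labels):
    """Macro-averaged precision, recall, and F1 = 2PR/(P+R) over categories."""
    cats = set(true_labels).union(predicted_labels)
    precisions, recalls = [], []
    for c in cats:
        tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == c and p == c)
        pred = sum(1 for p in predicted_labels if p == c)   # predicted as c
        real = sum(1 for t in true_labels if t == c)        # actually c
        precisions.append(tp / pred if pred else 0.0)
        recalls.append(tp / real if real else 0.0)
    P = sum(precisions) / len(cats)
    R = sum(recalls) / len(cats)
    F1 = 2 * P * R / (P + R) if (P + R) else 0.0
    return P, R, F1
```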
      <p>Experiments have been performed on about 45,000 Italian news and 4,800 news
aggregations from January 16, 2011 to May 26, 2011. The adopted dataset is
composed by XML files, each one describing a subject according to the attributes
described in Section 3. News and news aggregations were previously classified into 15
categories, i.e., the same categories adopted for describing news and news
aggregations.</p>
      <sec id="sec-5-1">
        <title>5.1 Experimenting Text Summarization on News</title>
        <p>Experiments on news have been performed by adopting a system that takes as input
an XML file that contains all the information regarding a news aggregation. For each
TS technique, the system first extracts the news items, parses each of them, and applies
stop-word removal and stemming. Then, it applies the selected TS technique to
extract the corresponding keywords in a vector representation (BoW ). To calculate
the effectiveness of that technique, the extracted BoW is given as input to a
centroid-based classifier, which represents each category with a centroid calculated from
a suitable training set2. A BoW vector is then classified by measuring the
distance between it and each centroid, adopting the cosine similarity measure.</p>
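        <p>The classification step described above can be sketched as follows (a minimal sketch; the category names and centroid weights in the usage example are illustrative, whereas the real system derives centroids from its training set):</p>

```python
import math

def cosine(u, v):
    # u, v: sparse term-weight dicts (BoW vectors)
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def classify(bow, centroids):
    """Assign the BoW of a summary to the closest category centroid."""
    return max(centroids, key=lambda cat: cosine(bow, centroids[cat]))
```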
        <p>Performance is calculated in terms of precision, recall, and F1. As the
baseline technique (B), we considered the BoW corresponding to the set of keywords of
the keywords attribute. Table 1 summarizes the results.
2 In order to evaluate the effectiveness of the classifier, we performed a preliminary experiment in
which news items are classified without using TS. The classifier showed a precision of 0.862 and a recall
of 0.858.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Experimenting Text Summarization on News Aggregations</title>
        <p>Experiments on news aggregations have been performed in a way similar to the one
adopted for the sole news. For each TS technique, the system first processes each
news item belonging to the news aggregation in order to parse it, to discard stop-words,
and to stem each remaining term. Then, it applies to each news item the selected TS
technique in order to extract the corresponding keywords in a BoW representation.
Each extracted BoW is then given as input to the same centroid-based classifier
used for the news. The category to which the news aggregation belongs is then
calculated by averaging the scores given by the classifier over each involved item.</p>
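        <p>This aggregation-level decision can be sketched as follows (assuming a per-item classifier that returns a score for each category):</p>

```python
def classify_aggregation(item_scores):
    """item_scores: list of dicts mapping category to classifier score,
    one dict per news item in the aggregation. The aggregation's category
    is the one with the highest average score over all items."""
    categories = set()
    for scores in item_scores:
        categories.update(scores)
    def avg(cat):
        return sum(s.get(cat, 0.0) for s in item_scores) / len(item_scores)
    return max(categories, key=avg)
```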
        <p>Table 2 shows the results obtained by comparing each TS technique, the baseline
(B) being the BoW corresponding to the set of keywords of the tagclouds attribute.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Discussion</title>
        <p>Results clearly show that, for both news and news aggregations, TS improves
performance with respect to the adoption of the baseline keywords. In particular, the best
performance in terms of precision, recall, and, hence, F1 is obtained by
adopting the TF2P technique. The last row of Table 1 and Table 2 shows the number of
terms extracted by each TS technique. It is easy to note that, except for the T
technique, TS techniques extract a greater number of terms than the
baseline approach. Let us also note that precision, recall, and F1 calculated for news
aggregations are always better than those calculated for news. This is due to the fact
that news aggregations are more informative than single news items and the number of
extracted keywords is greater.</p>
        <p>To better illustrate the adopted extraction techniques, Figure 1 shows the
description of a news aggregation3, a selection of its tag cloud, and a selection of the
keywords extracted by the most effective TS technique, i.e., TF2P.
3 Actually, we are using Italian news but, for the sake of clarity, we decided to translate the example into English.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions and Future Work</title>
      <p>In this paper, we presented a preliminary study aimed at verifying the effectiveness
of adopting text summarization techniques to suggest keywords for news and news
aggregations in a multimodal aggregation system. To perform our study, we
compared ten different extraction-based techniques with the keywords provided by the
adopted multimodal aggregation system. Results, calculated in terms of precision,
recall, and F1, show that the best performance is obtained when using the TF2P
technique for both news and news aggregations. In other words, the best set of
keywords is obtained by considering the title and the first two paragraphs of each news item.</p>
      <p>As for future work, we are setting up new experiments aimed at using
further metrics to evaluate the effectiveness of the adopted techniques. In particular,
we are studying how to adapt the approach to measure the keyword effectiveness
index. Furthermore, we are setting up experiments aimed at investigating whether
merging the baseline keywords with those extracted by the most effective text
summarization technique leads to an improvement. Moreover, we are planning to
select some users, asking them to assign a degree of relevance to each keyword, e.g.,
relevant, somewhat relevant, or irrelevant.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Armano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vargiu</surname>
          </string-name>
          , E.:
          <article-title>Experimenting text summarization techniques for contextual advertising</article-title>
          .
          <source>In: IIR'11: Proceedings of the 2nd Italian Information Retrieval (IIR) Workshop</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Armano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vargiu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Studying the impact of text summarization on contextual advertising</article-title>
          .
          <source>In: 8th International Workshop on Text-based Information Retrieval</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baxendale</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Machine-made index for technical literature - an experiment</article-title>
          .
          <source>IBM Journal of Research and Development</source>
          <volume>2</volume>
          ,
          <fpage>354</fpage>
          -
          <lpage>361</lpage>
          (
          <year>1958</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bertini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Del Bimbo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torniai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Enhanced ontologies for video annotation and retrieval</article-title>
          .
          <source>In: Proc. of the 7th ACM SIGMM international workshop on Multimedia information retrieval</source>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bertini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Del Bimbo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torniai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Automatic annotation and semantic retrieval of video sequences using multimedia ontologies</article-title>
          .
          <source>In: Proc. of the 14th annual ACM international conference on Multimedia</source>
          , pp.
          <fpage>679</fpage>
          -
          <lpage>682</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Brandow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitze</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rau</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          :
          <article-title>Automatic condensation of electronic publications by sentence selection</article-title>
          .
          <source>Information Processing Management</source>
          <volume>31</volume>
          ,
          <fpage>675</fpage>
          -
          <lpage>685</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          :
          <article-title>A survey on automatic text summarization</article-title>
          .
          <source>Tech. Rep., Literature Survey for the Language and Statistics II course at CMU</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Deschacht</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          :
          <article-title>Finding the Best Picture: Cross-Media Retrieval of Content</article-title>
          .
          <source>In: Proc. of ECIR</source>
          <year>2008</year>
          , pp.
          <fpage>539</fpage>
          -
          <lpage>546</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dimitrova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Multimedia content analysis: The next wave</article-title>
          .
          <source>In: CIVR'03 Proceedings of the 2nd international conference on Image and video retrieval</source>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Edmundson</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          :
          <article-title>New methods in automatic extracting</article-title>
          .
          <source>Journal of ACM</source>
          <volume>16</volume>
          ,
          <fpage>264</fpage>
          -
          <lpage>285</lpage>
          (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hanjalic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Content-Based Analysis of Digital Video</article-title>
          . Kluwer Academic Publishers (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanaka</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Modeling omni-directional video</article-title>
          .
          <source>In: Advances in Multimedia Modeling, 13th International Multimedia Modeling Conference, MMM 2007</source>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>187</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kokar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Formalizing classes of information fusion systems</article-title>
          .
          <source>Information Fusion</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>189</fpage>
          -
          <lpage>202</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kolcz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alspector</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Asymmetric missing-data problems: Overcoming the lack of negative data in preference ranking</article-title>
          .
          <source>Information Retrieval</source>
          <volume>5</volume>
          ,
          <fpage>5</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kolcz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prabakarmurthi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalita</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Summarization as feature selection for text categorization</article-title>
          .
          <source>In: CIKM '01: Proceedings of the tenth international conference on Information and knowledge management</source>
          , pp.
          <fpage>365</fpage>
          -
          <lpage>370</lpage>
          . ACM, New York, NY, USA (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kraaij</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smeaton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Over</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>TRECVID 2004: An overview</article-title>
          .
          <source>In: Proc. of TRECVID 2004 Workshop</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Laudy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganascia</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          :
          <article-title>Information fusion in a tv program recommendation system</article-title>
          .
          <source>In: 11th International Conference on Information Fusion</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Lew</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebe</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Djeraba</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Content-based multimedia information retrieval: State of the art and challenges</article-title>
          .
          <source>ACM Transactions on Multimedia Computing, Communications and Applications</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>A novel clustering-based RSS aggregator</article-title>
          .
          <source>In: Proc. of WWW07</source>
          , pp.
          <fpage>1309</fpage>
          -
          <lpage>1310</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Luhn</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>The automatic creation of literature abstracts</article-title>
          .
          <source>IBM Journal of Research and Development</source>
          <volume>2</volume>
          ,
          <fpage>159</fpage>
          -
          <lpage>165</lpage>
          (
          <year>1958</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Mahler</surname>
            ,
            <given-names>R.P.S.</given-names>
          </string-name>
          :
          <source>Statistical Multisource-Multitarget Information Fusion</source>
          . Artech House, Inc., Norwood, MA, USA (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Messina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bailer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schallauer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , et al.:
          <article-title>Content analysis tools</article-title>
          .
          <source>Deliverable 15.4, PrestoSpace Consortium</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Messina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boch</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimino</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allasia</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.:
          <article-title>Creating rich metadata in the TV broadcast archives environment: the PrestoSpace project</article-title>
          .
          <source>In: Proc. of IEEE AXMEDIS06 Conference</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Messina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgotallo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimino</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gnota</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boch</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Ants: A complete system for automatic news programme annotation based on multimodal analysis</article-title>
          .
          <source>In: Intl. Workshop on Image Analysis for Multimedia Interactive Services</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Messina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montagnuolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multimodal Aggregation and Recommendation Technologies Applied to Informative Content Distribution and Retrieval</article-title>
          .
          <source>In: A. Soro, E. Vargiu, G. Armano, G. Paddeu (eds.) Information Retrieval and Mining in Distributed Environments</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Messina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montagnuolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Heterogeneous data co-clustering by pseudo-semantic affinity functions</article-title>
          .
          <source>In: Proc. of the 2nd Italian Information Retrieval Workshop</source>
          (IIR) (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Nenkova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic text summarization of newswire: lessons learned from the document understanding conference</article-title>
          .
          <source>In: Proceedings of the 20th National Conference on Artificial Intelligence</source>
          , Volume
          <volume>3</volume>
          , pp.
          <fpage>1436</fpage>
          -
          <lpage>1441</lpage>
          . AAAI Press (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woon</surname>
            ,
            <given-names>K.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          :
          <article-title>A self-organizing neural model for multimedia information fusion</article-title>
          .
          <source>In: 11th International Conference on Information Fusion</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          :
          <article-title>A common theory of information fusion from multiple text sources step one: cross-document structure</article-title>
          .
          <source>In: Proceedings of the 1st SIGdial workshop on Discourse and dialogue</source>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>83</lpage>
          . Association for Computational Linguistics, Morristown, NJ, USA (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKeown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Introduction to the special issue on summarization</article-title>
          .
          <source>Computational Linguistic</source>
          <volume>28</volume>
          ,
          <fpage>399</fpage>
          -
          <lpage>408</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Snoek</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Worring</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multimodal video indexing: A review of the state-of-the-art</article-title>
          .
          <source>In: Multimedia Tools and Applications</source>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>35</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>nReader: reading news quickly, deeply and vividly</article-title>
          .
          <source>In: Proc. of CHI '06 extended abstracts on Human factors in computing systems</source>
          , pp.
          <fpage>1385</fpage>
          -
          <lpage>1390</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>E.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Optimal multimodal fusion for multimedia data analysis</article-title>
          .
          <source>In: MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia</source>
          , pp.
          <fpage>572</fpage>
          -
          <lpage>579</lpage>
          . ACM, New York, NY, USA (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>A novel framework for semantic annotation and personalized retrieval of sports video</article-title>
          .
          <source>IEEE Trans. on Multimedia</source>
          <volume>10</volume>
          (
          <issue>3</issue>
          ),
          <fpage>421</fpage>
          -
          <lpage>436</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>