<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Managing Knowledge Extraction and Retrieval from Multimedia Contents: a Case Study in Judicial Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elisabetta Fersini</string-name>
          <email>fersini@disco.unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Cislaghi</string-name>
          <email>Mauro.Cislaghi@p-a.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Mazzilli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Callegaro</string-name>
          <email>Fabrizio.Callegaro@p-a.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Somaschini</string-name>
          <email>Stefano.Somaschini@p-a.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Muscillo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Pellegrini</string-name>
          <email>Domenico.Pellegrini@giustizia.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Consorzio Milano Ricerche</institution>
          ,
          <addr-line>Via Cozzi, 53 - Edificio U5, 20125 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Italian Ministry of Justice</institution>
          ,
          <addr-line>Via Crescenzio 17/C, 00193, Roma</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Project Automation S.p.A.</institution>
          ,
          <addr-line>Viale Elvezia 42, 20052, Monza</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present the main challenges and opportunities in exploiting the knowledge embedded in multimedia judicial folders in criminal trials, and their influence on the courtroom infrastructure. The paper describes the results of a one-year analysis conducted in Italian and Polish courtrooms and discusses how to address these challenges in order to make this knowledge available to judicial operators, focusing on criminal cases during the trial phase.</p>
      </abstract>
      <kwd-group>
        <kwd>digital libraries</kwd>
        <kwd>audio processing</kwd>
        <kwd>video processing</kwd>
        <kwd>e-justice</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Available information technologies have significantly expanded the quantity and types
of information that end users can obtain.</p>
      <p>In addition to a virtually unlimited supply of unstructured text content, there is the
opportunity to access an even broader set of static multimedia content, typically
images, and dynamic content such as digital sound recordings and audio-video. This raises the
problem of indexing this content to enable efficient and effective information
retrieval by judicial operators. Alongside the traditional ways of indexing
multimedia contents, which rely on manually added metadata to catalogue and retrieve
the contents, techniques for automatic knowledge extraction have made
significant progress in recent years. These technologies can automatically extract information
from textual and multimedia folders and link it to concepts directly
used by the user, for example by recognizing scene changes in a video or time intervals in an
audio recording, identifying an object in an image or a video or a concept in a
text, or identifying the emotional state of a person. This article analyzes the possibilities
offered by techniques for automatic knowledge extraction from multimedia
contents, with specific application to the legal domain and criminal trials, illustrating
from a technical point of view the benefits and limitations of these technologies.
The specific application context, the criminal trial, has been developed in the JUMAS
project, co-funded by the European Commission in the context of the ICT program
of the 7th Framework Program.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Content Management in the Judicial Domain</title>
      <p>
        The ICT systems in use in the judicial domain [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] during trials can be divided into
two major categories: Case Management Systems (CMS), for managing the
administrative and procedural information of legal proceedings, and electronic
Court Systems (eCS), which support the execution of debate hearings and allow the
management of the documentation produced during trials. A CMS is a
transactional system designed to manage, in a relational database, the events and
data prescribed by the Criminal Procedure and needed by the users involved in the
management of legal proceedings. The information entered into a CMS follows
the “life cycle” of the criminal case: case opening, investigation phase,
pre-debate phase, debate, and post-trial enforcement and surveillance. A CMS may be
considered the repository of the static data related to criminal trials.
      </p>
      <p>Examples of the information processed in a CMS during a trial are: general
information about the trial (registration number, type of trial, kind of criminal act, current
status, list of the defendants with their lawyers, names of the judges involved, etc.), legal
acts ordering the imprisonment of defendants or measures against property, requests by the defendant to
review the acts, the separation or combination of two or more proceedings, and the
outcome of the case, i.e. the sentence.</p>
      <p>A CMS can also include simple content management functionalities, mostly for text,
such as the storage of the hearing transcription or the assisted generation of trial
documentation (e.g. the hearing report). Given the complexity of the acquisition
and consultation process, the complexity of the infrastructure and the
importance of the documentation acquired during legal proceedings, the most widely
adopted solution is to use a dedicated document information system, the eCS, integrated
with the CMS and dedicated to the management of the digital trial folder, including
the management of access by third parties (e.g. lawyers).</p>
      <p>The information managed by the CMS and the eCS is both an essential reference for
ensuring compliance with the provisions of the Code of Criminal Procedure and the
main information source on which public prosecutors and attorneys build their
arguments.</p>
      <p>The complete set of data stored in the transactional database of the CMS, and above all
the data managed by the eCS concerning unstructured contents (the documents),
constitutes the reference information source for the end users (prosecutors, lawyers,
judges). During their legal activities, prosecutors and lawyers use the CMS and the eCS to
consult the contents and also to:</p>
      <list list-type="bullet">
        <list-item><p>obtain knowledge from textual contents, such as the extraction of lists of
names, places and dates from the transcriptions;</p></list-item>
        <list-item><p>classify and search the textual contents according to a personal
classification, such as collecting all the relevant moments of a defendant's
deposition to support their arguments;</p></list-item>
        <list-item><p>search and consult audio-video recordings of a hearing, or parts of them,
synchronized with the corresponding textual transcription, to review a
particular deposition or other phases of the trial;</p></list-item>
        <list-item><p>produce audio-video-text highlights of a trial to support their legal activities.</p></list-item>
      </list>
      <p>For judges, prosecutors and lawyers alike, the work leading to the judicial decision is therefore based
on a detailed review of all trial documentation, including
the evidence gathered during the investigations and shown in the courtroom, which is
fundamental to the objectivity of the conclusions and to the economy of work. These
activities require significant manual work to allow users to retrieve, extract,
reorganize and link the contents useful for their judicial activities. Audio-video
contents are accessible only sequentially, while the text documents acquired during
the various judicial phases are organized in a chronological structure.</p>
      <p>
        A significant factor in many judicial systems is the latency in the
availability of information, a direct consequence of the average duration of a
criminal case. In Italy, the average [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] time between the start of the investigation and the start of the
corresponding debate is 1000 days; to this must be added the duration of the debate,
on average 300 days for monocratic trials (with one judge) and 560 days for collegial trials
(with several judges).
      </p>
      <p>Every year about 550,000 manual transcriptions are produced, for a total of over
6.5 million pages and a yearly cost of more than 20 million Euro. This does not include
the cost of waiting for the transcriptions, which is difficult to quantify.</p>
      <p>A high percentage of cases are sent to Appeal and Cassation Courts (2nd and 3rd
level), where trials are carried out on the basis of all documents produced in the
previous judgment level(s).</p>
      <p>These data show the opportunity to provide advanced tools to
categorize and efficiently retrieve the large amount of rich audio-video-text content, considering
both its inherent heterogeneity and the time within which it must be made available.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Automatic Management of Multimedia Contents in the Judicial Domain</title>
      <p>The introduction of automatic knowledge extraction, considering especially the
integration with the eCS, can give significant benefits for a full and efficient use of
contents, especially unstructured audio-video-text, collected during the entire cycle of
the judicial action.</p>
      <p>
        The technologies currently available for automatic indexing of multimedia content
focus in particular on the extraction of both syntactic and semantic information.
The knowledge obtainable from judicial multimedia contents can usefully be
organized in a hierarchy of abstraction levels [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the case of video content, the basic
physical properties of the video itself constitute the first level of information; the next
level is the identification of the elements that compose the scene; the concepts related
to the interpretation of the scene constitute a higher abstraction level.
      </p>
      <p>This abstraction makes it possible to classify the information content of a video
into syntactic information, which generally coincides with the physical
characteristics of the images, such as colour and brightness or the corresponding event, and
semantic information, which consists of concept-related properties such as the place where the scene
takes place, the name of a person or object present in the video, or the mood conveyed by
the scene.</p>
      <p>Retrieval of multimedia content on the basis of semantic information is, compared
with the syntactic level, closer to user needs. For example, in a criminal trial the
requests “show the next intervention of the lawyer for the civil party” or “I want to
hear again the testimony of Mr. Rossi” are more immediate and easier to interpret
than “show the scenes between time t0 and time t1” or “retrieve the documents
that contain the word Rossi”. With this approach it is therefore
possible to structure into levels the knowledge extracted from multimedia content,
with the specific objective of providing search modalities as close as possible to the
users' domain knowledge, together with efficient and automated methods for indexing and
extracting knowledge from the huge corpus of audio-video-text collected during the
investigations and the trial hearings.</p>
      <p>An analysis of the applicability of specific technologies for indexing and extracting
knowledge from audio, video and text contents in the criminal justice domain follows.</p>
      <sec id="sec-4-1">
        <title>3.1 Audio Recordings Processing</title>
        <p>Currently available Automatic Speech Recognition (ASR) technologies make it possible to extract
much relevant information embedded in audio recordings. It is possible to classify
parts of the audio recording, for example by identifying the intervals with and without
speech; to identify the speakers and the start and end times of their
speeches; to identify specific words (Keyword Spotting); to generate the textual
transcription from audio recordings (ASR), in a similar way to what is done with
techniques for automatic character recognition (Optical Character Recognition, OCR); and
to identify the emotional state (neutral, nervous, scared, angry, sad, annoyed, etc.) of
one or more speakers.</p>
        <p>Given the random nature of speech and of the context in which it occurs,
it is impossible to derive a deterministic formula that links the
acoustic signal of the speech to the associated sequence of words.
Statistical-probabilistic formulations are therefore used, which model the problem of speech
recognition from a non-deterministic point of view. In this modelling, the acoustic
observations derived from the speech audio signal make it possible to estimate the probability
that a specific sequence of words has been pronounced. In the ASR state of the art, the
most widely used modelling approach is a combination of two probabilistic models:</p>
        <list list-type="bullet">
          <list-item><p>an acoustic model, which represents the knowledge base of the system
about phonetics, pronunciation variability, time dynamics
(co-utterance), etc.;</p></list-item>
          <list-item><p>a language model, which represents the knowledge base of the system about
the concatenation of words and word sequences.</p></list-item>
        </list>
        <p>While the main goal of the acoustic model is to estimate the probability that a given
word has been pronounced, on the basis of the acoustic observations, the language model aims
to estimate the a priori probability of a word sequence.</p>
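        <p>The interplay of the two models can be summarised with the standard Bayesian formulation of speech recognition, given here only as a sketch. The symbols are assumptions introduced for illustration and do not appear in the original text: O denotes the sequence of acoustic observations, W a candidate word sequence, P(O | W) the score provided by the acoustic model and P(W) the a priori probability assigned by the language model.</p>
        <disp-formula><tex-math><![CDATA[
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)} = \arg\max_{W} P(O \mid W)\,P(W)
]]></tex-math></disp-formula>
        <p>The denominator P(O) does not depend on W and can therefore be dropped from the maximisation.</p>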
        <p>
          In both models, Markov models [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] are among the most widely used
solutions. In particular, the acoustic model is composed of a concatenation of Hidden
Markov Models (HMMs), one for each phoneme, while the language model assumes
that a word sequence follows a Markov process of order n-1 (n-grams, i.e. the
probability of a given word depends on the previous n-1 words).
        </p>
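        <p>The n-gram assumption can be illustrated with a minimal sketch for the case n = 2 (bigrams), in which the probability of a word depends only on the previous word. The function names, the boundary markers BOS and EOS and the toy corpus are hypothetical choices made for this example; probabilities are plain maximum-likelihood estimates without the smoothing that a real ASR language model would need.</p>
        <preformat>
```python
from collections import Counter

BOS, EOS = "BOS", "EOS"  # hypothetical sentence boundary markers


def tokenize(sentence):
    """Lower-case a sentence, split on whitespace and add boundary markers."""
    return [BOS] + sentence.lower().split() + [EOS]


def train_bigram_model(sentences):
    """Estimate P(word | previous word) by maximum likelihood."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return {
        bigram: count / context_counts[bigram[0]]
        for bigram, count in bigram_counts.items()
    }


def sequence_probability(model, sentence):
    """Probability of a word sequence as a product of bigram probabilities."""
    tokens = tokenize(sentence)
    probability = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        # Unseen bigrams get probability 0 because no smoothing is applied.
        probability *= model.get((prev, curr), 0.0)
    return probability


corpus = ["the witness saw the defendant", "the judge heard the witness"]
model = train_bigram_model(corpus)
print(sequence_probability(model, "the witness"))
```
        </preformat>
        <p>In this sketch any sequence containing a word pair never seen in the training corpus receives probability zero, which is why practical language models are trained on large in-domain corpora and use smoothing.</p>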
        <p>Audio recordings of hearings in criminal trials and of interrogations, as well as
phone interceptions, are one of the main sources from which knowledge can be extracted automatically,
both for indexing contents, including those in video and text format,
and for collecting factual knowledge useful in the decision process.</p>
        <p>Automatic generation of a verbatim transcription from the digital audio
recording of an interrogation or a hearing opens scenarios of great interest. First,
there is the possibility of supplying, in a time frame roughly proportional to
the duration of the audio recording, a transcription substantially equivalent
to the one produced by transcribers. ASR algorithms also provide information useful
for synchronising audio and text, making it possible to consult the text and hear the
corresponding speech, or to combine the display of an audio-video clip with the
related subtitles (“Closed Captions”).</p>
        <p>The possibility of searching the audio recording directly using keywords
constitutes a second relevant advantage. Thanks to the availability of the verbatim
transcription synchronised word by word with the corresponding audio recording, the
user can search for terms or phrases in the ASR-generated
transcription and then play the corresponding audio or audio-video clips. When
applied to a vertical knowledge domain, such as the juridical one, the
effectiveness of the queries can be improved through query expansion based on
dictionaries (thesauri) or ontologies of the words used for searching the
automatically generated texts.</p>
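        <p>Keyword search over a word-synchronised transcription, with optional thesaurus-based query expansion, can be sketched as follows. The data layout (a list of word, start time, end time tuples in seconds), the function names, the padding parameter and the one-entry thesaurus are all assumptions made for illustration and are not taken from any specific ASR system.</p>
        <preformat>
```python
def expand_query(term, thesaurus):
    """Return the query term together with the synonyms listed for it."""
    term = term.lower()
    return {term} | {synonym.lower() for synonym in thesaurus.get(term, [])}


def find_clips(aligned_words, term, thesaurus=None, padding=2.0):
    """Return (start, end) playback intervals, in seconds, for every hit.

    aligned_words is a list of (word, start, end) tuples as they might be
    produced by an ASR system that time-stamps each recognised word.
    """
    targets = expand_query(term, thesaurus or {})
    clips = []
    for word, start, end in aligned_words:
        if word.lower().strip(".,;:!?") in targets:
            # Add some context around the hit, never starting before 0.
            clips.append((max(0.0, start - padding), end + padding))
    return clips


transcript = [
    ("The", 0.0, 0.25),
    ("lawyer", 0.25, 0.75),
    ("objected.", 0.75, 1.5),
    ("The", 4.5, 4.75),
    ("attorney", 5.0, 5.5),
    ("insisted.", 5.5, 6.25),
]
thesaurus = {"lawyer": ["attorney"]}  # hypothetical domain thesaurus
print(find_clips(transcript, "lawyer", thesaurus, padding=0.5))
```
        </preformat>
        <p>Without the thesaurus only the first occurrence is returned; with it, the clip containing the synonym is found as well.</p>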
        <p>Techniques for automatic audio processing thus make it possible to index the huge amount of knowledge
embedded in audio and audio-video recordings at costs that are significantly
lower than those of other methods (manual transcription,
stenotype or re-speaking techniques).</p>
        <p>In criminal justice, and more generally in other domains such as journalism or
security, the applicability of the audio processing methods described above is affected
mainly by the quality of the available audio recordings. Many factors may negatively
influence the performance of ASR systems: the presence of multiple speakers and
cross-talk in the microphones; reverberation, which distorts the audio signal; the
presence of background noise or whispering; the heterogeneity of lexica and
language, including the use of words not belonging to the common language
(person names, acronyms and abbreviations, technical terms and dialectal
expressions); spontaneous speech characterised by uttering, hesitations, false
starts and shouts; and data compression and signal loss during audio acquisition.</p>
        <p>The quality of interrogation and hearing recordings can be significantly improved
with a view to their subsequent automatic processing by ASR systems. It is possible
to dedicate separate recording tracks to the different speakers, to manage the
activation and deactivation of microphones, to use directional or even personal
microphones, to enhance the courtroom acoustics through architectural interventions and
by digitally processing the collected audio signals, to use sampling and coding
systems with reduced loss of information (lossless codecs), and to train and tune ASR
language and acoustic models with in-domain lexica and specific inflexions, thanks to
the wide library of transcriptions and recordings available in criminal justice. The
main challenge is to find the right balance between costs, size of the recorded audio
files and ASR performance.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Video Recording Processing</title>
        <p>Automatic video processing offers, by analogy with audio processing, important
advantages in terms of reducing the complexity of interpretation, search and retrieval
of multimedia contents. The application fields of the video processing techniques
available today range from the automatic identification (detection) of scene elements (such as
objects or persons) and of their movements (tracking), to the automatic extraction of features
(colour, brightness) of the scene or of a portion of it, used to derive
properties of the video clip (such as the detection of abandoned objects in security
applications), to the identification of specific behaviours (of persons or
groups) or situations (for example fights), to the automatic generation of summaries
(storyboards) of long video sequences.</p>
        <p>
          The first studies on extracting high-level semantics from multimedia data focused
on manual text annotation. These methods were extremely expensive in terms of
human effort. More recent approaches to the problem introduce automatic
semantic annotation [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Different sequential steps are required in processing video recordings to create a
link between low level information coming from cameras and high level concepts
useful for the end users.</p>
        <p>Video contents are initially split into “atomic units”, called frames, which may be
filtered to remove the noise generated by the acquisition equipment. The next step is
an activity known as video time segmentation or Shot Boundary Detection (SBD),
which identifies the transitions between shots, i.e. continuous sequences of consecutive frames
acquired from a single camera. Shot boundary
identification is performed considering the length of the time interval, the
identification and classification of transitions between frames, and the identification of
scene changes based on colour histograms, brightness changes, contrast and
differences in pixel intensity.</p>
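        <p>A histogram-based shot boundary detector can be sketched in a few lines. This is a simplified illustration and not the method used in any specific system: frames are assumed to be flat lists of grey-level pixel values in the range 0-255, and the number of bins and the decision threshold are arbitrary values chosen for the example. A boundary is reported whenever the histograms of two consecutive frames differ by more than the threshold.</p>
        <preformat>
```python
def histogram(frame, bins=8, max_value=256):
    """Normalised intensity histogram of a frame (flat list of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_shot_boundaries(frames, threshold=0.5):
    """Return the indices of the frames that start a new shot."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # Half the L1 distance between two normalised histograms lies in [0, 1].
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


dark_a = [10] * 16      # two nearly identical dark frames (same shot)
dark_b = [12] * 16
bright = [240] * 16     # a bright frame, e.g. after a camera switch
print(detect_shot_boundaries([dark_a, dark_b, bright, bright]))
```
        </preformat>
        <p>Such a detector reacts to abrupt cuts; gradual transitions and strong lighting changes within a shot would require the additional cues mentioned above, such as transition classification and contrast analysis.</p>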
        <p>Once the video has been segmented, a further analysis step performs the semantic
annotation of concepts.</p>
        <p>
          According to knowledge acquisition and representation processes, semantic
annotation may be distinguished into implicit, implemented through machine learning
techniques, and explicit, implemented through model-based approaches. In case of
implicit annotation, the associations between “low level features” and “semantic
concepts” are learnt automatically through Neural Networks, Hidden Markov Models
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], Bayesian Networks, Support Vector Machines [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and Genetic Algorithms. In
the case of explicit annotation, a priori knowledge (in terms of facts, models, rules)
capable of providing a semantic model coherent with the domain is used.
        </p>
        <p>Low cost and relatively simple management are increasing the number of
cameras, including in courtrooms. The availability of recordings of witness and defendant
hearings enriches the otherwise anonymous textual transcriptions with important
relational elements. The movements of the face, eyes, body or hands that accompany the
testimony of a witness or a defendant are information elements of great importance,
not available in transcriptions or audio recordings.</p>
        <p>Video recordings may also be seen as one of the starting points for searching and
navigating the knowledge collected during the hearing. The use of algorithms
for the automatic generation of the summary (storyboard) of a hearing makes
the chronological index of the speeches automatically available (for example, it is
possible to request the consultation of the fourth speech of the judge). The possibility
of identifying faces makes it possible to automatically overlay on the video labels with the names of
the participants in the hearing, for an immediate contextualisation of the events. From
the video recordings it is also possible to go directly to the verbatim transcription, for
example in the form of video subtitles (Closed Caption Text).</p>
        <p>Automatic extraction of information from video recordings, together with the audio
recordings, transforms the video of a criminal or civil trial hearing, without additional
effort by the end users, into a navigation tool that gives access not only to the
verbatim transcription, but also to other types of available contents (hearing report,
documents submitted during the hearing, etc.).</p>
        <p>The effectiveness of processing technologies, and consequently their applicability, is
affected by the quality of the video recordings produced in the courtroom. The problems
that may limit the performance of video processing are mostly connected
to insufficient courtroom lighting, to the use of analogue cameras, to camera
positioning not compatible with automatic video processing (for example shots
containing parts of the courtroom that are useless for search and navigation purposes), and to
analogue recording systems (VHS) that compromise the quality of the recordings.</p>
        <p>As with audio recordings, it is possible to enhance the
quality of the video shots effectively and at acceptable cost, with a view to their
automatic post-processing. A practically costless operation is to arrange the shots so that the
hearing actors are in the foreground, in particular considering the great usefulness of
connecting the hearing of the depositions with the observation of the movements of
the body and the hands. Correct lighting and a correctly placed static camera with PAL
resolution, or at least 640x480 pixels, and a minimum data rate of 800
kb/sec will provide a signal adequate for
processing by the algorithms for automatic knowledge extraction.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.3 Managing textual information</title>
        <p>The software available for automatic text indexing has been an enabling technology
for the global diffusion of the Internet. Search engines make newly created knowledge
constantly accessible to users through simple keyword-based queries. The
application of Information Retrieval technologies is not limited to the indexing, search
and retrieval of text, but extends to knowledge management more widely. These
technologies are fundamental, for example, for automatic document classification in
thematic portals, for extracting valuable structured knowledge from unstructured data
(template filling) and for automatic extraction of entities and relationships in specific
domains (e.g. to highlight obvious or embedded links between facts).</p>
        <p>An Information Retrieval (IR) system can be viewed as composed of three
core elements: a representation of texts, a representation of search queries (the user’s query)
and a retrieval function defined according to a specific notion of relevance. The
approaches for realizing a retrieval model are basically of three types:</p>
        <list list-type="order">
          <list-item><p>keyword-based, aimed at modelling the relevance of a document (with
regard to a query) according to Boolean conditions;</p></list-item>
          <list-item><p>statistical-probabilistic, which makes use of clustering techniques, frequency
analysis and conditional probabilities;</p></list-item>
          <list-item><p>semantic-based, aimed at using linguistic and conceptual representations,
such as dictionaries, ontologies and semantic networks.</p></list-item>
        </list>
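        <p>The keyword-based approach, in which relevance is decided by Boolean conditions, can be sketched with a small inverted index. The document identifiers, the sample texts and the function names are hypothetical and serve only to illustrate the principle; tokenisation is deliberately naive.</p>
        <preformat>
```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased token to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token.strip(".,;:!?()")].add(doc_id)
    return index


def boolean_and(index, terms):
    """Documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Documents that contain at least one query term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


documents = {
    "d1": "Hearing report of the witness Rossi.",
    "d2": "Deposition of the defendant.",
    "d3": "Witness deposition transcript.",
}
index = build_inverted_index(documents)
print(boolean_and(index, ["witness", "deposition"]))
print(boolean_or(index, ["Rossi", "defendant"]))
```
        </preformat>
        <p>A document is either retrieved or not, with no ranking; the statistical-probabilistic and semantic-based approaches refine this by weighting terms or by matching concepts rather than literal words.</p>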
        <p>
          Alternative approaches are represented by hybrid techniques, aimed at combining
"semantic-based" [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and "keyword-based" [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] retrieval methods together with
automatic relevance feedback and reinforcement learning models that automatically
infer future user behaviour.
        </p>
        <p>The IR systems play a crucial role within legal domains, especially concerning the
huge amount of data available in a textual form derived from the digital version of
paper proceedings and judicial actions.</p>
        <p>Important applications of Information retrieval technologies are related not only to
traditional search and retrieval of textual documents, but also to the automatic
extraction of structured information from a non-structured form and to the automatic
text classification. These kinds of functionality represent an important opportunity for
improving the efficiency of accessing and sharing penal juridical knowledge.</p>
        <p>The automatic extraction of information from unstructured sources (records and
documents of criminal proceedings such as evidence, inspection reports, and even
audio/video clips) into structured records allows judicial actors to greatly simplify
the process of filling juridical databases.</p>
        <p>To fight organized crime, shared hierarchies of databases (local, national
and trans-national) have been exploited to support investigation activities, and in
particular to simplify the identification of connections between different criminal
facts and to ensure the completeness and the availability of information in real
time. For this purpose each magistrate has to insert the facts of interest about the
investigative phase (such as the subjects under investigation, the relationships between them, the
criminal facts mentioned, etc.) coming from his or her assigned proceedings.</p>
        <p>These databases are filled through a complex and laborious
analysis procedure that, starting from the legal proceedings and through manual
document categorization and concept identification, maps unstructured
contents to structured (predefined) database entries. The information extraction
procedure is carried out according to a specific and formalized methodology that
consists of several phases: (1) reading and understanding of the contents of the textual
documents, (2) concept identification, (3) database filling, (4) resolution of any
ambiguous entries, (5) verification of the consistency of the newly inserted information with that
already present in the original database. Once the informative contents have been inserted in a
structured form, the consultation functionalities are available to magistrates through
textual queries and evocative representations.</p>
        <p>A further area of interest related to the judicial domain is the
automatic classification of textual documents. This possibility is extremely useful in
the case of investigative proceedings or judicial hearings characterized by a huge number
of documents. In this case, too, the proceedings are currently managed manually,
which implies a high cost in human effort for magistrates and court clerks.</p>
        <p>There are no limitations on the applicability of IR technologies to the judicial
domain, provided that the current conditions and limitations of algorithm
effectiveness are taken into account. If semantic models are adopted, which exploit conceptual
modelling for retrieving information, it will be necessary to provide up-to-date
linguistic and conceptual representations. Automatic classification and extraction
methods rarely achieve an accuracy of 100%.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>
        The application of indexing and automatic knowledge extraction techniques to
multimedia contents offers concrete possibilities for improving the power of current
Case Management and eCourt systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is possible to envisage a scenario of
information management automation in which the simple activities of storing images,
audio, video and text are combined with post-processing activities for automatic
knowledge extraction (in a form completely understandable by the users). The information obtained
can be used directly by end users, such as the transcription of audio
recordings or the list of concepts of particular interest in an unstructured text, or it
can be used to improve content retrieval (such as automatic indexing of
audio or audio-video recordings, or automatic association of frames or images with other
related contents) without manual annotation or metadata insertion. An effective
application of techniques for automatic knowledge extraction from media contents
seems to have a greater chance of success if applied to vertical domains of knowledge
and in contexts where it is possible to govern the processes of acquiring audio or
audio-video contents. This is the case for the criminal legal domain where, given
the importance of the benefits that these techniques could
provide, several initiatives within the 7th FP of the European Community have been
proposed to assess and quantify the real effectiveness of this kind of automation.
      </p>
      <sec id="sec-5-1">
        <title>Acknowledgments</title>
        <p>This work was partly funded by the European Union under the ICT programme of
FP7, project JUMAS, Contract No. 214306.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Eurispes:
          <article-title>Indagine sul Processo Penale</article-title>
          . Roma, (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          :
          <article-title>An overview of Multimedia Content-Based Retrieval Strategies</article-title>
          .
          <source>Informatica</source>
          <volume>30</volume>
          , pp.
          <fpage>347</fpage>
          -
          <lpage>356</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Dimitrova</surname>
          </string-name>
          :
          <article-title>Multimedia Content Analysis and Indexing for Filtering and Retrieval Application</article-title>
          .
          <source>Informing Science, Special Issue on Multimedia Informing Technologies - Part 1</source>
          , Volume
          <volume>2</volume>
          , No.
          <issue>4</issue>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <collab>JUMAS Project</collab>
          :
          <article-title>Judicial Management by Digital Library Semantics</article-title>
          .
          <source>ICT Programme 7th FP EC Funded Research Project</source>
          , http://www.jumasproject.eu/ (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>L.R.</given-names>
            <surname>Rabiner</surname>
          </string-name>
          :
          <article-title>A tutorial on hidden Markov models and selected applications in speech recognition</article-title>
          . In:
          <source>Readings in Speech Recognition</source>
          , A. Waibel and K. Lee, Eds. Morgan Kaufmann Publishers, San Francisco, CA, pp.
          <fpage>267</fpage>
          -
          <lpage>296</lpage>
          (
          <year>1990</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Federico</surname>
          </string-name>
          :
          <article-title>A system for the retrieval of Italian broadcast news</article-title>
          ,
          <source>Speech Communication</source>
          , Volume
          <volume>32</volume>
          , Issues
          <issue>1-2</issue>
          , Accessing Information in Spoken Audio, September 2000, pp.
          <fpage>37</fpage>
          -
          <lpage>47</lpage>
          , ISSN 0167-6393 (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Federico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bertoldi</surname>
          </string-name>
          :
          <article-title>Broadcast news LM adaptation over time</article-title>
          ,
          <source>Computer Speech &amp; Language</source>
          , Volume
          <volume>18</volume>
          , Issue
          <issue>4</issue>
          , October 2004, pp.
          <fpage>417</fpage>
          -
          <lpage>435</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G.</given-names>
            <surname>De Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yamasaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aizawa</surname>
          </string-name>
          :
          <article-title>Evaluation of video summarization for a large number of cameras in ubiquitous home</article-title>
          ,
          <source>in proceedings of the 13th annual ACM international conference on multimedia</source>
          , pp
          <fpage>820</fpage>
          -
          <lpage>828</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaimes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Echigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Teraguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Satoh</surname>
          </string-name>
          :
          <article-title>Learning personalized video highlights from detailed MPEG-7 metadata</article-title>
          ,
          <source>in proc. of the IEEE international conference on image processing</source>
          , pp
          <fpage>133</fpage>
          -
          <lpage>136</lpage>
          , (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Takahashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nitta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babaguchi</surname>
          </string-name>
          :
          <article-title>Video summarization for large sport video archives</article-title>
          ,
          <source>in proc. of the IEEE international conference on multimedia and expo</source>
          , pp
          <fpage>1170</fpage>
          -
          <lpage>1173</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Assfalg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Del Bimbo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nunziati</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pala</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Soccer highlights detection and recognition using HMMs</article-title>
          ,
          <source>IEEE International Conference on Multimedia &amp; Expo (ICME)</source>
          ,
          <fpage>825</fpage>
          -
          <lpage>828</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>F.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Support vector machine learning for image retrieval</article-title>
          ,
          <source>International Conference on Image Processing</source>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Metzler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          :
          <article-title>Linear feature-based models for information retrieval</article-title>
          .
          <source>Journal of Information Retrieval</source>
          .
          <volume>10</volume>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>274</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Term Weighting Approaches in Automatic Text Retrieval</article-title>
          .
          <source>Technical Report</source>
          . UMI Order Number: TR87-881, Cornell University (
          <year>1987</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <collab>EPOC I - II - III Commission</collab>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Velicogna</surname>
          </string-name>
          :
          <article-title>Use of information and communication technologies (ICT) in European judicial systems</article-title>
          . Council of Europe, European Commission for the Efficiency of Justice (CEPEJ),
          <source>CEPEJ Studies</source>
          No.
          <volume>7</volume>
          , http://www.coe.int/t/dghl/cooperation/cepej/series/default_en.asp (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>