<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">MPEG7ADB: Automatic RDF annotation of audio files from low level MPEG-7 metadata</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Giovanni</forename><surname>Tummarello</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">DEIT - Università Politecnica delle Marche</orgName>
								<address>
									<settlement>Ancona</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christian</forename><surname>Morbidoni</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">DEIT - Università Politecnica delle Marche</orgName>
								<address>
									<settlement>Ancona</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Francesco</forename><surname>Piazza</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">DEIT - Università Politecnica delle Marche</orgName>
								<address>
									<settlement>Ancona</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paolo</forename><surname>Puliti</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">DEIT - Università Politecnica delle Marche</orgName>
								<address>
									<settlement>Ancona</settlement>
									<country key="IT">ITALY</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">MPEG7ADB: Automatic RDF annotation of audio files from low level MPEG-7 metadata</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">4E8D7D3244D45FD592255FD327F1CC23</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>MPEG-7, an ISO standard since 2001, was created in recognition of the need for standardization of multimedia metadata. While efforts have been made to link its higher level semantic content to the languages of the Semantic Web, a large semantic gap remains between machine extractable metadata (Low Level Descriptors) and meaningful, concise RDF annotations. In this paper we address this problem and present MPEG7ADB, a toolkit based on computational intelligence and signal processing that can be used to quickly create components capable of producing automatic RDF annotations from MPEG-7 metadata coming from heterogeneous sources.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>While MPEG-7 and the tools of the Semantic Web (notably RDF/S) were developed concurrently, the two efforts have been largely independent, resulting in several integration challenges. At the data model level, MPEG-7 is directly based on XML and XML Schema, while the tools of the Semantic Web use these only as an optional syntax and rely conceptually on graph structures. At the semantic description level, it is thanks to a later effort <ref type="bibr" target="#b7">[8]</ref>[24] that RDF/DAML+OIL mappings have been made to allow interoperability. While such mappings are possible, their scope (semantic scene description) is currently beyond anything that can be automated. Previous work has also shown <ref type="bibr" target="#b3">[4]</ref> that pure XML tools are very ineffective for handling MPEG-7 data. Although the syntax is well specified by the standard, using MPEG-7 in a general setting is not simple. In fact, while it is relatively easy to create syntactically compliant MPEG-7 annotations, the freedom in terms of structures and parameters is such that understanding MPEG-7 produced by others is generally difficult or worse. For the same reason, computational intelligence techniques, which are bound to play a key role in the applications envisioned for the standard, are not easy to apply directly, as MPEG-7 descriptions of identical objects can in fact be very different from each other when coming from different sources. Recognizing the intrinsic difficulty of full interoperability, work is currently under way <ref type="bibr" target="#b2">[3]</ref> to standardize subsets of the base features as "profiles" for specific applications, generally trading off generality and expressivity in favor of ease and lightness of implementation. Necessarily, this also means giving up on interesting scenarios. 
In this paper we address the hard problem of "semantic mismatch", that is, we present techniques to "distill" concise RDF annotations from raw, low level MPEG-7 metadata. These techniques are implemented in a set of tools (MPEG7ADB) with which it is possible to build, with little effort, powerful components for the automatic RDF annotation of audio that feed on MPEG-7 low level descriptors (LLDs).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The MPEG7ADB</head><p>A simplified representation of the proposed architecture (as currently implemented by the MPEG7ADB project <ref type="bibr" target="#b6">[7]</ref>) is depicted in Figure <ref type="figure" target="#fig_0">1</ref>. URIs are used as references to the audio files and also become the subjects of the annotations, which are produced in standard RDF/OWL format.</p><p>When the database component is given the URI of a new audio clip to index, it first tries to locate an appropriate MPEG-7 resource describing it. At this logical point it is possible to envision several alternative models of metadata retrieval, including calls to Web Services, queries on distributed P2P systems, or lookup in a local storage or cache. If this preliminary search fails to locate the MPEG-7 file, a similar mechanism attempts to fetch the actual audio file, provided the URI turns out to be a resolvable URL, and processes it with the included, co-developed MPEG7ENC library <ref type="bibr" target="#b5">[6]</ref>. Once a schema-valid MPEG-7 description has been retrieved, the raw sequences of data belonging to Low Level Descriptors are mapped into flat array structures. These not only serve as a convenient and compact container, but also provide abstraction from some of the basic free parameters allowed by MPEG-7. As an example, the MPEG7 ACT type provides the basic time interpolation/integration capabilities needed to handle the cases in which LLDs have different sampling periods and different grouping operators applied.</p><p>To exploit the benefits of computational intelligence (e.g. neural networks) and perform clustering, matching, comparison and classification, each MPEG-7 resource has to be projected to a single vector of fixed dimension in a consistent and mathematically justified way. The projection block performs this task, which is best understood as driven by a "feature space request". 
A "feature space" deemed suitable for the desired computational intelligence task is composed of pairs, one per dimension, of a feature name and a function capable of projecting a series of scalars or vectors into a single scalar value. Among these functions, the framework provides a full set of classical statistical operators (mean, variance, higher data moments, median, percentiles, etc.) that can be cascaded with other "pre processing" steps such as a time domain filter. Since MPEG-7 coming from different sources and processes may have different low level features available, and not necessarily those selected for the application "feature space", the projection block will attempt to recursively predict the missing features by means of those available (cross prediction). It is also interesting to note that when a direct adaptation algorithm is not available, cross prediction based on neural networks proves to be, for a selected number of features, a viable alternative. For a more detailed treatment see .</p><p>Once a set of uniform projections has been obtained for the descriptions within the database, classical computational intelligence methods, such as those provided in the framework and used in the example application (section 9), can be applied to fulfill the desired annotation task. Once higher level results have been inferred (e.g., the piece with URI "file://c:/MyLegalMusic/foo.mp3" belongs to the genre "punk ballade"), they can be saved into "semantic containers" which, hiding all the complexity, provide RDF annotations using terms from an appropriate ontology pre-specified in OWL notation. Finally, prior to outputting the annotation stream, the system makes sure that local URIs (e.g. "file://foo.mp3") are converted into globally meaningful formats such as binary hash based URIs (e.g. "urn:md5: ", "ed2k:// ", etc.).</p></div>
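<div xmlns="http://www.tei-c.org/ns/1.0"><p>The projection step described above can be illustrated with a minimal Java sketch. It is not the MPEG7ADB API: the class FeatureSpaceSketch, the Dimension pair type, the use of LLD names as map keys, the sample values, and the choice of returning NaN for an unavailable descriptor (where the toolkit would instead attempt cross prediction) are all assumptions made for illustration. The sketch treats a feature space as an ordered list of (descriptor name, scalar operator) pairs and applies it to flat arrays of LLD samples to obtain one vector of fixed dimension.</p><p><![CDATA[
```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Illustrative sketch only; none of these names come from MPEG7ADB. */
public class FeatureSpaceSketch {

    /** One dimension of a feature space: a descriptor name plus a scalar operator. */
    public static final class Dimension {
        final String descriptor;
        final ToDoubleFunction<double[]> operator;

        public Dimension(String descriptor, ToDoubleFunction<double[]> operator) {
            this.descriptor = descriptor;
            this.operator = operator;
        }
    }

    /** Arithmetic mean of a series. */
    public static double mean(double[] series) {
        double sum = 0.0;
        for (double x : series) sum += x;
        return sum / series.length;
    }

    /** Population variance (divides by n, not n - 1). */
    public static double variance(double[] series) {
        double m = mean(series);
        double sum = 0.0;
        for (double x : series) sum += (x - m) * (x - m);
        return sum / series.length;
    }

    /** Median; for an even count, the average of the two middle values. */
    public static double median(double[] series) {
        double[] sorted = series.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Projects flat LLD arrays onto the requested feature space, one scalar per dimension. */
    public static double[] project(Map<String, double[]> llds, List<Dimension> space) {
        double[] vector = new double[space.size()];
        for (int i = 0; i < space.size(); i++) {
            Dimension d = space.get(i);
            double[] series = llds.get(d.descriptor);
            // Assumption: a missing or empty descriptor is marked with NaN.
            // The toolkit described in the text would try cross prediction here.
            vector[i] = (series == null || series.length == 0)
                    ? Double.NaN
                    : d.operator.applyAsDouble(series);
        }
        return vector;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for a decoded low level descriptor.
        Map<String, double[]> llds =
                Map.of("AudioPower", new double[] {1.0, 2.0, 3.0, 4.0});
        List<Dimension> space = List.of(
                new Dimension("AudioPower", FeatureSpaceSketch::mean),
                new Dimension("AudioPower", FeatureSpaceSketch::variance),
                new Dimension("AudioPower", FeatureSpaceSketch::median),
                new Dimension("AudioSpectrumCentroid", FeatureSpaceSketch::mean));
        System.out.println(Arrays.toString(project(llds, space)));
    }
}
```
]]></p><p>Keeping the feature space as plain data (a list of pairs) means the same request can be applied unchanged to every MPEG-7 resource in the database, which is what makes the resulting vectors comparable.</p></div>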
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Producing annotations for the Semantic Web</head><p>Once the mathematically homogeneous projection vectors representing the MPEG-7 files in the database have been obtained, they can easily be processed using a variety of well known techniques. While MPEG7ADB provides internal tools such as neural network classifiers and clustering, many more can be interfaced at this point.</p><p>Among the tools provided by MPEG7ADB are those allowing the production of RDF annotations. Annotations produced by MPEG7ADB will be of "RDF quality", that is, much more terse than, and qualitatively different from, the original LLD metadata. Finally, it is important to stress the need to state context explicitly when delivering results derived by computational intelligence on the Semantic Web. Virtually all computational intelligence results are in fact subject to change or revision according to the local state of the entity providing the annotation (e.g. the extraction settings). As new knowledge or settings could invalidate previously obtained results, this sort of inference is by nature nonmonotonic. Although the RDF framework is monotonic, it is known that results coming from nonmonotonic processes can still be mapped as long as context information is provided.</p></div>
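<div xmlns="http://www.tei-c.org/ns/1.0"><p>One possible way to attach explicit context to a derived result is sketched below in Java, emitting N-Triples by hand. This is only an illustration under stated assumptions: the class ContextualAnnotationSketch, the http://example.org/annotation# vocabulary and its property names (about, genre, extractionSettings, producedAt) are hypothetical and are not the ontology or the "semantic container" API of MPEG7ADB. The idea shown is that the inferred value is asserted on an annotation node that also records the settings and time under which it was produced, rather than directly on the audio resource.</p><p><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; vocabulary and names are hypothetical. */
public class ContextualAnnotationSketch {

    // Hypothetical namespace used purely for this example.
    static final String EX = "http://example.org/annotation#";

    /** Wraps an IRI in angle brackets, as N-Triples requires. */
    static String iri(String value) {
        return "<" + value + ">";
    }

    /** Formats a plain string literal, escaping backslashes and double quotes. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Formats one N-Triples statement. */
    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    /**
     * Builds an annotation whose result (genre) is stated together with
     * the context in which it was derived (settings, production time).
     */
    public static List<String> annotate(String resourceUri, String genre,
                                        String settings, String producedAt) {
        String node = "_:annotation1"; // blank node holding result and context
        List<String> lines = new ArrayList<>();
        lines.add(triple(node, iri(EX + "about"), iri(resourceUri)));
        lines.add(triple(node, iri(EX + "genre"), literal(genre)));
        lines.add(triple(node, iri(EX + "extractionSettings"), literal(settings)));
        lines.add(triple(node, iri(EX + "producedAt"), literal(producedAt)));
        return lines;
    }

    public static void main(String[] args) {
        // The URI and genre reuse the example from the text; the settings
        // string and the date are made-up placeholder values.
        for (String line : annotate("file://c:/MyLegalMusic/foo.mp3",
                "punk ballade", "classifier=demo;frame=10ms", "2004-01-01")) {
            System.out.println(line);
        }
    }
}
```
]]></p><p>Because the context travels with the result, a consumer that later receives a revised annotation can tell the two apart by their stated settings instead of facing two contradictory assertions about the same resource.</p></div>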
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Implementation and conclusions</head><p>In this paper we discussed some of the challenges associated with using MPEG-7 low level audio descriptors to provide RDF annotations. Furthermore, we introduced MPEG7ADB, a library with which it is possible to create automatic RDF annotation components that feed not on actual audio sources (e.g. PCM or MP3) but on low level MPEG-7 metadata descriptions. Sophisticated adaptation capabilities are provided to compensate for the many free parameters of the MPEG-7 standard itself. With these capabilities, "profile less" use becomes possible, which fits the picture of the Semantic Web as also made of heterogeneous devices. MPEG7ADB has been implemented in Java (see <ref type="bibr" target="#b4">[5]</ref> on why this is also computationally acceptable) and is available <ref type="bibr" target="#b6">[7]</ref> for public use, review, suggestions and collaborative enhancement in the free software/open source model. Among the examples provided with MPEG7ADB is a voice recording quality annotation component. This purely demonstrative example shows how a full RDF/MPEG-7/neural network audio annotation component can be built in approximately 40 lines of source code using MPEG7ADB. For lack of space, neither the source code nor an accurate description can be given here, but both are available at <ref type="bibr" target="#b6">[7]</ref> and . Being, to the best of our knowledge, the only currently available tool with these capabilities, MPEG7ADB is hard to compare directly with other tools, but we believe it to be a good starting point for both implementation of and research into audio MPEG-7 / Semantic Web annotation components.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. The overall structure of the proposed architecture.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<idno>MPEG-7</idno>
		<title level="m">SC29/WG11 N4031</title>
				<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m">ISO/IEC JTC1/SC29/WG11 N5527, MPEG-7 Profiles under Consideration</title>
				<meeting><address><addrLine>Pattaya, Thailand</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003-03">March 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An analysis of XML database Solutions for the management of MPEG-7 media descriptions</title>
		<author>
			<persName><forename type="first">Utz</forename><surname>Westermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wolfgang</forename><surname>Klas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<date type="published" when="2003-12">Dec. 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Java and Numerical Computing</title>
		<author>
			<persName><forename type="first">Ronald</forename><forename type="middle">F</forename><surname>Boisvert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jose</forename><surname>Moreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Philippsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roldan</forename><surname>Pozo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Computing in Science and Engineering</title>
		<imprint>
			<date type="published" when="2001-04">March-April 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">Holger</forename><surname>Crysandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giovanni</forename><surname>Tummarello</surname></persName>
		</author>
		<ptr target="http://sf.net/projects/mpeg7audioenc" />
		<title level="m">MPEG7AUDIOENC</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">G</forename><surname>Tummarello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Morbidoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Piazza</surname></persName>
		</author>
		<ptr target="http://sf.net/projects/MPEG7ADB" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Enhancing the semantic interoperability through a core ontology</title>
		<author>
			<persName><forename type="first">Jane</forename><surname>Hunter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Transactions on Circuits and Systems for Video Technology</title>
				<imprint>
			<date type="published" when="2003-02">Feb 2003</date>
		</imprint>
	</monogr>
	<note>special issue</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">Ralf</forename><surname>Klamma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marc</forename><surname>Spaniol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthias</forename><surname>Jarke</surname></persName>
		</author>
		<title level="m">Digital Media Knowledge Management with MPEG-7</title>
				<meeting><address><addrLine>Budapest</addrLine></address></meeting>
		<imprint>
			<publisher>WWW2003</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">From Multimedia to the Semantic Web using MPEG-7 and Computational Intelligence</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tummarello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Morbidoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Puliti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Dragoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Piazza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Wedelmusic 2004</title>
				<meeting>Wedelmusic 2004<address><addrLine>Barcelona</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">An Examination of practical information manipulation using the MPEG-7 low level Audio Descriptors</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lukasiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stirling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Jackson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Harders</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">1st Workshop on the Internet, Telecommunications and Signal Processing</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<idno>ISO/IEC JTC 1/SC 29/WG 11 N5727</idno>
		<title level="m">Classification Schemes used in ISO/IEC 15938-4: Audio</title>
				<meeting><address><addrLine>Trondheim, Norway</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003-07">Jul 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An RDF Schema/DAML+OIL Representation of MPEG-7 Semantics</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hunter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MPEG Document: ISO/IEC JTC1/SC29/WG11 W7807</title>
				<meeting><address><addrLine>Pattaya</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001-12">December 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An MPEG7 Library for Music</title>
		<author>
			<persName><forename type="first">H</forename><surname>Crysandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Tummarello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Piazza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">3rd MUSICNETWORK Open Workshop</title>
				<meeting><address><addrLine>Munich</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">March 2004</date>
			<biblScope unit="page" from="13" to="14" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Combining RDF and XML Schemas to Enhance Interoperability Between Metadata Application Profiles</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hunter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lagoze</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001-05">May 2001</date>
			<pubPlace>WWW10, Hong Kong</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">That Obscure Object of Desire: Multimedia Metadata on the Web (Part I and II)</title>
		<author>
			<persName><forename type="first">J</forename><surname>Van Ossenbruggen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Nack</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hardman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Multimedia</title>
		<imprint/>
	</monogr>
	<note>to be published in 2004</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
