<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Compressing Multi-Modal Temporal Knowledge Graphs of Videos</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Shusaku</forename><surname>Egami</surname></persName>
							<email>egami@aist.go.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Advanced Industrial Science and Technology (AIST)</orgName>
								<address>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Takanori</forename><surname>Ugai</surname></persName>
							<email>ugai@fujitsu.com</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Advanced Industrial Science and Technology (AIST)</orgName>
								<address>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Fujitsu Limited</orgName>
								<address>
									<settlement>Kanagawa</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ken</forename><surname>Fukuda</surname></persName>
							<email>ken.fukuda@aist.go.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Advanced Industrial Science and Technology (AIST)</orgName>
								<address>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Compressing Multi-Modal Temporal Knowledge Graphs of Videos</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A83B4DEB0D15EADE12ADFE83703B876C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Multi-Modal Knowledge Graph</term>
					<term>RDF Compression</term>
					<term>Video Dataset</term>
					<term>Temporal Knowledge Graph</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The construction of multi-modal temporal knowledge graphs (MMTKGs), which ground non-symbolic, time-series data such as videos into entities in the graph, is still at an early stage. Consequently, there has been little discussion of how to compress and publish MMTKGs, whose data sizes can be very large. In this paper, we propose compression methods for MMTKGs of videos based on image splitting and inference rules, and we conduct experiments to evaluate their performance. Our methods reduced the size of the MMTKGs by 27.7-36.1%. This study contributes to reducing the cost of distributing large MMTKGs on the web.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Multi-modal knowledge graphs (MMKGs) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>, which ground non-symbolic data into symbolic entities, have attracted attention as datasets for semantic and conceptual processing across modalities. However, the construction and publication of multi-modal temporal knowledge graphs (MMTKGs), which ground multi-modal, time-series data such as videos into entities in the graph, is still at an early stage.</p><p>Typical MMKGs describe multi-modal content by URLs or file paths. This approach may be unsuitable for the permanent publication of MMKGs, as the multi-modal content can become inaccessible through broken links. This issue can be mitigated by encoding a file's binary data as an entity in the KG <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>. However, an MMTKG that describes the content of a video at fine-grained time intervals, such as seconds or video frames, results in a huge data size, making it expensive to publish and share.</p><p>We propose methods for compressing MMTKGs of videos and conduct experiments to evaluate their effectiveness. We focus on two types of MMTKGs: KGs with video frame images encoded in Base64 and KGs with entire video files encoded in Base64. The proposed methods comprise differential compression based on a knowledge representation that splits video frame images, and the removal of redundant triples based on inference rules. The results demonstrate that our compression methods reduce the size of the MMTKGs by 27.7-36.1%. This study contributes to reducing the cost of distributing large MMTKGs on the web.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Zhu et al. <ref type="bibr" target="#b0">[1]</ref> and Chen et al. <ref type="bibr" target="#b1">[2]</ref> comprehensively surveyed and summarized work on MMKGs. Typical MMKGs include MMpedia <ref type="bibr" target="#b4">[5]</ref> and IMGpedia <ref type="bibr" target="#b5">[6]</ref>, which ground images to entities in the graph. VisionKG <ref type="bibr" target="#b6">[7]</ref> is an MMKG containing bounding boxes (bboxes) of objects extracted from various image datasets such as MS-COCO <ref type="bibr" target="#b7">[8]</ref>, CIFAR <ref type="bibr" target="#b8">[9]</ref>, and PASCAL VOC <ref type="bibr" target="#b9">[10]</ref>. These MMKGs represent images by URIs or file paths. Studies on video KGs have evolved in the context of video indexing and retrieval <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>. VEKG <ref type="bibr" target="#b13">[14]</ref> is an MMKG built from events extracted from videos, bboxes, and image features; however, its data is not publicly available. There have been many studies on compression methods for KGs <ref type="bibr" target="#b14">[15]</ref>, but they do not cover MMKGs of videos.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Approach</head><p>MMKGs usually describe images and videos by URIs or file paths, which can lead to broken links to the multi-modal files. We therefore focus on permanently accessible MMTKGs that embed multi-modal files in the KG as entities, and we propose compression methods for such MMTKGs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data preparation</head><p>As an example, we constructed MMTKGs of indoor daily activities from the multi-modal data (videos, text, and JSON) output by VirtualHome-AIST<ref type="foot" target="#foot_0">1</ref>  <ref type="bibr" target="#b15">[16]</ref>, as shown in the upper left of Figure <ref type="figure" target="#fig_0">1</ref>. The multi-modal data were output once every five frames. The dataset contains over 3,500 videos, including both fixed camera views and moving third-person camera views. The average video length is 64.2 seconds, with a maximum of 268.9 seconds and a minimum of 12.5 seconds. We prepared two types of MMTKGs: a KG in which every fifth video frame image is encoded in Base64 and described as a literal value (the image-embedded MMTKG), and a KG in which the videos are encoded in Base64 and described as literal values (the video-embedded MMTKG). We reused the Multimedia Semantic Sensor Network (MSSN) ontology <ref type="bibr" target="#b16">[17]</ref> and the VirtualHome2KG <ref type="bibr" target="#b17">[18]</ref> ontology for the schema design.</p></div>
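The Base64 embedding described above can be sketched in a few lines. This is an illustrative round trip only; the function names are hypothetical, and the actual property and datatype of the literal follow the MSSN-based schema, which is not shown here.

```python
import base64

def frame_to_base64_literal(image_bytes: bytes) -> str:
    """Encode a frame image's raw bytes as the Base64 string that is
    stored as an RDF literal value in the MMTKG (illustrative sketch)."""
    return base64.b64encode(image_bytes).decode("ascii")

def base64_literal_to_frame(literal: str) -> bytes:
    """Recover the original bytes from the literal, e.g. to write the
    image back out to a file for display."""
    return base64.b64decode(literal)
```

Because the literal is plain ASCII, it can be stored in any RDF serialization, at the cost of roughly a 4/3 size overhead over the raw bytes.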
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">MMTKG compression</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Compressing image-embedded MMTKG</head><p>If the MMTKG contains video frame image data, each frame image is first compressed as a JPEG and then split into a grid. Each grid image is encoded in Base64 and described in the knowledge representation shown in the upper right of Figure <ref type="figure" target="#fig_0">1</ref>. If a grid image does not differ from the grid image at the same position in the previous frame, no new entity or literal value is created for the current grid image; instead, the entity and literal of the previous frame are reused.</p></div>
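A minimal sketch of this differential reuse step, assuming byte payloads stand in for cropped JPEG grid cells and the `grid_t_pos` entity IDs are hypothetical, not the paper's actual URI scheme:

```python
def split_into_grid(frame: bytes, n: int) -> list:
    """Split a frame's byte payload into n*n equal chunks, standing in
    for cropping an image into an n-by-n grid of cell images.
    (Any remainder bytes are dropped; this is a sketch, not image code.)"""
    cell = max(1, len(frame) // (n * n))
    return [frame[i * cell:(i + 1) * cell] for i in range(n * n)]

def compress_frames(frames: list, n: int = 4):
    """For each frame, create a grid-cell entity only when the cell differs
    from the cell at the same position in the previous frame; otherwise
    reuse the previous frame's entity, as in Section 3.2.1."""
    entities = {}     # entity id -> payload that would become a Base64 literal
    frame_cells = []  # per frame: one entity id per grid position
    prev = None
    for t, frame in enumerate(frames):
        cells = split_into_grid(frame, n)
        ids = []
        for pos, cell in enumerate(cells):
            if prev is not None and prev[pos] == cell:
                ids.append(frame_cells[-1][pos])  # unchanged: reuse entity
            else:
                eid = f"grid_{t}_{pos}"           # hypothetical entity id
                entities[eid] = cell
                ids.append(eid)
        frame_cells.append(ids)
        prev = cells
    return entities, frame_cells
```

For fixed-camera videos, most grid cells are unchanged between consecutive frames, so far fewer cell entities (and Base64 literals) are created than frames times grid positions.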
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">Compressing video-embedded MMTKG</head><p>We adopted MPEG-4 <ref type="bibr" target="#b18">[19]</ref> to reduce the video data size. Each video frame entity holds a frame number instead of a Base64 value, and the video entity holds the Base64 value of the compressed video. Arbitrary frame images can be extracted from the video using FFmpeg <ref type="bibr">[20]</ref>. This further reduces the MMTKG size, but longer videos take more time to decompress.</p></div>
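One common way to pull a single frame by its number is FFmpeg's select filter. The snippet below only builds the command line rather than running it; the file names are placeholders, and this is a plausible invocation rather than the tool the authors distribute.

```python
def ffmpeg_extract_frame_cmd(video_path: str, frame_number: int,
                             out_path: str) -> list:
    """Build an FFmpeg command that writes exactly one frame, selected by
    its zero-based frame index n. The comma inside eq(n, K) must be
    escaped with a backslash because commas separate filters in a
    filtergraph."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"select=eq(n\\,{frame_number})",
        "-frames:v", "1",     # stop after the selected frame
        out_path,
    ]
```

Passing the list to `subprocess.run` avoids shell quoting issues; decoding time grows with how deep into the video the requested frame sits.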
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">Removing redundant triples using inference rules</head><p>The MMTKGs contain redundant triples when the 2D bboxes do not change between frames. We reduced the number of entities and triples by referring to the previous entities when the current 2D bboxes have not changed since the previous frame.</p><p>Moreover, inspired by the approach of removing triples that can be inferred from rules <ref type="bibr" target="#b19">[21]</ref>, we create only the relation equivalentFrame(𝑒𝑐𝑓, 𝑒𝑝𝑓) between the current frame entity 𝑒𝑐𝑓 and the previous frame entity 𝑒𝑝𝑓 when no 2D bbox has changed from the previous frame. We defined the rule as follows: hasMediaDescriptor(𝑒𝑝𝑓, 𝑒𝑏𝑏𝑜𝑥) ∧ equivalentFrame(𝑒𝑐𝑓, 𝑒𝑝𝑓) → hasMediaDescriptor(𝑒𝑐𝑓, 𝑒𝑏𝑏𝑜𝑥). Similarly, for grid images, we removed triples that can be inferred from the rule image(𝑒𝑝𝑓, 𝑒𝑖𝑚𝑎𝑔𝑒) ∧ equivalentImage(𝑒𝑐𝑓, 𝑒𝑝𝑓) → image(𝑒𝑐𝑓, 𝑒𝑖𝑚𝑎𝑔𝑒). Note that the image property here refers to a split image.</p></div>
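The first rule can be sketched as a small forward-chaining step over triples: a consumer of the compressed graph re-materializes the removed hasMediaDescriptor triples from the equivalentFrame links. Plain string tuples stand in for RDF terms here, and only the bbox rule is shown; the image rule is analogous.

```python
def materialize(triples: set) -> set:
    """Re-infer triples removed under the rule
    hasMediaDescriptor(e_pf, e_bbox) and equivalentFrame(e_cf, e_pf)
    implies hasMediaDescriptor(e_cf, e_bbox).
    Iterates to a fixed point so chains of equivalent frames resolve."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        equiv = [(s, o) for (s, p, o) in result if p == "equivalentFrame"]
        for cf, pf in equiv:
            for (s, p, o) in list(result):
                if s == pf and p == "hasMediaDescriptor":
                    inferred = (cf, p, o)
                    if inferred not in result:
                        result.add(inferred)
                        changed = True
    return result
```

The fixed-point loop matters when several consecutive frames are pairwise equivalent, since a frame may need a descriptor that was itself inferred in an earlier pass.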
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Result</head><p>Tables <ref type="table" target="#tab_1">1 and 2</ref> show the results of the compression experiments. Our methods achieved data size reductions of 36.1% for the image-embedded MMTKG and 28.3% for the video-embedded MMTKG.</p><p>There is a trade-off between the number of grid divisions and the number of triples. In our experiments, the 4 × 4 grid performed best. We experimented with 𝑛 × 𝑛 grid divisions; experiments with 𝑛 × 𝑚 grid divisions would be necessary for a more detailed analysis. We published the MMTKGs in a permanently accessible format. <ref type="foot" target="#foot_1">2</ref> In addition, tools for decoding and extracting images and videos from the compressed MMTKGs are available.<ref type="foot" target="#foot_2">3</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>We proposed compression methods for two types of MMTKGs: image-embedded and video-embedded MMTKGs. The former can display arbitrary images on the web using HTML &lt;img&gt; tags without decoding any video. The latter can apply video compression methods, and once the video is decoded, any frame can be extracted by the frame number of its image. These MMTKGs can help create benchmark datasets for vision-language models, since arbitrary text and images can be extracted using SPARQL queries <ref type="bibr" target="#b15">[16]</ref>. The compression method for image-embedded MMTKGs may be effective for image stream data for which no video file is created, whereas the method for video-embedded MMTKGs is more effective when video files are available. Our compression methods are effective for fixed-camera-view videos but less effective for first-person-view videos.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>We proposed compression methods for two types of permanently available MMTKGs in which video data are directly embedded as literal values. Our methods achieved data size reductions of 36.1% for the image-embedded MMTKG and 28.3% for the video-embedded MMTKG. The two MMTKG datasets and the tools are available on GitHub. In the future, we will consider combining our methods with other RDF compression methods <ref type="bibr" target="#b20">[22,</ref><ref type="bibr" target="#b21">23]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of multi-modal temporal knowledge graph compression</figDesc><graphic coords="2,89.29,84.19,416.69,273.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell cols="3">Image-embedded MMTKG</cell></row><row><cell>MMTKG</cell><cell># of triples</cell><cell>Size [GB]</cell></row><row><cell>raw</cell><cell>134,945,485</cell><cell>62.0</cell></row><row><cell>3×3 grid</cell><cell>64,242,296</cell><cell>41.8 (-32.5%)</cell></row><row><cell>4×4 grid</cell><cell>78,384,156</cell><cell>39.6 (-36.1%)</cell></row><row><cell>5×5 grid</cell><cell>96,401,621</cell><cell>39.9 (-35.6%)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc></figDesc><table><row><cell cols="3">Video-embedded MMTKG</cell></row><row><cell>MMTKG</cell><cell># of triples</cell><cell>Size [GB]</cell></row><row><cell>raw</cell><cell>131,786,665</cell><cell>17.3</cell></row><row><cell>w/o redundant triples</cell><cell>37,646,681</cell><cell>12.5 (-27.7%)</cell></row><row><cell>w/o redundant and inferable triples</cell><cell>36,284,402</cell><cell>12.4 (-28.3%)</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/aistairc/virtualhome_aist</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/aistairc/vhakg</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/aistairc/vhakg-tools</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO), and JSPS KAKENHI Grant Number JP22K18008 and JP23H03688.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Multi-Modal Knowledge Graph Construction and Application: A Survey</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Yuan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.05391</idno>
		<title level="m">Knowledge graphs meet multi-modal learning: A comprehensive survey</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The knowledge graph as the default data model for learning on heterogeneous knowledge</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wilcke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bloem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">De</forename><surname>Boer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Science</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="39" to="57" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">kgbench: A collection of knowledge graph datasets for evaluating relational and multimodal machine learning</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bloem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wilcke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Berkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>De Boer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Hose</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P.-A</forename><surname>Champin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Maleshkova</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Corcho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Alam</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="614" to="630" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">MMpedia: A Large-Scale Multi-modal Knowledge Graph</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ruan</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-47243-5_2</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web -ISWC 2023</title>
				<editor>
			<persName><forename type="first">T</forename><forename type="middle">R</forename><surname>Payne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Presutti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Qi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Poveda-Villalón</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Stoilos</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Hollink</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Kaoudi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Cheng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</editor>
		<meeting><address><addrLine>Nature Switzerland, Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="18" to="37" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">IMGpedia: A Linked Dataset with Content-Based Analysis of Wikimedia Images</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ferrada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bustos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-68204-4_8</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web -ISWC 2017</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Fernandez</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Tamma</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Lecue</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Cudré-Mauroux</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Sequeda</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Lange</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Heflin</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="84" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Le-Tuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nguyen-Duc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-K</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hauswirth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Le-Phuoc</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-60635-9_5</idno>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Meroño Peñuela</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Hartig</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Acosta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Alam</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Lisena</surname></persName>
		</editor>
		<meeting><address><addrLine>Nature Switzerland, Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="75" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Microsoft COCO: Common Objects in Context</title>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Belongie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hays</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ramanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dollár</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Zitnick</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-10602-1_48</idno>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2014</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Fleet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Pajdla</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Schiele</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Tuytelaars</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="740" to="755" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<title level="m">Learning multiple layers of features from tiny images</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The Pascal Visual Object Classes (VOC) Challenge</title>
		<author>
			<persName><forename type="first">M</forename><surname>Everingham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Gool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K I</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Winn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11263-009-0275-4</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">88</biblScope>
			<biblScope unit="page" from="303" to="338" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Rdf-powered semantic video annotation tools with concept mapping to linked data for next-generation video indexing: a comprehensive review</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Sikos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="volume">76</biblScope>
			<biblScope unit="page" from="14437" to="14460" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Massive semantic video annotation in high-end customer service</title>
		<author>
			<persName><forename type="first">K</forename><surname>Fukuda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vizcarra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nishimura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">HCI in Business, Government and Organizations</title>
				<editor>
			<persName><forename type="first">F</forename><forename type="middle">F</forename><surname>-H. Nah</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Siau</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="46" to="58" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Ontology-based human behavior indexing with multimodal video data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Vizcarra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nishimura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fukuda</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICSC50631.2021.00052</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 15th International Conference on Semantic Computing (ICSC)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="262" to="267" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">VEKG: Video event knowledge graph to represent video streams for complex event pattern matching</title>
		<author>
			<persName><forename type="first">P</forename><surname>Yadav</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Curry</surname></persName>
		</author>
		<idno type="DOI">10.1109/GC46384.2019.00011</idno>
	</analytic>
	<monogr>
		<title level="m">2019 First International Conference on Graph Computing (GC)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="13" to="20" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Besta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hoefler</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.01799</idno>
		<title level="m">Survey and taxonomy of lossless graph compression and space-efficient graph representations</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">VHAKG: A multi-modal knowledge graph based on synchronized multi-view videos of daily activities</title>
		<author>
			<persName><forename type="first">S</forename><surname>Egami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ugai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">N N</forename><surname>Htun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fukuda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 33rd ACM International Conference on Information and Knowledge Management</title>
				<meeting>the 33rd ACM International Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>To appear</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">MSSN-Onto: An ontology-based approach for flexible event processing in Multimedia Sensor Networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Angsuchotmetee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chbeir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cardinale</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.future.2018.01.044</idno>
	</analytic>
	<monogr>
		<title level="j">Future Generation Computer Systems</title>
		<imprint>
			<biblScope unit="volume">108</biblScope>
			<biblScope unit="page" from="1140" to="1158" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Synthesizing Event-Centric Knowledge Graphs of Daily Activities Using Virtual Space</title>
		<author>
			<persName><forename type="first">S</forename><surname>Egami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ugai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Oono</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kitamura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fukuda</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2023.3253807</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="23857" to="23873" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">MPEG-4 natural video coding - an overview</title>
		<author>
			<persName><forename type="first">T</forename><surname>Ebrahimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Horne</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0923-5965(99)00054-5</idno>
		<ptr target="https://doi.org/10.1016/S0923-5965(99)00054-5" />
	</analytic>
	<monogr>
		<title level="j">Signal Processing: Image Communication</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="365" to="385" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Logical linked data compression</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Semantics and Big Data</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Corcho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Presutti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Hollink</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Rudolph</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="170" to="184" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Compact representation of large RDF data sets for publishing and exchange</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Martínez-Prieto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - ISWC 2010</title>
				<editor>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Patel-Schneider</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Pan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mika</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Pan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Glimm</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="193" to="208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">RDSZ: An approach for lossless RDF stream compression</title>
		<author>
			<persName><forename type="first">N</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Arias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fuentes-Lorenzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ó</forename><surname>Corcho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: Trends and Challenges</title>
				<editor>
			<persName><forename type="first">V</forename><surname>Presutti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>D'Amato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Gandon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>D'Aquin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Staab</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Tordai</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="52" to="67" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
