<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Compressing Multi-Modal Temporal Knowledge Graphs of Videos</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shusaku Egami</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takanori Ugai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ken Fukuda</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fujitsu Limited</institution>
          ,
          <addr-line>Kanagawa</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Advanced Industrial Science and Technology (AIST)</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The construction of multi-modal temporal knowledge graphs (MMTKGs) that ground non-symbolic and time-series data, such as videos, into entities in the graph is still in its early stages. Hence, there has been little discussion of compressing and publishing MMTKGs of very large data size. In this paper, we propose compression methods for MMTKGs of videos based on image splitting and inference rules, and we conduct experiments to evaluate their performance. Our methods reduced the size of the MMTKGs by 27.7–36.1%. This study contributes to reducing the cost of distributing large MMTKGs on the web.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Modal Knowledge Graph</kwd>
        <kwd>RDF Compression</kwd>
        <kwd>Video Dataset</kwd>
        <kwd>Temporal Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Multi-modal knowledge graphs (MMKGs) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], which ground non-symbolic data into symbolic
entities, have attracted attention as datasets for semantic and conceptual processing across
modalities. However, constructing and publishing multi-modal temporal knowledge graphs
(MMTKGs) that ground multi-modal and time-series data, such as videos, into entities in the
graph is still in its early stages.
      </p>
      <p>
        Typical MMKGs describe multi-modal contents by URLs or file paths. This approach may not
be suitable for the permanent publication of MMKGs as the multi-modal contents may become
inaccessible due to broken links. This issue could potentially be resolved by encoding the file’s
binary data as an entity in the KG [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. However, building an MMTKG that describes the
content of a video at fine-grained time intervals, such as seconds or video frames, would
result in a huge data size, making it expensive to publish and share.
      </p>
      <p>We proposed methods for compressing MMTKGs of videos and conducted experiments to
determine their effectiveness. We focused on two types of MMTKGs: KGs with video frame images
encoded in Base64 and KGs with entire video files encoded in Base64. The proposed methods
include differential compression based on a knowledge representation that splits video frame
images, and reduction of redundant triples based on inference rules. The results demonstrated
that our compression methods reduced the size of the MMTKGs by 27.7–36.1%. This study
contributes to reducing the cost of distributing large MMTKGs on the web.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Zhu et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Chen et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] comprehensively surveyed and summarized works on MMKGs.
Typical multimodal knowledge graphs are MMpedia [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and IMGpedia [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which ground images
to entities in the graph. VisionKG [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is an MMKG containing bounding boxes (bboxes) of
objects extracted from various image datasets such as MS-COCO [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], CIFAR [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and PASCAL
VOC [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These MMKGs represent images by URIs or file paths. Studies on video KGs have
evolved in the context of video indexing and retrieval [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]. VEKG [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is an MMKG
based on events extracted from videos, bboxes, and image features. However, its data is
not publicly available. There have been many studies on compression methods for KGs [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
However, they do not cover MMKGs for videos.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>MMKGs usually describe images and videos by URIs or file paths, which can lead to broken links to
multi-modal files. Thus, we focus on permanently accessible MMTKGs that embed multi-modal
files in the KG as entities, and we propose compression methods for these MMTKGs.</p>
      <sec id="sec-3-1">
        <title>3.1. Data preparation</title>
        <p>
          As an example, we constructed MMTKGs of indoor daily activities from multi-modal data of
videos, text, and JSON output by VirtualHome-AIST (https://github.com/aistairc/virtualhome_aist) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], as shown in the upper left of Figure 1.
The multi-modal data was output every five frames. The dataset contains over 3,500 videos,
which include both fixed camera views and third-person views of the camera moving. The
average video length is 64.2 seconds, with a maximum of 268.9 seconds and a minimum of
12.5 seconds. We prepared two types of MMTKGs: a KG in which every fifth video frame image is
encoded in Base64 and described as a literal value (i.e., the image-embedded MMTKG), and a KG in which
entire videos are encoded in Base64 and described as literal values (i.e., the video-embedded MMTKG). We reused
the Multimedia Semantic Sensor Network (MSSN) ontology [<xref ref-type="bibr" rid="ref17">17</xref>] and the VirtualHome2KG [<xref ref-type="bibr" rid="ref18">18</xref>]
ontology for schema design.
        </p>
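        <p>As a concrete illustration of this knowledge representation, the following Python sketch builds a frame entity whose image data is embedded as a Base64 literal. It uses rdflib; the ex: namespace, property names, and file names are illustrative assumptions, not the dataset's actual vocabulary.</p>
        <preformat>
# A minimal sketch of embedding a video frame as a Base64 literal.
# The ex: namespace and property names are illustrative assumptions.
import base64

from rdflib import RDF, Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/mmtkg/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

with open("frame_000005.jpg", "rb") as f:  # illustrative file name
    b64 = base64.b64encode(f.read()).decode("ascii")

frame = EX["video1/frame5"]
g.add((frame, RDF.type, EX.VideoFrame))
g.add((frame, EX.frameNumber, Literal(5, datatype=XSD.integer)))
g.add((frame, EX.base64Image, Literal(b64, datatype=XSD.base64Binary)))

print(g.serialize(format="turtle"))
        </preformat>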
      </sec>
      <sec id="sec-3-2">
        <title>3.2. MMTKG compression</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Compressing image-embedded MMTKG</title>
          <p>If the MMTKG contains video frame image data, each video frame image is first compressed as
a JPEG. Next, each image is split into a grid. Each grid image is encoded in Base64 and described
in the knowledge representation as shown in the upper right of Figure 1. Here, if there is no
difference between a grid image of the current frame and the grid image at the same position
in the previous frame, no entity or literal value is created for the current grid image;
those of the previous frame are reused instead.</p>
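          <p>The sketch below outlines this differential encoding under assumed parameters: a 3×3 grid, Pillow for image handling, and a byte-level comparison of the encoded cells; these are illustrative choices, not the paper's exact settings.</p>
          <preformat>
# Differential grid encoding: re-encode a cell only when it differs from
# the same cell in the previous frame (grid size is an assumption).
import io

from PIL import Image

GRID = 3  # hypothetical grid size

def grid_cells(img, n=GRID):
    """Yield (row, col, JPEG bytes) for every grid cell of a frame."""
    w, h = img.size
    cw, ch = w // n, h // n
    for r in range(n):
        for c in range(n):
            cell = img.crop((c * cw, r * ch, (c + 1) * cw, (r + 1) * ch))
            buf = io.BytesIO()
            cell.save(buf, format="JPEG")
            yield r, c, buf.getvalue()

prev = {}  # (row, col) -> JPEG bytes of the previous frame's cell
for path in ["frame0.jpg", "frame1.jpg"]:  # illustrative frame files
    img = Image.open(path).convert("RGB")
    for r, c, data in grid_cells(img):
        if prev.get((r, c)) == data:
            continue  # unchanged cell: reuse the previous frame's entity
        prev[(r, c)] = data
        # here: mint a new grid-image entity with the Base64-encoded bytes
          </preformat>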
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Compressing video-embedded MMTKG</title>
          <p>We adopted MPEG-4 [<xref ref-type="bibr" rid="ref19">19</xref>] to reduce the video data size. Each video frame entity has a frame
number instead of a Base64 value, and the video entity has a Base64 value for the
compressed video. Arbitrary frame images can be extracted from the video using FFmpeg [<xref ref-type="bibr" rid="ref20">20</xref>].
This further reduces the MMTKG size, but long videos take longer to decompress.</p>
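          <p>For example, a single frame can be recovered from the decoded video by its frame number with FFmpeg's select filter. The sketch below wraps the standard command line in Python; the file names are illustrative.</p>
          <preformat>
# Extract one frame by frame number from an MPEG-4 video with FFmpeg.
import subprocess

def extract_frame(video_path, frame_no, out_path):
    """Write the frame with the given frame number to out_path."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vf", f"select=eq(n\\,{frame_no})",  # pick the n-th frame
         "-vframes", "1", out_path],
        check=True,
    )

extract_frame("video1.mp4", 150, "frame150.png")  # illustrative call
          </preformat>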
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Removing redundant triples using inference rules</title>
          <p>The MMTKGs contain redundant triples when the 2D bboxes do not change. We reduced the number
of entities and triples by referring to the previous entities if the current 2D bboxes have not
changed since the previous frame.</p>
          <p>Moreover, inspired by the approach of removing triples that can be inferred from
rules [<xref ref-type="bibr" rid="ref21">21</xref>], we create only the relation equivalentFrame(<italic>f</italic><sub>prev</sub>, <italic>f</italic><sub>cur</sub>) between the previous frame entity
<italic>f</italic><sub>prev</sub> and the current frame entity <italic>f</italic><sub>cur</sub> when none of the 2D bboxes have changed from the previous frame.
We defined the rule as follows: hasMediaDescriptor(<italic>f</italic><sub>prev</sub>, <italic>d</italic>) ∧ equivalentFrame(<italic>f</italic><sub>prev</sub>, <italic>f</italic><sub>cur</sub>) →
hasMediaDescriptor(<italic>f</italic><sub>cur</sub>, <italic>d</italic>). Similarly, for grid images, we removed triples that
can be inferred from the following rule: image(<italic>g</italic><sub>prev</sub>, <italic>v</italic>) ∧ equivalentImage(<italic>g</italic><sub>prev</sub>, <italic>g</italic><sub>cur</sub>) →
image(<italic>g</italic><sub>cur</sub>, <italic>v</italic>). Note that the image property here refers to a split image.</p>
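          <p>A consumer of the compressed MMTKG can re-materialize the removed triples by running the first rule as a SPARQL CONSTRUCT query. The sketch below assumes rdflib and an illustrative ex: namespace; only the property names follow the rule above.</p>
          <preformat>
# Re-materialize triples removed under the equivalentFrame rule:
# hasMediaDescriptor(f_prev, d) AND equivalentFrame(f_prev, f_cur)
#   -> hasMediaDescriptor(f_cur, d)
from rdflib import Graph

RULE = """
PREFIX ex: &lt;http://example.org/mmtkg/&gt;
CONSTRUCT { ?fcur ex:hasMediaDescriptor ?d . }
WHERE {
  ?fprev ex:hasMediaDescriptor ?d .
  ?fprev ex:equivalentFrame ?fcur .
}
"""

g = Graph()
g.parse("mmtkg.ttl", format="turtle")  # compressed MMTKG (name illustrative)
for triple in g.query(RULE):
    g.add(triple)  # restore the triples removed at compression time
          </preformat>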
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Result</title>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>
        We proposed compression methods for two types of MMTKGs: image-embedded and
video-embedded MMTKGs. The former can display arbitrary images on the web using HTML
&lt;img&gt; tags without decoding any video. The latter can apply video compression
methods, and once the video is decoded, any frame can be extracted based on the frame number of
the image. These MMTKGs can help create benchmark datasets for vision-language models, since
it is possible to extract arbitrary text and images using SPARQL queries [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The compression
method for image-embedded MMTKGs might be effective for image stream data for which no
video file is created. In contrast, the compression method for video-embedded MMTKGs is more
effective when video files are available. Our compression methods for MMTKGs are effective
for fixed-camera view videos but less effective for first-person view videos.
      </p>
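      <p>For instance, assuming a vocabulary in which frame entities carry frame numbers and Base64 image literals (the ex: names below are illustrative), a single SPARQL query can fetch an image and wrap it in an HTML &lt;img&gt; data URI without touching any video file.</p>
      <preformat>
# Fetch a frame's Base64 literal and emit an HTML img tag with a data URI.
# The ex: vocabulary is an illustrative assumption.
from rdflib import Graph

QUERY = """
PREFIX ex: &lt;http://example.org/mmtkg/&gt;
SELECT ?b64 WHERE {
  ?frame ex:frameNumber 5 ;   # frame number chosen for illustration
         ex:base64Image ?b64 .
} LIMIT 1
"""

g = Graph()
g.parse("mmtkg.ttl", format="turtle")
for row in g.query(QUERY):
    print(f'&lt;img src="data:image/jpeg;base64,{row.b64}"&gt;')
      </preformat>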
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We proposed compression methods for two types of permanently available MMTKGs in which
video data are directly embedded as literal values. Our methods achieved data size
reductions of 36.1% for the image-embedded MMTKG and 28.3% for the video-embedded MMTKG. The
two MMTKG datasets (https://github.com/aistairc/vhakg) and the tools (https://github.com/aistairc/vhakg-tools) are available on GitHub. In the future, we will consider
combining our methods with other RDF compression methods [<xref ref-type="bibr" rid="ref22">22</xref>, <xref ref-type="bibr" rid="ref23">23</xref>].</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This paper is based on results obtained from a project, JPNP20006, commissioned by the New
Energy and Industrial Technology Development Organization (NEDO), and on JSPS KAKENHI
Grant Numbers JP22K18008 and JP23H03688.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>Multi-Modal Knowledge Graph Construction and Application: A Survey</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>36</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Chen,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs meet multi-modal learning: A comprehensive survey</article-title>
          ,
          <source>arXiv preprint arXiv:2402.05391</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wilcke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bloem</surname>
          </string-name>
          , V. De Boer,
          <article-title>The knowledge graph as the default data model for learning on heterogeneous knowledge</article-title>
          ,
          <source>Data Science</source>
          <volume>1</volume>
          (
          <year>2017</year>
          )
          <fpage>39</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bloem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wilcke</surname>
          </string-name>
          , L. van
          <string-name>
            <surname>Berkel</surname>
          </string-name>
          , V. de Boer,
          <article-title>kgbench: A collection of knowledge graph datasets for evaluating relational and multimodal machine learning</article-title>
          , in: R.
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>P.-A.</given-names>
          </string-name>
          <string-name>
            <surname>Champin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maleshkova</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Ristoski</surname>
          </string-name>
          , M. Alam (Eds.),
          <source>The Semantic Web</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>614</fpage>
          -
          <lpage>630</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          , J. Liu, T. Ruan,
          <article-title>MMpedia: A Large-Scale Multi-modal Knowledge Graph</article-title>
          , in: T. R.
          <string-name>
            <surname>Payne</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Presutti</surname>
            , G. Qi,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Poveda-Villalón</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Hollink</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Kaoudi</surname>
          </string-name>
          , G. Cheng, J.
          <source>Li (Eds.)</source>
          ,
          <source>The Semantic Web - ISWC 2023</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>37</lpage>
          . doi:10.1007/978-3-031-47243-5_2.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferrada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bustos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <article-title>IMGpedia: A Linked Dataset with Content-Based Analysis of Wikimedia Images</article-title>
          , in: C.
          <string-name>
            <surname>d'Amato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Tamma</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Lecue</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Lange</surname>
          </string-name>
          , J. Heflin (Eds.),
          <source>The Semantic Web - ISWC 2017</source>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>93</lpage>
          . doi:10.1007/978-3-319-68204-4_8.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Le-Tuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nguyen-Duc</surname>
          </string-name>
          , T.-K. Tran,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Le-Phuoc</surname>
          </string-name>
          ,
          <article-title>VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph</article-title>
          , in: A.
          <string-name>
            <surname>Meroño Peñuela</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Acosta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Alam</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , P. Lisena (Eds.),
          <source>The Semantic Web</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>93</lpage>
          . doi:10.1007/978-3-031-60635-9_5.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hays</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Zitnick</surname>
          </string-name>
          ,
          <article-title>Microsoft COCO: Common Objects in Context</article-title>
          , in: D.
          <string-name>
            <surname>Fleet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Pajdla</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Schiele</surname>
          </string-name>
          , T. Tuytelaars (Eds.),
          <source>Computer Vision - ECCV 2014</source>
          , Springer International Publishing, Cham,
          <year>2014</year>
          , pp.
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          . doi:10.1007/978-3-319-10602-1_48.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , et al.,
          <article-title>Learning multiple layers of features from tiny images</article-title>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Everingham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Van</given-names>
            <surname>Gool</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K. I.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Winn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>The Pascal Visual Object Classes (VOC) Challenge</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          <volume>88</volume>
          (
          <year>2010</year>
          )
          <fpage>303</fpage>
          -
          <lpage>338</lpage>
          . doi:10.1007/s11263-009-0275-4.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Sikos</surname>
          </string-name>
          ,
          <article-title>Rdf-powered semantic video annotation tools with concept mapping to linked data for next-generation video indexing: a comprehensive review</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>76</volume>
          (
          <year>2017</year>
          )
          <fpage>14437</fpage>
          -
          <lpage>14460</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukuda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vizcarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nishimura</surname>
          </string-name>
          ,
          <article-title>Massive semantic video annotation in high-end customer service</article-title>
          , in: F.
          <string-name>
            <given-names>F.-H.</given-names>
            <surname>Nah</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          Siau (Eds.),
          <source>HCI in Business, Government and Organizations</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vizcarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nishimura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukuda</surname>
          </string-name>
          ,
          <article-title>Ontology-based human behavior indexing with multimodal video data</article-title>
          ,
          <source>in: 2021 IEEE 15th International Conference on Semantic Computing (ICSC)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>262</fpage>
          -
          <lpage>267</lpage>
          . doi:10.1109/ICSC50631.2021.00052.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yadav</surname>
          </string-name>
          , E. Curry,
          <article-title>VEKG: Video event knowledge graph to represent video streams for complex event pattern matching</article-title>
          ,
          <source>in: 2019 First International Conference on Graph Computing (GC)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>20</lpage>
          . doi:10.1109/GC46384.2019.00011.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Besta</surname>
          </string-name>
          , T. Hoefler,
          <article-title>Survey and taxonomy of lossless graph compression and space-efficient graph representations</article-title>
          ,
          <year>2019</year>
          . arXiv:1806.01799.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Egami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ugai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N. N.</given-names>
            <surname>Htun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukuda</surname>
          </string-name>
          ,
          <string-name>
            <surname>VHAKG:</surname>
          </string-name>
          <article-title>A multi-modal knowledge graph based on</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>