<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study of Narrative Creation by Means of Crowds and Niches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oana Inel</string-name>
          <email>oana.inel@vu.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabrina Sauer</string-name>
          <email>s.c.sauer@rug.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lora Aroyo</string-name>
          <email>lora.aroyo@vu.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Groningen</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vrije Universiteit Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>connotations of video material, and (2) annotations that are coarse-grained, i.e., focusing on keyframes and video fragments as opposed to full-length videos. The main findings of the study are used to facilitate the automatic creation of narratives in the digital humanities exploratory search tool DIVE+1.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media provide a mainstream environment to produce,
share and comment on video material, which constitutes the
largest and still growing portion of Web content
        <xref ref-type="bibr" rid="ref3">(CISCO
2016)</xref>
        . GIFs
        <xref ref-type="bibr" rid="ref2">(Bakhshi et al. 2016)</xref>
        are an increasingly popular form of shared
content: micro-stories, i.e., short video
fragments that contain summaries or highlights of video
content, shared on participatory platforms such as GIPHY and Twitter's
Vine or on social media platforms such as Facebook and Instagram.
      </p>
      <p>
        Humanities scholars use AV archives
        <xref ref-type="bibr" rid="ref1 ref7">(De Jong,
Ordelman, and Scagliola 2011)</xref>
        to answer their research questions
        <xref ref-type="bibr" rid="ref12 ref6">(Melgar et al. 2017)</xref>
        , but they face the challenge of
grappling with a vast amount of diverse AV content. The DIVE+
        <xref ref-type="bibr" rid="ref5">(De Boer et al. 2015)</xref>
        tool is conceived to assist scholars
in their exploration of digital content to ultimately create
meaningful stories and narratives. DIVE+ extends the
digital hermeneutics approach
        <xref ref-type="bibr" rid="ref10 ref16">(Van Den Akker et al. 2011)</xref>
        by
providing interactive access to multimedia objects enriched
with events, people, locations and concepts.
      </p>
      <p>Copyright © 2018 for this paper by its authors. Copying permitted
for private and academic purposes.</p>
      <p>1: http://diveplus.beeldengeluid.nl/</p>
      <p>
        Visualizing, mapping and constructing narratives play
a significant role in humanities research as they help to
contextualize historical material
        <xref ref-type="bibr" rid="ref11 ref4 ref8">(de Leeuw 2012; Mamber
2012)</xref>
        . The remixing of AV content into animated GIFs
(Highfield and Leaver 2016) has gained popularity as an object of
study and is considered a powerful way of understanding
the implicit aspects of storytelling. However, the availability
of metadata and semantic annotations
        <xref ref-type="bibr" rid="ref1 ref7">(Maccatrozzo et al. 2013; Aroyo, Nixon, and Miller 2011)</xref>
        , such as the
events and objects depicted in the video or the relevance of the videos,
is still a fundamental requirement (Kemman et al. 2013) for
scholars to accelerate their narrative-formation process.
      </p>
      <p>
        The focus of this paper is to understand how niches
        <xref ref-type="bibr" rid="ref4 ref8">(De Boer et al. 2012)</xref>
        , in this case humanities scholars, interact with AV
archives to generate (micro-)narratives. Our research
question is: can we model the data and the semantics of AV
content to ease the creation of narratives? To answer this
question we conduct a nichesourcing study with millennial
humanities students, in which they use AV content to create
stories by means of sequences of GIFs. We analyze the
narrative creation process on three levels: (1) data - the remixed
videos, to understand how the story is developed; (2)
narrative - the micro-story created in and across sequences of
GIFs, to understand what drives the creation of a narrative;
and (3) semantics - the keywords describing the story, to
understand the data enrichment needed to generate narratives.
      </p>
    </sec>
    <sec id="sec-2">
      <title>On the Use of Narratives in Digital Humanities</title>
      <p>
        DIVE+ accommodates the digital hermeneutics approach by
means of proto-narratives, i.e., relations between events and
their participating entities. To support the creation of such
proto-narratives, we gathered events and links between their
participating entities in textual AV content (i.e., description)
through a hybrid machine-crowd pipeline
        <xref ref-type="bibr" rid="ref12 ref6">(de Boer et al.
2017)</xref>
        . To further improve the narrative exploration and
creation in DIVE+, we performed a nichesourcing study with
millennial digital humanities master students to understand
how this community builds stories using AV material and
what its needs are in terms of data representation. While
in previous studies we focused on textual AV content, the
current study aims to understand the creation of narratives
through visual aspects such as video stills and fragments.
      </p>
      <p>Nine international humanities master students (aged
21 to 25) enrolled in an interdisciplinary course about
urban street visualization in Amsterdam participated in our
niche study. Their task was to explore a dataset of archival
AV material and to construct overarching micro-stories, in
the shape of sequences of GIFs. A GIF is composed of three
keyframes, or a (set of) short video fragment(s). The
students were free to explore the dataset and to create GIFs
about topics that drew their attention in relation to the city
of Amsterdam, or in relation to the course literature.</p>
      <p>The dataset consists of archival video material about
Amsterdam, part of the Netherlands Institute for Sound and
Vision2 (NISV) open collections. We retrieved 624 videos
created between 1910 and 1989 from the NISV portal using the search
keyword “Amsterdam”. The dataset consists of news
broadcasts, varying in length from 50 seconds to 10 minutes, from
which we identified three time periods, as shown in Table 1.</p>
      <p>In the study we asked the students to choose a time period
in Table 1 and to watch at least 20 videos from that period.
The users had one week to complete the entire task, to log
their activity3 and: (1) indicate the GIF type, i.e., keyframe-
or fragment-based; (2) describe each GIF, keyframe and
video fragment with keywords; (3) provide the timestamps
of the keyframes (keyframe-based GIFs) or the interval of
the video fragment (fragment-based GIFs), among others.
The students were also asked to prepare a short presentation
to describe and motivate (1) the videos and the time period
they selected, (2) the selection of keyframes and video
fragments and (3) the story that is told in their GIFs.</p>
    </sec>
    <sec id="sec-3">
      <title>Nichesourcing Study Results</title>
      <p>We present the study results4 and analyze the data gathered
from the participating users by focusing on keyframes, video
fragments, GIFs and finally, the overarching micro-stories.</p>
      <sec id="sec-3-1">
        <title>The Data Level</title>
        <p>The users picked a time period as shown in Table 1. Their
choice was informed by either: (1) feeling unknowledgeable
about that period or (2) curiosity about a period when their
parents were their own current age. In total, 68 videos were
used across all the micro-stories and seven videos were used
in more than one micro-story. All the overlaps occurred for
the users that chose period P3, which is explained by the low
number of videos in P3 and the fact that the users were asked
to watch at least 20 videos. On average, each user used
eight videos to generate a story, with a minimum of three
and a maximum of 20 videos per story.</p>
        <sec id="sec-3-1-1">
          <title>GIF and Story Composition</title>
          <p>2: http://www.beeldengeluid.nl; 3: Log File Template:
http://tinyurl.com/zwgotp7; 4: https://tinyurl.com/alternate-stories</p>
          <p>Each story was composed of around eight GIFs (stdev of
five GIFs), with a minimum of four and a maximum of 20
GIFs. In total, 75 GIFs were generated: seven keyframe-
based GIFs and 68 fragment-based GIFs. Only two users
generated keyframe-based GIFs, while all nine users
generated fragment-based GIFs. The 68 fragment-based GIFs
were generated by remixing and combining 89 video
fragments, meaning that around 25% of the fragment-based
GIFs were composed of more than one video fragment. On
average, 10 video fragments (stdev of 10) were used in each
micro-story, with a minimum of two and a maximum of 35
video fragments. Furthermore, eight GIFs were generated
by remixing keyframes and video fragments from multiple
videos (six keyframe-based and two fragment-based GIFs).</p>
          <p>In general, mostly keyframes and fragments from the
beginning of the videos were picked (55.45%), followed by
keyframes and fragments from the middle (24.55%) and
then by keyframes and fragments from the end of the video
(20%). When multiple keyframes and fragments from the
same video were remixed in the same GIF, the order was
always preserved, i.e., the keyframes and the fragments
were used in chronological order with respect to the video
stream. However, when looking at the entire story, we
observe that the users break the natural temporal and linear
sequence of videos by starting the story with video fragments
or keyframes from the middle or the end part of the videos.</p>
          <p>The majority of the GIFs are shorter than six seconds,
with only a few longer than 10 seconds. The average length
of a story is 43 seconds, with a maximum length of one
minute and 48 seconds and a minimum length of 12
seconds. On average, only 3.6% of the videos' length was used
to generate each story, but the length of a story is not
always proportional to the total length of the videos.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>The Narrative Level</title>
        <p>The users focused their micro-stories around themes that
were either inspired by the content of the videos, or by the
course literature (i.e., visualization of urban spaces). The
themes of the stories are: (1) mobility across the city, (2)
citizens co-constructing urban spaces, (3) gender relations
and (4) how urban routines relate to feelings of alienation
in a globalized world. Some users created literal narratives,
depicting aeroplanes, trains and bicycles to indicate
mobility, while others worked on an abstract level by, for example,
juxtaposing fragments of a person in a deep-sea diving suit
with shots of a newspaper article lamenting loneliness in the
city, to create a story about alienation.</p>
        <p>
          Users reported that creating sequences of GIFs enabled
them to develop more elaborate stories. However, moving
from GIF to GIF does not denote a sequential development
in time; rather, it is used to zoom out spatially, or to create a
jarring contrast between GIFs and thus a more abstract story:
for example, moving from a GIF about riots in the street, to
a deserted, ruined square in the city, to children repainting
a building, creates a story about urban decay and ideals.
Similarly, the story about gender relations creates a
counterpoint between women undergoing beauty procedures, while
men, in a separate GIF, seemingly loom over them.
        </p>
        <p>
          The users were asked to provide keywords, i.e., tags, for their
GIFs, selected keyframes and video fragments. These tags
represent the users’ interpretation of the multimedia content
comprising their narratives; they do not necessarily describe
the content, but act as an interpretation medium for the story.
To determine the type of keywords, we manually evaluated
them using the Panofsky-Shatford model
          <xref ref-type="bibr" rid="ref13 ref14">(Panofsky 1962;
Shatford 1986)</xref>
          presented in
          <xref ref-type="bibr" rid="ref10 ref16">(Gligorov et al. 2011)</xref>
          . We
distinguish three levels of keywords: abstract - symbolic or
subjective concepts that allow for various interpretations;
general - generic words; and specific - words denoting unique
instances. Further, each level consists of four facets: who -
subject, what - object or event, where - location and when - time.
        </p>
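<p>The manual level-facet labelling described above can be tallied into a simple contingency table. The sketch below is a hypothetical Python illustration of that tallying step; the tags and their labels are invented examples, not the study data:</p>

```python
from collections import Counter

# Panofsky-Shatford levels and facets, as used in the paper
LEVELS = ("abstract", "general", "specific")
FACETS = ("who", "what", "where", "when")

# Hypothetical, hand-labelled tags: (tag, level, facet)
labelled_tags = [
    ("alienation", "abstract", "what"),
    ("bicycle", "general", "what"),
    ("woman", "general", "who"),
    ("Amsterdam", "specific", "where"),
    ("Dam Square", "specific", "where"),
    ("1965", "specific", "when"),
]

def facet_distribution(tags):
    """Tally (level, facet) pairs into a full contingency table."""
    counts = Counter((level, facet) for _tag, level, facet in tags)
    return {(lv, fc): counts.get((lv, fc), 0) for lv in LEVELS for fc in FACETS}

table = facet_distribution(labelled_tags)
print(table[("specific", "where")])  # prints 2: two sample tags are specific locations
```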
        <p>We classified 207 (168 unique) tags that describe the GIFs
and 262 (159 unique) tags that describe the keyframes and
fragments composing the GIFs. The majority of the
keywords are general, followed by specific and then by abstract
keywords. When looking at the facets, we observe that more
than 60% of the keywords belong to the what facet. The
smallest number of keywords belongs to the when facet, with
around 1% in all cases. While the keywords describing the
who and where facets are evenly distributed among the
keywords describing the GIFs, the number of keywords
describing the keyframes and fragments that belong to the where
facet is much greater than the number describing the
who facet. While at the abstract and general levels a
significant number of keywords belong to the what facet, at
the specific level the users provided more keywords
belonging to the where facet and fewer to the what facet, showing
that users tend to provide specific locations.</p>
        <p>
          In storytelling, people can refer to concepts,
perspectives and opinions that are not physically present in the
video, but only implied or expressed. As research
          <xref ref-type="bibr" rid="ref15 ref9">(Trant 2009;
Gligorov et al. 2010)</xref>
          indicates, there is also a gap between
professional and lay-user tags describing video content. To
understand the semantics of the keywords provided by the users,
we look at their overlap with: (1) the machine-extracted
keywords and (2) the professional tags. We retrieved the
professional tags from the NISV portal and extracted the visual
tags and concepts from each video fragment and keyframe
composing each GIF using the online tool Clarifai5, which
performs concept recognition on both images and videos.
        </p>
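<p>Overlap figures of this kind can be computed as set intersections between tag vocabularies. A minimal Python sketch, assuming exact case-insensitive matching (the paper does not state its matching criterion, and the example tags are invented):</p>

```python
def tag_overlap(user_tags, reference_tags):
    """Percentage of distinct user tags that also occur among the
    reference tags (machine-extracted or professional), using
    case-insensitive exact matching -- an assumption, since the
    paper does not specify how tags were matched."""
    users = {t.lower() for t in user_tags}
    refs = {t.lower() for t in reference_tags}
    if not users:
        return 0.0
    return 100.0 * len(users & refs) / len(users)

# Invented example tags, for illustration only
user_tags = ["bicycle", "canal", "alienation"]
visual_tags = ["Bicycle", "canal", "street", "water"]
print(round(tag_overlap(user_tags, visual_tags), 1))  # prints 66.7
```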
        <p>The overlap between the visual tags and the keywords
provided by the users is quite low: 33% with the keywords
describing stills and fragments and 49% with the keywords
describing the GIFs. At the level of general concepts, however, the
tags provided by the scholars show a 99% overlap
with the visual tags. This suggests that for (micro)narrative
creation, what is visualized - generally - steers the narrative
contained in the story. The overlap between user and
professional keywords is even lower, 26% for keyframes and
fragments and 30% for GIFs. In contrast to the visual tags,
the professional tags do contain specific tags which usually
refer to places, the where facet. For the facet distribution at
the general level, the proportion of overlapping what facets
is higher at the level of the sequences but lower at the level of the
GIFs when compared to the visual tags. The gap between professional
and user tags is most clearly defined at the level of abstract concepts.</p>
        <p>5: https://www.clarifai.com</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Future Work</title>
      <p>The nichesourcing study aimed to bring insight into
storytelling in digital humanities by exploring the interaction and
interpretation of micro-narratives remixed using archival AV
content. Overall, users tend to generate GIFs by remixing
material positioned in the first part of the videos,
regardless of the GIF's position in the final micro-story. The
temporal aspect is even more disrupted when users start their
narrative with GIFs that contain keyframes and fragments
from the middle and the end part of the videos, or when they
finish their story with GIFs containing keyframes and
fragments from the beginning of videos. Therefore, the original
temporal sequence of the video is not relevant when
remixing video footage for creative storytelling.</p>
      <p>Users ascribe to their micro-narratives interpretations and
meanings similar to those contained in the visual tags, while they
tag the chosen sequences more in terms of their function as
narrative building blocks. Although at the GIF level users
ascribe similar meaning to the video material as the
professionals, they engage in scholarly interpretation on the
keyframe level. Thus, the interpretation of meaning in
storytelling is, to some extent, developed serendipitously and as a
user- and context-centric development, driven by humanities
research interests. Time seems - as our facet analysis
emphasizes - less important than the where or what facets. Hence,
people find events and objects the most relevant when
building narratives. General keywords referring to events, objects,
places and people almost entirely overlap with visual tags.
Thus, the understanding of visual aspects, especially event
and concept-centric, is needed to steer the story line.</p>
      <p>In summary, humanities scholars need rich enrichments of
AV datasets to facilitate the creation of narratives. However,
storytelling through video remixing is a creative process that
cannot rely only on visual aspects. Deep semantic
enrichment is needed to cover both implicit and explicit video
concepts and perspectives. For exploratory-centric tools such as
DIVE+ it is crucial to: (1) provide easy access to already
extracted keyframes and video fragments as opposed to
expecting the user to watch full videos; (2) provide deep
semantic enrichment of keyframes and video fragments
focusing on specific and general actors or people, locations, time
periods, objects and most importantly events. Events play a
central role in narrative development. Since event centrality
is already a main aspect of DIVE+, we will focus on also
integrating crowd-driven keyframes and video fragments
semantics to offer users direct access to relevant information.
DIVE+ users should be able to access smaller video
granularity of interest and their enrichments, as opposed to
watching the entire video and inspecting general video metadata.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>The research for this paper was made possible by the
CLARIAH-CORE (www.clariah.nl) project financed
by NWO and by the Netherlands Institute for Sound and
Vision and NWO under project nr. CI-14-25.</p>
      <p>[Highfield and Leaver 2016] Highfield, T., and Leaver, T.
2016. Instagrammatics and digital methods: studying
visual social media, from selfies and gifs to memes and emoji.
Communication Research and Practice 2(1):47-62.</p>
      <p>[Kemman et al. 2013] Kemman, M.; Scagliola, S.; de Jong,
F.; and Ordelman, R. 2013. Talking with scholars:
Developing a research environment for oral history collections. In
International Conference on Theory and Practice of Digital
Libraries, 197-201. Springer.</p>
      <p>[Maccatrozzo et al. 2013] Maccatrozzo, V.;
Van Hage, W. R.; et al. 2013. Crowdsourced evaluation of
semantic patterns for recommendations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Aroyo, Nixon, and Miller 2011]
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nixon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Notube: the television experience enhanced by online social and semantic data</article-title>
          .
          <source>In Consumer Electronics-Berlin (ICCE-Berlin)</source>
          , 2011 IEEE International Conference on,
          <fpage>269</fpage>
          -
          <lpage>273</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bakhshi et al. 2016]
          <string-name>
            <surname>Bakhshi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shamma</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Song</surname>
          </string-name>
          , Y.; de Juan, P.; and
          <string-name>
            <surname>Kaye</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Fast, cheap, and good: Why animated gifs engage us</article-title>
          .
          <source>In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems</source>
          , CHI '
          <volume>16</volume>
          ,
          <fpage>575</fpage>
          -
          <lpage>586</lpage>
          . New York, NY, USA: ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[CISCO 2016] CISCO</source>
          .
          <year>2016</year>
          .
          <article-title>Cisco visual networking index: Forecast and methodology,</article-title>
          <year>2015</year>
          -
          <fpage>2020</fpage>
          . http://tinyurl.com/hd7gd45.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>[De Boer</surname>
            et al. 2012]
            <given-names>De</given-names>
          </string-name>
          <string-name>
            <surname>Boer</surname>
          </string-name>
          , V.;
          <string-name>
            <surname>Hildebrand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; De Leenheer,
          <string-name>
            <given-names>P.</given-names>
            ;
            <surname>Dijkshoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ;
            <surname>Tesfa</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          ; and Schreiber,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <year>2012</year>
          .
          <article-title>Nichesourcing: harnessing the power of crowds of experts</article-title>
          .
          <source>In International Conference on Knowledge Engineering and Knowledge Management</source>
          ,
          <fpage>16</fpage>
          -
          <lpage>20</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>[De Boer</surname>
            et al. 2015]
            <given-names>De</given-names>
          </string-name>
          <string-name>
            <surname>Boer</surname>
          </string-name>
          , V.;
          <string-name>
            <surname>Oomen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Inel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.; Van</given-names>
          </string-name>
          <string-name>
            <surname>Staveren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Helmich</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>De Beurs</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Dive into the event-based browsing of linked historical media</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>35</volume>
          :
          <fpage>152</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>[de Boer</surname>
          </string-name>
          et al. 2017
          <string-name>
            <surname>] de Boer</surname>
          </string-name>
          , V.;
          <string-name>
            <surname>Melgar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Inel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ortiz</surname>
            ,
            <given-names>C. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Oomen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Enriching media collections for event-based exploration</article-title>
          .
          <source>In Research Conference on Metadata and Semantics Research</source>
          ,
          <fpage>189</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [De Jong, Ordelman, and Scagliola 2011] De Jong, F.;
          <string-name>
            <surname>Ordelman</surname>
          </string-name>
          , R.; and
          <string-name>
            <surname>Scagliola</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Audio-visual collections and the user needs of scholars in the humanities: a case for co-development.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [de Leeuw 2012] de Leeuw,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <year>2012</year>
          .
          <article-title>European television history online: history and challenges</article-title>
          .
          <source>VIEW Journal of European Television History and Culture</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Gligorov et al. 2010] Gligorov,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Baltussen, L. B.; van Ossenbruggen,
          <string-name>
            <given-names>J.</given-names>
            ;
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ;
            <surname>Brinkerink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Oomen</surname>
          </string-name>
          , J.; and van
          <string-name>
            <surname>Ees</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Towards integration of end-user tags with professional annotations</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Gligorov et al. 2011] Gligorov,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Hildebrand, M.; van Ossenbruggen,
          <string-name>
            <given-names>J.</given-names>
            ; Schreiber, G.; and
            <surname>Aroyo</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <year>2011</year>
          .
          <article-title>On the role of user-generated metadata in audio visual collections</article-title>
          .
          <source>In Proceedings of the sixth international conference on Knowledge capture</source>
          ,
          <fpage>145</fpage>
          -
          <lpage>152</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Mamber 2012] Mamber,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <year>2012</year>
          .
          <article-title>Narrative mapping</article-title>
          . In Everett, A., and
          <string-name>
            <surname>Caldwell</surname>
          </string-name>
          , J., eds.,
          <source>New Media: Theories and Practices of Intertextuality. Routledge</source>
          .
          <volume>145</volume>
          -
          <fpage>158</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Melgar et al. 2017]
          <string-name>
            <surname>Melgar</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Koolen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Huurdeman</surname>
          </string-name>
          , H.; and
          <string-name>
            <surname>Blom</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>A process model of scholarly media annotation</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval</source>
          ,
          <volume>305</volume>
          -
          <fpage>308</fpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Panofsky 1962]
          <string-name>
            <surname>Panofsky</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>1962</year>
          .
          <article-title>Studies in Iconology: Humanist Themes in the Art of the Renaissance</article-title>
          .
          <source>Harper &amp; Row.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Shatford 1986] Shatford,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <year>1986</year>
          .
          <article-title>Analyzing the subject of a picture: a theoretical approach</article-title>
          .
          <source>Cataloging &amp; classification quarterly 6</source>
          (
          <issue>3</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Trant 2009] Trant,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2009</year>
          .
          <article-title>Steve: The art museum social tagging project: A report on the tag contributor experience</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>[Van Den</surname>
            Akker et al. 2011] Van Den Akker, C.; Legêne, S.; Van Erp,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Segers</surname>
            ,
            <given-names>R.; van Der</given-names>
          </string-name>
          <string-name>
            <surname>Meij</surname>
            , L.; Van Ossenbruggen,
            <given-names>J.</given-names>
          </string-name>
          ; Schreiber,
          <string-name>
            <given-names>G.</given-names>
            ;
            <surname>Wielinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ;
            <surname>Oomen</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          ; et al.
          <year>2011</year>
          .
          <article-title>Digital hermeneutics: Agora and the online understanding of cultural heritage</article-title>
          .
          <source>In Proceedings of the 3rd International Web Science Conference</source>
          ,
          <volume>10</volume>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>