=Paper= {{Paper |id=Vol-1992/paper_5 |storemode=property |title= City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data |pdfUrl=https://ceur-ws.org/Vol-1992/paper_5.pdf |volume=Vol-1992 |authors=Shaban Shabani,Maria Sokhn,Laura Rettig,Philippe Cudré-Mauroux,Lukas Beck,Claudiu Tanase,Heiko Schuldt |dblpUrl=https://dblp.org/rec/conf/histoinfo/ShabaniSRCBTS17 }} == City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data== https://ceur-ws.org/Vol-1992/paper_5.pdf
City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data

Shaban Shabani, Maria Sokhn
Institute of Information Systems, HES-SO Valais-Wallis, Switzerland
{firstname.lastname}@hevs.ch

Laura Rettig, Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg, Switzerland
{firstname.lastname}@unifr.ch

Lukas Beck, Claudiu Tănase, Heiko Schuldt
Databases and Information Systems, University of Basel, Switzerland
{firstname.lastname}@unibas.ch


ABSTRACT
Information systems used in tourism rely mostly on up-to-date content on attractive places. In addition, these systems increasingly make use of archived photographs, documents, films, or even ancient paintings and other artwork by integrating such curated content from museums and memory institutions, possibly enriched with user-provided content. Hence, the distinction between cultural heritage applications and tourism is becoming increasingly blurred. Users are not only interested in the current appearance of landscapes, monuments, or buildings, but also in the evolution of these places over time. This requires large multimedia collections that integrate content from several cultural heritage institutions. As a consequence, interactive retrieval systems for historical multimedia are needed that support homogeneous content-based and semantic querying despite the heterogeneity of these collections. In this paper we present City-Stories, a multimedia hybrid content and entity retrieval system. City-Stories is based on a state-of-the-art open source multimedia retrieval system. Multimedia features in City-Stories represent multiple semantic levels: low-level (e.g., color, edge, motion), mid-level (e.g., date, location, objects), and high-level features (e.g., semantic entities, scene category). For the latter, City-Stories applies entity recognition and entity linking to identify semantic concepts and to link objects across media types. Consequently, City-Stories supports various types of cross-modal queries. Moreover, City-Stories uses a map-based visualization layer that facilitates spatial queries and browsing. Finally, City-Stories follows a crowdsourcing approach for content annotation and for enriching curated content with multimedia objects and documents provided by users. The paper shows how the City-Stories system seamlessly combines content-based search with entity-based navigation and leverages the wisdom of the crowd for manual annotations.

CCS CONCEPTS
• Information Systems → Multimedia and Multimodal Retrieval; Social tagging systems; Multimedia Databases; Spatial-temporal Systems; • Computing Methodologies → Semantic Networks; • Human-centered Computing → Collaborative and Social Computing Systems and Tools;

KEYWORDS
Content-based Retrieval, Spatio-temporal Querying, Multimedia Databases, Multimodal Interaction, Historical Multimedia, Crowdsourcing, Entity Linking

ACM Reference format:
Shaban Shabani, Maria Sokhn, Laura Rettig, Philippe Cudré-Mauroux, Lukas Beck, Claudiu Tănase, and Heiko Schuldt. 2017. City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data. In Proceedings of the 4th International Workshop on Computational History, Singapore, November 2017 (HistoInformatics 2017), 8 pages.

1    INTRODUCTION
Multimedia data on places of interest, such as documents, photos, videos, or user ratings, are the most important sources used in information systems for tourists. While such systems have so far focused on up-to-date content, historical material taken from archives is increasingly gaining importance, as it gives tourists more information on how particular touristic sites have developed over time, i.e., how they looked 20, 50, 100, or even more years ago. With content from museums and memory institutions, the distinction between cultural heritage applications and information systems for tourism is increasingly blurred, even though the content may differ significantly (in terms of media types and formats, age, availability, and degree of detail of metadata/annotations, etc.). Moreover, content provided by users from their private archives is also gaining importance due to the proliferation of social networks and crowdsourcing platforms.
   In order to provide integrated access to such heterogeneous content, several important technical challenges need to be addressed:
   Multimedia Retrieval. The integrated content should be accessible by a very broad range of query types, such as keyword queries to search in (manual) textual annotations, query-by-example (multimedia search with sample objects), query-by-sketch (multimedia similarity search on the basis of hand-drawn sketches), semantic queries that exploit semantic concepts and links between objects, spatio-temporal queries (i.e., queries on the location and/or time where/when a particular object has been created), and any combination of these modes.
   Entity Recognition and Linking. Content coming from different sources, in different formats, and possibly also with different metadata structures has to be integrated to make sure that it can be accessed via a homogeneous interface. This includes standard approaches to schema and data integration, but also more advanced
HistoInformatics 2017, November 2017, Singapore                                                                                 S. Shabani et al.


and innovative challenges such as entity recognition and entity linking, which ensure that links between objects (of the same or even of different media types) can be identified, stored as part of the metadata, enhanced with further external sources, and subsequently exploited for query purposes.

   Crowdsourcing. In addition to cultural heritage content curated by archives, user-generated content from private collections is gaining importance in touristic information systems. In order to attract potential content providers, awareness of such touristic platforms has to be raised, the technical barrier to contribution has to be lowered, and users have to be encouraged to participate actively. This is true not only for the provision of new content but also for annotations of existing content (e.g., ratings or experience reports). The rapid adoption of smartphones has also made it possible to exploit mobile crowdsourcing [1] as an efficient and easy way of reaching and using human intelligence and machine computation for solving Human Intelligence Tasks (HITs).
   In this paper, we introduce City-Stories, a novel system for collecting, managing, and accessing heterogeneous cultural heritage content for touristic applications. The City-Stories browser allows users to retrieve spatio-temporal knowledge and supports real-time interactive search in large databases of historical multimedia collections. From a systems perspective, City-Stories is based on vitrivr¹ [2], which in turn uses the retrieval engine Cineast [3] and the distributed database backend ADAMpro [4]. Information is extracted both from content and from metadata, and both can be queried simultaneously. City-Stories' content-based search extends the functionality of the vitrivr engine, which allows interactive, efficient multi-feature retrieval in large multimedia collections. Metadata of the multimedia objects (automatically generated or manually added) is used to enrich the description of content in several ways:
        • Low-level features like color, edge, and motion, mid-level features like date and location, and high-level features like semantic entities or scene categories.
        • Spatio-temporal metadata in the collection is directly imported as vector location and timestamp features; the City-Stories frontend allows for spatial, temporal, and spatio-temporal queries, and the results are displayed in a map and on a timeline, respectively.
        • Manual annotations and user ratings are provided via crowdsourcing.
        • Textual metadata (usually in the form of the title and description of an item) is subjected to entity extraction, which yields Uniform Resource Identifiers (URIs) in a knowledge base. These entity URIs are then used to pre-compute semantic entity distances between collection items.
   Hence, City-Stories combines content-based search with entity-based navigation and leverages the wisdom of the crowd for manual annotations.
   The contribution of the paper is threefold. First, we show how multi-feature content-based similarity search providing a plethora of query types can be enriched by entity recognition and entity linking, thereby allowing for cross-media retrieval based on semantic concepts. Second, we show how curated collections and automatically generated metadata can be extended by user-generated content and user-provided metadata, which is particularly relevant in applications for tourists and which complements manually curated cultural heritage collections. Third, we show how all these elements can be seamlessly combined in the City-Stories system.
   The remainder of the paper is structured as follows: Section 2 motivates the City-Stories approach with a tourism use case. Section 3 discusses the components needed for the integrated multimedia content and entity retrieval system, and Section 4 presents details of the City-Stories system. Section 5 summarizes related work and Section 6 concludes.

               Figure 1: City-Stories conceptual architecture.

1 https://vitrivr.org

2    MOTIVATION
Consider, as an example for City-Stories, the following use case: Sophia, a tourist from Dublin, is visiting the city center of Berlin. When standing in front of the Brandenburg Gate, one of Berlin's neoclassical city gates, she has a variety of questions regarding the building and its neighborhood, like ‘What building is this, what was its purpose, when and by whom was it built?’ or ‘What did the neighborhood of the gate look like around 1900, in the so-called “golden” 1920s, shortly after the end of WW2, in the 1970s, before and after the fall of the Iron Curtain in 1989, or at any other point in time in the past?’.
   Sophia holds a smartphone on which she accesses the City-Stories query interface. The City-Stories system integrates several cultural heritage multimedia collections, e.g., from the German federal archive or from focused museum collections. Moreover, City-Stories encompasses a large number of photos and associated metadata provided by local citizens. Using City-Stories, Sophia is able to directly browse the content and submit queries of different types:
      • Simple location queries: Using the GPS coordinates of her current location, Sophia is able to identify the building




                               Figure 2: Content-based similarity search in Cultural Heritage content.


        she is currently looking at and get access to basic information regarding this building, combined from several data sources on the web.
      • Temporal queries: On the basis of information from various sources that have been integrated beforehand into the City-Stories system, Sophia is able to query details of the building's history (photos or historical paintings from different stages of the building). Moreover, she will also get information on the building's neighborhood at different points in time, on historical events that took place there, and statistical information (e.g., the population of the city at different points in time). The latter is based on linked metadata, i.e., metadata enriched with links between objects after entity recognition has been applied to the content.
      • Combined spatial and multimedia similarity queries: These queries allow Sophia to search for similar buildings (or buildings that play a similar role) in the vicinity of her current location.
      • Multimedia similarity queries: Sophia takes a photo of the Brandenburg Gate with the camera of her smartphone. She will use this photo for a similarity query in order to find other buildings (in Berlin or in any other European city) that look similar. Here, similarity can be defined by intrinsic image features, on the basis of the object's metadata or links, or by any combination of these.
      • Combined sketch-image similarity queries: Sophia provides one of the photos of the Brandenburg Gate taken with her smartphone as query input and adds a superimposed sketch (e.g., she draws a typical Berlin double-decker bus in the foreground).
   Most importantly, Sophia does not want to use a dedicated smartphone app provided by a local tourist organization with manually curated content specialized for a particular touristic site, as she would have to install a new app every time she visits another place. Rather, Sophia is interested in City-Stories, a generic approach that can be used to integrate and access content independently of a concrete location, so that she could use the same system on her next trips, for instance to Singapore, to visually explore the recent growth of the Marina area, or to New York, to visualize the development of Lower Manhattan over the past 120 years.

3     CONCEPTS
Figure 1 illustrates the different contributions of the City-Stories system and their interplay.

3.1    Spatio-Temporal Retrieval Engine
The multimedia retrieval engine of City-Stories, which is based on the vitrivr system, offers multiple query modes: query-by-example (QbE), query-by-sketch (QbS), temporal, and spatial queries. In a given query we can freely combine these different modes, e.g., by providing an exemplary image, drawing a sketch on top of an existing or a provided image, and/or specifying a location. To provide this functionality, we extend the software stack




                Figure 3: Screenshot of the spatio-temporal content-based retrieval front-end of City-Stories [5].


of vitrivr, including Cineast, the extraction and retrieval engine, and ADAMpro, the database engine.
   In Cineast, we differentiate between an on-line and an off-line phase. The off-line phase includes the feature extraction and the storage of the resulting metadata. In the on-line phase, where the actual retrieval happens, the engine executes a given query and returns a list of documents ordered by similarity.
   Cineast uses multiple features in combination to describe a document. In total, Cineast already provides 40 content descriptors for videos and images. These descriptors mainly extract color, edge, and motion information (where applicable). Building on top of that, we extend Cineast with the following higher-level descriptors:
      • Spatial and temporal similarity features based on the geolocation and timestamp metadata provided with the content (usually recorded by the capturing devices).
      • A descriptor using semantic entities that are provided by the entity recognition described in Section 3.2.
      • Semantic concept features provided by a deep neural network.

3.2      Entity Recognition and Linking
In order to integrate the archive data available to us with other data sources such as knowledge bases, we apply named-entity recognition, candidate selection, and entity linking techniques to the available text data. Named-entity recognition is the task of identifying, in text, mentions of entities, which can take various surface forms. Once an entity mention has been extracted, the corresponding URI in the knowledge base has to be found. Selecting the URI corresponding to a mention of an entity in text is called entity linking.
   By linking to a knowledge base, specifically to DBpedia² [6], we enhance our data with relevant information on the entity from the knowledge base and can link data from further sources to the same entities for integration.

2 http://dbpedia.org

   In order to identify entities associated with media items, we use textual media metadata and extract entities from the associated titles and descriptions. Consequently, when displaying media items, additional information from the knowledge base can be retrieved and displayed. Knowledge bases tend to offer various attributes for entities, many of which will not be relevant to the viewer. Thus,


[Figure: the input text "Arrivée du premier TGV des neiges en gare de Martigny" (English: "Arrival of the first snow TGV at Martigny station") passes through entity recognition, which yields (surface form, entity) pairs; candidate selection, which yields TGV candidates and Martigny candidates (e.g., http://fr.dbpedia.org/resource/Martigny with score 0.9); and entity linking, which yields "Arrivée du premier TGV {http://fr.dbpedia.org/resource/TGV} des neiges en gare de Martigny {http://fr.dbpedia.org/resource/Martigny}".]

                                            Figure 4: Entity recognition and linking process.


when choosing which information to display, we also rank the attributes by their importance to the viewer (for example, when viewing information on a city, the population and the founding year will likely be more relevant to a tourist than the ZIP codes in this city).
   Furthermore, we are able to leverage the relationships between media items by extracting the relationships between entities in a graph-structured knowledge base. This also allows us to extract related information transitively, e.g., for entities with little available data. Often, in such graph structures, we can rely on higher-level categories to provide general information on an entity for which specific information may not be available.

3.3    Crowdsourcing
The crowdsourcing component makes it possible to enrich content with new data provided by different types of users and to enhance the integrated digital collection data.
   Collected data is not always complete, i.e., it could be noisy and/or lack particular information. In a tourist application, for instance, information provided by users could lack the location where a point of cultural interest can be found, to whom it belongs, or even the title or a description. Likewise, data might be incorrect or conflicting as a result of integrating datasets collected from different sources. Hence, these challenges are grouped into two categories, conflicting information and missing information, both of which are addressed in City-Stories using crowdsourcing.
   Crowdsourcing is able to build an open, connected, and smart cultural heritage with involved consumers and providers [7]. The crowdsourcing service built into City-Stories enables volunteers to share new data, as well as to complete the existing data and improve data quality.
   In order to optimize the assignment of tasks to the appropriate crowds, we make use of push crowdsourcing [8]. In contrast to standard pull crowdsourcing, where workers pull tasks at random, push crowdsourcing models tasks based on users' profiles and pushes them to the users. First, users specify their interests and the topics they have knowledge of. Then, by matching users' profiles with the available HITs, City-Stories recommends tasks to the best-matched users based on their interests and skills.
   In paid micro-crowdsourcing scenarios, money as an incentive is the main motivator for workers to contribute and can also be used for quality control (e.g., making payment conditional on the quality of the work that has been done). However, it is important to distinguish malicious users from workers who do not intend fraud but may affect the overall system due to misunderstandings or lack of knowledge and experience with the platform [9, 10]. As the latter scenario is more likely in our case, we apply two quality control mechanisms to evaluate the quality of volunteers' work in City-Stories: play cards and a weighted majority voting with reputation system.

4     CITY-STORIES SYSTEM
In what follows, we describe the implementation of the different components of City-Stories and the content that has been integrated into a first prototype.

4.1    Spatio-temporal Multimedia Browser
When the retrieval engine is queried via the City-Stories UI, Cineast performs, for each given feature, an extraction on the given query document, resulting in a feature vector. Each feature vector is passed to the ADAMpro database backend, which performs a k-nearest neighbor (k-NN) similarity search to find a list of similar documents. ADAMpro has been shown to scale to collection sizes of up to 50 million entries and feature vectors with up to 500 dimensions [4]. After receiving a list of documents ranked by similarity for each feature, the Cineast retrieval engine merges these results into one result list (depicted in Figure 2).
   In particular, for the spatial and temporal similarity search we use a nearest neighbor query on the two-dimensional geolocation data and on the one-dimensional timestamp data, respectively (see Figure 3).
   To search for similar documents based on semantic properties, we provide two different kinds of features:
        • Based on entity recognition and linking, each document is characterized by a list of semantic entities. Using this list of semantic entities, we calculate a pairwise distance to estimate the similarity between two documents.
        • The similarity based on semantic concepts utilizes an AlexNet convolutional network [11]. Using the output of the second-to-last fully connected layer, fc7, we obtain a 4096-dimensional feature vector for a given image that can be used in a k-NN search. We provide two different features by training the network on different datasets. Training on the Places2 dataset [12] yields a feature focusing on scene and environment similarity, whereas training on the data from the MS COCO Detection challenge [13] yields a feature focusing on object similarity inside a scene.
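The text does not detail how Cineast combines the per-feature result lists into one. The following is a minimal sketch under stated assumptions: each feature is searched with a brute-force Euclidean k-NN (standing in for ADAMpro's indexed search), and the lists are merged by a weighted mean of similarities 1/(1+d). The function names, the distance-to-similarity mapping, and the weights are hypothetical illustrations, not Cineast's or ADAMpro's API.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (document id, value) pairs; the value is a distance for k-NN output
# and a fused similarity score for the merged list.
Ranked = List[Tuple[str, float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: Sequence[float], collection: Dict[str, Sequence[float]], k: int) -> Ranked:
    """Brute-force k-NN for one feature: the k closest documents, nearest first."""
    distances = ((doc_id, euclidean(query, vec)) for doc_id, vec in collection.items())
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


def fuse(per_feature: Dict[str, Ranked], weights: Dict[str, float]) -> Ranked:
    """Merge per-feature k-NN lists into one ranking (hypothetical weighted late fusion).

    Each distance d is turned into a similarity 1 / (1 + d). A document's fused
    score is the weighted mean of these similarities over all queried features;
    a feature whose list does not contain the document contributes 0.
    """
    total = sum(weights[feature] for feature in per_feature)
    scores: Dict[str, float] = defaultdict(float)
    for feature, ranked in per_feature.items():
        for doc_id, dist in ranked:
            scores[doc_id] += weights[feature] / (1.0 + dist) / total
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy collections for two hypothetical features (a color and an edge descriptor).
    color = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 0.0, 1.0]}
    edge = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [0.1, 0.9]}
    per_feature = {
        "color": knn([1.0, 0.0, 0.0], color, k=2),
        "edge": knn([0.0, 1.0], edge, k=2),
    }
    # "a" matches both queries exactly and is ranked first with score 1.0.
    print(fuse(per_feature, {"color": 0.5, "edge": 0.5}))
```

Weighted averaging is only one possible late-fusion rule; rank-based schemes such as reciprocal rank fusion avoid comparing raw distances whose scales differ between features.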
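For the spatial and temporal features of Section 4.1, the nearest-neighbor query operates on a two-dimensional (latitude, longitude) point and on a one-dimensional timestamp. A minimal sketch, assuming great-circle (haversine) distance for geolocations and the absolute time difference in seconds for timestamps; the paper does not state which distance functions are used, and the item names and coordinates below are made up for illustration.

```python
import heapq
import math
from datetime import datetime, timezone
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_in_space(query: Tuple[float, float],
                     items: Dict[str, Tuple[float, float]], k: int) -> List[Tuple[str, float]]:
    """k items closest to the query location (2-D feature), with distances in km."""
    return heapq.nsmallest(k, ((i, haversine_km(query, loc)) for i, loc in items.items()),
                           key=lambda pair: pair[1])


def nearest_in_time(query: datetime, items: Dict[str, datetime], k: int) -> List[Tuple[str, float]]:
    """k items closest to the query timestamp (1-D feature), with distances in seconds."""
    return heapq.nsmallest(k, ((i, abs((ts - query).total_seconds())) for i, ts in items.items()),
                           key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up items; coordinates are approximate.
    places = {
        "gate_photo": (52.5163, 13.3777),       # Brandenburg Gate, Berlin
        "reichstag_photo": (52.5186, 13.3762),  # Reichstag building, Berlin
        "martigny_photo": (46.1028, 7.0720),    # Martigny, Valais
    }
    print(nearest_in_space((52.5163, 13.3777), places, k=2))

    utc = timezone.utc
    dates = {
        "gate_photo": datetime(1900, 1, 1, tzinfo=utc),
        "reichstag_photo": datetime(1925, 6, 1, tzinfo=utc),
        "martigny_photo": datetime(1989, 11, 9, tzinfo=utc),
    }
    print(nearest_in_time(datetime(1920, 1, 1, tzinfo=utc), dates, k=2))
```

Using timezone-aware timestamps avoids mixing naive and aware values, which Python refuses to subtract.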

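The entity-based feature of Section 4.1 compares the lists of semantic entities attached to two documents, and the Introduction notes that these distances are pre-computed between collection items. The paper does not name the distance function; the sketch below assumes Jaccard distance over sets of DBpedia URIs. The helper names and the sample items are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| over entity URIs; two items with no entities get distance 1."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def precompute_distances(entities: Dict[str, Set[str]]) -> Dict[FrozenSet[str], float]:
    """Distances for every unordered pair of items, keyed by frozenset({item_x, item_y})."""
    return {
        frozenset((x, y)): jaccard_distance(entities[x], entities[y])
        for x, y in combinations(sorted(entities), 2)
    }


if __name__ == "__main__":
    DBR = "http://fr.dbpedia.org/resource/"
    # Hypothetical items annotated with linked entities.
    items = {
        "photo_1": {DBR + "TGV", DBR + "Martigny"},
        "photo_2": {DBR + "Martigny"},
        "photo_3": {DBR + "Sion"},
    }
    dist = precompute_distances(items)
    print(dist[frozenset(("photo_1", "photo_2"))])  # 0.5: one shared entity out of two
```

Pre-computing all pairs is quadratic in the collection size; for large collections, an inverted index from entity URIs to items would restrict the computation to pairs that share at least one entity.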



                                         Figure 5: Crowdsourcing quality control and task management process.


4.2      Entity Recognition and Linking                                      4.3     Crowdsourcing
The data integration component of the City-Stories system focuses            In parallel to collecting data from institutions such as audio/visual
on leveraging textual data to integrate different sources and schemata.      archives, Mediatheque4 , and DigitalValais, we emphasize the im-
In the implementation, we use metadata provided with the media               portance of data sharing from people that have valuable data and
items in the DigitalValais3 dataset, specifically, title and description     information about cultural heritage in private collections. This
corresponding to each item, to extract and link related entities.            part of the system enables users to participate and contribute to
   The implementation consists of three steps (see Figure 4):                cultural heritage. Once shared, users’ data is integrated to the data
     (1) Named-entity recognition: Using the Stanford Named Entity           repository. In order to maintain a high level of quality of the crowd-
         Recognizer (NER) [14], we extract likely mentions of enti-          sourced data, we apply the following control methods (shown in
         ties from the full text, which may be in different languages        Figure 5):
         (German, French or any other local language).                          Play cards is a test measure used to qualify or disqualify users
     (2) Candidate selection: This step consists in choosing a set of        for solving tasks of a certain category. We use 195 playing cards
         candidate entities that may correspond to the extracted             grouped in 13 different categories, where categories represent sub-
         surface form. We choose a set of candidate entities for each        ject areas of the crowdsourcing tasks. Each card contains both
         extracted mention in the text.                                      a question and its answer (not visible to the user). At first users
provide information on their topics of interest that intersect with the card categories. They are then forwarded to the test phase. To qualify, they have to answer at least 70% of the questions matched to their interests correctly. After two consecutive failures, a user is no longer considered for tasks on these specific topics. Letting users choose the topics they like or know about avoids false positives, i.e., eliminating users merely because they lack knowledge of questions randomly drawn from a pool of predefined questions. Moreover, implementing this measure as a game makes the process more interactive and keeps users interested. Users can also test their knowledge on the topics covered by the cards and, at the same time, expand it with the additional information the cards provide.

     (3) Entity linking: We rank the candidates, choose the best-matching entity for each mention, and add its DBpedia URI to the media item in the City-Stories database.

   Candidates are selected from a database of (surface form, entity) pairs, where entities are given as DBpedia URIs. This database is created by processing all Wikipedia articles and extracting hyperlinks: the hyperlink text is the observed surface form (i.e., the link text), and the link target is the entity that this surface form refers to (i.e., the linked article). The frequency of a specific (surface form, entity) pair, relative to all links with the same surface form, yields a prior probability that this surface form refers to the linked entity. We rank the candidates by this prior score and select the top candidate to link the mention to an entity in the knowledge base.
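The prior-based candidate ranking described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the City-Stories implementation. The helper names `build_prior` and `link_mention` are hypothetical. The anchor pairs are a toy sample rather than counts extracted from Wikipedia. Surface forms are matched exactly, with no case or whitespace normalization.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def build_prior(anchor_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Turn (surface form, entity URI) link observations into P(entity | surface form).

    Hypothetical helper: each pair stands for one Wikipedia hyperlink whose
    anchor text is the surface form and whose target article is the entity.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for surface_form, entity_uri in anchor_pairs:
        counts[surface_form][entity_uri] += 1

    prior: Dict[str, Dict[str, float]] = {}
    for surface_form, entity_counts in counts.items():
        total = sum(entity_counts.values())
        # Relative frequency of each entity among all links with this anchor text.
        prior[surface_form] = {uri: n / total for uri, n in entity_counts.items()}
    return prior


def link_mention(mention: str, prior: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Return the top-ranked candidate URI for a mention, or None if it was never seen."""
    candidates = prior.get(mention)
    if not candidates:
        return None
    # Highest prior first; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical toy observations, not real Wikipedia link counts.
    pairs = [
        ("Sion", "http://dbpedia.org/resource/Sion,_Switzerland"),
        ("Sion", "http://dbpedia.org/resource/Sion,_Switzerland"),
        ("Sion", "http://dbpedia.org/resource/Zion"),
    ]
    prior = build_prior(pairs)
    print(prior["Sion"])
    print(link_mention("Sion", prior))
```

With the toy data, "Sion" gets a prior of 2/3 for the Sion,_Switzerland URI and 1/3 for Zion, so the mention is linked to the former. Keeping only the top-ranked candidate is the prior-only ranking the text describes. It ignores the mention's surrounding context, so an ambiguous surface form always resolves to the entity it most frequently links to.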
   Once the linking is done, we can create a graph of the media items in which an edge connects two items if they contain the same entity and are thus related. Given the DBpedia URI of an entity found in the metadata of a media item, we extract relevant attributes for this entity by ranking attributes by how frequently they appear in close proximity to entities of this type.
   A weighted majority voting with reputation scheme combines majority voting (MV) with users' reputation scores. As a quality control mechanism, MV [15] assigns the same task to multiple users and aggregates their results to choose the right answer. The reputation strategy, on the other hand, tracks users' performance
3 http://www.valais-wallis-digital.ch/
4 http://www.mediatheque.ch/
City-Stories                                                                                 HistoInformatics 2017, November 2017, Singapore




Figure 6: Screenshot of the crowdsourcing front-end of City-Stories: (a) Data sharing, (b) Play cards, (c) Annotation task sample.


during crowdsourcing tasks and is complementary to majority voting. The users' results from the play-card qualification tests are assigned as their initial reputation scores. These scores are later used as weights when aggregating the answers, i.e., an answer from a user with a high reputation carries more weight. After each aggregation phase, the users' reputation scores are updated based on their outcomes in that task.
   Figure 6 shows screenshots of the crowdsourcing front-end of City-Stories, illustrating how data is shared, the play cards used for quality control, and a sample annotation task.

5     RELATED WORK
In general, existing retrieval systems applied to the cultural heritage domain are either metadata-based or content-based, and few of them implement both [16]. Metadata-based systems focus on keyword-based queries and on linking digital objects with external data sources, whereas content-based systems focus on query-by-example, query-by-specification, and browsing.
   EUscreen⁵ is a project on multimedia archives that focuses on the collection, integration, and publication of audio-visual content. Oomen et al. [17, 18] created a European television archive using data from many different TV broadcasters; it provides a keyword search interface. Media in Context⁶ is a platform for cross-media extraction (via pipelined extractors), analysis, metadata publishing, and querying. Within this project, Sensefy [19] is a multimedia search and information retrieval system that provides metadata keyword search and links objects to real-world entities and concepts. Otegi et al. [20] apply Personalized PageRank to generate personalized recommendations in a cultural heritage collection, drawing on metadata, user session logs, and Wikipedia as an external source. INVENiT [21] is a semantic search system for cultural heritage collections that makes use of links between objects and terms provided by structured vocabularies. Moreover, users can contribute by annotating collection objects. However, the system lacks trust assessment, a key issue for data quality. SCULPTEUR [22] is a multimedia retrieval system for searching digital collections in museums. It features content-based image retrieval as well as semantic retrieval using metadata and a semantic layer. An extension of this work [23] provides a hybrid model for cultural heritage collections that combines the two retrieval methods for image search.
   Named-entity recognition (NER) relies largely on natural language processing techniques and is required as a step prior to candidate selection and named-entity disambiguation. The Stanford NER [14] combines a constraint model with a sequence model to extract information, including named entities, from text. Prokofyev et al. [24] employ n-gram based features for NER and demonstrate their effectiveness in idiosyncratic domains; their approach could also be applied to historical archive data. SANAPHOR [25] introduces a type system on top of recognized entities. The relatedness of types is then used to identify co-references to the same entity and to link these mentions to a DBpedia URI. DBpedia Spotlight [26] provides the entire pipeline, from NER through candidate selection (i.e., retrieving candidate entities that may correspond to the extracted surface form) to entity linking. This work has focused on implementing a usable system for multilingual entity extraction and linking.
   Crowdsourcing has been shown to be an effective solution for problems that are difficult for computers to solve and for problems that require human intelligence [27]. Its popularity has grown thanks to online platforms such as Amazon Mechanical Turk⁷ and CrowdFlower⁸, which allow crowds to participate in solving paid micro-tasks. For quality control, a widely used technique is gold questions [28], a test that qualifies users for solving tasks. However, this method alone leads to the elimination of honest workers who lack some knowledge. Aggregation
5 http://euscreen.eu
6 http://mico-project.eu
7 https://www.mturk.com/mturk/
8 https://www.crowdflower.com/


methods, also known as voting strategies, aim to counter biased workers. Majority voting [15] is a redundancy mechanism that is widely applied to guard against spammers and lazy workers.

6    CONCLUSION
In this paper, we have presented City-Stories, a novel system that combines content-based similarity search on both historical and contemporary multimedia data with spatio-temporal queries, and that exploits semantic analysis of the content for entity recognition and linking. We have deployed City-Stories in a tourist application in which cultural heritage content from archives and memory institutions is complemented with content contributed by users through crowdsourcing.
   In future work, we aim to increase the size of the collections available in City-Stories for several selected tourist locations to show the generic applicability of the City-Stories approach. We also intend to perform user studies at these locations to assess the usability of the system and the effectiveness of the integrated content and entity retrieval approach. Moreover, we plan to further exploit the synergies obtained from combining all retrieval modes supported in City-Stories, for instance by proactively making recommendations during the retrieval process and by providing additional information from external sources. Finally, we aim to extend the user experience with an overlay function in the user interface that superimposes historical content on a smartphone's camera view of a place of interest, to better show how that place has developed. When multiple visual objects of a place are available from different periods of time (taken from the same or at least a similar perspective), this will lead to a "history browser": a slider on the timeline steers the overlay, selects the object to be overlaid, and gradually visualizes the development.

ACKNOWLEDGMENT
This work was partly funded by the Hasler Foundation in the context of the project City-Stories. We would like to thank the cantonal archives and the "Mediathèque" of the canton of Valais, and the team of the Digital Valais project, for providing a data testbed.

REFERENCES
[1] J. Ren, Y. Zhang, K. Zhang, and X. Shen, "Exploiting mobile crowdsourcing for pervasive cloud services: challenges and solutions," IEEE Communications Magazine, vol. 53, no. 3, pp. 98–105, March 2015.
[2] L. Rossetto, I. Giangreco, C. Tănase, and H. Schuldt, "vitrivr: A Flexible Retrieval Stack Supporting Multiple Query Modes for Searching in Multimedia Collections," in Proceedings of the 2016 ACM Conference on Multimedia Conference (ACM MM 2016), Amsterdam, The Netherlands, Oct. 2016, pp. 1183–1186.
[3] L. Rossetto, I. Giangreco, and H. Schuldt, "Cineast: A Multi-feature Sketch-Based Video Retrieval Engine," in Proc. of the International Symposium on Multimedia (ISM 2014). Taichung, Taiwan: IEEE Computer Society, Dec. 2014, pp. 18–23.
[4] I. Giangreco and H. Schuldt, "ADAMpro: Database Support for Big Multimedia Retrieval," Datenbank-Spektrum, vol. 16, no. 1, pp. 17–26, 2016.
[5] L. Beck and H. Schuldt, "City-Stories: A Spatio-Temporal Mobile Multimedia Search System," in Proc. of the IEEE International Symposium on Multimedia, San Jose, CA, USA, Dec. 2016, pp. 193–196.
[6] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer, "DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia," Semantic Web Journal, vol. 6, no. 2, pp. 167–195, 2015.
[7] J. Oomen and L. Aroyo, "Crowdsourcing in the cultural heritage domain: opportunities and challenges," in Proceedings of the 5th International Conference on Communities and Technologies (C&T'11). Brisbane, Australia: ACM, 2011, pp. 138–149.
[8] D. E. Difallah, G. Demartini, and P. Cudré-Mauroux, "Pick-a-crowd: Tell Me What You Like, and I'll Tell You What to Do," in Proc. of the 22nd Int'l World Wide Web Conf. (WWW'13). Rio de Janeiro, Brazil: ACM, May 2013, pp. 367–374.
[9] H. Li, B. Yu, and D. Zhou, "Error Rate Bounds in Crowdsourcing Models," ArXiv e-prints, Jul. 2013.
[10] P. G. Ipeirotis, F. Provost, and J. Wang, "Quality Management on Amazon Mechanical Turk," in Proceedings of the ACM SIGKDD Workshop on Human Computation (HCOMP '10). Washington DC, USA: ACM, 2010, pp. 64–67.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Curran Associates, 2012, pp. 1097–1105.
[12] B. Zhou, A. Khosla, A. Lapedriza, A. Torralba, and A. Oliva, "Places: An image database for deep scene understanding," arXiv preprint arXiv:1610.02055, 2016.
[13] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, in Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), ser. Lecture Notes in Computer Science, vol. 8693. Zurich, Switzerland: Springer, Sep. 2014, pp. 740–755.
[14] J. R. Finkel, T. Grenager, and C. D. Manning, "Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), Michigan, USA, 2005, pp. 363–370.
[15] A. Sorokin and D. A. Forsyth, "Utility data annotation with Amazon Mechanical Turk," in CVPR Workshops, Anchorage, AK, USA, June 2008, pp. 1–8.
[16] C.-F. Tsai, "A review of image retrieval methods for digital cultural heritage resources," Online Information Review, vol. 31, no. 2, pp. 185–198, 2007.
[17] J. Oomen and V. Tzouvaras, "Publishing Europe's Television Heritage on the Web: The EUscreen Project," in Proc. of the 1st International Workshop on Multimedia for Cultural Heritage (MM4CH 2011), ser. Communications in Computer & Information Science, vol. 247. Modena, Italy: Springer, 2012, pp. 136–142.
[18] J. Oomen, E. Verbruggen, V. Tzouvaras, and K. Hyyppä, "Television heritage linked and visualized: The EUscreen virtual exhibitions and the Linked Open Data pilot," in Proceedings of the 2013 Digital Heritage International Congress (DigitalHeritage 2013), vol. 2. Marseille, France: IEEE, 2013.
[19] C. Perera and D. Jayakody, "Cross media entity and concept driven search," in Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems (SEMANTiCS'16) and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS'16), ser. CEUR Workshop Proceedings, vol. 1695. Leipzig, Germany: CEUR-WS.org, Sep. 2016.
[20] A. Otegi, E. Agirre, and P. Clough, "Personalised PageRank for Making Recommendations in Digital Cultural Heritage Collections," in Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '14). London, United Kingdom: IEEE Computer Society, Sep. 2014, pp. 49–52.
[21] C. Dijkshoorn, J. van Ossenbruggen, L. Aroyo, and G. Schreiber, "INVENiT: Exploring Cultural Heritage Collections While Adding Annotations," in Proceedings of the 3rd International Conference on Intelligent Exploration of Semantic Data (IESD 2014), ser. CEUR Workshop Proceedings, vol. 1279. Riva del Garda, Italy: CEUR-WS.org, Oct. 2014, pp. 95–99.
[22] S. Goodall, P. H. Lewis, K. Martinez, P. A. S. Sinclair, F. Giorgini, M. Addis, M. J. Boniface, C. Lahanier, and J. Stevenson, "SCULPTEUR: multimedia retrieval for museums," in Proceedings of the 3rd International Conference on Image and Video Retrieval (CIVR 2004), ser. Lecture Notes in Computer Science, vol. 3115. Dublin, Ireland: Springer, Jul. 2004, pp. 638–646.
[23] S. Vrochidis, C. Doulaverakis, A. Gounaris, E. Nidelkou, L. Makris, and I. Kompatsiaris, "A hybrid ontology and visual-based retrieval model for cultural heritage multimedia collections," International Journal of Metadata, Semantics and Ontologies (IJMSO), vol. 3, no. 3, pp. 167–182, Feb. 2008.
[24] R. Prokofyev, G. Demartini, and P. Cudré-Mauroux, "Effective named entity recognition for idiosyncratic web collections," in Proceedings of the 23rd International Conference on World Wide Web (WWW '14). Seoul, Republic of Korea: ACM, Apr. 2014, pp. 397–408.
[25] R. Prokofyev, A. Tonon, M. Luggen, L. Vouilloz, D. E. Difallah, and P. Cudré-Mauroux, "SANAPHOR: ontology-based coreference resolution," in Proceedings of the 14th International Conference on The Semantic Web (ISWC 2015), vol. 9366. Bethlehem, PA, USA: Springer, Oct. 2015, pp. 458–473.
[26] J. Daiber, M. Jakob, C. Hokamp, and P. N. Mendes, "Improving Efficiency and Accuracy in Multilingual Entity Extraction," in Proceedings of the 9th International Conference on Semantic Systems. Graz, Austria: ACM, Sep. 2013, pp. 121–124.
[27] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced Data Management: A Survey," IEEE TKDE, vol. 28, no. 9, pp. 2296–2319, Sept 2016.
[28] D. Oleson, A. Sorokin, G. P. Laughlin, V. Hester, J. Le, and L. Biewald, "Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing," in Proceedings of the AAAI Workshop on Human Computation (AAAIWS'11), San Francisco, CA, USA, Aug. 2011, pp. 43–48.