=Paper=
{{Paper
|id=Vol-1992/paper_5
|storemode=property
|title= City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data
|pdfUrl=https://ceur-ws.org/Vol-1992/paper_5.pdf
|volume=Vol-1992
|authors=Shaban Shabani,Maria Sokhn,Laura Rettig,Philippe Cudré-Mauroux,Lukas Beck,Claudiu Tanase,Heiko Schuldt
|dblpUrl=https://dblp.org/rec/conf/histoinfo/ShabaniSRCBTS17
}}
== City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data==
Shaban Shabani, Maria Sokhn (Institute of Information Systems, HES-SO Valais-Wallis, Switzerland, {firstname.lastname}@hevs.ch)

Laura Rettig, Philippe Cudré-Mauroux (eXascale Infolab, University of Fribourg, Switzerland, {firstname.lastname}@unifr.ch)

Lukas Beck, Claudiu Tănase, Heiko Schuldt (Databases and Information Systems, University of Basel, Switzerland, {firstname.lastname}@unibas.ch)
ABSTRACT

Information systems used in tourism rely mostly on up-to-date content on attractive places. In addition, these systems increasingly make use of archived photographs, documents, films, or even ancient paintings and other artwork by integrating such curated content from museums and memory institutions, possibly enriched with user-provided content. Hence, the distinction between cultural heritage applications and tourism is increasingly blurred. Users are not only interested in the current appearance of landscapes, monuments, or buildings, but also in the evolution of these places over time. This requires large multimedia collections which integrate content from several cultural heritage institutions. As a consequence, interactive retrieval systems for historical multimedia are needed that support homogeneous content-based and semantic querying despite the heterogeneity of these collections. In this paper we present City-Stories, a multimedia hybrid content and entity retrieval system. City-Stories is based on a state-of-the-art open source multimedia retrieval system. Multimedia features in City-Stories represent multiple semantic levels: low-level (e.g., color, edge, motion), mid-level (e.g., date, location, objects), and high-level features (e.g., semantic entities, scene category). For the latter, City-Stories applies entity recognition and entity linking for identifying semantic concepts and linking objects across media types. Consequently, City-Stories supports various types of cross-modal queries. Moreover, City-Stories uses a map-based visualization layer that facilitates spatial queries and browsing. Finally, City-Stories follows a crowdsourcing approach for content annotation and for enriching curated content with multimedia objects and documents provided by users. The paper shows how the City-Stories system seamlessly combines content-based search with entity-based navigation and leverages the wisdom of the crowd for manual annotations.

CCS CONCEPTS

• Information Systems → Multimedia and Multimodal Retrieval; Social tagging systems; Multimedia Databases; Spatial-temporal Systems; • Computing Methodologies → Semantic Networks; • Human-centered Computing → Collaborative and Social Computing Systems and Tools;

KEYWORDS

Content-based Retrieval, Spatio-temporal Querying, Multimedia Databases, Multimodal Interaction, Historical Multimedia, Crowdsourcing, Entity Linking

ACM Reference format:

Shaban Shabani, Maria Sokhn, Laura Rettig, Philippe Cudré-Mauroux, Lukas Beck, Claudiu Tănase, and Heiko Schuldt. 2017. City-Stories: A Multimedia Hybrid Content and Entity Retrieval System for Historical Data. In Proceedings of the 4th International Workshop on Computational History, Singapore, November 2017 (HistoInformatics 2017), 8 pages.

1 INTRODUCTION

Multimedia data on places of interest like documents, photos, videos, or user ratings are the most important sources used in information systems for tourists. While systems have so far focused on up-to-date content, historical material taken from archives is increasingly gaining importance in order to give tourists more information on how particular touristic sites have developed over time, i.e., how they looked 20, 50, 100, or even more years ago. With the content from museums and memory institutions, the distinction between cultural heritage applications and information systems for tourism is increasingly blurred, despite the fact that content may differ significantly (in terms of media types and formats, age, availability and degree of detail of metadata/annotations, etc.). Moreover, content provided by users out of their private archives is also gaining importance due to the proliferation of social networks and crowdsourcing platforms.

In order to provide integrated access to such heterogeneous content, several important technical challenges need to be addressed:

Multimedia Retrieval. The integrated content should be accessible by a very broad range of different query types, such as keyword queries to search in (manual) textual annotations, query-by-example (multimedia search with sample objects), query-by-sketch (multimedia similarity search on the basis of hand-drawn sketches), semantic queries that exploit semantic concepts and links between objects, spatio-temporal queries (i.e., queries on the location and/or time where/when a particular object has been created), and any combination of these modes.

Entity Recognition and Linking. Content coming from different sources, in different formats, and possibly also with different metadata structures has to be integrated to make sure that it can be accessed via a homogeneous interface. This includes standard approaches to schema and data integration, but also more advanced
HistoInformatics 2017, November 2017, Singapore S. Shabani et al.
and innovative challenges like entity recognition and entity linking to make sure that links between objects (of the same or even of different media types) can be identified, stored as part of the metadata, enhanced with further external sources, and subsequently exploited for query purposes.

Crowdsourcing. In addition to cultural heritage content curated by archives, user-generated content from private collections is gaining importance in touristic information systems. In order to attract the attention of potential content providers, the awareness of such touristic platforms has to be raised, the technical barrier for contribution has to be lowered, and users have to be encouraged to actively participate. This is not only true for the provision of new content but also for annotations to existing content (e.g., ratings or experience reports). The rapid adoption of smartphones has made it possible to also exploit mobile crowdsourcing [1] as an efficient and easy way of reaching and using human intelligence and machine computation for solving Human Intelligence Tasks (HITs).

In this paper, we introduce City-Stories, a novel and innovative system for collecting, managing, and accessing heterogeneous cultural heritage content for touristic applications. The City-Stories browser allows users to retrieve spatio-temporal knowledge and supports real-time interactive search in large databases of historical multimedia collections. From a systems perspective, City-Stories is based on vitrivr (https://vitrivr.org) [2], which in turn uses the retrieval engine Cineast [3] and the distributed database backend ADAMpro [4]. Information is extracted both from content and metadata and can be simultaneously queried in both modes. City-Stories' content-based search extends the functionality of the vitrivr engine, which allows interactive, efficient multi-feature retrieval in large multimedia collections. Metadata of the multimedia objects (automatically generated or manually added) is used to enrich the description of content in several ways:

• Low-level features like color, edge, motion, mid-level features like date, location, and high-level features like semantic entities or scene categories.
• Spatio-temporal metadata in the collection is directly imported as vector location and timestamp features; the City-Stories frontend allows for spatial, temporal, and spatio-temporal queries and the results are displayed in a map and on a timeline, respectively.
• Manual annotations and user ratings are provided via crowdsourcing.
• Textual metadata (usually in the form of title and description of an item) is subjected to entity extraction, which yields Uniform Resource Identifiers (URIs) in a knowledge base. These entity URIs are then used to pre-compute semantic entity distances between collection items.

Hence, City-Stories combines content-based search with entity-based navigation and leverages the wisdom of the crowd for manual annotations.

The contribution of the paper is threefold. First, we show how multi-feature content-based similarity search providing a plethora of query types can be enriched by entity recognition and entity linking, thereby allowing for cross-media retrieval based on semantic concepts. Second, we show how curated collections and automatically generated metadata can be extended by user-generated content and user-provided metadata, which is particularly relevant in applications for tourists and which complements manually curated cultural heritage collections. Third, we show how all these elements can be seamlessly combined in the City-Stories system.

The remainder of the paper is structured as follows: Section 2 motivates the City-Stories approach with a tourism use case. Section 3 discusses the components needed for the integrated multimedia content and entity retrieval system and Section 4 presents details of the City-Stories system. Section 5 summarizes related work and Section 6 concludes.

Figure 1: City-Stories conceptual architecture.

2 MOTIVATION

Consider, as an example for City-Stories, the following use case: Sophia, a tourist from Dublin, is visiting the city center of Berlin. When standing in front of the Brandenburg Gate, one of Berlin's neoclassical city gates, she has a variety of questions regarding the building and its neighborhood, like 'What building is this, what was its purpose, when and by whom was it built?' or 'What did the neighborhood of the gate look like around 1900, in the so-called "golden" 1920s, shortly after the end of WW2, in the 1970s, before and after the fall of the iron curtain in 1989, or at any other point in time in the past?'.

Sophia holds a smartphone on which she accesses the City-Stories query interface. The City-Stories system integrates several cultural heritage multimedia collections, e.g., from the German federal archive or from focused museum collections. Moreover, City-Stories encompasses a large number of photos and associated metadata provided by local citizens. Using City-Stories, Sophia is able to directly browse the content and submit queries of different types:

• Simple location queries: Using the GPS coordinates of her current location, Sophia is able to identify the building
she is currently looking at and get access to basic information regarding this building, combined from several data sources on the web.
• Temporal queries: On the basis of information from various sources that have been integrated beforehand into the City-Stories system, Sophia is able to query details of the building's history (photos or historical paintings from different stages of the building). Moreover, she will also get information on the building's neighborhood at different points in time, on historical events that took place there, and statistical information (e.g., population of the city at different points in time). The latter is based on linked metadata, i.e., metadata enriched with links between objects after entity recognition has been applied to the content.
• Combined spatial and multimedia similarity queries: These queries search for similar buildings (or buildings that take a similar role) in the vicinity of Sophia's current location.
• Multimedia similarity queries: Sophia takes a photo of the Brandenburg Gate with the camera of her smartphone. She will use this photo for a similarity query in order to find other buildings (in Berlin or in any other European city) that look similar. Here, similarity can be defined either by intrinsic image features, on the basis of the object's metadata or links, or any combination of these.
• Combined sketch-image similarity queries: Sophia provides one of the photos of the Brandenburg Gate taken with her smartphone as query input and adds a superimposed sketch (e.g., she draws a typical Berlin double-deck coach in the foreground).

Most importantly, Sophia does not want to use a dedicated smartphone app provided by a local tourist organization with manually curated content specialized for a particular touristic site, as she would have to install a new app every time she visits another place. Rather, Sophia is interested in City-Stories, a generic approach that can be used to integrate and access content independent of a concrete location, so she could use this system for her next trips to Singapore, to visually explore the recent growth of the Marina area, or to New York, for instance to visualize the development of Lower Manhattan over the past 120 years.

Figure 2: Content-based similarity search in Cultural Heritage content.

3 CONCEPTS

Figure 1 illustrates the different contributions of the City-Stories system and their interplay.

3.1 Spatio-Temporal Retrieval Engine

The multimedia retrieval engine of City-Stories, which is based on the vitrivr system, offers multiple different query modes: query-by-example (QbE), query-by-sketch (QbS), temporal, and spatial queries. In a given query we can freely combine these different modes, e.g., by providing an exemplary image, drawing a sketch on top of an existing or a provided image, and/or specifying a location. To provide this functionality, we extend the software stack
of vitrivr including Cineast, the extraction and retrieval engine, and ADAMpro, the database engine.

In Cineast, we differentiate between an on-line and an off-line phase. The off-line phase includes the feature extraction and the storage of the resulting metadata. In the on-line phase, where the actual retrieval happens, the engine executes a given query and returns a list of documents ordered by similarity.

Cineast uses multiple features in combination to describe a document. In total, there are already 40 content descriptors for videos and images provided by Cineast. These descriptors extract mainly color, edge, and motion information (where applicable). Building on top of that, we extend Cineast with the following higher level descriptors:

• Spatial and temporal similarity features by using geolocalization and timestamp metadata provided by the content (usually via the capturing devices).
• A descriptor using semantic entities that are provided by the entity recognition described in what follows in Section 3.2.
• Semantic concept features provided by a deep neural network.

Figure 3: Screenshot of the spatio-temporal content-based retrieval front-end of City-Stories [5].

3.2 Entity Recognition and Linking

In order to integrate the archive data available to us with other data sources such as knowledge bases, we apply named-entity recognition, candidate selection, and entity linking techniques on available text data. Named-entity recognition is the task of identifying mentions of entities, which can take various surface forms, in text. Once an entity mention has been extracted, the corresponding URI in the knowledge base has to be found. Selecting the URI corresponding to a mention of an entity in text is called entity linking.

By linking to a knowledge base, specifically to DBpedia (http://dbpedia.org) [6], we enhance our data with relevant information on the entity in the knowledge base and can link data from further sources to the same entities for integration.

In order to identify entities associated with media items, we use textual media metadata and extract entities from associated titles and descriptions. Consequently, when displaying media items, additional information from the knowledge base can be retrieved and displayed. Knowledge bases tend to offer various attributes for entities, many of which will not be relevant to the viewer. Thus,
when choosing which information to display, we also rank the attributes by their importance to the viewer (for example, when viewing information on a city, the population and the founding year will likely be more relevant to a tourist than the ZIP codes in this city).

Furthermore, we are able to leverage the relationships between media items by extracting the relationships between entities in a graph-structured knowledge base. This also allows us to transitively extract related information, e.g., for entities with little available data. Oftentimes, in such graph structures, we can rely on higher-level categories to provide general information on an entity for which specific information may not be available.

Figure 4: Entity recognition and linking process. (Example shown in the figure: for the title "Arrivée du premier TGV des neiges en gare de Martigny", entity recognition yields the surface forms "TGV" and "Martigny"; candidate selection looks up candidates for each surface form in the database of surface form - entity pairs, e.g., http://fr.dbpedia.org/resource/Martigny with score 0.9; entity linking produces "Arrivée du premier TGV {http://fr.dbpedia.org/resource/TGV} des neiges en gare de Martigny {http://fr.dbpedia.org/resource/Martigny}".)

3.3 Crowdsourcing

The crowdsourcing component provides the possibility of enriching content with new data provided by different types of users and enhancing the integrated digital collection data.

Collected data is not always complete, i.e., it could be noisy and/or miss particular information. In a tourist application, for instance, information provided by users could miss the location where a point of cultural interest can be found, to whom it belongs, or even the title or a description. Likewise, data might be incorrect or conflicting as a result of the integration of collected datasets coming from different sources. Hence, these challenges are grouped into two categories: conflicting information and missing information, both of which are addressed in City-Stories using crowdsourcing.

Crowdsourcing is able to build an open, connected, and smart cultural heritage with involved consumers and providers [7]. The crowdsourcing service built into City-Stories enables volunteers to engage in order to share new data, as well as complete the existing data and improve data quality.

In order to optimize the assignment of tasks to the appropriate crowds, we make use of push crowdsourcing [8]. In contrast to standard pull crowdsourcing, where workers pull the tasks at random, push crowdsourcing models tasks based on users' profiles and pushes them to matching users. At first, users specify their interests and topics they have knowledge on. Then, by matching users' profiles with the available HITs, City-Stories recommends tasks to the best matched users based on their interests and skills.

In paid micro crowdsourcing scenarios, money as incentive is the main motivator for workers to contribute and can also be used for quality control (e.g., by making payment conditional on the quality of the work that has been done). However, it is important to distinguish malicious users from workers that do not have the intention of fraud but may affect the overall system due to misunderstandings or lack of knowledge and experience with the platform [9, 10]. As the latter scenario is more likely to happen in our case, we apply two quality control mechanisms to evaluate the quality of volunteers' work in City-Stories: play cards and a weighted majority voting with reputation system.

4 CITY-STORIES SYSTEM

In what follows, we describe details on the implementation of the different components of City-Stories and the content that has been integrated in a first prototype.

4.1 Spatio-temporal Multimedia Browser

When querying the retrieval engine via the City-Stories UI, for each given feature, Cineast performs an extraction on the given query document resulting in a feature vector. Each feature vector is passed to the ADAMpro database backend to perform a k-nearest neighbor (k-NN) similarity search to find a list of similar documents. ADAMpro has been shown to scale to collection sizes of up to 50 million entries and feature vectors with up to 500 dimensions [4]. After receiving a list of documents ranked by similarity for each feature, the Cineast retrieval engine merges these results into one result list (depicted in Figure 2).

In particular, for the spatial and temporal similarity search we use a nearest neighbor query on the two-dimensional geolocation data and the one-dimensional timestamp data, respectively (see Figure 3).

To search for similar documents based on semantic properties, we provide two different kinds of features:

• Based on entity recognition and linking, each document is characterized by a list of semantic entities. Using this list of semantic entities we calculate a pairwise distance to estimate the similarity between two documents.
• The similarity based on semantic concepts utilizes an AlexNet convolutional network [11]. Using the output of the last hidden fully connected layer, fc7, we obtain a 4096-dimensional feature vector for a given image that can be used in a k-NN search. We provide two different features by training the network on different datasets. Training on the Places2 dataset [12] yields a feature focusing on scene and environment similarity. In contrast, training on the data from the MS COCO Detection challenge [13] yields a feature focusing on object similarity inside a scene.
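The per-feature k-NN search and the merging of the ranked lists described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the Cineast or ADAMpro implementation: the function names `knn` and `fuse`, the brute-force scan, the Euclidean distance, the 1/(1+d) mapping from distance to similarity, and the weighted-sum fusion are all hypothetical choices, since the text does not specify how the per-feature result lists are merged. The toy vectors are made up.

```python
import heapq
import math
from collections import defaultdict


def knn(query, vectors, k):
    """Brute-force k-NN: return the k closest (doc_id, distance) pairs."""
    scored = ((math.dist(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [(doc_id, dist) for dist, doc_id in heapq.nsmallest(k, scored)]


def fuse(hits_per_feature, weights):
    """Merge per-feature hit lists into one ranking (weighted sum of similarities)."""
    total_weight = sum(weights[feature] for feature in hits_per_feature)
    scores = defaultdict(float)
    for feature, hits in hits_per_feature.items():
        for doc_id, dist in hits:
            similarity = 1.0 / (1.0 + dist)  # assumed distance-to-similarity mapping
            scores[doc_id] += weights[feature] / total_weight * similarity
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy index: one vector per document and feature (all values are made up).
index = {
    "color": {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)},
    "location": {"a": (46.10, 7.07), "b": (46.23, 7.36), "c": (46.10, 7.08)},
}
query = {"color": (0.1, 0.0), "location": (46.10, 7.07)}

# One k-NN search per feature, then a single merged result list.
hits = {feature: knn(query[feature], index[feature], k=2) for feature in index}
ranking = fuse(hits, weights={"color": 1.0, "location": 1.0})
print([doc_id for doc_id, _ in ranking])
```

In this sketch a document that appears in several per-feature lists accumulates score from each of them, so combining a query image with a location favors items that match on both. Treating latitude and longitude as a plain two-dimensional vector is a simplification that is only reasonable for small geographic areas.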
4.2 Entity Recognition and Linking

The data integration component of the City-Stories system focuses on leveraging textual data to integrate different sources and schemata. In the implementation, we use metadata provided with the media items in the DigitalValais (http://www.valais-wallis-digital.ch/) dataset, specifically, title and description corresponding to each item, to extract and link related entities.

The implementation consists of three steps (see Figure 4):

(1) Named-entity recognition: Using the Stanford Named Entity Recognizer (NER) [14], we extract likely mentions of entities from the full text, which may be in different languages (German, French or any other local language).
(2) Candidate selection: This step consists in choosing a set of candidate entities that may correspond to the extracted surface form. We choose a set of candidate entities for each extracted mention in the text.
(3) Entity linking: We rank the candidates and choose the best matching entity for each mention, then add the DBpedia URI to the media item in the City-Stories database.

Candidates are selected from a database of pairs of surface forms and entities (given as DBpedia URIs). This database is created by processing all Wikipedia articles and extracting hyperlinks where the hyperlink text corresponds to the observed surface form (i.e., the link text) and the link location corresponds to the entity this surface form links to (i.e., the linked article). The frequency of observing a specific (surface form, entity) pair yields a prior probability that a particular surface form corresponds to the linked entity. We rank the candidates using this prior score and select the top candidate to link this mention to an entity in the knowledge base.

Having done the linking, we are able to create a graph of the media items where an edge is present between two items if they contain the same entity and are thus related. Knowing the DBpedia URI of an entity found in the metadata of a media item, we extract relevant attributes for this entity by ranking the attributes by their frequency of appearing in close proximity to this type of entity.

4.3 Crowdsourcing

In parallel to collecting data from institutions such as audio/visual archives, Mediatheque (http://www.mediatheque.ch/), and DigitalValais, we emphasize the importance of data sharing from people that have valuable data and information about cultural heritage in private collections. This part of the system enables users to participate and contribute to cultural heritage. Once shared, users' data is integrated into the data repository. In order to maintain a high level of quality of the crowdsourced data, we apply the following control methods (shown in Figure 5):

Play cards is a test measure used to qualify or disqualify users for solving tasks of a certain category. We use 195 playing cards grouped in 13 different categories, where categories represent subject areas of the crowdsourcing tasks. Each card contains both a question and its answer (not visible to the user). At first, users provide information on their topics of interest, which intersect with card categories. Then they are forwarded to the test phase. To get qualified, they have to correctly answer at least 70% of the questions matched to their interests. Upon two consecutive failures, a user is no longer considered for tasks on these specific topics. Providing the option to choose topics they like or have knowledge on avoids false positives, i.e., eliminating users due to lack of knowledge on randomly assigned questions coming from a pool of predefined questions. Moreover, implementing this measure in a game fashion increases the interactivity and the interest of the users. Additionally, users can test their knowledge on topics covered by the cards and at the same time expand their knowledge with the additional information provided by these cards.

Figure 5: Crowdsourcing quality control and task management process.

A weighted majority voting with reputation system combines majority voting (MV) with users' reputation scores. MV [15] as a quality control mechanism assigns the same task to multiple users and aggregates the results to choose the right answer. On the other hand, the reputation strategy tracks users' performance
during crowdsourcing tasks and is complementary to majority voting. The users' results from the play cards qualification tests are assigned as initial reputation scores. These scores are later used as weights during the aggregation of the answers, i.e., an answer from a user with high reputation has higher weight. After each aggregation phase, the users' reputation scores are updated by considering their outcomes on that task.

A screenshot of the crowdsourcing frontend of City-Stories showing how data is shared, the play cards for quality control, and sample annotations is depicted in Figure 6.

Figure 6: Screenshot of the crowdsourcing front-end of City-Stories: (a) data sharing, (b) play cards, (c) annotation task sample.

5 RELATED WORK

In general, existing retrieval systems applied to the cultural heritage domain are either metadata-based or content-based, and few of them implement both [16]. Metadata-based systems focus on keyword-based queries and linking of digital objects with external data sources, whereas content-based systems focus on query-by-example, query-by-specification and browsing.

EUscreen (http://euscreen.eu) is a project related to multimedia archives that focuses on the collection, integration, and publication of audio-visual content. Oomen et al. [17, 18] created a European television archive using data from many different TV broadcasters. It provides an interface with keyword search. Media in Context (http://mico-project.eu) is a platform for cross-media extraction (via pipelined extractors), analysis, metadata publishing, and querying. Within this project lies Sensefy [19], a multimedia search and information retrieval system that provides metadata keyword search and object linking with real world entities and concepts. Otegi et al. [20] introduced Personalized PageRank, a tool for generating personalized recommendations in a cultural heritage collection. They use metadata, session logs from users, and Wikipedia as an external source to elicit recommendations. INVENiT [21] is a semantic search system used for cultural heritage collections, making use of links between objects and terms provided by structured vocabularies. Moreover, users can contribute by annotating collection objects. Their system lacks a trust assessment, which is a key issue for data quality. SCULPTEUR [22] is a multimedia retrieval system for searching digital collections in museums. It features content-based image retrieval as well as semantic retrieval using metadata and a semantic layer. An extension of this work [23] provides a hybrid model for cultural heritage collections, combining the two retrieval methods for image search.

Named-entity recognition is based largely on natural language processing techniques and is required as a step prior to performing candidate selection and named-entity disambiguation. The Stanford NER [14] is trained by combining a constraint model with a sequence model for the purpose of extracting information from text, including named entities. Prokofyev et al. [24] employ n-gram based features for NER and demonstrate their accuracy in idiosyncratic domains, which could also be applied to the domain of historical archive data. SANAPHOR [25] introduces the use of a type system on top of recognized entities. The relatedness of types is then used to identify co-references referring to the same entity, and to link these identified mentions of an entity to a DBpedia URI. DBpedia Spotlight [26] provides the entire pipeline from NER over candidate selection, i.e., retrieving candidate entities that may correspond to the extracted surface form, to entity linking. Their work has focused on the implementation of a usable system for multi-lingual entity extraction and linking.

Crowdsourcing has been shown to be an effective solution for problems that are difficult to solve for computers and problems that require human intelligence [27]. Its popularity has grown due to online platforms such as Amazon Mechanical Turk (https://www.mturk.com/mturk/) and CrowdFlower (https://www.crowdflower.com/), which allow crowds to participate in solving paid micro-tasks. Concerning quality control, a broadly used quality checker is the gold questions technique [28], a test measure to qualify users for solving tasks. However, this method alone leads to the elimination of honest workers who lack some knowledge. Aggregation
methods, also known as voting strategies, aim to avoid biased workers. Majority voting [15] is a redundancy mechanism that is widely applied to prevent spammers and lazy workers.

6 CONCLUSION

In this paper, we have presented City-Stories, a novel system that combines content-based similarity search on both historical and contemporary multimedia data with spatio-temporal queries and that exploits semantic analysis of the content for entity recognition and linking. We have deployed City-Stories in a tourist application where cultural heritage content from archives and memory institutions is complemented with content contributed by users via a crowdsourcing approach.

In our future work, we aim to increase the size of the collections available in City-Stories for several selected tourist locations to show the generic applicability of the City-Stories approach. We also intend to perform user studies at these locations to assess the usability of the system and the effectiveness of the integrated content and entity retrieval approach. Moreover, we plan to further exploit the synergies obtained from the combination of all retrieval modes supported in City-Stories, for instance by proactively making

Communities and Technologies (C&T'11). Brisbane, Australia: ACM, 2011, pp. 138–149.

[8] D. E. Difallah, G. Demartini, and P. Cudré-Mauroux, "Pick-a-crowd: Tell Me What You Like, and I'll Tell You What to Do," in Proc. of the 22nd Int'l World Wide Web Conf. (WWW'13). Rio de Janeiro, Brazil: ACM, May 2013, pp. 367–374.

[9] H. Li, B. Yu, and D. Zhou, "Error Rate Bounds in Crowdsourcing Models," ArXiv e-prints, Jul. 2013.

[10] P. G. Ipeirotis, F. Provost, and J. Wang, "Quality Management on Amazon Mechanical Turk," in Proceedings of the ACM SIGKDD Workshop on Human Computation (HCOMP '10). Washington DC, USA: ACM, 2010, pp. 64–67.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Curran Associates, 2012, pp. 1097–1105.

[12] B. Zhou, A. Khosla, A. Lapedriza, A. Torralba, and A. Oliva, "Places: An image database for deep scene understanding," arXiv preprint arXiv:1610.02055, 2016.

[13] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context," in Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), ser. Lecture Notes in Computer Science, vol. 8693. Zurich, Switzerland: Springer, Sep. 2014, pp. 740–755.

[14] J. R. Finkel, T. Grenager, and C. D. Manning, "Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), Michigan, USA, 2005, pp. 363–370.

[15] A. Sorokin and D. A. Forsyth, "Utility data annotation with Amazon Mechanical Turk," in CVPR Workshops, Anchorage, AK, USA, June 2008, pp. 1–8.

[16] C.-F. Tsai, "A review of image retrieval methods for digital cultural heritage resources," Online Information Review, vol. 31, no. 2, pp. 185–198, 2007.
[17] J. Oomen and V. Tzouvaras, “Publishing Europe’s Television Heritage on the
recommendations during the retrieval process and by providing Web: The EUscreen Project,” in Proc. of the 1s t International Workshop on Multi-
additional information from external sources. Finally, we aim to media for Cultural Heritage (MM4CH 2011), ser. Communications in Computer &
extend the user experience by providing within the user interface Information Science, vol. 247. Modena, Italy: Springer, 2012, pp. 136–142.
[18] J. Oomen, E. Verbruggen, V. Tzouvaras, and K. Hyyppä, “Television heritage
an overlay function that allows to superimpose the camera view linked and visualized: The EUscreen virtual exhibitions and the Linked Open
of a smartphone showing the current view of a place of interest Data pilot,” in Proceedings of the 2013 Digital Heritage International Congress
(DigitalHeritage 2013), vol. 2. Marseille, France: IEEE, 2013.
with historical content in order to better show the development [19] C. Perera and D. Jayakody, “Cross media entity and concept driven search,” in
of a particular place. When multiple visual objects of a place are Joint Proceedings of the Posters and Demos Track of the 12t h International Confer-
available from different periods of time (taken from the same or ence on Semantic Systems (SEMANTiCS’16) and the 1s t International Workshop
on Semantic Change & Evolving Semantics (SuCCESS’16), ser. CEUR Workshop
at least a similar perspective), this will lead to a “history browser” Proceedings, vol. 1695. Leipzig, Germany: CEUR-WS.org, Sep. 2016.
which can be used to steer the overlay with a slider on the timeline, [20] A. Otegi, E. Agirre, and P. Clough, “Personalised PageRank for Making Recom-
to select the object chosen for the overlay, and to gradually visualize mendations in Digital Cultural Heritage Collections,” in Proceedings of the 14t h
ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’14). London, United
the development. Kingdom: IEEE Computer Society, Sep. 2014, pp. 49–52.
[21] C. Dijkshoorn, J. van Ossenbruggen, L. Aroyo, and G. Schreiber, “INVENiT: Ex-
ACKNOWLEDGMENT ploring Cultural Heritage Collections While Adding Annotations,” in Proceedings
of the 3r d International Conference on Intelligent Exploration of Semantic Data
This work was partly funded by the Hasler Foundation in the context (IESD 2014), ser. CEUR Workshop Proceedings, vol. 1279. Riva del Garda, Italy:
of the project City-Stories. We would like to thank the cantonal CEUR-WS.org, Oct. 2014, pp. 95–99.
archives and the “Mediathèque” of the canton of Valais and the [22] S. Goodall, P. H. Lewis, K. Martinez, P. A. S. Sinclair, F. Giorgini, M. Addis, M. J.
Boniface, C. Lahanier, and J. Stevenson, “SCULPTEUR: multimedia retrieval for
team of Digital Valais project for delivering a data testbed. museums,” in Proceedings of the 3r d International Conference on Image and Video
Retrieval (CIVR 2004), ser. Lecture Notes in Computer Science, vol. 3115. Dublin,
REFERENCES Ireland: Springer, Jul. 2004, pp. 638–646.
[23] S. Vrochidis, C. Doulaverakis, A. Gounaris, E. Nidelkou, L. Makris, and I. Kompat-
[1] J. Ren, Y. Zhang, K. Zhang, and X. Shen, “Exploiting mobile crowdsourcing siaris, “A hybrid ontology and visual-based retrieval model for cultural heritage
for pervasive cloud services: challenges and solutions,” IEEE Communications multimedia collections,” International Journal of Metadata, Semantics and Ontolo-
Magazine, vol. 53, no. 3, pp. 98–105, March 2015. gies (IJMSO), vol. 3, no. 3, pp. 167–182, Feb. 2008.
[2] L. Rossetto, I. Giangreco, C. Tănase, and H. Schuldt, “vitrivr: A Flexible Retrieval [24] R. Prokofyev, G. Demartini, and P. Cudré-Mauroux, “Effective named entity
Stack Supporting Multiple Query Modes for Searching in Multimedia Collections,” recognition for idiosyncratic web collections,” in Proceedings of the 23r d Interna-
in Proceedings of the 2016 ACM Conference on Multimedia Conference (ACM MM tional Conference on World Wide Web (WWW ’14). Seoul, Republic of Korea:
2016), Amsterdam, The Netherlands, Oct. 2016, pp. 1183–1186. ACM, Apr. 2014, pp. 397–408.
[3] L. Rossetto, I. Giangreco, and H. Schuldt, “Cineast: A Multi-feature Sketch-Based [25] R. Prokofyev, A. Tonon, M. Luggen, L. Vouilloz, D. E. Difallah, and P. Cudré-
Video Retrieval Engine,” in Proc. of the International Symposium on Multimedia Mauroux, “SANAPHOR: ontology-based coreference resolution,” in Proceedings
(ISM 2014). Taichung, Taiwan: IEEE Computer Society, Dec. 2014, pp. 18–23.
[4] I. Giangreco and H. Schuldt, “ADAMpr o : Database Support for Big Multimedia of the 14t h International Conference on The Semantic Web (ISWC 2015), vol. 9366.
Bethlehem, PA, USA: Springer, Oct. 2015, pp. 458–473.
Retrieval,” Datenbank-Spektrum, vol. 16, no. 1, pp. 17–26, 2016.
[26] J. Daiber, M. Jakob, C. Hokamp, and P. N. Mendes, “Improving Efficiency and
[5] L. Beck and H. Schuldt, “City-Stories: A Spatio-Temporal Mobile Multimedia
Search System,” in Proc. of the IEEE International Symposium on Multimedia, San Accuracy in Multilingual Entity Extraction,” in Proceedings of the 9t h International
Jose, CA, USA, Dec. 2016, pp. 193–196. Conference on Semantic Systems. Graz, Austria: ACM, Sep. 2013, pp. 121–124.
[6] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hell- [27] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, “Crowdsourced Data Management:
mann, M. Morsey, P. van Kleef, S. Auer, and C. Bizer, “DBpedia – A large-scale, A Survey,” IEEE TKDE, vol. 28, no. 9, pp. 2296–2319, Sept 2016.
multilingual knowledge base extracted from Wikipedia,” Semantic Web Journal, [28] D. Oleson, A. Sorokin, G. P. Laughlin, V. Hester, J. Le, and L. Biewald, “Program-
vol. 6, no. 2, pp. 167–195, 2015. matic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing,” in
[7] J. Oomen and L. Aroyo, “Crowdsourcing in the cultural heritage domain: oppor- Proceedings of the AAAI Workshop on Human Computation (AAAIWS’11), San
Francisco, CA, USA, Aug. 2011, pp. 43–48.
tunities and challenges,” in Proceedings of the 5t h International Conference on