The Digital Archiving of Historical Political Cartoons: An Introduction

Junte Zhang
Meertens Institute & NIOD Institute for War, Holocaust and Genocide Studies

Kees Ribbens
Erasmus University Rotterdam & NIOD Institute for War, Holocaust and Genocide Studies

Rob Zeeman
Meertens Institute

Royal Netherlands Academy of Arts and Sciences

DIR 2013, April 26, 2013, Delft, The Netherlands.
Copyright remains with the authors and/or original copyright holders.

ABSTRACT

Political (editorial) cartoons often capture the Zeitgeist of society and convey a message. Increasingly, historians study them to understand commentaries on past events or personalities. Visual culture as an academic subject could be greatly enhanced if this information can be digitally archived. We employ crowdsourcing to obtain valuable metadata by guiding volunteers' feedback using an online survey with 31 targeted questions. We provide intellectual access to a set of about 300 cartoons by a single creator, spanning multiple years, in a highly interactive search engine.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Search process; H.3.7 [Digital Libraries]: Systems issues, user issues; H.5.2 [Information interfaces and presentation]: Graphical user interfaces (GUI)

General Terms

Design, Human Factors

Keywords

metadata, crowdsourcing, e-Humanities, cartoons

1. INTRODUCTION

Newspapers often have political (editorial) cartoons that contain and disseminate a commentary about events or personalities [4]. For historians, these capture the Zeitgeist of the period they study, and become an invaluable source of information. Print newspapers are stored in libraries and get digitally archived (for example by the National Library of the Netherlands) for long-term preservation and continued access. Digital archiving is the management of the life cycle of digital assets (records) [2], from preservation to continued use.

In the Radical Political Representation project, we aim to digitally archive historical political cartoons created by a single cartoonist and published before and during the Second World War, so that we gain insight into different points of view and support the study of visual culture using a computational approach. This is made possible because newspaper pages, which contain the cartoons, have been digitized as images. The cartoons themselves are not yet machine-readable, so providing intellectual access is the best option. It has been proposed in [5] to detect the text lines in cartoons using OCR, but this is difficult because it involves handwritten texts. In [3] it is pointed out that “more descriptive areas by which images might be accessed are largely neglected,” and argued that subject indexing, as a field of academic work, concerns aboutness; VRA Core 4.0 is referred to there as a metadata schema to record bibliographic information.

Our aim is to transcribe a cartoon, and to move beyond standard bibliographic information by comprehensively capturing its meaning(s) for historical research, eliciting user feedback through crowdsourcing. We therefore address the following question: How can we provide intellectual access to, and allow for advanced use of, these cartoons?

2. CROWDSOURCING OF CARTOONS

The objects of our study are so far 286 cartoons published by Maarten Meuldijk in the weekly Volk en Vaderland (VoVa) of the National Socialist Movement in the Netherlands from 1937 to 1942. The pages on which they occur have been digitized by the National Library. To obtain descriptions of the cartoons, we experiment with crowdsourcing to see whether it is applicable in our context. The search tasks that we have in mind are more complex, therefore we created a comprehensive survey that captures the questions historians typically would ask about a cartoon. This also requires more contextual knowledge.

Fig. 1(a) shows the VoVa Annotation Editor developed in Adobe Flex, where we guide users through a set of 31 targeted questions in 8 stages, and aid them by offering pre-defined multiple choice answers in combination with open answers. Users can zoom in and out on a cartoon, and can also read contextual information related to the cartoon in the articles on the page, a strategy used by a number of users. There are no time limits, and a cartoon is randomly assigned and stays assigned to a user until completion. To control for the completion of a cartoon description, we validate that every question has at least 1 given answer.

We invited interested volunteers online and in printed national media. In total, 189 users registered, of whom 83 participated with at least 1 completed description of a cartoon; 5 users completed more than 10.

Figure 1: The digital archiving of cartoons with the VoVa Annotation Editor and Search Engine.
(a) The VoVa Annotation Editor, where volunteers can provide valuable metadata about the cartoon, ranging from plain descriptions to their opinion of a cartoon.
(b) The VoVa Search Engine, which is used to gain intellectual (advanced) access to the cartoons.

3. SERENDIPITY IN CONTEXT

Having obtained the metadata, we want to use it. Since the search engine should serve historians, we design it to support serendipitous search and to be highly interactive, in order to focus on high recall (rather than precision). The system has been designed to maximize the user's ability to explore. We have proposed search features to support serendipitous and focused access in [6], and these features have been re-implemented here. The search features primarily deal with query expansion, recommendation, and interactive visualizations of aggregated results. Query expansion and recommendation are based on using ternary search trees for spellchecking, returning the top term vectors related to the original query, and returning the top terms that have the original query as a substring. The visualizations are based on charts, maps and word clouds.

A user can improve the search within a session by effectively reducing the information space step by step, i.e. by incrementally combining questions. This conforms to the berrypicking model of [1]: queries are not static, but rather evolve, and users “gather information in bits and pieces instead of in one grand best retrieved set.” These steps are stored as part of the search trail, so the overview is kept. The user interface of the system is depicted in Fig. 1(b). In this example, someone looked for a cartoon with “Jood” (Jew) used as a main keyword to describe it, with captions under it, and with a “ster” (star) depicted as a symbol. The search engine treats the questions asked in the survey as facets, and is therefore a straightforward question-answering system. Facets that always appear are the date of publication of a cartoon, and the education and knowledge levels of the volunteers who provided the descriptions. We show in [7] that making the credibility of the source transparent gives users greater confidence in their selection. We think historians will be aided by this part of the search process.

Different search strategies are possible. Users can search by full text or focused (within the answers to questions). The query is highlighted in context, given the full text and the survey question. A dynamic word cloud widget that supports query expansion is activated only when autocompletion is used. Using the Advanced Search option, users can look up a question and then enter a keyword, also with the autocompletion feature. Wildcard (empty) queries can be used to obtain, for a given question, the distribution of words in its answers as a word cloud for a quick summary.

4. CONCLUSIONS

We have presented, in a compressed version, the mission statement and some results of the Radical Political Representation project. We completed the first phase of crowdsourcing, and pending further releases of data by the National Library, we can further digitally archive the complete series of Meuldijk cartoons. The technical infrastructure to digitally archive political cartoons has been set up. This means we can expand our scope to other cartoonists in different times; there is no shortage of cartoons. We can refine our survey to allow for a wider range of information needs of historians, or embed our survey as part or extension of a formal metadata schema like VRA Core. We will improve the UI, further implement useful information visualization of results, and evaluate the search engine. It can be used at www.meertens.knaw.nl/vova/search.

5. REFERENCES

[1] M. J. Bates. The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5):407–424, 1989.
[2] G. M. Hodge. An information life-cycle approach: Best practices for digital archiving. Journal of Electronic Publishing, 5(4):1–14, 2000.
[3] C. Landbeck. Issues in subject analysis and description of political cartoons. Advances in Classification Research Online, 19(1), 2008.
[4] C. Sterling. Encyclopedia of Journalism. Sage, 2009.
[5] Y. Wu. Searching digital political cartoons. In Proceedings of the 2010 IEEE International Conference on Granular Computing, GRC '10, pages 541–545, Washington, DC, USA, 2010. IEEE Computer Society.
[6] J. Zhang. Supporting serendipitous and focused search. In EuroHCIR, volume 909 of CEUR Workshop Proceedings, pages 79–82, 2012.
[7] J. Zhang, A. Amin, H. S. M. Cramer, V. Evers, and L. Hardman. Improving user confidence in cultural heritage aggregated results. In SIGIR, pages 702–703, 2009.
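As an illustrative note on Section 3, the idea of treating survey questions as facets, combining them step by step in a search trail, and using a wildcard (empty) query to summarize the answers to one question could be sketched roughly as follows. This is a minimal sketch under our own assumptions and not the implementation of the VoVa Search Engine: the record layout, the facet names (`keyword`, `symbol`, `caption`), the helper functions, and all sample values other than “Jood” and “ster” are hypothetical.

```python
from collections import Counter

# Hypothetical records: one dict per cartoon description, mapping a survey
# question (used here as a facet) to the volunteer's answer. Only "Jood" and
# "ster" come from the example in the text; everything else is made up.
DESCRIPTIONS = [
    {"id": 1, "keyword": "Jood", "symbol": "ster", "caption": "yes"},
    {"id": 2, "keyword": "Jood", "symbol": "vlag", "caption": "no"},
    {"id": 3, "keyword": "arbeider", "symbol": "ster hamer", "caption": "yes"},
]


def refine(records, question, term=""):
    """Keep records whose answer to `question` contains `term`.

    Matching is a case-insensitive substring test. An empty term acts as a
    wildcard and keeps every record that has an answer to the question.
    """
    term = term.lower()
    return [
        r for r in records
        if question in r and term in str(r[question]).lower()
    ]


def run_trail(records, trail):
    """Apply a search trail, i.e. a list of (question, term) steps, in order."""
    result = records
    for question, term in trail:
        result = refine(result, question, term)
    return result


def word_distribution(records, question):
    """Count the words in all answers to one question (word cloud input)."""
    counts = Counter()
    for r in records:
        counts.update(str(r.get(question, "")).lower().split())
    return counts


# Two incremental steps: first the main keyword, then the depicted symbol.
trail = [("keyword", "jood"), ("symbol", "ster")]
hit_ids = [r["id"] for r in run_trail(DESCRIPTIONS, trail)]

# Wildcard query on one question: summarize all of its answers.
symbol_counts = word_distribution(refine(DESCRIPTIONS, "symbol"), "symbol")
```

Keeping the trail as an explicit list of steps is one simple way to preserve the overview of how a result set was reached; a production system would more likely delegate the filtering and counting to a search index.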