<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Hacking History: Automatic Historical Event Extraction for Enriching Cultural Heritage Multimedia Collections</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Roxane</forename><surname>Segers</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marieke</forename><surname>Van Erp</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lourens</forename><surname>Van Der Meij</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lora</forename><surname>Aroyo</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Guus</forename><surname>Schreiber</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bob</forename><surname>Wielinga</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">VU University Amsterdam</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jacco</forename><surname>Van Ossenbruggen</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Centre for Mathematics and Computer Sciences (CWI)</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Johan</forename><surname>Oomen</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Netherlands Institute for Sound and Vision</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Geertje</forename><surname>Jacobs</surname></persName>
							<affiliation key="aff3">
								<orgName type="institution">Rijksmuseum Amsterdam</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Hacking History: Automatic Historical Event Extraction for Enriching Cultural Heritage Multimedia Collections</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">C27D3808B42CA49007CD9C2FE08EEE8E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T08:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Within cultural heritage collections, objects are often grounded in a particular historical setting. This setting cannot currently be made explicit, because structured descriptions of events are either missing or not marked up explicitly. This poster reports a study on the automatic extraction of a historical event thesaurus from unstructured texts. We also present a demo in which relations between events and museum objects are visualised to support event- and object-driven search and browsing of two cultural heritage collections.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Events have recently gained attention in the knowledge representation community as valuable constructs <ref type="bibr" target="#b16">[4,</ref><ref type="bibr" target="#b19">7,</ref><ref type="bibr" target="#b20">8]</ref> that can help tie together relevant but as yet unconnected pieces of information. In the cultural heritage domain, knowledge about historical events is often concealed in textual descriptions that can only be accessed via keyword search. As a result, this knowledge cannot be reused across collections, because it is not part of the shared metadata and controlled vocabularies.</p><p>In this study, we investigate how historical events in unstructured text collections can be captured and modeled to create an event thesaurus for enriching metadata in cultural heritage collections. We adopt the Simple Event Model (SEM) <ref type="bibr" target="#b20">[8]</ref> to distinguish event types, actors, locations, and dates. We experiment with natural language processing (NLP) techniques to extract event names and their associated actors, dates and locations. Additionally, we show how the resulting preliminary event thesaurus is employed in a new platform for event- and object-driven search and browsing of the collections of the Rijksmuseum Amsterdam (RMA) and the Netherlands Institute for Sound and Vision (S&amp;V).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Event Extraction from Text</head><p>As no annotated historical document collections exist in Dutch, our approach focuses on extracting named events with minimal manual effort. For this study, we selected 3,724 historical Wikipedia articles as a test set. The event extraction process consists of three steps. In the first step, we recognize actor names and locations using the Stanford Named Entity Recognition system <ref type="bibr" target="#b14">[2]</ref>, adapted for Dutch historical texts. Dates are recognized via regular expressions. This step resulted in 18,623 candidates for actors (F-measure of 0.77), 7,023 locations (F-measure of 0.66) and 7,981 dates. In the second step, we use a pattern-based method for recognizing event names such as "French Revolution". We harvest patterns from the Web (e.g., "destroyed during the", "before the") using the Yahoo! search API<ref type="foot" target="#foot_0">5</ref> and a seed set of one hundred historical events. Patterns are ranked by their frequency of co-occurrence with two or more seed events <ref type="bibr" target="#b18">[6]</ref>. To retrieve event candidates, we apply the patterns to the Wikipedia corpus. The event candidates are then filtered using a threshold on the pattern score, resulting in a set of 2,444 unique events. The precision of this set is 56.3%.</p><p>In the third step, we associate events with actors, locations and dates. We experiment with both redundancy and co-occurrence of data on the Web, inspired by the work of Geleijnse et al. <ref type="bibr" target="#b15">[3]</ref> and Cilibrasi &amp; Vitanyi <ref type="bibr" target="#b13">[1]</ref>. Each combination of an event name and an actor, location or date is sent to Yahoo!, and a score is computed for each pair. We discovered 392 event names that were paired with an actor, a location and a date. Through manual evaluation we conclude the following: 71.9% (323) are correct event names, 45.6% (179) are correct actors, 41.1% (161) are correct locations and 51.5% (202) are correct dates.</p></div>
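<div xmlns="http://www.tei-c.org/ns/1.0"><p>The second and third steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study. The function names (rank_patterns, extract_candidates, ngd), the capitalised-phrase heuristic for event names and the summing of pattern scores per candidate are all hypothetical. Using the normalised Google distance of Cilibrasi &amp; Vitanyi as the pair score is likewise an assumption, since the study only states that its scoring was inspired by that work. Hit counts are passed in as plain numbers rather than fetched from a search API.</p>
<p><![CDATA[
```python
import math
import re
from collections import defaultdict


def rank_patterns(hits, min_seeds=2):
    """Rank patterns by the number of distinct seed events they occur with.

    `hits` is an iterable of (pattern, seed_event) pairs, for example
    harvested from search-result snippets. Patterns seen with fewer than
    `min_seeds` distinct seeds are dropped (assumed reading of
    "co-occurrence with two or more seed events").
    """
    seeds_per_pattern = defaultdict(set)
    for pattern, seed in hits:
        seeds_per_pattern[pattern].add(seed)
    scores = {
        pattern: len(seeds)
        for pattern, seeds in seeds_per_pattern.items()
        if len(seeds) >= min_seeds
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def extract_candidates(text, ranked_patterns, threshold):
    """Apply patterns to text and keep candidates scoring at least `threshold`.

    A candidate is the run of capitalised words that directly follows a
    pattern (a hypothetical heuristic). Its score is the sum of the scores
    of all pattern matches that produced it.
    """
    candidate_scores = defaultdict(int)
    for pattern, score in ranked_patterns:
        regex = re.compile(
            re.escape(pattern) + r"\s+((?:[A-Z]\w*)(?:\s+[A-Z]\w*)*)"
        )
        for match in regex.finditer(text):
            candidate_scores[match.group(1)] += score
    return {
        name: score
        for name, score in candidate_scores.items()
        if score >= threshold
    }


def ngd(f_x, f_y, f_xy, n_pages):
    """Normalised Google distance from hit counts (lower = more related).

    f_x, f_y: hit counts for each term alone; f_xy: hit count for both
    terms together; n_pages: assumed size of the search index.
    """
    if f_x == 0 or f_y == 0 or f_xy == 0:
        return math.inf
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_pages) - min(log_x, log_y))
```
]]></p>
<p>In such a sketch, an event name would be paired with the actor, location and date whose distance to it is smallest. How the scores were actually combined and thresholded is not specified here.</p></div>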
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Enrichment by Events</head><p>The extracted events are linked to the RMA and S&amp;V collections. In total, 35 unique events provide direct relations from 435 S&amp;V objects to 675 RMA objects. An additional 34 unique events provide links from 391 S&amp;V objects to 362 RMA objects, although these links are indirect, running through an element of the event instance (e.g., S&amp;V object - Actor - RMA object). We hypothesize that these links are useful for navigating cultural heritage collections.</p></div>
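<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distinction between direct and indirect links can be illustrated with a small sketch. The data structures and the names objects_with and link_collections are hypothetical, as are the object identifiers in the usage example below. We assume that each object carries a set of annotation terms and that each event in the thesaurus lists its associated actors, locations and dates. A direct link arises when an S&amp;V object and an RMA object are both annotated with the event itself, and an indirect link when both are annotated with the same element of that event.</p>
<p><![CDATA[
```python
from itertools import product


def objects_with(term, annotations):
    """Return the ids of all objects annotated with `term`, sorted."""
    return sorted(obj for obj, terms in annotations.items() if term in terms)


def link_collections(events, sv, rma):
    """Link S&V objects to RMA objects through events.

    events: dict mapping an event name to the set of its elements
            (actors, locations, dates).
    sv, rma: dicts mapping an object id to its set of annotation terms.
    Returns (direct, indirect): direct links are (sv_id, event, rma_id)
    triples; indirect links are (sv_id, event, element, rma_id) tuples.
    """
    direct, indirect = [], []
    for event, elements in sorted(events.items()):
        # Both objects mention the event itself.
        for s, r in product(objects_with(event, sv), objects_with(event, rma)):
            direct.append((s, event, r))
        # Both objects mention the same actor, location or date of the event.
        for element in sorted(elements):
            pairs = product(objects_with(element, sv), objects_with(element, rma))
            for s, r in pairs:
                indirect.append((s, event, element, r))
    return direct, indirect


# Illustrative data only; not taken from the actual collections.
events = {"Battle of Waterloo": {"Napoleon", "1815"}}
sv = {"sv:1": {"Battle of Waterloo"}, "sv:2": {"Napoleon"}}
rma = {"rma:1": {"Battle of Waterloo"}, "rma:2": {"Napoleon"}}
direct, indirect = link_collections(events, sv, rma)
```
]]></p>
<p>With this toy input, the first pair of objects is linked directly through the event, and the second pair only indirectly through a shared actor.</p></div>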
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">The Agora Demonstrator</head><p>The automatically generated event thesaurus is applied in a new historical event browser called Agora, which provides an integrated access route to museum objects from RMA and audio-visual material from S&amp;V. It is a first step towards a platform for investigating the added value of historical events and narratives for the exploration of integrated collections. For each event and object there is an automatically generated page that shows (1) all associated objects, e.g., museum and audio-visual objects; (2) all associated events and the type of their relationship, e.g., previous-in-time event, sub-event; (3a) the event descriptive metadata, e.g., actors, place, period; or (3b) the object descriptive metadata, organized in three groups, namely the biographical, material and semiotic dimensions (see Figure <ref type="figure" target="#fig_0">1</ref> for a screenshot); and finally (4) the navigation path. The current version of the event thesaurus will be extended to support searching for relations between events, such as temporal inclusion, causality and meronymy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion</head><p>In this paper, we presented a modular pipeline for capturing knowledge about historical events from Dutch texts. Compared with previous approaches (e.g., <ref type="bibr" target="#b17">[5]</ref>), it relies on a minimum of manual annotation and can be repurposed for other languages. To the best of our knowledge, this is the first work to extract events from unstructured Dutch text. Although our results are promising, more sophisticated techniques are necessary to obtain more fine-grained extractions and to define measures for the historical relevance of the extracted events. We also aim to find and represent relations between events, such as causality, meronymy and correlation.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Screenshot of object page in the Agora Event Browsing Demonstrator</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_0">http://developer.yahoo.com/search</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Acknowledgements</head><p>This research was funded by the CAMeRA Institute of the VU University Amsterdam and by the CATCH programme, NWO grant 640.004.801.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The Google similarity distance</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cilibrasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vitanyi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="370" to="383" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Incorporating non-local information into information extraction systems by Gibbs sampling</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Finkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Grenager</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)</title>
				<meeting>the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Instance classification using co-occurrences on the web</title>
		<author>
			<persName><forename type="first">G</forename><surname>Geleijnse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Korst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>De Boer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ISWC 2006 workshop on Web Content Mining (WebConMine)</title>
				<meeting>the ISWC 2006 workshop on Web Content Mining (WebConMine)<address><addrLine>Athens, GA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006-11">November 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Automatic event-based indexing of multimedia content using a joint content-event model</title>
		<author>
			<persName><forename type="first">N</forename><surname>Gkalelis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mezaris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kompatsiaris</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM Events in MultiMedia Workshop (EiMM10)</title>
				<imprint>
			<date type="published" when="2010-10">Oct 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Exploiting semantic web technologies for intelligent access to historical documents</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Woolner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth Language Resources and Evaluation Conference (LREC)</title>
				<meeting>the Fourth Language Resources and Evaluation Conference (LREC)<address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="2177" to="2180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning dictionaries for information extraction by multilevel bootstrapping</title>
		<author>
			<persName><forename type="first">E</forename><surname>Riloff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of AAAI &apos;99</title>
				<meeting>AAAI &apos;99</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="474" to="479" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">LODE: Linking open descriptions of events</title>
		<author>
			<persName><forename type="first">R</forename><surname>Shaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hardman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th Annual Asian Semantic Web Conference (ASWC&apos;09)</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Abstracting and reasoning over ship trajectories and web data with the Simple Event Model (SEM)</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">R</forename><surname>Van Hage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Malaisé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>De Vries</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schreiber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Someren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
