<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The Secret Life of Wikipedia Tables</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tobias</forename><surname>Bleifuß</surname></persName>
							<email>tobias.bleifuss@hpi.de</email>
						</author>
						<author>
							<persName><forename type="first">Leon</forename><surname>Bornemann</surname></persName>
							<email>leon.bornemann@hpi.de</email>
						</author>
						<author>
							<persName><forename type="first">Dmitri</forename><forename type="middle">V</forename><surname>Kalashnikov</surname></persName>
							<email>dmitri.vk@acm.org</email>
						</author>
						<author>
							<persName><forename type="first">Felix</forename><surname>Naumann</surname></persName>
							<email>felix.naumann@hpi.de</email>
						</author>
						<author>
							<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
							<email>divesh@att.com</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Hasso Plattner Institute</orgName>
								<orgName type="institution">University of Potsdam</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Hasso Plattner Institute</orgName>
								<orgName type="institution">University of Potsdam</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Hasso Plattner Institute</orgName>
								<orgName type="institution">University of Potsdam</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">AT&amp;T Chief Data Office</orgName>
								<address>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The Secret Life of Wikipedia Tables</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F8458002826AC09F95CA25F951964471</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T08:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Tables on the web, such as those on Wikipedia, are not the static grid of values that they seem to be. Rather, they have a life of their own: they are created under certain circumstances and in certain webpage locations, they change their shape, they move, they grow, they shrink, their data changes, they vanish, and they re-appear. When users look at web tables or when scientists extract data from them, they are most likely not aware that behind each table lies a rich history.</p><p>For this empirical paper, we have extracted, matched and analyzed the entire history of all 3.5 M tables on the English Wikipedia for a total of 53.8 M table versions. Based on this enormous dataset of public table histories, we provide various analysis results, such as statistics about lineage sizes, table positions, volatility, change intervals, schema changes, and their editors. Apart from satisfying curiosity, analyzing and understanding the change-behavior of web tables serves various use cases, such as identifying out-of-date values, recognizing systematic changes across tables, and discovering change dependencies.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">EMPIRICAL RESEARCH ON THE WEB</head><p>Traditionally, empiricism plays a minor role in the theory- and engineering-oriented field of our research community, while it has played a significant role in other disciplines of computer science (e.g., <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>). Hardly ever do we pause to analyze and reflect on the observable, "natural" behavior of data and systems. A notable exception is the area of data quality research <ref type="bibr" target="#b21">[22]</ref>.</p><figure><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example of an evolving table in Wikipedia (from https://en.wikipedia.org/?diff=prev&amp;oldid=541341520).</figDesc></figure><p>One example of such observable behavior is that of web tables, in particular those that are edited collaboratively. The tables on Wikipedia, for instance, form a very heterogeneous, semi-structured data lake. Such web tables are used for a variety of purposes, as recently surveyed in <ref type="bibr" target="#b9">[10]</ref>, including entity extraction and fact generation <ref type="bibr" target="#b15">[16]</ref>, improving web search <ref type="bibr" target="#b20">[21]</ref>, and entity linking <ref type="bibr" target="#b18">[19]</ref>. Other work seeks to enhance web tables themselves, for instance by generating their titles <ref type="bibr" target="#b13">[14]</ref>, generating column headers <ref type="bibr" target="#b23">[24]</ref>, or finding subject columns <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b24">25]</ref>. All of these approaches make use of table content, headers, and surrounding text and data. 
Providing more such data, and in particular different versions of such data, gives these machine learning approaches a richer input set.</p><p>In the context of our Janus project <ref type="bibr" target="#b2">[3]</ref>, we have been extracting and working with the histories of various structured datasets, including DBLP, IMDB, open government data, and in particular Wikipedia, for which a detailed history of every edit is available. In this empirical paper, we focus on tables as they appear on Wikipedia pages and report on our observations across their lifetime, including their creation (Section 4), their evolution over time (Section 5), and ultimately their deletion (Section 6). We report on such varied dimensions as table counts, users, duration, edits, table similarity, table position, and of course time itself, highlighting both expected and some surprising behavior. Figure <ref type="figure">1</ref> shows one exemplary evolutionary step (in schema and data) of one of the millions of Wikipedia tables.</p><p>Our results can help researchers better understand the volatility of web tables: a given table or corpus snapshot is not a stable basis but rather just that, a snapshot with a history of changes leading up to it and a future with many further changes. In fact, at the time of writing, a Wikipedia table had been changed twice in the past year on average, with a standard deviation of 9.1, and some tables changed multiple times per day. Not only does the content of tables change; their schemata evolve over time as well. Information about evolving schemata can serve, for example, to identify synonymous attributes <ref type="bibr" target="#b22">[23]</ref>.</p><p>In the following, we highlight selected analyses in this paper and outline their possible implications for researchers.</p><p>Need for timeliness. Figure <ref type="figure" target="#fig_8">8</ref> shows how quickly a snapshot becomes outdated. As a consequence, all models trained on static snapshots run the risk of quickly becoming obsolete; efficient methods for updating them are therefore desirable.</p><p>Help with maintenance and updating. A large portion of tables is created and maintained by power users, as can be seen in Figures <ref type="figure" target="#fig_5">5 and 10</ref>. Knowledge about update patterns could be used to notify these editors about (potentially) outdated values and the need for updates.</p><p>Suggestions for cleaning and improvement. Figure <ref type="figure">16</ref> illustrates an example of the inconsistencies that can emerge in tables. Based on knowledge of how similar tables evolve, one can make concrete suggestions to improve them (in this case, their schemata), such as renaming or adding certain columns.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>We discuss two types of related work: structured datasets with history and Wikipedia change analysis. There is a large body of research on web tables, so in this short paper we can only provide a high-level overview and refer to surveys and research papers for more details.</p><p>Related corpora. Wikipedia provides access to its entire version history, allowing us to track very fine-grained changes. A variety of datasets that also deal with (semi-)structured content have been extracted from the web and Wikipedia before. Multiple corpora of web tables <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b17">18]</ref> provide extracts of static versions of tables on the web and have since been the subject of extensive research <ref type="bibr" target="#b9">[10]</ref>. For example, in <ref type="bibr" target="#b16">[17]</ref>, Lautert et al. establish a taxonomy of web tables and thus give insight into the general structure of static web tables, whereas we focus on the temporal evolution of web tables. The infobox history dataset WHAD <ref type="bibr" target="#b1">[2]</ref> comprises structured information on Wikipedia, namely the changes to infoboxes. This dataset is orthogonal to ours, since it does not cover general tables, which are more diverse and more complex.</p><p>Analyzing changes in Wikipedia. The content of Wikipedia has been the subject of much research <ref type="bibr" target="#b19">[20]</ref>. While large parts of that research were conducted on static snapshots of Wikipedia, a variety of works analyze changes on Wikipedia. Both the evolution of content <ref type="bibr" target="#b10">[11]</ref> and the evolution of the page link graph <ref type="bibr" target="#b7">[8]</ref> have been studied. 
Specifically, the study of content evolution can help detect conflicts <ref type="bibr" target="#b14">[15]</ref> or controversy that may result in edit wars <ref type="bibr" target="#b8">[9]</ref>. Edit histories serve as input to event extraction <ref type="bibr" target="#b12">[13]</ref> and are also valuable for trust assignment <ref type="bibr" target="#b0">[1]</ref>. Our approach provides a better understanding of what has actually changed, which should benefit many of these studies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">TABLE CORPUS</head><p>To explore how tables evolve, we first need to track them over time, which is non-trivial because both tables and their context change. We consider tables as objects with an identity that, in contrast to their concrete shape and content, stays constant over time. A table can have multiple versions, where each version is an edit of the previous one. However, tables on the web usually lack a stable identifier, so we have to infer it. We infer this identity with a table matching procedure, whose details are described in <ref type="bibr" target="#b3">[4]</ref>; there, we also evaluate the matching and show that it works much better than related work. Our input for the table change extraction process is a dump of web page versions: either crawled snapshots or, specifically for Wikipedia, the complete edit history as a set of XML files (https://dumps.wikimedia.org/). These XML files contain the actual version content encoded as Wikitext, a markup language that includes table markup, as well as additional metadata for each revision, such as its timestamp, author, and comment.</p><p>For every page version, we extract a (possibly empty) list of parsed table nodes. These lists constitute the input for our table matching. For each page revision, we must decide whether its tables are versions of previously identified tables or entirely new tables. Considering only the tables of the directly preceding revision is not sufficient, because tables can be deleted and then restored several revisions (and sometimes several years) later. To determine the quality of our matching, we manually created a gold standard of table matchings comprising 1,445 tables with a total of 16,919 distinct table versions, selected from 90 pages. The matching works well: it correctly matches all versions belonging to a given table for 88.58% of all tables in the gold standard. For most of the remaining tables, only a small number of versions of an individual table history are misclassified (1 mistake: 6.09%; 2 mistakes: 2.98%). Our matching decisions for individual table versions reach an F1-measure above 99%.</p><p>We next present various statistics based on the 3,471,609 Wikipedia table objects that we collected and linked using our matching process. These statistics are based on the Wikipedia dump of September 1, 2019. Both our gold standard and our output dataset are available at our project website www.IANVS.org. The statistics and findings are grouped by the three phases of a table's existence: creation, evolution, and deletion. These three phases alone show that establishing table identity is essential: without it, statistics could only be aggregated per table version, not per table.</p></div>
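<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate why matching must consider all previously seen tables rather than only those of the preceding revision, the following is a deliberately simplified sketch. It is not the matching procedure of <ref type="bibr" target="#b3">[4]</ref>: it assumes that each table version is reduced to the set of its cell values, that versions are compared with the Jaccard similarity, and that a hypothetical threshold of 0.5 decides whether a table continues an existing identity. The names TableMatcher and jaccard are illustrative only.</p><p>
```python
from itertools import count


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two sets of cell values."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


class TableMatcher:
    """Assigns stable table IDs across page revisions (simplified sketch)."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # hypothetical cut-off, not taken from [4]
        self.latest = {}            # table ID -> last known version of that table
        self._next_id = count()

    def match_revision(self, tables: list) -> list:
        """Return one table ID per table of this revision, in page order."""
        # Score every (new table, known table) pair. Known tables include those
        # that are currently deleted, so restored tables regain their identity.
        candidates = sorted(
            ((jaccard(t, old), i, tid)
             for i, t in enumerate(tables)
             for tid, old in self.latest.items()),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, i, tid in candidates:  # greedy: most similar pairs first
            if score >= self.threshold and i not in assigned and tid not in used:
                assigned[i] = tid
                used.add(tid)
        ids = []
        for i, t in enumerate(tables):
            tid = assigned[i] if i in assigned else next(self._next_id)
            self.latest[tid] = t  # remember the newest version of this table
            ids.append(tid)
        return ids


if __name__ == "__main__":
    m = TableMatcher()
    print(m.match_revision([frozenset({"Team", "Points", "A", "3"})]))  # [0]
    print(m.match_revision([]))  # table deleted in this revision: []
    print(m.match_revision([frozenset({"Team", "Points", "A", "3", "B", "1"}),
                            frozenset({"x", "y"})]))  # restored + new: [0, 1]
```
</p><p>Because deleted tables stay in the pool of candidates, a table that disappears in one revision and reappears (possibly edited) later receives its old identifier again. The greedy assignment is a simplification of a proper one-to-one matching.</p></div>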
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">CREATION</head><p>In this section, we focus on the first insert of every table, its creation, even though a table might be deleted and recreated multiple times during its lifetime.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Where and when are tables created?</head><p>Figure 2 shows that Wikipedia pages in their initial years (2001-2003) had almost no tables. Using tables in Wikipedia became popular only around 2004, and tables were fully adopted by the end of 2006. Since then, around 20,000 new tables have been created every month (about one every two minutes). The hypothesis that the insertion frequency would decrease once tables have been inserted at all relevant locations appears to be false: while the number of new pages created per month has been dropping since 2007 (source: https://stats.wikimedia.org/#/en.wikipedia.org/contributing/newpages/normal|line|2001-01-01~2020-10-01|page_type~content|monthly), the insertion rate of new tables remains constant. This relative increase in tables per page shows that more and more data is stored in a structured fashion, raising the relevance of methods that extract knowledge from such tables. We also observed that most tables are created at the same time as, or soon after, the page containing them. Only for pages created in the early days of Wikipedia, when tables were not yet popular, are larger gaps between the page creation and the creation of the page's first table common.</p><p>Figure <ref type="figure">3</ref> shows a histogram of the maximum number of tables that ever existed simultaneously on a Wikipedia article. The vast majority of articles contain only a few tables (we omitted the even larger number of pages that do not contain any tables at all). On the other hand, most tables appear on pages together with other tables: only 19.1% of all tables appear alone on an article. Tables that coexist on the same page can be assumed to be related in terms of content.</p><p>On Wikipedia, every article can link to categories, which group articles on a related topic and can themselves be organized into categories. We investigate how the creation dates of tables correlate with any year mentioned in these page categories (such as 2020 for "2020 United States presidential election"), which we assume to be the relevant years for that table. Figure <ref type="figure">4</ref> shows that the extracted years and the creation year largely agree. For every mention of a year in the page categories, a table is counted in the cell that represents its month of creation (x-axis) and the mentioned year (y-axis). If the two dimensions aligned perfectly, all marks would lie close to the diagonal of the plot. Tables tend to be created in the second half of the mentioned year or at the beginning of the following year, which appears as a small shift to the right in the plot. For the years covered by our dataset (2004-2019), the mentioned year and the year of creation coincide in 50.8% of all cases; in 5.6% of the cases, the tables are created before the mentioned year, and in 43.5% of the cases, after it.</p></div>
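<div xmlns="http://www.tei-c.org/ns/1.0"><p>The comparison between creation dates and category years can be sketched as follows. This is a minimal illustration, assuming that years are four-digit numbers found with a regular expression in category names and that each table is given as its creation date plus the category names of its page; the regular expression and the function names year_relations and tally are our own illustrative choices, not the exact extraction behind Figure 4.</p><p>
```python
import re
from collections import Counter
from datetime import date

# Assumption: a "year" is any standalone four-digit number from 1800 to 2099.
YEAR = re.compile(r"\b(1[89]\d\d|20\d\d)\b")


def year_relations(created: date, categories: list) -> list:
    """Relate a table's creation year to every year mentioned in its categories.

    Each mention counts separately, mirroring how Figure 4 counts a table once
    per mentioned year.
    """
    relations = []
    for name in categories:
        for year in map(int, YEAR.findall(name)):
            if created.year == year:
                relations.append((year, "same"))
            elif created.year > year:
                relations.append((year, "after"))
            else:
                relations.append((year, "before"))
    return relations


def tally(tables, covered=range(2004, 2020)) -> Counter:
    """Count relations, restricted to mentioned years covered by the dataset."""
    counts = Counter()
    for created, categories in tables:
        for year, relation in year_relations(created, categories):
            if year in covered:
                counts[relation] += 1
    return counts
```
</p><p>Dividing each count by the total yields shares comparable to the 50.8% / 5.6% / 43.5% split reported above.</p></div>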
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Who creates tables?</head><p>The distribution of the number of tables that a user creates is shown in Figure <ref type="figure" target="#fig_5">5</ref>. Only 13.1% of the tables are created by non-registered users. The figure also clearly shows that tables are more likely to be created by power users: more than half of the tables are created by users who have each created at least 128 other tables. The record for the highest number of tables created by a single user is 20,194 (in this case, on a variety of sports events). One possible explanation for this behavior is that the effort and skill required to create a new table are too high for many users. On the other hand, there are very dedicated users who must have acquired the necessary skills, and possibly tools, to create thousands of tables. One insight from this observation is that any random sample of tables is likely to be influenced by these power users.</p></div>
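<div xmlns="http://www.tei-c.org/ns/1.0"><p>The share of tables contributed by prolific creators can be computed from one creator entry per table. A minimal sketch, assuming unregistered creators are marked as None and that a power user is a registered user who created at least 128 other tables; the helper power_of_two_bucket shows the bucketing a logarithmic histogram such as Figure 5 would use. Both function names are hypothetical.</p><p>
```python
from collections import Counter


def creator_shares(creators: list, min_other: int = 128):
    """creators: one entry per table, the creating user or None if unregistered.

    Returns (share of tables by unregistered users,
             share of tables by users who created at least `min_other` other tables).
    """
    per_user = Counter(c for c in creators if c is not None)
    total = len(creators)
    unregistered = total - sum(per_user.values())
    # A user with n tables has created n - 1 "other" tables besides any given one.
    prolific = sum(n for n in per_user.values() if n - 1 >= min_other)
    return unregistered / total, prolific / total


def power_of_two_bucket(n: int) -> int:
    """Bucket index k such that 2**k is at most n and n is below 2**(k + 1)."""
    return n.bit_length() - 1
```
</p></div>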
<figure xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Table freshness over time (axes: table age [years], time since last update [years]).</figDesc></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">How are tables created?</head><p>Creating tables is a tedious job, especially for inexperienced users who might not be familiar with the table syntax. An obvious hypothesis is therefore that users copy &amp; paste similar tables (created by other users or by themselves) and adapt them to their needs. To investigate this hypothesis, we studied how often the same content appears in the first version of different tables. For a more accurate picture, we also analyzed how many users chose exactly the same table markup code, presumably as a table template.</p><p>Our first observation is that 3,004,883 of the 3,471,609 tables (86.6%) in our corpus appear to have unique first versions. This does not rule out that they were copied from somewhere else; they may simply have been modified before the page was first saved. At the other end of the spectrum, some templates are used more than 15,000 times to create tables.</p><p>As can be seen in Figure <ref type="figure">6</ref>, the ratio between the number of tables and the number of users that share the same template varies greatly. Some templates are used for thousands of tables, but also by thousands of different users (top right in the plot). This is typically the case for sample tables that contain only dummy values (see Figure <ref type="figure" target="#fig_7">7a</ref> for an example). Other templates are also used for thousands of tables, but only by a few dozen users (top left in the plot). These are usually domain-specific templates, such as tables for sports results or statistics (see Figure <ref type="figure" target="#fig_7">7b</ref> for an example).</p></div>
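<div xmlns="http://www.tei-c.org/ns/1.0"><p>Detecting shared first versions amounts to grouping tables by their initial markup. The sketch below is a minimal illustration, assuming the input is one (table ID, creator, Wikitext of the first version) triple per table and that identical means byte-identical markup; hashing with SHA-1 is merely a convenient grouping key of our choosing. Each group yields the two quantities plotted in Figure 6: the number of tables and the number of distinct users.</p><p>
```python
import hashlib
from collections import defaultdict


def template_usage(first_versions) -> dict:
    """Group tables by byte-identical first-version markup.

    first_versions: iterable of (table_id, creator, wikitext) triples.
    Returns {markup hash: (number of tables, number of distinct creators)}.
    """
    groups = defaultdict(lambda: (set(), set()))
    for table_id, creator, wikitext in first_versions:
        key = hashlib.sha1(wikitext.encode("utf-8")).hexdigest()
        tables, users = groups[key]
        tables.add(table_id)
        users.add(creator)
    return {key: (len(t), len(u)) for key, (t, u) in groups.items()}


def unique_share(usage: dict) -> float:
    """Share of tables whose first version occurs exactly once in the corpus."""
    total = sum(n_tables for n_tables, _ in usage.values())
    unique = sum(n_tables for n_tables, _ in usage.values() if n_tables == 1)
    return unique / total
```
</p><p>Groups with many tables but few distinct users correspond to the domain-specific templates in the top left of Figure 6; groups with many tables and many users correspond to generic sample tables.</p></div>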
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">EVOLUTION</head><p>The second phase in the lifetime of a table is its evolution. This phase encompasses all changes that happen between the initial creation and the possible final deletion, including changes to data, to schema, and to shape.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">How often are tables changed?</head><p>The average table in our corpus is re-inserted 0.62 times, deleted 0.93 times, and updated 13.89 times. Of the 0.62 re-inserts, 0.10 are fresh table versions, i.e., the table's content differs from every previous version, which means that 0.52 of the re-inserts restore previously existing table versions that had been deleted at some earlier point in time. Of the 13.89 updates, 11.97 produce fresh versions and 1.92 restore previously existing versions. While these averages seem quite low, the distribution is highly skewed: one table on social networking websites was updated more than 10,000 times during its lifetime, and at least 1,310 tables were each updated more than 1,000 times during their lifetimes.</p><p>Figure <ref type="figure" target="#fig_8">8</ref> shows how many tables would have been created, updated, or deleted by the date of our analyzed snapshot (September 1, 2019) if the snapshot had been taken at an earlier point in time (shown on the x-axis). In a one-month-old snapshot, 4.4% of tables are already outdated. In a snapshot taken one year earlier, 26.6% of the tables would no longer represent the current state; for a five-year-old snapshot, this share rises to 60.6%.</p><p>The violin plot in Figure <ref type="figure">9</ref> shows how the change frequency behaves with increasing table age; in particular, it shows how the time since the last update is distributed for tables of different ages. The shape of a violin follows the distribution of the values: the wider the violin at a given value, the more frequent that value. Within each violin, marks indicate the 0.25, 0.5, and 0.75 quantiles. The median rises up to a certain age, after which it stays constant or decreases slightly. However, the distribution is skewed towards the two ends of the spectrum: tables are either updated very frequently or hardly ever changed. For example, considering the quantiles for the 5-year-old tables, more than 25% of these tables were updated in the last year, while for another 25% the last update was almost four years ago or longer.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 12 :</head><label>12</label><figDesc>Figure 12: A density plot of table position differences between two consecutive table versions.</figDesc></figure>
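<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quantity behind Figure 8 can be approximated from per-table change dates. A minimal sketch, assuming every change (creation, update, deletion, re-insert) is available as a date per table, and that a table counts as outdated in an older snapshot if at least one change falls after that snapshot and no later than the reference snapshot. Using all tables passed in as the denominator is our own simplifying assumption, and outdated_share is a hypothetical name.</p><p>
```python
from datetime import date


def outdated_share(change_dates: dict, snapshot: date, reference: date) -> float:
    """Share of tables changed after `snapshot` and on or before `reference`.

    change_dates maps a table ID to the dates of all its changes. Such a table
    is missing, stale, or already deleted in the older snapshot.
    """
    outdated = sum(
        1 for dates in change_dates.values()
        if any(d > snapshot and reference >= d for d in dates)
    )
    return outdated / len(change_dates)
```
</p><p>Sweeping the snapshot date backwards from the reference date traces a curve of the kind shown in Figure 8.</p></div>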
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Who changes tables?</head><p>Figure <ref type="figure">10</ref> shows how long the original creator remains active as an updater of a table. We distinguish between registered and unregistered creators, because for unregistered creators we have only the IP address as an identifier, which might change from time to time. It is therefore not surprising that the share of edits made by an unregistered creator drops quickly; hence, we exclude tables created by unregistered users from this plot. For registered users, on the other hand, some tables are still updated by their original creators years after they were created. In reality, the influence of the original author on a table could be even higher than this plot suggests: the number of edits per table decreases over time, so the first buckets contain more edits.</p><p>Looking at the number of editors of individual tables in Figure <ref type="figure">11</ref>, we see that a large number of tables (35%) are updated only by their creator. In this analysis, we count as editors of a table only those users who create a new version (a simple revert to a previous version is not counted). While most tables are updated by only a few users, there are some exceptions where thousands of users contribute to a table. Again, the previously mentioned table on social networking websites holds the record, with contributions by 4,235 distinct users.</p></div>
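<div xmlns="http://www.tei-c.org/ns/1.0"><p>The editor count above ignores users who merely revert a table. A minimal sketch of that rule, assuming a table's history is given as chronological (user, content) pairs and that a version is a revert exactly when its content equals some earlier version; editor_count and creator_only_share are illustrative names.</p><p>
```python
def editor_count(history: list) -> int:
    """history: chronological (user, table_content) pairs of one table.

    Counts distinct users who produced a new version. A version whose content
    already occurred earlier is treated as a revert and not credited.
    """
    seen, editors = set(), set()
    for user, content in history:
        if content not in seen:
            editors.add(user)
            seen.add(content)
    return len(editors)


def creator_only_share(histories: list) -> float:
    """Share of tables whose only credited editor is the creator.

    The first version is always new, so a count of 1 means only the creator.
    """
    only_creator = sum(1 for h in histories if editor_count(h) == 1)
    return only_creator / len(histories)
```
</p></div>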
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Are tables moved?</head><p>Figure <ref type="figure" target="#fig_1">12</ref> shows how much tables move relative to other tables on their page. While in most page revisions tables do not move or move only slightly, in some revisions tables move by many positions; in the most extreme case, which we removed from the plot as an outlier, a table moved by 1,574 positions within a single page revision. We observe that when tables move, this is often due to the insertion or deletion of tables, and that tables move down the page (64.09%) more often than up (35.91%). One obvious reason for this imbalance is that a table inserted in the middle of a page causes the tables below it to move down, and insertions are more common than deletions. On average, a table's position changes 1.66 times during its lifespan.</p></div>
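<div xmlns="http://www.tei-c.org/ns/1.0"><p>Position differences such as those in Figure 12 can be derived once tables carry stable identifiers. A minimal sketch, assuming each revision is given as the list of table IDs in page order (as a matching step would produce); the example shows how inserting one table pushes the tables below it down. The function names are hypothetical.</p><p>
```python
def position_moves(prev: list, curr: list) -> dict:
    """Signed position change of every table present in both revisions.

    prev, curr: table IDs in page order. Positive values mean the table moved
    down the page, negative values mean it moved up.
    """
    old_pos = {tid: i for i, tid in enumerate(prev)}
    return {tid: i - old_pos[tid] for i, tid in enumerate(curr) if tid in old_pos}


def directions(moves: dict) -> tuple:
    """Return (number of downward moves, number of upward moves)."""
    down = sum(1 for d in moves.values() if d > 0)
    up = sum(1 for d in moves.values() if 0 > d)
    return down, up


if __name__ == "__main__":
    moves = position_moves(["a", "b", "c"], ["a", "new", "b", "c"])
    print(moves)              # {'a': 0, 'b': 1, 'c': 1}
    print(directions(moves))  # (2, 0)
```
</p></div>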
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">How much do tables change?</head><p>Figure <ref type="figure">13</ref> shows how the content of tables develops over time. More precisely, it shows the similarity of each table to the first version of that table (calculated on a random subset of 1,000 tables). We use a similarity metric based on word vector representations of both table versions: <formula>\mathrm{sim}(\vec{v}, \vec{w}) = \frac{\sum_i \min(v_i, w_i)}{\sum_i \max(v_i, w_i)}</formula>, where <formula>\vec{v}</formula> and <formula>\vec{w}</formula> are the word vectors of the two table versions being compared. In general, the similarity is expected to decrease over time, but it can also rise if the table content becomes more similar to its original version again. While some tables stay almost unchanged throughout their lifetime, others change rapidly within the first few days of their existence. One reason for this could be that people copy &amp; paste other tables as templates and then adjust the content, as explained in Section 4.3.</p><p>During their lifetime, 23.6% of all tables grow or shrink in the number of columns, and 37.0% grow or shrink in the number of rows. In contrast, 57.3% of all tables retain their original size throughout their lifetime.</p></div>
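<div xmlns="http://www.tei-c.org/ns/1.0"><p>The similarity metric above is the weighted Jaccard (Ruzicka) similarity of two word-count vectors. A minimal sketch, assuming a table version is tokenized into lowercase word tokens from its cell texts; this tokenization is our own assumption, since the exact preprocessing is not specified here.</p><p>
```python
import re
from collections import Counter


def word_vector(cells: list) -> Counter:
    """Word-count vector of a table version (assumption: lowercase word tokens)."""
    return Counter(w for cell in cells for w in re.findall(r"\w+", cell.lower()))


def sim(v: Counter, w: Counter) -> float:
    """sum_i min(v_i, w_i) / sum_i max(v_i, w_i) over the union of all words."""
    words = set(v) | set(w)  # a Counter returns 0 for missing words
    denominator = sum(max(v[k], w[k]) for k in words)
    if denominator == 0:
        return 1.0  # two empty tables are treated as identical
    return sum(min(v[k], w[k]) for k in words) / denominator
```
</p><p>Comparing every version of a table with its first version yields one curve of the kind shown in Figure 13; the value can rise again if later edits move the content back towards the original.</p></div>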
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5">How much do schemata change?</head><p>About half of all tables never change their schema, as can be seen in Figure <ref type="figure" target="#fig_12">14</ref> (note that this is a log-log plot). At the other extreme, some tables change their schema hundreds of times, up to 443 times. On average, each table has 1.86 schema versions. Schema changes take many forms: for example, columns are renamed, added, or removed.</p><p>Figure <ref type="figure">16</ref> shows a vivid example of how the schemata of web tables evolve over time. To create this plot, we clustered schemata based on tables that evolve from one schema to another. This particular plot shows a cluster of schemata that all contain information about league results of football teams. There are almost 500 tables for which at least one snapshot had one of Schemata 2-7. More than half of those tables followed Schema 6 at the beginning of 2011, while most of the other half did not yet exist (Group 1). The splines show how this schema evolved into many different specializations until 2018. While in some cases these specializations make sense (such as a clarification about the league system), in other cases they are due to inconsistent changes (such as the header "Year", which, according to our manual inspection, should in most cases actually be "Season", a range spanning two consecutive years). As these tables are web tables, the header can also be formatted differently: for most tables of Schema 7, the header was changed to bold type (Schema 6) between 2009 and 2010. Still, a small number of tables had not made this transition even after almost a decade.</p></div>
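<div xmlns="http://www.tei-c.org/ns/1.0"><p>Counting schema versions and classifying simple schema changes can be sketched as follows, assuming a table's schema is represented by its header row as a tuple of strings and that any change of the header, including formatting such as bold markup, starts a new schema version. Under this representation, a rename shows up as one removed and one added column; these conventions and the function names are ours, for illustration only.</p><p>
```python
def schema_versions(headers: list) -> int:
    """Number of schema versions in a chronological list of header tuples.

    Every header that differs from the preceding one starts a new version,
    so returning to an earlier schema also counts as a change.
    """
    versions, previous = 0, None
    for header in headers:
        if header != previous:
            versions += 1
        previous = header
    return versions


def column_diff(old: tuple, new: tuple) -> tuple:
    """Columns added and removed between two schemata (a rename shows up as both)."""
    added = [c for c in new if c not in old]
    removed = [c for c in old if c not in new]
    return added, removed
```
</p></div>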
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">DELETION</head><p>Figure <ref type="figure" target="#fig_5">15</ref> shows how long tables survive, counting the days from their creation. The blue part shows the percentage of tables that reached the respective age without being deleted. The green part represents tables that were created early enough to have reached the respective age, but were deleted before reaching it. 69.5% of all tables ever created survived until the end date of our dataset. If a table is deleted, this usually happens early in its lifetime: the longer a table exists, the less likely it is to be deleted.</p><p>On average, the time from a table's creation until its deletion (or until the end date of the dataset) is 4.93 years in our corpus. During 97.7% of that time, the table is actually part of the page; during the remaining 40.50 days, it is (temporarily) deleted.</p><p>While the vast majority of tables are never deleted (57.2%) or deleted only once (29.9%), the distribution of deletions has a long tail: one table that explains the wiki syntax was deleted 620 times during its lifetime, mostly through vandalism.</p></div>
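<div xmlns="http://www.tei-c.org/ns/1.0"><p>The survival share in Figure 15 only considers tables that are old enough to have reached a given age. A minimal sketch, assuming each table is given as its creation date and the date of its final deletion (None if it still exists at the end date); temporary deletions are ignored, which is our own simplification, and survival_share is a hypothetical name.</p><p>
```python
from datetime import date


def survival_share(tables: list, age_days: int, end: date):
    """Share of eligible tables that reached `age_days` without final deletion.

    tables: (created, deleted) date pairs, with deleted None for surviving tables.
    A table is eligible if it was created at least `age_days` before `end`.
    Returns None if no table is eligible.
    """
    eligible = [(c, d) for c, d in tables if (end - c).days >= age_days]
    if not eligible:
        return None
    survived = sum(
        1 for c, d in eligible if d is None or (d - c).days >= age_days
    )
    return survived / len(eligible)
```
</p><p>Restricting the denominator to eligible tables avoids counting young tables as survivors of ages they could not yet have reached.</p></div>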
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSIONS</head><p>In summary, we have seen how fast tables on Wikipedia change and how fast they come and go. When working with this corpus, it is important to keep this additional temporal dimension in mind and leverage it when possible. The history also makes other dimensions of the corpus accessible, such as the creators, editors, or templates, which together provide a perspective on the tables that is more holistic than single snapshots of individual tables or a table corpus.</p><p>As future work, we plan to explore whether other structured corpora, such as Wikipedia infoboxes or lists, for which we also provide histories, behave similarly in terms of their dynamics. Furthermore, we want to use the gained insights to assign trust to values and improve data quality. We encourage researchers to explore their datasets in a similar manner to uncover hidden information in a dataset's history.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2</head><label>2</label><figDesc>Figure 2 shows that Wikipedia pages in their initial years (2001-2003) had almost no tables. Using tables in Wikipedia became more 1 https://dumps.wikimedia.org/</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Number of tables and pages created per month.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Histogram of the maximum table count per page.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Histogram of tables bucketed by the total number of tables an author created.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Two concrete examples of table templates.</figDesc><graphic coords="4,236.79,83.68,136.19,67.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Missed updates in relation to snapshot date.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 10 :Figure 11 :</head><label>1011</label><figDesc>Figure 10: Creator update activity for tables created by registered users.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Figure 14 :</head><label>14</label><figDesc>Figure 14: Number of schema versions per table.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_13"><head>Figure 15 :Figure 16 :</head><label>1516</label><figDesc>Figure 15: Time until deletion.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>Axis labels: Table age [years]; Share of edits. Legend entries: By non-creator; By creator.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Assigning trust to Wikipedia content</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Thomas</forename><surname>Adler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Krishnendu</forename><surname>Chatterjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luca</forename><surname>De Alfaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Faella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ian</forename><surname>Pye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vishwanath</forename><surname>Raman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Symposium on Wikis (WikiSym)</title>
				<meeting>the International Symposium on Wikis (WikiSym)</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">WHAD: Wikipedia historical attributes data - Historical structured data extraction and vandalism detection from the Wikipedia edit history</title>
		<author>
			<persName><forename type="first">Enrique</forename><surname>Alfonseca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guillermo</forename><surname>Garrido</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jean-Yves</forename><surname>Delort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anselmo</forename><surname>Peñas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Resources and Evaluation</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1163" to="1190" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Exploring Change - A New Dimension of Data Analytics</title>
		<author>
			<persName><forename type="first">Tobias</forename><surname>Bleifuß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leon</forename><surname>Bornemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Theodore</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dmitri</forename><forename type="middle">V</forename><surname>Kalashnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="85" to="98" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Structured Object Matching across Web Page Revisions</title>
		<author>
			<persName><forename type="first">Tobias</forename><surname>Bleifuß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leon</forename><surname>Bornemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dmitri</forename><forename type="middle">V</forename><surname>Kalashnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Data Engineering (ICDE)</title>
				<meeting>the International Conference on Data Engineering (ICDE)</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Natural Key Discovery in Wikipedia Tables</title>
		<author>
			<persName><forename type="first">Leon</forename><surname>Bornemann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tobias</forename><surname>Bleifuß</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dmitri</forename><forename type="middle">V</forename><surname>Kalashnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The Web Conference</title>
				<meeting>The Web Conference</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2789" to="2795" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Graph structure in the Web</title>
		<author>
			<persName><forename type="first">Andrei</forename><forename type="middle">Z</forename><surname>Broder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ravi</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Farzin</forename><surname>Maghoul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Prabhakar</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sridhar</forename><surname>Rajagopalan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raymie</forename><surname>Stata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Tomkins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Janet</forename><forename type="middle">L</forename><surname>Wiener</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comput. Networks</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="309" to="320" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Scale-free networks are rare</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Broido</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Clauset</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Communications</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">1017</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Temporal analysis of the wikigraph</title>
		<author>
			<persName><forename type="first">Luciana</forename><forename type="middle">S</forename><surname>Buriol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carlos</forename><surname>Castillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Debora</forename><surname>Donato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefano</forename><surname>Leonardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefano</forename><surname>Millozzi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Web Intelligence (WI)</title>
				<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="45" to="51" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Fine-grained controversy detection in Wikipedia</title>
		<author>
			<persName><forename type="first">Siarhei</forename><surname>Bykau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Flip</forename><surname>Korn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yannis</forename><surname>Velegrakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Data Engineering (ICDE)</title>
				<meeting>the International Conference on Data Engineering (ICDE)</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1573" to="1584" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Ten years of webtables</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Cafarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alon</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongrae</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jayant</forename><surname>Madhavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cong</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daisy</forename><forename type="middle">Zhe</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eugene</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="2140" to="2149" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Information evolution in Wikipedia</title>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Ceroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mihai</forename><surname>Georgescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ujwal</forename><surname>Gadiraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kaweh</forename><surname>Djafari Naini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Fisichella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Symposium on Open Collaboration (OpenSym)</title>
				<meeting>the International Symposium on Open Collaboration (OpenSym)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page">10</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Top-k Entity Augmentation Using Consistent Set Covering</title>
		<author>
			<persName><forename type="first">Julian</forename><surname>Eberius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maik</forename><surname>Thiele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Katrin</forename><surname>Braunschweig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wolfgang</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Scientific and Statistical Database Management (SSDBM)</title>
				<meeting>the International Conference on Scientific and Statistical Database Management (SSDBM)</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Extracting event-related information from article updates in Wikipedia</title>
		<author>
			<persName><forename type="first">Mihai</forename><surname>Georgescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nattiya</forename><surname>Kanhabua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><surname>Krause</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wolfgang</forename><surname>Nejdl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefan</forename><surname>Siersdorfer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval (ECIR)</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="254" to="266" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Generating Titles for Web Tables</title>
		<author>
			<persName><forename type="first">Braden</forename><surname>Hancock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongrae</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cong</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International World Wide Web Conference (WWW)</title>
				<meeting>the International World Wide Web Conference (WWW)</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="638" to="647" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">He says, she says: conflict and coordination in Wikipedia</title>
		<author>
			<persName><forename type="first">Aniket</forename><surname>Kittur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bongwon</forename><surname>Suh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bryan</forename><forename type="middle">A</forename><surname>Pendleton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ed</forename><forename type="middle">H</forename><surname>Chi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Human Factors in Computing Systems (SIGCHI)</title>
				<meeting>the International Conference on Human Factors in Computing Systems (SIGCHI)</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="453" to="462" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Automatically Generating Interesting Facts from Wikipedia Tables</title>
		<author>
			<persName><forename type="first">Flip</forename><surname>Korn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xuezhi</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">You</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cong</forename><surname>Yu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Management of Data (SIGMOD)</title>
				<meeting>the International Conference on Management of Data (SIGMOD)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="349" to="361" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Web Table Taxonomy and Formalization</title>
		<author>
			<persName><forename type="first">Larissa</forename><forename type="middle">R</forename><surname>Lautert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marcelo</forename><forename type="middle">M</forename><surname>Scheidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carina</forename><forename type="middle">F</forename><surname>Dorneles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Record</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="28" to="33" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A large public corpus of web tables containing time and context metadata</title>
		<author>
			<persName><forename type="first">Oliver</forename><surname>Lehmberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dominique</forename><surname>Ritze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>Meusel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference Companion on World Wide Web</title>
				<meeting>the International Conference Companion on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="75" to="76" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Linking temporal records</title>
		<author>
			<persName><forename type="first">Pei</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xin</forename><forename type="middle">Luna</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Maurino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="956" to="967" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The sum of all human knowledge: A systematic review of scholarly research on the content of Wikipedia</title>
		<author>
			<persName><forename type="first">Mostafa</forename><surname>Mesgari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chitu</forename><surname>Okoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohamad</forename><surname>Mehdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Finn</forename><surname>Årup Nielsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arto</forename><surname>Lanamäki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">66</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="219" to="245" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Answering Table Queries on the Web Using Column Keywords</title>
		<author>
			<persName><forename type="first">Rakesh</forename><surname>Pimplikar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sunita</forename><surname>Sarawagi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PVLDB</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="908" to="919" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Data Quality: The Role of Empiricism</title>
		<author>
			<persName><forename type="first">Shazia</forename><forename type="middle">Wasim</forename><surname>Sadiq</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tamraparni</forename><surname>Dasu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xin</forename><forename type="middle">Luna</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Juliana</forename><surname>Freire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ihab</forename><forename type="middle">F</forename><surname>Ilyas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Link</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Renée</forename><forename type="middle">J</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Felix</forename><surname>Naumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaofang</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Divesh</forename><surname>Srivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Record</title>
		<imprint>
			<biblScope unit="volume">46</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="35" to="43" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Finding Synonymous Attributes in Evolving Wikipedia Infoboxes</title>
		<author>
			<persName><forename type="first">Paolo</forename><surname>Sottovia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matteo</forename><surname>Paganelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Guerra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yannis</forename><surname>Velegrakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Databases and Information Systems (ADBIS)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="169" to="185" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Understanding tables on the web</title>
		<author>
			<persName><forename type="first">Jingjing</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haixun</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhongyuan</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenny</forename><forename type="middle">Q</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Conceptual Modeling (ER)</title>
				<meeting>the International Conference on Conceptual Modeling (ER)</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="141" to="155" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Effective and Efficient Semantic Table Interpretation using TableMiner+</title>
		<author>
			<persName><forename type="first">Ziqi</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="921" to="957" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
