<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ASIA: a Tool for Assisted Semantic Interpretation and Annotation of Tabular Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vincenzo</forename><surname>Cutrona</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michele</forename><surname>Ciavotta</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Flavio</forename><surname>De Paoli</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Matteo</forename><surname>Palmonari</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<settlement>Milan</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ASIA: a Tool for Assisted Semantic Interpretation and Annotation of Tabular Data</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D3CFC283C8F34D5158F9A27794C9C819</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:07+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Semantic annotation</term>
					<term>Data enrichment</term>
					<term>Linked Data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Enriching datasets with additional information to build robust models is an essential task in many data science applications. Moreover, the wide availability of Linked Data encourages the reuse and integration of such high-quality information. The ASIA tool assists users in annotating tabular data at both the schema and the instance level, so as to enable data extension. This demo paper presents its core capabilities.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Table interpretation and annotation is the process by which a table, e.g., a CSV file or an HTML &lt;table&gt;, is annotated with semantic information such as types (ontology classes or data types), properties, and resource identifiers. Consider, for instance, the case where the columns of the table are annotated with types specifying the class of entities or literal values contained in the column. Columns can also be associated with ontology properties, which specify a relation that is implicitly represented in the column. In this case, the column can be interpreted as a source of RDF triples &lt;subject, predicate, object&gt;, one per row: the values of the annotated column, i.e., the target column, are interpreted as the objects of the triples; the values contained in a different column, specified as the source column (of the relation), are interpreted as the subjects; and the property specified in the annotation defines the predicate. In addition to these schema-level annotations, instance-level annotations match values in the columns (interpreted as mentions of entities) to identifiers in a Knowledge Base (KB), e.g., identifiers of DBpedia resources. Several approaches have been proposed to automate this interpretation and annotation process; we refer to two recent papers for a review of the techniques proposed in these approaches <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b3">4]</ref>. Among these approaches, we also mention semantic labeling approaches, where the above-mentioned distinction between class-based and property-based annotations of columns is less strict than in our definition <ref type="bibr" target="#b2">[3]</ref>. 
The automatic table interpretation and annotation approaches discussed above target two main kinds of applications: i) mapping tables to known vocabularies and instances so as to generate RDF data from the table; ii) executing structured queries on the large amount of data available in web tables.</p><p>In this paper, we showcase ASIA (Assisted Semantic Interpretation and Annotation Tool)<ref type="foot" target="#foot_0">1</ref>, a tool designed to support users in annotating data by providing assistance with three main tasks: i) schema-level annotation, to map tabular data to existing vocabularies and generate RDF data; ii) instance-level annotation, to perform data linking while generating the new data; iii) data extension, to use the links established with instance-level annotations to fetch additional data from third-party sources (e.g., after linking a column to DBpedia cities, additional data about these cities can be fetched from DBpedia). Thanks to the combination of instance-level annotations and data extension features, both implemented to work with third-party reconciliation and extension services<ref type="foot" target="#foot_1">2</ref>, ASIA targets a new type of application that is crucial to support analytics workflows at scale: semantic enrichment of tabular data, which helps users analyze their proprietary data once it has been enriched with third-party data sources. Applications of this semantic enrichment task can be found in real-world data analytics projects in domains such as Digital Marketing<ref type="foot" target="#foot_2">3</ref> and eCommerce.</p><p>ASIA is built on top of the Data-as-a-Service (DaaS) application DataGraft and its data manipulation tool Grafterizer <ref type="bibr" target="#b5">[6]</ref>. From the latter, ASIA borrows the capability to transform the annotations into full-fledged data transformation scripts, which can be applied in batch mode to transform data into RDF or enrich large volumes of data. 
Moreover, ASIA provides features to streamline the annotation task by supplying cross-lingual vocabulary suggestion services based on data profiling systems, which provide information about the usage of vocabularies in existing data (currently, ABSTAT <ref type="bibr" target="#b4">[5]</ref> and Linked Open Vocabularies (LOV) are supported). As a result, ASIA table interpretation and annotation is offered as part of an end-to-end solution for semantic data preparation.</p><p>We can summarize the novelties of ASIA by comparing it with other table annotation tools (the comparison with table interpretation tools or techniques that do not offer a UI is out of the scope of this paper).<ref type="foot" target="#foot_3">4</ref> Compared to Karma, ASIA also provides reconciliation of column values and data extension as features; in addition, (cross-lingual) schema-level annotation is implemented as a service and is currently performed using vocabulary usage statistics rather than one full-fledged ontology (Karma, on the other hand, uses more sophisticated schema-level annotation techniques). Compared to OpenRefine, ASIA supports more sophisticated schema-level annotations and RDF data generation; it also natively supports batch execution of data transformations. Odalic and MantisTable support schema-level annotation but, to the best of our understanding, do not support data enrichment. RMLEditor supports the editing of rules to generate RDF data, but does not perform table annotation or data enrichment. ASIA's prime objective is to support users in semantically annotating and extending datasets in a tabular format. In the following, we consider a scenario where a user is interested in running analyses requiring information about cities and their regions (such as population and coordinates), and weather forecasts about those regions. 
The dataset used for this demo has been provided by the JOT Internet Media company<ref type="foot" target="#foot_4">5</ref> and contains data about the performance of digital marketing campaigns. In particular, it comes with a column "CityStr", featuring city toponyms. We demonstrate how ASIA can help the user in extending the working dataset. First, the user relies on ASIA's matching functionalities to disambiguate the toponyms with unambiguous identifiers (URIs) from a reference KB, e.g., GeoNames in the example. These identifiers are then used to query the reference KB to retrieve additional information.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> depicts the whole enrichment pipeline. The blocks refer to: a reconciliation step (blue), an annotation step (orange), a transformation step (purple), and extension steps, namely, KB-based extensions (green) and weather extensions (yellow). The first reconciliation step includes an important user validation step mediated by an interface (wrong reconciliations lead to wrong extensions). Statistics help the user understand the quality of the results returned by automatic reconciliation; the user can modify the results by i) choosing an alternative URI, or ii) manually entering a URI. Consequently, a new reconciled column (named "GN city") is appended to the working dataset and is automatically annotated with the type of the entities listed therein.<ref type="foot" target="#foot_5">6</ref> In step 2, with the support of the schema-level annotation form, the user specifies that toponyms are associated as labels with the GeoNames entities. Subsequently, the user exploits some KB-based extensions to extend the working dataset with information from GeoNames: the extension form allows the user to select as many properties as needed, and then retrieves all the properties' objects from the KB. 
In the pipeline depicted in Figure <ref type="figure" target="#fig_0">1</ref>, the user applies four consecutive KB-based extension steps starting from the "GN city" column (steps 3 to 6); all these steps can be accomplished at once by selecting four properties in the extension form. The sixth step, "GN city → GN region", adds a new reconciled column to the dataset, which contains the region entity in which the city entity is located. At this point, the user may want to slightly modify the extension results, for example by merging the latitude and longitude columns into a new "coordinates" column (step 7). Starting from the "GN region" column, the user applies new KB-based extensions and appends the population of each region to the dataset (step 8). Lastly, the user retrieves information about weather (temperature and wind) at the region level.</p><p>Weather extensions become available in ASIA when i) the dataset contains one column annotated as xsd:date, or ii) the dataset contains a column reconciled to GeoNames. Thus, the user obtains weather data by extending the "GN region" column. In the Weather extension form, the user selects the observation dates (which can be taken from another column; ASIA can recognize the most common date formats) and the day offset, i.e., the weather forecast for the next x days using the observation date as base. The user also has to select which aggregation function to apply to the daily weather observations (avg, min, max, cumulative). In the example pipeline, the user chooses to add information about temperature and wind (steps 9 and 10); as a result, the Weather extension appends n × m × p new columns, where n is the number of selected parameters, m is the number of selected offsets, and p is the number of selected aggregation functions. Finally, the user downloads the enriched dataset in CSV format. 
Alternatively, the user can generate a KB in RDF, or download the whole pipeline as an executable JAR to perform the same manipulations locally on larger volumes of data than those that can be managed from the UI.</p><p>A video demonstration of ASIA for building an enrichment pipeline that extends the one described above can be found at https://youtu.be/Z7M2_SjN2xo<ref type="foot" target="#foot_6">7</ref>. The demonstration can be replicated using the online version of DataGraft at https://datagraft.io/.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The enrichment pipeline</figDesc></figure>
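<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how the Weather extension expands the dataset, the following minimal Python sketch enumerates one output column per combination of selected parameter, day offset, and aggregation function, which yields the n × m × p columns described in the demonstration. The function name weather_column_names, the column naming scheme, and the sample selections are assumptions made for this example; they are not taken from the ASIA code base.</p>
<p>
```python
from itertools import product


def weather_column_names(parameters, offsets, aggregations):
    """Return one hypothetical column name per (parameter, offset, aggregation)."""
    return [
        f"{param}_{agg}_plus{offset}d"
        for param, offset, agg in product(parameters, offsets, aggregations)
    ]


# Sample selections (assumed for illustration only).
parameters = ["temperature", "wind"]   # n = 2 selected weather parameters
offsets = [0, 1, 2]                    # m = 3 day offsets from the observation date
aggregations = ["avg", "min", "max"]   # p = 3 aggregation functions

columns = weather_column_names(parameters, offsets, aggregations)
print(len(columns))  # 18, i.e. 2 * 3 * 3
```
</p><p>With these sample selections the sketch produces 18 column names. Adding a further offset or aggregation function multiplies the number of appended columns accordingly, which is worth keeping in mind when the enriched table is meant to stay readable.</p></div>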
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://inside.disco.unimib.it/index.php/asia/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">The latest release of ASIA includes several reconciliation services: GeoNames, Google GeoTargets, Wikifier, and Google ProductsServices Categories.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Examples of ASIA-supported enrichment pipelines in this domain can be found in<ref type="bibr" target="#b1">[2]</ref>.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">A more complete comparison can be found as a resource at https://ew-shopp.github.io/eswc2019-tutorial/, the page of the tutorial where ASIA has been presented and compared with several other tools. This is the first work that illustrates the tool.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://www.jot-im.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">Since GeoNames uses the type gn:Feature for all its instances, we adopted the gn:featureCode property as type, which is more significant.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">Other videos are available at https://www.youtube.com/playlist?list=PLy7SznldqqmezwdL4QcxQYy2Fz1HV0wMS.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgment. This research has been partly supported by the EU H2020 projects EW-Shopp (Grant n. 732590) and EuBusinessGraph (Grant n. 732003).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">ColNet: Embedding the semantics of web tables for column type prediction</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jimenez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sutton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Cutrona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>De Paoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Košmerlj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmonari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Perales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roman</surname></persName>
		</author>
		<title level="m">Semantically-enabled optimization of digital marketing campaigns</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>Accepted for ISWC2019 In-Use track</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Semantic labeling: A domain-independent approach</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Alse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Knoblock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Szekely</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="446" to="462" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Matching web tables to DBpedia - A feature utility study</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ritze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EDBT</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="210" to="221" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">ABSTAT: Ontology-driven linked data summaries with pattern minimalization</title>
		<author>
			<persName><forename type="first">B</forename><surname>Spahiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Porrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmonari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maurino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="381" to="395" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Tabular data cleaning and linked data generation with grafterizer</title>
		<author>
			<persName><forename type="first">D</forename><surname>Sukhobok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pultier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Berre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Moynihan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Elvesaeter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mahasivam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ESWC (Posters &amp; Demos)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="134" to="139" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
