<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Results of SemTab 2021 ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vincenzo</forename><surname>Cutrona</surname></persName>
							<email>vincenzo.cutrona@supsi.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">SUPSI</orgName>
								<address>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jiaoyan</forename><surname>Chen</surname></persName>
							<email>jiaoyan.chen@cs.ox.ac.uk</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Oxford</orgName>
								<address>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vasilis</forename><surname>Efthymiou</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">FORTH-ICS</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oktie</forename><surname>Hassanzadeh</surname></persName>
							<email>hassanzadeh@us.ibm.com</email>
							<affiliation key="aff3">
								<orgName type="institution">IBM Research</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ernesto</forename><surname>Jiménez-Ruiz</surname></persName>
							<email>ernesto.jimenez-ruiz@city.ac.uk</email>
							<affiliation key="aff4">
								<orgName type="institution">University of London</orgName>
								<address>
									<settlement>City</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
							<affiliation key="aff5">
								<orgName type="institution" key="instit1">SIRIUS</orgName>
								<orgName type="institution" key="instit2">University of Oslo</orgName>
								<address>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Juan</forename><surname>Sequeda</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Kavitha</forename><surname>Srinivas</surname></persName>
							<email>kavitha.srinivas@ibm.com</email>
							<affiliation key="aff3">
								<orgName type="institution">IBM Research</orgName>
								<address>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nora</forename><surname>Abdelmageed</surname></persName>
							<email>nora.abdelmageed@uni-jena.de</email>
							<affiliation key="aff6">
								<orgName type="institution">University of Jena</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Madelon</forename><surname>Hulsebos</surname></persName>
							<email>m.hulsebos@uva.nl</email>
							<affiliation key="aff7">
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Daniela</forename><surname>Oliveira</surname></persName>
							<email>dpoliveira@fc.ul.pt</email>
							<affiliation key="aff8">
								<orgName type="department">Faculdade de Ciências</orgName>
								<orgName type="laboratory">LASIGE</orgName>
								<orgName type="institution">Universidade de Lisboa</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Catia</forename><surname>Pesquita</surname></persName>
							<email>clpesquita@fc.ul.pt</email>
							<affiliation key="aff8">
								<orgName type="department">Faculdade de Ciências</orgName>
								<orgName type="laboratory">LASIGE</orgName>
								<orgName type="institution">Universidade de Lisboa</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Results of SemTab 2021 ⋆</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">11D59919A6F09F3C2E6F38AD94EACECC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Tabular data</term>
					<term>Knowledge Graphs</term>
					<term>Matching</term>
					<term>SemTab</term>
					<term>Semantic Web Challenge</term>
					<term>Semantic Table Interpretation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>SemTab 2021 was the third edition of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, successfully collocated with the 20th International Semantic Web Conference (ISWC) and the 16th Ontology Matching (OM) Workshop. SemTab provides a common framework to conduct a systematic evaluation of state-of-the-art systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Motivation</head><p>Data in tabular format are the most frequent input to data analytics pipeline, thanks to their high storage and processing efficiency. Also, the tabular format allows users to represent the information in a compacted way, by exploiting the clear data structure defined by rows and columns. However, such clear structure does not imply a clear understanding of the semantic structure (e.g., relationships between columns), as well as the meaning of the content (e.g., if data are about a specific topic). The lack of understanding hinders data analytics processes, requiring additional effort to properly understand the data first. Gaining the semantic understanding is valuable for many applications, including data cleaning, data mining, data integration, data analysis and machine learning, and knowledge discovery. For example, the semantic understanding can help in assessing what kind of transformations are more appropriate for a dataset, or which datasets can be integrated to enable new analytics (e.g., marketing analysis) <ref type="bibr" target="#b9">[10]</ref>.</p><p>In addition to their efficiency, the huge availability of tabular data on the Web makes Web tables a valuable source to consider for data miners (e.g., open data CSV files). Adding semantic information to Web tables is useful for a wide range of applications, including web search, question answering, and knowledge base construction.</p><p>Tabular data to Knowledge Graph (KG) matching is the process of clarifying the semantic meaning of a table by mapping its elements (i.e., cells, columns, rows) to semantic tags (i.e., entities, classes, properties) from KGs (e.g., Wikidata, DBpedia). 
The task difficulty increases when table metadata (e.g., table captions, table description, or column names) is missing, incomplete or ambiguous.</p><p>The tabular data to KG matching process is typically broken down into the following tasks: (i) cell to KG entity matching (CEA task), (ii) column to KG class matching (CTA task), and (iii) column pair to KG property matching (CPA task).</p><p>Over the last decade several approaches made advances in addressing one or several of the above tasks, also constructing benchmark datasets ( <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b10">11]</ref>). The creation of SemTab<ref type="foot" target="#foot_0">1</ref>  <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref> aimed at putting this significant amount of work into a common framework, enabling the systematic evaluation of state-of-the-art systems. The ambition is to make SemTab become the reference challenge in the Semantic Web community, in the same way the OAEI<ref type="foot" target="#foot_1">2</ref> is for the Ontology Matching community. <ref type="foot" target="#foot_2">3</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The Challenge</head><p>The SemTab 2021 challenge has been organised into 3 different tracks: the Accuracy Track, which is the standard track proposed in previous editions; the Usability Track, a new track addressing the lack of publicly available, easy-to-use and generic solutions; and the Applications Track, which focuses on applications in real-world settings where the output of matching systems can contribute. The application track was also open to the submission of novel benchmark datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Accuracy Track</head><p>The Accuracy Track included 3 rounds, running from June 30 to October 15. Different target KGs were used across rounds (see Table <ref type="table" target="#tab_0">1</ref>):</p><p>-DBpedia <ref type="bibr" target="#b2">[3]</ref>: http://downloads.dbpedia.org/wiki-archive/</p><p>(version 2016-10) -Wikidata <ref type="bibr" target="#b23">[24]</ref>: https://zenodo.org/record/6153449 -Schema.org <ref type="bibr" target="#b11">[12]</ref>: https://gittables.github.io/downloads/sche ma 20210528.pkl</p><p>The different rounds of SemTab 2021 have been organised to evaluate participating systems on different datasets with variable difficulty. All the rounds were run with the support of AIcrowd;<ref type="foot" target="#foot_3">4</ref> SemTab 2021 also used the STILTool system <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b4">5]</ref> for getting additional insights about the submitted solutions. Table <ref type="table" target="#tab_2">3</ref> shows the participation per round. Compared with previous editions, we had 11 participants (vs 28 in 2020) submitting to at least one round. <ref type="foot" target="#foot_5">6</ref> We identified 6 core participants (vs 8 in 2020), which completed ∼14 tasks on average (out of 17 tasks). Seven participants submitted a system paper to the challenge: <ref type="bibr" target="#b25">[26]</ref>, Kepler-aSI <ref type="bibr" target="#b5">[6]</ref>, and DAGOBAH <ref type="bibr" target="#b13">[14]</ref>.</p><formula xml:id="formula_0">MTab [19], MAGIC [23], MantisTable V [4], JenTab [1], GBMTab</formula><p>Evaluation measures As per the previous editions, systems have been evaluated on a single annotation for each provided target, for all the tasks; i.e., in CEA, target cells are to be annotated with a single entity from the target KG; in CTA, target columns are to be annotated with a single type from the target KG (as fine-grained as possible).  
The evaluation measures for CEA, CPA and CTA (DBpedia and Schema.org) are the standard Precision, Recall and F1-score, as defined in Equation <ref type="formula" target="#formula_1">1</ref>:</p><formula xml:id="formula_1">P = |Correct Annotations| |System Annotations| , R = |Correct Annotations| |Target Annotations| , F 1 = 2 × P × R P + R<label>(1)</label></formula><p>where target annotations refer to the target cells for CEA, the target columns for CTA, and the target column pairs for CPA. We consider an annotation as correct when it is included within the ground truth set (a target cell usually has multiple annotations in the ground truth, because of redirect and same-as links in KGs).</p><p>Given the fine-grained type hierarchy in Wikidata, we adopted approximations of Precision and Recall in the CTA evaluation. Approximations adapt their numerators to consider partially correct annotations, i.e., annotations that are ancestors or descendants of the ground truth (GT) classes. The correctness score cscore of a CTA annotation α considers the distance between the annotation and the GT classes in the type hierarchy, and it is defined as </p><formula xml:id="formula_2">cscore(α) =      0.8 d(α) , if α is in GT, or an ancestor of the GT, with d(α) ≤ 5 0.7 d(α) , if α is a descendant of the GT, with d(α) ≤ 3 0, otherwise;<label>(2)</label></formula><formula xml:id="formula_3">AP = cscore(α) |System Annotations| , AR = cscore(α) |Target Annotations| , AF 1 = 2 × AP × AR AP + AR<label>(3)</label></formula><p>Results Table <ref type="table" target="#tab_3">4</ref> contains the average F1-score achieved by the 11 participating systems. The Tough Tables dataset still represents a challenge for almost all the systems, especially considering the fact that the dataset is the same as in SemTab 2020. The BiodivTab and GitTables datasets brought additional complexity in Round 3, highlighting that real-world tables are challenging.</p><p>CEA task. 
Results for the CEA task are reported in Figure <ref type="figure">1</ref> for all the datasets. Round 1 used the same 2T tables from last year's edition, 7 raising the difficulty bar at the very beginning. Most of the systems faced important challenges when dealing with 2T tables, with only 2 systems managing to achieve an F1-score over 0.8 and several of them participating in only one of the tasks. It is worth noting the work of the DAGOBAH team, which improved their system over the last year, being able to achieve higher scores on 2T this year. Starting from Round 2, systems have been evaluated on datasets never seen before. The AG datasets aimed at bringing new challenges in each round, and we can observe that only the best systems managed to maintain almost the same score on the two different versions of this dataset. Concerning bio-related datasets, performance in Round 2 was positive (slightly below 0.9 on average), confirming that tables with many rows (∼2,500 on average) do not represent a problem for most of the systems. Instead, the complexity brought by the (relatively small) tables in the BiodivTab dataset represented a new problem to solve, showing significantly reduced performance (none of the systems scored over 0.6). The JenTab system ranked 1 st over a very difficult dataset. It is worth noting, however, that members of the JenTab team are also the providers of the BiodivTab dataset.</p><p>CTA task. As shown in Figure <ref type="figure">2</ref>, the results in the CTA tasks resemble the trend already seen from the CEA results. This is an indicator that most of the systems solve the CTA tasks based on annotations found in the CEA. Additional challenges have been included in Round 3 with the GitTables dataset, where we can see a critical performance drop for all the involved systems. 
It is worth emphasising that, given the general picture provided by the results in CTA, more research is needed to make existing systems able to deal with real-world tables, where the cells may be missing a correspondence to the target KG.</p><p>CPA task. Results for the CPA tasks are plotted in Figure <ref type="figure">3</ref>. Currently, only BioTables and the AG datasets provide a GT for CPA. Results are overall positive for all the tasks, with a general improvement from Round 2 to Round 3 for all the involved systems, except for MAGIC, whose performance dropped a bit during the last round. Fig. <ref type="figure">3</ref>: Results in the CPA task for the core participants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Usability Track</head><p>Starting from SemTab 2021, the organisation committee agreed to include a new track focusing on system usability. The main goal of this track is to mitigate a pain point in the community: the lack of publicly available, easy-to-use, and generic solutions that will address the needs of a variety of applications and settings.</p><p>Evaluation measures Deeply evaluating the usability of a system requires user studies to monitor different parameters <ref type="bibr" target="#b20">[21]</ref>. Within the SemTab scope, we decided to simply verify the overall usability of tools as judged by a review panel. Participants' solutions were examined for the following criteria:</p><p>-Open source: open-source solutions make a great contribution to the community, especially when released with a permissive license. Publicly available resources can be used as a starting point for new tools or research investigations, and make experiments easily reproducible. -System dependencies: some tools may require specific platforms to be executed on premises, or have a huge resource consumption that may affect the use in common settings. For example, requiring many indexes/databases may prevent the usage of a tool by users with limited access to hardware. -Model generality: a tool may be considered general when it applies to different (and new) applications/domains, requiring near-zero adaptations; for example, tools employing machine learning techniques should not require extensive training and tuning to be adapted to different contexts. -Availability: tools may not be released as open source, but offered as publicly available services. In this case, a tool served as a public service supports further research activities, and represents a big contribution to the community. 
-User experience: the purpose of a tool is to help people in solving a task; for this reason, semantic table to graph matching tools should come with a well-designed user interface that makes the tool usable also by practitioners with limited experience in semantic matching. That is, the tool should not require extensive training to be mastered. </p><formula xml:id="formula_4">✓ ✓ MAGIC ✓ ✓ DAGOBAH ✓ MantisTable V ✓ ✓ JenTab ✓ Kepler-aSI</formula><p>Results Almost all the core participants obtained good results in this track, by performing well on one or more of the above evaluation criteria. Evaluation details are reported in Table <ref type="table" target="#tab_4">5</ref>. We exclude system dependencies and model generality because of the insufficient available evidence, which resulted in these two criteria not impacting the overall assessment strongly. Indeed, available data about system performance (i.e., accuracy) with reference to the different datasets and target KGs used in SemTab rounds do not allow us to draw any consistent conclusions. For example, it is not clear if tools were customized or tweaked (e.g., changing the lookup function for noisy data) to increase their accuracy in different rounds; we are not able to assess how hard a system adapts to a different context (e.g., changing the target KG).</p><p>The evaluation panel concluded that most of the tools are pre-configured and can potentially be used out of the box: for example, JenTab has been packaged in Docker containers to ease the deployment and execution of the tool on local premises. In general, tools' requirements vary in complexity, but they are reasonable overall (e.g., preprocessing required, like creating new indexes or embeddings).</p><p>Considering the other criteria, JenTab is the only system released as open source under a permissive license (Apache 2.0). 
The MTab tool has been made publicly available as a Web service, free to use (MIT license); but the back-end application has not been disclosed. However, having a public API enables MTab to serve third-party applications (with no rate limit), and this was a key point in declaring MTab the most usable tool. Systems like DAGOBAH and MantisTable delivered a framework with impressive GUIs, while others (e.g., MAGIC) opted for a lightweight application.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Applications Track</head><p>This new track aims at addressing applications in real-world settings that take advantage of the output of the matching systems. Challenging dataset proposals have also been accepted and included within the SemTab 2021 rounds.</p><p>Results A specific application has been identified within the biological domain, where new data are constantly produced thanks to the advances in the field. The domain is particularly challenging from the semantics standpoint because of the complexity of the biological relations between entities. Within SemTab, the data representation significantly impacts the systems' performance since entities are usually represented by codes (e.g., chemical formulas or gene names). Two different datasets have been submitted related to the biological domain; the first one, BioTables, is a dataset focused on molecular biology data; the second, BiodivTab, is a dataset focused on biodiversity research data and data augmentation.</p><p>Alongside the above domain, a different dataset has been submitted to this track and also included in Round 3, GitTables. This dataset includes relational tables extracted from CSV files hosted at GitHub, and it comes with a peculiarity: the GT for CTA uses a mixture of classes and properties to annotate columns (both for the DBpedia and Schema.org versions).</p><p>The three datasets brought new complexity and contributed to increasing the data diversity among the SemTab benchmark datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Prizes</head><p>As in previous editions, IBM Research<ref type="foot" target="#foot_6">8</ref> sponsored SemTab 2021 and awarded the best systems in each track with the following prizes:</p><p>-Accuracy Track: DAGOBAH (1 st prize) was the top system in most of the tasks, showing appreciable improvements over the last years. Honorary mention to MTab. -Usability Track: MTab team (1 st prize), for providing the easy-to-use MTab tool <ref type="foot" target="#foot_7">9</ref>along with Web services to look up entities and annotate tables; JenTab (2 nd prize), for being the only open-source system with a permissive license. Honorary mentions to DAGOBAH, MAGIC and MantisTable. -Applications Track: BiodivTab dataset (1 st prize), for having brought new challenges in CEA and CTA tasks. Honorary mention to GitTables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Lessons Learned and Future Work</head><p>Avoiding over-fitting to AG. We have been using the same automated dataset generation process, with some variations that make it more challenging, since the first SemTab challenge. This may be resulting in participating systems that explicitly target datasets with characteristics similar to those of the AG datasets. This becomes evident from the almost perfect results shown in Table <ref type="table" target="#tab_3">4</ref>. For that reason, this year we have introduced several new datasets, while we are also planning to use as much as possible real data, rather than synthetic, in the future versions of the challenge.</p><p>System generalizability beyond KGs. Many systems currently rely on matching table values to entities in KGs. In this version of SemTab, we challenged the participating systems on their ability to detect the semantic types of table columns even when their values are not linkable to KG entities. We conclude that most systems do not generalize well in this scenario as indicated by the performance drop on the CTA task for GitTables (see Section 2.1). Improving systems to this end would make them useful for expanding KG coverage by matching tables from novel data sources to KGs in order to populate the "unknown unknowns" <ref type="bibr" target="#b24">[25]</ref>. This generalizability would also benefit the applicability of the systems in offline databases. We plan to encourage and evaluate systems on their generalizability towards novel data sources in future versions of SemTab.</p><p>CTA vs CPA: the case of GitTables. Since the first edition of SemTab, we are used to considering CTA and CPA as two separate tasks, the former focuses on ontology classes, and the latter is dedicated to properties. However, GitTables annotations for CTA also include properties from DBpedia and Schema.org. 
The rationale behind this choice stands in the relational nature of the considered tables: columns typically correspond to the attributes of an entity, which are reflected by properties in DBpedia and Schema.org, for example. Also, this choice is very useful when annotating literal columns (i.e., columns not containing mentions of entities), avoiding annotations based on datatypes (e.g., xsd:string). Therefore, GitTables introduced a new technical challenge, which potentially contributed to the complexity observed from the results in Figure <ref type="figure">2</ref>. The case of GitTables may result in a new task to accomplish in the future, given that it enables table-to-KG matching with tables from alternative data sources and contexts (e.g., database dumps from industry).</p><p>Usability track. We believe that the introduction of the usability track has contributed to making participating systems publicly accessible. Our goal was exactly to encourage this, despite the competitive nature that a challenge may have. Thus, we consider this new track to be a very important one and we are planning to keep it in the next challenges. Next SemTab editions may consider improving the evaluation of this track, for example by adopting the System Usability Scale (SUS) <ref type="bibr" target="#b6">[7]</ref> to score the overall user experience. In particular, developing a systematic way to evaluate systems' generality and dependencies would definitely improve the evaluation of this track.</p><p>Applications track. We believe that the call of the application track has attracted more attention from the community by introducing their own datasets. Contributions from the community like BiodivTab, BioTable and GitTables help in extending the SemTab benchmark with new real-world challenges that are hard to reproduce in synthetic datasets such as AG. 
Thus, this new track has been an important addition to SemTab.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 :Fig. 2 :</head><label>12</label><figDesc>Fig. 1: Results in the CEA task for the core participants. MTab results on 2T are from 2020.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Datasets used across SemTab 2021 rounds.</figDesc><table><row><cell></cell><cell>Rounds</cell><cell></cell><cell></cell><cell>Tasks</cell><cell></cell><cell cols="2">Target KGs</cell></row><row><cell></cell><cell cols="7">R1 R2 R3 CTA CPA CEA DBpedia Wikidata Schema.org</cell></row><row><cell>2T [9]</cell><cell>✓</cell><cell></cell><cell>✓</cell><cell></cell><cell>✓</cell><cell>✓</cell><cell>✓</cell></row><row><cell>BioTable [20]</cell><cell>✓</cell><cell></cell><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell></cell><cell>✓</cell></row><row><cell>AG [15]</cell><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell>✓</cell><cell></cell><cell>✓</cell></row><row><cell>BiodivTab [2]</cell><cell></cell><cell>✓</cell><cell>✓</cell><cell></cell><cell>✓</cell><cell></cell><cell>✓</cell></row><row><cell>GitTables [13]</cell><cell></cell><cell>✓</cell><cell>✓</cell><cell></cell><cell></cell><cell>✓</cell><cell>✓</cell></row><row><cell cols="8">Datasets The different datasets used to run SemTab 2021 rounds are reported in Ta-</cell></row><row><cell cols="8">ble 1, with some statistics available in Table 2. All the datasets are available in Zenodo:</cell></row><row><cell cols="8">-Tough Tables (2T): a dataset featuring high-quality manually-curated tables with</cell></row><row><cell cols="8">non-obviously linkable cells, i.e., where values are ambiguous names, typos, and</cell></row><row><cell cols="8">misspelled entity names. 
These challenges are particularly relevant for the annota-</cell></row><row><cell cols="6">tion of structured legacy sources to existing KGs.</cell><cell></cell></row><row><cell cols="7">Link: https://doi.org/10.5281/zenodo.6211551</cell></row><row><cell cols="8">-BioTable: a dataset focused on molecular biology data covering different entities.</cell></row><row><cell cols="7">It has the larges number of rows per table in the challenge.</cell></row><row><cell cols="7">Link: https://doi.org/10.5281/zenodo.5606585</cell></row><row><cell cols="8">-Automatically Generated (AG): 5 a synthetic dataset with tables generated automat-</cell></row><row><cell cols="8">ically by means of SPARQL queries. AG is the largest dataset used in SemTab.</cell></row><row><cell cols="7">Link: https://zenodo.org/record/6154708</cell></row><row><cell cols="8">-BiodivTab: a dataset with tables from real-world biodiversity research datasets.</cell></row><row><cell cols="7">Original tables have been adapted for the SemTab challenge.</cell></row><row><cell cols="7">Link: https://doi.org/10.5281/zenodo.5584180</cell></row><row><cell cols="8">-GitTables: a large-scale corpus of relational tables extracted from CSV files in</cell></row><row><cell cols="8">GitHub. The main purpose of this dataset is to facilitate learning table represen-</cell></row><row><cell cols="8">tation models and applications in e.g., data management. A subset of tables has</cell></row><row><cell cols="8">been curated for benchmarking column type detection methods in SemTab.</cell></row><row><cell cols="7">Link: https://doi.org/10.5281/zenodo.5706316</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Statistics of the datasets in each SemTab 2021 round. For target values: W=Wikidata; D=DBpedia; S=Schema.org.</figDesc><table><row><cell></cell><cell>AG</cell><cell></cell><cell>2T</cell><cell>BioTables</cell><cell cols="2">BiodivTab GitTables</cell></row><row><cell></cell><cell cols="2">Round 2 Round 3</cell><cell>Round 1</cell><cell>Round 2</cell><cell>Round 3</cell><cell>Round 3</cell></row><row><cell>Tables #</cell><cell cols="2">1,750.00 7,207.00</cell><cell>180.00</cell><cell>110.00</cell><cell>50.00</cell><cell>1,101.00</cell></row><row><cell>Avg. Rows # (total)</cell><cell>16.73</cell><cell>8.18</cell><cell>1,080.21</cell><cell>2,449.08</cell><cell>259.06</cell><cell>58.20</cell></row><row><cell>Avg. Cols # (total)</cell><cell>3.19</cell><cell>2.48</cell><cell>4.46</cell><cell>5.97</cell><cell>23.96</cell><cell>15.87</cell></row><row><cell>Avg. Rows # (target CEA)</cell><cell>16.73W</cell><cell>8.18W</cell><cell>1, 080.19D 1, 080.21W</cell><cell>2, 449.08W</cell><cell>258.28W</cell><cell></cell></row><row><cell>Avg. Cols # (target CEA)</cell><cell>1.65W</cell><cell>1.00W</cell><cell>3.00D 3.00W</cell><cell>5.97W</cell><cell>13.60W</cell><cell></cell></row><row><cell>Avg. Cols # (target CTA)</cell><cell>1.25W</cell><cell>1.00W</cell><cell>3.00D 3.00W</cell><cell>5.97W</cell><cell>12.28W</cell><cell>3.08D 2.62S</cell></row><row><cell>Avg. Cols # (target CPA)</cell><cell>3.19W</cell><cell>2.48W</cell><cell></cell><cell>5.97W</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Participation in the SemTab 2021 challenge.</figDesc><table><row><cell></cell><cell>Round 1</cell><cell>Round 2</cell><cell></cell><cell></cell><cell>Round 3</cell><cell></cell></row><row><cell></cell><cell>2T</cell><cell cols="5">BioTable AG AG BiodivTab GitTables</cell></row><row><cell>CEA</cell><cell>5D 7W</cell><cell>6</cell><cell>6</cell><cell>5</cell><cell>5</cell><cell>-</cell></row><row><cell>CTA</cell><cell>3D 7W</cell><cell>7</cell><cell>6</cell><cell>6</cell><cell>6</cell><cell>4D 2S</cell></row><row><cell>CPA</cell><cell>-</cell><cell>6</cell><cell>6</cell><cell>5</cell><cell>-</cell><cell>-</cell></row><row><cell>Total</cell><cell>11</cell><cell>7</cell><cell>6</cell><cell>6</cell><cell>6</cell><cell>4</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Average F1-score considering the 11 participating systems. We included MTab results for 2T from SemTab 2020.</figDesc><table><row><cell></cell><cell>Round 1</cell><cell cols="2">Round 2</cell><cell></cell><cell>Round 3</cell></row><row><cell></cell><cell>2T</cell><cell cols="2">BioTable AG</cell><cell cols="3">AG BiodivTab GitTables</cell></row><row><cell>CEA</cell><cell>0.51D 0.52W</cell><cell>0.82</cell><cell cols="2">0.91 0.90</cell><cell>0.41</cell><cell>-</cell></row><row><cell>CTA</cell><cell>0.35D 0.53W</cell><cell>0.78</cell><cell cols="2">0.91 0.80</cell><cell>0.23</cell><cell>0.04D 0.19S</cell></row><row><cell>CPA</cell><cell>-</cell><cell>0.88</cell><cell cols="2">0.96 0.95</cell><cell>-</cell><cell>-</cell></row></table><note>where d(α) is the shortest distance to one of the GT classes (as for CEA, also CTA GT columns may have multiple classes). For example, d(α) = 0 if α is a class in the ground truth (cscore(α) = 1), and d(α) = 2 if α is a grandchild of a class in the ground truth (cscore(α) = 0.49). Types in the higher level(s) of the KG type hierarchy are not considered in the GT (e.g., Q35120 [entity] in Wikidata). Given the correctness score cscore, approximated Precision (AP), Recall (AR), and F1-score (AF1) for the CTA evaluation are as follows:</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Usability evaluation details.</figDesc><table><row><cell>Open source</cell><cell>Availability as a Service</cell><cell>User Experience (GUI)</cell></row><row><cell>MTab</cell><cell></cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.cs.ox.ac.uk/isg/challenges/sem-tab/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://oaei.ontologymatching.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://ontologymatching.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://www.aicrowd.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">In SemTab 2021, also referred to as Hard Tables.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">AIcrowd leaderboard scores 23 participants because of test submissions.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_6">https://www.research.ibm.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_7">https://github.com/phucty/mtab tool</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We would like to thank the challenge participants, the ISWC &amp; OM organisers, the AIcrowd team, and our sponsor IBM Research that played a key role in the success of SemTab. We also thank Paul Groth and Çağatay Demiralp for their contributions to GitTables. Moreover, we would like to thank Sirko Schindler and Birgitta König-Ries for their contribution to BiodivTab. This work was also supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway), Samsung Research UK, the EPSRC projects UK FIRES and ConCur, and the HFRI project ResponsibleER (No 969). DO and CP were supported by FCT through LASIGE (UIDB/00408/2020 and UIDP/00408/2020). We would also like to acknowledge that the work of the challenge organisers was greatly simplified by using the EasyChair conference management system and the CEUR-WS.org open-access publication service.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">JenTab Meets SemTab 2021&apos;s New Challenges</title>
		<author>
			<persName><forename type="first">N</forename><surname>Abdelmageed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">BiodivTab: A Tabular Benchmark based on Biodiversity Research Data</title>
		<author>
			<persName><forename type="first">N</forename><surname>Abdelmageed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schindler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>König-Ries</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">DBpedia: A Nucleus for a Web of Open Data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kobilarov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cyganiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ives</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web</title>
				<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="722" to="735" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">MantisTable V: A novel and efficient approach to Semantic Table Interpretation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Framework for Quality Assessment of Semantic Annotations of Tabular Data</title>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rula</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">20th International Semantic Web Conference (ISWC)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="528" to="545" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Kepler-aSI at SemTab</title>
		<author>
			<persName><forename type="first">W</forename><surname>Baazouzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kachroudi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Faiz</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">SUS: a &apos;quick and dirty&apos; usability scale</title>
		<author>
			<persName><forename type="first">J</forename><surname>Brooke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Usability evaluation in industry</title>
		<imprint>
			<biblScope unit="volume">189</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">STILTool: A Semantic Table Interpretation evaLuation Tool</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cremaschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Avogadro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Maurino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ESWC 2020 Satellite Events</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="61" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Tough Tables: Carefully Evaluating Entity Linking for Tabular Data</title>
		<author>
			<persName><forename type="first">V</forename><surname>Cutrona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmonari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">19th International Semantic Web Conference (ISWC)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="328" to="343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Semantically-Enabled Optimization of Digital Marketing Campaigns</title>
		<author>
			<persName><forename type="first">V</forename><surname>Cutrona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">D</forename><surname>Paoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Košmerlj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Palmonari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Perales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference (ISWC)</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="345" to="362" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings</title>
		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hassanzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rodriguez-Muro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Christophides</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISWC</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">10587</biblScope>
			<biblScope unit="page" from="260" to="277" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Schema.Org: Evolution of Structured Data on the Web</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">V</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Brickley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Macbeth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="44" to="51" />
			<date type="published" when="2016-01">jan 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">GitTables: A Large-Scale Corpus of Relational Tables</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hulsebos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Demiralp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Groth</surname></persName>
		</author>
		<idno>CoRR, abs/2106.07258</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">DAGOBAH: Table and Graph Contexts For Efficient Semantic Annotation Of Tabular Data</title>
		<author>
			<persName><forename type="first">V.-P</forename><surname>Huynh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Deuzé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Labbé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Monnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems</title>
		<author>
			<persName><forename type="first">E</forename><surname>Jimenez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hassanzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Srinivas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Semantic Web: ESWC</title>
				<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2019">2019. 2020</date>
		</imprint>
	</monogr>
	<note>SemTab</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Results of SemTab</title>
		<author>
			<persName><forename type="first">E</forename><surname>Jiménez-Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Hassanzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Efthymiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Srinivas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Cutrona</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 19th International Semantic Web Conference (ISWC 2020)</title>
				<meeting>the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 19th International Semantic Web Conference (ISWC 2020)</meeting>
		<imprint>
			<date type="published" when="2020">2020. 2020</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A large public corpus of web tables containing time and context metadata</title>
		<author>
			<persName><forename type="first">O</forename><surname>Lehmberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ritze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Meusel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWW</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Annotating and searching web tables using entities, types and relationships</title>
		<author>
			<persName><forename type="first">G</forename><surname>Limaye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sarawagi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chakrabarti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="1338" to="1347" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Tabular Data Annotation with MTab Tool</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Yamada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kertkeidkachorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ichise</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Takeda</surname></persName>
		</author>
		<author>
			<persName><surname>Semtab</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">SemTab</title>
		<author>
			<persName><forename type="first">D</forename><surname>Oliveira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pesquita</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.5606585</idno>
	</analytic>
	<monogr>
		<title level="j">BioTable Dataset</title>
		<imprint>
			<date type="published" when="2021-10">2021. Oct. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">A framework to conduct and report on empirical user studies in semantic web contexts</title>
		<author>
			<persName><forename type="first">C</forename><surname>Pesquita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Ivanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lohmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lambrix</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Knowledge Acquisition Workshop</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="567" to="583" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Matching HTML Tables to DBpedia</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ritze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Lehmberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, WIMS</title>
				<meeting>the 5th International Conference on Web Intelligence, Mining and Semantics, WIMS</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">6</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">MAGIC: Mining an Augmented Graph using INK, starting from a CSV</title>
		<author>
			<persName><forename type="first">B</forename><surname>Steenwinckel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">D</forename><surname>Turck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ongenae</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Wikidata: a free collaborative knowledge base</title>
		<author>
			<persName><forename type="first">D</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Krötzsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Knowledge Graphs 2021: A Data Odyssey</title>
		<author>
			<persName><forename type="first">G</forename><surname>Weikum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="3233" to="3238" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">GBMTab: A Graph-Based Method for Interpreting Semantic Table to Knowledge Graph</title>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jin</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org" />
	</analytic>
	<monogr>
		<title level="m">Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab)</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
