
Kavitha Srinivas (IBM Research), Ernesto Jiménez-Ruiz (City, University of London, United Kingdom), Sainyam Galhotra (Cornell University, USA)

Contact: hassanzadeh@us.ibm.com

VLDBW'23 - TaDA'23: Tabular Data Analysis Workshop

Keynote speakers: Renée Miller from Northeastern University and Alon Halevy from Meta AI

With the advent of data lakes and open data repositories containing heterogeneous collections of structured datasets, there is an increasing need for automated methods to analyze tabular data collections for a wide range of applications in data management, data science, and decision support. Our goal in this workshop was to bring together researchers and practitioners building such tabular data analysis solutions. The TaDA workshop aimed to provide a venue for the growing number of researchers in the data management, AI, and Semantic Web communities working on a wide range of problems relevant to tabular data analysis. The first edition of the workshop included two keynote talks, a research track comprising presentations and posters, and invited posters and virtual talks on work done in these communities.

1. Introduction

Data analysis, a crucial process in various domains, involves examining, cleaning, transforming, and modeling data to extract valuable insights, draw informed conclusions, and facilitate decision-making [1]. However, performing such data analysis tasks becomes exceedingly complex when dealing with vast and diverse collections of tabular data, commonly found in enterprise data lakes and on the Web. Consequently, this challenge has piqued the interest of researchers and practitioners in data management, AI, and related communities [2, 3, 4, 5, 6].

To address the fundamental research challenges posed by tabular data analysis and foster the development of automated solutions, the Tabular Data Analysis (TaDA 2023) workshop (https://tabular-data-analysis.github.io/tada2023/) was organized with the primary goal of bringing together experts from diverse communities. The workshop aimed to create a collaborative environment for researchers and practitioners in the data management and AI fields, enabling them to share insights, methodologies, and advancements in tackling the complexities of analyzing large and heterogeneous collections of tabular data. The workshop provided a forum for:

• Exchange of ideas between two communities: 1) an active community of data management researchers working on data integration, schema [...]

[...] ALITE [11], a method for integrating tables using full disjunction, and DIALITE [12], an open discovery system for analyzing tables, sharing new benchmarks for evalu[...]

Acknowledgements

We would like to thank the steering committee, the program committee, the keynote speakers, and the authors for their contributions. Finally, we thank the workshop attendees for making TaDA a great venue to discuss work in the area of tabular data analysis.

[...] in developing and evaluating scalable table search and integration methods on real data.

Alon’s keynote emphasized the significance of understanding how individuals can leverage their generated data to enhance their health, vitality, productivity, and overall well-being. He motivated the research on fusing personal digital data, discussed potential pitfalls, and explored multiple approaches to querying timelines. This application area necessitated careful consideration of language models to effectively query partially structured and unstructured data.

References

[1] M. Brown, Transforming Unstructured Data into Useful Information, 2014, pp. 211–230. doi:10.1201/b16666-11.

[2] O. Hassanzadeh, A. Kementsietsidis, B. Kimelfeld, R. Krishnamurthy, F. Ozcan, I. Pandis, Next generation data analytics at IBM research, Proc. VLDB Endow. 6 (2013) 1174–1175. doi:10.14778/2536222.2536246.

[3] S. Galhotra, A. Fariha, R. Lourenço, J. Freire, A. Meliou, D. Srivastava, Dataprism: Exposing disconnect between data and systems, in: Z. Ives, A. Bonifati, A. E. Abbadi (Eds.), SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, ACM, 2022, pp. 217–231. doi:10.1145/3514221.3517864.

[4] M. Helali, E. Mansour, I. Abdelaziz, J. Dolby, K. Srinivas, A scalable automl approach based on graph neural networks, Proc. VLDB Endow. 15 (2022) 2428–2436. doi:10.14778/3551793.3551804.

[5] F. Özcan, C. Lei, A. Quamar, V. Efthymiou, Semantic enrichment of data for AI applications, in: M. Boehm, J. Stoyanovich, S. Whang (Eds.), Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning, In conjunction with the 2021 ACM SIGMOD/PODS Conference, DEEM@SIGMOD 2021, Virtual Event, China, 20 June, 2021, ACM, 2021, pp. 4:1–4:7. doi:10.1145/3462462.3468881.

[6] F. Nargesian, K. Q. Pu, E. Zhu, B. G. Bashardoost, R. J. Miller, Organizing data lakes for navigation, in: D. Maier, R. Pottinger, A. Doan, W. Tan, A. Alawini, H. Q. Ngo (Eds.), Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, ACM, 2020, pp. 1939–1950. doi:10.1145/3318464.3380605.

[7] E. Jiménez-Ruiz, O. Hassanzadeh, K. Srinivas, V. Efthymiou, J. Chen (Eds.), Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 18th International Semantic Web Conference, SemTab@ISWC 2019, Auckland, New Zealand, October 30, 2019, volume 2553 of CEUR Workshop Proceedings, CEUR-WS.org, 2020.

[8] E. Jiménez-Ruiz, O. Hassanzadeh, V. Efthymiou, J. Chen, K. Srinivas, V. Cutrona (Eds.), Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2020) co-located with the 19th International Semantic Web Conference (ISWC 2020), Virtual conference, November 5, 2020, volume 2775 of CEUR Workshop Proceedings, CEUR-WS.org, 2020.

[9] E. Jiménez-Ruiz, V. Efthymiou, J. Chen, V. Cutrona, O. Hassanzadeh, J. Sequeda, K. Srinivas, N. Abdelmageed, M. Hulsebos, D. Oliveira, C. Pesquita (Eds.), Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual conference, October 27, 2021, volume 3103 of CEUR Workshop Proceedings, CEUR-WS.org, 2022.

[10] V. Efthymiou, E. Jiménez-Ruiz, J. Chen, V. Cutrona, O. Hassanzadeh, J. Sequeda, K. Srinivas, N. Abdelmageed, M. Hulsebos (Eds.), Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2022, co-located with the 21st International Semantic Web Conference, ISWC 2022, Virtual conference, October 23-27, 2022, volume 3320 of CEUR Workshop Proceedings, CEUR-WS.org, 2023.

[11] A. Khatiwada, R. Shraga, W. Gatterbauer, R. J. Miller, Integrating data lake tables, Proc. VLDB Endow. 16 (2022) 932–945.

[12] A. Khatiwada, R. Shraga, R. J. Miller, DIALITE: discover, align and integrate open data tables, in: S. Das, I. Pandis, K. S. Candan, S. Amer-Yahia (Eds.), Companion of the 2023 International Conference on Management of Data, SIGMOD/PODS 2023, Seattle, WA, USA, June 18-23, 2023, ACM, 2023, pp. 187–190. doi:10.1145/3555041.3589732.