The Hetor project: a joint effort to co-create Cultural Heritage Open Data in the Campania Region

Maria Anna Ambrosino1, Vanja Annunziata1, Maria Angela Pellegrino1,* and Vittorio Scarano1

1 Università degli Studi di Salerno, via Giovanni Paolo II, 132 84084 Fisciano (SA), Italy

Abstract
Open Data are published to encourage their exploitation, but limited technical skills are a crucial barrier. Initiatives that let learners in particular, and users in general, exploit Open Data are rare in the literature, and they mainly focus on the exploitation phase rather than the authoring one. To increase Open Data awareness and move users into the position of Open Data curators, the HETOR project regularly organises workshops to let participants create, publish, and exploit Open Data. The project started in 2016 and has resulted in the co-creation of dozens of high-quality open datasets, publicly available on CKAN, involving hundreds of learners, public administration delegates, and volunteers in associations. This article describes the communities involved in the HETOR project and details, quantitatively and qualitatively, the authored datasets, which cover any aspect of Cultural Heritage in the Campania Region.

Keywords
Open Data, Authoring, Local Communities, Repository, Cultural Heritage

1. Introduction

“Open Data (OD) [...] can be freely used, shared and built-on by anyone, anywhere, for any purpose” [1]. OD is a promising tool to raise curiosity about data sources, data availability, and the techniques underlying data access, extraction, and analysis [2], develop data literacy [3], enhance digital skills [4, 5], stimulate critical thinking, collect relevant information, and produce reliable conclusions [6]. OD are published to let interested stakeholders exploit data and create value, but limited technical skills are a crucial barrier [7]. Initiatives that let learners and interested users exploit OD are rare in the literature.
The situation is even worse if we look for opportunities to move them into the position of OD publishers. To advance the dialogue around methods to increase OD awareness and improve users’ skills and familiarity with OD, the HETOR project regularly organizes workshops with different communities to let them create, publish, and exploit OD. This article reports the effort invested by HETOR in co-authoring OD with learners, associations, and Public Administrations (PAs).

Education can take place in heterogeneous settings, traditionally classified as formal, informal, and non-formal learning [8]. Formal learning corresponds to an intentional and systematic education model, and it typically takes place at school. Non-formal learning is still intentional but takes place outside formal learning environments, typically occurring in community settings, such as associations or clubs. While the HETOR activities with learners take place as formal learning, the ones with PAs and associations are classified as non-formal learning.

19th IRCDL (The Conference on Information and Research science Connecting to Digital and Library science), February 23–24, 2023, Bari, Italy
* Corresponding author.
mariaanna.ambrosino@gmail.com (M. A. Ambrosino); vanja.annunziata@gmail.com (V. Annunziata); mapellegrino@unisa.it (M. A. Pellegrino); vitsca@unisa.it (V. Scarano)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

The contribution of this manuscript is twofold: i) it reports the effort of the HETOR activities in preserving and digitizing the Cultural Heritage (CH) of the Campania Region by co-creating OD with communities of associations, PAs, and learners; ii) it details the HETOR datasets, publicly available as CSV files on CKAN with an open license, so that researchers, data lovers, or any interested user can disseminate the data, improve data quality through machine-learning-based approaches, or model the tabular datasets via Semantic Web technologies.

The article is structured as follows. Section 2 overviews related work; Section 3 reports on the HETOR project, overviews the involved communities, and quantitatively and qualitatively details the authored open datasets; Section 4 discusses potentialities, interpreted as success stories, and limitations; then, the article concludes with final remarks and future directions.

2. Related work

More and more researchers and educators recognise the potential of using OD as an educational resource [9] targeting heterogeneous goals, such as deepening learners’ skills in environmental education [10, 11] or improving data visualization and data literacy skills [12, 13, 14]. Learners usually experience OD in a formal setting, as including skills in educational curricula democratises the learning process [15]. However, reaching new audiences is an important benefit of OD [16, 17, 18, 19]. Gascó-Hernández et al. [20] describe and compare interventions to increase awareness of OD, enhance users’ skills, and engage them in the use of OD by involving learners, PAs, non-governmental organizations, and citizens. Similarly, the HETOR project targets heterogeneous communities, i.e., schools, PAs, and associations.

Interventions to improve users’ skills and knowledge proposed in the literature mainly focus on OD exploitation, to engage learners while letting them learn [21, 22, 23], improve their awareness of the environment and smart city development [24], or master OD visualization [25, 26]. OD initiatives rarely move learners into the position of OD producers. Consequently, learners seldom experience OD production challenges, such as defining a data schema, collecting information, dealing with licenses, and mastering OD authoring tools. Chen et al. [21] employ an instructional pervasive gaming model to deepen participants’ CH knowledge. They exploit an OD Kit form that is used as the interface for implicitly gathering information from the mobile device. Similarly, HETOR’s workshops move secondary school learners into the position of OD publishers, letting them experience the challenges inherent in the role of data curator. A key difference with related work is that learners explicitly author OD.

3. HETOR activities to co-create Open Data

The HETOR project1 aims to collect and make available both the “Open Heritage” provided by national institutions, such as ISTAT, MIBACT, MIUR, and the Campania Region, and the one created by interested citizens concerning their local CH, improving the quality and quantity of OD at the local and national level. This article focuses on OD concerning the Campania Region. To reach these goals, the HETOR project co-creates OD in tabular format working with schools, associations, and local PAs via a Social Platform for Open Data (SPOD)2, reuses and exploits data via data visualizations, and disseminates data stories via social networks, such as Facebook, Instagram, and Telegram, and the Hetor website.

1 The HETOR project: http://www.hetor.it

Communities. Three communities actively contribute to the HETOR project: associations, schools, and PAs.
In terms of agencies and number of users, HETOR collaborated with 39 users belonging to 14 associations, 67 users belonging to 3 PAs, and 596 learners belonging to 9 schools. All the associations but one are in small municipalities, all belonging to the province of Salerno. The effort from Nocera Inferiore is remarkable, with 11 associations joining the HETOR project. The school community is the largest in terms of involved users, with Avellino holding a record of 215 users. School agencies cover all the provinces of the Campania Region except Benevento, mainly collaborating with municipalities. Moreover, schools are heterogeneous in type, involving both high schools and technical institutes. The PA community is the smallest group, represented by mayors, cultural advisors, school professors, and politicians. They cover all the Campania Region provinces. While some municipalities, such as Montoro and Avellino, join two communities, the participation of Nocera Inferiore in all the communities is remarkable. While activities with the schools take place as formal learning, collaborations with associations and PAs represent non-formal learning. While PAs and associations freely join the HETOR project to digitise, document, and preserve local CH, schools join it to let learners develop data literacy skills.

The HETOR datasets. This section overviews the datasets authored within the HETOR project by learners, local PAs, and associations, quantifies the effort invested in preserving and digitizing the CH of the Campania Region, and reports the quality of the authored datasets. All the datasets3 are publicly available on CKAN with the Creative Commons License, in the CSV format, and in the Italian language. Datasets are manually authored and refined via SPOD.

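Because the datasets are plain CSV files, the basic figures reported for each of them in Table 1 (rows, columns, cells, and the share of non-empty cells) can be recomputed with a few lines of code. The following is only an illustrative sketch, not the procedure used in the project: the function name `table_stats`, the sample data, and the choice to treat whitespace-only cells as empty are all assumptions.

```python
import csv
import io


def table_stats(csv_text: str) -> dict:
    """Return rows, columns, cells, and the percentage of non-empty cells."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    n_cols = len(header)
    n_cells = len(rows) * n_cols
    # Assumption: a cell is "filled" when it holds more than whitespace.
    # Cells missing from a short row are counted as empty.
    filled = sum(1 for row in rows for value in row[:n_cols] if value.strip())
    completeness = 100.0 * filled / n_cells if n_cells else 0.0
    return {
        "rows": len(rows),
        "cols": n_cols,
        "cells": n_cells,
        "completeness": round(completeness, 1),
    }


# Hypothetical three-row sample; the published datasets are in Italian
# and considerably larger.
SAMPLE = """name,municipality,zip
Castello A,Salerno,84121
Torre B,,84010
Palazzo C,Fisciano,
"""

print(table_stats(SAMPLE))
```

On this sample, two of the nine cells are empty, so the completeness is 7/9, i.e., about 77.8%. To apply the sketch to a published dataset, the CSV text would first be read from a downloaded file.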
Table 1 reports the English dataset name, the community that authored the dataset, quantitative details in terms of the number of rows, columns, and cells, and qualitative details in terms of completeness and accuracy. When we report that a dataset is authored by a given community, such as the school, we mean that learners created the dataset supervised by the HETOR group. Datasets are classified according to the CH definition into Tangible CH, further split into movable and immovable, Intangible CH, Natural CH, Food & Wine, and Other, which includes geographical information and details about companies and associations. The completeness metric reports the percentage of non-empty values. The accuracy metric is computed by verifying how many textual geographical fields (such as municipalities) are correctly reconciled with Wikidata towns or municipalities. The accuracy metric also considers how many ZIP codes (if any) in the datasets match the ones retrieved from Wikidata. The qualitative information is computed with OpenRefine, exploiting its facet and reconciliation mechanisms.

2 SPOD: http://spod.databenc.it
3 Hetor datasets: http://www.hetor.it/dataset

Table 1: Overview of Open datasets co-created within the HETOR project. Name and Author are dataset details; Rows, Cols, and Cells are quantitative information; CMP. (completeness) and ACC. (accuracy) are qualitative information.

| Name | Author | Rows | Cols | Cells | CMP. | ACC. |
|---|---|---|---|---|---|---|
| Tangible Cultural Heritage - Immovable Cultural Heritage | | | | | | |
| Castles and coast towers | Hetor | 523 | 31 | 16213 | 45% | 96% |
| Rock cults | Hetor | 88 | 25 | 2200 | 50% | 82% |
| Theatres and odeons | Hetor | 32 | 27 | 864 | 77% | 87% |
| Noble palaces in Fisciano | Assoc. | 22 | 16 | 352 | 89% | 100% |
| Churches and art in Calitri | School | 64 | 19 | 1216 | 86% | 97% |
| Cilento resources | Hetor | 145 | 5 | 725 | 98% | 87% |
| Abandoned factories | School | 69 | 23 | 1587 | 72% | 99% |
| CH of San Nicola la Strada | School | 22 | 13 | 286 | 83% | 100% |
| Calitri buildings | School | 27 | 18 | 486 | 75% | 100% |
| Nocera Inferiore itineraries | School | 49 | 14 | 686 | 98% | 98% |
| Agrometeorological network | School | 33 | 12 | 396 | 92% | 70% |
| Collina del Parco risk map | Assoc. | 8 | 13 | 104 | 70% | - |
| Caserta contemporary itineraries | School | 31 | 16 | 496 | 98% | 94% |
| Caserta modern itineraries | School | 31 | 14 | 434 | 100% | 100% |
| Caserta medieval itineraries | School | 42 | 16 | 672 | 99% | 93% |
| Capua & Aversa Churches | School | 133 | 20 | 2660 | 57% | 94% |
| Agriculture assistance centres | School | 161 | 10 | 1610 | 99% | 89% |
| Clinical records of Psychiatric Hospital in Nocera Inferiore | Assoc. | 200 | 10 | 2000 | 90% | 86% |
| Bio companies | School | 203 | 16 | 3248 | 100% | 89% |
| Solidarity Purchasing Groups | School | 29 | 12 | 377 | 99% | 85% |
| Didactic farms | School | 267 | 14 | 3738 | 99% | - |
| Gate crests of Nocera Inferiore | Assoc. | 9 | 7 | 63 | 70% | - |
| Nocera Inferiore votive shrines | Assoc. | 49 | 12 | 588 | 74% | - |
| Photographic Safari @Paestum | School & Assoc. | 115 | 9 | 1035 | 98% | 100% |
| Touring club | Assoc. | 29 | 19 | 551 | 72% | 97% |
| Avellino POI | School | 1439 | 13 | 18707 | 83% | 99% |
| Caserta POI | School | 1314 | 13 | 17082 | 81% | 100% |
| Nocerino - Sarnese POI | School | 285 | 13 | 3705 | 82% | 99% |
| Artistic High Schools | Assoc. | 36 | 37 | 1332 | 92% | 94% |
| Museums of Cilento and the Gulf of Policastro | School | 69 | 17 | 1173 | 68% | 93% |
| Hidden treasures | Assoc. | 83 | 50 | 4150 | 77% | 96% |
| Tangible Cultural Heritage - Movable Cultural Heritage | | | | | | |
| Trademarks | School | 32 | 43 | 1376 | 98% | 58% |
| Peasant civilization | School | 196 | 14 | 2744 | 87% | - |
| Open Museum | School | 213 | 18 | 3834 | 90% | - |
| Irpino Museum: Epigraphs | School | 11 | 53 | 583 | 93% | 91% |
| Irpino Museum | School | 21 | 30 | 630 | 100% | 100% |
| Ancient arts and jobs | School | 95 | 13 | 1235 | 80% | - |
| San Nicola La Strada churches decor elements | School | 35 | 12 | 420 | 97% | - |
| Forino company trademarks | Assoc. | 109 | 23 | 2507 | 100% | - |
| Chronicle of Nuceria Alfaterna and its territory: the Agro Nocerino museum | Assoc. | 327 | 27 | 8829 | 69% | 95% |
| Handicrafts | Hetor | 26 | 12 | 312 | 100% | 54% |
| Monumental complex of the former Bourbon prison of Avellino | School | 95 | 18 | 1710 | 98% | 85% |
| Art at UNISA | School | 7 | 9 | 63 | 94% | - |
| Mathematics Museum | School | 53 | 18 | 954 | 96% | - |
| Intangible Cultural Heritage | | | | | | |
| Central Political Records Office | School | 509 | 22 | 11198 | 88% | 81% |
| Provincial political records of Caserta during the Kingdom of Italy | School | 464 | 23 | 10672 | 88% | 79% |
| “La torre” press | School | 1287 | 10 | 12870 | 100% | - |
| Uses and customs of Upper Irpinia | School | 65 | 11 | 715 | 79% | 89% |
| Ancient arts and crafts of the Beniamino Tartaglia Museum of Aquilonia - Crafts Section | School | 130 | 13 | 1690 | 80% | - |
| Traditional games | Assoc. & PA | 29 | 15 | 435 | 60% | - |
| The Nocerina industry from the unification of Italy to the economic miracle | Assoc. | 385 | 12 | 4620 | 30% | - |
| Proverbs and ancient words | Assoc. | 83 | 10 | 830 | 100% | - |
| The local press since the Italian unification | Assoc. | 33 | 17 | 561 | 45% | - |
| History of the Carnival and of the Carts of Marcianise | School | 35 | 10 | 350 | 90% | 100% |
| Natural Heritage | | | | | | |
| Natural areas | Hetor | 42 | 18 | 756 | 94% | - |
| 2018 blue flag beaches | Hetor | 54 | 10 | 540 | 86% | 100% |
| Regional forests | School | 10 | 19 | 190 | 98% | 60% |
| Seed woods | School | 17 | 19 | 323 | 100% | 88% |
| 2020 blue flag beaches | Hetor | 60 | 10 | 600 | 86% | 100% |
| 2021 blue flag beaches | Hetor | 61 | 10 | 610 | 85% | 100% |
| 2022 blue flag beaches | Hetor | 62 | 11 | 682 | 83% | 100% |
| Food and Wine | | | | | | |
| Typical products | Hetor | 607 | 15 | 9105 | 84% | - |
| Wines | Hetor | 1858 | 15 | 27870 | 91% | 95% |
| Dairies authorized for the production of buffalo mozzarella D.O.P. | School | 91 | 20 | 1820 | 100% | 96% |
| Producers at Km 0 | School | 35 | 24 | 840 | 100% | 88% |
| Farms authorized to produce D.O.P. buffalo mozzarella | School | 122 | 12 | 1464 | 100% | 92% |
| Pizzerias in Naples and Caserta | School | 49 | 19 | 931 | 100% | 98% |
| D.O.C.G., D.O.C., I.G.P. wines | School | 79 | 16 | 1264 | 83% | - |
| Craft breweries | School | 88 | 27 | 2376 | 81% | 91% |
| Coffee roasters | Hetor | 107 | 19 | 2033 | 99% | 97% |
| Salerno farmhouses | School | 207 | 15 | 3105 | 98% | 93% |
| Slow Food Presidia | School | 89 | 15 | 1335 | 100% | - |
| Social farms | School | 19 | 22 | 418 | 98% | 95% |
| Nocera: social farms | Assoc. | 16 | 5 | 80 | 100% | - |
| Other (Companies and Geographical Information) | | | | | | |
| Nocera Inferiore streets | School | 245 | 16 | 3920 | 100% | - |
| Pro Loco | Hetor | 580 | 13 | 7540 | 79% | 95% |
| Autonomous Care, Stay and Tourism companies | Hetor | 15 | 11 | 165 | 92% | 64% |
| Tourist Boards | Hetor | 5 | 10 | 50 | 100% | - |
| San Nicola La Strada streets | School | 163 | 12 | 1956 | 100% | - |
| ANICAV Companies | School | 32 | 12 | 384 | 100% | 84% |
| Companies in Upper Irpinia | School | 407 | 17 | 6919 | 78% | 100% |
| Battipaglia & Eboli Companies | School | 169 | 22 | 3718 | 99% | 100% |
| Salerno Start up and SMEs | School | 153 | 21 | 3213 | 99% | 77% |
| Montoro’s fractions | Assoc. | 79 | 11 | 869 | 91% | 30% |
| Avellino municipalities | School | 118 | 24 | 2832 | 94% | 99% |
| Salerno municipalities | School | 158 | 24 | 3792 | 96% | 97% |

4. Discussion: Potentialities 𝑃𝑥 and Limitations 𝐿𝑦

𝑃1 - Joint effort. Since 2016, HETOR has collaborated with three communities, i.e., associations, schools, and local PAs, for a total of 27 agencies and 702 users. This demonstrates that the HETOR project is a joint effort of data lovers, experts in the field, citizens, and learners in co-creating content as OD. The biggest community in terms of agencies is the association one, with 14 joining agencies. It involves volunteers, data experts, and data lovers.

𝑃2 - Consistent OD co-creation effort. The HETOR project has co-authored 87 datasets concerning CH in the Campania Region since 2016. It is worth noting that the dataset collection presented in this article is a subset of the published datasets, as we focus only on local CH in our Region. Looking at Table 1, it is evident that datasets differ in size and topics, covering all the aspects of CH, i.e., tangible and intangible heritage, natural heritage, and food and wine.
They also cover other topics relevant for citizens, such as companies, associations, and geographical information in the Campania Region. The same topic, such as itineraries and points of interest (POI), is modeled in different areas of the Campania Region to guarantee a wider geographical coverage.

𝑃3 - High-quality OD. As made evident by the CMP. column of Table 1, the completeness percentage of the HETOR datasets is overall very high. Only in 10 out of 87 cases is the percentage lower than 75%. It is worth clarifying that the reported percentage counts non-empty cells. In some datasets, authors explicitly report missing information; this does not affect the reported value. Moreover, according to the ACC. column of Table 1, the accuracy score of the geographical information is very high. It is lower than 70% in only 4 out of 87 datasets. This means that published datasets can be considered high-quality data.

𝐿1 - Tabular OD. All the authored datasets are published as CSV. CSV files are the best way to publish independent datasets that are not yet interlinked. However, modeling data as tables forces the data publisher to represent all the entries with the same structure, causing empty values for non-applicable columns or the use of lists in a single cell. By exploiting Semantic Web technologies, instead, any entry can be modeled with an arbitrary number of relations.

𝐿2 - No uniform schema. The datasets differ in schema, in terms of the number and type of modelled columns, and lack a uniform terminology in the column headers. Before modeling a unified schema, it is advisable to carefully check the datasets’ content to avoid modeling columns that are declared as headers but contain no data.

𝐿3 - Inaccurate values due to manual input. The datasets are manually curated. Hence, typos, improper use of apostrophes as accents, and misspelled words are common errors. These errors cause the deficiencies observed in the datasets’ accuracy.
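Some of the manual-input errors described above, namely case differences and apostrophes typed in place of accents, can be surfaced by grouping values that share a normalised key. The sketch below illustrates this with a simple key-collision ("fingerprint") approach. It is an assumption-laden example rather than the procedure used in the project: the function names `fingerprint` and `cluster` and the sample values are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalise a string so that superficial variants share one key."""
    text = value.strip().lower()
    # Drop apostrophes and backticks typed in place of accents.
    for mark in ("'", "’", "`"):
        text = text.replace(mark, "")
    # Decompose accented letters and discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters and digits only; sort unique tokens to ignore word order.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; return clusters of size > 1."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical cell values, as manual input might produce them.
names = [
    "San Nicola la Strada",
    "SAN NICOLA LA STRADA",
    "Società operaia",
    "Societa' operaia",
    "Salerno",
]

for group in cluster(names):
    print(group)
```

Each printed group is a candidate set of spellings to merge after manual review. This key does not catch switched letters or genuinely misspelled words; those would need an edit-distance comparison or reconciliation against an external source.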
Moreover, string facets in OpenRefine detected non-uniform use of lower and upper case, switched letters, wide use of acronyms, and improper usage of apostrophes and accents.

5. Conclusions and Future directions

Since 2016, the HETOR project has co-created OD with different communities (𝑃1) to digitize CH in the Campania Region. This effort resulted in 87 high-quality open datasets freely available on CKAN (𝑃2, 𝑃3). Topics span tangible and intangible CH, natural heritage, gastronomic curiosities, and information of public interest. This remarkable result is attributable to the effort of the HETOR project to propose structured activities built around the collaborative platform SPOD and a meticulous search for the data to be modeled to digitize the CH of the Campania Region. All the datasets are published as CSV with the Creative Commons License. Since different communities authored them over time, they have no uniform schema (𝐿1, 𝐿2). Published datasets might benefit from a uniform schema, such as an ontology, for each dataset group. Moreover, datasets are manually curated (𝐿3). Hence, they contain inaccurate values that can be easily corrected by automatic data quality approaches, such as clustering approaches to detect and correct typos, or by reconciling values with the ones published in well-known Knowledge Graphs, such as Wikidata. Further effort should be invested in quantifying the coherence and the coverage with respect to the covered topics.

References

[1] Open Knowledge Foundation, Defining open data, 2013. https://blog.okfn.org/2013/10/03/defining-open-data, [Online, Last access November 2022].
[2] A. Trentini, S. Scaravati, Raising curiosity about open data via the ‘physiradio’ musicalization iot device, Data Science Journal 19 (2020) 39. doi:10.5334/dsj-2020-039.
[3] L. Van Audenhove, W. Van den Broeck, I. Mariën, Data literacy and education: Introduction and the challenges for our field, Journal of Media Literacy Education 3 (2020) 1–5. doi:10.23860/JMLE-2020-12-3-1.
[4] T. Coughlan, The use of open data as a material for learning, Educational Technology Research and Development 68 (2020) 383–411. doi:10.1007/s11423-019-09706-y.
[5] K. Shamash, J. P. Alperin, A. Bordini, Teaching data analysis in the social sciences: A case study with article level metrics, Open Data as Open Educational Resources (2015) 49.
[6] E. Tovar, N. Piedra, Guest editorial: open educational resources in engineering education: various perspectives opening the education of engineers, IEEE Transactions on Education 57 (2014) 213–219. doi:10.1109/TE.2014.2359257.
[7] M. Janssen, Y. Charalabidis, A. Zuiderwijk, Benefits, adoption barriers and myths of open data and open government, Information systems management 29 (2012) 258–268.
[8] C. Z. Dib, Formal, non-formal and informal education: concepts/applicability, in: AIP conference proceedings, volume 173, American Institute of Physics, 1988, pp. 300–315. doi:10.1063/1.37526.
[9] N. Piedra, J. Chicaiza, J. López, E. T. Caro, A rating system that open-data repositories must satisfy to be considered OER: Reusing open data resources in teaching, in: Global Engineering Education Conference, 2017, pp. 1768–1777. doi:10.1109/EDUCON.2017.7943089.
[10] J. Álvarez Otero, M. Lázaro, M. JesusG, A cloud-based GiScience learning approach to spanish national parks, European Journal of Geography 9 (2018) 6–20. URL: http://hdl.handle.net/10612/10756.
[11] K. Charvat, O. Cerba, D. Kozuch, M. Splichal, Geospatial data based environment in INSPIRE4Youth, Procedia Computer Science 104 (2017) 183–189. doi:10.1016/j.procs.2017.01.101.
[12] R. R. Kurada, Y. Ramu, S. Pattem, Lessoning geospatial visualizations on real-time data, in: 2021 IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), 2021, pp. 1–6. doi:10.1109/CSITSS54238.2021.9683776.
[13] F. Windhager, E. Mayr, G. Schreder, M. Smuc, Linked information visualization for linked open government data. a visual synthetics approach to governmental data and knowledge collections, JeDEM-eJournal of eDemocracy and Open Government 8 (2016) 87–116. doi:10.29379/jedem.v8i2.436.
[14] R. De Donato, M. Garofalo, D. Malandrino, M. A. Pellegrino, A. Petta, Education meets knowledge graphs for the knowledge management, in: Methodologies and Intelligent Systems for Technology Enhanced Learning, 10th International Conference. Workshops, Springer International Publishing, Cham, 2021, pp. 272–280. doi:10.1007/978-3-030-52287-2_28.
[15] J. E. Weishart, Democratizing education rights, William & Mary Bill of Rights Journal 29 (2020) 1.
[16] I. Susha, Å. Grönlund, M. Janssen, Driving factors of service innovation using open government data: An exploratory study of entrepreneurs in two countries, Information polity 20 (2015) 19–34. doi:10.3233/IP-150353.
[17] I. Safarov, A. Meijer, S. Grimmelikhuijsen, Utilization of open government data: A systematic literature review of types, conditions, effects and users, Information Polity 22 (2017) 1–24. doi:10.3233/IP-160012.
[18] E. G. Martin, G. M. Begany, Opening government health data to the public: benefits, challenges, and lessons learned from early innovators, Journal of the American Medical Informatics Association 24 (2017) 345–351. doi:10.1093/jamia/ocw076.
[19] C. Baldwin, Using public sector open data to benefit local communities, Computer Weekly (2014) 17–20.
[20] M. Gascó-Hernández, E. G. Martin, L. Reggi, S. Pyo, L. F. Luna-Reyes, Promoting the use of open government data: Cases of training and engagement, Government Information Quarterly 35 (2018) 233–242. doi:10.1016/j.giq.2018.01.003.
[21] C.-P. Chen, J.-L. Shih, Y.-C. Ma, Using instructional pervasive game for school children’s cultural learning, Journal of Educational Technology & Society 17 (2014) 169–182. URL: https://www.jstor.org/stable/jeductechsoci.17.2.169.
[22] A. Dickinson, M. Lochrie, P. Egglestone, Datapet: Designing a participatory sensing data game for children, in: Proceedings of the British Human-Computer Interaction Conference, 2015, pp. 263–264. doi:10.1145/2783446.2783602.
[23] I. Vargianniti, K. Karpouzis, Using big and open data to generate content for an educational game to increase student performance and interest, Big Data and Cognitive Computing 4 (2020). doi:10.3390/bdcc4040030.
[24] M. Saddiqa, L. Rasmussen, R. Magnussen, B. Larsen, J. M. Pedersen, Bringing open data into danish schools and its potential impact on school pupils, in: Proceedings of the 15th International Symposium on Open Collaboration, 2019, pp. 1–10. doi:10.1145/3306446.3340821.
[25] M. Saddiqa, B. Larsen, R. Magnussen, L. L. Rasmussen, J. M. Pedersen, Open data visualization in danish schools: A case study, in: Proceedings of International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), 2019. URL: http://hdl.handle.net/11025/35629.
[26] A. Antelmi, M. A. Pellegrino, Open data literacy by remote: Hiccups and lessons, in: Proceedings of the Symposium on Open Data and Knowledge for a Post-Pandemic Era (ODAK), BCS Learning & Development, 2022, pp. 1–5. doi:10.14236/ewic/ODAK22.7.