=Paper= {{Paper |id=Vol-3741/tta02 |storemode=property |title=Knowledge Discovery su Schemi per l’Integrazione di Basi di Dati |pdfUrl=https://ceur-ws.org/Vol-3741/tta02.pdf |volume=Vol-3741 |authors=Massimo La Camera,Luigi Palopoli,Domenico Saccà,Domenico Ursino |dblpUrl=https://dblp.org/rec/conf/sebd/Camera0SU24 }} ==Knowledge Discovery su Schemi per l’Integrazione di Basi di Dati== https://ceur-ws.org/Vol-3741/tta02.pdf
Knowledge Discovery su Schemi per l’Integrazione di Basi di Dati
Massimo La Camera¹,†, Luigi Palopoli²,†, Domenico Saccà² and Domenico Ursino³,*,†
¹ Tecnoter s.r.l.
² DIMES, Università della Calabria
³ DII, Università Politecnica delle Marche


                                            Abstract
In the early 1990s, with the advent of computer networks, the need to integrate heterogeneous databases became increasingly important. This task was complex but, at the same time, stimulating, since the heterogeneities to be managed were varied and concerned both the extensional component and, perhaps most importantly, the intensional one. Several research groups in Italy and all over the world started to propose solutions to this problem.




                                1. Introduction: DIKE and XIKE
In the early 1990s, the increasing spread of computer networks opened up new horizons and, at the same time, new challenges in all areas of computer science. The world of databases was no exception, and from the very beginning it was clear that integrating heterogeneous databases and making them work together would bring enormous benefits, but would also pose challenging issues. In those years, the “reigning” logical data model was the relational one. Therefore, the heterogeneity lay not so much in the data representation model as in the data themselves and in the conventions used for them (for instance, the string “BL” could represent the color blue in one database and the color black in another). In addition to the heterogeneity of the extensional component, there was the heterogeneity of the intensional one (i.e., regarding schemas and semantics), which was undoubtedly the most difficult to manage. In fact, when integrating different database schemas, it was necessary to detect and handle synonymies (i.e., the same concept represented by different names in different databases), homonymies (i.e., different concepts represented by the same name in different databases), hyponymies/hypernymies (i.e., the presence in one schema of a concept that is a specialization of a concept from another schema), and so on.
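The first two properties above can be told apart once two pieces of evidence are available: whether the names coincide and whether the concepts mean the same thing. The Python sketch below only encodes these definitions. It assumes that a semantic similarity score in [0, 1] has already been computed by some other means; both `similarity` and the 0.5 `threshold` are hypothetical parameters, not part of any system described here. Hyponymies/hypernymies are left out because they also require a specialization direction.

```python
from enum import Enum


class InterschemaProperty(Enum):
    SYNONYMY = "synonymy"    # different names, same concept
    HOMONYMY = "homonymy"    # same name, different concepts
    IDENTITY = "identity"    # same name, same concept
    NONE = "none"            # different names, different concepts


def classify(name_a: str, name_b: str, similarity: float,
             threshold: float = 0.5) -> InterschemaProperty:
    """Classify a pair of concepts taken from two different schemas.

    `similarity` is an assumed, externally computed semantic similarity
    in [0, 1]; `threshold` is a hypothetical cut-off.
    """
    same_name = name_a.strip().lower() == name_b.strip().lower()
    same_concept = similarity >= threshold
    if same_concept:
        return InterschemaProperty.IDENTITY if same_name else InterschemaProperty.SYNONYMY
    return InterschemaProperty.HOMONYMY if same_name else InterschemaProperty.NONE


print(classify("Employee", "Worker", 0.9))  # InterschemaProperty.SYNONYMY
print(classify("Plant", "plant", 0.1))      # InterschemaProperty.HOMONYMY
```

The hard part, of course, is producing a trustworthy `similarity` value in the first place; that is precisely the problem the approaches discussed below address.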
Several research groups in Italy and around the world took up this challenge and started to propose solutions. One of them was the group from the University of Calabria, which focused mainly on extracting interschema properties (i.e., synonymies, homonymies, hyponymies/hypernymies, and subschema similarities) and using them for database integration. The first solution proposed by this group was presented at SEBD in 1997 [1]. After several studies and refinements described in papers published at SEBD [2, 3, 4], in conference proceedings [5, 6, 7, 8, 9], and in prestigious international journals [10, 11, 12, 13, 14], this solution gave rise to the DIKE (Database Intensional Knowledge Extractor) system [15, 16].

SEBD 2024: The 32nd Italian Symposium on Advanced Database Systems, June 23-26, 2024 - Villasimius, Sardinia, Italy
* Corresponding author.
† These authors contributed equally.
Email: massimo.lacamera@tecnoter.net (M. La Camera); palopoli@dimes.unical.it (L. Palopoli); sacca@dimes.unical.it (D. Saccà); d.ursino@univpm.it (D. Ursino)
ORCID: 0000-0003-4915-5137 (L. Palopoli); 0000-0003-3584-5372 (D. Saccà); 0000-0003-1360-8499 (D. Ursino)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
   Underlying the DIKE approach was a seemingly simple, yet powerful, idea that appears in various forms in many areas of computer science and of research in general. DIKE assumed that, given two concepts from different databases, if their “neighboring” concepts in their respective databases were similar, then the two concepts were probably similar; conversely, if their “neighboring” concepts were different, then the two concepts were probably different.
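The neighborhood principle can be illustrated with a small fixpoint computation. The Python sketch below is a generic illustration of the idea, not the algorithm actually used by DIKE. It assumes that each schema is given as a dictionary mapping every concept to the set of its neighboring concepts, that `seed` holds hypothetical initial (e.g., name-based) similarities, and that a hypothetical weight `alpha` balances the seed score against the average best-match similarity of the two neighborhoods.

```python
from itertools import product


def neighbor_similarity(schema_a, schema_b, seed, alpha=0.5, iterations=10):
    """Refine concept similarities using the similarity of their neighbors.

    schema_a, schema_b: dict mapping each concept to the set of its neighbors
        (every neighbor must itself be a key of the same dict).
    seed: dict mapping (concept_a, concept_b) to an initial score in [0, 1].
    alpha: weight of the seed score versus the neighborhood score.
    """
    sim = {(a, b): seed.get((a, b), 0.0) for a, b in product(schema_a, schema_b)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in sim:
            na, nb = schema_a[a], schema_b[b]
            if na and nb:
                # Match each neighbor of a with its most similar neighbor of b,
                # and vice versa; then average the two directions.
                from_a = sum(max(sim[x, y] for y in nb) for x in na) / len(na)
                from_b = sum(max(sim[x, y] for x in na) for y in nb) / len(nb)
                context = (from_a + from_b) / 2
            else:
                context = 0.0  # no neighbors: the context gives no evidence
            new_sim[a, b] = alpha * seed.get((a, b), 0.0) + (1 - alpha) * context
        sim = new_sim  # synchronous update of all pairs
    return sim


# Hypothetical toy schemas: Employee/Department versus Worker/Division/Product.
schema_a = {"Employee": {"Department"}, "Department": {"Employee"}}
schema_b = {"Worker": {"Division"}, "Division": {"Worker"}, "Product": set()}
seed = {("Employee", "Worker"): 0.3, ("Department", "Division"): 0.8}

sim = neighbor_similarity(schema_a, schema_b, seed)
for pair, score in sorted(sim.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy run, Employee–Worker rises from its seed of 0.3 to about 0.47 because Department and Division are similar, while Department–Division drops from 0.8 to about 0.63 because its own neighbors started out less similar; pairs whose neighbors do not match (e.g., Employee–Division) stay at 0. Both directions of the principle stated above are therefore visible.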
   We feel it necessary to point out that, in the same years, other Italian research groups were working on the same issues: the group of the University of Brescia and the University of Milan, which proposed ARTEMIS [17, 18, 19], and the group of the University of Modena and Reggio Emilia, which proposed MOMIS [20, 21]. Abroad, several research groups strove to address the same challenge. Among them, one of the most renowned was undoubtedly the group of Prof. Philip Bernstein at the University of Washington in Seattle and Microsoft Research, which proposed Cupid [22].
   An important acknowledgement of the Italian school of database integration came from Prof. Bernstein himself, who, in a famous paper [22], proposed a detailed comparison between Cupid, DIKE, and ARTEMIS/MOMIS, recognizing that the latter two systems were able to “compete on a par” with Cupid. DIKE and ARTEMIS/MOMIS were again considered, along with Cupid, in another important paper [23], which offered an account of schema matching research ten years after the publication of [22].
   Over the years, the relational model, while still extremely important in the database world, showed its limitations when dealing with semi-structured and unstructured data. To handle semi-structured data, the Object Exchange Model (OEM) was proposed first, followed later by the Extensible Markup Language (XML) and JavaScript Object Notation (JSON). The authors of DIKE continued their research in this area and presented approaches capable of extracting and handling interschema properties from semi-structured data [24, 25, 26, 27]. This research effort eventually led to the XIKE (XML source Intensional Knowledge Extractor) system [28].
   The SEBD community has followed this stream of innovation, first with the “progenitors” of DIKE [1], and later with DIKE [2, 3, 4] and XIKE [25, 27]. Research on DIKE and XIKE has also received some awards, including the Best Student Paper Award at the International Symposium on Advances in Databases and Information Systems (ADBIS’99) [29] and the publication of Domenico Ursino’s PhD thesis in Springer’s Lecture Notes in Computer Science series [30]. Both DIKE and XIKE have been acquired by companies for use as the core of frameworks aimed at managing Cooperative Information Systems.
   DIKE and XIKE represented the culmination of the research of the database integration group of the University of Calabria. By the middle of the first decade of the new century, the members of the group had turned to new topics, such as intelligent agents, recommender systems, data mining, social network analysis, and deep learning. Nevertheless, they were able to observe how some of the ideas underlying DIKE/XIKE were adopted, perhaps under different forms and names, both in the research areas they were directly concerned with and in others. To take just one example, the idea behind DIKE/XIKE that the semantics of a concept depends on its neighbors is used in collaborative filtering recommender systems, where the interests of a person are assumed to be influenced by those of the people closest to her/him, and is the basis of the concept of homophily [31] in social network analysis. A researcher should not be surprised that some ideas and principles are so powerful and general that they can be used successfully in a variety of fields. However, discovering this through direct personal experience is always astonishing, no matter how many years of research she/he may have behind her/him.


References
 [1] M. La Camera, L. Palopoli, D. Saccà, D. Ursino, Knowledge discovery su schemi per
     l’integrazione di sistemi di basi di dati, in: Atti del Congresso sui Sistemi Evoluti per Basi
     di Dati (SEBD’97), Verona, Italy, 1997, pp. 166–190. In Italian.
 [2] A. Bonifati, L. Palopoli, D. Saccà, D. Ursino, Utilizzo della logica descrittiva per l’estrazione
     di proprietà terminologiche e strutturali complesse, in: Atti del Congresso su Sistemi
     Evoluti per Basi di Dati (SEBD’98), Ancona, Italy, 1998, pp. 71–86. In Italian.
 [3] L. Palopoli, L. Pontieri, D. Ursino, Progettazione semi-automatica di data warehouse di
     grandi dimensioni, in: Atti del Congresso su Sistemi Evoluti per Basi di Dati (SEBD’99),
     Como, Italy, 1999, pp. 3–17. In Italian.
 [4] L. Palopoli, G. Terracina, D. Ursino, Derivazione di iponimie/iperonimie tra entità apparte-
     nenti a basi di dati eterogenee, in: Atti del Congresso su Sistemi Evoluti per Basi di Dati
     (SEBD 2000), L’Aquila, Italy, 2000, pp. 357–370. In Italian.
 [5] L. Palopoli, D. Saccà, D. Ursino, Semi-automatic, semantic discovery of properties from
     database schemes, in: Proc. of the International Database Engineering and Applications
     Symposium (IDEAS ’98), Cardiff (Wales), UK, 1998, pp. 244–253. IEEE Computer Society.
 [6] L. Palopoli, D. Saccà, D. Ursino, An automatic technique for detecting type conflicts in
     database schemes, in: Proc. of the ACM International Conference on Information and
     Knowledge Management (CIKM’98), Bethesda, Maryland, USA, 1998, pp. 306–313. ACM
     Press.
 [7] D. Ursino, Deriving type conflicts and object cluster similarities in database schemes by an
     automatic and semantic approach, in: Proc. of the International Symposium on Advances
     in Databases and Information Systems (ADBIS’99), Maribor, Slovenia, 1999, pp. 46–60.
     Lecture Notes in Computer Science, Springer-Verlag.
 [8] L. Palopoli, D. Saccà, G. Terracina, D. Ursino, A unified graph-based framework for deriving
     nominal interscheme properties, type conflicts and object cluster similarities, in: Proc. of
     the International Conference on Cooperative Information Systems (CoopIS’99), Edinburgh,
     Scotland, United Kingdom, 1999, pp. 34–45. IEEE Computer Society.
 [9] G. Terracina, D. Ursino, A study on the interaction between interscheme property extrac-
     tion and type conflict resolution, in: Proc. of the International Database Engineering and
     Applications Symposium (IDEAS ’00), Yokohama, Japan, 2000, pp. 25–33. IEEE Computer
     Society.
[10] L. Palopoli, D. Saccà, D. Ursino, Semi-automatic techniques for deriving interscheme
     properties from database schemes, Data & Knowledge Engineering 30(4) (1999) 239–273.
[11] L. Palopoli, D. Saccà, D. Ursino, DL𝑃: a description logic for extracting and managing
     complex terminological and structural properties from database schemes, Information
     Systems 24(5) (1999) 410–424.
[12] L. Palopoli, L. Pontieri, G. Terracina, D. Ursino, Intensional and extensional integration
     and abstraction of heterogeneous databases, Data & Knowledge Engineering 35(3) (2000)
     201–237.
[13] G. Terracina, D. Ursino, A uniform methodology for extracting type conflicts and sub-
     scheme similarities from heterogeneous databases, Information Systems 25(8) (2000)
     527–552.
[14] L. Palopoli, D. Saccà, G. Terracina, D. Ursino, Uniform techniques for deriving similarities
     of objects and subschemes in heterogeneous databases, IEEE Transactions on Knowledge
     and Data Engineering 15(2) (2003) 271–294.
[15] L. Palopoli, G. Terracina, D. Ursino, Experiences using DIKE, a system for supporting
     cooperative information system and data warehouse design, Information Systems 28(7)
     (2003) 835–865.
[16] L. Palopoli, G. Terracina, D. Ursino, DIKE: a system supporting the semi-automatic
     construction of Cooperative Information Systems from heterogeneous databases, Software
     Practice & Experience 33(9) (2003) 847–884.
[17] S. Castano, V. De Antonellis, Reference conceptual architecture for re-engineering information
     systems, International Journal of Cooperative Information Systems 4(2) (1995) 213–235.
[18] S. Castano, V. De Antonellis, M. Fugini, B. Pernici, Conceptual schema analysis: Techniques
     and applications, ACM Transactions on Database Systems (TODS) 23 (1998) 286–332.
[19] S. Castano, V. De Antonellis, Building views over semistructured data sources, in: Proc.
     of the International Conference on Conceptual Modeling (ER’99), Paris, France, 1999, pp.
     146–160. Springer.
[20] S. Bergamaschi, S. Castano, M. Vincini, Semantic integration of semistructured and
     structured data sources, SIGMOD Record 28(1) (1999) 54–59.
[21] S. Bergamaschi, S. Castano, M. Vincini, D. Beneventano, Semantic integration and query of
     heterogeneous information sources, Data & Knowledge Engineering 36(3) (2001) 215–249.
[22] J. Madhavan, P. Bernstein, E. Rahm, Generic schema matching with Cupid, in: Proc. of
     the International Conference on Very Large Data Bases (VLDB 2001), Rome, Italy, 2001, pp.
     49–58. Morgan Kaufmann.
[23] P. Bernstein, J. Madhavan, E. Rahm, Generic Schema Matching, Ten Years Later, Proceed-
     ings of the VLDB Endowment 4 (2011) 695–701.
[24] G. Terracina, D. Ursino, Deriving synonymies and homonymies of object classes in semi-
     structured information sources, in: Proc. of the International Conference on Management
     of Data (COMAD 2000), Pune, India, 2000, pp. 21–32. McGraw Hill.
[25] P. De Meo, G. Quattrone, G. Terracina, D. Ursino, Estrazione, a vari livelli di “severità”, di
     proprietà interschema da Schemi XML, in: Atti del Congresso sui Sistemi Evoluti per Basi
     di Dati (SEBD 2004), S. Margherita di Pula (Cagliari), Italy, 2004, pp. 290–301.
[26] P. De Meo, G. Quattrone, G. Terracina, D. Ursino, Integration of XML Schemas at various
     “severity” levels, Information Systems 31(6) (2006) 397–434.
[27] P. De Meo, G. Quattrone, G. Terracina, D. Ursino, Utilizzo delle proprietà interschema per
     il clustering di schemi XML semanticamente eterogenei, in: Atti del Congresso sui Sistemi
     Evoluti per Basi di Dati (SEBD 2005), Brixen-Bressanone, Italy, 2005, pp. 336–347. Aracne.
[28] P. De Meo, G. Quattrone, G. Terracina, D. Ursino, Experiences with the system XIKE (XML
     source Intensional Knowledge Extractor), Soft Computing: New Research (2009) 333–386.
     Nova Science notes.
[29] L. Palopoli, G. Terracina, D. Ursino, The system DIKE: Towards the semi-automatic synthesis
     of cooperative information systems and data warehouses, in: Proc. of the Challenges of
     Symposium on Advances in Databases and Information Systems (ADBIS-DASFAA 2000),
     Prague, Czech Republic, 2000, pp. 108–117. Matfyzpress.
[30] D. Ursino, Extraction and Exploitation of Intensional Knowledge from Heterogeneous
     Information Sources, Heidelberg, Germany, 2002. PhD Thesis, Lecture Notes in Computer
     Science 2282, Springer Verlag.
[31] M. McPherson, L. Smith-Lovin, J. Cook, Birds of a feather: Homophily in social networks,
     Annual Review of Sociology 27 (2001) 415–444. JSTOR.