Hybrid Technologies for Databases Anatoly V. Anisimov, Igor O. Zavadskiy, Petro P. Kuliabko Taras Shevchenko National University of Kyiv, Volodymyrs’ka, 64/13, Kyiv, 01601, Ukraine Abstract Relational approach DBTG CODASYL proposal allow us to use better features of each component. We propose some supplementary operations over the data sets – the main construction of DBTG CODASYL proposal, which can be more effective in some cases, at least due to the fact that relational databases use mostly symbolic addressing, and databases with complex structures - mostly relative. As a result, user may choose among different types of data connections concerning the speed of their processing: slow but flexible, or fast but complex. Keywords Hybrid technology, relational approach, relational algebra, DBTG CODASYL proposal, data set, Dribas algebra 1. Introduction As well-known, the relational approach [1,2,3] provides the maximum simplification of the data structure to a flat file and significant increasing of the data manipulation language level, which makes it possible to create a clear and strong mathematical background for such languages. However, the transition from more complex structures to relationships means (at least for simple queries) an impairment in processing efficiency, at least due to that relational databases use mostly symbolic addressing, and databases with complex structures - mostly relative. Nowadays many applications need to use BIG Data [7,8,9] and Cloud computing [10,11] technologies and receive the problem of data processing inefficiency. At least in part, the solution is in using different types of connections among data according to the speed of their processing: from slow but flexible to fast but complex. So, the main goal of this article is to construct an algebra (based on the relational selective Dribas algebra [2]) with the basic set that isnm the set of data sets as a supplement to the traditional algebra over relationships. 2. Selective Dribas algebra The selective Dribas algebra was proposed a few years after the appearance of Codd's relational algebra. Relational Codd algebra is more focused on theoretical aspects, and Dribas algebra is more convenient in sense of implementation and practical use. It borrowed set-theoretic operations of Codd relational algebra (union, intersection, and difference), as well as the Cartesian operation and project. In addition, 4 filtering operations were added. The operation of β-filtering Fβ (R, condition) is very similar to the Codd operation of -restriction, but in Dribas algebra the condition is an arbitrary formula without quantifiers with the attributes names from the relationship R and constants, which in practice is much more convenient than in the Codd case (comparing values of different attributes, but with the same sign and are combined only by conjunction). II International Scientific Symposium «Intelligent Solutions» IntSol-2021, September 28–30, 2021, Kyiv-Uzhhorod, Ukraine EMAIL: anatoly.v.anisimov@gmail.com (A.1); ihorza@gmail.com (A.2); kpp1@ukr.net (A.3) ORCID:0000-0002-1467-2006 (A.1);0000-0002-4826-5265 (A.2); 0000-0001-5411-6592 (A.3) ©️ 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) 186 The next filtering operation F (R1, R2, condition) is also similar to the Codd operation of -join. The difference is that the result of this operation is a relationship, the scheme of which is a copy of the relationship R1, and not the Cartesian product of two relationships, as in the classical case. In fact, the result is only those tuples from R1, for which there is at least one tuple in R2 that satisfies the condition. Again, we get a practical gain compared to the classic case. The operation F(R1, R2, condition) has no direct equivalent in classical algebra, but can be expressed through other operations and was added for practical reasons. As in the previous case, the result of the scheme is the copy from R1, and is filled with those tuples from R1, for which the condition is equal TRUE for all tuples from R2. For some queries, this operation is very convenient, and its implementation largely repeats the implementation of the previous operation. The most interesting and complex in terms of execution operation FS(R1,R2,A, BaC) is also a filtering operation, the result of which is some subset of tuples from the relation R1; A, Ba - lists of attributes from R1, and C - list of attributes from R2;  is a binary predicate of set comparison: , , , , =, . First, R1 is grouped by the same values of the sub-tuples from list A, then R2 is projected by the list of attributes C; then each group of tuples, more precisely sub-tuples according to the list Ba, is compared (as a sets) with R2 [C]; if the condition is TRUE, then the corresponding group of tuples from R1 passes into the result. 3. DBTG CODASYL proposal The data set (DS) (after DBTG CODASYL [4,5]) - (as an instance) is a two-level tree with one record-owner (Owner) and several records-members (Member) of the same type. By record we mean a set of pairs {attribute name: value}, and we interpret uniformity in a relational sense: two records are considered to be of the same type if there is a bijection between their attributes that matches both the attribute names and the types of corresponding values. Thus, all records are structurally similar to tuples of relationships, which really is a simplification in comparison with the DBTG CODASYL proposal. Two data sets are of the same type if their owner records and all member records from both data sets are of the same type. The set of the same type of data sets forms the type of data set. Some datasets have only one Owner and an empty set of Member records. There can also be records outside of any data set, then we assume that such records are a Members of some DS, the Owner of which is a special SYSTEM record. Classical relationships can be formed from the same type of records with the SYSTEM owner, over which all operations of both Codd relational algebra and Dribas relational algebra are defined. Therefore, our proposed data model can be considered as an extension of the relational model, and the data selection language - as an extension of the relational algebras of Codd and Dribas. Records may or may not have a field similar to the primary key. Database key (DBK) for written records always exists, but it can be formed on the basis of the several fields, or calculated. Although firstly it seems that the DS has less expressive power than the hierarchical or network structure, but in fact each record can be part of several different DS both as owner and as a member, forming a data structure such as "network". 4. Example in terms of ER-model Consider an example of a subject area. First, we give its text description and ER-model (Fig. 1) [6]. There are hospitals (Hospital), there are departments (Department) in hospitals, there are chambers (Chamber) in departments, patients (Patient) reside in chambers; in addition, physicians (Physician) work in the departments; each physician is responsible for several chambers (perhaps some physician is not responsible for any chamber); physicians consult patients, one physician can consult several patients and one patient can be consulted by several physicians. Although all connections, as seen in the ER-model, have a one-to-many type, the Consulting entity actually implements a many-to-many connection between the Physician and Patient entities. We need to make one more important remark. Hospital has its number (h#), which uniquely specifies the object instance. Department also has its number (d#), but this number is unique only in one hospital (by the 187 way such situation is quite usual for subject area with hierarchical structure), consequently in relational model we need to use for the relationship Department the pair attributes (d#, h#) as a primary key. The same situation we have for the relationship Chamber. Relationships Patient and Physician have one attribute primary key. Now we give a description of the data domain according to the relational approach, i.e. each entity and relationship (type “many-to-many”) are represented by tables with the corresponding attributes : Hospital(h#, Hname, Hadress, rank); Department(d#, h#, Dname, Dmng); Chamber(c#, d#, h#, Cname, qnt); Physician(p#, d#, h#, Pname, spec, age); Patient(t#, c#, d#, h#, Tname, diagn, temp); Consulting(r#, p#, t#, data); Table 1 Record Description Record Name is Hosp { Record Name is Dep { H# CHAR(5), D# CHAR(5), HNAME CHAR(25), DNAME CHAR(25), Hadress CHAR(30), Dmng CHAR(30) rank CHAR(7) }; }; Record Name is Ch{ Record Name is Ph { C# CHAR(5), P# CHAR(5), CNAME CHAR(25), PNAME CHAR(25), Qnt integer }; spec CHAR(20), age integer }; Record Name is Pat{ Record Name is Con { T# CHAR(5), R# CHAR(5), TNAME CHAR(25), Data DATE}; diag CHAR(40), temp float }; Table 2 Set Clauses Set HD Owner is Hosp Member Set DC Owner is Dep Set CP Owner is Ch is Dep; Member is Ch; Member is Pat; Set DP Owner is Dep Set PhCon Owner is Ph Set PC Owner is Ph Member is Ph; Member is Con; Member is Ch; Set PCon Owner is Pat Member is Con; Unlike DBTG CODASYL in our case the data set as Member can have other data set that is convenient for hierarchical constructions. For example, to the already described DS we will add some more (Table 3). 5. The operations of the supplement algebra We construct a supplement algebra based on DS, i.e. the arguments of the operations of this algebra will be DS, and the result will also be DS. Note that DS can be thought of as a function that matches each instance of Member (uniquely, but not mutually uniquely) to the instance of OWNER to which it is subordinate. Operations are focused on actions within the database, aspects of the interface will be considered separately. Data sets DS1 and DS2 must be the same type, the resulting DS will be of the same type. DS3. OWNER = DS1. OWNER U DS2. OWNER (without duplicates), Member 188 sets are also combined without duplicates, but with subordination, i.e. if there are levels (matching DBK) records in DS1. OWNER and in DS2. OWNER, then in DS3. OWNER includes only one record, and many of their subordinate Member records are combined (without duplicates) and are subordinate to a common OWNER. This operation can cause problems: suppose that DS1 has an instance (O1, M1) and DS2 has the instance (O2, M1) then after reunion in DS3 we get the situation that M1 has two owners, which is forbidden, because it is not tree-structure; so in such situations UNION operation will leave only instance (O1, M1) in DS3. INTERSECT(DS1, DS2) -> DS3; Table 3 Data set as Member Set HDC Owner is Hosp Set HDP Owner is Hosp Set CPCon Owner is Ch Member is DC; Member is DP; Member is PCon; Set HCP Owner is Hosp Set CPCon Owner is Ch Set PhCP Owner is Ph Member is CP through DC; Member is PCon; Member is CP; Figure 1: ER-model of the “Hospital” data domain Data sets DS1 and DS2 must be the same type, the resulting DS will be of the same type. DS3. OWNER is an intersection DS1.OWNER ∩ DS2.OWNER, in particular it may be empty, then the entire resulting DS will be empty; Member sets also intersect, but only those for which there are instances of OWNER; instances of DS with empty sets Member can be formed. 189 DIFFERENCE(DS1, DS2) -> DS3; Data sets DS1 and DS2 must be the same type, the resulting DS will be of the same type. DS3. OWNER = DS1. OWNER \ DS2. OWNER, in particular it may be empty, then the entire resulting DS will be empty; for Member subtraction is performed on subsets subordinate to one OWNER. TIMES(DS1, DS2) -> DS3; This operation is defined everywhere and there is no requirement on types of data sets. DS3. OWNER = DS1. OWNER  DS2. OWNER, instances of Member also multiply among themselves, but only within the limits of subordination to their OWNER. PROJECT (DS1, [Olist], [Mlist]); Olist is a list of fields for OWNER, and Mlist is a list of fields for Member. Obvious limitation: Olist must be a sublist of fields DS1.OWNER, and Mlist - a sublist of fields DS1.Member. The expression [ ] means an empty list of fields, and the expression [*] means that all OWNER or Member fields are in the list, depending on whether the expression is in the 2nd or 3rd place. In the future there will be deviations from the classical algebraic style, but this is only for more convenient use; this technique was used by E. Codd when he introduced relational algebra operations. Besides the overall case of the PROJECT operation we’ll also use some special cases: PROJECT-OWNER(DS, [Olist]) - > R; and PROJECT_Member(DS, [Мlist]) -> R; For both operations their result is not DS, but relationship; if a member of a data set is a DS, the result of such operation will be a relationship (according to the list Mlist), which is a Member of the last DS. 5.1. Operations of filtration β-filtration ВFilter(DS1, condition) -> DS2; DS2 has the same type as DS1 and is populated with data from DS1 that satisfies the condition. The condition is a formula with fields from both OWNER and Member, constants, and some unary predicates, for example: EmptyMember (NotEmptyMember) returns TRUE (FALSE) if there is at least one instance of Member for the current OWNER instance. The condition verification is valid within one instance of the data set. ExistsFilter(DS1,DS2,condition)-> DS3; DS3 has the same type as DS1 and is populated with data from DS1 that satisfies the condition if there is at least one such instance in DS2. The condition is defined in the same way as in the previous operation, but fields with DS2 may also be present in the formula. Condition verification is valid within one instance of DS1. AllFilter(DS1,DS2, condition)-> DS3; DS3 has the same type as DS1 and is populated with data from DS1 that satisfies the condition if it is executed for all instances of DS2. The condition is determined in the same way as in the previous operation. Condition verification is valid within one instance of DS1. SetFilter (DS1, DS2, AOlist, BOlist  COlist) -> DS3; - it is the operation for set comparison. DS1, DS2 and DS3 are data sets, and the resulting data set DS3 has the same type as DS1; AOlist and BOlist are attribute lists from DS1, and AOlist is attribute list from DS1.OWNER and BOlist is attribute list from DS1.Member; COlist - list of attributes from DS2. Member;  is a binary predicate of set comparison: , , , , =, . The order of actions is such: a) a set of instances of DS1.OWNER is grouped by AOlist; b) DS2.Member is projected by C; 190 c) from the groups {R1AOlist} are selected those instances for which the condition BOlist COlist is fulfilled; d) into the resulting DS3 passes only those instances of DS1 that meet the specified condition. 5.2. Some other operations JOIN(DS1, DS2, condition) -> DS3; This operation joins two data sets DS1, DS2, which are neighbors on the hierarchical path. D (DS1. OWNER) = D (DS3. OWNER) – this means that their domains are equal; D (DS2.Member) = D (DS3. Member) and D (DS1.Member) = D (DS2.OWNER). Namely, from the 2 two-level trees is constructing three-level tree with subordination saving, then the intermediate level is removed and again we get a 2-level tree; by means of a condition the necessary copies are selected; the condition may include constants and fields of all records from DS1 and DS2. JOIN*(DS1, DS2, condition) -> DS3 ; Such type of operation will be appropriate when there is another DS on the hierarchical path between DS1 and DS2; its OWNER will be the same type as DS1.Member, and its Member will be the same type as DS2.OWNER, then we will get DS3, such as D(DS1.OWNER) = D(DS3.OWNER), and D(DS2.Member) = D(DS3.Member). JOINMember (DS1, DS2, condition) -> R; Another one specific type of connection, oriented on the situation when DS1 and DS2 have a common Member, which implements a connection between two objects of the type "many-to-many" (ER-model). The result will be a relationship that is DS1.OWNER and satisfies the condition. COUNTMember(DS1) -> DS2; For this operation DS2.OWNER = DS1.OWNER, but instead of the DS2. Member for each OWNER the corresponding quantity of elements is substituted. COMPOSE(REC1, REC2, condition) -> DS1; REC1 and REC2 are the records (or more exactly relationships) from which DS1 is formed, REC1 is forming the OWNER of the data set and REC2 is forming its Member. REC2 must have an attribute (or group of attributes) that is a foreign key, and for REC1 it is a primary key. Symbolic addressing between such relations will be transformed to more efficient relative. The condition is required to specify such attributes. ADDMember(DS, REC, condition) adds a REC record to the DS as a Member; condition is used for specification as in previous operation. This operation is convenient when we need to add some record to the data set and connect it with some particular owner; such connection is specified by condition. 6. Query examples Let us consider a few examples of queries in the subject area, which was described earlier (Table 4 – Table 10). Each query will be presented in natural English, in terms of the original Dribas algebra (left column) and by means of proposed supplement algebra (right column). 7. Conclusion The proposed extensions to classical relational algebra allow us, on the one hand, to increase the efficiency of query execution, and on the other hand, to reduce the gap between DML and SQL. From the given examples it is seen that the notation of some queries in terms of the proposed algebra looks more compact. 191 Table 4 Query Example 1. Find hospital names and address for the hospitals with rank = “xxx”. Fβ(Hosp, Rank = “xxx”) -> R1; BFilter(HD, Rank = “xxx”) -> DS1; R1[Hname, Hadress] -> res; PROJECT-OWNER(DS1,[Hname, Hadress]) -> res; For such simple query proposed algebra has no benefits in comparison with traditional relation algebra. Table 5 Query Example 2. Find department names and hospital number in the hospitals with rank = “xxx”. Fβ(Hosp, Rank = “xxx”) -> R1; BFilter(HD, Rank = “xxx”& NotEmptyMember) -> F( Dep, R1,Dep.h# = R1.h#) -> R2; DS1; R2[Dep.h#, Dname] ->res; PROJECT (DS1, [h#], [Dname]) ->res Supplementary algebra uses one operation less than relational algebra. Table 6 Query Example 3. Find the patient names with a temperature greater than 37 from hospital № 1, department №3, chamber №6. Fβ(Dep, h# = 1) -> R1; JOIN (HD, DC, HD.H# = 1 & C# = 6) -> DS1; F( Ch, R1, Ch.c# = 6 & Ch.d# = R1.d#) ->R2; JOIN (DS1, CP, temp >37) ->DS2; F(Pat, R2, Pat.temp >37 & Pat.c# = R2.c#) -> PROJECT (DS2, [], [Tname]) -> res; R3; /* or the same with DS HCP */ R3[Tname] -> res; BFilter(HCP, temp>37 & h#=1 & d#=3 & c#=6) ->DS2; Relational algebra uses 4 operations, while proposed algebra uses 3 operations, even 2 if it will work through data set HCP. Table 7 Query Example 4. Find the chamber names from hospital №1, where there are no patients with a temperature > 37 (Fβ(Pat, temp >37)) -> R1; BFilter(CP, temp >37) -> DS1; (Pat \ R1)[c#] -> R2; F(Ch, R2, Ch.c#=R2.c#) -> CP \ DS1 -> DS2; R3; R3[Cname] -> res; PROJECT(DS2, [Cname], []) -> res; Relation algebra has 4 operations while proposed algebra needs 3. Table 8 Query Example 5. Find the physician names who consult at least all patients in chamber №6 of hospital №1 and department “XXX”. (Fβ(Dep, h# =1 & Dname = «ХХХ») ВFilter(HD, h# =1 & Dname = ХХХ») -> DS1; -> R1; JOIN(DS1, DC, c# = 6) -> DS2; F (Ch, R1, c# =6 & Ch.d# = R1.d#) -> R2; JOIN(DS2, CP, TRUE) -> DS3; F (Pat, R2, Pat.c#=R2.c#)-> R3; JOIN(DS3, PCon, TRUE) -> DS4; FS(Consulting, R3, p#, Consulting.t#  R3.t#) -> /* JOIN*(DS2,PCon,TRUE) -> DS4; R4; Join over the hierarchical path */ F (Ph, R4, Ph.p# = R4.p#)-> R5; SetFilter(PhCon, DS4, Ph.p#, PhCon.Con.r# R5[Pname] -> res; DS4.Con.t#) ->DS5; PROJECT-OWNER(DS5, [PNAME]) ->res; This query is classified as complex because it needs to perform the set comparison. 192 Table 9 Query Example 6. Find the patient names consulted by Dr. Sidorchuk. Fβ(Ph, Pname = “ Sidorchuk ”) -> R1; JOINMember(PCon,PhCon, PCon.Con.r# = F(Con,R1,Con.p#=R1.p#)->R2; PhCon.Con.r# & PhCon.Ph.Pname = “ Sidorchuk F(Pat,R2, Pat.t#=R2.t#) ->R3; ”) -> R1; R3[Tname] -> res; R1[Tname] -> res; This query is performed on the base of connection of “many-to-many” type. Table 10 Query Example 7. Find the patient names who are consulted only by those physicians who consulted the patient “xxx” by name. (Fβ(Pat, Tname = “xxx”))[t#] -> R1; Join*(PatPh, PhPat) -> DS1; F(Con,R1,Con.t#=R1.t#)->R2; OnlyFilter(DS1,Member.Tname=”xxx”) R2[p#] -> R3; Con[p#,t#] -> R4; Fs (R4, R3, t#, R4.p# R2.p#) -> R5; F(Pat,R5, Pat.t#=R5.t#) ->R3; R3[Tname] -> res; This query is performed on the base of connection of “many-to-many” type. OnlyFilter operation is the special case of the SetFilter operation; it is equivalent to SetFilter (DS1, DS1.Owner, DS1.Member, , condition). Its running algorithm may be the follows: firstly we mark such member- records for which the condition is FALSE and then for each owner-record its subordinate records are verified while marked one is appeared. If it will be no marked records among the subordinate records then such owner-record should be added to the result. 8. References [1] Codd, E.F. A Relational Model of Data for Large Shared Data Banks // Communications of the ACM: journal, 1970, Vol. 13, no. 6, P. 377—387, doi:10.1145/362384.362685 [2] Dribas V.P. Relational models of databases.// Publ. BGU. Minsk. 1982, 192 р. [3] Date C.J. An Introduction to Database Systems 6th edn. Reading, MA: Adison-Wesley. 1995, 1328 p. [4] Olle T.W., The CODASYL approach to data base management, A.Wiley-Interscience Publication, John Wiley & Sons, 1978. [5] CODASYL Database Task Group Report. (1971). ACM, New York, April 1971, 487 p. [6] Chen P.P. The Entity-Relationship model – Toward a unified view of data. ACM Trans. Database Systems, 1(1), 1976, 9-36. [7] Building a robust, governed data lake for AI, IBM Hybrid Data Management, https://www.ibm.com/downloads/cas/RMAMZNRY (Nov. 2021) [8] Botelho B., Bigelow S.J., Big Data, (Nov. 2021), https://searchdatamanagement.techtarget.com/definition/big-data [9] Tlamelo Emmanuel, Thabiso Maupong, Dimane Mpoeleng, Thabo Semong, Banyatsang Mphago and Oteng Tabona, A survey on missing data in machine learning, Journal of Big Data 8, N140 (2021), https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00516-9 [10] Sai Vennam, Cloud Computing, (18 August 2020), https://www.ibm.com/cloud/learn/cloud- computing [11] Steve Ranger, An introduction to cloud computing right from the basics up to IaaS and PaaS, hybrid, public, and private cloud, AWS and Azure. https://www.zdnet.com/article/what-is-cloud- computing-everything-you-need-to-know-about-the-cloud/ (Nov.2021) 193