A New Implementation for Maximal Itemsets Miner using
Oracle PL/SQL
Hussein K. Khafajia, Mais A. Al-Sharqib, Oleksiy Nedashkivskiyc, and Pawel Falatd
a
  Al-Rafidain University College, Baghdad, Iraq
b
  University of Information Technology and Communications Baghdad, Iraq
c
  National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute,” Kyiv, Ukraine
d
  University of Bielsko-Biala, 2 Willowa str., 43-309, Bielsko-Biala, Poland

                Abstract
                The problem of determining how to implement a data mining system as looslly-coupled or
                tightly-coupled remains a major challenge as it affects the performance of the system and the
                consumption of memory and computation resources. The aim of the research is to propose a
                new approach to data mining design based on aggregating a data mining system with data
                intended for mining under the umbrella of database management systems. The authors
                produced a new algorithm to mine maximal itemsets depending on Bees' algorithm named
                Maximal Itemsets Mining Algorithm Based on Bees’ Algorithm, MMIBA. MMIBA was
                implemented as loosely-coupled miner. This research presents a new implementation for
                MMIBA using Oracle PL\SQL. The aim of this implementation is to combine the mining
                system with the data to be mined. This approach excludes many drawbacks associated with
                other approaches such as the conflict of data environment and mining system environment,
                data transfer between these different environments, and data format conversion due to the
                mismatch of the formats that are supported by various environments. This approach dominants
                the loosely-coupled implementation in considerable amount of execution time and memory
                consumption. The proposed system was tested using many real and synthetic databases with
                wide range of properties including size, number of items, database sparseness or its density.
                Many values for minimum support and conflict were used in these tests to prove the robustness
                of the designed system. Experiments showed that the techniques of implementing mining
                algorithms affected their efficiency, and this was demonstrated by increasing the efficiency of
                MIMBA when it was implemented by collecting data with the miner that uses it in a single
                software environment.

                Keywords 1
                Maximal itemset, embedded miner, PL\SQL, MMIBA.

1. Introduction
    The discovery of frequent itemsets (FI) is one of the very important subjects in data mining [1]. The FI concept
was firstly produced by Agrawal et al. [2]. A frequent itemset is a group of items coming frequently together in
set of transactions satisfying minimum support (min_sup) [3–5]. The big challenge lies in the huge number of FIs
that are generated from mining process especially when min_sup is set to small value [2, 6].
    A FI is maximal if it is the longest itemset that satisfies min_sup and all its superset is infrequent [7]. Maximal
frequent itemset (MFI) considered as dense representation of FIs because of their relatively low number [6]. For
this reason, MFIs has rapidly been a paramount task in data mining field [3]. The number of MFIs is so small in
comparative with the number of FIs. All the FIs can be generated from the set of MFIs. Due to two reasons many
attempts were done to mine maximal itemsets instead of the mining of FIs to keep time and efforts complexities
[6, 7].


Emerging Technology Trends on the Smart Industry and the Internet of Things, January 19, 2022, Kyiv, Ukraine
EMAIL: dr.hkml811@yahoo.com (H. K. Khafaji); newmisk@gmail.com (M. A. Al-Sharqi); al_1@ua.fm (O. Nedashkivskiy);
pfalat@ad.ath.bielsko.pl (P. Falat)
ORCID: 0000-0002-1830-0568 (H. K. Khafaji); 0000-0002-9059-9031 (M. A. Al-Sharqi); 0000-0002-1788-4434 (O. Nedashkivskiy); 0000-
0002-3593-9750 (P. Falat)
             ©️ 2022 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)


                                                                                   38
   The object of study is the process of mining MFIs by utilizing the capabilities of Oracle DBMS
which provides PL/SQL as programming language to implement the mining system in addition to its
characteristic as repository to hold the data. This utilization is accomplished by selecting a suitable
mining algorithm that is implementable by PL/SQL.
   The subject of study is the use of the PL/SQL or any programming tool provided by a DBMS to
implement the mining system which depends on a PL/SQL implementable algorithm.
   The purpose of the work is to increase a data miner’s throughput with minimum allocated computer
system resources by transferring many of the miner duties from the miner space to the BDMS space.

2. Problem Statement
   Mining of the frequent itemsets from large transaction databases is one of the inherently difficult
problems that classified as NP-complete problem. In addition to this difficulty, miners are implemented
using loosely or tightly-coupled techniques, which leads to storing data in flat files that are difficult to
manage, or that the data remotely stored in DBMS tables separated from the programming language
environment, leading to a lot of I/O operations and repeated data transformation and transferring. Both
techniques reduce the efficiency of algorithms and consume memory, computation resource, and I/O
devices.

3. Review of the Literature
   Durand et al [2] has introduced the FIBAD approach to compute nearly borders of frequent itemsets
by utilizing dualization and computation of approximate minimal transversals of hypergraph. They used
maximal frequent and infrequent itemsets to represent positive and negative borders respectively. They
found that their method outperformed other methods in producing borders having the highest quality.
   Ganesh et al [4] has mined maximal patterns from the large database used for knowledge discovery.
The contribution of their paper includes (i)Transposed Representation of Database ii) reduction of
Database scanning (iii) Pruning the candidate itemsets in each step. Vertical Frequent Mining is deemed
simple to use and effectively implemented.
   Kocak et al [7] are utilizing maximal frequent itemsets to propose a novel feature elicitation method
based on data mining techniques for understanding significant patterns in octamer’s cleavability of
acquired immune deficiency syndrome disease, which is caused by human immunodeficiency virus.
The extracted features are used in the classification process implies significant results which may be
used when developing a new medicine.
   Mais et al [6] has introduced a novel maximal itemsets algorithm by pairing MFI and Bees’
algorithm (BA). They exhibited a MFI-oriented BA for the purpose of mining MFIs from transactional
database (TDB). Mining Maximal Itemsets Bees’ Algorithm (MMIBA) is population based stochastic
algorithm. This paper depends on MMIBA therefore it will be explained in details. Elucidation of
MMIBA algorithm is presented in Figure 1. Recall the parameters of MMIBA to be set such as N, M,
E, nep, nsp, and ngh.
   The MMIBA algorithm is population based, so it firstly prepares its population of FIs that satisfy
min_sup by generating them randomly. It mines MFIs from TDB by selecting higher FIs and select the
highest FIs out of the higher ones, making union between them in order to generate new generation.
The union is done between the highest FIs and half number of randomly selected FI population. Where
the remaining FIs after selection the highest (i.e. higher-highest) are united with quarter number of
randomly selected FI population. Each of the remaining higher-FIs population unites with one randomly
selected FIs.


                                                      39
 1.     Initialize population with random generated itemsets.
      1.1 Select frequent items from transactional DB (TDB) for new itemset generation.
      1.2 Determine the size of population represented by N.
      1.3 Generate unique N numbers represents N k-itemsets.
      1.4 Select k-items for each k-itemset from frequent items.

 2. Evaluate support for each itemset.
 3. While (stopping criterion not met) //generating new itemsets.
 4. Select M longer frequent itemsets out of N and E longest frequent itemsets out of M.
 5. Perform union operation (more for best itemsets)
      5.1 Perform union operation between each of E itemsets and half no. of frequent itemsets to generate new
           itemsets.
       5.2 Perform union operation between each of M–E itemsets and quarter no. of frequent itemsets to
            generate new itemsets.
       5.3 Calculate the support of the new generated itemsets.

 6. Select the highest frequent itemset as candidate maximal frequent itemset.
 7. Perform union operation for remaining itemsets randomly.
     7.1 Perform union operation between each of itemsets-M and one of randomly selected frequent
        itemsets to generate new itemsets.
     7.2 Calculate the support of the new generated itemsets.
8. End While.
 Figure 1: MMIBA Algorithm

          The MMIBA algorithm is population based, so it firstly prepares its population of FIs that
 satisfy min_sup by generating them randomly. It mines MFIs from TDB by selecting higher FIs and
 select the highest FIs out of the higher ones, making union between them in order to generate new
 generation. The union is done between the highest FIs and half number of randomly selected FI
 population. Where the remaining FIs after selection the highest (i.e. higher-highest) are united with
 quarter number of randomly selected FI population. Each of the remaining higher-FIs population unites
 with one randomly selected FIs.

 4. Materials and Methods

     Data mining algorithms usually deals with a huge data. Because these algorithms encounter issue of
 scalability with respect to vast data, traditional methods address this problem by selecting subset of data
 to be mined which is in-memory data [8, 9]. Another issue is that these algorithms depended on main-
 memory data structure, limited amount of data that can be handled [9]. In addition, most data mining
 algorithms can only be loosely coupled with data infrastructures in organization and are hard to pour
 into existing mission-critical application [10]. In loosely coupled connection, data is transferred either
 between database and main memory or dynamically in client/server architecture generating network
 traffic [11]. Most data mining algorithms have been used flat files as data source of information that
 means handling specific file-format. Data integrity is another issue that is needed to be solved via
 consolidation data mining methods with DBMSs [11-13]
     DBMS technology presents many characteristics that make it valuable when performing data mining
 applications. Data sets, those are extremely larger than main memory, can be dealt with because the
 database itself is responsible for processing information, paging and swapping when needed. In
 addition, DBMS provides a simplified data management and closer integration to other systems [11].
     Information used during mining processing is often clandestine. Consequently, DBMSs can be used
 as a way of supplying data security, which is the demand of many commercial databases. In this way,
 we avoid needing encryption algorithms to process information. [11]
     Most versions of Oracle DBMS support tightly coupled integration within database through
 executing user-defined computation database thus averting unnecessary network traffic. One sort of this


                                                         40
implementation is the stored procedure, which is provided in Oracle PL/SQL within data dictionary.
The aim of this paper is to introduce a new implementation of MMIBA using Oracle 11g PL\SQL to be
supported with all the previously mentioned DBMS properties.
Some Oracle 11g Database Features
   ORACLE DBMS is preferred to implement the maximal itemsets embedded miner because it has
the following privileged features:
1. ORACLE supports external stored procedures and PL/SQL, the complete, block-structured
   programming language for data manipulation (DML), which excludes the need for host language.
2. ORACLE supports stored procedure within database that combines the DB with the user-developed
   programs to manipulate the data.
3. ORACLE supports Open Database Connectivity (ODBC) mechanism that is used to access the
   various DBs of DBMSs.
4. ORACLE supports Large Object Block (LOB), Character LOB CLOB, and Multi CLOB (MCLOB),
   data type, which allows us to store a very large number of transactions for one item or very long
   frequent patterns itemsets.
5. ORACLE supports compound object and nested tables.
6. ORACLE supports LONG data type, which holds up to 2 GB of data as one field.
    The last three features of ORACLE are very important to represent dense databases prepared for
   mining.
7. Subprogram inlining in 11g, ORACLE has presented the PL/SQL Function Result Cache, which
   eliminates the overhead of function calls by re-organizing the source code during compilation.
8. PL/SQL enhancements in ORACLE 11g of the following features: Easy and simplified PL/SQL
   native compilation, Improved PL/SQL stored procedure invalidation mechanism, Stored Procedure
   Named Notation, Virtual columns in 11g, Query result cache in ORACLE 11g, PL/SQL function
   result cache, and Regular expression enhancements.
Proposed Miners’ Architecture
    The architecture of the proposed miner is composed of many modules each of which is responsible
for a specific task such that they collectively perform the task of finding the MFIs. These modules are
initialization module, support counter module, frequent item selector module, m and e selector module,
itemset-m selector module, new itemset generator module, maximal itemset selector module, and
reporter module. (See Figure 2).
Proposed Miners’ Modules
   Frequent item selector module (level 1) In order to discover MFIs, frequent items, 1-itemsets, must
be found firstly. In this module, only frequent items will be selected from TDB and stored in a table
which this module will construct it. The table is named "frequent items DB". It consists of three
columns: freq_item, sup, and item_len. The fields' data types used for all columns of modules tables
are CLOB and Number as follows:

   Freq_item CLOB --- stores frequent itemsets, 1-itemsets, 2-itemset, etc.
   Sup       number --- stores itemsets' supports.
   Item_len number --- stores the length of itemset.
   Itemsets may have huge size, therefore the most suitable data type to hold an itemset is CLOB data
type. These FIs are encoded numerically such that each FI has given a unique code (integer number)
which represents it during mining process. These unique sequential codes are stored in table called
"coding" table. Coding table has the following column: item, code . Coding step has two advantages:
1) It simplifies the completely mining process, because it is easier to deal with item, feature, or property
   in DB as numbers than as names (sequence of characters) in terms of storing and retrieving FIs and
   performing union operations.
2) Because MMIBA relies on randomization to explore search space (itemsets space), it uses
   randomization function to generate random numbers ranging between min and max values to form
   an itemset. These min and max values are min and max code of that FI. To decide whether the item


                                                      41
   is frequent or not, the support of the item must be calculated and compared with the input value
   defined by the user which represents the threshold between frequent and infrequent items, i.e.
   min_sup.

                                                                                                                        Execution Flow
                                                                                                                        Data Flow

                           TDB                                Min_sup


                                                   Frequent item selector
                        Frequent item DB                                                         Initializer
                                                                                                                           Random itemset DB

      Level 1


                                                           Support counter

     Level 2

                                                                             M and E selector                  itemset-M selector


                                                                                                ME DB                       Itemset-M DB


        Level 3

                           New itemset from E DB

                                                                      New itemset
                                                                     generator
                                                                                                                          New itemset from
                               New itemset                                                                              itemset-M DB
                            from M-E DB
         Level 4


                                                                Maximal itemset selector
                                                                                                                   Maximal itemset DB

       Level 5

                                                                      Reporter
       Level 6


                      Figure 2: Presents the architecture of the proposed miner

   Initializer module (level 1) The task of this module is generating the initial population. Each
individual of the population is an itemset of length k, i.e. k-itemset. Initially, the module constructs a
table named "random itemsets DB". The table consists of the following columns: N (population size),
K (the length of itemset), Itemset (holds the itemset itself), and Sup (represents the support of the
itemset). Initializer module generates N numbers randomly, each of which represents the length of
itemset. Moreover, for each K, the initializer will generate a random number K-times, which represents
the itemset. After that, it will invoke support counter module in order to count itemset support (see
figures 3 and 4).


  BEGIN
    loop
     rnum:=round (dbms_random.VALUE(min_value,max_value));
     SELECT count(*) INTO c FROM temp_itemset WHERE itemset1=rnum AND item=len_item;
     exit WHEN c=0;
    END loop;
  END;

Figure 3 Abstracted PL/SQL code to generate K.


                                                                  42
   The generated itemset is unordered and in form of one length itemset. To order it and making it of
K-length itemset; a view is created and the order by Clause is used. The items of the generated itemset
are ascendingly ordered and connected by “;” symbol. After all itemsets have been generated, support
counter module is invoked in order to calculate the support of each itemset and store the result in
"random itemset DB" table.
 BEGIN
  loop
   rnum:=ROUND (DBMS_RANDOM.VALUE(min_value,max_value));
   SELECT count(*) INTO c FROM initpop WHERE k=rnum;
   -- check the uniqueness of K
   exit WHEN c=0; -- if not found insert it in the table
  END loop; -- if rnum previously found in the table
 END;

Figure 4 Abstracted PL/SQL code to generate the itemset

Support counter module (level 2). The task of this module is calculating the support of the itemset that
is generated by initializer module. The support of 1-length item is gotten from frequent table and there
is no need to scan the database to ch reduces time cost. If itemset's length is k-itemset, then its support
will be calculated through count of tidset resulted from intersection among sets of tidset of each item
that forming the itemset in TDB. After counting the tidsets resulted from intersection operation, random
itemset DB will be updated and the resulted number will be inserted in sup column (consider figure 5).

The longer (M) and longest (E) Selector Module (level 3) The job of this module is selecting M and E
random numbers from N population size. They represent itemsets that are chosen according to fitness
criterion such that M < N and E < M. M represents the higher fitness itemset where E is the highest
fitness itemset. Initially, M and E selector module constructs a table named "ME DB". The table consists
of M_selected and Elite attributes. Then the module selects E and (M-E) and stores them in "ME DB"
table (see figure 6).
   IF item_record.item=1 THEN --if item length=1
        V_ITEM:=1;
      SELECT sub into v_sup FROM frequent WHERE
       freq_item=(SELECT itemset1 FROM     temp_itemset
                   WHERE item=1);
   ELSE    ------ if item length is more than 1
     SELECT COUNT(tidset),item_record.item
         INTO   v_sup,v_item FROM
         (SELECT tidset FROM trans WHERE item IN
            (SELECT itemset1 FROM temp_itemset    WHERE
              item=item_record.item)GROUP BY tidset
          HAVING COUNT (DISTINCT item)= (SELECT
             COUNT(*) FROM
               temp_itemset WHERE
               item=item_record.item));
   END IF;


Figure 5: Abstracted PL/SQL code of support generation


                                                     43
   Mnum:= ROUND (DBMS_RANDOM.VALUE(2,lcount));
      insert into t(m_selected) SELECT * FROM
         (select itemset from initpop
          where sup is not null group by itemset
          order by max(length(itemset)) desc);
      Enum:=ROUND (DBMS_RANDOM.VALUE(1,Mnum-1));
      insert into me(m_selected,elite)
       (SELECT LEAD (m_selected, Enum) OVER (ORDER BY LPAD
       (m_selected,100) desc)     m_selected,
         CASE
         WHEN m_selected< NTH_VALUE (MAX (m_selected),Enum)
         OVER (ORDER BY LPAD (m_selected, 100) desc) THEN NULL
          ELSE m_selected
          END
                Elite FROM t GROUP BY m_selected);


Figure 6. Abstracted PL/SQL code for selecting M and E
    Itemset−M Selector Module (level 3). This module is in charge of selecting the remaining itemsets
after selecting M better itemsets and Elite itemsets from random itemset DB. These itemsets are referred
to as“itemset-M” itemsets. Initially, Itemset-M selector module constructs a table named “itemset_M” to carry
the selected itemsets. The table consists of two columns, which are ID1 and Itemset_M. Then it selects the
remaining itemsets from random itemset DB that are frequent and neither Elite nor M_selected (not
within the itemsets in ME table) (see figure7).
      insert into itemset_m
       select rownum id1, itemset
       from (select * from initpop
             where sup>=min_sup
             AND ITEMSET NOT IN (
      SELECT M_SELECTED FROM ME WHERE m_selected IS NOT NULL ) AND
      ITEMSET NOT IN (
      SELECT ELITE FROM ME WHERE elite IS NOT NULL)

  Figure 7. Abstracted PL/SQL code for selecting itemset-M.

   New Itemset Generator (level 4). this module is responsible for generating new itemset. Initially,
the module constructs number of table to generate new itemsets resulted from union operation of Elite,
M_selected, and itemset-M with frequent items in frequent item DB.The first table is “me_result” table,
which consists of the following columns: ID1, ELITE, FREQ_ITEM, COMBINED_STR, and SUP.
The generated itemsets are saved in new_gen_item table, which represents the population of new
generation (see figure 8).


                                                      44
 insert into new_gen_item
   select COMBINED_STR,SUP,REGEXP_COUNT(combined_str,';')
   from (
   SELECT id1,COMBINED_STR,SUP,REGEXP_COUNT(combined_str,';')
        from (
                SELECT id1,ELITE,FREQ_ITEM,COMBINED_STR,SUP,
                ROW_NUMBER() over (PARTITION BY ELITE order by id1) RN FROM ME_RESULT
                WHERE ( EITE,REGEXP_COUNT(combined_str,';')
                      ) IN
                           (
          SELECT ELITE,MAX(REGEXP_COUNT(combined_str,';')
                           )
                             FROM me_result where sup>=min_sup GROUP BY ELITE
              ) and sup>=min_sup
     ) where rn=1
 order by id1


Figure 8. Abstracted PL/SQL code for generating new itemset (elite union freq_item)

Maximal Itemset Selector (level 5). This module is responsible for selecting the best new generated
itemsets from new_gen_item, new_gen_item2, and new_gen_item3 tables as candidate maximal
itemset.

 Loop
    ,item_counts as
                 (
                        SELECT DISTINCT a.item
                       ,COUNT(DISTINCT a.tidset) over() total_tidset
                       ,COUNT(a.item) over(PARTITION BY a.item)item_occurrence_count
                       FROM (
                              SELECT DISTINCT item ,tidset
                              FROM all_items
                            ) a
                 )
   SELECT listagg(item, ';') within GROUP(ORDER BY to_number(item))common_items
   into cc FROM item_counts
   WHERE total_tidset = item_occurrence_count;
 End loop;


Figure 9: Saving Elite maximal candidates in maximal table

    The module constructs a table named “tidtid” table to hold tidsets of each item that form an itemset.
The table has the following column: ID1, TIDSET, and GROUP_ALL. Figure.9 depicts the skeleton of
this module through very abstracted PL/SQL code.
    The Reporter Module (level 6). The reporter module shows Maximal itemsets (MI)s of each
generation. Each generation displays, the contents of TDB, random itemset DB, frequent item DB, ME
DB, itemset-M DB, new generated itemset from Elite U half no. of freq_item DB, new generated itemset
from M-E U quarter no. of freq_item DB, new generated itemset from itemset-M U freq_item DB,
maximal itemset DB. The reporter module exhibits for each item the set of tidsets that support this item


                                                    45
in TDB. For random itemset DB, it exhibits population size N, the length of each randomly generated
itemset K, k-itemset itemsets, and the support of each k-itemset sup. For frequent item DB, it exhibits
each itemset item with its support sup and length Len. For ME DB, it displays number of best M selected
itemsets m_selected and number of best E selected itemsets Elite. For new generated itemset from Elite
U half no. of freq_item DB, it displays Elite, half no. of freq_item,combined_str which represents new
generated itemset, and support sup. For new generated itemset from itemset-M U freq_item DB, it
displays itemset-M, freq_item,combined_str which represents new generated itemset, and support sup.
For maximal DB, it exhibits new generated itemsets with their tidsets that support them.

5. Experiments
   Very low levels of Min-Sup values were adopted to test the speed and scalability of the proposed
implementation, PL/SQL-MMIBA, and prove its ability to execute in spite of these values of Min-Sup.
The experiments indirectly pointed to the memory utilization according to the used database and Min-
Sup values. It is well known that the lower value of Min-Sup leads to largest number of FIs and MIs in
addition to the increment of the execution time and memory consumption.
   There are many real databases used for the experiments such as (Chess, Mushroom, Cancer Cells,
Census Data, and Dense Census) in addition to synthetic databases. The following tables present a
comparison among PL/SQ-MMIBA, Loosely -coupled implementation of MMIBA (LC-MMIBA), and
MAFIA. As it is explained previously, MMIBA starts the execution with random values for its
population, therefore its execution time is affected by the closeness or farness of these values from
actual results to be mined, i.e., MIs. Therefore, to get fair comparisons, LC-MMIBA and PL/SQL-
MMIBA were executed 10 times for each Min-Sup value for a specific database and the average of
these executions has been calculated and presented in the tables of result section.

6. Results and Discussion
   Table (1) presents the execution times of the MAFIA, LC-MMIBA and PL/SQL-MMIBA using
Chess DB and different values of Min-Sup ranged from 1% to 10% of the database size. It is obvious
that MAFIA outperforms LC-MMIBA when the values of Min-Sup ranged from 1% to 7.5%. In
addition, there is unexpected execution time for LC-MMIBA when the Min-Sup value equals 6%.
Indeed, one of the ten executions of LC-MMIBA for Min-Sup=6% was considerably higher than the
rest nine executions which affects the value of the presented average, i.e., 50:20. PL/SQL-MMIBA
outperforms MAFIA and LC-MMIBA in all aspects of the Min-Sup and its execution time has regular
decrement according to the increased value of Min-Sup.

   Table 1
   Execution time of MAFIA, Loosely-Coupled-MMIBA, and PL/SQL-MMIBA using Chess DB with
various MIN-Sup values.
                                                    MIN-SUP VALUES
Alg.      1%        2%      2.5%    3%      4%       5%      6%      7%      7.5%    8%      9%      10%

MAFIA     78:02     60:55   50:00   40:57   40:21    40:00   40:00   40:00   40:00   30:45   30:42   30:42
LC-       100:01    90:03   90:00   90:00   60:23    50:15   50:20   40:27   40:02   30:34   30:02   20:55
MMIBA
PL/SQL-   64:33     50:55   47:58   40:30   40:07    40:00   30:25   20:18   20:12   20:06   20:00   10:50
MMIBA

   Table (2) presents the execution times of the MAFIA, LC-MMIBA and PL/SQL-MMIBA using
Mushroom DB and different values of Min-Sup ranged from 1% to 10% of the database size. MAFIA
does better than LC-MMIBA when the values of Min-Sup ranged from 1% to 4%. The execution time
of MAFIA for many values of Min-Sup was not changed, for example consider the execution time when
Min-Sup values are 7.5%, 8%, and 9%. This case can be observed for LC-MMIBA for some values of
Min-Sup. PL/SQL-MMIBA outperforms MAFIA and LC-MMIBA for all values of Min-Sup. As a


                                                     46
general observation, the nature of the database has imposed a slight and smooth decline for the
execution time values when the values of Min-Sup increase, this observation can be considered for the
three algorithms.

  Table 2
  Execution time of MAFIA, Loosely-Coupled-MMIBA, and PL/SQL-MMIBA using Mushroom
DB with various MIN-Sup values.
                                                       MIN-SUP VALUES

 Alg.      1%       2%       2.5%      3%       4%        5%      6%      7%       7.5%     8%       9%      10%

 MAFIA     100:02   100:05   90:00     60:57    60:05     50:13   50:11   50:11    50:00    50:00    50:00   40:42

 LC-       110:15   110:11   110:00    90:00    70:01     40:43   40:30   40:27    30:46    30:37    30:30   30:30
 MMIBA
 PL/SQL-   80:03    80:22    50:47     40:30    40:22     40:00   30:25   20:18    20:12    20:06    20:01   10:50
 MMIBA

    Table (3) presents the execution times of the MAFIA, LC-MMIBA and PL/SQL-MMIBA using
Cancer DB and different values of Min-Sup ranged from 1% to 10% of the database size. Cancer DB
is dense database, which may justify the outperformance of LC-MMIBA and PL/SQL-MMIBA over
the MAFIA algorithm. One can see that a considerable difference in the amount execution time between
MMIBA in its two implementations from one side and MAFIA from another side.

Table 3
   Execution time of MAFIA, Loosely-Coupled-MMIBA, PL/SQL-MMIBA using Cancer Cells DB
with various MIN-Sup values.
                                                    MIN-SUP VALUES
 Alg.      1%       2%       2.5%     3%       4%      5%     6%           7%        7.5%     8%      9%      10%
 MAFIA     170:1    160:0    150:0    130:4    130:1   120:1  110:2        100:0     90:2     90:1    90:1    90:1
           3        9        0        4        3       1      1            3         3        5       0       0
 LC-       170:0    150:1    140:1    130:0    120:0   110:0  100:0        80:01     80:0     70:2    60:5    60:0
 MMIBA     1        3        0        9        0       0      3                      1        5       0       0
 PL/SQL    150:3    150:0    130:2    110:0    110:0   90:17  70:47        70:00     70:0     60:2    50:0    40:1
 -         8        2        1        0        0                                     0        5       5       3
 MMIBA


   Tables (4) and (5) show the results using Census and Dense Census DB, they are dense database,
but the latter is very dense in comparison with the former one. Some fields of the tables contain '*'
which indicates an "insufficient memory space" error. These cases appeared in MAFIA and LC-
MMIBA implementation. This problem did not emerge with PL/SQL-MMIBA depending on Census
and Dense Census DBs.

Table 4
   Execution time of MAFIA, Loosely-Coupled-MMIBA, and PL/SQL-MMIBA using Census DB
with various MIN-Sup values.
                                                       MIN-SUP VALUES
  Alg.      1%       2%      2.5%      3%       4%        5%     6%    7%          7.5%      8%       9%      10%
 MAFIA       *      220:2    200:0    190:3    190:1     190:0  160:1 140:0        110:1    110:0    110:0    100:3
                      4        0        7        3         0      2     0            5        0        0        9
 LC-       260:0    210:5    180:0    180:0    170:0     160:4  120:4 100:3        90:09    90:35    80:03    70:30
 MMIBA       3        0        3        2        5         5      5     2
 PL/SQL    270:1    190:5    170:0    180:0    170:2     150:0  100:2 100:0        80:03    70:13    70:00    60:14
 -           7        5        0        2        0         4      0     0
 MMIBA


Table 5


                                                          47
  Memory utilization of MAFIA, Loosely-Coupled-MMIBA, and PL/SQL-MMIBA using Dense
Census DB with various MIN-Sup values.
                                                   MIN-SUP VALUES
   Alg.     1%      2%     2.5%     3%      4%       5%      6%      7%     7.5%     8%      9%     10%

 MAFIA       *     310:0   290:0   270:5   270:4    270:2   250:0   220:2   200:1   190:1   180:0   160:1
                     9       0       6       3        9       2       1       2       0       0       9
 LC-         *     320:0   250:3   260:0   220:2    200:1   190:0   180:0   170:0   160:0   160:0   140:4
 MMIBA               2       9       0       0        6       7       0       8       0       0       5
 PL/SQL    340:0   310:1   270:0   250:1   200:2    200:3   180:5   170:3   160:0   140:0   130:0   110:3
 -           5       7       3       0       1        7       5       2       2       9       0       7
 MMIBA


7. Conclusions
    This research presents a new implementation of MMIBA using Oracle 11g PL\SQL. The aim of this
implementation is to combine the mining system with the data to be mined. This approach maintains
scalability to large DBs and data integrity with DBMS. In contrast with traditional mining approaches
which use flat files, depend on limited facilities of programming languages, make extensive use of main
memory and I/O traffics, embedded miner in DBMS can treat dataset larger than main memory. In
addition, it avoids I/O traffics because all DBS, tables, programming language and other structures are
in one environment (Oracle DBMS). This approach overcomes the loosely-coupled implementation in
considerable amount of execution time and memory consumption. The proposed system was tested
using many real and synthetic databases with wide range of properties including size, number of items,
database sparseness or its density. Many values for minimum support and conflict were used in these
tests to prove the robustness of the designed system.
    ORACLE DBMS is preferred to implement the maximal itemsets embedded miner since it has the
many privileged features listed previously. According to the implementation of MMIBA algorithm,
several new features have made a difference in implementation effort and time. Such features provided
in Oracle 11g PL/SQL don’t exist in other programming languages. For example ORACLE's regular
expression support comprised five functions (REGEXP_LIKE, REGEXP_SUBSTR,
REGEXP_INSTR, REGEXP_REPLACE and REGEXP_COUNT). In addition, the availability of
suitable data type for data mining such as CLOB to hold k-itemsets, plays important roles in supporting
the efficiency of the proposed miner. We believe that such facilities in DBMSs make big difference in
term of efficiency scalability, data integrity, execution time, memory consuming and security.
    The practical significance of obtained results is that the implementation of the miners as stored
procedures with the data to be mined produced significant results such as the scalability, robust memory
management, and efficient utilization of computation resources which leads to increasing the
throughputs. Such implementation is associated with correct selection of an algorithm and DBMS.
Prospects for further research are to study the ability of implementing more algorithms in different data
mining tasks.

8. Acknowledgements

   The work is supported by Al-Rafidain University College.

9. References
[1] Mohamed A. Gawwad, Mona F. Ahmed et al (2017): Frequent Itemset Mining for Big Data Using
    Greatest Common Divisor Technique. Data Science Journal 16(25), PP. 1-10.
[2] Nicolas D., Mohamed Q. (2016), Frequent Itemset Border Approximation by Dualization. In: A.
    Hameurlain et al. (Eds.) TLDKS XXVI, LNCS, 9670, pp. 32-60. Springer, Verlag Berlin
    Heidelberg.


                                                      48
[3] Haifeng L., Yuejin Z. et al (2016), A Heuristic Rule Based Approximate Frequent Itemset Mining
     Algorithm. Procedia Computer Science 91, 324-333, (2016).
[4] C. Ganesh, B. Sathiyabhama et al (2016), Fast Frequent Pattern Mining Using Vertical Data
     Format for Knowledge Discovery. International journal of emerging research in management &
     Technology 5(5), pp. 141-148.
[5] Philippe F., Jerry C. et al (2017), Survey of Itemset Mining.                            Wiley,
     https://onlinelibrary.wiley.com/doi/abs/10.1002/widm.1207.
[6] Mais A. Mohammed, Hussein Al-Khafaji (2017): Maximal Itemsets Mining Algorithm Based On
     Bees' Algorithm. IEEEXPLORE, https://ieeexplore.ieee.org/document/7976130.
[7] Yunuscan K., Tansel O., et al, (2016), Utilizing Maximal Frequent Itemsets and Social Network
     Analysis for HIV Data Analysis. Journal of Cheminformatics 8(71), pp. 1-15.
[8] Zhengxin C. 2015, Towards Integrated Study of Data Management and Data Mining. (ITQM 2015)
     Procedia Computer Science 55 (2015) 1331 – 1339.
[9] Beibei Z., Xuesong M 2006, Data mining using Relational Database Management Systems. W.K.
     Ng, M. Kitsuregawa, and J. Li (Eds.): PAKDD 2006, LNAI 3918, pp. 657–667, Springer-Verlag
     Berlin Heidelberg.
[10] Lu H. (2001) Seamless Integration of Data Mining with DBMS and Applications. In: Cheung D.,
     Williams G.J., Li Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001.
     Lecture Notes in Computer Science, vol. 2035. Springer, Berlin, Heidelberg.
[11] M. S. Sousa, M. L. Q. Mattoso et al: Data mining: a database perspective.
     http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.8188&rep=rep1&type=pdf.
[12] Amir N., Surajit C. (2000), Integration of Data Mining and Relational Databases. Proceedings of
     the 26th International Conference on Very Large Databases, pp. 719-722 Cairo, Egypt.
[13] Tadeusz M., Marek W. et al (2000): Data Mining Support in Database Management Systems. In :
     2nd International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2000)
     Proceedings, Pages 382-392, Springer-Verlag London, UK.


                                                  49