<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MDORG: Annotation Assisted Rule Agents for Metadata Files</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiba Khalid</string-name>
          <email>Hiba.Khalid@ulb.ac.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Esteban Zimányi</string-name>
          <email>Esteban.Zimanyi@ulb.ac.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université Libre De Bruxelles</institution>
          ,
          <addr-line>Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université Libre De Bruxelles</institution>
          ,
          <addr-line>Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Metadata files are often incomplete and of inadequate quality, with underlying issues such as low maintenance, lack of provenance, inaccurate tagging, limited or no annotation, and differing metadata standards. Such metadata is generally not viable for analysis, large-scale integration, data management, or quality metadata maintenance, and it presents challenges in data analysis and production tasks. The expense associated with metadata discovery, annotation, and management amounts to considerable time and effort. To overcome these issues, leverage can be borrowed from intelligent systems that improve quality and reduce the manual effort associated with metadata discovery, annotation, and management. Intelligent agents can aid in identifying information if they are provided relevant labels or are allowed to learn over time. Annotation for intelligent agents can thus improve tasks such as automated summaries, metadata management, cataloging, and feedback management. We experiment with and propose the utility of annotation for rule agents to facilitate the analysis and organization of metadata files.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In simple words, metadata [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] defines data, the data collection
and recording process, and details about what the data
entails. The importance of metadata lies in its inherent quality of
maintaining information about data. Generally, there are
three possibilities when dealing with metadata: (i) Case-A:
no metadata is available, and data profiling [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] is required
to infer metadata. (ii) Case-B: very little
metadata is available; typically only some basic metadata, such
as the creation date and publisher titles, is present. (iii) Case-C:
metadata is available but of inadequate
quality, i.e., the metadata content is either incorrect or not recorded
properly. In this case, the challenge is to extract meaning from
the available metadata. There is no doubt when
it comes to metadata usability [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The process, however, is
non-deterministic and often very resource- and time-expensive.
A shared understanding within the scientific community
around publicly available metadata can yield several
advantages: the more metadata is available, the
easier it becomes to retrieve datasets, suggest and recommend
datasets, and pre-analyze various data sources. A similar
advantage is evident for data lakes [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and data warehousing [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
where data privacy can be a prominent concern.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Metadata Types</title>
      <p>
        Metadata types include: (1) Descriptive Metadata: It [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] describes
a dataset or resource for discovery, profiling and identification.
For example, title, authors, keywords, abstract, and comments.
(2) Structural Metadata: It [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] defines the type, structure, and
data relationships. This type of metadata highlights the form and
organization of data under consideration. It can include the order
in which information is available, as well as its original recording
format. The most popular examples include information schema
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and definition schema. (3) Administrative Metadata: As the
name suggests, this type of metadata contains information that
can help manage and update a source. It typically includes
information such as date of creation, published date, update dates,
file type, logs, file definition, access management, and access
rights. The most commonly used subsets of administrative metadata
include (i) Rights management metadata: this sub-category deals
with data rights, intellectual property, and sharing policies. (ii)
Preservation metadata [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]: includes the information used to
archive resources, update and quality management, and resource
preservation.
      </p>
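      <p>The taxonomy above can be sketched as a simple lookup structure. This is a minimal illustration only; the mapping and field names below are examples drawn from the text, not a normative schema.</p>
      <preformat>
```python
# Sketch: the three main metadata types as a mapping from type to
# example fields. Field names are illustrative examples from the text.

METADATA_TYPES = {
    "descriptive": ["title", "authors", "keywords", "abstract", "comments"],
    "structural": ["information_schema", "definition_schema", "recording_format"],
    "administrative": ["creation_date", "published_date", "update_dates",
                       "file_type", "logs", "access_rights"],
}

def classify_field(field):
    """Return the metadata type a given field belongs to, if any."""
    for md_type, fields in METADATA_TYPES.items():
        if field in fields:
            return md_type
    return None
```
      </preformat>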
      <p>In the context of this research paper, we (i) discuss the
usability of rule agents in understanding metadata files,
(ii) define annotations on metadata files (for example, metadata
file headers and metadata group headers), and (iii) observe how rule agents
using annotated metadata files can facilitate metadata file
organization, including identifying metadata types, content, and
dispersed or misplaced metadata. The paper is organized as follows:
section 2 discusses the use of rule agents for different applications
and use cases, section 3 discusses the working and design of
intelligent agents, and section 3.1 defines the need for rule agents and
how they can be incorporated to support and possibly facilitate
metadata organization and categorization. Finally, we discuss
the importance of using annotated files for rule agents in section
4. Section 5 provides insights on some of the challenges we
encountered, and section 6 concludes
our research paper. Our design and experiments include metadata
files extracted from the Kaggle repository and Data.gov metadata
files. The examples described in this paper are derived from the
UK Road Safety dataset on Kaggle
(https://www.kaggle.com/tsiaras/uk-road-safety-accidents-and-vehicles).</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>This section reviews the most relevant work in the
domain of understanding metadata and the role of AI agents.</p>
      <sec id="sec-3-1">
        <title>Intelligent Agents and Metadata</title>
        <p>
          AIMMX [<xref ref-type="bibr" rid="ref30">30</xref>] provides a
library-based metadata model extractor for software
repositories; it performs important tasks such as identifying associated
resources, model extraction, and model names. In particular,
metadata supportive systems are advancing and taking support from
machine learning and AI to build better systems such as IBM
Redbooks [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] which, among other functions, uses metadata from
imagery for specifications. The literature thus gathers
inspiration from rule and knowledge systems to address concerns
of semi-automatic categorization and organization of metadata.
Rules have a unique impact on the type of system and its
available architectures; they affect not only individual agent
architectures but also multi-agent systems [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The applicability of
rule agents extends to semantic web technologies and
architectures [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. A general architecture for rule agents can be devised and
materialized using semantic web languages. Rule-based systems
are diversely applicable; they tend to facilitate systems in making
crucial decisions such as intrusion detection [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Rules for
intrusion detection, like those of other rule systems, have to be pre-defined
to identify and differentiate between incoming packets on a
network. Testing these systems also requires a set of performance
and evaluation criteria that have to be predefined to assess an
agent’s or system’s performance. Another important use case for
rule-based agents is Grid applications [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Rule-based agents
perform adequately for large-scale and data-intensive solutions.
Contrary to common belief, these applications not only perform well but
are also highly advisable for autonomic applications, allowing rule
agents to operate on sets of rules designed for autonomous
systems [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The application and use of rule agents for different
application systems are apparent and well understood. Based on
this evidence and the performance of rule agents, they are deployed in
various applications and systems.
        </p>
        <p>
          Metadata Management and Profiling: Metadata
management has taken many forms over time, and it is important to
understand how metadata has evolved in terms of
applications and tools. In general, metadata [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and metadata
management [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] are important components of data
warehousing and large-scale data integration systems. Among the many
challenges in the field of metadata, standards [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], data
profiling [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], and format normalization are among the most
prominent concerns. Moreover, maintaining and providing
reproducible metadata in terms of lineage and provenance is one
of the most promising aspects of metadata in the semantic web [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>INTELLIGENT AGENTS</title>
      <p>
        Intelligent agents are capable of performing tasks that they are
instructed to perform. The intuition behind introducing intelligent
agents for metadata file understanding and categorization stems
from the idea of eliminating or reducing the manual effort
required in cleaning, processing, and organizing metadata files. We
make use of rule agents to assist in the process of metadata
tagging [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], metadata annotations, and metadata file
organization. Rule agents, more commonly known as policy agents,
function on a set of pre-defined policies to perform a task.
This is not simplistic, however: metadata management and
understanding for intelligent agents pose many challenges, such as (1)
lack of metadata availability, (2) poorly orchestrated metadata,
(3) lack of governance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], (4) bad-quality metadata, and (5) lack
of metadata standards [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. The rule agent’s design aims to
semi-automatically analyze the available metadata in files and perform
actions such as re-arranging, assigning useful tags, designating
metadata types, and organizing metadata into profiles. The rule
sets for different groups of agents were distinct in nature, allowing us to
observe how well agents could deal with different
types of metadata and information available in the metadata files.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Rule Agents</title>
      <p>
        Rule agents [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] are simple condition-action agents that
operate on a provided environment, using the available rules to
complete the assigned task. Each rule agent accesses the rules
from the rule library and uses its allowed set of actions to analyze
and categorize the files. The rule library comprises rule sets, each of which
contains multiple rules for a particular category; these rules
correspond to different use cases in
metadata file categorization. Each rule set can contain two or
more policies. Rule agents use the available rules to
organize and categorize information inside a metadata file. The rule
library contains all rules that are accessible by the rule agent [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
        (see table 1). Each rule set in table 1 comprises subset rules that
satisfy the main aspect of that rule set. For example, rule set R_1002,
i.e., “count columns”, contains a total of 4 rules that (1) identify
columns, (2) identify column names, (3) perform keyword matching, and (4) count
the total column names that appear in a metadata file. Similarly,
other rule sets contain appropriate rules to identify and target the
main purpose of the rule set. Some rule sets contain only a few
rules and are designed more simplistically; for example, rule
set R_1018 contains only one rule, i.e., to search for the empty-line
annotation. In our problem design, we work with three libraries:
the rule library (contains rule sets), the file library (contains the
metadata collection), and the agent library (contains rule agents). The
metadata file organization and content identification are obtained
as a result of policy application by rule agents on the collected
metadata files.
      </p>
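      <p>The interplay of rule library, rule sets, and condition-action agents described above can be sketched as follows. The rule-set identifiers R_1002 (“count columns”) and R_1018 (empty-line search) follow the paper’s naming; the individual rule functions and their matching logic are our own illustrative assumptions.</p>
      <preformat>
```python
# Minimal sketch of a rule library and a condition-action rule agent.
# Rule-set IDs (R_1002 "count columns", R_1018 empty-line search) follow
# the paper's naming; the matching logic is an illustrative assumption.

def identify_columns(lines):
    # Rule 1: lines that look like comma-separated records define columns.
    return [l for l in lines if "," in l]

def column_names(lines):
    # Rule 2: take column names from the first column-defining line.
    cols = identify_columns(lines)
    return cols[0].split(",") if cols else []

def keyword_match(lines, keywords=("column", "field")):
    # Rule 3: flag lines mentioning column-related keywords.
    return [l for l in lines if any(k in l.lower() for k in keywords)]

def count_columns(lines):
    # Rule 4: count the column names that appear in the metadata file.
    return len(column_names(lines))

RULE_LIBRARY = {
    "R_1002": [identify_columns, column_names, keyword_match, count_columns],
    "R_1018": [lambda lines: [i for i, l in enumerate(lines) if not l.strip()]],
}

class RuleAgent:
    """Condition-action agent: applies every rule of a rule set to a file."""
    def __init__(self, rule_library):
        self.rule_library = rule_library

    def apply(self, ruleset_id, lines):
        return [rule(lines) for rule in self.rule_library[ruleset_id]]

agent = RuleAgent(RULE_LIBRARY)
md_file = ["Accident_Index,Date,Speed_limit", "", "Columns describe accidents"]
results = agent.apply("R_1002", md_file)
```
      </preformat>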
      <p>
        Rule-based systems are designed for knowledge storage,
knowledge manipulation, and informed feedback from the system. Rules
are designed as part of the rule-based system. A rule-based
system first gathers knowledge and then manipulates this
knowledge to derive useful information, conclusions, or sets of
actions. Rule-based systems are also called expert systems: they can
understand, store, and derive inferences from available
information. In the context of our research goals, we intend to identify
different types of available metadata, comprehensively
deduce metadata information, and categorize metadata content
properly. Thus, we work with the most traditional
type of rule-based system, called deduction expert systems (DES)
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The DES works with a domain-specific knowledge base and
rules to produce (1) deductions and (2) actions based on choices.
      </p>
      <p>The choices can be curated or manually programmed into the
expert system. In our case, we deal with both types of choices, which
can render multiple actions. We group the types of agent actions into an
‘Action Set’ to centralize the different types of agent actions a rule
agent can perform or attain. For an expert system to function end
to end, it has to be intuitively designed according to the domain
problem. However, the basic components of all expert systems
comprise the following: (i) Knowledge Acquisition: the process
of collecting knowledge on a domain and defining methods,
boundaries, outliers, and special cases; (ii) Rule Base or Knowledge
Base: a collection, list, or set of indicative rules to
render actions and choices; (iii) Inference Engine: the engine
responsible for deducing or understanding the rules listed.</p>
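      <p>A minimal sketch of the three components just listed, assuming a forward-chaining inference engine over a simple fact dictionary; the fact and action names are hypothetical, not the paper’s implementation.</p>
      <preformat>
```python
# Sketch of a deduction expert system's basic components: acquired
# knowledge, a rule base, and a forward-chaining inference engine.
# All fact and action names here are illustrative assumptions.

knowledge = {"has_group_headers": True, "has_empty_lines": True}

# Rule base: (condition over known facts) -> (deduced fact or action).
rules = [
    (lambda k: k.get("has_group_headers"), ("contains_metadata_content", True)),
    (lambda k: k.get("has_empty_lines"), ("action", "delete_empty_lines")),
    (lambda k: k.get("contains_metadata_content"), ("action2", "tag_content")),
]

def infer(knowledge, rules):
    """Inference engine: apply rules repeatedly until nothing new fires."""
    k = dict(knowledge)
    changed = True
    while changed:
        changed = False
        for cond, (key, value) in rules:
            if cond(k) and k.get(key) != value:
                k[key] = value
                changed = True
    return k

state = infer(knowledge, rules)
```
      </preformat>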
      <p>We designed a multi-agent system for metadata categorization,
file understanding, and organization. However, within the context and
scope of this research paper, we discuss the working
and performance of 5 rule agents and omit the rest of the agents
due to space limitations. An agent continues to apply rules until
the metadata file is completely analyzed (see figure 1). Each agent
is randomly assigned a metadata file from the mixed collection.</p>
      <p>Once each agent has its file, a number of actions are performed
by each agent to understand the file and its content. For example,
in figure 2 we illustrate how agent RA_101 first identifies the
prominent file features that are expected to be in the file, such as
headers, group headers, empty lines, and metadata content. After
this step, each agent takes the metadata contents identified
and starts applying rules by accessing the rule sets from the rule
library. Each rule application involves the use of allowed agent
actions such as read, write, match, drop, delete, and annotation
request. An agent requests an annotation if there is no associated
rule pre-defined in the rule set, or if
the item cannot be categorized, i.e., when the agent is indecisive
about an annotation and its associated action. In figure 4, there
are five types of information for a rule agent to identify in this
use case. Selective information is listed below:</p>
      <p>3.1.1 Metadata Group Headers: the annotated file in figure
4 is assigned to a policy agent. The policy or rule-based agent
utilizes the annotation to identify each group header and looks
for data inside it. The group headers in figure 4
are ‘Usage Information’, ‘Maintainers’, and ‘Updates’. Policies are
designed to equip agents to provide accurate results.</p>
      <p>3.1.2 Metadata Content: the second type of information a
policy agent encounters is ‘metadata’. This is regarded as data in
our problem, or more precisely as a value, and it is not annotated.</p>
      <p>Thus, the policy agent identifies a group header, locates data, and
tags it accordingly, for example, ‘data: Group Header_Name’. For
instance, the first block of data would be stored as ‘data[License:
Database: Open Database, Contents: Database Contents,
Visibility: Public]: Usage Information’. This information is now
available for all policies to explore and exploit. The data inside group
headers is utilized to categorize the type of metadata or
information available in the next phases. Most of the annotation
in metadata files falls under the metadata content category, as
most of the information besides headers, group headers, and
empty lines is metadata content.</p>
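      <p>The tagging step described above can be sketched as follows, reusing the ‘Usage Information’ group header from the example; the (text, annotation) input layout and the parsing logic are our own assumptions.</p>
      <preformat>
```python
# Sketch: tagging metadata content under its group header, producing
# entries in the spirit of 'data[...]: Usage Information'. The input
# layout of (text, annotation) pairs is an illustrative assumption.

annotated_file = [
    ("Usage Information", "group header"),
    ("License: Database: Open Database", None),
    ("Visibility: Public", None),
    ("", "empty line"),
    ("Maintainers", "group header"),
    ("Kaggle", None),
]

def tag_content(lines):
    tagged, current = {}, None
    for text, annotation in lines:
        if annotation == "group header":
            current = text
            tagged[current] = []
        elif annotation == "empty line":
            continue  # empty lines are discarded by the agent's delete action
        elif current is not None:
            tagged[current].append(text)  # content tagged to its group header
    return tagged

tags = tag_content(annotated_file)
```
      </preformat>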
      <p>3.1.3 Empty Lines: the third type of information a policy
agent encounters for this metadata file shown in figure 4 is empty
lines. The empty lines are annotated as ‘empty lines’ for policy
agents to quickly differentiate between data, empty lines, and
empty values.</p>
      <p>
        Rule-based agents typically function on a set of pre-programmed
policies or a set of rules [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. These are the basic instruction sets for
an agent to use and perform actions. In the case of metadata
categorization, organization, and understanding, rule-based agents
are promising. However, autonomy and self-reliance in decisions
are difficult to achieve even for intelligent agents. Annotation
is a technique that allows rule agents to (i) make sense of
the information and (ii) facilitate future autonomous actions. In our
research, we use manual annotation and tagging to observe how
intelligent agents perform and categorize the metadata information
provided to them. To facilitate and provide an understanding of
the problem to rule agents, we established annotation criteria for
them: figure 4 demonstrates the application of the established
annotations, and table 2 highlights the annotations provided to
rule agents for different types of metadata files.
      </p>
      <p>For manual annotation, we group the basic contents of a
metadata file that an intelligent agent can come across for a certain file
type (e.g., text, CSV, TSV, JSON). After pre-processing the
different file types into spreadsheets, the annotation process is carried out.</p>
      <p>Table 2: Sample annotation descriptions for rule-based agents.
(i) Metadata file header: can include the file title, the dataset title, and other headings or sections
in a metadata file, such as Keywords.
(ii) Metadata group header: a header that can represent one or more headers; a collection of textual
description that can contain tables and column information.
(iii) Metadata content: the actual value or content against metadata headings, i.e., the content inside metadata headers
and group headers, and values against metadata types such as ‘created by’.
(iv) Metadata type: identifies different types of metadata, such as descriptive, administrative, or technical.
(v) Empty line: an empty line, group of empty lines, or empty space that contains no actual metadata.</p>
      <p>After grouping, we add an annotation that rule-based
agents can use to categorize and organize content in
metadata files. We refer to this phase as Anatomy Tagging, i.e.,
in this step we annotate sections that represent the anatomy of
a metadata file. For example, table 2 illustrates the anatomical
annotation contents. Headers and group headers are annotated to
support deterministic and scalable search through a metadata file.</p>
      <p>Empty lines, empty paragraphs, or empty chunks can indicate the
end of a file, termination triggers, etc. To deal with this, we
annotated empty lines in a metadata file and set actions for rule agents
to delete and discard them, provided this does not change the information at
hand. For instance, textual metadata files contain empty spaces and
headers, such as a Description header that includes a block of text about
the dataset. The agents need context and an understanding of what
classifies as a header and what is the actual content or metadata
value.</p>
      <p>For this purpose, we pre-process the metadata files and assign
meaningful annotations so the rule agents can process the files,
extract, categorize, and organize information more accurately.</p>
      <p>Thus, incorporating the kind of metadata available in a file for
the rule-based agents is critically important. We dealt with this
challenge by annotating limited information in three broad
metadata categories and their subcategories (see section 1.1). Most
metadata files are messy and unprepared; the objective is to
utilize rule agents to organize and make sense of the messy files and
generate profiles. To achieve this, the agents must understand the
kind of information they are dealing with. For this, we annotate
examples from different types of metadata so rule agents can
categorize and organize metadata files. Figure 4 indicates the
annotation strategy applied to a small file.</p>
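      <p>The annotation strategy can be sketched as a simple pre-processing pass that assigns anatomy annotations to raw metadata-file lines. The heuristics and the list of known group headers are illustrative assumptions, not the paper’s actual (manual) annotation procedure.</p>
      <preformat>
```python
# Sketch of the Anatomy Tagging phase: assigning anatomy annotations
# (file header, group header, empty line, content) to raw lines.
# Heuristics and the known-group-header list are illustrative.

KNOWN_GROUP_HEADERS = {"usage information", "maintainers", "updates"}

def annotate(lines):
    annotated = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            annotated.append((line, "empty line"))
        elif stripped.lower() in KNOWN_GROUP_HEADERS:
            annotated.append((line, "metadata group header"))
        elif ":" not in stripped:
            # Heading-like line without a key-value pair.
            annotated.append((line, "metadata file header"))
        else:
            annotated.append((line, "metadata content"))
    return annotated

raw = ["UK Road Safety", "Usage Information", "Visibility: Public", ""]
out = annotate(raw)
```
      </preformat>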
    </sec>
    <sec id="sec-6">
      <title>Annotation Example</title>
      <p>
        Let us look at a simple use case from the collected and annotated
metadata file repository: a metadata file acquired from the UK GOV
Road Safety Data (2005-2015) on Kaggle
(https://www.kaggle.com/tsiaras/uk-road-safety-accidents-and-vehicles/metadata). Figure 4 depicts a
simple metadata file extracted from the Kaggle web page. The page does
not provide a dedicated metadata file; in response, the agent adds a title
inferred from other metadata files for the UK GOV Road Safety
dataset. It is important to note that plenty more
metadata is available on the Kaggle web page, but it is unorganized and
not marked or labeled as metadata. There are typically three
use cases that we address, namely: (i) Case-I: no
metadata is available; in this case, we use techniques such as
exploratory data analysis (EDA) [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] and data profiling [
        <xref ref-type="bibr" rid="ref1 ref24">1, 24</xref>
        ] to
derive meaning from the data itself. (ii) Case-II: metadata
is available and of high quality (most of the time, however, metadata is either
unavailable or of poor quality). (iii) Case-III: metadata is
available but either misplaced or of bad quality. This case
is where most of this research paper focuses. We have worked
with a dataset that does provide textual metadata, descriptive
metadata, metadata tags, headers, and context, but this metadata is misplaced
or, more precisely, unorganized. Our goal was to identify this
metadata using rule agents. We have designed an instruction
set for metadata files to facilitate our rule agents in detecting,
identifying, and categorizing metadata file contents. We annotate
these instruction set categories and supply them to rule-based
agents (RBAs). Table 2 indicates the annotations we perform the
first time files are accessed. These files are annotated and then
further processed by rule agents. Table 2 includes the main
headers that might appear in a metadata file; empty lines are
identified to avoid repetitive cycles for the agents. It is important
to note that if there are sub-headings, or ‘headers’, we annotate them
as ‘Metadata group headers’. This helps the agent differentiate
between file headers and group headers that can hold a chunk of
metadata or information. The chunks of metadata are treated as
data and are further analyzed to identify the types of metadata they
contain. This is also an annotation category, identified as
‘metadata types’ in table 2. Once the annotation is complete, we
design rule sets that can be utilized by policy agents to
render decisions. The collection of files, annotation, metadata
cleaning &amp; organization, and preparation comprise the
knowledge acquisition phase. Figure 3 illustrates the four main stages for a
policy-based or rule-based agent. The second stage involves the Rule
Base; a rule base is a collection of rule sets or policy sets. These
policy sets are designed to support agents in performing different
actions. The rule sets are analyzed by an inference engine that
makes sense of how each rule matters and applies to the case.
Finally, an action center is the set of actions an agent is permitted
to perform. Thus, a policy agent can perform a certain limited
set of actions based on an inference derived by rule application
on metadata files.
      </p>
    </sec>
    <sec id="sec-7">
      <title>EVALUATION</title>
      <p>
        The role of annotated metadata files for rule agents was observed
by conducting experiments that evaluated and compared the rule
agent’s ability to categorize and organize metadata files with and
without annotations. Another objective of the experiments was
to observe how the rule agents performed if rules were altered or
manipulated. The experiments were conducted on a spreadsheet
dataset that was gathered from Kaggle metadata files, Kaggle
resource descriptions, and Data.gov metadata files from various
categories. A total of 1123 metadata files were retrieved from
both repositories and added as a “mixed collection” to the
working repository for rule agents. Throughout the research paper,
we refer to the mixed collection as the collection of metadata
files from both Kaggle and Data.gov. As far as our research
contribution is concerned, to the best of our knowledge, there are
no substantial contributions that directly address and aim at the
exact definition of the problem. However, there are related tools
such as Metanome [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] that can be tested with the technique
we have developed to assess how their performance improves
or deteriorates. With regard to the scope of this research paper,
we emphasize that the experiments carried out are
based on the usability of the technique and on the meaningful
observations made with regard to metadata management
using rule agents. It would go beyond the scope of this research
paper to compare and examine how existing tools improve
with the technique we have developed. Nevertheless, our future
work will concentrate on observing performance changes and
the applicability of our technique to other tools.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Evaluating Rule Agent Performance</title>
      <p>To understand the role of annotated metadata files in metadata
categorization and file organization, it is crucial to observe how
agents behave with and without annotations. This experiment
was performed with a set of five rule agents. In the first phase,
these five rule agents were provided metadata files that were not
annotated or labeled. The same set of agents was then analyzed and
observed in a setting with annotated metadata files. Each agent
was randomly assigned metadata files from the entire mixed
collection of metadata files retrieved from Kaggle and Data.gov. Figure
5 illustrates the performance of each agent with and without
annotations. For instance, AR1 represents rule agent_001 with
annotated metadata files, and UR1 represents the performance of
the same agent with unannotated metadata files. Each agent in
both categories (annotated, unannotated files) was evaluated on
four factors: (1) accurate identification of metadata headers, (2)
correct identification of metadata group headers, (3) MD content
or values against metadata types, and (4) accurate categorization
of available metadata into metadata types. File annotation
requests were also examined for all agents, to understand how many
times an expert user was prompted with a manual annotation
request. This happened when the rule agent could not decide on
the categorization of the metadata value at hand or on the occurrence
of empty blocks, empty lines, etc. (see figure 6). The experiments
in figure 6 highlight the frequency of manual annotation requests
of 5 rule agents (case-I: agents with annotated files; case-II:
agents with unannotated metadata files). Each time an agent
cannot categorize content, i.e., a data object, header, content, text
block, etc., it sends a request for manual annotation to the user.
This represents indecisiveness, as these agents are simple policy
agents and are not reinforced agents. From this experiment, we
expected to learn about the usability of adding annotations for
metadata headers, group headers, content, and types. As a result,
we observed that unannotated files required significantly more
annotation requests compared to files that were pre-annotated.</p>
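      <p>The annotation-request comparison can be sketched as a simple count over agent event logs; the event names and the example runs below are illustrative, not the paper’s measured results.</p>
      <preformat>
```python
# Sketch: counting manual annotation requests per agent run, to compare
# agents given annotated vs. unannotated files. Event names and example
# runs are illustrative, not measured results.

def annotation_requests(events):
    """Count 'annotation_request' events emitted during an agent run."""
    return sum(1 for e in events if e == "annotation_request")

annotated_run = ["rule_applied", "rule_applied", "annotation_request"]
unannotated_run = ["annotation_request", "rule_applied",
                   "annotation_request", "annotation_request"]

annotated_count = annotation_requests(annotated_run)
unannotated_count = annotation_requests(unannotated_run)
```
      </preformat>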
    </sec>
    <sec id="sec-10">
      <title>Detecting Metadata Types and Analyzing Policy Frequency</title>
      <p>The second experiment was to observe whether rule agents can
independently detect metadata types with and without annotations.
The objective was to alter policies for rule agents and observe
whether doing so would affect the overall metadata type detection.
Firstly, rule agents without annotations requested a manual
annotation 80% of the time when a metadata type had to
be detected. Figure 7 indicates the detectable metadata types
(Descriptive (DMD), Administrative (AMD), Structural (SMD), Rights
(RMD), Preservation (PMD)) in a number of files (recorded as
percentages). We observed that most of the metadata files did not
contain exact information such as a resource link to the original
data repository, update policies, copyrights, etc. In most cases,
this metadata could nevertheless be retrieved; we regard such cases
as “misplaced metadata”. Thus, rule agents with
annotated files were able to detect more metadata types and were
intelligent enough to identify discrepancies or missing
information in files. In regard to “misplaced metadata”, we made a
few observations on collected files. Some of the most prominent
observations were as follows: Lack of metadata support and
updates: most websites, resources, and portals do not maintain a
complete metadata update cycle. The information on the dataset
description page and the actual metadata file is observed to be
incoherent. Lack of metadata download support: repositories
and resources describe a resource on the main page or portal but
disregard the facility to download this valuable metadata. For
example, the repository pages at Kaggle and Data.gov both
contain essential descriptive and rights metadata. This information
such as resource description, keywords is not readily available
for downloads. It is also not properly tagged and is not
automatically embedded or added to downloadable metadata files. In most
cases, it has to be manually downloaded or added to the metadata
collection.</p>
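      <p>A minimal sketch of how a rule agent might flag which of the five metadata types above are detectable in a file by keyword matching. The keyword groups and helper name here are illustrative assumptions for this sketch, not the paper's actual rule definitions:</p>

```python
# Hypothetical keyword groups for the five metadata types discussed above
# (DMD, AMD, SMD, RMD, PMD); the rules the agents actually use may differ.
TYPE_KEYWORDS = {
    "DMD": {"title", "description", "keywords", "subject"},
    "AMD": {"created", "modified", "source", "format"},
    "SMD": {"schema", "columns", "fields", "structure"},
    "RMD": {"license", "copyright", "rights"},
    "PMD": {"checksum", "archive", "preservation", "version"},
}

def detect_metadata_types(headers):
    """Return the set of metadata types whose keywords appear in the file headers."""
    tokens = {h.strip().lower() for h in headers}
    return {t for t, kws in TYPE_KEYWORDS.items() if tokens & kws}

# A file listing only these headers would match descriptive, rights, and
# structural metadata; the missing types could then be reported as
# discrepancies or "misplaced metadata".
detected = detect_metadata_types(["Title", "License", "Columns"])
```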
      <p>Another important experiment was to observe how many
times a particular rule was accessed and applied by the rule
agents. This was observed by maintaining a rule log in each
cycle for all rule agents. Figure 8 depicts the frequency (recorded
as percentages) of rules fired by each rule agent. The most useful
aspect of this observation experiment was to understand the type
and frequency of metadata content prevalent in the collected files.
For example, the firing frequency of R_1006 is quite high, indicating
that the rule agents encountered null and empty values in the majority
of metadata files. Similarly, the firing frequency of R_1022 was quite
low in comparison to other rules, indicating that the majority
of the metadata files did not contain information relevant to
data archives, metadata archives, or legacy support files and
hyperlinks. Another aspect that can be observed from figure
8 is that not all metadata files contain relevant hyperlinks; as
indicated, R_1024 was accessed by agents only a couple of times.</p>
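      <p>The rule log described above can be sketched as a simple per-agent counter. The rule identifiers follow the R_10xx naming used in the paper, but the logging interface itself is our illustrative assumption:</p>

```python
from collections import Counter

class RuleLog:
    """Records how often each rule fires and reports frequencies as percentages."""
    def __init__(self):
        self.fired = Counter()

    def record(self, rule_id):
        self.fired[rule_id] += 1

    def frequencies(self):
        total = sum(self.fired.values())
        return {rule: 100.0 * n / total for rule, n in self.fired.items()}

log = RuleLog()
for rule in ["R_1006", "R_1006", "R_1006", "R_1022"]:
    log.record(rule)
# R_1006 accounts for 75% of firings in this toy log, R_1022 for 25%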
      <p>Table 1 includes a sample of 24 rule sets that are measured
in figure 8, which demonstrates the rule firing frequency by rule
agents on the mixed collection. It visualizes how each rule is
fired and how many times it is utilized on average by each agent.
Table 1 provides insight into the policies available to all rule
agents for both annotated resources and unannotated resources.
Each policy defined in table 1 is made available to rule agents
based on their tuning. We refer to tuning, i.e., “a set of policies”,
as the profile setting available to a particular agent. In our research,
we experiment with agents in different settings by increasing or
decreasing the number of policies accessible to a particular agent.
For example, R_1010 searches the file for the keyword “publisher”,
its synonyms, and approximate matches for this keyword group. The
functionality allows a rule agent to scan headers, group headers,
and text blocks to locate a valid publisher and its details in a
metadata file.
The policy agents are assigned a set of actions that can be chosen
to attain the next state and complete the assigned task (based
on the available policies provided to each agent). The allowed set of
actions for any rule agent in this experimental setup includes the
following. Read: the agent scans and reads the file ingested
into the system. Search: a search action that allows the agent to
scan the file, gather insight, and look for common words,
indicators, and headers. Write: allows the agent to write into a new
metadata file. Match: an action where the agent matches its
knowledge of annotation with the available tags in the metadata file,
including cases where there is a need to search for synonyms, etc. Drop:
the case where content is dropped or considered invaluable
by the agent and added to the metadata file as ‘supplementary
information’. Delete: when an agent decides to delete a piece of
content such as consecutive delimiters or misspelled words.
Compare: when two or more headers, sub-headers, or content pieces
are compared for duplicate data identification. Request
Annotation: when an agent is indecisive on a set of sentences such
as comments (if not annotated), the agent requests a manual
annotation from the user (this action is typically requested after
the agent completes the read cycle and before the write
operation). In table 5, five rule agents are observed for action behaviors
on the same metadata file. The experiment observes the role of
policies and how they change the action initiation sequence for
each agent. For example, RA_101 does not exclude the dropped
and deleted items. Thus, the read items are the same as the written
items in new metadata files but with a different context. On the
other hand, agents like RA_102, RA_103, etc., do not add these
items to the total write count for newly generated metadata
files. The difference in operations is based on the accessibility of
policies and the action preferences assigned to each agent. All five
agents were observed under different settings to understand how
actions and operations differ based on agent action allowances.
Thus, by changing policies we were able to observe significant
changes in the performance of agents for the same metadata files.
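The action set above can be illustrated with a minimal policy-agent loop. The dispatch logic and the content model below are simplified assumptions for illustration, not the system's implementation:

```python
# Minimal sketch: an agent walks through content items and picks one of the
# allowed actions (read-then-match, drop, or request annotation) per item,
# depending on its known annotations. Real agents also search, write,
# delete, and compare.
def run_agent(items, known_tags):
    actions = []
    for item in items:  # the "read" cycle over the ingested file
        if not item.strip():
            actions.append(("drop", item))                # empty block
        elif item.lower() in known_tags:
            actions.append(("match", item))               # annotation matched
        else:
            actions.append(("request_annotation", item))  # indecisive
    return actions

acts = run_agent(["Publisher", "", "mystery block"], {"publisher", "title"})
```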
To obtain a better picture and understanding of agent
performance, we designed three major categories of experimental
observation. Firstly, we tested our technique and rule agents on the
mixed collection, i.e., a collection of metadata files obtained from Kaggle
and Data.gov. Figure 9 illustrates the performance of each rule
agent on the mixed collection. The second experimental setting
was to observe how agents performed with different levels of
annotated datasets. Table 3 provides an analysis of each agent’s
performance in accordance with the annotation percentage. We
defined three annotation levels starting from 25%, then 50%, and
concluding with 75% annotated metadata files. By observing how
agents performed we were able to understand that annotated
metadata files do significantly improve agent performance,
even in the most difficult cases such as metadata type
identification. Most importantly, from this experiment, we observed how
each agent was affected based on its primary policy selection, i.e.,
the setting for each agent in terms of allowed or accessible policies,
as shown in table 3.
It is very critical to understand how the technique responds to
different types of datasets. In our case, the aim was first to collect
metadata files that could be analyzed by rule agents. In the context
of our research problem, we did not limit the collection of
metadata files to a certain domain. Thus, the collection comprised
metadata files from various domains such as accidents,
geospatial data, pharmacy, music, and movies, to name a few. The
challenge was, however, to understand and find relevance (if it
exists) in these metadata files and, secondly, to identify the different
constituents of metadata. Based on our collection, we worked
with a total of 1123 metadata files. A total of 563 files were
obtained from the Kaggle repository (on numerous domains and
topics), and a total of 560 metadata files were retrieved from the
Data.gov repository (on numerous domains and topics). Table 4
depicts our overall observations and understanding of the metadata
collection. We have separately analyzed both collections on the
following criteria:</p>
      <p>4.5.1 Pre-Processing: this metric identifies the amount of
pre-processing required on files before they can be fed to the rule
agents for further categorization and analysis. Pre-processing is
an important aspect as most metadata files are raw and do not
come processed with preambles, labels, and annotations.
Moreover, the file structures and standards vary in the majority of
cases. We observed that metadata files from Kaggle required
more pre-processing in terms of missing values, incorrect values,
and file standardization. Also, metadata from Kaggle had many
cases of “misplaced metadata”, making pre-processing a necessary
step in the process. On the other hand, metadata retrieved from
Data.gov required only basic cleaning and file preparation and was
already in structured formats.</p>
      <p>4.5.2 Rule Applicability Rate: we also observed how many
rules were applicable to a pre-processed metadata file from each
collection. Table 4 identifies a relative difference between rule
applicability on the two data collections. Due to their more
header- and tag-oriented file structure, the metadata files from
Data.gov were perceived to be more rule applicable in comparison
to the Kaggle collection.</p>
      <p>4.5.3 Rule Failure Rate: the rule failure rate is described as
a failed attempt of rule application by a designated rule agent.
If an agent executes a rule and there is no profitable outcome,
i.e., no conclusive agent action is performed and no task is
completed, it is recorded as a failure. The files from Kaggle
had an 11% failure rate and the metadata collection from Data.gov
had an 8% failure rate, as depicted in table 4.</p>
      <p>4.5.4 Rule In-applicability Rate: this is defined in terms of
a lack of metadata elements or a lack of metadata content inside
metadata files. The unavailability of information in metadata files
causes rule in-applicability. It means that certain rules were
never applied by agents because those elements, keywords, headers,
or information content were not available. From the collection of
agent policies, a total of 16% of rules were inapplicable in different
phases on files retrieved from Kaggle, and a total of 5% of rules were
inapplicable on metadata gathered from Data.gov (see table 4).</p>
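      <p>The rates defined in 4.5.3 and 4.5.4 can be computed from simple counts. The numbers below are only a worked illustration of the definitions; they are not the paper's measurements:</p>

```python
def rule_rates(applied, failed, total_rules, applicable_rules):
    """Failure rate over rule applications; in-applicability over the policy set."""
    failure_rate = 100.0 * failed / applied if applied else 0.0
    inapplicability_rate = 100.0 * (total_rules - applicable_rules) / total_rules
    return round(failure_rate, 1), round(inapplicability_rate, 1)

# Illustrative counts: 1000 rule applications with 110 inconclusive outcomes
# gives an 11% failure rate; 4 of 25 policies never triggered gives a 16%
# in-applicability rate (matching the shape, not the data, of table 4).
rates = rule_rates(applied=1000, failed=110, total_rules=25, applicable_rules=21)
```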
    </sec>
    <sec id="sec-11">
      <title>CHALLENGES</title>
      <p>In this section, we discuss some of the most difficult
challenges we encountered; for some of these challenges, we aim
to provide partial solutions and extend them in our future work.
Metadata collection: metadata files were available for some
datasets. However, most of the metadata was misplaced or had to
be gathered from other resources. Thus, one file with all the
metadata was a rarity.</p>
      <p>Metadata Pre-processing In almost all of the files collected
from the Kaggle repository and Data.gov, the metadata had to be
cleaned and prepared before it could be utilized by our system.
Pre-processing is a particular challenge as all files are in different
formats, the information does not strictly adhere to standards,
and the files are populated differently for each dataset. Thus,
metadata file pre-processing is one of the challenges that we
currently deal with and address. Since most of the metadata files
were raw, and the metadata was dispersed and misplaced, a
pre-processing step had to be established.</p>
      <p>Conflicting Labels and Label Cleaning We address this
by designing labels to avoid conflicts in the ground truth. The
strategy we defined includes a growth pattern that allows
labels to evolve and change over time if needed. We set this up by
providing functions that allow the user to manipulate non-critical
labels (the implementation details of this feature are beyond the
scope of this research paper). In the context of the current
research paper, the labels added are marked for each file, and a
non-conflicting hierarchy of labels is enforced to avoid confusion
for the rule agents. For example, if a file contains more than one
main header, each should explicitly contain separate information
and should be a different word. One file cannot have two main
headers titled “Descriptive Metadata” or “Publisher”. Such
hierarchical control allows for information categorization and
avoids conflicting labels. We have extended our work on dealing
with conflicting labels and intend to improve it as we move along.</p>
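      <p>The uniqueness constraint on main headers described above can be enforced with a simple check. This validator is a hedged sketch of the idea, not the system's label-management functions:</p>

```python
def find_conflicting_headers(main_headers):
    """Return main headers that occur more than once in a single file,
    which would violate the non-conflicting label hierarchy."""
    seen, conflicts = set(), set()
    for header in main_headers:
        key = header.strip().lower()
        if key in seen:
            conflicts.add(key)
        seen.add(key)
    return conflicts

# Two "Publisher" main headers in one file would be flagged as a conflict
conflicts = find_conflicting_headers(["Descriptive Metadata", "Publisher", "Publisher"])
```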
      <p>Nested Files and Data In the context of our research, we only
deal with non-nested semi-structured files.</p>
      <p>Failure Despite Annotations We prepared the ground truth to
deal with the most frequent and available information in
metadata files by providing annotations. However, even with these
annotation practices, the rule agents were unable to provide a decisive
result in cases such as identification of the most recent
metadata file version. This was a failed use case for rule agents: the
annotation and content information was insufficient to reach a
necessary conclusion.</p>
      <p>Unreadable Text The collection contained metadata files that
had discrepancies in terms of incorrect text (text in English with
incorrect spellings, etc.) and indiscernible text (text in another
language). In both cases, the rule agents were unable to classify
the text into any of the categories, since we do not provide
multilingual support and many metadata files contain information
in different languages such as French, Chinese, Italian, etc.
This remains a challenge for our current research.</p>
      <sec id="sec-11-1">
        <title>Incomprehensible Metadata Files</title>
        <p>The collection comprised files that contained metadata
information such as schema and legacy links inside an image.
This was a challenge and a limitation of our system since we do
not process image files. Thus, we had to exclude such files from
our collection.</p>
        <p>Inconclusive Column Names This is a semantic challenge
that is not a part of our problem description, as we deal with
metadata files, categorize the information inside them, and clean
it for better reusability. Nonetheless, we identified that about 25% of
the metadata files in the extended collection (before scrutiny; the final
collection contained 1123 files only) contained column names
that did not represent the information accurately. Also, much
of the time, column names and column values were conflicting.
If the column name was, for example, “Publisher”, it would be
expected to contain the dataset publisher name, date, or resource
link, but instead it contained a phone number or city name.</p>
        <p>Empty Files We address this issue partially in our current
work by identifying blocks of empty data and empty files. This is
another challenge that is more oriented towards understanding
file structures and identifying different tables or contents inside
a file. As this is beyond the scope of our research, we only identify
empty metadata files in our extended collection and disregard
them from the main collection, i.e., the 1123 files.</p>
        <p>Inconclusive Headers This is a challenge that we address in
our current research and are extending in our future work as well.
The problem concerns naming conventions and identifying
whether a title or text is to be considered a header
or sub-header. Primarily, we deal with this through intuitive annotation
techniques in our ground truth. We intend to extend this approach
by developing a tree-based structure for titles, headers, and
sub-headers. The tree would be designed to identify different levels of
headers and text to remove the ambiguity of inconclusive headers
and their hierarchy in metadata files.</p>
        <p>Other Challenges Another case was misplaced commas and
delimiters; rule agents failed because preparation was done manually
and no rules directly addressed punctuation, delimiters,
misplaced commas, etc. Another example of rule agent failure
involves broken text paragraphs and empty columns.
Most of the time the rule agents were able to identify an empty
row or column if it was labeled; on the other hand, we did observe
rule agents stuck in a cycle of interpretation of empty rows in sheets.
Finally, missing values were also an issue even with annotations:
as rule agents work by example, the rule agents failed in
identifying whether the content or “metadata value” was placed against
the correct metadata category.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>CONCLUSION</title>
      <p>In this paper, we have discussed the prospect of utilizing rule
agents that could facilitate metadata categorization based on
two types of metadata files, i.e., annotated metadata files and
unannotated metadata files. To support and enhance the use of metadata
in data management and business information systems, it is
crucial to minimize the amount of time spent on pre-processing
and categorization. We provide a policy-based system driven
by policy agents to identify, observe, and categorize available
metadata files for reusability and easy understanding. To achieve
this, we explored the possibilities of changing policies and
observing the behavior of agents. By our observation, a change in
available and accessible policies affected the agents’ ability to
correctly categorize a piece of metadata in a file. We also observed
that different files contain information that might be completely
irrelevant based on the policies defined in our research. This
method of experimentation helped us in understanding the core
components of metadata files and how policy agents can best
fit a metadata file categorization use-case. The second
experiment we conducted was to utilize manually annotated files for
rule agents to observe the improvement and ease in metadata file
categorization and processing. As a result, we observed that
annotated files are helpful to policy agents as the search space is
more reasonable, and the policies can easily identify the type of
item in the file. We intend to take a few steps to improve the
readability and categorization of metadata files using rule agents.
In our upcoming experiments, we intend to develop self-learning
and service-based AI agents that can learn from their success
and failure rates as well as user feedback. We will focus
on allowing agents to learn from a feedback network and how
this affects the user decision process, a feedback system that can
identify how each policy contributed to metadata categorization.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENT</title>
      <p>The work of Hiba Khalid is supported by the European
Commission through the Erasmus Mundus Joint Doctorate project
Information Technologies for Business Intelligence-Doctoral College
(IT4BI-DC).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ziawasch</given-names>
            <surname>Abedjan</surname>
          </string-name>
          , Lukasz Golab, and
          <string-name>
            <given-names>Felix</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Data profiling: A tutorial</article-title>
          .
          <source>In Proceedings of the 2017 ACM International Conference on Management of Data</source>
          .
          <volume>1747</volume>
          -
          <fpage>1751</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Batini</surname>
          </string-name>
          , Maurizio Lenzerini, and
          <string-name>
            <surname>Shamkant</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Navathe</surname>
          </string-name>
          .
          <year>1986</year>
          .
          <article-title>A comparative analysis of methodologies for database schema integration</article-title>
          .
          <source>ACM computing surveys (CSUR) 18</source>
          ,
          <issue>4</issue>
          (
          <year>1986</year>
          ),
          <fpage>323</fpage>
          -
          <lpage>364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Rafael</given-names>
            <surname>Berlanga</surname>
          </string-name>
          , Oscar Romero, Alkis Simitsis, Victoria Nebot, Torben Bach Pedersen, Alberto Abelló, and María José Aramburu.
          <year>2012</year>
          .
          <article-title>Semantic web technologies for business intelligence</article-title>
          .
          <source>In Business Intelligence Applications and the Web: Models, Systems and Technologies. IGI global</source>
          ,
          <volume>310</volume>
          -
          <fpage>339</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Sanjay</surname>
            <given-names>P</given-names>
          </string-name>
          <string-name>
            <surname>Bhat and Dennis S Bernstein</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Finite-time stability of continuous autonomous systems</article-title>
          .
          <source>SIAM Journal on Control and Optimization</source>
          <volume>38</volume>
          ,
          <issue>3</issue>
          (
          <year>2000</year>
          ),
          <fpage>751</fpage>
          -
          <lpage>766</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Costin</given-names>
            <surname>Bădică</surname>
          </string-name>
          , Lars Braubach, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Paschke</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Rule-based distributed and agent systems</article-title>
          .
          <source>In International Workshop on Rules and Rule Markup Languages for the Semantic Web</source>
          . Springer,
          <fpage>3</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Serena</surname>
            <given-names>H Chen</given-names>
          </string-name>
          ,
          <article-title>Anthony J Jakeman,</article-title>
          and John P Norton.
          <year>2008</year>
          .
          <article-title>Artificial intelligence techniques: an introduction to their use for modelling environmental systems</article-title>
          .
          <source>Mathematics and computers in simulation 78</source>
          ,
          <fpage>2</fpage>
          -
          <lpage>3</lpage>
          (
          <year>2008</year>
          ),
          <fpage>379</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jens</given-names>
            <surname>Dietrich</surname>
          </string-name>
          , Alexander Kozlenkov,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Schroeder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gerd</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Rule-based agents for the semantic web</article-title>
          .
          <source>Electronic Commerce Research and Applications 2</source>
          ,
          <issue>4</issue>
          (
          <year>2003</year>
          ),
          <fpage>323</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Erik</given-names>
            <surname>Duval</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Metadata standards: What, who &amp; why</article-title>
          .
          <source>Journal of universal computer science 7</source>
          ,
          <issue>7</issue>
          (
          <year>2001</year>
          ),
          <fpage>591</fpage>
          -
          <lpage>601</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Neil</given-names>
            <surname>Foshay</surname>
          </string-name>
          , Avinandan Mukherjee, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Does data warehouse end-user metadata add value? Commun</article-title>
          . ACM
          <volume>50</volume>
          ,
          <issue>11</issue>
          (
          <year>2007</year>
          ),
          <fpage>70</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Dov</surname>
            <given-names>M</given-names>
          </string-name>
          <string-name>
            <surname>Gabbay</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Theoretical foundations for non-monotonic reasoning in expert systems</article-title>
          .
          <source>In Logics and models of concurrent systems</source>
          . Springer,
          <fpage>439</fpage>
          -
          <lpage>457</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S</given-names>
            <surname>Christopher</surname>
          </string-name>
          <string-name>
            <surname>Gladwin</surname>
          </string-name>
          , Matthew M England,
          <string-name>
            <surname>Dustin M Hendrickson</surname>
          </string-name>
          , Zachary J Mark, Vance T Thornton,
          <article-title>Jason K Resch,</article-title>
          and
          <string-name>
            <surname>Dhanvi Gopala Krishna Kapila Lakshmana Harsha</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Metadata management system for an information dispersed storage system</article-title>
          .
          <source>US Patent 7</source>
          ,
          <issue>574</issue>
          ,
          <fpage>579</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jane</given-names>
            <surname>Greenberg</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Metadata and digital information</article-title>
          .
          <source>In Encyclopedia of library and information sciences. CRC Press</source>
          ,
          <fpage>3610</fpage>
          -
          <lpage>3623</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] IEEE. [n.d.]. IEEE LOM:
          <article-title>IEEE Standard for Learning Object Metadata</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>William</surname>
            <given-names>H Inmon</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnie O'Neil</surname>
            ,
            <given-names>and Lowell</given-names>
          </string-name>
          <string-name>
            <surname>Fryman</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Business metadata: Capturing enterprise knowledge</article-title>
          . Morgan Kaufmann.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S</given-names>
            <surname>Jha</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mahbub</given-names>
            <surname>Hassan</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Building agents for rule-based intrusion detection system</article-title>
          .
          <source>Computer Communications</source>
          <volume>25</volume>
          ,
          <issue>15</issue>
          (
          <year>2002</year>
          ),
          <fpage>1366</fpage>
          -
          <lpage>1373</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Phokion</surname>
            <given-names>G</given-names>
          </string-name>
          <string-name>
            <surname>Kolaitis</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Schema mappings, data exchange, and metadata management</article-title>
          .
          <source>In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems</source>
          .
          <volume>61</volume>
          -
          <fpage>75</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Brian F Lavoie and Richard Gartner</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Preservation metadata</article-title>
          . OCLC.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Zhen</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Manish</given-names>
            <surname>Parashar</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Rudder: A rule-based multi-agent infrastructure for supporting autonomic grid applications</article-title>
          .
          <source>In International Conference on Autonomic Computing</source>
          ,
          <year>2004</year>
          . Proceedings. IEEE,
          <fpage>278</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Marilyn</given-names>
            <surname>McClelland</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Metadata standards for educational resources</article-title>
          .
          <source>Computer</source>
          <volume>36</volume>
          ,
          <issue>11</issue>
          (
          <year>2003</year>
          ),
          <fpage>107</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Renée</surname>
            <given-names>J</given-names>
          </string-name>
          <string-name>
            <surname>Miller</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Open data integration</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>11</volume>
          ,
          <issue>12</issue>
          (
          <year>2018</year>
          ),
          <fpage>2130</fpage>
          -
          <lpage>2139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Felix</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Data profiling revisited</article-title>
          .
          <source>ACM SIGMOD Record</source>
          <volume>42</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ),
          <fpage>40</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Nils J</given-names>
            <surname>Nilsson</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Artificial intelligence: a new synthesis</article-title>
          . Morgan Kaufmann.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23] NISO.
          <year>2004</year>
          .
          <source>Understanding Metadata</source>
          . NISO Press.
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Papenbrock</surname>
          </string-name>
          , Tanja Bergmann, Moritz Finke, Jakob Zwiener, and
          <string-name>
            <given-names>Felix</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Data profiling with metanome</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>8</volume>
          ,
          <issue>12</issue>
          ,
          <fpage>1860</fpage>
          -
          <lpage>1863</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>David</given-names>
            <surname>Peto</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stef</given-names>
            <surname>Lewandowski</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Metadata tagging of moving and still image content</article-title>
          .
          <source>US Patent 8,935,204</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          IBM Redbooks. [n.d.]. http://www.redbooks.ibm.com/.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Arun</given-names>
            <surname>Sen</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Metadata management: past, present and future</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>37</volume>
          ,
          <issue>1</issue>
          (
          <year>2004</year>
          ),
          <fpage>151</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>John R</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Schirling</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Metadata standards roundup</article-title>
          .
          <source>IEEE MultiMedia</source>
          <volume>13</volume>
          ,
          <issue>2</issue>
          (
          <year>2006</year>
          ),
          <fpage>84</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Alessandra</given-names>
            <surname>Toninelli</surname>
          </string-name>
          , Jeffrey Bradshaw, Lalana Kagal,
          <string-name>
            <given-names>Rebecca</given-names>
            <surname>Montanari</surname>
          </string-name>
          , et al.
          <year>2005</year>
          .
          <article-title>Rule-based and ontology-based policies: Toward a hybrid approach to control agents in pervasive environments</article-title>
          .
          <source>In proceedings of the Semantic Web and Policy Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Tsay</surname>
          </string-name>
          , Alan Braz, Martin Hirzel, Avraham Shinnar, and
          <string-name>
            <given-names>Todd</given-names>
            <surname>Mummert</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>AIMMX: Artificial Intelligence Model Metadata Extractor</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Mining Software Repositories</source>
          .
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>John W</given-names>
            <surname>Tukey</surname>
          </string-name>
          et al.
          <year>1977</year>
          .
          <article-title>Exploratory data analysis</article-title>
          .
          <source>Vol. 2</source>
          . Addison-Wesley, Reading, Mass.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Jovan</given-names>
            <surname>Varga</surname>
          </string-name>
          , Oscar Romero, Torben Bach Pedersen, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Thomsen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Towards next generation BI systems: the analytical metadata challenge</article-title>
          .
          <source>In International conference on data warehousing and knowledge discovery</source>
          . Springer,
          <fpage>89</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Marcia Lei</given-names>
            <surname>Zeng</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Metadata</article-title>
          . Neal-Schuman Publishers, Inc.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>