<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Concept Analysis-Based Association Mining From Linked Data: A Case In Industrial Decision Making</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mickael WAJNBERG</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Petko VALTCHEV</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario LEZOCHE</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandre BLONDIN MASSE´</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Herve´ PANETTO</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universite ́ de Lorraine</institution>
          ,
          <addr-line>CNRS, CRAN, Nancy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universite ́ du Que ́bec A</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linked data (LD) is a rich format increasingly exploited in knowledge discovery from data (KDD). To that end, LD is typically structured as graph, but can also fit the multi-relational data mining (MRDM) paradigm, e.g. as multiple types and object properties may be used in the dataset. Formal concept analysis (FCA) has been successfully used as theoretical framework for KDD in a variety of applications, primely in clustering and association rule mining (ARM) tasks. As FCA applicability to LD is limited by its single data table input format, relational concept analysis (RCA) was introduced as a MRDM extension that successfully deals with links in the data, including cyclic ones. While RCA has been mainly adapted for conceptual clustering in the past, we present here an RCA-based ARM method. It exploits the iterative nature of pattern generation to cut cyclic references with a minimal loss of information. The utility of the rules discovered by our method has been validated by an application as a decision support in the aluminum die casting industry.</p>
      </abstract>
      <kwd-group>
        <kwd />
        <kwd>Linked Data</kwd>
        <kwd>Association Rule Mining</kwd>
        <kwd>Relational Concept Analysis</kwd>
        <kwd>Knowledge Management</kwd>
        <kwd>Industrial Process</kwd>
        <kwd>Decision Making</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Relational data, i.e. data that comprise both proper features of the represented objects
and links between them, is ever more widely used in knowledge discovery from data
(KDD). As a special case, linked data (LD) offers arguably the richest semantics, with
the linked open data (LOD) cloud and other knowledge graphs among the largest such
datasets. As with other massive datasets, decision makers and experts often turn to these
data in a search for insights about the underlying phenomena, processes or concepts of
the corresponding domain. By covering multiple entity types, LD allows the automated
discovery of complex patterns and trends across related types and their links making
them particularly valuable, as challenging. While such data naturally form graphs and
thus can be targeted with graph mining methods [
        <xref ref-type="bibr" rid="ref14 ref24">24,14</xref>
        ], a more structured view thereof
exists, i.e. as a set of inter-related data tables. The need to properly analyze such data
motivated the emergence of multi-relational data mining (MRDM) [
        <xref ref-type="bibr" rid="ref12 ref18">18,12</xref>
        ].
      </p>
      <p>
        We are interested in contexts where MRDM supports experts in gaining better
understanding of the domain behind the data, e.g. in the design of high-level domain
models such as ontologies or causal models, and study a case pertaining to industrial
process optimization. A key requirement is intelligibility of mining output to human experts.
Moreover, our target data are typically unlabeled, thus they warrant descriptive mining
mode [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] as in association discovery or clustering.
      </p>
      <p>
        Formal concept analysis (FCA) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] is a mathematically-founded analysis approach
that has been successfully applied as a framework for KDD [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. It reveals the
hidden conceptual structure behind an (object x attribute) dataset in the form of a lattice
of conceptual abstractions (a conceptual clustering). Alternatively, FCA extracts
various families of non redundant association rules [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], based on notions of closure and
generator [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. This makes FCA a particularly versatile KDD framework. Another
domain where FCA has been proved to be useful is fault-tolerance [
        <xref ref-type="bibr" rid="ref15 ref2">2,15</xref>
        ]. Recent work in
fault localization crosschecks traces of correct and failing execution traces, it implicitly
searches for association rules which indicate that most probably exists a cause for the
whole execution to fail [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Relational concept analysis (RCA) is a recent extension of FCA that fits the MRDM
paradigm, in general, and Linked data (LD), in particular [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It organizes multi-type
datasets into a family of inter-related lattices, one per object type. To that end, object
links are factored into concept definitions in the form of reference-like attributes,
dynamically created by propositionalization. So far, RCA is missing effective tools for
association discovery. Indeed, while in FCA associations are composed from parts of concept
intents, relational intents might hide reference cycles, thus making the relationship
between the premise and the conclusion part of a rule hard to interpret. However, if
properly formalized, such rules could help uncover non trivial regularities spanning across
multiple object types.
      </p>
      <p>
        As a possible remedy, we designed a mechanism to untangle references in RCA,
even in circular concept intents. It resolves concept references by a substitution with a
appropriately truncated version of the underlying intent. The result enables the
straightforward definition of an association as well as more advanced closure and generator-based
ones [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The latter offer a valuable trade-off of generalization capacity vs result size
that is beyond the reach of existing LD-compatible mining methods such as pure graph
pattern miners [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] or association miners for LD relying on logical pattern languages [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
or on flattening techniques [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. We applied our method to industrial production data, i.e.
a dataset describing handles and frames for doors and windows and possible issues with
their usage. The overall goal of our study is to assist the expert design of a high-level
casual model to support decisions about optimizing the production process. At a first
step, our methods was to help the search for causality links between production factors
and product anomalies. Thus, it was set up to abstract associations between discovered
machined part concepts, on one hand, and observed problem ones, on the other hand,
which were then to be submitted to the expert. First experimental results show that our
method succeeds in detecting non trivial patterns previously unknown to the expert.
      </p>
      <p>The remainder of the paper is as follows: Section 2 motivates our study while
section 3 provides background on FCA, RCA and association rules. Sections 4 and 5
describe our association rule approach and the experiment with production data,
respectively. Section 6 summarizes then related work and section 7 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>A new industrial paradigm is emerging, the first of which is the gradual disappearance
of boundaries between industry (in the traditional sense) and services, which are brought
to cohabit. The data becomes raw materials of strategic importance. The industry of the
future places great importance on non-price competitiveness as a success factor for
companies on the markets, which requires a change in the offer, its design. The new industrial
world is made up of intertwining knowledge, skills and competences. Today, thanks to
the progress made in the areas of digitalization, information and communication
technologies (ICT), the industrial sector is developing quickly towards ”smarter
manufacturing” and ”connected factories”. This fourth industrial revolution also ensures
complementarity and collaboration between several scientific disciplines. They need decision
support for maximum reactivity. The great challenge is to have good quality products
while minimizing working time and reducing manufacturing costs. To meet these
constraints, it is important to detect risk phenomena through continuous monitoring of the
machining process. This requires an instrumentation of the machines to ensure the
acquisition and analysis of the large amount of heterogeneous data and knowledge,
representative of these processes. To ensure its management, it is essential to structure the data
collected through models to classify the different elements of a machining process, coupled
with the global context of production. These models must also include a classification of
major failure factors and their causality links to process elements.</p>
      <p>We study here a case of metal processing: handles and frames for doors and
windows are manufactured by aluminum die casting. A four steps process transforms the
aluminum, placed in the machine starting compartment, into machined parts: The
machine heats to melt aluminum (1), then the liquid aluminum is sent to a pump (2), where
a piston injects the aluminum into a mould (3) and finally the piston compresses the
aluminum until solidification (4). Right now, given its size, the manufacturer considers the
production machine control a too tedious and costly task to be done directly and
preemptively. Instead, the control is achieved monitoring the products themselves : occasionally,
non compliant parts will provoke failures, halting the production process to check and fix
the machine. When such a problem occurs, the machined part that triggered the stop is
highly likely to be recast (melted again and completely reprocessed). When the products
are discarded for not compliance to the quality standards, the operators fixes the problem
and restarts the production.</p>
      <p>Understanding causality between machined parts variations and observed anomaly
features can help anticipate potential failures, hence schedule a targeted machine control
and avoid recasting a piece, which in the overall, reduce the costs. Therefore, applying
forensic analysis to the collected data to discover associations between those is a
particularly promising approach. Since available data is unlabeled and comprise many-to-many
relations between two types of objects, a MRDM approach for association mining seems
a natural choice.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Background on FCA and RCA</title>
      <p>
        We recall results from FCA [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and association rules [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and then present RCA [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>In FCA, data is introduced as a (formal) context, a triple K = (O; A; I), where O is
a set of objects, A a set of attributes and I O A is a binary relation ((o; a) 2 I means
O0
O1
O2</p>
      <p>O3
object o has attribute a). A context is drawn as (object x attribute) data table as in Fig. 1.
Two derivation operators, both denoted 0, connect O and A both ways: O0 A is the set
of all attributes shared by every o 2 O (ex : fO1; O2g0 = fA0; A1g) and A0 O works
dually (e.g. fA2; A3g0 = fO0g). A (formal) concept is a pair (O; A) O A s.t. O = A0
and A = O0. Thus, a concept is a maximal pair as both its components are maximally
extended w.r.t. each-other. In a concept (O; A), O is called its extent and A its intent.
Moreover, both compound derivation operators 0 0, denoted 00, are closure operators on
Ã(O) and Ã(A), respectively. For instance, fA0; A2g00 = fA0; A1; A2g.</p>
      <p>
        FCA extracts all the concepts of a context and organizes them into a lattice w.r.t.
extent inclusion (see Fig. 2). The lattice can be used to generate the association rules of
the context. These have the form M ! N where M; N A. Typical quality measures for
association rule include support and confidence, i.e. the percentage of objects incident
to attributes in M [ N in O, and the proportion of the previous objects in the larger set
of objects incident to attributes in M, respectively [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For instance, we have from figure
1 the following association rule A1; A2 ! A0. The support of the rule is 25%: only one
object out of the set of all objects in the base (four) hold these three attributes; while
the confidence is 50%: the number of objects holding the three attributes (O2) out of the
numbers of objects holding the A1; A2 of the rule (O2; O3).
      </p>
      <p>We are interested in exact associations, a.k.a. implication rules, which have
confidence of 100%. These rules admit no exceptions as opposed to general associations
whose confidence may be lower. While exact rules are only a small part of all possible
rules (of certain support and confidence levels), their number is still prohibitive. A much
smaller implication subfamily is defined using the notion of equivalence class on Ã(A)
and generators, which still withholds the information encoded in the entire family.</p>
      <p>
        Equivalence classes are induced on both powersets of a context by 00: two sets A and
B are equivalent if they share the same closure, i.e. A00 = B00 . Each equivalence classes on
has a unique maximum, e.g. in Ã(A), it is a concept intent. Conversely, there are one or
more minimal elements, called generators. Now, the most informative implication rules
have the form G ! M n G where M is an intent and G one of its generators [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. FCA
methods exist [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] allowing these rules to be efficiently generated.
      </p>
      <p>
        FCA reveals conceptual abstractions on objects by factoring out shared attributes.
RCA [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] extends it by enabling the factoring in of relational information. Its input
format is thus a relational context family which is made of a set of contexts Ki = (Oi; Ai; Ii)
and a set of binary relations ri; j Oi O j linking a pair of context Ki (domain) and K j
(range).
      </p>
      <p>K1 sko cst smL tcL
g P1 P2 P3
st p qlt mld cost
12
13
14
15
12
13
14
15</p>
      <p>P1
P2</p>
      <p>P3</p>
      <p>Consider the context family in Fig. 3: Context K1, on the left-hand side, represents
data on machined parts, i.e. objects 12, 13, 14 and 15. The respective attribute set
comprises inadequate thickness (sko), part to be melted again and recast (cst), thickness
under lower threshold (smL) and pressure under the lower threshold (tcL). K2, on the
righthand side, gathers data on problems (objects P1, P2, and P3). Its attributes areas follows:
time to solve less than 5 min (t5), machine stopped (st p), quality problem (qlt), mould
defect (mld), and medium financial impact (cost).</p>
      <p>A relation g, for ’generates’, links machined part to generated problems (given in
the center of Fig. 3). While a relation is much alike a formal context, there is a key
difference: While in a context, rows and columns correspond to objects and attributes,
respectively, in the cross-table of a relation they both represent objects, the ones from the
domain context (rows) and from the range context (columns).</p>
      <p>
        To become shareable, inter-object links are transformed into new relational
attributes of the contexts in the family (actually of extended version thereof). The
corresponding processing, called relational scaling, uses abstractions from the range context
K j as targets for relational attributes in Ki (one per abstraction). Absent a prior
hierarchy of such abstractions on Ki, the RCA method builds and then exploits the concept
lattice of that context. A typical attribute over a relation r is structured much alike a role
restriction in description logics (DL), i.e. as qr : c where q is a quantifier (f9; 8; 89; : : :g)
and c is a concept over K j. The way Ii is extended for each new qr : c depends on q and
c (see [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for details).
      </p>
      <p>
        The overall RCA method is iterative: Each iteration alternates lattice construction
with scaling steps. After each construction, the versions of the contexts are enhanced
with additional attributes created by scaling over the differential set of concepts w.r.t. to
the previous iteration. The global set of concepts is monotonously non decreasing since
adding new attributes to a context preserves all its concepts (as far as their extents are
concerned) while possibly creating new ones. Thus, the global iterative process
necessarily ends at a fix-point [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] which corresponds to a family of relational lattices with a
number of inter-concept links via relational attributes. Eventually, these links appear in
the Hasse diagrams of the fix-point lattices, as seen in Fig. 4 and Fig. 5.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. RCA-Based Knowledge Discovery</title>
      <p>
        RCA discovers conceptual abstractions inherent to the data not originally present in the
data schema[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Moreover, its iterative computing method allows further abstractions
to be discovered on top of previously generated ones. It is also worth mentioning that
RCA can produce rich relations thanks to the diverse quantifiers that can be used and
combined altogether in a given association rule. The down-side of such an expressive
format is some RCA intents, and thus the corresponding rules, might present
terminological cycles, which makes them hard to interpret. For instance, in Fig. 4 and Fig. 5 the
concept ob j 2 will yield the rule rcast ! 9g : pb(2) which could be interpreted by “if a
part presents the attribute rcast, then there exists a related problem described by concept
pb 2”. If the description of pb 2 is inserted to resolve the reference, that would yield ”if
a part present attribute rcast then there exists a co-occuring problem that has mstp mld
and 8g 1 : ob j(0)”. However, 8g 1 : ob j(0) refers to a concept whose attribute refers
back to pb2. Therefore the interpretation would be entrapped in a cycle.
      </p>
      <p>We designed a technique to disentangle the description of a relational concept intent
(subject of a separate publication). It exploits the iterative nature of the generations
process which is inherently cycle-free : at the creation of a node, this one can only have two
types of attributes : non relational ones and relational that refer to a node that was created
in a previous iteration. Hence, in a concept any relational attribute r r : C targeting a
concept C by the relation r and the scaling operator r can be interpreted in r r : (A1; :::; An)
where (A1; :::; An) is the intent of C at its creation. Recursively, if any attribute of Ai is
relational it is replaced by its interpretation.</p>
      <p>Using this principle, we can generate lattices without references as in figure 6 from
which we can extract immediately interpretable rules such as :</p>
      <p>8g 1ob j : (sko; cst; smL) ! 8g 1ob j : (9g pb : (t5; cost); 9g pb : (st p; mld))
This states that any problem generated by all machine parts that are sko, cst and smL are
also generated by machine parts that generates a t5 and cost problem or an stp and mld
problem.</p>
      <p>In the next section, we discuss the results obtained by applying RCA on an industrial
relational dataset linking produces machined parts to generated problems.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Study</title>
      <p>We present below our study, dataset and experimental settings, and discuss its outcome.</p>
      <sec id="sec-5-1">
        <title>5.1. Dataset</title>
        <p>We started our experimental study with a dataset that covers a single month of
production. It comprises three sorts of triples: Some triples represent machined parts and their
properties, while others describe observed problems. The remaining triples in the dataset
describe the relationships between parts and the generated problems, if any.</p>
        <p>Overall, the dataset covers 58000 parts described by 25 features. These include date
and shift work of the manufacturing, machined part type, its thickness, volume and
conformity, also position, speed and pressure of the piston in the pump during the three
phases of the process (respectively when aluminum is not in the mould, when it is
entering the mould and when it is compressed after filling the mould). Other features are test
results on the part w.r.t low and high thresholds on the mentioned features.</p>
        <p>Next, 16 problems are tracked. They are described by 23 features divided into three
categories : financial losses incurred, duration until resolution, and problem’s nature
(quality of the piece, mechanical wear, machine calibration...).</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experimental Settings</title>
        <p>In order to find correlations between machined parts characteristics, i.e. categories of
faulty parts, generic types of problems (rather than individual problems), analyzing the
co-occurrences of those in the dataset seems a natural starting point. Thus, RCA
association rules constitute a reasonable first target.</p>
        <p>We set up a relational context family made of two contexts, one for machined parts
and one for problems, and a single ’generates’ relation (with its reverse relation being ’is
generated by’) that links both contexts. A preprocessing step encodes input triples, as
described above, into cross tables: It splits each categorical feature over multiple attributes,
one per possible mode, while numerical ones are split over a set of intervals. In our case,
the machined parts context contains 160 attributes and the problem one 28.</p>
        <p>In the first stage of our study, industrial partners were looking for insights about how
product thickness relates to quality problems, as well as the link between these problems
and piston course. Thus, we focused on rules associating machined part features to the
thickness or, alternatively, piston course to generic problem categories.</p>
        <p>Based on the need to extract the occurrence of any problem for a specific machined
parts characteristic combination, the existential scaling operator was chosen. Also, we
only kept a one-way interpretation of the above relation, namely generates since the need
was to link part characteristics to problem, and not to link part characteristics to others
parts that would have generated an equivalent problem.</p>
        <p>Finally, in a post-processing step, we selected the implication rules which contained
only thickness or piston characteristics for machined parts.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Experimental Outcome</title>
        <p>Below, we illustrate the result extracted by RCA: We provide a small number of rules
and explain their significance.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.3.1. Invalid Thickness</title>
        <p>Among the output rules we found the following set related to a single concept :
8(1) min-sm ( &lt; LimLow) ) sm-1-ko
&gt;
&gt;
&gt;&lt;(2) sm-1-ko ) min-sm ( &lt; LimLow)
&gt;(3) sm-1-ko ) 9generates problems : (micro-stop, sm-quality, time( &lt; 5min), cost-medium)
&gt;
&gt;:(4) sm-1-ko ) recasting</p>
        <p>The above set can be interpreted as follows: The definition for a product to have
defective thickness (sm-1-ko) is that the thickness of the product is below the lower
threshold limit (1 and 2) and so, will be (4) recast (melt again and reprocessed completely).
When that happens, (3) a micro-stop problem about thickness quality (sm-quality), that
took less than &lt; 5 min to solve also occurs. This problem created a medium financial
impact (cost-medium).</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.3.2. Invalid Piston Course</title>
        <p>Piston is monitored through three variables: C1, its position when aluminum gets from
the injector to the mould entrance, C2, its position when aluminum just filled the mould
and CC, its position after compression. Among rules in the RCA output, we detected
several independent sets of rules following the template below:
8C1=X, CC=Y ) C2=Z
&gt;
&gt;
&gt;&lt;C2=Z, CC=Y ) C1=X
&gt;C1=X, C2=Z ) CC=Y
&gt;
&gt;:C1=X, C2=Z ) 9 generates problems : (sm=14, predictive 1, quality)
Each occurrence of the above template carries its own combination of X , Y and Z
values. However, experts noticed that a linear combination of the values is quasi-invariant.
5.3.3. Impact
The domain expert concluded that the above set of rules on thickness reflected the
presence of hardened aluminum residue in the machine. Such residue grows with time while
reducing the space available for the part in production and altering the piston course. It
typically leads to costly machine stoppages as the produced parts end up violating the
admissible variations and have to be recast. As a result of this observation, a reduced
cost and less time consuming solution could be found : calibrating sensors to respond
automatically to the piston course and send alerts to operators to clean the machine in
time before producing a faulty part.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Related Work</title>
      <p>
        Traditional data mining methods have been designed to work on a single table of a
database. All of them can be applied to a LD dataset provided a prior flattening, or feature
extraction, is performed to bring it down to that format [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        Logic-based LD miners such as DL-Learner [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] avoid flattening. DL-Learner is a
supervised concept learner for DL classes, hence it only discovers plausible descriptions
for existing abstractions already in the schema. RCA also borrows attribute constructors
from DL, yet, unlike DL-Learner, its discovery process is extension-bound: It stops when
no new extent are discovered (at a fix-point). DL-based association miners for LD such
as [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] are also intentionally guided. Moreover, in the wider field of MRDM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a
logicbased association rule miners is described in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. An ad-hoc approach, called contextual
association rule mining, is proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that amounts to dealing with multi-relational
datasets by focusing exclusively on pairs of entity types and a single relation type. As
the notion of closure and generator are missed in all these approaches, their output is
inherently redundant. Pure pattern miners in the field are presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (as valid/relevant
queries in DATALOG) and in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Alternatively, associations can be mined from LD seen as graphs using classical
graph pattern mining algorithms. Among those admitting labels on both vertices and
edges, a necessary condition to fit the LD format, arguably the most widely used are
gSpan [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and Gaston [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Yet graph pattern offer a single view of the shared
substructure between data objects: They lack the expressive power of logic-based
associations, e.g. miss the equivalent of a universal quantifier (as seen in Fig. 6).
      </p>
      <p>
        Beside RCA, alternative attempts at extending FCA to graph data are proposed in [
        <xref ref-type="bibr" rid="ref23 ref8">8,
23</xref>
        ]. However, they do not seem to easily adapt to association rule mining.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>RCA combines the mathematical rigor of FCA with expressive power of linked data to
support KDD in MRDM mode. Scaling creates relational attributes in RCA that abstract
from inter-object links and then factors in those attributes into concept intents. In this
paper, we show how associations can be defined in RCA. To bring those to a
easilyreadable –and explainable– form, we resolve concept references in attributes using intent
substitution. Moreover, to avoid the pit of circular references, only minimal generating
parts of the intents are used in substitution.</p>
      <p>To demonstrate the utility of the discovered relational associations, we applied our
method to a dataset representing machined parts together with observed problems.
Preliminary results are encouraging: Some rules revealing non trivial relationships between
respective features of parts and problems were rated as highly valuable by the expert. A
larger evaluation effort involving more experts and a variety of datasets is ongoing.</p>
      <p>At a next stage, we plan to test various scenarios of encoding relations in RCA:
bidirectional parts-to-problems, mixing several quantifiers, etc. In the longer run, we plan
to compare RCA association rules to those discovered by competing LD miners.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          et al.
          <article-title>Fast discovery of association rules</article-title>
          .
          <source>Advances in knowledge discovery and data mining</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>307</fpage>
          -
          <lpage>328</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Besson</surname>
          </string-name>
          et al.
          <article-title>Mining a new fault-tolerant pattern type as an alternative to formal concept discovery</article-title>
          .
          <source>In International Conference on Conceptual Structures</source>
          , pages
          <fpage>144</fpage>
          -
          <lpage>157</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cellier</surname>
          </string-name>
          et al.
          <article-title>Formal concept analysis enhances fault localization in software</article-title>
          .
          <source>In International Conference on Formal Concept Analysis</source>
          , pages
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dehaspe</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          .
          <article-title>Discovery of frequent datalog patterns</article-title>
          .
          <source>Data Mining and knowledge discovery</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>7</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dehaspe</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          .
          <article-title>Discovery of relational association rules</article-title>
          .
          <source>In Relational data mining</source>
          , pages
          <fpage>189</fpage>
          -
          <lpage>212</lpage>
          . Springer,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          et al.
          <article-title>A general framework to encode heterogeneous information sources for contextual pattern mining</article-title>
          .
          <source>In 21st ACM Intl. Conf. on Information and Knowledge Management</source>
          , pages
          <fpage>65</fpage>
          -
          <lpage>74</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dzˇeroski</surname>
          </string-name>
          <article-title>. Multi-relational data mining: an introduction</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferre</surname>
          </string-name>
          <article-title>´. A proposal for extending formal concept analysis to knowledge graphs</article-title>
          .
          <source>In International Conference on Formal Concept Analysis</source>
          , pages
          <fpage>271</fpage>
          -
          <lpage>286</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jo</surname>
          </string-name>
          <article-title>´zefowska et al</article-title>
          .
          <article-title>On reducing redundancy in mining relational association rules from the semantic web</article-title>
          .
          <source>In International Conference on Web Reasoning and Rule Systems</source>
          , pages
          <fpage>205</fpage>
          -
          <lpage>213</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kryszkiewicz</surname>
          </string-name>
          .
          <article-title>Concise Representations of Association Rules</article-title>
          .
          <source>In Pattern Detection and Discovery</source>
          , volume
          <volume>2447</volume>
          , pages
          <fpage>92</fpage>
          -
          <lpage>109</lpage>
          . Springer Berlin Heidelberg,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          <article-title>Dl-learner: learning concepts in description logics</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>10</volume>
          (Nov):
          <fpage>2639</fpage>
          -
          <lpage>2642</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lijffijt</surname>
          </string-name>
          et al.
          <article-title>Pn-rminer: A generic framework for mining interesting structured relational patterns</article-title>
          .
          <source>International Journal of Data Science and Analytics</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>61</fpage>
          -
          <lpage>76</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Narasimha</surname>
          </string-name>
          et al.
          <article-title>Liddm: A data mining system for linked data</article-title>
          .
          <source>In Workshop on Linked Data on the Web. CEUR Workshop Proceedings</source>
          , volume
          <volume>813</volume>
          , page 108,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Kok</surname>
          </string-name>
          .
          <article-title>Frequent graph mining and its application to molecular databases</article-title>
          .
          <source>In 2004 IEEE International Conference on Systems, Man and Cybernetics</source>
          , volume
          <volume>5</volume>
          , pages
          <fpage>4571</fpage>
          -
          <lpage>4577</lpage>
          . IEEE,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pensa</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Boulicaut</surname>
          </string-name>
          .
          <article-title>Towards fault-tolerant formal concept analysis</article-title>
          .
          <source>In Congress of the Italian Association for Artificial Intelligence</source>
          , pages
          <fpage>212</fpage>
          -
          <lpage>223</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          et al.
          <article-title>Mining the Web of Linked Data with RapidMiner</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>35</volume>
          :
          <fpage>142</fpage>
          -
          <lpage>151</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rouane-Hacene</surname>
          </string-name>
          et al.
          <article-title>Relational concept analysis: mining concept lattices from multi-relational data</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          ,
          <volume>67</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E.</given-names>
            <surname>Spyropoulou</surname>
          </string-name>
          and T. De Bie.
          <article-title>Interesting multi-relational patterns</article-title>
          .
          <source>In Data Mining (ICDM)</source>
          ,
          <year>2011</year>
          IEEE 11th International Conference on, pages
          <fpage>675</fpage>
          -
          <lpage>684</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Szathmary</surname>
          </string-name>
          et al.
          <article-title>A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes</article-title>
          .
          <source>Annals of Maths and Artificial Intelligence</source>
          ,
          <volume>70</volume>
          (
          <issue>1</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>105</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.-N.</given-names>
            <surname>Tan</surname>
          </string-name>
          et al.
          <article-title>Introduction to Data Mining</article-title>
          .
          <source>Pearson Education Limited</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Valtchev</surname>
          </string-name>
          et al.
          <article-title>Formal Concept Analysis for Knowledge Discovery and Data Mining: The New Challenges</article-title>
          .
          <source>In Proc. of ICFCA</source>
          <year>2004</year>
          , volume
          <volume>2961</volume>
          <source>of LNCS</source>
          , pages
          <fpage>352</fpage>
          -
          <lpage>371</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <source>Restructuring Lattice Theory: An Approach Based on Hierarchies of Concepts</source>
          , pages
          <fpage>445</fpage>
          -
          <lpage>470</lpage>
          . Springer Netherlands, Dordrecht,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <article-title>Existential concept graphs of power context families</article-title>
          .
          <source>In International Conference on Conceptual Structures</source>
          , pages
          <fpage>382</fpage>
          -
          <lpage>395</lpage>
          . Springer,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          and J. Han. gSpan:
          <article-title>Graph-based substructure pattern mining</article-title>
          .
          <source>In 2002 IEEE International Conference on Data Mining</source>
          ,
          <year>2002</year>
          . Proceedings., pages
          <fpage>721</fpage>
          -
          <lpage>724</lpage>
          . IEEE,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>