<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interactive Query Refinement using Formal Concept Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>tim.pattison@dst.defence.gov.au Defence Science</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Technology Group West Ave</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edinburgh South Australia</string-name>
        </contrib>
      </contrib-group>
      <fpage>207</fpage>
      <lpage>218</lpage>
      <abstract>
        <p>Formal Concept Analysis (FCA) of a document corpus yields a concept lattice uniting the powersets of corpus terms and documents. Structural navigation of the directed graph connecting neighbours in this lattice affords interactive query refinement. The starting point for navigation is typically the closure of the conjunctive Boolean query with respect to the corpus. This paper describes the special treatment of “closure” terms - those in this closure but which are not specified by the user - and its implications for implicit structural navigation of the digraph through query editing. This approach, in which the user's choice of each additional query term is constrained to avoid the null result set, is contrasted with explicit structural navigation. If a term is added which does not co-occur with the terms already specified, the user must either remove the new term or choose which other term(s) to remove. A novel technique is used to present the user with options for those other terms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1.1</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <sec id="sec-2-1">
        <title>Formal Concept Analysis for Interactive Query Refinement</title>
        <p>
          Formal Concept Analysis (FCA) has been used in corpus-based information
retrieval for the construction and interactive refinement of conjunctive Boolean
queries (see e.g. [
          <xref ref-type="bibr" rid="ref14 ref15 ref16 ref6">15,14,16,6</xref>
          ]). Here, a query is a set of terms, all of which must
be present in each document retrieved in response to the query. The derivation
operator, which maps the query onto the set of matching documents, induces a
Galois connection between the set of potential queries – the powerset of terms
appearing in the corpus – and the set of potential results – the powerset of
documents in the corpus. This choice elegantly combines the query and result
spaces into a unified structure – the concept lattice – which can be navigated
either explicitly or implicitly by the user for interactive refinement of the query.
        </p>
        <p>The set of documents returned by a query, and the set of query terms,
augmented by any other terms – known as closure terms – which are common to
all documents in the result set, constitute a formal concept. The association
between terms present in the corpus and the documents in which they occur is
known as a formal context, and is often specified in the form of a binary matrix.</p>
        <p>A simple example formal context is shown in Figure 1a.
c paper author(s), 2018. Proceedings volume published and copyrighted by its editors.</p>
        <p>Paper published in Dmitry I. Ignatov, Lhouari Nourine (Eds.): CLA 2018, pp.
207{218, Department of Computer Science, Palacky University Olomouc, 2018.
Copying permitted only for private and academic purposes.</p>
        <p>Given a formal context, FCA enumerates the set of formal concepts, partially
orders them by document set subsumption, and computes the neighbour relation
as the transitive reduction of this ordering relation. The result is a single-source,
single sink, directed acyclic graph (DAG), whose vertices represent formal
concepts and whose arcs connect neighbours.</p>
        <p>
          A lower [upper] neighbour of a concept represents a minimal conjunctive
expansion [contraction] of the corresponding set of query terms with respect
to the corpus, and the consequently contracted [expanded] result set1. These
neighbours constitute the available options for incremental modification of the
current query. The dominant paradigm for FCA-based query refinement involves
explicit navigation via (undirected) edges in this graph – a process which we refer
to as structural navigation. Navigation is typically constrained to neighbours
of the query concept, although concepts beyond neighbours are sometimes also
offered as candidates [
          <xref ref-type="bibr" rid="ref12 ref5">12,5</xref>
          ]. The graph can be presented to the user in its entirety
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], or, more usually, as a keyhole view of the neighbourhood of the current
concept [
          <xref ref-type="bibr" rid="ref2 ref3 ref4 ref7">2,4,3,7</xref>
          ].
1.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Query Editing with Ranked Refinement Options</title>
        <p>
          Lindig [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] instead presented users with a list of terms in the current query, any
one of which could be selected for removal, and a second list of terms which,
when selected and thereby added to the query, would return a non-empty result
set. The user is also shown the result set as a list of identifiers, and this list is
updated with each term selection. This approach supports implicit navigation
of the concept lattice to super- and sub-concepts – not necessarily neighbours –
without explicitly exposing the user to the concept lattice or requiring them to
comprehend it. To refine their query, the user is thereby focused on editing the
query itself rather than navigating the concept lattice.
        </p>
        <p>
          In contrast with structural navigation, the use of term lists as the primary
interface for interactive query refinement naturally supports any scheme which
ranks terms to assist the user’s choice. Using Latent Semantic Analysis, the
SORTeD prototype [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] ranks the lists of query and unused terms according
to their “semantic” relevance to the result set of a conjunctive Boolean query.
With the exception of term substitution, as described in Section 4, SORTeD
implements the query editing techniques described in this paper.
1.3
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Contribution</title>
        <p>This paper describes a method by which the user explicitly edits a conjunctive
Boolean query, subject to the constraint that the edited query must have a
non-empty result set. To respect this constraint, the addition of a disjunctive
term entails the removal of one or more terms previously entered by the user.
1 Square brackets are used throughout this paper to indicate that a sentence is true
both when read without the bracketed terms and when read with each bracketed
term substituted for the term which precedes it.
Here we describe the term most recently entered by the user as disjunctive if its
addition to the existing conjunctive Boolean query would return an empty result
set and conjunctive otherwise. The user is alerted upon entry of a disjunctive
term, and prompted to choose, from amongst automatically generated options,
which term(s) should be removed. Removal of the disjunctive term is one of those
options. An FCA-based technique is described for identifying those options which
minimise the number of user-specified terms removed.</p>
        <p>This approach to refining conjunctive Boolean queries is compared and
contrasted with the explicit structural navigation of the concept lattice digraph. In
particular, it demonstrates that: special treatment of closure terms is
appropriate; not all neighbours of the query concept are reachable through the addition or
removal of a single user-specified query term; and navigation is not constrained
to neighbours.
1.4</p>
      </sec>
      <sec id="sec-2-4">
        <title>Organisation</title>
        <p>This paper is organised as follows. Section 2 commences with a brief introduction
to the application of Formal Concept Analysis for interactive query refinement.
It describes a query editing approach to interactive query refinement, comparing
and contrasting it with structural navigation of the lattice digraph. Section 3
canvasses the computational implications of the query editing approach. A novel
scheme is then described in Section 4 for computing term substitution options
upon user entry of a disjunctive query term. Following a discussion in Section 5
of related work, the contributions of this paper are summarised in Section 6.
2
2.1</p>
      </sec>
      <sec id="sec-2-5">
        <title>Introduction</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>FCA for Interactive Query Refinement</title>
      <p>
        Formal Concept Analysis (FCA) derives multiple-inheritance class hierarchies
from empirical data, and is commonly applied to information retrieval (see e.g.
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for a recent survey). For classical information retrieval, each class or concept
is a set of documents and a set of terms such that each document contains each
of the terms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A concept is maximal in the sense that there are no other
documents containing all of the specified terms, and no other terms which are
common to all of those documents. A concept is a sub-concept [super-concept ]
of another if it corresponds to a proper subset [superset] of its documents –
or equivalently to a proper superset [subset] of its terms. The term
“multipleinheritance” refers to the fact that a concept can be a sub-concept of two or
more others, between which a sub-concept relationship does not exist.
      </p>
      <p>
        For a given document corpus, the corresponding set of concepts, partially
ordered by the sub-concept relation between them, forms a complete lattice. This
lattice can be represented as a directed graph – or digraph – whose vertices are
the concepts and whose directed arcs are from concepts to their upper
neighbours. A super-concept of a given concept is an upper neighbour if none of its
sub-concepts are super-concepts of the given concept. Layered drawings of this
digraph exist in which all arcs of the digraph are upwards, allowing arrows to be
omitted from the arcs. The resultant graph drawing is known as a Hasse diagram
(see e.g. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]). Figure 1b shows the Hasse diagram corresponding to the formal
context in Figure 1a.
      </p>
      <p>At the top of the lattice digraph is the (sole) sink vertex, representing the
supremum concept, which corresponds to the entire corpus and any terms
common to all documents; at the bottom of the lattice digraph is the (sole) source
vertex, representing the infimum concept, which corresponds to the set of all
terms in the corpus and any documents which contain all of them. Each graph
vertex is labelled with any term(s) for which the extent of the corresponding
concept is exactly the set of documents which contain it, and with any document
identifier(s) for which the intent is exactly the set of terms in that document.
A concept is an attribute [object] concept for each attribute [object] with which
the corresponding vertex is labelled.
2.2</p>
      <sec id="sec-3-1">
        <title>Closure Terms</title>
        <p>
          In corpus-based information retrieval applications of FCA, the user starts by
specifying a set of terms which is to be used as a conjunctive query. The result
set – the set of documents containing all of those terms – is easily determined,
along with any additional terms which those documents have in common. To
distinguish these additional terms from the query terms, we refer to them as
closure terms. Closure terms, if any, are identified automatically and provide
additional information about the result set – and indeed the corpus – which the
user acquires without reading the constituent documents. Codocedo and Napoli
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] refer to the addition of closure terms as query “extension”, and note the
“exhaustive” exploitation of closure terms to provide feedback to the user on
the corpus-specific context of their query.
        </p>
        <p>While closure terms are exposed to the user in SORTeD, however, they are
quarantined from user interaction during subsequent query editing, since the
result set is unaffected by either: explicitly extending the user-specified query
with closure terms; or removing closure terms from a conjunctive Boolean query
which has been extended in this manner. The user must understand the definition
of closure terms to properly interpret not only the information they impart about
the corpus and the result set, but also their different treatment and behaviour
in the user interface.
2.3</p>
      </sec>
      <sec id="sec-3-2">
        <title>Query Refinement</title>
        <p>
          Having specified a query which has a non-empty result set, the user can
specialise or generalise the query as required. Query generalisation [specialisation]
is typically implemented as the explicit selection of a super- [sub-] concept in
the concept lattice, potentially via upward [downward] structural navigation.
Some authors [
          <xref ref-type="bibr" rid="ref14 ref3">3,14</xref>
          ] refer to navigation to upper and lower neighbours as query
“enlargement” and “refinement” respectively.
1 X X X
2 X
3 X
(a) Context
        </p>
        <p>A
2</p>
        <p>C
1</p>
        <p>B
3
+A</p>
        <p>+B
A
2
+C
C
1</p>
        <p>B
3
(b) Hasse diagram
(c) Specialisation options</p>
        <p>
          Alternatively, individual terms can be manually added to, or removed from,
the query [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Query generalisation is achieved by removing one or more of the
specified query terms. Query specialisation, on the other hand, involves adding
conjunctive terms – those which occur in proper2, non-empty subsets of the
current result set of documents. In addition to requiring the user to select only
from query terms in the case of generalisation, and only from conjunctive terms
in the case of specialisation, the SORTeD user interface provides advanced
knowledge of the effect each choice would have on the size of the result set.
Constraining the choice of additional terms to conjunctive terms during query
refinement also provides the user with explicit information about the terms and
term combinations occurring in the corpus, which they would otherwise need to
infer from failed queries.
2.4
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Contrast with Structural Navigation</title>
        <p>Section 2.3 described the interactive refinement of a conjunctive Boolean query
through explicit editing of the query. Like explicit structural navigation of the
concept lattice digraph, this approach to query refinement constrains the user’s
choice to the set of super- and sub- concepts of the current concept. Nevertheless,
there are significant differences between the behaviour of these two approaches,
and these are explained in this section.</p>
        <p>The manual addition [removal] of a single term implicitly selects a
sub[super-] concept – but not necessarily a neighbour – of the current concept.
The following example illustrates this point. Consider a corpus consisting of
three documents {1, 2, 3} containing combinations of three terms {A, B, C}. Its
formal context is shown in Figure 1a and the corresponding Hasse diagram in
Figure 1b. Starting at the supremum, as shown in Figure 1c, the user specifies
2 this excludes both current query and closure terms, which occur in an improper
subset.
A
2
−C</p>
        <p>C
q
1
(a) C</p>
        <p>B
3</p>
        <p>A
2
−B</p>
        <p>C
q
1
(b) AB</p>
        <p>B
3
−A</p>
        <p>A
2
−C
−A</p>
        <p>C
q
1
(c) AC</p>
        <p>B
3
term A, B or C as their first search term. Term A selects the left lower
neighbour of the supremum, since there is a document in the corpus containing only
that term. Similarly, term B selects the right lower neighbour. However, if the
user enters C as the first search term, the infimum is selected, since the only
document in the corpus which contains term C also contains the (closure) terms
A and B. Here the addition of a search term to the (empty) query implicitly
navigates to a sub-concept of the current concept which is not a lower neighbour.
As illustrated using the directed edge labelled “-C” in Figure 2a, its subsequent
removal returns directly to the supremum, which is a super-concept but not
upper neighbour of the infimum.</p>
        <p>A given concept in the lattice can be reached using different sequences of
query terms. For the example context of Figure 1, the infimum can be reached
not only with the query C, but also with the queries AB, AC, BA and BC.
Query sequences such as CA and ABC are not possible, since the last or last
few terms are rendered closure terms – and therefore unavailable for subsequent
user entry – by the entry of earlier terms in the sequence. In the case of CA, for
example, the query C selects the infimum, at which point A becomes a closure
term whose entry by the user would be nugatory.</p>
        <p>Since only the query (vice closure) terms are available for removal, the query
sequence used clearly affects the subsequent options for generalisation. Figure 2
illustrates the options for query generalisation for three of these queries.
Figure 2b shows that for the query AB, either of the upper neighbours of the
infimum is reachable by removing one of the search terms. In contrast, Figure 2c
shows that the right-most concept cannot be reached by removing either term
from the query AC. Whereas removing C from the search AC selects the
leftmost concept, removing A instead results in no change in the selected concept.
In this case, A simply changes category from a user-specified query term to a
closure term. Such nugatory interaction could be prevented by removing A from
the set of terms which can be removed until such time as C has been removed.</p>
        <p>A query editing approach to query refinement was described in Section 2.3. In
this subsection its behaviour has been compared and contrasted with structural
navigation of the concept lattice digraph. In particular, while all lower neighbours
are reachable in the next move, the query used to reach the current concept may
render some upper neighbours unreachable in the user’s next move. Furthermore
the concepts reachable in the next move may include super- [sub-] concepts which
are not upper [lower] neighbours.
3</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Computation for Query Editing</title>
      <p>The query editing scheme described in Section 2.3 involves the addition or
removal of a single user-specified query term. In order to inform the user of the
anticipated effect on the size of the result set, and in the case of query
specialisation, to constrain the choice of additional terms to those which produce a
non-empty result set, it is necessary to calculate the set of concepts which can
be reached in the user’s next move.</p>
      <p>
        A range of algorithms exist for the enumeration of concepts in a formal
context (see e.g. [
        <xref ref-type="bibr" rid="ref1 ref11">11,1</xref>
        ]), and these can be readily modified to enumerate only those
concepts which can be reached from the current concept by adding or
removing a single query term. Reachable sub-concepts can be calculated by adding
to the intent of the current concept each unused term in turn and calculating
the result set. Here, an “unused” term is one which is not already in the intent
of the current concept. Terms giving rise to an empty result set are candidates
for term substitution, a technique for which is described in Section 4. Similarly,
the reachable super-concepts can be calculated by removing each term in turn
from the set of user-specified terms, calculating the result set, and optionally
disallowing the removal of any terms for which the result set is not a proper
superset of the extent of the current concept.
      </p>
      <p>The on-demand computation scheme described in this section has the benefit
that only those concepts required to inform the user’s next move are computed
at each step. The required set of concepts must be computed in interactive
timescales, so that the user is not required to wait before contemplating their
next move. Given that the computation is triggered by the user’s move, however,
and the user must assess the consequences of that move – viz. the intent and
extent of the new query concept – before contemplating the next, this requirement
is not onerous. The computational complexity of enumerating the reachable
subconcepts is O(t2d) Boolean AND operations, where t is the number of unique
terms in the formal context and d is the number of documents. Whilst this has
the potential to become prohibitive for large corpora, the computation is highly
parallelisable.</p>
      <p>The set of concepts reachable in the next move could be stored in case
subsequent changes to the query return to the same concept. However, since the set
of reachable super-concepts is dependent not only on the current concept, but
also on the query used to reach it, more than one set of reachable super-concepts
may need to be stored for a given concept.</p>
    </sec>
    <sec id="sec-5">
      <title>Substituting Disjunctive Terms</title>
      <p>
        Let the partially ordered set hB, &lt;i be the concept lattice corresponding to a
formal context constructed from the terms and documents of a nominated
corpus. The elements of B are the formal concepts and the relation &lt; between
concepts corresponds to set inclusion between concept extents. hB, &lt;i is a
complete lattice, for which the greatest lower and least upper bounds, inf S and
sup S, respectively, on any S ⊆ B exist and are unique [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Definition 1 Let X(S) , {x ∈ B|x ≤ z, ∀z ∈ S}. Then inf S ∈ X(S) and
inf S ≥ x, ∀x ∈ X(S).</p>
      <p>Definition 2 Let Y(S) , {y ∈ B|y ≥ z, ∀z ∈ S}. Then sup S ∈ Y(S) and
sup S ≤ y, ∀y ∈ Y(S).</p>
      <p>Let q ∈ B denote the current query concept, α ∈ B an ancestor of q &lt; α,
and t ∈ B the attribute concept of a nominated disjunctive term. By definition,
a disjunctive term is one which is not in the intent of q or any of its descendants,
so that</p>
      <p>Assume the existence of a procedure which identifies all distinct pairs (α0, ω)
such that</p>
      <p>q 6&lt; t
inf {q, t} = inf B
α &gt; q
ω , inf{α, t} ∈ B
α0 , sup{ω, q} ∈ B</p>
      <p>Figure 3a shows the Hasse diagram for the poset h{q, α, ω, t} , &lt;i. The edges
are shown dashed to indicate that they need not correspond to neighbour
relations in the Hasse diagram for hB, &lt;i. From Equation 2b and Definition 1,
ω ∈ X({α, t}), and for any x ∈ X({α, t}), x ≤ ω. The intent of any x &lt; ω is a
superset of that of ω, and hence differs from those of α and t by more terms. In
particular, Equation 2b ensures that the new query, ω, differs from α &gt; q by the
smallest number of (added) terms consistent with the constraint ω ∈ X({α, t}).
We note in passing that Equation 1a eliminates the possibility that ω = α.</p>
      <p>Figure 3b shows the Hasse diagram for the poset h{q, α, α0, ω, t} , &lt;i, which
differs from that in Figure 3a by the addition of α0. From Equation 2c and
Definition 2, α0 ∈ Y({q, ω}) and for any y ∈ Y({q, ω}), y ≥ α0. From Equations 2a
and 2c and Definition 2, α ∈ Y({q, ω}) and hence
Equation 2c ensures that α0 differs from q by the smallest number of (removed)
terms, and from ω by the smallest number of (added) terms, both consistent
with the constraints α0 ∈ Y({q, ω}).</p>
      <p>A procedure for enumerating all pairs satisfying Equation 2 is as follows. For
each α &gt; q encountered during an upwards, breadth-first traversal of hB, &lt;i
from q, find ω and then α0 using Equations 2b and 2c. If α0 6= α, discard (α0, ω).
The upwards, breadth-first traversal ensures that the algorithm encounters α0 as
a candidate ancestor of q before – or strictly speaking not later than – α ≥ α0.
By an upwards, breadth-first traversal, we mean that all upper neighbours of
a concept are visited before any more distant ancestors, and that a vertex is
not visited until each of its lower neighbours has been. The latter requirement
is necessitated by the possibility that the lattice digraph may contain parallel
directed paths of unequal path length. If ω has been previously encountered, and
hence also α0 from Equation 2c, the pair (α0, ω) has already been generated, and
traversal can progress immediately to the next α &gt; q.</p>
      <sec id="sec-5-1">
        <title>Proposition 1.</title>
        <p>Proof. Define
ω = inf {α0, t}
ω0 , inf {α0, t}
(4)
(5)
Substituting Equation 3 into Definition 1 yields ω0 ≤ t and ω0 ≤ α0 ≤ α, to
which Definition 1 can be reapplied to give ω0 ≤ inf {α, t} = ω. But ω ≤ α0 from
Equation 2c and ω ≤ t from Equation 2b, so ω ≤ inf {α0, t} = ω0. Hence ω0 = ω.</p>
        <p>Proposition 1 shows that evaluating the right-hand side of Equation 5 in the
hope of finding a tighter lower bound ω0 on α0 and t is nugatory. Subject to the
initial choice of α, the pair (α0, ω) minimises the number of terms removed from
and added to the current query q to include the disjunctive term for which t is the
attribute concept. Having enumerated the possible choices for α &gt; q to identify
all unique pairs (α0, ω), the problem reduces to choosing from amongst these the
ω differing from q by the smallest number of terms. The cardinality of the set
1 X X
2 X
3</p>
        <p>X
X X</p>
        <p>X
difference between the intents of q and α0 gives the number of terms removed
from the existing query, while that of the set difference between the intents of ω
and α0 gives the number of terms added to it. The most straightforward approach
is to minimise the sum of these two numbers. If preserving terms of the current
query is a priority, then a weighted sum might be used in which the number of
removed terms is weighted more highly than the number of added terms. The
possible choices for the new query ω could be presented to the user ranked in
order of increasing (weighted) term difference.</p>
        <p>Applying this approach to the example context in Figure 4a with query AB
yields the two jointly-optimal solutions ω1∗ and ω2∗ shown in Figure 4b,
corresponding to ancestors α10 and α20, respectively, of the query concept q (shown
red). Each involves removing one query term and adding the (formerly)
disjunctive term C, for which the attribute concept t is shown green. The user might
be asked to choose between them, possibly aided by additional information such
as extent cardinality, or the semantic relevance of the additional terms.</p>
        <p>In this section, we have so far ignored the distinction between user-specified
and closure terms. This distinction should be taken into account to privelege
options minimising the number of user-specified terms which must be removed
to accommodate the new term. If, for example, the user-specified query against
the formal context of Figure 4a were AD, then the removal of closure term B
from the intent of the query concept should be preferred over that of query
term A. Descendant ω1∗ of α10 is therefore preferred over ω2 of α20, as shown in
Figure 4c, even though both involve the removal of two terms from the query
concept.</p>
        <p>Like conjunctive terms, disjunctive terms should be offered in a
progressivelyconstrained list for addition to the query. Upon user selection of a disjunctive
term, options would be presented showing which query (vice closure) terms must
be removed to accommodate the new term. The terms to be removed might
be shown struck out, and the terms to be added – other than the nominated
disjunctive term – highlighted, but otherwise treated as closure terms.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion and Related Work</title>
      <p>
        If the user is not aided in their initial choice of query terms, it is not possible
to guarantee that each term occurs in the corpus and that at least one
document contains all terms. The query editing scheme described in this paper is
therefore used ab initio, with the initial query being the set of terms associated
with the supremum. In contrast, Carpineto and Romano [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] permitted
unconstrained entry of an intial set of query terms, and described a composite method
whereby the resultant query could be mapped onto the concept lattice. Ab
initio computer-assisted query refinement requires the initial enumeration of all
attribute concepts in the formal context and user interaction with the list of all
terms in the corpus. Manual entry of terms, aided by term completion, partially
alleviates the challenge posed by large corpora of finding terms in a long list.
6
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>This paper has described a method by which the user explicitly edits a
conjunctive Boolean query, subject to the constraint that the edited query has a
non-empty result set. Adoption of this query-editing approach, and in particular
its use of lists to present options for term addition and removal, has allowed
the SORTeD prototype to offer the user ranked options for interactive query
refinement. The addition of a disjunctive term entails the removal of one or
more terms previously entered by the user. The user is alerted upon entry of a
disjunctive term, and prompted to choose, from amongst automatically
generated options, which term(s) should be removed. An FCA-based technique has
been described for automatically identifying options minimising the number of
user-specified terms removed.</p>
      <p>This approach to refining conjunctive Boolean queries has been compared and
contrasted with the explicit structural navigation of the concept lattice digraph.
In particular: special treatment of closure terms is appropriate; not all neighbours
of the query concept are reachable through the addition or removal of a single
user-specified query term; and navigation is not constrained to neighbours.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A 'best-of-breed' approach for designing a fast algorithm for computing fixpoints of Galois connections</article-title>
          .
          <source>Information Sciences</source>
          <volume>295</volume>
          ,
          <fpage>633</fpage>
          -
          <lpage>649</lpage>
          (
          <year>2015</year>
          ). https://doi.org/10.1016/j.ins.
          <year>2014</year>
          .
          <volume>10</volume>
          .011
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Carpineto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
          </string-name>
          , G.:
          <article-title>ULYSSES: a lattice-based multiple interaction strategy retrieval interface</article-title>
          . In: Blumenthal,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Gornostaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Unger</surname>
          </string-name>
          , C. (eds.)
          <source>HumanComputer Interaction (EWHCI</source>
          <year>1995</year>
          ). pp.
          <fpage>91</fpage>
          -
          <lpage>104</lpage>
          . No. 1015
          <string-name>
            <surname>in</surname>
            <given-names>LNCS</given-names>
          </string-name>
          , Springer, Berlin, Heidelberg (
          <year>1995</year>
          ). https://doi.org/10.1007/3-540
          <source>-60614-9 7</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carpineto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
          </string-name>
          , G.:
          <article-title>Exploiting the potential of concept lattices for information retrieval with CREDO</article-title>
          .
          <source>Journal of Universal Computing</source>
          <volume>10</volume>
          (
          <issue>8</issue>
          ),
          <fpage>985</fpage>
          -
          <lpage>1013</lpage>
          (
          <year>2004</year>
          ). https://doi.org/10.3217/jucs-010
          <source>-08-0985</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carpineto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
          </string-name>
          , G.:
          <article-title>Effective reformulation of Boolean queries with concept lattices</article-title>
          . In: Andreasen,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Christiansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Larsen</surname>
          </string-name>
          , H. (eds.)
          <article-title>Flexible QueryAnswering Systems (FQAS</article-title>
          <year>1995</year>
          ). LNCS, vol.
          <volume>1495</volume>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>94</lpage>
          . Springer, Berlin, Heidelberg (
          <year>1998</year>
          ). https://doi.org/10.1007/BFb0055993
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Codocedo</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lykourentzou</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A semantic approach to concept lattice-based information retrieval</article-title>
          .
          <source>Annals of Mathematics and Artificial Intelligence</source>
          <volume>72</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>169</fpage>
          -
          <lpage>195</lpage>
          (
          <year>2014</year>
          ). https://doi.org/10.1007/s10472-014-9403-0
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Codocedo</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Formal Concept Analysis and Information Retrieval - a survey</article-title>
          . In: Baixeries,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Sacarea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Ojeda-Aciego</surname>
          </string-name>
          , M. (eds.)
          <source>Formal Concept Analysis, Lecture Notes in Computer Science</source>
          , vol.
          <volume>9113</volume>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>77</lpage>
          . Springer, Cham (
          <year>2015</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -19545-2 4
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Eklund</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wray</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodall</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bunt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christidis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daniels</surname>
          </string-name>
          , V.,
          <string-name>
            <surname>Van Olffen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Designing the digital ecosystem of the Virtual Museum of the Pacific</article-title>
          .
          <source>In: 3rd IEEE Intl. Conf. Digital Ecosystems and Technologies</source>
          . pp.
          <fpage>377</fpage>
          -
          <lpage>383</lpage>
          . IEEE (
          <year>2009</year>
          ). https://doi.org/10.1109/DEST.
          <year>2009</year>
          .5276744
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Eklund</surname>
            ,
            <given-names>P.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ducrou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brawn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Concept lattices for information visualization: Can novices read line diagrams</article-title>
          ? In: Eklund,
          <string-name>
            <surname>P</surname>
          </string-name>
          . (ed.)
          <source>Proc. 2nd Intl. Conf. on Formal Concept Analysis (FCA'04)</source>
          . LNCS, vol.
          <volume>2961</volume>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>73</lpage>
          . Springer, Berlin, Heidelberg (
          <year>2004</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>540</fpage>
          -24651-0 7
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
          </string-name>
          , R.:
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer, Berlin, Heidelberg (
          <year>1999</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -59830-2
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gross</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yellen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Zhang, P. (eds.):
          <source>Handbook of Graph Theory. Chapman</source>
          and Hall/CRC Press, New York, second edn. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poelmans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Knowledge representation and processing with Formal Concept Analysis</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ),
          <fpage>200</fpage>
          -
          <lpage>215</lpage>
          (
          <year>2013</year>
          ). https://doi.org/10.1002/widm.1088
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lindig</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Concept-based component retrieval</article-title>
          .
          <source>In: Working notes of the IJCAI-95 workshop:</source>
          Formal Approaches to the Reuse of Plans, Proofs, and Programs. pp.
          <fpage>21</fpage>
          -
          <lpage>25</lpage>
          . Montreal, Canada (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pattison</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Interactive visualisation of formal concept lattices</article-title>
          . In: Burton,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Stapleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <surname>K</surname>
          </string-name>
          . (eds.)
          <source>Joint Proc. 4th Intl. Workshop on Euler Diagrams (ED 2014) and the 1st Intl. Workshop on Graph Visualization in Practice (GViP</source>
          <year>2014</year>
          ).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1244</volume>
          , pp.
          <fpage>78</fpage>
          -
          <lpage>79</lpage>
          (Jul 28 - Aug 1
          <year>2014</year>
          ), http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>1244</volume>
          /GViP-paper6.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Poelmans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>D.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viaene</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dedene</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Text mining scientific papers: A survey on FCA-based information retrieval research</article-title>
          . In: Perner,
          <string-name>
            <surname>P</surname>
          </string-name>
          . (ed.)
          <article-title>Advances in Data Mining: Applications and Theoretical Aspects (ICDM</article-title>
          <year>2012</year>
          ). pp.
          <fpage>273</fpage>
          -
          <lpage>287</lpage>
          . Springer, Berlin, Heidelberg (
          <year>2012</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -31488-9 22
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Priss</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <source>Formal Concept Analysis in Information Science. Annual Review of Information Science and Technology</source>
          <volume>40</volume>
          ,
          <fpage>521</fpage>
          -
          <lpage>543</lpage>
          (
          <year>2006</year>
          ). https://doi.org/10.1002/aris.1440400120
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Valverde-Albacete</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <article-title>Pela´ez-</article-title>
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Systems vs. methods: an analysis of the affordances of Formal Concept Analysis for information retrieval</article-title>
          . In: Carpineto,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.O.</given-names>
            ,
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . (eds.)
          <source>Proc. Workshop Formal Concept Analysis Meets Information Retrieval (FCAIR</source>
          <year>2013</year>
          ). vol.
          <volume>977</volume>
          (
          <year>2013</year>
          ), http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>977</volume>
          /paper13.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>