Thoughts on exploiting instability in lattices for
  assessing the discrimination adequacy of a
                   taxonomy

            Antony K Cooper? , Derrick G Kourie, and Serena Coetzee

    Department of Computer Science, University of Pretoria, Pretoria, South Africa
       acooperatcsir.co.za,dkourieatcs.up.ac.za,scoetzeeatcs.up.ac.za
                           http://www.cs.up.ac.za/


        Abstract. Conventionally in formal concept analysis (FCA), concept
        stability is preferred in the lattice, because instability (i.e. low stability)
        represents noise that clouds the analysis of the data.
        High stability means there are many objects with the same intent or
        many attributes with the same extent, which could be interpreted as re-
        dundant or absent objects or attributes. The differences between redun-
        dancy or absence need to be assessed quantitatively, a process that could
        be described as stability exploration. We have used FCA to analyse dif-
        ferent taxonomies for user-generated content. For example, redundancy
        amongst attributes represents taxonomy classes unable to differentiate
        adequately the objects being classified. Absent attributes, redundant ob-
        jects and absent objects can have various implications. Hence, instability
        in a lattice is desirable for some types of analysis.

        Keywords: formal concept analysis, stability, taxonomy


1     Background on user generated content

User-generated content (UGC) in general, and volunteered geographical informa-
tion (VGI) in particular, are becoming more important as sources for official
data bases, such as those used in national spatial data infrastructures (SDIs).
An SDI is an evolving concept about facilitating and coordinating the exchange
and sharing of spatial data and services between various stakeholders [1].
     While traditional sources of official data are well understood, the same does
not apply to UGC. It is interpreted in different ways, and one woman’s UGC
could be another man’s professionally generated content. Several attempts have
been made to understand UGC and the contributors of UGC, by developing
taxonomies for aspects of UGC in general (eg: [2, 3]), or VGI in particular (eg: [4,
5]). VGI examples VGI include OpenStreetMap, a free, editable map of the world
[6]; citizen-science projects such as the Second South African Bird Atlas Project
?
    Corresponding author. Current address: Built Environment Unit, CSIR, PO Box
    395, Pretoria, 0001, South Africa
                                Thoughts on exploiting instability in lattices       339

(SABAP2) [7]; in-car navigation systems allowing users to submit corrections;
and geocoded photographs on virtual globes such as Google Earth [8].
    We conducted an assessment amongst some geographical information profes-
sionals of their perceptions of virtual globes, VGI and SDIs [9], and we are in the
process of developing a taxonomy of VGI, which we are modelling formally. We
are using formal concept analysis (FCA) [10] to assess the characteristics of ex-
isting taxonomies of UGC, such as their discrimination adequacy. The intention
is to improve the understanding of UGC, in respect of, for example, assessing
UGC quality or catering for VGI in a metadata standard.
    FCA uses a lattice of concepts with objects and attributes, and the linkages
between them. We use the standard FCA terminology and notation: a context is
written as: K := (G, M, I), where G is a set of objects and M a set of attributes. I
is the binary relation between the sets of objects and attributes: I ⊆ (G        ×
                                                                                M ).
(A, B) represents a concept whose extent is A ⊆ G and whose intent is B ⊆ M .
For (A, B) to be a formal concept, B must contain all those attributes that the
objects in A have in common, and only those attributes: denoted by A0 = B.
Conversely, A must also contain all those objects that share the attributes B,
and only those objects: denoted by B 0 = A. See [10–15].


2   Stability in a lattice

The intensional stability of a concept indicates how much its intent depends on
individual objects in the extent. It is a measure of the likelihood that remov-
ing a random set of objects from the concept’s extent would change its intent.
Similarly, the extensional stability indicates how much the extent depends on
individual attributes in the intent. Formally, [15] defines the intensional stability
index , σi , and extensional stability index , σe , of concept (A, B):
                                      0
           σi ((A, B)) = | {C ⊆ A2|A|
                                  | C = B} |


                                          0
           σe ((A, B)) = | {D ⊆ B2|B|
                                  | D = A} |


    Each concept (A, B) has |A| objects in its extent. The intentional stability
index is the proportion of the 2|A| subsets of A which have the following property:
the attributes C 0 shared by the objects in any such subset, say C, correspond to
the concept’s intent, that is, C 0 = B. In a lattice built with objects in C instead
of A, there will be a concept (C, B), and in this sense the intent, B, of concept
(A, B) is “stable”. The notion of extensional stability is similar, but with the
roles of extents and intents reversed.
    The more objects (or attributes) covered by a formal concept, the more
likely it will be intensionally (or extensionally) stable, because of the greater
likelihood of “redundant” objects (or attributes). Figure 1 shows a very stable
lattice, because a lattice built from any subset of attributes (objects) would
yield a concept whose extent (intent, respectively) is unchanged from that of
the concept in Figure 1. That is, σi (A, B) ≈ 1 and σe (A, B) ≈ 1. While this
340      Antony K Cooper, Derrick G Kourie, and Serena Coetzee


                            attribute b           attribute e
                         attribute a                 attribute d


                           object Z                 object W
                             object Y             object V


                    Fig. 1. A very stable, but rather boring, lattice.


is an extreme example, it illustrates why stability can mean redundancy and
why it can be considered “boring” in some applications [16], because of the
low information content. We appreciate that for machine learning (e.g. [16]),
concepts with high stability indicate robust input data with little noise.
    We have not used FCA to classify data, but to assess the discrimination
adequacy of taxonomies for UGC [2, 3, 5, 4]. The classes in these taxonomies
are the attributes for FCA, e.g. [3] provides classes for copyright issues: User-
authored content, User-derived content, User-copied content and Peer-to-peer as
UGC . The FCA objects are repositories of UGC, such as in-car navigation or
an open repository [5].
    We have used Concept Explorer (ConExp) [17]1 for FCA, because it is open-
source, robust and used by our colleagues (e.g. [16, 19]), and hence has a pool of
expertise readily available. ConExp provides attribute exploration, an interactive
process to see if each implication (set of “linked” attributes) can also apply to
objects not in the context of the implication. Questions are asked about depen-
dencies between different attributes (i.e. the exploration), and if a dependency
does not hold, the user has to provide a counterexample (effectively, add a new
object) [17]. This can reveal “absent” and “redundant” attributes and objects.

3     Absent and redundant attributes and objects
FCA is applied here to determine the adequacy of taxonomies for discriminating
between repositories containing UGC in general, or VGI in particular. We lack
the space to provide detailed examples here, but will do so at the conference.
Instead, we provide a theoretical example in Figure 2, showing absent and
redundant attributes and objects. If a concept has objects, they are shown below
the node and if a concept has attributes, they are shown above the node. In terms
of reduced labelling, each concept inherits objects from its extent and attributes
from its intent. The following subsections refer to Figure 2.

3.1     High intensional stability
Concept A = ({Obj1, Obj2, Obj3, Obj4, Obj5}, ∅) has high intensional stability,
with σi (A) = 0.84. Many (27) of the 32 subsets of A’s extent yield a concept
1
    Note: ConExp’s author requests that users cite his Russian text, [18].
                               Thoughts on exploiting instability in lattices   341


               Fig. 2. Absent and redundant attributes and objects.


whose intent is also ∅. The object Obj5 is not described or differentiated by
any attributes. For our analysis, this could be addressed by adding classes to
a taxonomy, so that it can differentiate better between the repositories. The
taxonomy in [3] extends the taxonomy in [2] (namely, Distribution platform and
Type), to cater for copyright issues. Without the classes of [3], the taxonomy of
[2] does not really differentiate repositories from one another.
    Concept C = ({Obj3, Obj4}, {Attr4}) also has relatively high intensional
stability, with σi (C) = 0.75. The objects Obj3 and Obj4 are not differentiated
from one another by the attributes. Effectively, Obj3 = Obj4 and one of them is
redundant. In a comprehensive analysis of UGC repositories, one would expect
this, namely repositories that are equivalent and hence direct competitors of one
another. For example, referring to Figure 3, the objects shown are generic and
there could be many repositories that are specific instances of each. Adding these
repositories as objects would create redundancies in the taxonomy.


3.2   High extensional stability

Concept E = (∅, {Attr1, Attr2, Attr3, Attr4, Attr5}) has high extensional stabil-
ity, with σi (E) = 0.84. Again, 27 of the 32 subsets of E’s intent yield a concept
whose extent is also ∅. The attribute Attr5 does not describe or differentiate
any objects. This could be a weakness in the analysis, with an important type
of repository omitted, or it could indicate a type of repository that does not
yet exist and hence a potential “gap” in the market. While experimenting with
FCA and the taxonomy of [4], we realised the value of instability in a lattice.
It highlighted a potential “gap” in the market, namely repositories that do not
cater adequately for privacy: a widespread problem on the Internet.
    Concept D also has relatively high extensional stability, with σe (D) = 0.75.
No objects are differentiated from one another by the attributes Attr2 and Attr3 .
Effectively, Attr2 = Attr3 and one of them is redundant. This could be coin-
cidental, could reflect a set of objects that is too narrow (eg: other types of
repositories should also have been included), or could indicate that some classes
should be removed from the taxonomy because they add no value or even worse,
could cause confusion as users try to differentiate between classes that are, in
342     Antony K Cooper, Derrick G Kourie, and Serena Coetzee


Fig. 3. A subset of the taxonomy of [5], for assessing the nature and motivation of
produsers.


essence, equivalent. We illustrate this in Figure 3 with a subset of the VGI taxon-
omy from [5], for assessing the nature and motivation of produsers (users who are
also producers). For the objects, we use the generic examples of VGI repositories
given by [5]. As can be seen, there are several redundancies in the attributes,
because these classes are inadequately defined, or cannot be differentiated in
practice, or other types of repositories should be included in this analysis.


3.3   Stability exploration

While the approach outlined above differentiates situations of absent attributes /
redundant objects from those of absent objects/redundant attributes, it does not
necessarily do so within these two groupings. This is because qualitative analysis
is probably required to differentiate between absent attributes and redundant
objects, and between absent objects and redundant attributes. We would suggest
that when assessing the discrimination adequacy of a taxonomy, the absence or
redundancy of objects or attributes is undesirable. Such a taxonomy could be
improved by reducing the intensional and/or extensional stability. It appears that
this could be done in a manner similar to attribute exploration [17], starting with
the concept with the highest stability and then moving on to the next highest.
This process could be termed stability exploration.


4     Conclusions

We are using FCA to assess the adequacy of taxonomies in discriminating be-
tween different types of UGC repositories. In contrast to the usual FCA applica-
tions, we have shown that instability can have value for analysis. High intensional
stability reveals missing classes from a taxonomy, or redundancy amongst the
repositories. High extensional stability reveals missing repositories or gaps in
                                 Thoughts on exploiting instability in lattices   343

the market, or taxonomy classes that are redundant. We are investigating how
stability exploration could be implemented to guide a user to reduce stability.
Future work could involve assessing the taxonomies in detail or taxonomies in
other domains, such as bloodstain pattern analysis [20].
    We would like to acknowledge the fruitful discussions we have had with our
colleagues and the very insightful comments of the anonymous referees.

References
1. Hjelmager, J., Moellering, H., Delgado, T., Cooper, A.K., Rajabifard, A., Rapant,
   P., Danko, D., Huet, M., Laurent, D., Aalders, H.J.G.L., Iwaniak, A., Abad, P.,
   Düren, U., Martynenko, A.: An initial Formal Model for Spatial Data Infrastruc-
   tures. Int J Geogr Inf Sci, 22(11), pp 1295–1309 (2008)
2. Wunsch-Vincent, S., Vickery, G.: Participative Web: User-Created Content. OECD
   report DSTI/ICCP/IE(2006)7/FINAL. Working Party on the Information Economy
   of the Committee for Information, Computer and Communications Policy (2007)
3. Gervais, D.: The Tangled Web of UGC: Making Copyright Sense of User-Generated
   Content. Vanderbilt JETL, 11(4), pp 841–870 (2009)
4. Budhathoki, N.R., Nedovic-Budic, Z., Bruce, B.: An interdisciplinary frame for un-
   derstanding volunteered geographic information. Geomatica, 64(1), pp 11–26 (2010)
5. Coleman, D.J., Georgiadou, Y., Labonte, J.: Volunteered Geographic Information:
   The Nature and Motivation of Produsers. Int J of SDI Res, 4 (2009)
6. OpenStreetMap: The Free Wiki World Map. http://www.openstreetmap.org/
7. Southern African Bird Atlas Project 2. http://sabap2.adu.org.za/
8. Google Earth: Explore, Search, and Discover. http://earth.google.com/
9. Cooper, A.K., Coetzee, S., Kourie, D.G.: Perceptions of virtual globes, volunteered
   geographical information and spatial data infrastructures. Geomatica, 64(1), pp
   333–348 (2010)
10. Wille, R.: Restructuring lattice theory: An approach based on hierarchies of con-
   cepts. In: Ordered sets, Rival, I. (ed), pp 445–470, D Reidel Publishing Co (1982)
11. Ganter, B., Wille, R.: Applied Lattice Theory: Formal Concept Analysis. Preprints.
   14pp (1997)
12. Carpineto, C., Romano, G.: Concept Data Analysis: Theory and Applications.
   John Wiley & Sons, Ltd (2004)
13. Priss, U.: Formal concept analysis in information science. Annu Rev Inform Sci,
   40, pp 521–543 (2006)
14. Kuznetsov, S.O.: On stability of a formal concept. Ann Math Artif Intel, 49(1–4),
   pp 101–115 (2007)
15. Klimushkin, M., Obiedkov, S., Roth, C.: Approaches to the Selection of Relevant
   Concepts in the Case of Noisy Data. In: ICFCA 2010, Agadir, Morocco, pp 255–266,
   Springer (2010)
16. Kourie, D.G., Oosthuizen, G.D.: Lattices in machine learning: Complexity issues.
   Acta Informatica, 35, pp 269–292 (1998)
17. Yevtushenko, S., Kaiser, T., Tane, J.: Concept Explorer The User Guide. (2003)
18. Yevtushenko, S.A.: System of data analysis “Concept Explorer”. In: Proceedings
   of the 7th National Conference on Artificial Intelligence KII-2000, Russia (2000)
19. Chan, K.S.M.: Formal Methods for Web Services: A Taxonomic Approach. In:
   ICSE’10, Cape Town, South Africa, 2, pp 357–360, ACM (2010)
20. Cooper, A.K. Thoughts on categorising bloodstain patterns. Technical Report
   0442-0001-701-A1, CSIR (2003)