RIFs as the Formal Tool
           of Measuring Similarity between Sets

                                  Adam Grabowski

                   Institute of Informatics, University of Bialystok
               Konstantego Ciolkowskiego 1M, 15-245 Bialystok, Poland
                                adam@math.uwb.edu.pl


        Abstract. In my paper, I will present some results towards the formal-
        ization of Rough Inclusion Functions (RIFs) and their complementary
        mappings using the Mizar system. Following the lines established by
        Anna Gomolińska, we reuse the notions which are already present in the
        Mizar Mathematical Library, in the theory of metric spaces, in order to
        establish the connections between the theory of such metrics and the
        theory of rough sets.

        Keywords: rough approximation, formalization of mathematics, Mizar
        Mathematical Library


1     Motivation
Rough sets discovered by Z. Pawlak [15] are a tool for knowledge discovery
and modelling under imperfect information; this is especially the case nowa-
days, where we often face large databases of information gathered from various
sources. Knowledge discovery by means of computerized tools seems to be im-
portant taking into account the amount of available information. Similar tools
can be used in order to check of verify the correctness of mathematical theorems
expressed in artificial (but still close to the the natural) language understandable
for computers.
    Rough Inclusion Functions (RIFs for short) establish the connection between
classical set theory and theory of rough sets. Their quantitative nature shows its
feasibility in large databases; on the other hand such a view is also elegant from
the mathematical point of view for the theory of rough sets, at the same time
opening paths of further research – based on this single predicate one can build
the whole theory, as it was done in the case of Leśniewski mereology.
    As the proof assistant used to formalize the theory we have chosen the Mizar
system, relatively well-known system based on classical logic, together with its
repository of texts formally verified by computer – the Mizar Mathematical
Library (MML). At Concurrency Specification & Programming Workshop 2018
we presented the faithful translation of Gomolińska’s paper [4] on rough inclusion
1
    Copyright c 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
functions. This time, extending the view presented in [4] and exploring the ideas
from [3], we focus rather on the reuse of existing fields already present in the
Mizar repository, namely complementary mappings, metric spaces, and similarity
measures between subsets.
    In our formal approach to rough approximations, our choice was to have
indiscernibility relation ρ defined as a binary relation on a non-empty universe
U . Essentially then, this corresponds to an abstract relational structure

                                    R = hU, ρi

with all properties credited to the internal relation of R. At the very beginning,
we do not assume any of standard properties of approximations (or tolerances)
added to the type of ρ, although we introduce two basic Mizar types, called
Approximation_Space and Tolerance_Space, to have them available in the
Mizar Mathematical Library. More thorough discussion on this development is
given in [10].
    Our core testing scenario for the formalization process was that, starting with
the mapping complementary to rough inclusion function κ1 , we can obtain an
instance of the Marczewski-Steinhaus metric (or, more generally, we can auto-
matically discover metric properties of defined operators). Namely, for subsets
X, Y of the non-empty universe U , we can define the operator δ1 as (see [3])

                        δ1 (X, Y ) = κ̄1 (X, Y ) + κ̄1 (Y, X).

    Essentially then, we have to build a path from approximation spaces on U
to the theory of metric (or metrizable) spaces, where we can already find some
well-known distance operators. In order to achieve this goal, a list of certain
formalized notions should be available at hand.


2     Rough Inclusion Functions

In this section, we will briefly sketch the content of the Mizar formalization [11]
collected in the file under MML identifier ROUGHIF11 . For a given universe U ,
rough inclusion functions (RIFs for short) are the mappings κ from ℘U × ℘U
into unit interval which satisfy two properties:

                   rif1 (κ) ⇔ ∀X,Y ⊆U (κ(X, Y ) = 1 ⇔ X ⊆ Y )

               rif2 (κ) ⇔ ∀X,Y,Z⊆U (Y ⊆ Z ⇒ κ(X, Y ) ≤ κ(X, Z))

    These two characteristic properties were introduced in Mizar as attributes
satisfying_RIF1 and satisfying_RIF2. For example, the first property is as
follows:
1
    The development is available at http://mizar.uwb.edu.pl/library/roughif1/
definition let R be non empty RelStr;
           let f be preRIF of R;
  attr f is satisfying_RIF1 means
    for X,Y being Subset of R holds f.(X,Y) = 1 iff X c= Y;
end;

    The classical example of a RIF is the standard RIF, κ, defined formally
in two steps. Firstly, the Mizar functor kappa is defined for two subsets of an
approximation space (frankly, it was defined more generally):

definition let R be finite Approximation_Space;
           let X,Y be Subset of R;
  func kappa (X,Y) -> Element of [.0,1.] equals
     card (X /\ Y) / card X if X <> {}
     otherwise 1;
  correctness;
end;

    Then we can define appropriate mapping pointwise as
definition let R be finite Approximation_Space;
  func kappa R -> Function of
    [:bool the carrier of R, bool the carrier of R:], [.0,1.] means
    for x,y being Subset of R holds it.(x,y) = kappa (x,y);
end;

   To assure the automatic understanding that such defined kappa has the
needed properties of RIF, the registration of a cluster is formulated and proved.

registration let R be finite Approximation_Space;
  cluster kappa R -> satisfying_RIF1 satisfying_RIF2;
  coherence;
end;

   Practically, all the content of a paper [4] devoted to RIFs is formalized al-
ready.


3    Complementary Mappings

Having defined rough inclusion function, we can go on with the modification of
this operator, using methods known from fuzzy set theory (as fuzzy implication
operators or triangular norms or conorms in FUZIMPL1 or FUZNORM1).
    With any mapping f : ℘U × ℘U → [0, 1] we can associate a complementary
mapping f¯ : ℘U × ℘U → [0, 1] as

                            f¯(X, Y ) = 1 − f (X, Y ).

    Mizar version of the above is introduced as
definition let R be finite Approximation_Space;
  let f be preRoughInclusionFunction of R;
  func CMap f -> preRoughInclusionFunction of R means
    for x,y being Subset of R holds
      it.(x,y) = 1 - f.(x,y);
   correctness;
end;

    Using this definition, we can, e.g. the following:

theorem PropEx3k:
  X <> {} implies (CMap kappa R).(X,Y) = card (X \ Y) / card X
  proof
    assume
A1: X <> {};
    X \ Y = X \ (X /\ Y) by XBOOLE_1:47; then
T1: card (X \ Y) = card X - card (X /\ Y) by XBOOLE_1:17,CARD_2:44;
    (CMap kappa R).(X,Y) = 1 - (kappa R).(X,Y) by CDef
      .= 1 - kappa (X,Y) by ROUGHIF1:def 2
      .= 1 - card (X /\ Y) / card X by A1,ROUGHIF1:def 1
      .= (card X / card X) - card (X /\ Y) / card X by A1,XCMPLX_1:60
      .= card (X \ Y) / card X by T1,XCMPLX_1:120;
    hence thesis;
  end;

    In order to shorten the notations, and to enhance the formulation of cor-
responding properties, we introduced also the attribute co-RIF-like, stating
that for every RIF f , the mapping complementary to f is again rough inclu-
sion function. Obviously, every CMap f for arbitrary f , which is a RIF, has this
property.
    Independently, we can prove the following analogon of Proposition 6a):

theorem Prop6a: :: Proposition 6 a)
  for kap being RIF of R holds
    (CMap kap).(X,Y) = 0 iff X c= Y;

    In similar manner, we completed the whole Proposition 6 from [4].


4    Metric Spaces

Even if the repository of Mizar texts strongly depend on Tarski-Grothendieck
set theory, which is roughly equivalent with Zermelo-Fraenkel with the Axiom
of Choice, so-called structures are important element of the language, especially
useful to define abstract algebraic structures.
    Metric spaces are excellent example of such a construction: we start with
arbitrary set U and a real function from the Cartesian square of U .
definition
  struct (1-sorted) MetrStruct
   (# carrier -> set,
     distance -> Function of [:the carrier,the carrier:],REAL #);
end;

    Additional adjectives of the distance function correspond to standard prop-
erties defining distance operators. For example, triangle means the following:

definition
  let A be set;
  let f be PartFunc of [:A,A:], REAL;
  attr f is triangle means
  for a, b, c being Element of A holds f.(a,c) <= f.(a,b) + f.(b,c);
end;

   Following these lines, the theory of metric spaces is build upon the Mizar
type MetrSpace:

definition
  mode MetrSpace is Reflexive discerning symmetric triangle MetrStruct;
end;

   That is, metric spaces are structures MetrStruct under the aassumption that
distance satisfies ordinary properties of the distance function.


5     Similarity Measures between Subsets
Additionally, some preparatory work was started to bridge the gap between
the theory of (abstract) metric spaces, concrete metrics defined to measure the
distance between the subsets. I focus on the Hausdorff distance which was defined
by me in the Mizar article HAUSDORF available in the Mizar Mathematical Library
as the following2 :

definition
  let M be non empty MetrSpace, P, Q be Subset of TopSpaceMetr M;
  func HausDist (P, Q) -> Real equals
:: HAUSDORF:def 1
  max ( max_dist_min (P, Q), max_dist_min (Q, P) );
  commutativity;
end;

    Such quite complicated definition using an auxiliary functor max_dist_min
introduced previously allows me to show the ordinary metric-like properties of
the above, e.g. the triangle inequality (so, it states that HausDist is triangle):
2
    All item from MML can be tracked via its MML identifier at http://mizar.org/
    version/current/html/
theorem :: HAUSDORF:38
  for M being non empty MetrSpace,
       P, Q, R being non empty Subset of TopSpaceMetr M st
    P is compact & Q is compact & R is compact holds
      HausDist (P, R) <= HausDist (P, Q) + HausDist (Q, R);

   In order to show some rough set-specific connections, we defined relatively
well-known Jaccard distance, starting with the number showing the similarity
between subsets of arbitrary finite 1-sorted structure:
definition let R be finite 1-sorted;
           let A, B be Subset of R;
  func JaccardIndex (A,B) -> Element of [.0,1.] equals :JacInd:
    card (A /\ B) / card (A \/ B) if A \/ B <> {}
    otherwise 1;
  correctness;
end;

   Using this functor, and previously formalizing its basic properties, we can
conclude with the definition of Jaccard function:
definition let R be finite 1-sorted;
  func JaccardDistance R -> Function of
    [:bool the carrier of R, bool the carrier of R:], REAL means :JacDef:
  for A, B being Subset of R holds
    it.(A,B) = 1 - JaccardIndex (A,B);
  correctness;
end;

    Some other distance operators are present in the MML, e.g. in the article
METRIC_3. In the case of Jaccard distance, first three adjectives are rather sim-
ple to show (that it is Reflexive, discerning, and symmetric). To prove the
remaining one, first we constructed δ1 from Section 1, in a straightforward way:
definition let R;
  func delta_1 R -> preRIF of R means :Delta1:
  for x,y being Subset of R holds
    it.(x,y) = (CMap kappa_1 R).(x,y) + (CMap kappa_1 R).(y,x);
  correctness;
end;

    Finally, we can show that such defined δ1 and JaccardDistance coincide,
and hence, as the earlier is a distance (and satisfies the triangle inequality), so
is the latter.
theorem
  for A, B being Subset of R holds
    (JaccardDistance R).(A,B) = (delta_1 R).(A,B);

   Although the formal proof in arbitrary setting is quite complex, our version,
using the proven correspondence with rough sets, is immediate.
registration let R be finite 1-sorted;
  cluster JaccardDistance R -> triangle;
  coherence;
end;

   Having all these formal notions at hand, our future work will be to discover
automatically possible connections between these topics, as suggested in [3].
   All formal developments described above are submitted for inclusion into the
Mizar Mathematical Library and are available under the following URL:

                     http://mizar.org/library/roughif2/


References

 1. Bancerek, G., Byliński, Cz., Grabowski, A., Kornilowicz, A., Matuszewski, R.,
    Naumowicz, A., Pa̧k, K., and Urban, J.: Mizar: State-of-the-art and beyond. In
    Kerber, M. Carette, J. et al. (Eds.): Intelligent Computer Mathematics – Inter-
    national Conference, CICM 2015, Washington, DC, USA, July 13–17, 2015, Pro-
    ceedings, Lecture Notes in Computer Science, 9150, Springer, pp. 261–279 (2015).
    doi:10.1007/978-3-319-20615-8 17
 2. Gomolińska, A.: A comparative study of some generalized rough approximations,
    Fundamenta Informaticae, 51, pp. 103–119 (2002)
 3. Gomolińska, A.: On certain rough inclusion functions, Transactions on Rough Sets
    IX, Lecture Notes in Computer Science, 5390, pp. 35–55 (2008). doi:10.1007/978-
    3-540-89876-4 3
 4. Gomolińska, A.: On three closely related rough inclusion functions, Proceedings of
    RSEISP 2007, Lecture Notes in Artificial Intelligence, 4585, pp. 142–151 (2007).
    doi:10.1007/978-3-540-73451-2 16
 5. Grabowski, A.: Mechanizing complemented lattices within Mizar type system,
    Journal of Automated Reasoning, 55(3), pp. 211–221 (2015). doi:10.1007/s10817-
    015-9333-5
 6. Grabowski, A.: Binary relations-based rough sets – an automated approach, For-
    malized Mathematics, 24(2), pp. 143–155 (2016). doi:10.1515/forma-2016-0011
 7. Grabowski, A.: Automated discovery of properties of rough sets, Fundamenta In-
    formaticae, 128(1–2), pp. 65–79 (2013). doi:10.3233/FI-2013-933
 8. Grabowski, A.: Efficient rough set theory merging, Fundamenta Informaticae,
    135(4), pp. 371–385 (2014). doi:10.3233/FI-2014-1129
 9. Grabowski, A.: Computer certification of rough sets based on relations, Proceedings
    of IJCRS 2017, Lecture Notes in Artificial Intelligence, 10313, pp. 83–94 (2017),
    doi:10.1007/978-3-319-60837-2 7
10. Grabowski, A.: Automated comparative study of some generalized rough approxi-
    mations, Proceedings of CS&P 2018, pp. 39–45 (2018)
11. Grabowski, A.: Formal development of rough inclusion functions, Formalized Math-
    ematics, 27(3), pp. 337–343 (2019), doi:10.2478/forma-2019-0027
12. Grabowski, A., Kornilowicz, A., Naumowicz, A.: Four decades of Mizar, Journal of
    Automated Reasoning, 55(3), pp. 191–198 (2015). doi:10.1007/s10817-015-9345-1
13. Grabowski, A.: Lattice theory for rough sets – a case study with Mizar, Fundamenta
    Informaticae, 147(2–3), pp. 223–240 (2016). doi:10.3233/FI-2016-1406
14. Grabowski, A.: On the computer-assisted reasoning about rough sets, in: Dunin-
    Kȩplicz, B., Jankowski, A., Szczuka, M. (Eds.) Monitoring, Security and Rescue
    Techniques in Multiagent Systems, Advances in Soft Computing, 28, pp. 215–226
    (2005). doi:10.1007/3-540-32370-8 15
15. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer,
    Dordrecht (1991). doi:10.1007/978-94-011-3534-4
16. Polkowski, L.: Approximate Reasoning by Parts. An Introduction to Rough Mere-
    ology. Intelligent Systems Reference Library, vol. 20, Springer Verlag, Berlin-
    Heidelberg (2011). doi:10.1007/978-3-642-22279-5
17. Polkowski, L. and Skowron, A.: Rough mereology: A new paradigm for approximate
    reasoning, International Journal of Approximate Reasoning, 15(4), pp. 333–365
    (1996). doi:10.1016/S0888-613X(96)00072-2
18. Skowron, A., Stepaniuk, J.: Tolerance approximation spaces, Fundamenta Infor-
    maticae, 27(2–3), pp. 245–253 (1996). doi:10.3233/FI-1996-272311
19. Yao, Y.Y.: Two views of the rough set theory in finite universes, International
    Journal of Approximate Reasoning, 15(4), pp. 291–317 (1996). doi:10.1016/S0888-
    613X(96)00071-0
20. Zadeh, L.: Fuzzy sets, Information and Control, 8(3), pp. 338-353 (1965).
    doi:10.1016/S0019-9958(65)90241-X
21. Zhu, W.: Generalized rough sets based on relations, Information Sciences, 177(22),
    pp. 4997–5011 (2007). doi:10.1016/j.ins.2007.05.037