Towards a Declarative Approach to Object Comparison Bruno Sofiato?1 , Fábio Levy Siqueira1 , and Ricardo Luis de Azevedo Rocha1 Escola Politécnica, Universidade de São Paulo (USP), São Paulo, Brazil Abstract. Equality and ordering are relatable concepts we use daily. As they depend on the domain at hand, the programmer must eventually code them. Empirical evidence points towards defects on code that mod- els comparison on open source software written in object-oriented (OO) languages. The main culprits are: comparison logic that spans multiple methods; and, shortcomings of the message dispatch scheme found on most OO languages. There are several proposals to mitigate these issues. None, however, deals with the ordering aspect and some forbid objects to be mutable. In this paper, we define a declarative, language-agnostic way of comparing objects that handles equality, ordering, and hashing aspects of comparison. We formalize it using Plotkin’s style operation semantics and provide detailed proof of its soundness. Keywords: Object-oriented programming · Computer languages · Com- parison semantics 1 Introduction Equality is an abstract concept that establishes ”when” two entities are similar to each other. The natural ordering is a complementary concept that determines ”how” they differ. They depend on the domain at hand and constitute the com- parison semantics of an entity. However, writing code that compares objects is complex [7, 22, 23, 26]. Baker [1] argues that such a task should be simpler (recently, C++20 plans to add the spaceship operator with this intent [14]). One of the culprits is that comparison usually spans several methods that must be consistent with each other [3]. The developer must enforce such consis- tency by hand. Automatic checking is not feasible, since it is reducible to the program-equivalence problem (and thus undecidable [25]). Static code analysis is also not an option, since it is notoriously prone to false positives [11]. Inheritance further escalates the issue. Bruce et al. [4] noted that it precludes the mainte- nance of symmetry in languages that employ simple message dispatch (which is the case of most mainstream OO languages). A valid equivalence relation must be symmetric. The goal of this paper is to devise, in the theoretical level, a way of comparing objects where the developers use metadata to encode semantics. Our proposal ? Copyright c 2020 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 Sofiato et al. handles both aspects of comparison – equality and order. We specify how to use this metadata to compare objects and which information it contains. Both languages and libraries may use our proposal as a foundation on how to compare objects. The benefits of our approach are two-fold: it shields the developer from the complexity of writing code that correctly compares objects in single-dispatch, OO languages; and, it restores orthogonality, as a single element - the metadata - is used instead of several methods. There are proposals [10, 21, 26] in the literature that employ different degrees of declarative programming. None deals with the ordering aspect of equality. Also, some of them [1, 10, 26] push that objects must be immutable, since some collections fail when its items change after inclusion. We feel, however, that such constraints forbid common programming idioms and hinder adoption. For instance, ORMs1 frameworks usually rely on mutability to track changes to entities. We think that it would be better if the collections themselves enforce such constraints. Discussing the intricacies of such an approach is out of the scope of this paper. We employ Plotkin‘s [20] Structural operational semantics (SOS) to formal- ize our proposal. We also use both Featherweight Java (FJ) [12] and Cardelli and Mitchell’s record calculus [5] in the formalization process (Section 7). This approach allows us to define our proposal without tying it to any language and also provides us a framework to prove its soundness. The remainder of this paper is structured as follows. Section 2 lays down the mathematical aspects of comparison. Next, Section 3 outlines the impacts of inheritance on comparison. Section 4 describes how the comparison is coded in two distinct OO languages, while Section 5 introduces the hash concept. Sec- tion 6 describes our proposal informally and lays the foundations for the formal description on Section 7. Section 8 proves the the soundness of our proposal. Finally, in Sections 9 and 10, we review related work and provide some final insights. 2 Equality and Ordering Semantics From a set-theoretic point of view, equality and order can both be expressed as binary relations between elements of a set. Two elements are said to be equal whenever an equivalence relation exists between them. Order is modeled by another kind of binary relation known as strict-total order (Zermelo [27] demon- strates that at least one ordering scheme can be found on every set, albeit some- times without any meaningful semantics). The OO paradigm is built upon the notion of modeling real-world elements onto constructs called objects. We access them via their references2 . Multiple references can point to the same object. In this case, we say they have the same 1 An object-relation mapping (ORM) framework maps objects into tables in a rela- tional database. An example of ORM is the Hibernate framework. 2 Through this paper, we use the terms object and reference interchangeably. Towards a Declarative Approach to Object Comparison 3 identity. The identity of an object can determine if two objects are similar3 . In such a case, only references to the same physical object are equal. Often, only a subset of properties determines if two objects are equal (e.g., in a particular context, persons are equal when their Social security number match, regardless of any other properties). We call this minimum subset of properties the object’s equality state. Regardless of its semantics, equality must observe a well-known set of rules to be valid. Bloch [3] stated that failure to observe them leads to subtle bugs that are hard to pin-point. Definition 1 describes those rules. Definition 1. A well-formed equivalence relation must be: reflexive – ∀x[(x = x)] –, symmetric – ∀x∀y[(x = y) ⇐⇒ (y = x)] – and transitive – ∀x∀y∀z[(x = y) ∧ (y = z) =⇒ (x = z)]. Ordering relations also have well-formedness rules that must hold, regardless of the underlying ordering semantics. Definition 2 describes them. Definition 2. A well-formed order relation must be: asymmetric – ∀x∀y[(x < y) ⇐⇒ (y > x)] –, transitive – ∀x∀y∀z[(x > y) ∧ (y > z) =⇒ (x > z)] – and trichotomous – ∀x∀y[(x 6= y) ⇐⇒ (x > y) ∨ (x < y)]. 3 Symmetry with Single Dispatch and Inheritance Coding comparison is complex. There is, however, a particular scenario presented by Bruce et al. [4] that challenges developers the most. Let us assume two classes: P oint and ColorP oint (Figure 1). P oint has two attributes – x and y – that holds its coordinates within a cartesian plane. It also defines the == method, which checks if both operands’ coordinates match. ColorP oint includes a new attribute – color – that stores the color of a point. It also extends == to check colors (as such, two instances of ColorP oint are only equal if their coordinates and colors match). Point -x : double -y : double ColorPoint -color : Color Fig. 1. Point/ColorPoint hierarchy (Bruce et al. [4]) Given two objects – a : P oint and b : ColorP oint –, that shares the same coordinates. Invoking a == b triggers the P oint0 s implementation of == (yield- ing true, since it only checks the coordinates of each operand). Invoking b == a, 3 Matter of fact, this models the Leibniz’s Identity of Indiscernibles Principle. Powered ByVisual Paradigm Community Edition 4 Sofiato et al. however, triggers the ColorP oint0 s implementation, which requires the coordi- nates and color of both operands to match. Since a does not have a color, b == a yields f alse, which is a violation of symmetry. Therein lies the issue: languages based on single-dispatch are unable to main- tain symmetry when inheritance is present [4]. There are techniques to simulate multiple dispatch on simple-dispatch languages [8, 13]. They, however, have neg- ative impacts4 on both encapsulation and modularity. At first glance, this problem seems trivial: we could assume that equal objects must be instances of the same concrete class: Rupakheti and Hou [22] calls this type-incompatible equality. However, this is not the only kind of equality. Rupakheti and Hou describes another strategy – type-compatible – that does not require the objects to have the same concrete class. To better explain the results of supporting only type-incompatible equality, let us imagine a drawing software that allows the user to draw, among other shapes, square and rectangles. Each of those shapes has its own class. A viable criterion to decide if two shapes are equal is the congruency: equal shapes must have the same form and size. By that rule, a square and a rectangle whose sides are the same are equal. Such modeling is only possible if we use type-comparison equality. 4 Implementing comparison in OO Languages Eventually, the programmer must implement comparison semantics, since it varies according to the domain at hand. How to do it, however, depends on the language we are using, with each one having its quirks. Thus, to build an agnostic solution, we must first find common ground between them. To do that, we analyzed two languages: Java [9] and Ruby [6]. Many popular languages (e.g., Python and Smalltalk) follow the same ap- proach of Java and Ruby. Two notable exceptions are JavaScript and C++. JavaScript does not have a standard way to compare objects. C++ does, but it misses a general-purpose hash function. Java is a statically-typed OO language whose execution model allows it to run on a variety of devices without modifications. On Java, comparison spans three distinct methods: equals, hashCode, and compareTo. The equals method checks if two operands are equal (by default, that is only the case if they share the same identity). The hashCode method is a general-propose hash function that some kinds of collection use internally. According to Bloch [3], an informal contract requires both methods to be consistent with each other (Section 5). In Java, objects are not comparable by default. They must realize the Com- parable interface to be comparable. Comparable declares a single method – com- pareTo – that models the Trichotomy Law. The results of compareTo must be consistent with the results of equals (i.e., if x.equals(y) returns true, then 4 According to Muschevici et al. [17], those impacts occur even in languages that support multiple-dispatch natively. Towards a Declarative Approach to Object Comparison 5 x.compareTo(y) yields zero). As with hashCode, the programmer must enforce this consistency themselves. Ruby is an OO language that is heavily influenced by Smalltalk. Ruby is used as both a scripting language and in web development (alongside with the Rails framework). Like Java, Ruby defines5 both an equality and a hash operation (== and hash). Their semantics are also identity-based by default. Ordering is also optional on Ruby. In such a case, it is required that the object includes the Comparable mixin, which requires the developer to implement the spaceship operator (⇔). The ⇔ operator carries the same semantics as compareTo in Java. Ruby’s approach differs from Java’s in the sense that the Comparable mixin incorporates implementations for every other comparison operation, minus the hash operation. They rely on the ⇔ operation and are guaranteed to be consis- tent with each other. That frees the developer from the burden of ensuring their consistency (sadly, they still have to ensure that hash is consistent). 5 Hashing Despite not being strictly necessary to model comparison, Java defines a general- purpose hash function. An important type of collection – the equality collections [18] – relies heavily on it. These collections usually provide better performance in selected operations, such as element lookup. Hash tables and sets are examples of them. Hash functions are directly linked to equality. According to Bloch [3], a hash function h is correct if ∀x∀y[(x = y) =⇒ (h(x) = h(y))]6 (as a corollary, two different objects may have (or may have not) the same hash). An erroneous hash function may lead to subtle bugs that are hard to pin-point. Tantamount to their correctness is their efficiency. A badly written function severely hinders performance while still correct7 . The efficiency of a hash function is based on how it yields distinct hashes to distinct objects. Achieving an optimal hash function can be a daunting task (and is often not needed). 6 Evaluating comparison-related operations with comparison records We informally describe our solution before dwelling into its formal system. This informal description is useful to understand the formalization’s intricacies. 5 Ruby defines two more operations – === and eql?. The former is used within switch/case statements, whereas the latter is used inside hash tables to compare keys. For the sake of argument, let us consider their semantics the same as ==. 6 At first, it looks like a tautology. Yet, this is true only when following a stricter sense of equality (i.e., identity-based equality), which is not the case. It is trivial to implement Point in such a way that only x is checked by equals while every attribute composes its hash. In that case, a : P oint and b : P oint having the same value of x and different values of y will be considered equal despite having distinct hashes. 7 A constant hash function (e.g., h(x) = 42) is especially bad. It degrades the lookup of elements in a hash table from O(1) to O(n) (akin to a regular, linked list). 6 Sofiato et al. The process of evaluating comparison operation is divided into two phases. In the first phase, a comparison record is built. The comparison record encapsulates the context on which a comparison operation is evaluated. In the second phase, the newly-created comparison record is evaluated. 6.1 Comparison records Comparison records are the backbone of our proposal. They are built upon the Cardelli and Mitchell [5] record calculus and provide context to compute com- parison operations. There is a key difference between them and regular records. On regular records, fields are not traversed in any particular order. Fields on comparison records are ordered based on their order of evaluation (which is domain-specific). Informally, a comparison record is a collection of fields. Each field has two components – x and y – that store the value associated with the first and second operand. Values are extracted via the extraction operation (e.g. r.x extracts the value x from record r ). Extraction operations can be chained. 6.2 Denoting comparison semantics We made a conscientious choice of not committing to any particular form of denoting comparison semantics. Doing so would bias our proposal towards a language and hurt its agnosticism. Different languages have different ways of embedding metadata. Java, for instance, uses annotations (both Rayside et al. [21] and Grech et al. [10]. use them). A Ruby programmer, on the other hand, may employ metaprogramming techniques instead. A new language (such as the Java extension proposed by Vaziri et al. [26]) could use keywords to convey the same information. Regardless of ”how” this information is embedded, we can define ”what” we must include. To be able to denote semantics, we must be able to specify: a) if a field belongs to the equality state of its underlying object; b) what is its evaluation order. 6.3 Record building phase At first glance, it might appear that comparison records do not bring further development to the equality state concept. Their differences, however, become more clear as we describe how they are built. Firstly, we collect the fields that compose the equality state of each operand. Then, we fetch the values of each one of those fields within each operand, and include them in the newly-built comparison record. Some scenarios pose an extra challenge. When there is a sub-type relation between the operands, a particular field may appear in just one of them. For instance, if we compare a : P oint and b : ColorP oint, looking up for field color on a would fail, as P oint does not declare the color field. In that case, we expect Towards a Declarative Approach to Object Comparison 7 the field lookup operation to yield a special element – the unknown element (henceforth denoted by t) – instead. For example, a comparison record built to compare a : P oint and b : P oint has the fields x and y. Yet, a record built to compare b : P oint and c : ColorP oint has three fields – x, y, and color (in that case, the component bound to b within color stores t). Notably, b yields different values, depending on the object against it is compared: a striking difference from the equality state concept. At first glance, t and null seems to share the same semantics. Yet, there is a fundamental difference: null usually represents the absence of value, whereas t denotes that even such an assertion is not feasible at the time (if the field does not exist, we cannot determine whether it holds a value or not). In a sense, t shares the same semantics as unknown on Kleene’s K3S Logic [15] and is different to any other value (including itself). Concerning its order, t should always be considered the lesser of two values. A corner-case worth mentioning deals with how to use comparison records to compute hashes. Hashing is a unary operation, yet, two operands are required to create a comparison record. In that case, we use the hash operation’s single operand as both the first and second operand of the record building procedure. 6.4 Record evaluation phase After we build the comparison record, we can proceed to evaluate it. The eval- uation is carried out by one of three specialized functions. These functions map directly to the three aspects of equality outlined in Section 4 – equality, order, and hashing. Checking equality is straightforward. We iterate through every field within a comparison record, checking if their values are different. In that case, we deem the operands as different and stop the process. If no discrepancies were found after all fields have been checked, then we consider the operands equal. Determining the order between two operands is similar. We iterate through every field, checking if one of the values within the current field is greater than the other. In that case, the operand bound to the greater value is deemed greater and the process stops. If the iteration ends without finding any discrepancies, the operands are equal. The algorithm to compute hashes also iterates over every field of a comparison record. For each field, it calculates the field’s hash and aggregates it on a single value, which will be the hash of the whole comparison record by the end of the process. Unlike the other operations, hashing does not allow short-circuiting. Dwelling into the optimal form to aggregate hashes is out of the scope of this paper. Still, the rewriting rules described in Section 7 (and their proofs in Section 8) depend on the aggregation function ⊕ to follow a single well-formedness rule: (Z, ⊕) must be a semi-group8 . 8 A semi-group is an algebraic structure composed of a set S and an associative binary operation ⊕. 8 Sofiato et al. Table 1. Notation summary Construct Description hi Denotes an empty record r = hx = 0i Creates a record r with field x, whose value is 0 hr|x = 0i Adds field x = 0 to the record r, returning a new record r.x Fetches the value associated to field x on r r\x Removes field x from record r, returning a new record A\B Removes every element within set B from set A (e.g., {1, 2, 3}\{1, 4} = {2, 3}) x Shorthand to sequences such as x0 . . . xn (x may be re- placed by any other letter or symbol) class C C D { C f ; K M } Defines class C extending from class D. C is composed of fields f , the constructor K and methods M . o:C Denotes an object o which is an instance of class C f ields(C) Obtains all fields defined on C and its super-classes es f ields(C) Obtains all fields on C’s (and its super-classes) equality state > Represents true ⊥ Represents false t Represents unknown value (as described in Section 6.3) 7 Formal Semantics The given formalization uses both Cardelli and Mitchell [5] record calculus (Table 1) and Featherweight Java [12] (FJ). One may argue why we used FJ, as it focuses on type safety (which is not a concern of our proposal). Shortly, we use FJ’s definition of classes and its field lookup semantics. These constructs are similar to those found on the majority of OO languages. As such, we feel that they do not make our proposal less agnostic. 7.1 metadata access Neither Cardelli‘s record calculus nor FJ defines the es fields function. Its role is to obtain which fields compose the equality state of an object based on supplied metadata (Section 6.2). To prevent bias, we choose not to commit to any form of metadata. Instead, we treat es fields as a black box. Given a class C, it returns the fields f that compose the equality state of C, ordered by their evaluation order (Definition 3). 7.2 Comparison record Definition 3 formalizes comparison records, as described in Section 6.1. Definition 3 (Comparison record). A comparison record r is a record hf0 = hx = x0 , y = y0 i, f1 = hx = x1 , y = y1 i . . . fn = hx = xn , y = yn ii. For each field fi = hx = xi , y = yi i, x stores the value bound to the first operand while y Towards a Declarative Approach to Object Comparison 9 0 0 0 0 0 0 0 o1 : C class C C D {C f ; K M } o2 : C class C C D {C f ; K M } CR-BUILD 0 build(o1 , o2 ) → cpBuild(o1 , o2 , es f ields(C) ∪ es f ields(C ))) f →∅ CR-EMPTY cpBuild(o1 , o2 , f ) → hi f → {f0 } CR-SINGLE cpBuild(o1 , o2 , f ) → hx = f etch(f0 , o1 ), y = f etch(f0 , o2 )i f → {fo . . . fn } CR-MUTIPLE cpBuild(o1 , o2 , f ) → hcpBuild(o1 , o2 , f \{fn }) | cpBuild(o1 , o2 , {fn })i o:C class C C D {C f ; K M } f ∈f FETCH-EXISTS f etch(f, o) → o.f o:C class C C D {C f ; K M } f ∈ /f FETCH-DOES-NOT-EXIST f etch(f, o) → t Fig. 2. Record building semantics. stores the value bound to the second operand. The iteration of the fields f within r follows the LIFO9 style (e.g. fn , fn−1 . . . f0 ). Fields of comparison records may store the t value. As stated in Section 6.1, any comparison having t as one of its operands states that the operands are different (t being the lesser one). Definition 4 formalizes its semantics. Definition 4 (t comparison semantics). The value t denotes an unknown value and is considered to be different from any other value (including itself ). As such, (∀x)(x 6= t) holds. When deciding the order of two values, t will always be the lesser10 one. Hence, (∀x)(x > t) and (∀x)(t < x) hold. 7.3 The Record Building Semantics The first phase of computing comparison operations is the record building stage. The build function is the entry point to the record build process. It receives two objects as operands and returns a corresponding comparison record. Rule CR-BUILD formalizes the build function. As shown in Figure 2, CR-BUILD uses es f ields to fetch the fields that are part of the equality state of each operand. It calls then cpBuild, which recursively builds comparison records. Rules CR-EMPTY, CR-SINGLE, and CR-MULTIPLE formalize cpBuild. CR-SINGLE calls f etch to obtain the value associated with a given field f in the context of an object o. FETCH-EXISTS and FETCH-DOES-NOT-EXIST formalize f etch. The former fires when the class hierarchy of o defines the field f . In that case, it returns o.f . Otherwise, the latter fires, and it returns t instead. 9 Acronym for last in, first out. LIFO is the form of iteration of stacks. 10 At first, this may seem like an oversight, as it blatantly violates asymmetry. However, having a field f storing t on each component is not possible. It implies that f does not exists on both operands. In that case, f would not be part of the comparison record in the first place. 10 Sofiato et al. r → hi r → hf0 = hx = x0 , y = yo ii EQ-EMPTY EQ-SINGLE cr eq(r) → > cr eq(r) → eq(x0 , y0 ) r → hf0 = hx = x0 , y = y0 i . . . fn = hx = xn , y = yn ii cr eq(r.fn ) → > EQ-MULTIPLE-EQ cr eq(r) → cr eq(r\fn ) r → hf0 = hx = x0 , y = y0 i . . . fn = hx = xn , y = yn ii cr eq(r.fn ) → ⊥ EQ-MULTIPLE-NEQ cr eq(r) → ⊥ x=t∨y =t x 6= t ∧ y 6= t EQ-UNKNOWN EQ-KNOWN eq(x, y) → ⊥ eq(x, y) → x = y r → hf0 = hx = x0 , y = y0 i . . . fn = hx = xn , y = yn ii cr cp(r.fn ) → 0 CP-MULTIPLE-EQ cr cp(r) → cr cp(r\fn ) r → hf0 = hx = x0 , y = y0 i . . . fn = hx = xn , y = yn ii cr cp(r.fn ) 6= 0 CP-MULTIPLE-NEQ cr cp(r) → cp(r.fn ) r → hf0 = hx = x, y = yii r → hf0 = hx = x0 , y = y0 ii CP-SINGLE HC-SINGLE cr cp(r) → cp(x, y) cr hc(r) → hc(x0 ) x=t y=t CP-UNKNOWN-LT CP-UNKNOWN-GT cp(x, y) → −1 cp(x, y) → 1 r → hi x 6= t y 6= t r → hi CP-EMPTY CP-KNOWN HC-EMPTY cp(r) → 0 cp(x, y) → x ⇔ y cr hc(r) → 0 r → hf0 = hx = x0 , y = y0 i . . . fn = hx = xn , y = yn ii HC-MULTIPLE cr hc(r) → cr hc(r\fn ) ⊕ hc(r.fn ) Fig. 3. Record evaluation semantics. 7.4 The Record Evaluation Semantics The evaluation stage is the second phase of computing comparison operations. A distinct set of rules evaluates each comparison operation. Figure 3 describes them. The function cr eq evaluates the equality of the comparison record r, us- ing EQ-MULTIPLE-EQ and EQ-MULTIPLE-NEQ to iterate recursively through the fields of r. Having different values fn .x and fn .y on the last field fn trig- gers EQ-MULTIPLE-NEQ, ending the process and returning ⊥. Otherwise, rule EQ-MULTIPLE-EQ is triggered. It acts as the recursive step of the equality check- ing procedure. Eventually, EQ-SINGLE is triggered when only a single field f0 remains on r. Its result depends on whether f0 .x and f0 .y are equal. The func- tion eq determines if f0 .x and f0 .y are equivalent. EQ-UNKNOWN and EQ-KNOWN model the semantics of eq. If one of the given operands is t, EQ-UNKNOWN is triggered and ⊥ is obtained (Definition 4). Conversely, EQ-KNOWN is triggered if neither is t. If the given operands are primitives11 , their equivalence is checked by their native implementation (otherwise, a comparison record is built from f0 .x and f0 .y and is subsequently evaluated). Regardless of the approach, either > or ⊥ is returned, depending on whether f0 .x and f0 .y are equal. The function cr cp evaluates the order of a comparison record r. Its main rules are CP-MULTIPLE-EQ and CP-MULTIPLE-NEQ, which operate under the same 11 We use the term primitive in a broader sense. Here, any type that provides compar- ison operations is primitive. According to that, Java’s strings are primitive. Towards a Declarative Approach to Object Comparison 11 rationale as EQ-MULTIPLE-EQ and EQ-MULTIPLE-NEQ. CP-EMPTY handle empty comparison records while EQ-SINGLE uses the cp function to determine the order of two operands (in this case, f0 .x and f0 .y). Rules CP-UNKNOWN-LT, CP-UNKNOWN-GT, and CP-KNOWN model the semantics of cp. CP-UNKNOWN-LT and CP-UNKNOWN-GT deal with the scenarios described by Definition 4 and fire, re- spectively, when t is the first or the second operand. In the former case, the first operand is the lesser one whereas, in the latter, it is the greater one. CP-KNOWN fires when neither operand is t. To obtain the order of the given operands, it uses the ⇔ operator. Its implementation can be either native (if the operands are primitive) or based on comparison records. The function cr hc evaluates the hash of a comparison record r. If r has more than one field, HC-MULTIPLE triggers. HC-MULTIPLE divides r into two sub- records – r1 and r2 – containing the last field and the remainder fields. The hash of both sub-records are obtained and then combined by the ⊕ operator (as stated in Section 6.4, (Z, ⊕) must be a semi-group). Eventually, only a single field f0 remains on r. That triggers HC-SINGLE, which entails that r’s hash is the same as of f0 .x. Function hc computes the hash of f0 .x. 8 Soundness of comparison-related properties After we formalize our proposal, we can proceed to prove its correctness based on the rules outlined by Definitions 1 and 2 and the rules defined in Section 5, regarding hash functions. The proofs have two premises: the operands do not change while the operation executes; and, the values on their fields have themselves correct comparison procedures. 0 Lemma 1 (Equality symmetry). Given two objects – x : C and y : C –, then x = y ⇔ y = x. Proof. By applying CR-BUILD, EQ-MULTIPLE-EQ, EQ-SINGLE, and EQ-EMPTY, we infer that (x = y) =⇒ [(∀f ∈ f )(x.f = y.f )], where f = (es f ields(C) ∪ 0 es f ields(C )) . Assuming y 6= x implies that either the set union is not com- mutative or [(∃f ∈ f )((x.f = y.f ) ∧ (y.f 6= x.f )). The latter case implies that the values on each field (primitives or objects) are not symmetric themselves. Corollary 1 (Equality reflexivity). Given one object o : C, then o = o. Proof. CR-BUILD yields r = hf0 = hx = f etch(f0 , o), y = f etch(f0 , 0)i . . . fn = hx = f etch(fn , o), y = f etch(fn , o)ii. Since eq is reflexive and assuming that f etch returns the same result when given the same parameters. We conclude that eq(f etch(f, o), f etch(f, o)) will always return > for every field f . By applying EQ-MULTIPLE-EQ repeatedly, we eventually trigger EQ-SINGLE, returning >. 0 Corollary 2 (Equality transitiveness). Given three objects – x : C, y : C 00 and z : C –, then (x = y ∧ y = z) ⇔ x = z. Proof. Similar to the proof of Lemma 1. 12 Sofiato et al. 0 Lemma 2 (Order transitiveness). Given three objects – x : C, y : C and 00 z : C –, then (x > y ∧ y > z) ⇔ x > z. Proof. Assuming that x > y and y > z and analyzing CP-MULTIPLE-NEQ, we 0 00 infer that two fields – f and f – are determinant to conclude that indeed x > y and y > z. Based on that, we divide this proof into three scenarios. 0 00 When f = f , transitiveness is trivially proven as ⇔ is transitive itself. 0 00 0 0 When f is evaluated before f , we conclude that y.f = z.f (since otherwise 0 0 0 0 f would have greater evaluation order). By replacing y.f on x.f > y.f we have 0 0 x.f > z.f , thus, x > z. 0 00 00 00 Finally, when f is evaluated after f , we conclude that x.f = y.f (other- 0 00 00 00 wise, f would have a lesser evaluation other). Replacing y.f on y.f > z.f 00 00 yields x.f > z.f , thus, x > z. 0 Corollary 3 (Order asymmetry). Given two objects – x : C and y : C –, then (x < y) ⇔ (y > x). Proof. Having x < y implies that a field f where x.f < y.f exists. Having y > x 0 implies either: cp is not asymmetric; the traversal order of f = (es f ields(C) ∪ 0 00 0 es f ields(C )) or f = (es f ields(C ) ∪ es f ields(C)) is not based on the 0 00 evaluation order of their fields; or, f and f do not have the same elements. Any of which is contradictory. 0 Corollary 4 (Order trichotomy). Given two objects – x : C and y : C –, they must be either x = y, x < y or x > y. Proof. By analyzing CP-EMPTY, CP-UNKNOWN-LT, CP-UNKNOWN-GT, and CP-KNOWN, alongside with the premise that ⇔ is trichotomous. We conclude that the com- putation ends and yields either 0, -1, or 1, and it is thus trichotomous. 0 Lemma 3. Given two objects – x : C and y : C –, (x = y) =⇒ hc(x) = hc(y). Proof. By analyzing CR-BUILD, EQ-MULTIPLE-EQ, EQ-SINGLE and EQ-EMPTY, we deduce that (x = y) =⇒ [(∀f ∈ f )(eq(x.f, y.f ))], being f = (es f ields(C) ∪ 0 es f ields(C )). Inspecting HC-MULTIPLE and HC-SINGLE, we conclude that hc(x) 6= hc(y) entails that either (∃f ∈ f )((x.f = y.f ) ∧ (hc(x.f ) 6= hc(y.f )) or the ⊕ is not associative, and thus (Z, ⊕) is not a semi-group, which is a contraction. 9 Related Work Common LISP (CLOS) [13] follows the functional paradigm. However, some of its issues also plague OO languages. Among them is the impact of the lack of orthogonality on the programmers. Baker highlights its importance and proposes the EGAL predicate: a uniform way of comparing values in CLOS. Vaziri et al. [26] extend FJ to include a new construct called Relation Type (RT). An RT may declare a special kind of field – key field – that tags which information forms its equality state. A key field must not change after the ini- tialization of an RT. Two instances of RT with the same equality state always Towards a Declarative Approach to Object Comparison 13 have the same identity. That makes it possible to compare them by their memory addresses. Vaziri et al. argue that RTs could replace about 70% of comparable objects. Contrary to Vaziri et al., our proposal does not require language changes and allows objects to be mutable. Like our proposal, Rayside et al. [21] fully embrace mutability. Unlike ours, it is not fully declarative and requires imperative code when inheritance is present. That makes it prone to the same issues of coding comparison by hand. After testing, correctness and easiness of use improved, whereas performance degraded by about 21%. Grech et al. [10] describe a fully declarative approach to object comparison. It presents a novel approach that sensibly improves performance: statements are reordered based on how well they detect distinct objects. Sadly, we could not employ such an approach, as order evaluation is non-commutative. Grech et al. also requires objects to remain immutable after their initialization. Modern IDEs provide wizards to create the source-code of comparison opera- tions. Project Lombok [24] uses metadata to generate such code in compile-time. Due to their static nature, it is not possible to model type-compatible compar- isons and maintain symmetry at the same time using them. The dynamic nature of our proposal mitigates that issue. It is worth noting that, unlike ours, none of the proposals we analyzed deals with the ordering aspect of comparison. 10 Conclusion Implementing comparison semantics on an OO language is tricky. It usually spans multiple methods that depend on each other. Failure to observe such de- pendencies leads to bugs that are hard to track. Since program equivalence is uncomputable, it is impossible to create a procedure that checks if these depen- dencies hold. Another issue is that most OO languages rely on simple message dispatch, on which symmetry (a mandatory property of equivalence relations) is not attainable. Our proposal restores orthogonality by embedding the com- parison semantics on metadata instead of writing it by hand. We formalized the comparison procedure and demonstrated that it is correct. We showed that it is also possible to compute order using the same metadata. The next step of this research is to build a prototype to check the impacts of our proposal on readily-available software. We plan to use the corpus defined by the DaCapo Benchmark Suite [2] and implement it in Java. Cycle handling is a shortcoming of our proposal. Parnas [19] argues that cycles are a bad practice. Still, they occur on production code [16]. Preliminary analysis shows it is possible to extend our proposal to handle cycles transparently. More work is, however, necessary in its formal aspects. Another research opportunity is to check if a solution that prevents the in- clusion of mutable elements into equality collection at compile-time is viable. At first glance, it seems that an alternative collection framework combined with a type capable of identifying mutable objects, would suffice. Bibliography [1] Baker, H.G.: Equal rights for functional objects or, the more things change, the more they are the same. ACM SIGPLAN OOPS Messenger 4(4), 2–27 (1993) [2] Blackburn, S.M., Garner, R., Hoffman, C., Khan, A.M., McKinley, K.S., Bentzur, R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S.Z., Hirzel, M., Hosking, A., Jump, M., Lee, H., Moss, J.E.B., Phansalkar, A., Ste- fanović, D., VanDrunen, T., von Dincklage, D., Wiedermann, B.: The Da- Capo benchmarks: Java benchmarking development and analysis. In: OOP- SLA ’06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-Oriented Programing, Systems, Languages, and Applications. pp. 169–190. ACM Press, New York, NY, USA (Oct 2006) [3] Bloch, J.: Effective Java. Addison-Wesley, Boston, MA, 3 edn. (2018), https://www.safaribooksonline.com/library/view/effective-java- third/9780134686097/ [4] Bruce, K., Cardelli, L., Castagna, G., Group, H.O., Leavens, G.T., Pierce, B.: On binary methods. Theory and Practice of Object Systems 1(3), 221– 242 (1995) [5] Cardelli, L., Mitchell, J.C.: Operations on records. Mathematical structures in computer science 1(1), 3–48 (1991) [6] Flanagan, D., Matsumoto, Y.: The Ruby Programming Language: Every- thing You Need to Know. ” O’Reilly Media, Inc.” (2008) [7] Fraser, G., Arcuri, A.: 1600 faults in 100 projects: Automatically finding faults while achieving high coverage with evosuite. Empirical Software En- gineering 20(3), 611–639 (2013) [8] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional (1994) [9] Gosling, J., Joy, B., Steele, G.L., Bracha, G., Buckley, A.: The Java language specification. Pearson Education (2014) [10] Grech, N., Rathke, J., Fischer, B.: Generating correct and efficient equality and hashing methods using jequalitygen (2010) [11] Hovemeyer, D., Pugh, W.: Finding bugs is easy. SIGPLAN Not. 39(12), 92–106 (Dec 2004), https://doi.org/10.1145/1052883.1052895 [12] Igarashi, A., Pierce, B.C., Wadler, P.: Featherweight java: a minimal core calculus for java and gj. ACM Transactions on Programming Languages and Systems (TOPLAS) 23(3), 396–450 (2001) [13] INCITS, A.: Incits 226-1994 (r2004). Information Technology: Program- ming Language: Common Lisp (2004) [14] ISO: ISO/IEC 14882:N4778 Information technology – Programming lan- guages – C++. International Organization for Standardization, Geneva, Switzerland (2018) [15] Kleene, S.C., de Bruijn, N., de Groot, J., Zaanen, A.C.: Introduction to metamathematics, vol. 483. van Nostrand New York (1952) Towards a Declarative Approach to Object Comparison 15 [16] Melton, H., Tempero, E.: An empirical study of cycles among classes in java. Empirical Software Engineering 12(4), 389–415 (2007) [17] Muschevici, R., Potanin, A., Tempero, E., Noble, J.: Multiple dispatch in practice. Acm sigplan notices 43(10), 563–582 (2008) [18] Nelson, S., Pearce, D., Noble, J.: Understanding the impact of collection contracts on design. Objects, Models, Components, Patterns pp. 61–78 (2010) [19] Parnas, D.L.: Designing software for ease of extension and contraction. IEEE transactions on software engineering pp. 128–138 (1979) [20] Plotkin, G.D.: Structural operational semantics. Aarhus University, Den- mark (1981) [21] Rayside, D., Benjamin, Z., Singh, R., Near, J.P., Milicevic, A., Jackson, D.: Equality and hashing for (almost) free: Generating implementations from abstraction functions. In: Proceedings of the 31st International Conference on Software Engineering. pp. 342–352. IEEE Computer Society (2009) [22] Rupakheti, C.R., Hou, D.: An empirical study of the design and implemen- tation of object equality in java. In: Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds. p. 9. ACM (2008) [23] Silva, T.M., Serey, D., Figueiredo, J., Brunet, J.: Automated design tests to check hibernate design recommendations. In: Proceedings of the XXXIII Brazilian Symposium on Software Engineering. pp. 94–103 (2019) [24] Spilker, R., Zwitserloot, R.: Project Lombok. https://projectlombok.org/ (2019), [Online; accessed 24-April-2019] [25] Strichman, O.: Special issue: program equivalence. Formal Methods in System Design 52(3), 227–228 (Mar 2018), https://doi.org/10.1007/s10703- 018-0318-y [26] Vaziri, M., Tip, F., Fink, S., Dolby, J.: Declarative object identity using relation types. In: ECOOP. vol. 7, pp. 54–78. Springer (2007) [27] Zermelo, E.: Beweis, daß jede menge wohlgeordnet werden kann. Mathe- matische Annalen 59(4), 514–516 (1904)