Introduction

Structural type inference in Java-like languages - Extended abstract -

Martin Plümicke

pl@dhbw.de 0 0 Baden-Wuerttemberg Cooperative State University Stuttgart/Horb Florianstraße 15 , D-72160 Horb

2007

5437 4 6

In the past we considered type inference for Java with generics and lambdas. Our type inference algorithm determines nominal types in subjection to a given environment. This is a hard restriction as separate compilation of Java classes without relying on type informations of other classes is impossible. In this paper we present a type inference algorithm for a Java-like language, that infers structural types without a given environment. This allows separate compilation of Java classes without relying on type informations of other classes.

Introduction

Let us consider an example that shows the idea. In [Plü15] for the following program no type can be inferred, as there is no type assumption for elementAt.

class A { m (v) { return v.elementAt(0); } } Only with an import declaration import java.util.Vector; a type can be inferred.

In this paper we give an algorithm that infers for v a structural type , which has a method elementAt. Our algorithm is a generalization of an idea, that is given in [ADDZ05]. In the introducing example from [ADDZ05] the method E m(B x){ return x.f1.f2; } is given. The compilation algorithm generates the polymorphic typed Java expressions E m(B x){ return [[x:B].f1: ].f2: ; }, where and are type variables. In this system m is applicable to instances of the class B with the field f1 with the type , where must have a field f2 with the type and the constraint E. In this approach B and E are still nominal types. We generalize this approach, such that also untyped methods like m(x){ return x.f1.f2; } can be compiled, that means the type of x and the return type are type variables, too.

The idea Input syntax :

L ::= class C extends (CT) ff; Mg M ::= m(x)f return e; g NCT ::= CT j C<TVar = CT> CT ::= C<CT> e ::= x j e:f j e:m(e) j new NCT(e) j (CT)e CONS ::= T extends T

Output syntax :

::= I CLt T Mt et I

::= CT j TVar

MH ::= T m(T x);

CLt ::= class C<TVar>[CONS] extends (CT) fT f; Mtg ::= MH f return et; g ::= x : T j e:f : T j e:m(e) : T j new NCT(e) : CT j (CT)e : CT ::= interface I<TVar>fT f; MHg There are some extensions in comparison to usual Java. The class declarations in the output syntax have the form class C<TVar> [CONS]. TVar are the generics and [CONS] is a set of subtype constraints T extends T0, that must fulfill all instances of the class. In any class there is an implicit constructor with all fields (including them from the superclasses) as arguments. There is no differentiation between extends and implements declarations. Both are declared by extends. Interfaces can have fields. Furthermore, the use of the new-statement is allowed without assigning all generics. This is done by the syntax C<TVar = CT>. The not assigned generics are derived by the type inference algorithm. 3

The algorithm

The algorithm TI consists of three parts. First the function TYPE inserts types (usually type variables) to all sub-terms and collects constraints. Second, the function construct generates the interfaces and completes the constraints. Finally, the function solve unifies the type constraints and applies the unifier to the class. In the following definition we give the different forms of constraints, that are collected in TYPE. This definition is oriented at [ADDZ05]:

Definition 1 (Type constraints)

c l c0 means c is a subtype of c0.

( c; f; c0 ) means c provides a field f with type c0.

( c; m; c; (c0; c0) ) means c provides a method m applicable to arguments of type c, with return type c0 and parameters of type c0.

Note that ( c; m; c; (c0; c0) ) implicitly includes the constraints c c0.

Let < be the extends relation defined by the Java declarations und the corresponding subtyping relation.

The type–inference algorithm

Let TypeAssumptions be a set of assumptions, that can consists of assumptions for fields, methods and whole classes with fields and methods. The functions fields and mtype extracts the typed fields respectively the typed methods from a given class, as in [IPW01].

In the type inference algorithm we use the following name conventions for type variables: fA: Type variable for the field f in the class A. mA;i; Am;i: Type variable for the i-th argument of the method m in the class A.

Am; Am: is an abbreviation for the tuple Am;1; : : : ; Am;n respectively Am;1; : : : ; Am;n. mA: Type variable for the return type of the method m in the class A.

The main function TI

The main function TI calls the three functions TYPE, construct, and solve. The input is a set of type assumptions TypeAssumptions and an untyped Java class L. The result Lt is the typed Java class extended by a set of interfaces.

TI: TypeAssumptions

L ! Lt TI ( Ass; class A extends B f f; M g ) = let (clt; C) = Type( Ass; cl ) (I1 : : : Im clt) = construct( clt; C ) TYPE( Ass; class A extends B f f; M g ) = let fass := f this:f : fA j f 2 f g [ f this:f : T j T f 2 f ields( B ) g mass := f this:m : Am ! Am j m(x)f return e; g 2 M g [ f this:m : aty ! rty j mtype( m; B ) = aty ! rty g AssAll = Ass [ fass [ mass [ f this : A g For m(x)f return e; g 2 mM;jf Ass = AssAll [ f xj : A j x = x1 : : : xni g (et : rty; C0) = TYPEExpr( Ass; e ) C = C [ C0[ mA 7! rty]g

Mt = f rty m( Am x)f return et; g j m(x)f return e; g 2 M g in(class A extends B f A f; Mt g; C) The function TYPEExpr inserts types into the expressions and collects the corresponding constraints. The function TYPEExpr is given for all cases of expressions e. In the following we present TYPEExpr for the both most important cases, the method-call and the new-statement.

TYPEExpr for Method-call: First, the types of the receiver and the arguments are determined, recursively. Then it is differed between methods with and without known receiver types. In the known case a subtype relation is introduced. Otherwise a constraint is generated that demands a corresponding method in the type.

in TYPEExpr( Ass; e0:m(e) ) = let (e0t : ty0; C0) = TYPEExpr( Ass; e0 ) (eit : tyi; Ci) = TYPEExpr( Ass; ei ); 81 6 i 6 n if (ty0 is no type variable) && (ty0 2 Ass) && (mtype( m; ty0 ) = aty ! rty) then

((e0t : ty0):m(e1t : ty1; : : : ; ent : tyn) : rty; (C0 [ Si Ci) [ f ty l aty g) else

((e0t : ty0):m(e1t : ty1; : : : ; ent : tyn) : mty0 ; (C0 [ Si Ci) [ f ( ty0; m; ty; ( mty0 ; tmy0 ) ) g) TYPEExpr for the new-statement: The use of the new-statement is allowed without assigning all generics. This is done by the syntax C<TVar = CT>. First, fresh type variables are introduced in the assumptions of the corresponding class. Then the types of the arguments are determined. Finally, the assigned generics are introduced and the subtype relations between the argument types and the fields of the class and its super classes are added.

TYPEExpr( Ass [ f class A<T>[CA] extends B fTA f; Mtg g;new A<S>( e ) ) = where S = [T (1) = 1; : : : ; T (k) = k] with k n for jTj = n let

fresh type variables, that substitute T in class A S0 = S[T 7! ] in (eit : tyi; Ci) = TYPEExpr( Ass; ei ); 81 6 i 6 m (new A<S0>( e1t : ty1; : : : ; emt : tym ) : A< [ 7! where elds( B ) = TB g j = 2 S0]>; (Si Ci) [ CA[ 7! ] [ f ty l TB TA[ 7! ] g

The function construct

The function construct takes the result from TYPE, a typed class and a set of constraints. It generates for any type ty1, ty2 occuring in constraints ( ty1; f; ) or ( ty2; m; ; ( ; ) ) corresponding interfaces with the demanded fields respectively methods: interface ty1< ..., , ...> {

f; }, interface ty2 <..., ; , ...> {

m( x1); introduces fresh type variables X1, X2 and constraints, that have to implement these interfaces: X1 l ty1<...>; X2 l ty2<...>. Finally, the occuring type variables are introduced as generics of the class.

The function solve

The function solve takes the result of construct and solves the constraints of the class by the type unification algorithm from [Plü09], such that the constraints contains only pairs with at least one type variable. Now we give an example, that shows first a structural typing of a class independent from any environment. Then a concrete implementation of this class is given.

Example 2 In this example we print the input syntax (user written) in black and the output syntax (automatically generated) in gray. Let the following class be given class A {

mt(x, y, z) { return x.sub(y).add(z); } } The result of construct( clt; C ) is (with fresh type variables): The result of TYPE is: class A { adsdumbt;1 mt( Amt;1 x, Amt;2 y,

A et = [[[x : Amt;1]:sub([y : Amt;2]) : sumbt;1 ]:add(z : Amt;3) : add sub ] and

A mAt;1 C = f ( Amt;1; sub; Amt;2; ( sumAbt;1 ; sumAbt;1;1 ) ); ( sumAbt;1 ; add; Amt;3; ( adsdumAbt;1 ; adsdu;mAb1t;1 ) ) g

mt;1 <Gamma_m, Beta_m> { Gamma_m sub(Beta_m x); } interface A interface sumbt;1 <Gamma_n, Beta_n> { Gamma_n add(Beta_n x); }

A 6 mt( 1 x, 3 y, 4 z) { return x.sub(y).add(z); } class A < 1, 2, 3, 4, 5, 6, 7>[ 3 extends 5, 4 extends 7, 1 extends Amt;1< 2, 5>, 2 extends sumbt;1 < 6, 7>]{ A } As the application of solve changes nothing, this is the result of TI’s application.

In the following we show, as an instance of the type inferred class can be used.

For this implementations of mAt;1 and sumbt;1 must be given:

A mt;3 z) { return et; } }, with

A class myInteger extends Amt;1<myInteger, myInteger>, sumbt;1 <myInteger, myInteger> {

A Integer i; myInteger sub(myInteger x) { return new myInteger(i - x.i); } myInteger add(myInteger x) { return new myInteger(i + x.i); } } In the class Main an instance of A is used and the method mt is called. class Main { main() { return new A< 1=myInteger, 2=myInteger>()

.mt(new myInteger(2), new myInteger(1), new myInteger(3)); } } myInteger l 3; myInteger l 4; g The mappings 1=myInteger, 2=myInteger means that 1 and 2 are instantiated and all other parameters of A should be inferred by the type inference algorithm TI. We call TI for Main with the set of assumptions consisting of the class A and the class myInteger.

The constraint set of the result of TYPE is given as

mt;1<myInteger; 5>; myInteger l sumbt;1 < 6; 7>; Cmain = f 3 l 5; 4 l 7; myInteger l A A The functions construct adds no interfaces, as there is no call of abstract fields or methods. In solve Cmain is unified. The result of the unification is:

= f 5 7! myInteger; 6 7! myInteger; 7 7! myInteger; 3 7! myInteger; 4 7! myInteger g The resulting Java class is given as: class Main { myInteger main() { return new A<myInteger, myInteger, myInteger, myInteger, myInteger, myInteger, myInteger>() .mt(new myInteger(2), new myInteger(1), new myInteger(3)); } } 4

Summary References

We have presented a type inference algorithm for a Java-like language. The algorithm allows to declare type-less Java classes independently from any environment. This allows separate compilation of Java classes without relying on type informations of other classes. The algorithm infers structural types, that are given as generated interfaces. The instances have to implement these interfaces. [IPW01] [Plü15]

Atsushi Igarashi, Benjamin C Pierce, and Philip Wadler. Featherweight Java: a minimal core calculus for Java and GJ. ACM Transactions on Programming Languages and Systems (TOPLAS), 23(3):396– 450, 2001.

Martin Plümicke. More Type Inference in Java 8. In Andrei Voronkov and Irina Virbitskaite, editors, Perspectives of System Informatics - 9th International Ershov Informatics Conference, PSI 2014, St. Petersburg, Russia, June 24-27, 2014. Revised Selected Papers, volume 8974 of Lecture Notes in Computer Science, pages 248–256. Springer, 2015.

[ADDZ05]

Davide

Ancona , Ferruccio Damiani, Sophia Drossopoulou, and

Elena

Zucca . Polymorphic Bytecode: Compositional Compilation for Java-like Languages . In Proceedings of the 32nd ACM SIGPLANSIGACT Symposium on Principles of Programming Languages, POPL '05 , pages 26 - 37 , New York, NY, USA, 2005 . ACM.

[Plü09]