Combining Multiple Dimensions of Knowledge in API Migration

Thiago Tonelli Bartolomei¹, Mahdi Derakhshanmanesh², Andreas Fuhr², Peter Koch², Mathias Konrath², Ralf Lämmel², and Heiko Winnebeck²
¹ University of Waterloo, Canada
² University of Koblenz-Landau, Germany

Abstract: We combine multiple dimensions of knowledge about APIs so that we can support API migration by wrapping or transformation in new ways. That is, we assess wrapper-based API re-implementations and provide guidance for migrating API methods. We demonstrate our approach with two major GUI APIs for the Java platform and two wrapper-based re-implementations for migrating between the GUI APIs.

Keywords: Software migration, API migration, API analysis, Wrapping, Mining software repositories

I. INTRODUCTION

API migration is a kind of software migration; it may be necessary to meet requirements for software modernization, application integration, and others. API migration is realized by wrapping or transformation. We refer to [1], [2], [3], [4], [5], [6], [7], [8] for recent work on the subject.

For instance, consider the following re-engineering scenario. Two Java applications need to be integrated, but they use different GUI APIs, say Swing and SWT. Based on the exercised features and possibly other considerations, one of the two APIs is favored for the integrated application. The disfavored API (the "source API") can be re-implemented in terms of the favored API (the "target API") as a wrapper so that the migration requires little, if any, rewriting of the application's code. Incidentally, there are two advanced open-source wrappers that serve both directions of migration: SwingWT¹ and SWTSwing².

In previous work [6], [8], we substantiated that migration between independently developed source and target APIs may be complex because of significantly different generalization hierarchies, contracts, and protocols.

Contribution: In the present paper, we describe an approach for the combination of multiple dimensions of knowledge about APIs so that API migration can be supported in new ways. That is, we assess wrapper-based API re-implementations and provide guidance for migrating API methods. To this end, we leverage a model-based approach to the integration of knowledge about APIs into a repository for convenient use in declarative queries. Throughout the paper, we use the Swing/SWT APIs and the above-mentioned wrappers as subjects under study.

Road-map: Sec. II describes the integrated repository. Sec. III and Sec. IV cover different forms of supporting API migration. Related work is discussed in Sec. V, and the paper is concluded in Sec. VI. The paper and accompanying material are available online.³

¹ http://swingwt.sourceforge.net/: re-implements Swing in terms of SWT
² http://swtswing.sourceforge.net/: re-implements SWT in terms of Swing
³ http://softlang.uni-koblenz.de/apirep/

Acknowledgement: We are grateful to Daniel Ratiu for providing us with data related to the programming ontology of [9], [10]. We are also grateful to four anonymous MDSM 2011 reviewers for their excellent advice.

II. THE INTEGRATED REPOSITORY

We integrate three data sources with API knowledge into a repository. Let us describe those data sources, the metamodel of the integrated repository, and the repository technology as such.

A. Data sources

• ApiModel (developed by the present authors): a model of API implementations (including Swing, SWT, SwingWT, SWTSwing) with an underlying metamodel that is a (very) limited Java metamodel for structural properties and calling relationships;
• ApiUsage (developed by Lämmel et al. [11]): a fact base (say, database) with usage properties of 1476 open-source Java projects at SourceForge, in particular with facts for API method calls within the projects' code;
• ApiLinks (developed by Ratiu et al. [9], [10]): an ontology for programming concepts that were extracted semi-automatically from APIs in different programming domains, complete with trace links between concepts and the API source-code elements from which they were derived.

The ApiModel source contributes basic knowledge about types and methods of genuine API implementations, and their coverage by the typically incomplete wrapper-based re-implementations. The ApiUsage source helps to assess, for example, the relevance of genuine methods that are not implemented in a wrapper. The ApiLinks source helps to derive candidate classes and methods that could be used in a wrapper-based API re-implementation.

B. Metamodel of the repository

Fig. 1 shows the metamodel (a UML class diagram) of our integrated repository where metaclasses are tagged by data sources ApiModel, ApiUsage, and ApiLinks. We must note that the metamodel does not cover all elements of the sources, but is streamlined to fit our objectives.

[Figure 1. Metamodel of the integrated repository with API knowledge]

The metaclass NamedElement represents package-qualified names of packages, classes, and methods. Because of the composition relationships in the metamodel, NamedElements are also qualified by the name of an API, in fact, by a particular implementation, which could be a genuine implementation or a wrapper-based re-implementation.

The metaclasses Package, Class, and Method represent the package hierarchy with the Java classes and their methods, further with extension relationships between classes (see association Extends) and calling relationships between methods (see association Calls). As a means of prioritization, we leave out interfaces; they are trivially copied by wrappers.

Classes of genuine API implementations are linked with the corresponding classes of wrappers (see association CorrespondsTo). Here we note that wrappers may use different package prefixes. Also, these links improve convenience for those queries that need to navigate between the different API implementations. The metaclass Concept models concepts in the sense of ApiLinks' ontology. Classes and methods can be linked with concepts; see associations IsClass and IsMethod. Hence, classes and methods of different APIs may be linked transitively.

The metaclass MethodUsage represents the usage data that was integrated from ApiUsage. That is, for each API method, we maintain the number of calls to the method (if any) within the SourceForge projects covered by ApiUsage [11]. We translated this number also into a relative measure in the sense of the percentage of the calls to the given method relative to the number of all calls to methods of the API.

C. Repository technology

The repository leverages the model-based TGraph approach [12]. The metamodel of Fig. 1 is represented as a TGraph schema; converters instantiate the schema from the different data sources. All analysis is performed by means of queries on TGraphs using the language GReQL (Graph Repository Query Language) [13]. For brevity, we describe all queries ("measurements") only informally in this paper, but here is a simple, illustrative GReQL example for retrieving all classes c of an API a that are not implemented by a wrapper:

    using a:
    from c: V{Class}
    with c.qualifiedName =~ a
     and count(c-->{CorrespondsTo}) = 0
    reportSet c
    end

That is, a is an argument of the query for the name of the API; the query selects ("reports") all classes c such that the qualified name of c matches with a and there are no outgoing edges of the type CorrespondsTo (see -->{CorrespondsTo}) from c.

III. WRAPPER ASSESSMENT

Consider again our introductory scenario for API migration. Which wrapper, SwingWT or SWTSwing, should we favor? Such decision making should take into account wrapper qualities, e.g., its completeness or compliance, both relative to the genuine API implementation. In case we want to improve a given wrapper, we should also track progress by simple metrics. Accordingly, we propose some concepts for wrapper assessment.

A. Coverage of source API

We can trivially compare the ApiModel data between genuine API implementation and wrapper to get a basic sense of completeness in terms of (the percentage of) genuine packages, classes, and methods that are covered (say, re-implemented) by the wrapper. Table I collects such metrics for the Swing/SWT wrappers. The numbers show that the wrappers are highly incomplete.

    Table I. Coverage of source API

                SwingWT           SWTSwing
    Packages    25 (78.12 %)      16 (51.61 %)
    Classes     533 (18.61 %)     372 (56.97 %)
    Methods     4533 (26.60 %)    3426 (42.59 %)

B. Wrapper compliance issues

Some forms of non-compliance of a wrapper with the genuine API implementation can be determined by simple queries on our repository, e.g., differences regarding generalization hierarchies or the declaring classes for methods. Consider the following extension chain for Swing's AbstractButton:

    java.lang.Object
    |_ java.awt.Component
       |_ java.awt.Container
          |_ javax.swing.JComponent
             |_ javax.swing.AbstractButton

The chain itself is preserved by SwingWT. However, Swing declares the method addActionListener on the class AbstractButton whereas SwingWT declares the method already on the class Component.

    Table II. Wrapper compliance issues

                                    SwingWT    SWTSwing
    • Declarations on supertypes    516        161
    • Empty implementations         1006       230
    • Missing methods               12506      4618
      ◦ Class missing               9604       3698
      ◦ Class present               2902       920

Table II shows numbers for some metrics for (lack of) wrapper compliance. In reference to the above example of the method addActionListener, we measure the number of methods that are declared "earlier" on a supertype in the wrapper. Further, we measure methods with empty implementations, i.e., implementations without any outgoing method calls, while the corresponding genuine implementations had outgoing method calls. (The substantial number of empty implementations may be surprising, but these wrappers are nevertheless reportedly useful in practice.) Finally, we also subdivide missing methods into those that are implied by missing classes vs. those that are missing from existing classes.

C. Relevance in terms of usage

Let us qualify wrapper (in)completeness with ApiUsage data. If the developers of the wrappers applied the right judgement call for leaving out classes and methods, then the missing methods should be less relevant in practice than the implemented ones. Table III lists usage metrics for the Swing/SWT wrappers.

    Table III. Usage of API methods in SourceForge

                                                   SwingWT    SWTSwing
    Unimplemented methods    • Any usage           9.01 %     2.90 %
                             • Cumulative usage    2.88 %     2.35 %
    Empty methods            • Any usage           42.53 %    25.71 %
                             • Cumulative usage    11.41 %    1.49 %
    Non-empty methods        • Any usage           48.46 %    71.39 %
                             • Cumulative usage    85.72 %    96.17 %

In the table, we break down Swing's and SWT's methods into categories according to the wrappers as follows: unimplemented, empty, and non-empty implemented methods. For each category, we show the percentage of methods with "any usage" (say, any calls) in the SourceForge projects in the scope of the ApiUsage source. We also show "cumulative usage" for each category, i.e., the contribution of the category to all API method calls. These are contrasting numbers which show, for example, that the many unimplemented and empty methods (see again Table II) are exercised much less frequently than the fewer non-empty methods.

IV. GUIDANCE FOR MIGRATION

A given wrapper may be effectively incomplete in that a missing method is actually exercised by the application under API migration. In this case, we seek guidance for migrating the API method in question. Such guidance is universally useful for API migration, even when transformation is used instead of wrapping. A practical approach to guidance would need to combine elements of API type matching, IDE support (such as autocompletion and stub generation), and others. We focus here on the aspect of proposing method candidates to be called in methods of wrapper-based API re-implementations.

A. Concept-based method candidates

We can use ApiLinks' trace links between API methods and concepts to propose method candidates. The idea is that if methods of the source and target APIs are related to the same concept, then the latter may be useful in re-implementing the former. Further, let us sort all such candidates by their cumulative usage, say, by their relevance as far as ApiUsage is concerned.

    Table IV. Candidates for re-implementing SWT's Button.setImage

    Qualified candidate name                            Cumulative usage (%)
    swing.javax.swing.ImageIcon.ImageIcon               0.4816
    swing.java.awt.image.BufferedImage.BufferedImage    0.1063
    swing.java.awt.Frame.getIconImage                   0.0059
    swing.java.awt.....MemoryImageSource                0.0046
    swing.java.awt.Frame.setIconImage                   0.0042
    swing.javax.swing.text.html.ImageView.ImageView     0.0005
    swing.java.awt.....ImageGraphicAttribute            N/A

Suppose you need to migrate SWT's Button.setImage to Swing. Table IV shows the method candidates that were automatically determined by a GReQL query. Consider the first line with the constructor of ImageIcon. We show the line in bold face to convey the fact that there is an existing wrapper, SWTSwing, whose method implementation of setImage readily involves the constructor of ImageIcon. Further inspection reveals that Swing's JButton, which is a counterpart to SWT's Button, does not provide an Image property and, hence, we cannot simply migrate SWT's Button.setImage to a corresponding setter of Swing. Extra state and a more complex idiom (indeed involving ImageIcon) is needed.

B. Assessment of the ontology

The above example shows that ApiLinks may suggest reasonable candidates, in principle. We would like to assess ApiLinks' relevance more generally. In particular, we could compare ApiLinks-based links with actual calling relationships in existing wrapper implementations, as they are available through ApiModel's data. Table V lists corresponding metrics for the Swing/SWT wrappers.

    Table V. API links between Swing and SWT

                                        SwingWT    SWTSwing
    Unimplemented methods with links    10.83 %    0.35 %
    Implemented methods with links      28.06 %    24.98 %
    Correct links                       42.75 %    37.20 %

The coverage of API parts by ApiLinks' trace links is an artifact of the underlying semi-automatic ontology extraction approach [9], [10], which involves elements of name matching and thresholds for the inclusion of concepts. We cannot expect to retrieve links for arbitrary methods from ApiLinks.

In the table, we break down Swing's and SWT's methods into the categories of unimplemented and implemented methods according to the wrappers. For both categories, we show the percentage of methods that are linked (transitively) with one or more methods of the corresponding target API. The numbers are such that implemented methods happen to be much better linked than unimplemented ones.

At the bottom of the table, we also list the percentage of correct ApiLinks' trace links. We say that a link from the method m of the source API s to a method m′ of the target API t is correct, if a given wrapper-based re-implementation of s in terms of t implements m in a way that it directly calls m′. When we specify the percentage, we consider as the baseline (100%) only those methods m that both have associated trace links to t and actually call some method of t. It turns out that ApiLinks predicts a correct link in more than 1/3 of the cases. We have to note though that ApiLinks typically proposes multiple candidates, with a median of 8.

V. RELATED WORK

Work on API migration has previously focused on transformation and wrapper-generation techniques for API upgrades [2], [3], [4], [5] and, to a lesser extent, on migration between independently developed APIs [1], [6], [7], [8]. The present work is the first to integrate diverse data sources to assess wrappers and to guide their development. Typically, wrappers are assessed by testing (i.e., testing whether the application under migration continues to function, or recovers from any test failures that had to be addressed by improving a pre-existing wrapper) [6]. There is no previous work on guiding API-wrapper development for independently developed APIs.

Most of the techniques that we integrate are inspired by program comprehension research. For instance, our comparison of different API implementations is a simple form of object-model matching [14]. Also, our exploitation of API-usage data is straightforward, when compared to other scenarios of exploiting such data in the context of API usability [15] and understanding API usage (patterns) [16], [17]. Our proposal for guided migration can be viewed as one specific approach to advanced ("intelligent") code completion systems [18], [19].

VI. CONCLUDING REMARKS

The complexity of API migration requires many skills and techniques. Of course, one must understand the API's domain, and the application under migration. Basic software engineering skills such as testing, design by contract, effective use of documentation are critical as well. Still API migrations are largely unstructured today, and they come with unpredictable costs. We submit that techniques for assessment and guidance, such as those discussed in this short paper, are needed to tackle non-trivial API migrations in the future.

Clearly, our work is at an early state, and makes only a limited contribution to the larger API migration theme. There is a need for a comprehensive approach for guided API migration, which should combine diverse elements of assessment, mapping, matching, code completion, code generation, and testing.

REFERENCES

[1] I. Balaban, F. Tip, and R. Fuhrer, "Refactoring support for class library migration," in Proc. of OOPSLA 2005. ACM, 2005, pp. 265–279.
[2] J. Henkel and A. Diwan, "CatchUp!: capturing and replaying refactorings to support API evolution," in Proc. of ICSE 2005. ACM, 2005, pp. 274–283.
[3] J. H. Perkins, "Automatically generating refactorings to support API evolution," in Proc. of the Workshop on Program Analysis for Software Tools and Engineering (PASTE). ACM, 2005, pp. 111–114.
[4] I. Şavga, M. Rudolf, S. Götz, and U. Aßmann, "Practical refactoring-based framework upgrade," in Proc. of the Conference on Generative Programming and Component Engineering (GPCE). ACM, 2008, pp. 171–180.
[5] D. Dig, S. Negara, V. Mohindra, and R. Johnson, "ReBA: refactoring-aware binary adaptation of evolving libraries," in Proc. of ICSE 2008. ACM, 2008, pp. 441–450.
[6] T. T. Bartolomei, K. Czarnecki, R. Lämmel, and T. van der Storm, "Study of an API Migration for Two XML APIs," in Proc. of Conference on Software Language Engineering (SLE 2009), ser. LNCS, vol. 5969. Springer, 2010, pp. 42–61.
[7] M. Nita and D. Notkin, "Using Twinning to Adapt Programs to Alternative APIs," in Proc. of ICSE 2010, 2010.
[8] T. T. Bartolomei, K. Czarnecki, and R. Lämmel, "Swing to SWT and Back: Patterns for API Migration by Wrapping," in Proc. of ICSM 2010. IEEE, 2010, 10 pages.
[9] D. Ratiu, M. Feilkas, and J. Jürjens, "Extracting Domain Ontologies from Domain Specific APIs," in 12th European Conference on Software Maintenance and Reengineering, CSMR 2008, Proceedings. IEEE, 2008, pp. 203–212.
[10] D. Ratiu, M. Feilkas, F. Deissenboeck, J. Jürjens, and R. Marinescu, "Towards a Repository of Common Programming Technologies Knowledge," in Proc. of the Int. Workshop on Semantic Technologies in System Maintenance (STSM), 2008.
[11] R. Lämmel, E. Pek, and J. Starek, "Large-scale, AST-based API-usage analysis of open-source Java projects," in SAC'11: ACM 2011 Symposium on Applied Computing, Technical Track on "Programming Languages", 2011, to appear.
[12] J. Ebert, V. Riediger, and A. Winter, "Graph Technology in Reverse Engineering: The TGraph Approach," in WSR 2008, ser. GI-Edition Proceedings, vol. 126. Gesellschaft für Informatik, 2008, pp. 67–81.
[13] D. Bildhauer and J. Ebert, "Querying Software Abstraction Graphs," in Query Technologies and Applications for Program Comprehension (QTAPC 2008), Workshop at ICPC 2008, 2008.
[14] Z. Xing and E. Stroulia, "UMLDiff: an algorithm for object-oriented design differencing," in 20th IEEE/ACM International Conference on Automated Software Engineering (ASE 2005), Proceedings. ACM, 2005, pp. 54–65.
[15] J. Stylos, B. A. Myers, and Z. Yang, "Jadeite: improving API documentation using usage information," in Proc. of the 27th Intern. Conf. on Human Factors in Computing Systems, CHI 2009. ACM, 2009, pp. 4429–4434.
[16] J. Stylos and B. A. Myers, "Mica: A Web-Search Tool for Finding API Components and Examples," in 2006 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2006), Proceedings. IEEE, 2006, pp. 195–202.
[17] T. Xie and J. Pei, "MAPO: mining API usages from open source repositories," in MSR '06: Proceedings of the 2006 international workshop on Mining software repositories. ACM, 2006, pp. 54–57.
[18] D. Mandelin, L. Xu, R. Bodík, and D. Kimelman, "Jungloid mining: helping to navigate the API jungle," in Proc. of the 2005 ACM SIGPLAN conference on Programming language design and implementation (PLDI 2005). ACM, 2005, pp. 48–61.
[19] M. Bruch, M. Monperrus, and M. Mezini, "Learning from examples to improve code completion systems," in Proceedings of ESEC/SIGSOFT FSE 2009. ACM, 2009, pp. 213–222.
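
Sketch for Sec. II-C and Sec. III-A. The GReQL query for classes without a CorrespondsTo edge, and the class-coverage percentage derived from it, can be approximated over plain Python data. This is only an illustration under stated assumptions: the real repository is a TGraph queried with GReQL; the dictionary layout, the function names, and the toy class names below are hypothetical; and we assume that GReQL's =~ behaves like a regular-expression match, approximated here with re.search.

```python
import re

# Hypothetical toy data: qualified class names prefixed by the API
# implementation they belong to, plus CorrespondsTo links from genuine
# classes to wrapper classes. None of this is taken from the real repository.
CLASSES = [
    "swing.javax.swing.JButton",
    "swing.javax.swing.JTable",
    "swing.javax.swing.JTree",
    "swingwt.swingwtx.swing.JButton",
]
CORRESPONDS_TO = {
    "swing.javax.swing.JButton": ["swingwt.swingwtx.swing.JButton"],
}


def unimplemented_classes(classes, corresponds_to, api_pattern):
    """Classes whose name matches api_pattern and that have no outgoing
    CorrespondsTo link (the analogue of count(c-->{CorrespondsTo}) = 0)."""
    return sorted(
        name
        for name in classes
        if re.search(api_pattern, name) and not corresponds_to.get(name)
    )


def coverage_percent(classes, corresponds_to, api_pattern):
    """Percentage of matching classes that have at least one CorrespondsTo link."""
    genuine = [name for name in classes if re.search(api_pattern, name)]
    if not genuine:
        return 0.0
    covered = [name for name in genuine if corresponds_to.get(name)]
    return 100.0 * len(covered) / len(genuine)


if __name__ == "__main__":
    pattern = r"^swing\."  # selects the genuine Swing classes only
    print(unimplemented_classes(CLASSES, CORRESPONDS_TO, pattern))
    print(round(coverage_percent(CLASSES, CORRESPONDS_TO, pattern), 2))
```

The same two functions, applied to packages or methods instead of classes, would give the remaining rows of a table shaped like Table I.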
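
Sketch for Sec. IV-A. The concept-based candidate proposal can be illustrated as follows: collect all target-API methods that share at least one concept with the source method, then sort them by cumulative usage. The function name, the dictionary layout, the concept name "Image", the "swt." prefix, and the entry without usage data are assumptions made for illustration; the three usage values are the ones reported in Table IV.

```python
def rank_candidates(source_method, concept_links, target_api, usage):
    """Propose target-API methods that share a concept with source_method.

    concept_links: method name -> set of concept names (IsMethod links)
    usage:         method name -> cumulative usage in percent; methods
                   without usage data are simply absent
    Candidates are ordered by descending usage; those without usage data
    come last (the analogue of an "N/A" row).
    """
    source_concepts = concept_links.get(source_method, set())
    candidates = [
        method
        for method, concepts in concept_links.items()
        if method.startswith(target_api + ".") and concepts & source_concepts
    ]
    return sorted(
        candidates,
        key=lambda m: (m not in usage, -usage.get(m, 0.0), m),
    )


# Hypothetical concept links; only the usage numbers come from Table IV.
CONCEPT_LINKS = {
    "swt.org.eclipse.swt.widgets.Button.setImage": {"Image"},
    "swing.javax.swing.ImageIcon.ImageIcon": {"Image"},
    "swing.java.awt.image.BufferedImage.BufferedImage": {"Image"},
    "swing.java.awt.Frame.getIconImage": {"Image"},
    "swing.example.Hypothetical.image": {"Image"},   # invented, no usage data
    "swing.javax.swing.JButton.setText": {"Text"},   # unrelated concept
}
USAGE = {
    "swing.javax.swing.ImageIcon.ImageIcon": 0.4816,
    "swing.java.awt.image.BufferedImage.BufferedImage": 0.1063,
    "swing.java.awt.Frame.getIconImage": 0.0059,
}

if __name__ == "__main__":
    for name in rank_candidates(
        "swt.org.eclipse.swt.widgets.Button.setImage", CONCEPT_LINKS, "swing", USAGE
    ):
        print(name, USAGE.get(name, "N/A"))
```

Sorting by usage is a relevance heuristic only; as the Button.setImage discussion shows, a highly ranked candidate still needs manual inspection before it is used in a wrapper.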
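
Sketch for Sec. IV-B. The "correct links" percentage can be read as follows: the baseline consists of source methods that have trace links to the target API and whose wrapper implementation calls some target method; a baseline method counts as correct if at least one proposed link coincides with a direct call. Counting per method with "at least one" is our reading of the definition and therefore an assumption, as are the function name and the toy method names.

```python
def correct_link_percent(trace_links, wrapper_calls):
    """Percentage of baseline methods for which a proposed link is correct.

    trace_links:   source method -> set of target methods proposed via concepts
    wrapper_calls: source method -> set of target methods that the wrapper's
                   implementation of the source method calls directly
    """
    # Baseline: methods that have links AND actually call the target API.
    baseline = [
        m for m, links in trace_links.items() if links and wrapper_calls.get(m)
    ]
    if not baseline:
        return 0.0
    # A method is counted as correct if any proposed link is a direct call.
    correct = sum(1 for m in baseline if trace_links[m] & wrapper_calls[m])
    return 100.0 * correct / len(baseline)


# Hypothetical toy data with four source methods m1..m4.
TRACE_LINKS = {
    "m1": {"t.a", "t.b"},  # linked, and the wrapper calls t.a: correct
    "m2": {"t.c"},         # linked, but the wrapper calls t.d: not correct
    "m3": {"t.e"},         # linked, empty implementation: outside the baseline
    "m4": set(),           # no links: outside the baseline
}
WRAPPER_CALLS = {
    "m1": {"t.a"},
    "m2": {"t.d"},
    "m3": set(),
    "m4": {"t.f"},
}

if __name__ == "__main__":
    print(correct_link_percent(TRACE_LINKS, WRAPPER_CALLS))
```

Restricting the baseline in this way keeps empty or unlinked methods from diluting the percentage, which matches the baseline described for Table V.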