Transparent functors for the C++ Standard Template Library∗ Gábor Horváth Norbert Pataki xazax.hun@gmail.com patakino@elte.hu ELTE Eötvös Loránd University Faculty of Informatics Budapest, Hungary Abstract The C++ Standard Template Library (STL) is the most popular library based on the generic programming paradigm. The STL is widely used, because the library is part of the C++ Standard. It consists of many useful generic data structures (like list, vector, map, etc.) and generic algorithms (such as for each, sort, find, etc.) that are fairly irrespec- tive of the used container. Iterators bridge the gap between containers and algorithms. As a result of this layout the complexity of the library is greatly reduced and we can extend the library with new containers and algorithms simultaneously. Function objects (also known as func- tors) make the library much more flexible without significant runtime overhead. They parametrize user-defined algorithms in the library, for example, they separate the comparison in the ordered containers or define a predicate to find. Programmers typically had to duplicate the type of contained objects before the C++14 standard when template functors have been used. Improper duplication may result in runtime errors. C++14 introduces the notion of transparent functors. These functors are able to avoid the duplication of referred types. When transparent functors are in use, template arguments can be deduced by the compiler. In this paper we argue for the usage of transparent func- tors. We developed a tool that can transform the source code to take advantage of transparent functors. Our tool is part of the latest (5.0) Clang release. We analyzed when this transformation is correct and what are the template parameters when the transformation cannot be used. The improper usage of transparent functors results in incorrect code snippets. 1 Introduction The C++ Standard Template Library (STL) was developed with a generic programming approach [Aus98]. In this way containers are defined as class templates and many algorithms can be implemented as function templates. ∗ Supported by the NKP-17-4 New National Excellence Program of the Ministry of Human Capacities Copyright c by the paper’s authors. Copying permitted for private and academic purposes. In: E. Vatai (ed.): Proceedings of the 11th Joint Conference on Mathematics and Computer Science, Eger, Hungary, 20th – 22nd of May, 2016, published at http://ceur-ws.org 96 Furthermore, algorithms are implemented in a container-independent way [Mus88], so one can use them with different containers [Str00]. C++ STL is widely used because it is a very handy, standard C++ library that contains beneficial containers (like list, vector, map, etc.) and large number of algorithms (like sort, find, count, etc.) among other utilities. The STL was designed to be extensible. We can add new containers that can work together with the existing algorithms. On the other hand, we can extend the set of algorithms with a new one that can work together with the existing containers. Iterators bridge the gap between containers and algorithms [Bec01]. The expression problem [Tor04] is solved with this approach. The STL also includes adaptor types which transform standard elements of the library for a different functionality [Ale01]. However, the usage of C++ STL does not mean bugless or error-free code [Dev09]. Contrarily, incorrect application of the library may introduce new types of problems [Por07]. One of the problems is that the error diagnostics are usually complex, and it is very hard to figure out the cause of a program error [Zol01, Zol04]. The violation of special precondition requirements (e.g. sorted ranges) is not tested, but results in runtime bugs [Gre06, Pat11b]. A different kind of stickler is that if we have an iterator object that pointed to an element in a container, but the element is erased or the container’s memory allocation has been changed, then the iterator becomes invalid [Dev07]. Another common mistake is related to removing algorithms. The algorithms are container-independent, hence they do not know how to erase elements from a container, just relocate them to a specific part of the container, and we need to invoke a specific erase member function to remove the elements physically. Since, for example the remove algorithm does not actually remove any element from a container [Mey03]. Most of the properties are checked at compilation time. For example, the code does not compile if one a uses sort algorithm with the standard list container, inasmuch as list’s iterators do not offer random accessibility [Jar06]. Other properties are checked at runtime. For example, the standard vector container offers an at method which tests if the index is valid and it raises an exception otherwise [Pat06]. Unfortunately, there is still a large number of properties that are tested neither at compilation-time nor at run-time. Observance of these properties is in the charge of the programmers. Functor objects make the STL more flexible as they enable the execution of user-defined code parts inside the library [Pat11a]. Basically, functors are usually simple classes with an operator(). Inside the library operator()s are called to execute user-defined code snippets. This can call a function via a pointer to a function or an actual operator() in a class. Functors are widely used in the STL because they can be inlined by the compiler and they cause no runtime overhead in contrast to function pointers. Moreover, in the case of functors extra parameters can be passed to the code snippets via constructor calls. Functors can be used in various roles: they can define predicates when searching or counting elements, they can define comparison for sorting elements and properly searching, they can define operations to be executed on elements. Associative containers (e.g. multiset) use functors exclusively to keep their elements sorted. Algorithms for sorting (e.g. stable sort) and searching in ordered ranges (e.g. lower bound) are typically used with functors because of efficiency. C++14 introduces the so-called transparent functors. They enable to remove unnecessary code duplication in template arguments. These functors are superior to work with the associative containers. The rest of this paper is organized as follows. In section 2 we show what are the transparent functors and what is the motivation for their usage. 2 Transparent functors In this section we present the notation of transparent functors. We show the motivation behind them, and the problems that these functors overcome. C++14 introduces the so-called transparent functors. These are specializations of the existing functors and they take advantage of the C++’s parameter deduction to avoid code duplication. Let us consider the fol- lowing code snippet: std::multiset> s;. If the multiset con- tains std::string objects, it is code duplication that the container compares the objects as strings. However, std::greater is a class template, so the template argument has to be specified. Code duplication is typically a bad pattern in programming. Duplicated code snippets are prone to become inconsistent. Let us consider the following code snippet: std::map> s; 97 s[ 1.4 ] = "Hello"; s[ 1.9 ] = "World"; std::cout << s.size(); // 1 This code snippet contains a use-case when the template arguments have become inconsistent because of maintenance problems. Probably the keys of the map were ints but later changed to doubles, while the comparison has not been touched. This compiles and the behaviour is strange. With the usage of transparent functors it can be fixed: std::map> s; s[ 1.4 ] = "Hello"; s[ 1.9 ] = "World"; std::cout << s.size(); // 2 This phenomenon indicates the appearance of C++14’s transparent functors which are the specializations of the template functor classes. They are specialized to void. These specializations consist of an operator() that is a template member function and it is able to use parameter deduction. It can be used with algorithms as well: std::vector v; // ... std::sort( v.begin(), v.end(), std::greater<>() ); Since the C++14 standard the transparent functors are the preferred ones. 3 The Clang compiler infrastructure Clang is an open source compiler for C/C++/Objective-C that is built on the top of the Low Level Virtual Machine (LLVM) compiler infrastructure. It is a rapidly evolving project supported by Apple, Microsoft and Google. Clang is widely used for the static analysis of C/C++ source code [Bab17]. The refactoring of existing code is important aspect of static analysis [Maj14, Sza14]. Refactoring tools are widely-used nowadays because agile methods (like DevOps) emphasize the importance of refactoring. One of the advantages of Clang is its modular architecture. One can use the parser as a library to build tools. The popularity of this compiler also implies that it is well tested. The parser module is a hand-written recursive descendant parser that does the parsing and the typing at the same time. So in Clang the result of the parsing is a typed Abstract Syntax Tree (AST). Clang has an embedded domain specific language (EDSL) to match the AST for simpler checks that do not require symbolic execution. This EDSL is written in a declarative functional style. The EDSL is called ASTMatchers [Hor15]. The language contains primitive matcher expressions that can be composed to build more complex patterns. Those primitive matchers can be classified into three categories: node matchers, narrowing matchers, and traversal matchers. The node matchers only match a certain type of AST node (e.g. a function call.) The narrowing matchers can match on properties of the AST nodes, to make the pattern more strict (e.g. the number of arguments provided for a function call.) The traversal matchers can specify the relationship of two AST nodes in the pattern (e.g. one node to be the child of the other.) There is an open source tool that is mainly used and developed by Google called Clang Tidy. This tool consists of many useful ASTMatcher patterns that can detect code smells. In some cases Clang Tidy also offers automated source code transformation to fix them. The Clang Tooling library also provides users with an infrastructure to handle the compilation database. That database consists of the compilation parameters for each file. Such database can be generated using external tools (that are not part of the compiler). Using this library one can easily load a compilation database and look up the corresponding compilation arguments for a file. This is a useful facility because the semantics of a software might depend on those arguments. 98 4 Our tool The tool we developed is an extension of Clang Tidy which was eventually included in the official set of checks. In the C++ source code there can be several source locations where a type is mentioned. In the Clang AST one type is only created once, and its memory address can be used as identity. During the parsing, every mention of a certain type will create a TypeLoc AST node that refers to the type and contains the source location where it was mentioned. In order to find the uses of non-transparent functors that could be refactored first we need to define a pattern that describes how such functors look like. const auto NonTransparentFunctors = classTemplateSpecializationDecl( unless(hasAnyTemplateArgument(refersToType(voidType()))), hasAnyName("::std::plus", ... , "::std::bit_not")); Those functors are class template specializations that a non-void template argument and their name matches one of the predefined functors from the STL. The next step is to find all source locations that mention these types. loc(qualType(unless(elaboratedType()), hasDeclaration(NonTransparentFunctors))) We do not permit to match the locations that are elaborated types. We would match the same location twice (both the std::greater and greater ) had we permitted elaborated types since the declaration of an elaborated type is the same as its inner type. It would be straightforward to replace every mention of a non-transparent functor with the corresponding transparent one. This solution however can alter the meaning of the code in some cases. std::map> s; In the example above removing the template parameter of greater would modify the semantics of the code. The original version uses lexicographical ordering while the modified one would compare the memory addresses of the strings instead of their value. Even though this pattern is rare, we would like to preserve the semantics of the code in a source to source translation. loc(qualType( unless(elaboratedType()), hasDeclaration(classTemplateSpecializationDecl( unless(hasAnyTemplateArgument(templateArgument(refersToType( qualType(pointsTo(qualType(isAnyCharacter()))))))), hasAnyTemplateArgument( templateArgument(refersToType(qualType(hasDeclaration( NonTransparentFunctors))))))))) We want to exclude those cases when one of the sibling template parameter of the non-transparent functor is a pointer to a character type. There are some other rules when the transformation can not be done. When some part of the AST that we matched is the result of macro expansions we can not rewrite the source code. The reason is that rewriting a macro definition might break other uses of the macro. The source transformation code is straightforward, the source range that represents the template argument of the non-transparent functor needs to be removed. We ran the tool on the LLVM codebase and found five usages of non-transparent functors that could be replaced by transparent ones. The tool was able to do the transformation in each case automatically and the resulting code compiled correctly. LLVM has an extensive test suite, our changes did not cause any regressions. In the future we might extend this check to also check for user written functors that could be transformed into a transparent functor. 99 5 Conclusion The C++ Standard Template Library is a popular library based on the generic programming paradigm. Functors are able to execute user-defined code snippets in the library without significant overhead. C++14 introduces transparent functors to avoid unnecessary code duplication. In this paper we presented the advantages of transparent functors. We have developed a tool that is able to refactor code to use transparent functors. The tool is part of the Clang compiler infrastructure. It works reliably on industrial scale projects. We presented the background and the approach of this tool. References [Ale01] A. Alexandrescu. Modern C++ Design. Addison-Wesley, 2001. [Aus98] M. H. Austern. Generic Programming and the STL: Using and Extending the C++ Standard Template Library. Addison-Wesley, 1998. [Aus96] M. H. Austern, R. A. Towle, A. A. Stepanov. Range partition adaptors: a mechanism for parallelizing STL. ACM SIGAPP Applied Computing Review, 4(1):5–6, 1996. [Bab17] B. Babati, N. Pataki. Analysis of Include Dependencies in C++ Source Code. Annals of Computer Science and Information Systems, 13:149–156, 2017. [Bec01] T. Becker. STL & generic programming: writing your own iterators. C/C++ Users Journal, 19(8):51– 57, 2001. [Dev09] G. Dévai, N. Pataki. A tool for formally specifying the C++ Standard Template Library. Ann. Univ. Sci. Budapest., Comput., 31:147–166, 2009. [Dev07] G. Dévai, N. Pataki. Towards verified usage of the C++ Standard Template Library. Proc. of The 10th Symposium on Programming Languages and Software Tools (SPLST), 360–371, 2007. [Eng00] D. Engler, B. Chelf, A. Chou, S. Hallem. Checking system rules using system-specific, programmer- written compiler extensions. Proc. of the 4th conference on Symposium on Operating System Design & Implementation 4:1, 2000. [Gre06] D. Gregor, J. Järvi, J. Siek, B. Stroustrup, G. Dos Reis, A. Lumsdaine. Concepts: linguistic support for generic programming in C++. Proc. of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications (OOPSLA 2006), 291–310, 2006. [Hal02] S. Hallem, B. Chelf, Y. Xie, D. Engler. A system and language for building system-specific, static analy- ses. Proc. of the ACM SIGPLAN 2002 conference on Programming language design and implementation (PLDI ’02), 69–82, 2002. [Hor15] G. Horváth, N. Pataki. Clang matchers for verified usage of the C++ Standard Template Library, Annales Mathematicae et Informaticae, 44:99–109, 2015. [Jar06] J. Järvi, D. Gregor, J. Willcock, A. Lumsdaine, J. Siek. Algorithm specialization in generic program- ming: challenges of constrained generics in C++. Proc. of the 2006 ACM SIGPLAN conference on Programming language design and implementation (PLDI 2006), 272–282. [Maj14] V. Májer, N. Pataki. Concurrent object construction in modern object-oriented programming languages. Proc. of the 9th International Conference on Applied Informatics, 2:293–300, 2014. [Mey03] S. Meyers. Effective STL. Addison-Wesley, 2003. [Mus88] D. R. Musser, A. A. Stepanov. Generic Programming, Proc. of the International Symposium ISSAC’88 on Symbolic and Algebraic Computation, Lecture Notes in Comput. Sci., 358:13–25, 1988. [Pat11a] N. Pataki. C++ Standard Template Library by Safe Functors. Proc. of The 8-th Joint Conference on Mathematics and Computer Science, Selected Papers, 357–368, 2011. 100 [Pat06] N. Pataki, Z. Porkoláb, Z. Istenes. Towards Soundness Examination of the C++ Standard Template Library. Proc. of Electronic Computers and Informatics, ECI 2006 186–191, 2006. [Pat11b] N. Pataki, Z. Szűgyi, G. Dévai. Measuring the Overhead of C++ Standard Template Library Safe Variants. Electronic Notes in Theoretical Computer Science, 264(5):71–83, 2011. [Pir08] P. Pirkelbauer, S. Parent, M. Marcus, B. Stroustrup. Runtime Concepts for the C++ Standard Tem- plate Library. Proc. of the 2008 ACM Symposium on Applied Computing, 171–177, 2008. [Por07] Z. Porkoláb, Á. Sipos, N. Pataki. Inconsistencies of Metrics in C++ Standard Template Library. Proc. of 11th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering QAOOSE Workshop, ECOOP 2007, Berlin, 2–6, 2007. [Rep95] T. Reps, S. Horwitz, M. Sagiv. Precise Interprocedural Dataflow Analysis via Graph Reachability. Proc. of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 49–61, 1995. [Str00] B. Stroustrup. The C++ Programming Language - Special Edition. Addison-Wesley, 2000. [Sza14] Cs. Szabó, M. Kotul, R. Petruš. A closer look at software refactoring using symbolic execution. Proc. of the 9th International Conference on Applied Informatics, 2:309–316, 2014. [Tor04] M. Torgersen. The Expression Problem Revisited – Four New Solutions Using Generics. Proc. of European Conference on Object-Oriented Programming(ECOOP) 2004, Lecture Notes in Comput. Sci., 3086:123–143, 2004. [Zol01] L. Zolman. An STL message decryptor for visual C++. C/C++ Users Journal, 19(7):24–30, 2001. [Zol04] I. Zólyomi, Z. Porkoláb. Towards a General Template Introspection Library. Proc. of Generative Programming and Component Engineering: Third International Conference (GPCE 2004), Lecture Notes in Comput. Sci., 3286:266–282, 2004. [Xu10] Z. Xu, T. Kremenek, J. Zhang. A Memory Model for Static Analysis of C. Programs. Proc. of ISoLA’10 4th iternational conference on Leveraging applications of formal methods, verification, and validation, 1:535–548, 2010. 101