PapyrusRT: Modelling and Code Generation

Ernesto Posse
eposse@zeligsoft.com
Zeligsoft

Abstract. In this talk we introduce PapyrusRT, an open-source, industrial-strength model-driven development environment for real-time and embedded systems, implementing UML-RT [2,3], a UML-based language. PapyrusRT is implemented on top of Papyrus, an Eclipse modelling tool for UML, SysML, and EMF models. We describe the motivations for this project, in particular the need for an open-source environment. We provide a brief summary of the UML-RT language and give a brief description of the tool itself. Then we give an overview of the code generation process and its architecture, with emphasis on its extensibility.

1 Introduction

Developing software for real-time and embedded systems (RTES) poses many challenges, and model-driven engineering (MDE) methods have been proposed as a way to address them. UML [1] has become the de facto lingua franca of the software modelling world, with many tools, both commercial and non-commercial, supporting parts of the language. Nevertheless, UML is a large and complex language and mastering it is itself a difficult task. For this reason, more specialized modelling formalisms and languages have been proposed, which are better adapted to the needs of RTES. One such language is UML-RT.

UML-RT is a language that is based on UML (it is defined as a UML profile) and simplifies it in order to tame software complexity, better capture high-level system architecture, focus on the concurrent structure of a system, and improve the analyzability and predictability of a system’s behaviour. This is achieved mainly by restricting the language to two kinds of diagrams: composite structure and state machine diagrams. These diagrams have additional restrictions over general-purpose UML. In addition to these syntactic restrictions, UML-RT has a more precise execution semantics, designed with the needs of soft-real-time systems in mind.

Although based on UML, UML-RT predates it (and influenced the definition of the UML 2 standard). UML-RT has its roots in the Telos project at Bell Northern Research (part of Nortel) in 1987. In 1992, this project led to a spin-off company called ObjectTime, which released its namesake development environment implementing the core language. In 1994, an influential book on the language and the methodology was published [2], and the language became known as ROOM. In 1998, the name UML-RT was coined for the UML profile describing the ROOM language [3]. In 2000, Rational Software acquired ObjectTime and turned the tool into Rational RoseRT. In 2002, Rational was acquired by IBM, and in 2006 they migrated RoseRT to the Eclipse IDE, where the tool was rebranded as IBM Rational Software Architect Real Time Edition, or RSA-RTE for short.

In all these incarnations, UML-RT has been successfully applied to large-scale industrial projects. However, all these implementations have been proprietary. This has presented a challenge to its users. As with all proprietary software, users are bound to the vendor for support, updates and customization. In the context of RTES, users rarely want a one-size-fits-all solution. They typically want tools that are better suited to their specific needs. This entails customizations and/or extensions to the tools, which are difficult or impossible for users to make on their own, as they require a willingness on the vendor’s part to adapt a generic tool to each user’s specific needs.

It is out of these considerations that the need for an open-source development environment for UML-RT arose. Under the open-source framework, users don’t depend on the vendor to adapt the tool. This is where PapyrusRT comes in. PapyrusRT is a new open-source implementation of a full UML-RT development environment, including a graphical modelling environment, a code generator and a runtime system. This full set makes UML-RT models executable.

PapyrusRT is implemented on top of Papyrus, a well-known UML modelling environment on Eclipse. Papyrus was chosen as the basis because it is open-source, it has a rich user interface, it already supports the latest OMG UML standard (2.5) and is part of the rich Eclipse ecosystem, from which a wide range of components, tools and resources can be leveraged.

In this presentation we give a brief tour of the language, the tool and the code generation process.

2 UML-RT

2.1 Capsules, ports, parts and state machines

The central concept in UML-RT is the capsule. A capsule is, as the name suggests, an encapsulated entity. It is a class in the object-oriented sense, more precisely an active class, meaning a class whose instances have autonomous behaviour, specified by a hierarchical state machine. Furthermore, capsules have a well-defined interface consisting of a set of ports, and capsule instances can communicate with their environment (other capsules) exclusively through these ports. Each port is typed by a protocol which defines the kinds of messages or signals allowed.

Capsules may also define internal structure consisting of parts (sub-capsules), which are properties (in the UML sense) typed by some other capsule. Parts are connected by linking their ports with connectors. A connector can link only two ports, but ports and parts can be replicated to obtain different connection patterns. Figure 1 shows a structure diagram depicting a capsule called “Top” with two parts called “pinger” and “ponger”, typed by capsules named “Pinger” and “Ponger” respectively, each with a port and a connector linking them.

Fig. 1. A UML-RT composite structure diagram for a capsule.

A port in the outer boundary of a capsule can be connected to an internal part, in which case it is called a relay port, as it simply relays messages to and from the internal part.
A boundary port not connected to an internal part is called an external port, and messages arriving there are handled by the capsule’s state machine. Similarly, it is through external ports that the capsule’s state machine can send messages outside. Non-boundary ports are called internal, and are used by the capsule’s state machine to communicate with the capsule’s internal parts.

Protocols define three kinds of messages: input, output and input/output. Messages can be parametrized. This allows messages to carry data as a payload. Ports can have two kinds of role: base or conjugated. A conjugated port is one where the protocol messages’ direction is flipped: an input message in a base port is an output message in a conjugated port, and vice-versa. This means that messages from/to a base port at one end of a connector correspond to messages to/from a conjugated port at the other end of the connector, unless one of the ports is a relay port.

The behaviour of a capsule is described by a hierarchical state machine. Figure 2 shows a state machine diagram. These state machines are like UML state machines with some restrictions: there are no AND-states (orthogonal regions), that is, each composite state has exactly one region, so at any point the state machine is in exactly one state or pseudo-state. Therefore, there are no fork or join pseudo-states. The only pseudo-states allowed are initial, deep-history, choice, junction, entry, and exit. Transitions cannot cross state boundaries. Transitions can form a chain, but to enter or exit composite states they must go through entry and exit points explicitly. States may have entry and exit actions, but they do not have “do” actions. Shallow history is not supported and neither are final states. Transitions arriving at the boundary of a composite state are deemed to arrive at its deep-history pseudo-state.

Capsule parts can have one of three roles: fixed, optional or plug-in.
Fixed capsule parts are parts whose instance(s) are owned by the capsule that contains the part, and are created and destroyed at the same time as the containing capsule is created and destroyed. Optional capsule parts are parts whose instance(s) are also owned by the containing capsule, but are created and destroyed dynamically, by some action in the capsule’s state machine. Plug-in capsule parts are parts whose instances are not owned by the containing capsule. Rather, they are created elsewhere, and are “imported” and “deported” by some action in the capsule’s state machine. Since they are not necessarily owned by the capsule importing them, plug-in capsule instances can be shared between different capsules.

Fig. 2. A (hierarchical) state-machine diagram for a capsule.

Another important feature is that of service ports. Ports can be marked as service provision points (SPPs) or service access points (SAPs). These represent ports that provide (resp. access) some service to other capsules, usually in a different architectural layer. Unlike normal ports, service ports are connected dynamically. That is, the connection between an SAP and an SPP is performed at runtime by some action in the relevant capsule’s state machine.

2.2 Execution semantics

As mentioned above, capsules have autonomous behaviour and therefore it is natural to think of them as threads, with each capsule instance executing concurrently with other capsule instances. However, UML-RT makes a distinction between capsules and threads. The concept of a capsule as an entity with autonomous behaviour is a modelling concept, whereas the concept of thread is a deployment concept. Each capsule is assigned to a thread, and more than one capsule may be assigned to the same thread. The execution of a UML-RT model is carried out by a runtime system (or RTS) which consists of one or more controllers and a runtime services library.
Each controller runs in its own thread and maintains and executes the behaviour of the collection of capsules assigned to its thread. Hence, when we assign a capsule to a thread, we are assigning it to a controller. Assigning a capsule to a particular thread/controller can occur at runtime in the case of optional and plug-in capsule parts.

A controller has, in addition to the collection of capsules assigned to it, a message (priority) queue. Messages in the queue can come from capsules in other controllers or in the same controller. The controller runs a main loop, extracting the message with the highest priority from the queue (or the first one, if all have the same priority), and directs the message to the target capsule, or more precisely, passes the message to the target capsule’s state machine.

The execution semantics of state machines mandate a run-to-completion semantics (RTC), that is, an incoming message is fully processed before the next message is processed. This means that the state machine is always in a stable state when a message arrives, and it follows a full transition chain, possibly going through several pseudo-states, and executing the corresponding transition actions as well as exit and entry actions encountered along the way, until it reaches a stable state, at which point the RTC step is finished and the state machine is ready to process the next incoming message.

In addition to handling messages directed to its capsules, the controller is in charge of managing the capsules’ lifetimes for capsules that are created, destroyed, imported, or deported dynamically.

The runtime services library provides, as the name suggests, a set of common services, in particular services related to logging, timing, and dynamic capsule operations in addition to the messaging operations already discussed.

3 The tool

Figure 3 shows a screenshot of the tool. The central view has the main canvas or model editor with a palette of elements on the right.
The bottom view includes a Properties view for the properties of the currently selected element. This includes standard UML properties, UML-RT-specific properties, applied stereotypes, advanced properties, etc. The bottom also has views for validation, error reporting, etc. The top-left shows the project explorer to add and manipulate projects. The middle-left shows the model explorer which presents the abstract syntax view of the model and imported libraries. The bottom-left shows an outline view.

Code generation is triggered by selecting the root element in the model explorer and choosing either “UMLRT Code Generator” or “UMLRT Code Generator (regenerate)”. The first performs incremental generation (or full generation if the model has not been previously generated). The second regenerates the whole model.

Fig. 3. PapyrusRT screenshot.

In an application there must be a “top” capsule, which corresponds to the “main” capsule. By default it is a capsule called “Top”, but the user may select any other capsule as the top capsule by right-clicking on it in the model explorer and selecting “Set as Top” or “Generate as Top”.

If code generation succeeds, there will be a CDT project created (or updated) in the workspace, named after the input project with the “_CDT_project” suffix. This project includes the necessary Makefiles to compile the generated C++ sources and link them to the RTS, producing an executable application. The Makefiles use a variable called “UMLRTS_ROOT” for the location of this library. The code generator attempts to assign it the correct value according to the installation, but in circumstances where this location cannot be determined, the user may still specify it manually as an environment variable.

In its current version (0.7.1), the generated code compiles as C++03 with GCC 4.6.3, targeting Linux, with some Windows support.

4 Code generation

PapyrusRT generates executable C++ code from the model.
This includes both structural and behavioural elements of the model. The code generator can run either as a standalone application, or within the Eclipse environment. The input is a UML model with the UML-RT profile applied, and possibly other profiles, such as the RTCppProperties profile used to customize the generated C++ code. The output is an Eclipse CDT project, including all generated source files and Makefiles required for building (compiling and linking).

The generator supports incremental generation, that is, if code has already been generated, then a run of the code generator will only generate those elements that have changed, and their dependent elements. For example, if a protocol changes, the code generator will regenerate the protocol, and all capsules with a port typed by the protocol.

Model validation is performed by a separate operation. Nevertheless, the code generator does perform some limited validation and sanity checks as it proceeds. When errors in the model are encountered, these are reported to the user. Nevertheless, there are certain types of errors which cannot be easily detected during code generation. These are errors, such as a model element missing a stereotype, that would require the code generator to know the modeler’s intentions.

4.1 Model transformation

The code-generation process is structured as a model transformation. More precisely, it is a sequence of model-to-model transformations with a final model-to-text transformation. This is depicted in Figure 4.

Fig. 4. Model transformation.

In the first phase, the UML model is translated into xtUMLrt, a simplified intermediate representation which contains all the required UML-RT constructs.
The xtUMLrt meta-model is intended to simplify UML, in order to simplify the generator itself, while allowing customization, isolating the generator from the toolset, and providing a common language that allows its eventual extension to support xtUML, and, potentially, other approaches.

Once the model has been translated into xtUMLrt, elements other than state machines are translated into a simplified C++ meta-model, from which the final phase of generating the actual C++ source files and CDT project is done.

The C++ meta-model isolates the generator from issues such as formatting, body/header file generation, file regeneration avoidance, CDT project and makefile generation, etc. For example, if we need to generate a class in C++, then the class must be declared in a header file and defined in the corresponding implementation file. Using the C++ meta-model, we can simply create a “CppClass” object (where CppClass is the meta-class used to represent C++ classes in the meta-model) without the need for separate rules to generate the header and the implementation and keep them coordinated. Instead, the last phase takes that CppClass object and writes the required declarations in the header and definitions in the implementation file.

State machines are also translated to the C++ meta-model but go through several sub-stages that expand the inheritance hierarchy and flatten the state machine.

During the translation from xtUMLrt to the C++ meta-model, the generator collects, for each element to be generated, the set of dependent elements. For each kind of element to be generated, a specialized generator class implements the specific translation. That is, there are specific generators for basic classes, capsules, state machines, protocols and a special generator for deployment that builds the deployed structure.

4.2 Architecture

The overall architecture of the generator is depicted in Figure 5.

Fig. 5. Code generator architecture.

The UMLRTGenerator class on top is the “director” that executes the transformations described in Figure 4. It contains references to:

– UML2xtumlrtTranslator: the translator from UML to xtUMLrt.
– CppCodeGenerator: the core generator from xtUMLrt to the C++ meta-model.
– CppCodePattern: the “C++ code pattern”, a class that provides a facade to the C++ meta-model with multiple factory methods, and which caches generated elements that can be shared by multiple parts of the generator. It also contains the main model-to-text method.

The CppCodeGenerator also has a reference to the CppCodePattern, as well as references to:

– GeneratorManager: the “generator manager”, which keeps track of the individual element-specific generators.
– Collector: the “collector”, which creates element-specific generators for each element and their dependent elements.
– UML2ChangeTracker: the “change tracker”, which keeps track of which elements have already been generated and which have changed.

CppCodeGenerator first invokes the collector to build a list of applicable generators. The collector obtains element-specific instances from the generator manager which provides instances from the built-in generator classes, or from generators provided as extensions. Generators for elements which have already been generated but have not changed are pruned from the list. All element-specific generators are subclasses of AbstractCppGenerator.

Once all the generators for elements to be generated are collected, and the list pruned according to which elements have changed, the code generator invokes the generate method for each element-specific generator instance. This will result in the specific generator leaving the resulting C++ model elements in the CppCodePattern instance. Finally the code generator will instruct the CppCodePattern instance to perform the model-to-text transformation by invoking its write method.
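
The collect, prune, generate and write sequence just described can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not the PapyrusRT implementation, which is an Eclipse plugin. Every name in it (CodePattern, AbstractGenerator, GeneratorManager, run_codegen, the ".hh" file suffix, and the dictionary-based model) is a hypothetical stand-in chosen for illustration, and the pruning rule is an assumption that follows the incremental-generation description in Section 4: an element is regenerated if it has changed, if it depends on a changed element, or if it has never been generated.

```python
from abc import ABC, abstractmethod


class CodePattern:
    """Hypothetical stand-in for CppCodePattern: holds generated elements."""

    def __init__(self):
        self.elements = {}  # element name -> generated text

    def add(self, name, text):
        self.elements[name] = text

    def write(self):
        # Stand-in for the model-to-text step: one "file" per element.
        return {name + ".hh": text for name, text in self.elements.items()}


class AbstractGenerator(ABC):
    """Hypothetical stand-in for AbstractCppGenerator."""

    def __init__(self, element, pattern):
        self.element = element
        self.pattern = pattern

    @abstractmethod
    def generate(self):
        """Leave the generated element in the shared code pattern."""


class ProtocolGenerator(AbstractGenerator):
    def generate(self):
        self.pattern.add(self.element, "// protocol " + self.element)


class CapsuleGenerator(AbstractGenerator):
    def generate(self):
        self.pattern.add(self.element, "// capsule " + self.element)


class GeneratorManager:
    """Maps element kinds to factories; extensions replace built-ins."""

    def __init__(self):
        self.factories = {"protocol": ProtocolGenerator,
                          "capsule": CapsuleGenerator}

    def register(self, kind, factory):
        self.factories[kind] = factory

    def create(self, kind, element, pattern):
        return self.factories[kind](element, pattern)


def collect(model, manager, pattern):
    """Build one generator per model element (model maps name -> kind)."""
    return [manager.create(kind, name, pattern) for name, kind in model.items()]


def prune(generators, changed, dependents, generated_before):
    """Keep generators for changed, dependent, or never-generated elements."""
    dirty = set(changed)
    for name in changed:
        dirty.update(dependents.get(name, ()))
    return [g for g in generators
            if g.element in dirty or g.element not in generated_before]


def run_codegen(model, changed, dependents, generated_before, manager=None):
    """Collect, prune, generate, then write, in that order."""
    manager = manager or GeneratorManager()
    pattern = CodePattern()
    generators = collect(model, manager, pattern)
    for generator in prune(generators, changed, dependents, generated_before):
        generator.generate()
    return pattern.write()


# A protocol change regenerates the protocol and the capsule that uses it,
# while the unrelated, already generated capsule is skipped.
model = {"PingPong": "protocol", "Pinger": "capsule", "Logger": "capsule"}
files = run_codegen(model, changed={"PingPong"},
                    dependents={"PingPong": ["Pinger"]},
                    generated_before={"PingPong", "Pinger", "Logger"})
print(sorted(files))  # ['PingPong.hh', 'Pinger.hh']
```

In this sketch, calling register on the manager with a different factory for "capsule" plays the role of a user-provided extension overriding a built-in generator, and keeping all generated elements in a single shared pattern object is what lets the final write step run only once.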

The code generator can be extended with the standard Eclipse extension mechanism: the org.eclipse.papyrusrt.codegen.cpp plugin contains an extension point called generator which has two parameters: a type and a class, which must be provided by a user extension. The type parameter is one of the following: ClassGenerator, EnumGenerator, CapsuleGenerator, StateMachineGenerator, EmptyStateMachineGenerator, ProtocolGenerator, StructuralGenerator, or ArtifactGenerator. The class parameter is a class implementing the AbstractCppGenerator.Factory interface which has a create method returning an instance of a subclass of AbstractCppGenerator for a given model element. That is, a user-provided generator must subclass AbstractCppGenerator, and must provide a factory method that creates its instances for a given model element. The AbstractCppGenerator class defines an abstract generate method that must be implemented by its concrete subclasses. For example, a custom StateMachineGenerator must inherit from AbstractCppGenerator, and provide its factory method which will receive as input a state-machine instance. The generator manager will invoke this factory method during the collection process described above, overriding built-in generators.

5 Final remarks

Open-source software presents both challenges and opportunities for software developers in general, and for the MDE community in particular. While an OSS project may not necessarily have the same resources as a commercially-backed product, the transparency and the ability of the community to contribute to it may provide an edge leading to greater adoption. Both industry and academia benefit from such an endeavour. Industrial users gain unrestricted access, which allows them to develop their own custom and domain-specific variants without incurring the costs of developing a fully fledged product from scratch.
Academics can develop their ideas and proofs of concept on an industrial-strength platform where they may reach a larger audience. One of the major obstacles for the adoption of MDE is the lack of access to mature and robust tools. While PapyrusRT is still in its early stages of development, its open-source nature provides a natural ground to grow into an environment that yields some of the core promises of MDE. It is with this philosophy that PapyrusRT was conceived.

Acknowledgements

This project is a collaboration between Zeligsoft (2009) Ltd., CEA LIST, Malina Software, and Ericsson. Bran Selic (Malina) has provided the UML-RT profile, its documentation, as well as substantial consulting in clarifying the semantics of the language. Peter Cigéhn from Tieto has provided invaluable input regarding requirements, as well as extensive testing. Andreas Henriksson from Ericsson has also provided requirements as well as contributing the RTCppProperties profile for C++ generation. IncQuery Labs contributed part of the xtUMLrt intermediate meta-model. At Zeligsoft, the project is led by Simon Redding, with Charles Rivet as Product Manager and Ernesto Posse as Software Developer working on code generation. Andrew Eidsness has been the principal software developer working on code generation and the runtime system (RTS). The RTS has been implemented mostly by Barry Maher. Other Zeligsoft contributors include Young-Soo Roh, Tim McGuire, Toby McClean and Stephanie Chafe. The group at CEA LIST, headed by Sébastien Gérard with Rémi Schnekenburger as project lead, also including Ansgar Radermacher, Camille Letavernier, Önder Gürcan and Céline Janssens from All4tec has worked on the tooling, validation import and CDT integration.

References

1. Object Management Group. UML Superstructure Specification v2.5. http://www.omg.org/spec/UML/2.5/, September 2012.
2. B. Selic, G. Gullekson, and P. T. Ward. Real-Time Object Oriented Modeling. Wiley & Sons, 1994.
3. B. Selic and J. Rumbaugh. Using UML for modeling complex real-time systems. Whitepaper, Rational Software Corp., 1998.