=Paper=
{{Paper
|id=Vol-3043/short9
|storemode=property
|title=Instrumenting C and Fortran Software With Kieker
|pdfUrl=https://ceur-ws.org/Vol-3043/short9.pdf
|volume=Vol-3043
|authors=Reiner Jung,Sven Gundlach,Wilhelm Hasselbring
|dblpUrl=https://dblp.org/rec/conf/kpdays/JungGH21
}}
==Instrumenting C and Fortran Software With Kieker==
Reiner Jung, Sven Gundlach and Wilhelm Hasselbring

Kiel University, Christian-Albrechts-Platz 4, 24103 Kiel, Germany

===Abstract===
Kieker is a versatile monitoring framework well established for performance analysis and dynamic architecture recovery for JVM languages. As applications are not only written in Java, in the past Kieker has been supplemented with probes for Perl, Pascal, and C++ in the context of embedded systems. In the context of our project OceanDSL, which aims to provide DSLs for ocean system models, we need to comprehend existing climate models written in Fortran and C. Thus, probes for C and Fortran were required. In this paper, we report on our efforts in realizing minimally invasive monitoring utilizing compiler features, probes, and tools tailored to instrument applications based on these languages.

Keywords: Application Level Monitoring, Kieker, Instrumentation, C, Fortran

===1. Introduction===
Performance analysis and architecture recovery are key tasks software engineers perform, especially for long-living systems [1]. Kieker provides the facilities to observe applications at runtime and analyze performance and architectures based on runtime and design time data [2, 3]. Kieker was first developed for JVM-based languages, but also provides monitoring probes for Perl, Visual Basic, and DotNet. In the context of our project OceanDSL [4], which aims to provide DSLs for ocean system models, we need to comprehend existing climate models written in Fortran and C. Thus, probes for C and Fortran were required.

In this paper, we report on the Kieker language pack for C, utilizing the GNU Compiler Collection (GCC) [5] and compatible compilers. As GCC supports additional languages besides C and Fortran, our probes can also be used with these languages. We illustrate the probes, the adjunct tooling, and the process of the language pack using two Earth System Climate Models (ESCMs).
The Kieker language pack for C is introduced in Section 2, and its application to the two examples is illustrated in Section 3. Finally, we provide a summary and outlook in Section 4.

SSP’21: Symposium on Software Performance, November 09–10, 2021, Leipzig, Germany

Contact: reiner.jung@email.uni-kiel.de (R. Jung); sven.gundlach@email.uni-kiel.de (S. Gundlach); hasselbirng@email.uni-kiel.de (W. Hasselbring)

ORCID: 0000-0002-5464-8561 (R. Jung); 0000-0003-4060-2754 (S. Gundlach); 0000-0001-6625-4335 (W. Hasselbring)

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

===2. Instrumentation===
The overall Kieker architecture comprises a monitoring component, which is embedded into an application with a weaving technique, an analysis component located outside the application, and a set of shared data structures for monitoring events. The communication between the two components is realized via files, TCP, and a variety of other means of transport and storage. To support C and other object file languages, we only extend the monitoring side, while we reuse the analyses provided by Kieker and Kieker-based projects. Furthermore, the Kieker language pack for C [6] supports only binary logging via TCP, which allows offsite logging and has the least impact on the analyzed application. In the following, we introduce all the necessary building blocks to monitor such programs.

====Configuration====
Similar to its Java cousin, the C variant of Kieker monitoring uses a configuration file to set parameters. The configuration file can be placed anywhere in the file system. To inform Kieker where the file can be found, the KIEKER_CONFIG environment variable must be set. If the variable is not set, Kieker works with built-in defaults.
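The configuration lookup described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the language pack's actual implementation: the monitor_config structure, the function names, and the hostname and port keys of a key=value file format are hypothetical. Only the KIEKER_CONFIG variable and the fallback to built-in defaults are taken from the text; the default values used here are the ones reported for the TCP writer (localhost, port 5678).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings structure; field names are illustrative only. */
typedef struct {
    char hostname[256];
    unsigned short port;
} monitor_config;

/* Built-in defaults: localhost and port 5678. */
void config_set_defaults(monitor_config *cfg) {
    strcpy(cfg->hostname, "localhost");
    cfg->port = 5678;
}

/* Apply one "key = value" line. The key names are assumptions of this
 * sketch; lines that do not match (comments, blanks) are ignored. */
void config_apply_line(monitor_config *cfg, const char *line) {
    char key[64];
    char value[256];
    if (sscanf(line, " %63[^= \t] = %255s", key, value) != 2) {
        return;
    }
    if (strcmp(key, "hostname") == 0) {
        snprintf(cfg->hostname, sizeof cfg->hostname, "%s", value);
    } else if (strcmp(key, "port") == 0) {
        long p = strtol(value, NULL, 10);
        if (p > 0 && p <= 65535) {
            cfg->port = (unsigned short)p;
        }
    }
}

/* Start from the defaults, then read the file named by KIEKER_CONFIG.
 * Returns 1 if a file was read, 0 if only the defaults are in effect. */
int config_load(monitor_config *cfg) {
    const char *path;
    FILE *file;
    char line[512];

    config_set_defaults(cfg);
    path = getenv("KIEKER_CONFIG");
    if (path == NULL) {
        return 0; /* variable not set: keep built-in defaults */
    }
    file = fopen(path, "r");
    if (file == NULL) {
        return 0; /* file not readable: keep built-in defaults */
    }
    while (fgets(line, sizeof line, file) != NULL) {
        config_apply_line(cfg, line);
    }
    fclose(file);
    return 1;
}
```

Starting from the defaults and overriding them line by line keeps the monitored program operational even when the file is missing or incomplete, which matches the fallback behavior described above.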
====Event Types====
Kieker uses a language-independent notation for its event types called Instrumentation Record Language (IRL) [7]. With the IRL generator, we can generate event types for various languages like C or Fortran. The generator produces structs and typedefs for all Kieker events alongside serializer functions that produce binary serializations using big-endian encoding, also known as network byte order. Kieker’s default binary protocol uses string tables, or string registries, to avoid transferring string values redundantly. Each unique string is given an ID and sent once, following the Kieker format specification for binary data. In the serialized event, the string is then represented by its ID, and the string itself is sent ahead of the event by the monitoring controller.

====Monitoring====
The monitoring component handles strings, timestamps, trace metadata, and writing the data via TCP. While in Java the number of event types can be resolved at runtime, in C this is done at compile time. Thus, event types have to be registered upfront. Currently, three default flow events are supported, i.e., TraceMetadata, BeforeOperationEvent and AfterOperationEvent. All events can be overwritten with an event type mapping file.

In the current implementation, logging is supported via TCP. While other options can be added, in all our use cases it is beneficial to collect and process the monitoring data in a separate process, preferably on a separate machine, to reduce interference. The TCP writer can be configured with two parameters: the destination host and the port where a collector or analysis tool is listening. The defaults are localhost and port 5678.

====Probes====
The Kieker language pack for C provides probes for manual instrumentation, i.e., modifying the code, and probes to be woven in by compilers utilizing the GCC instrumentation facility. The latter declares two functions for the entry and exit points of a function, respectively.
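To make the encoding described under Event Types more concrete, the following C sketch writes integers in big-endian order and replaces strings by IDs from a small registry. It is a simplified illustration, not the code produced by the IRL generator: the field layout of before_operation_event, the fixed-size registry, and all function names are assumptions of this sketch, and the transmission of newly registered strings ahead of the event is only indicated by the is_new flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value in big-endian (network) byte order. */
size_t put_int32_be(uint8_t *buf, int32_t value) {
    uint32_t v = (uint32_t)value;
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
    return 4;
}

/* Write a 64-bit value in big-endian (network) byte order. */
size_t put_int64_be(uint8_t *buf, int64_t value) {
    uint64_t v = (uint64_t)value;
    int i;
    for (i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
    return 8;
}

/* Hypothetical fixed-size string registry: each unique string gets an ID. */
#define REGISTRY_CAPACITY 64
typedef struct {
    const char *strings[REGISTRY_CAPACITY];
    int32_t count;
} string_registry;

/* Return the ID of s, registering it on first use. *is_new is set to 1
 * when the string was not known before, i.e., when it would have to be
 * sent once ahead of the event. Returns -1 if the registry is full.
 * The registry stores the pointer, so the caller keeps the string alive. */
int32_t registry_lookup(string_registry *reg, const char *s, int *is_new) {
    int32_t i;
    *is_new = 0;
    for (i = 0; i < reg->count; i++) {
        if (strcmp(reg->strings[i], s) == 0) {
            return i;
        }
    }
    if (reg->count == REGISTRY_CAPACITY) {
        return -1;
    }
    reg->strings[reg->count] = s;
    *is_new = 1;
    return reg->count++;
}

/* Simplified event; the exact field layout is an assumption of this sketch. */
typedef struct {
    int64_t timestamp;
    int64_t trace_id;
    int32_t order_index;
    const char *operation_signature;
    const char *class_signature;
} before_operation_event;

/* Serialize the event; strings are represented by their registry IDs. */
size_t serialize_before_operation(uint8_t *buf,
                                  const before_operation_event *ev,
                                  string_registry *reg) {
    int is_new;
    size_t pos = 0;
    pos += put_int64_be(buf + pos, ev->timestamp);
    pos += put_int64_be(buf + pos, ev->trace_id);
    pos += put_int32_be(buf + pos, ev->order_index);
    pos += put_int32_be(buf + pos,
                        registry_lookup(reg, ev->operation_signature, &is_new));
    pos += put_int32_be(buf + pos,
                        registry_lookup(reg, ev->class_signature, &is_new));
    return pos;
}
```

Writing the bytes explicitly by shifting, rather than copying the in-memory representation, makes the output independent of the byte order of the machine running the monitored program.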
The manual probe functions have two parameters: class_signature and operation_signature. As C does not have classes, the class signature contains the name of the file in which the function is defined. We keep the name class signature to conform to the Java-induced naming scheme. The operation signature contains the function signature or a pointer.

The GCC-based probes reuse the manual probes. However, the GCC instrumentation interface does not provide names for files and operations. While this information can be linked into an executable, it is not available at runtime. Instead, the caller and callee are represented by memory addresses. Thus, we use the address as operation signature and a placeholder as class signature.

====Weaving====
The probes are injected with the instrumentation feature of GCC [5] and the Intel Fortran compiler [8]. This feature is activated with the compiler option -finstrument-functions and weaves two calls into each function, which are invoked when the function is entered or left, respectively (cf. Listing 1). We had used AspectC++ in the past, but it is only able to process C and C++ code [9].

Listing 1: GCC instrumentation function names [10]
<pre>
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
</pre>

The weaving should work with any compiler optimization settings. However, we suggest avoiding function inlining. Also, to resolve function names after data collection, debugging information must be added by the compiler (option -g).
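How the two woven-in calls deliver addresses instead of names can be illustrated with a small, self-contained C sketch. It is not the language pack's probe implementation: instead of calling the Kieker probes and sending events via TCP, it stores the addresses in a hypothetical in-memory buffer (trace_event, events, record_event and dump_events are names invented for this sketch). Only the two hook signatures and the no_instrument_function attribute come from the GCC instrumentation interface shown in Listing 1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory buffer standing in for the Kieker probes. */
#define MAX_EVENTS 1024
typedef struct {
    int is_entry;                /* 1 = function entered, 0 = function left */
    uintptr_t function_address;  /* callee, later used as operation signature */
    uintptr_t call_site_address; /* caller */
} trace_event;

static trace_event events[MAX_EVENTS];
static size_t event_count = 0;

/* The hooks and their helpers must not be instrumented themselves,
 * otherwise every hook invocation would trigger the hooks again. */
__attribute__((no_instrument_function))
static void record_event(int is_entry, void *this_fn, void *call_site) {
    if (event_count < MAX_EVENTS) {
        events[event_count].is_entry = is_entry;
        events[event_count].function_address = (uintptr_t)this_fn;
        events[event_count].call_site_address = (uintptr_t)call_site;
        event_count++;
    }
}

void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

/* Invoked on entry of every function compiled with -finstrument-functions. */
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    record_event(1, this_fn, call_site);
}

/* Invoked on exit of every function compiled with -finstrument-functions. */
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    record_event(0, this_fn, call_site);
}

/* Print the recorded addresses; names are resolved in a later step. */
__attribute__((no_instrument_function))
void dump_events(FILE *out) {
    size_t i;
    for (i = 0; i < event_count; i++) {
        fprintf(out, "%s fn=%p caller=%p\n",
                events[i].is_entry ? "enter" : "exit",
                (void *)events[i].function_address,
                (void *)events[i].call_site_address);
    }
}
```

When such a file is compiled and linked together with application code built with -finstrument-functions -g, the compiler inserts the calls to both hooks automatically. The recorded values are plain addresses, which is why file and function names have to be resolved after data collection from the binary containing debugging symbols.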
To collect monitoring data and store it in log files, Kieker provides a collector that listens on a TCP port (default 5678) for monitoring data and stores the data in Kieker log files, supporting all logging features, including compression. Further, in case the collector is too slow and blocks the monitored application, monitoring data can also be collected with netcat and replayed to the collector afterwards.

====Post-Processing====
Post-processing is required when the GCC instrumentation function has been used, as the log only contains function pointers. Thus, we developed a log rewrite tool that resolves function and file names in the Kieker log utilizing the addr2line tool and the binary program file containing debugging symbols. The tool can be found in the OceanDSL tools project (https://git.se.informatik.uni-kiel.de/oceandsl/oceandsl-java-tools).

===3. Application to Climate Models===
We applied the language support to two existing earth system climate models, i.e., MITgcm [11] and UVic [12]. To instrument them with Kieker, we used the compiler instrumentation feature, i.e., the compiler option -finstrument-functions. In the following, we briefly introduce each model and illustrate the required setup.

====MITgcm====
MITgcm is a fairly modular general circulation model from the Massachusetts Institute of Technology. It is used to simulate the atmosphere and the ocean. It is configured and parameterized via a large set of small configuration and parameterization files, which allow setting up different experiments. These can be seen as different variants of the model.

MITgcm compiles with GCC and provides a set of configuration files for different architectures and compilers. The setup only required modifying the existing configuration file linux_amd64_gfortran: deactivating the optimization, setting the compiler flags for C and Fortran, and extending the library path (cf. Listing 2).
Listing 2: MITgcm configuration
<pre>
FOPTIM=""
F90OPTIM=""
FFLAGS="$FFLAGS -finstrument-functions -g"
CFLAGS="$CFLAGS -finstrument-functions -g"
LIBS="$LIBS -L/usr/lib/x86_64-linux-gnu \
  -L/kieker-lang-pack-c/libkieker/.libs -lkieker -ldl"
</pre>

====UVic====
The second earth system climate model, UVic, is from the University of Victoria. In contrast to MITgcm, it is a monolithic application which is configured and parameterized with two files that toggle features and other settings. These feature toggles also allow creating different variants. However, each toggle affects various locations within the code base. Figure 1 shows the architecture of UVic recovered by dynamic and static analysis with Kieker.

The compilation and linking configuration is mingled with many other settings in the mk.in file. For the setup, we had to extend the library path and set the correct parameters for the compiler, as shown in Listing 3.

[Figure 1 (image not reproduced); modules shown: UVic.unknown, UVic.rot, UVic.ice, UVic.mom, UVic.mtlm, UVic.embm, UVic.common, UVic.netcdf]

Figure 1: The module structure of the UVic setup. White boxes indicate the modules based on static and dynamic data. Green and blue boxes are derived solely from dynamic or static data, respectively.

Listing 3: UVic configuration
<pre>
Libraries = -lnetcdf -lnetcdff -L/usr/lib/x86_64-linux-gnu \
  -L/kieker-lang-pack-c/libkieker/.libs -lkieker
Compiler_F = ifort -r8 -g -finstrument-functions -O0 \
  -warn nouncalled -c
Compiler_f = ifort -r8 -g -finstrument-functions -O0 \
  -warn nouncalled -c
Linker = ifort -r8 -g -finstrument-functions -O0 \
  -warn nouncalled -o
</pre>

As both examples show, the integration is fairly simple and can easily be introduced into build environments.

===4. Conclusions===
We present our Kieker language pack for C, which can also be used with other programming languages that are supported by GCC and compatible compilers. The language pack provides an implementation of all Kieker event data types and probes for manual and compile-time instrumentation using the compiler instrumentation facilities. We demonstrated the application of these probes with the GNU Compiler Collection and the Intel Fortran Compiler, applied to two existing earth system climate models. Furthermore, we discussed the tooling necessary to collect and post-process the monitoring data.

For future work, we will extend the set of probes to be able to log instances and add support for AspectC++ as an alternative instrumentation technique. The current implementation does not support adaptive monitoring, which we will add in the future utilizing the Kieker Probe Controller. Furthermore, we will contribute the essential tooling from OceanDSL to Kieker.

The Kieker language pack for C shows that it is simple to provide monitoring for other programming languages and technologies with Kieker. The C implementation shows that large software systems in C and Fortran can be dynamically analyzed with little effort for the developer, as the weaving can be controlled with a few compiler options.

===Acknowledgments===
Funded by the Deutsche Forschungsgemeinschaft (DFG – German Research Foundation), grant no. HA 2038/8-1 – 425916241.

===References===
[1] U. Goltz, R. Reussner, M. Goedicke, W. Hasselbring, L. Märtin, B. Vogel-Heuser, Design for future: managed software evolution, Computer Science – Research and Development 30 (2015) 321–331. doi:10.1007/s00450-014-0273-9.
[2] W. Hasselbring, A. van Hoorn, Kieker: A monitoring framework for software engineering research, Software Impacts 5 (2020). doi:10.1016/j.simpa.2020.100019.
[3] A. van Hoorn, J. Waller, W. Hasselbring, Kieker: A framework for application performance monitoring and dynamic software analysis, in: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering (ICPE 2012), ACM, 2012, pp. 247–248. doi:10.1145/2188286.2188326.
[4] R. Jung, S. Gundlach, S. Simonov, W. Hasselbring, Developing domain-specific languages for ocean modeling, in: Proceedings of the 8th Collaborative Workshop on Evolution and Maintenance of Long-Living Software Systems (EMLS 2021), volume 2814 of CEUR, 2021. URL: http://ceur-ws.org/Vol-2814/.
[5] GCC-GNU, The GNU Compiler Collection, 2021. URL: https://gcc.gnu.org.
[6] Kieker Project, Kieker Language Pack for C, 2021. URL: https://github.com/kieker-monitoring/kieker-lang-pack-c.git.
[7] R. Jung, C. Wulf, Advanced typing for the Kieker instrumentation languages, in: Symposium on Software Performance 2016, 2016. URL: http://oceanrep.geomar.de/34626/.
[8] Intel Corporation, Intel Fortran Compiler, 2021. URL: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/fortran-compiler.html.
[9] O. Spinczyk, et al., AspectC++ an aspect-oriented extension to the C++ programming language, in: Proceedings of the Fortieth International Conference on Tools Pacific: Objects for internet, mobile and embedded applications, 2002, pp. 53–60.
[10] J. Racine, The Cygwin tools: a GNU toolkit for Windows, 2000.
[11] V. Artale, S. Calmanti, et al., An atmosphere–ocean regional climate model for the Mediterranean area: assessment of a present climate simulation, Climate Dynamics 35 (2010) 721–740. doi:10.1007/s00382-009-0691-8.
[12] A. J. Weaver, M. Eby, et al., The UVic earth system climate model: Model description, climatology, and applications to past, present and future climates, Atmosphere-Ocean 39 (2001) 361–428. doi:10.1080/07055900.2001.9649686.