Extending Java’s Communication Mechanisms for Multicore Processors George C. Wells Department of Computer Science Rhodes University Grahamstown, South Africa G.Wells@ru.ac.za ABSTRACT With the current trend towards the increased use of multicore processors, there is a growing need for simple, efficient parallel programming mechanisms. While Java has good support for multithreaded and distributed application development, our research into tuple-space systems for multicore processors highlighted a gap in the concurrency facilities available in Java. This arises in the context of independent applications (running in separate virtual ma- chines) that need to synchronise their activities or communicate with each other. There are several possible solutions to this problem, ranging from extensions to the language and/or runtime environment through to the use of distributed programming methods. Using the latter introduces considerable performance overheads, and so we explored the use of the Java Native Interface in order to take advantage of the interprocess communication (IPC) facilities provided by the underlying operating system. The analysis and comparison of the performance of the standard approaches and our prototype library suggest that there are real benefits to be gained by alternative approaches to the provision of IPC mechanisms for independent Java programs executing on multicore systems. We hope that these findings will spur further investigation of this problem and other possible solutions. 1 Introduction In recent years there has been a dramatic shift in the trends of computer architec- ture development, as processors have adopted symmetric multiprocessor (SMP) tech- niques, exemplified by the increasing prevalence of multicore processors (with in- creasing numbers of cores). This shift has serious implications for software devel- opment practises, forcing programmers to adopt parallel programming techniques. However, parallel programming is not simple — as Jim Larus notes “The popular [parallel] programming models ... are performance-focused, error-prone abstractions that developers find difficult to use”[1]. Java provides a wide range of parallel and distributed programming techniques. There has always been good support for multithreaded programs. At the other end 1 PPPJ’10 WiP Poster Abstract of the parallel–distributed programming spectrum, Java has strong support for var- ious distributed programming models. However, there is a distinct gap in the con- currency tools available in Java when it comes to interprocess communication between Java programs running on a shared-memory system in separate virtual machines. This system configuration can be extremely useful in some situations, but is currently only supported by means of distributed programming mechanisms, using the “loop- back” network connection. Our recent research has involved systems for multicore processors, using separate Java processes. When we encountered disappointing per- formance we were prompted to investigate this issue, and have subsequently devel- oped initial prototypes of a possible solution. These problems, possible solutions and our prototypes are discussed in this paper. 2 Concurrency in Java Java has always provided comprehensive support for multithreaded applications, with coordination provided through object-level locking[2]. This model uses a shared- address-space model — specifically, the communicating threads must be executing in the context of a single Java Virtual Machine (JVM). More recently, the basic multi- threaded facilities offered by Java were extended through the provision of the Java Concurrency Utilities[3]. With these sophisticated facilities available, there is very good support for the development of sophisticated parallel/multithreaded applica- tions in a shared-memory, shared-address-space environment. Java also provides very good support for distributed applications running across networks. At the lowest level, Java provides basic network classes that allow for the creation and use of sockets. In conjunction with I/O streams and object serialisation, these provide a simple, but flexible and powerful communication mechanism. At a slightly higher level of abstraction, the Enterprise Edition provides a comprehensive messaging API, known as the Java Message Service (JMS). One of the most well-known forms of distributed computing is remote proce- dure/method calling. In Java this is provided by both Remote Method Invocation (RMI) and the Common Object Request Broker Architecture (CORBA). At one of the highest levels of abstraction, Java provides support for object- or tuple-space systems through JavaSpaces[4], which is based on the Linda coordina- tion language developed at Yale[5]. As a very high-level abstraction of the coordina- tion activities (i.e. communication and synchronisation) of a distributed application, Linda is very easy to use, but performance may be problematic. The development of Linda systems in Java and their optimisation has long been a focus of ours[6], and it was our research in this area that led to the investigation described in this paper. 3 The Problem The trend towards multicore processors led us to develop an implementation of our Linda system (eLinda) for such systems. A vital part of any Linda system is the component that manages the data. In our current system this is implemented as a “server”, run as a stand-alone process. There are many good reasons for this, not least that it provides a very useful separation between the eLinda system and the client applications making use of it. Besides the logical separation of concerns that this architecture provides, it also helps address security or reliability concerns, as the interaction between a client application and the eLinda system is limited to the ex- plicit communication between the processes. For example, the eLinda system cannot 2 PPPJ’10 WiP Poster Abstract access any data that is not explicitly passed to it, and exceptions and errors that might arise in the eLinda system do not directly impact the client application(s). We believe that as the use of multicore processors increases it will be increasingly important to support development of complex applications composed of separate processes. This application architecture bridges the gap between multithreaded ap- plications, and distributed applications. However, at present there are few options available for communication between separate processes in Java. The only widely- available solution is to make use of Java’s distributed programming facilities (using the loop-back network). When we used this for the eLinda system, the limitations of this approach became apparent, especially in respect of performance. Our intuition suggested that messages were having to work through the levels of the TCP/IP com- munication stack, much of which is irrelevant to communication between processes executing on a single computer system. 3.1 Possible Solutions There are a few possible solutions to this problem, all of which ultimately rely on using the mechanisms provided by the underlying operating system. A simple ap- proach to exploiting the underlying operating system’s IPC facilities is through the use of the Java Native Interface (JNI)[7]. This approach was adopted for our initial investigation, which forms the basis of this paper. The benefits and drawbacks of this approach are discussed in more detail later in this paper, but an obvious (and signifi- cant) disadvantage is the loss of application portability. An alternative approach would be to modify the Java Virtual Machine and the Java compiler to provide direct support for IPC. This would be considerably more complicated, and would require modifications to the Java language, and to the JVM. However, it would probably bring additional performance benefits, and could help address the portability problems inherent in using JNI. 4 The IPC Prototypes Unix-based operating systems provide a wide range of IPC mechanisms. Specifically, the release of Unix System V in the 1980s introduced the so-called System V IPC facilities. These include sophisticated message queues, semaphore sets and shared memory segments. In addition to these, Unix systems support the concept of pipes, and more specifically named pipes, for IPC. An initial prototype library was devel- oped (called LinuxIPC), providing access to these IPC facilities using JNI, for Ubuntu 8.04. This was later ported to the Solaris 10 operating system (called SolarisIPC). Both packages provide relatively complete access to the underlying operating system’s IPC mechanisms, together with other useful support functions. In order to simplify the use of these facilities, further classes were developed that implemented I/O streams using message queues, and shared memory (synchronised using semaphores). These greatly simplify the use of the IPC packages as they may be used in conjunction with Java’s data-formatting and object serialisation streams. The IPC stream classes also provide a useful degree of abstraction, isolating applica- tions from almost all of the details of the IPC system calls. The initial shared memory stream implementation was done using the semaphores and shared memory directly, but the performance was found to be poor, due to the overheads of making native calls. A customised native class was developed that integrated the use of the shared memory and the semaphores into a single native method, thus minimising the num- ber of calls to native methods. This provided a useful performance improvement, 3 PPPJ’10 WiP Poster Abstract Figure 1: Simple Benchmark Results for Solaris. evident in the results below. 5 Results Results for a dual-core Intel processor and Ubuntu were presented previously[8]. These have now been extended with results for an eight-core UltraSPARC T2 pro- cessor running Solaris (Sun T6320 server with Solaris 10). These results illustrate the problems inherent with the use of network-based IPC. The results shown in Figure 1 are for a simple communication benchmark, and highlight the relative efficiency of each of the different IPC methods (network sock- ets, named pipes/FIFOs, message queues, shared memory and semaphores, and op- timised shared memory streams), comparing them with network sockets. The times reported are for a round-trip message between two processes, carrying minimal data. As is clear from the figure, explicitly using sockets and semaphores is less efficient, but the other forms of IPC are more efficient than using sockets. The use of named pipes is by far the most efficient of the mechanisms, which is not surprising, as it only uses JNI in order to create the named pipe, whereafter it is accessed using standard file streams. As can also be seen from these results, the integrated shared memory streams are more efficient than the version using explicit calls to the shared memory and semaphore facilities, as discussed in Section 4 (the improvement is 19.4%). Performance was also investigated for varying volumes of data (Figure 2). These results follow the expected pattern of the time taken to send a message increasing with the data volume. Notably, the named pipe version is also the most efficient for all message sizes, by a significant margin. However, the network-based version is more efficient than the other forms of IPC for larger messages. 4 PPPJ’10 WiP Poster Abstract Figure 2: Results for Varying Size Data Transfers. 6 Discussion and Conclusions The results presented above clearly indicate that network sockets are not the most efficient mechanism for providing IPC in Java, thus confirming our initial concerns about this approach. The results also indicate that named pipes are the most efficient form of communication by a significant margin. This is particularly pleasing because using named pipes makes minimal use of JNI, thus minimising the impact on pro- gram portability1 . 6.1 Possible Extensions and Future Research The research described in this paper presents a preliminary overview of the problems involved in IPC for Java processes, and a prototype of one possible solution. Notably, our work until now has been restricted to Unix-based operating systems. An inves- tigation of the IPC facilities available under Windows would be useful, leading to the development of a JNI-based solution. In particular, this would allow for an initial assessment of the performance of native IPC facilities compared with socket-based communication under Windows. Similar research for Mac OS X and other common operating systems would also be useful in terms of characterising the extent to which there is a need for alternatives to socket-based communication for Java processes. Such a survey of the IPC facilities offered by different operating systems would also be useful in terms of establishing what, if any, mechanisms are common to all widely- used operating systems. If a common subset of IPC facilities could be identified, a Java package could be developed for a common abstraction, providing portability at the source code level. As mentioned in Section 3.1, a potentially better solution than using JNI would be to extend the Java language with IPC operations. This would provide a great deal of power and flexibility, and could take almost any form desired. In particular, some form of light-weight remote-object-access protocol (similar to RMI, but intended only 1 The use of JNI may even be avoided completely, as the named pipes can be created independently. 5 PPPJ’10 WiP Poster Abstract for IPC in an SMP environment) would be very useful, as it would allow program- mers to use familiar, object-oriented techniques to build parallel applications with good support for separation of concerns, and security. Whatever the form of the final solution, we believe that the current trend towards the wide-spread use of SMP architectures will continue, and will provide significant challenges for the development of software able to exploit the potential performance of these systems. While Java’s multithreading facilities provide an excellent solu- tion to many problems, they do not provide adequate isolation of semi-independent program components. Furthermore, the use of network-based mechanisms has been shown to offer poor performance for IPC in an SMP environment. The research pre- sented here provides an initial characterisation of these issues, and some indication of the potential for improved performance. We hope the Java community will start to explore these issues more widely, and that ultimately a portable, generic and high- performance solution will be found. Acknowledgments This research was performed while visiting the Department of Computer Science at the Uni- versity of California Davis at the kind invitation of Dr. Raju Pandey. Access to the Sun multi- core processor hardware for the Solaris testing was generously provided by Sun Microsystems through the Sun Partner Advantage Program. This project was funded by the South African National Research Foundation (NRF). Financial support was also received from the Distributed Multimedia Centre of Excellence (funded by Telkom, Comverse, Tellabs, Stortech, Amatole Telecom Services, Bright Ideas 39 and THRIP), and from Rhodes University. References [1] James Larus. Spending Moore’s dividend. Commun. ACM, 52(5):62–69, 2009. [2] D. Lea. Concurrent Programming in Java: Design Principles and Patterns. Prentice Hall, 2nd edition, 1999. [3] Java Community Process. JSR 166: Concurrency utilities. September 2004. [4] Philip Bishop and Nigel Warren. JavaSpaces in Practice. Addison Wesley, 2002. [5] David Gelernter. Generative communication in Linda. ACM Trans. Program. Lang. Syst., 7(1):80–112, January 1985. [6] G.C. Wells, A.G. Chalmers, and P.G. Clayton. Linda implementations in Java for concurrent systems. Concurrency and Computation: Practice and Experience, 16:1005– 1022, August 2004. [7] Sun Microsystems, Inc. Java native interface 5.0 specification. 2003. [8] G.C. Wells. Interprocess communication in Java. In H.R. Arabnia, editor, Proc. 2009 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’09), pages 407–413, Las Vegas, July 2009. CSREA Press. 6