<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extending Java's Communication Mechanisms for Multicore Processors</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>George C. Wells Department of Computer Science Rhodes University Grahamstown</institution>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the current trend towards the increased use of multicore processors, there is a growing need for simple, efficient parallel programming mechanisms. While Java has good support for multithreaded and distributed application development, our research into tuple-space systems for multicore processors highlighted a gap in the concurrency facilities available in Java. This arises in the context of independent applications (running in separate virtual machines) that need to synchronise their activities or communicate with each other. There are several possible solutions to this problem, ranging from extensions to the language and/or runtime environment through to the use of distributed programming methods. Using the latter introduces considerable performance overheads, and so we explored the use of the Java Native Interface in order to take advantage of the interprocess communication (IPC) facilities provided by the underlying operating system. The analysis and comparison of the performance of the standard approaches and our prototype library suggest that there are real benefits to be gained by alternative approaches to the provision of IPC mechanisms for independent Java programs executing on multicore systems. We hope that these findings will spur further investigation of this problem and other possible solutions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In recent years there has been a dramatic shift in the trends of computer
architecture development, as processors have adopted symmetric multiprocessor (SMP)
techniques, exemplified by the increasing prevalence of multicore processors (with
increasing numbers of cores). This shift has serious implications for software
development practises, forcing programmers to adopt parallel programming techniques.
However, parallel programming is not simple — as Jim Larus notes “The popular
[parallel] programming models ... are performance-focused, error-prone abstractions
that developers find difficult to use”[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Java provides a wide range of parallel and distributed programming techniques.
There has always been good support for multithreaded programs. At the other end
of the parallel–distributed programming spectrum, Java has strong support for
various distributed programming models. However, there is a distinct gap in the
concurrency tools available in Java when it comes to interprocess communication between
Java programs running on a shared-memory system in separate virtual machines.
This system configuration can be extremely useful in some situations, but is currently
only supported by means of distributed programming mechanisms, using the
“loopback” network connection. Our recent research has involved systems for multicore
processors, using separate Java processes. When we encountered disappointing
performance we were prompted to investigate this issue, and have subsequently
developed initial prototypes of a possible solution. These problems, possible solutions and
our prototypes are discussed in this paper.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Concurrency in Java</title>
      <p>
        Java has always provided comprehensive support for multithreaded applications,
with coordination provided through object-level locking[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This model uses a
sharedaddress-space model — specifically, the communicating threads must be executing in
the context of a single Java Virtual Machine (JVM). More recently, the basic
multithreaded facilities offered by Java were extended through the provision of the Java
Concurrency Utilities[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. With these sophisticated facilities available, there is very
good support for the development of sophisticated parallel/multithreaded
applications in a shared-memory, shared-address-space environment.
      </p>
      <p>Java also provides very good support for distributed applications running across
networks. At the lowest level, Java provides basic network classes that allow for the
creation and use of sockets. In conjunction with I/O streams and object serialisation,
these provide a simple, but flexible and powerful communication mechanism. At a
slightly higher level of abstraction, the Enterprise Edition provides a comprehensive
messaging API, known as the Java Message Service (JMS).</p>
      <p>One of the most well-known forms of distributed computing is remote
procedure/method calling. In Java this is provided by both Remote Method Invocation
(RMI) and the Common Object Request Broker Architecture (CORBA).</p>
      <p>
        At one of the highest levels of abstraction, Java provides support for object- or
tuple-space systems through JavaSpaces[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which is based on the Linda
coordination language developed at Yale[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As a very high-level abstraction of the
coordination activities (i.e. communication and synchronisation) of a distributed application,
Linda is very easy to use, but performance may be problematic. The development of
Linda systems in Java and their optimisation has long been a focus of ours[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and it
was our research in this area that led to the investigation described in this paper.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 The Problem</title>
      <p>The trend towards multicore processors led us to develop an implementation of our
Linda system (eLinda) for such systems. A vital part of any Linda system is the
component that manages the data. In our current system this is implemented as a
“server”, run as a stand-alone process. There are many good reasons for this, not
least that it provides a very useful separation between the eLinda system and the
client applications making use of it. Besides the logical separation of concerns that
this architecture provides, it also helps address security or reliability concerns, as the
interaction between a client application and the eLinda system is limited to the
explicit communication between the processes. For example, the eLinda system cannot
access any data that is not explicitly passed to it, and exceptions and errors that might
arise in the eLinda system do not directly impact the client application(s).</p>
      <p>We believe that as the use of multicore processors increases it will be increasingly
important to support development of complex applications composed of separate
processes. This application architecture bridges the gap between multithreaded
applications, and distributed applications. However, at present there are few options
available for communication between separate processes in Java. The only
widelyavailable solution is to make use of Java’s distributed programming facilities (using
the loop-back network). When we used this for the eLinda system, the limitations of
this approach became apparent, especially in respect of performance. Our intuition
suggested that messages were having to work through the levels of the TCP/IP
communication stack, much of which is irrelevant to communication between processes
executing on a single computer system.
3.1</p>
      <sec id="sec-3-1">
        <title>Possible Solutions</title>
        <p>
          There are a few possible solutions to this problem, all of which ultimately rely on
using the mechanisms provided by the underlying operating system. A simple
approach to exploiting the underlying operating system’s IPC facilities is through the
use of the Java Native Interface (JNI)[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This approach was adopted for our initial
investigation, which forms the basis of this paper. The benefits and drawbacks of this
approach are discussed in more detail later in this paper, but an obvious (and
significant) disadvantage is the loss of application portability.
        </p>
        <p>An alternative approach would be to modify the Java Virtual Machine and the
Java compiler to provide direct support for IPC. This would be considerably more
complicated, and would require modifications to the Java language, and to the JVM.
However, it would probably bring additional performance benefits, and could help
address the portability problems inherent in using JNI.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 The IPC Prototypes</title>
      <p>Unix-based operating systems provide a wide range of IPC mechanisms. Specifically,
the release of Unix System V in the 1980s introduced the so-called System V IPC
facilities. These include sophisticated message queues, semaphore sets and shared
memory segments. In addition to these, Unix systems support the concept of pipes,
and more specifically named pipes, for IPC. An initial prototype library was
developed (called LinuxIPC), providing access to these IPC facilities using JNI, for Ubuntu
8.04. This was later ported to the Solaris 10 operating system (called SolarisIPC). Both
packages provide relatively complete access to the underlying operating system’s IPC
mechanisms, together with other useful support functions.</p>
      <p>In order to simplify the use of these facilities, further classes were developed that
implemented I/O streams using message queues, and shared memory (synchronised
using semaphores). These greatly simplify the use of the IPC packages as they may
be used in conjunction with Java’s data-formatting and object serialisation streams.
The IPC stream classes also provide a useful degree of abstraction, isolating
applications from almost all of the details of the IPC system calls. The initial shared memory
stream implementation was done using the semaphores and shared memory directly,
but the performance was found to be poor, due to the overheads of making native
calls. A customised native class was developed that integrated the use of the shared
memory and the semaphores into a single native method, thus minimising the
number of calls to native methods. This provided a useful performance improvement,
evident in the results below.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>
        Results for a dual-core Intel processor and Ubuntu were presented previously[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
These have now been extended with results for an eight-core UltraSPARC T2
processor running Solaris (Sun T6320 server with Solaris 10). These results illustrate the
problems inherent with the use of network-based IPC.
      </p>
      <p>The results shown in Figure 1 are for a simple communication benchmark, and
highlight the relative efficiency of each of the different IPC methods (network
sockets, named pipes/FIFOs, message queues, shared memory and semaphores, and
optimised shared memory streams), comparing them with network sockets. The times
reported are for a round-trip message between two processes, carrying minimal data.
As is clear from the figure, explicitly using sockets and semaphores is less efficient,
but the other forms of IPC are more efficient than using sockets. The use of named
pipes is by far the most efficient of the mechanisms, which is not surprising, as it only
uses JNI in order to create the named pipe, whereafter it is accessed using standard
file streams. As can also be seen from these results, the integrated shared memory
streams are more efficient than the version using explicit calls to the shared memory
and semaphore facilities, as discussed in Section 4 (the improvement is 19.4%).</p>
      <p>Performance was also investigated for varying volumes of data (Figure 2). These
results follow the expected pattern of the time taken to send a message increasing
with the data volume. Notably, the named pipe version is also the most efficient for
all message sizes, by a significant margin. However, the network-based version is
more efficient than the other forms of IPC for larger messages.</p>
    </sec>
    <sec id="sec-6">
      <title>Discussion and Conclusions</title>
      <p>The results presented above clearly indicate that network sockets are not the most
efficient mechanism for providing IPC in Java, thus confirming our initial concerns
about this approach. The results also indicate that named pipes are the most efficient
form of communication by a significant margin. This is particularly pleasing because
using named pipes makes minimal use of JNI, thus minimising the impact on
program portability1.
6.1</p>
      <sec id="sec-6-1">
        <title>Possible Extensions and Future Research</title>
        <p>The research described in this paper presents a preliminary overview of the problems
involved in IPC for Java processes, and a prototype of one possible solution. Notably,
our work until now has been restricted to Unix-based operating systems. An
investigation of the IPC facilities available under Windows would be useful, leading to
the development of a JNI-based solution. In particular, this would allow for an initial
assessment of the performance of native IPC facilities compared with socket-based
communication under Windows. Similar research for Mac OS X and other common
operating systems would also be useful in terms of characterising the extent to which
there is a need for alternatives to socket-based communication for Java processes.
Such a survey of the IPC facilities offered by different operating systems would also
be useful in terms of establishing what, if any, mechanisms are common to all
widelyused operating systems. If a common subset of IPC facilities could be identified, a Java
package could be developed for a common abstraction, providing portability at the
source code level.</p>
        <p>As mentioned in Section 3.1, a potentially better solution than using JNI would
be to extend the Java language with IPC operations. This would provide a great deal
of power and flexibility, and could take almost any form desired. In particular, some
form of light-weight remote-object-access protocol (similar to RMI, but intended only
1The use of JNI may even be avoided completely, as the named pipes can be created independently.
for IPC in an SMP environment) would be very useful, as it would allow
programmers to use familiar, object-oriented techniques to build parallel applications with
good support for separation of concerns, and security.</p>
        <p>Whatever the form of the final solution, we believe that the current trend towards
the wide-spread use of SMP architectures will continue, and will provide significant
challenges for the development of software able to exploit the potential performance
of these systems. While Java’s multithreading facilities provide an excellent
solution to many problems, they do not provide adequate isolation of semi-independent
program components. Furthermore, the use of network-based mechanisms has been
shown to offer poor performance for IPC in an SMP environment. The research
presented here provides an initial characterisation of these issues, and some indication
of the potential for improved performance. We hope the Java community will start
to explore these issues more widely, and that ultimately a portable, generic and
highperformance solution will be found.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Acknowledgments</title>
        <p>This research was performed while visiting the Department of Computer Science at the
University of California Davis at the kind invitation of Dr. Raju Pandey. Access to the Sun
multicore processor hardware for the Solaris testing was generously provided by Sun Microsystems
through the Sun Partner Advantage Program. This project was funded by the South African
National Research Foundation (NRF). Financial support was also received from the Distributed
Multimedia Centre of Excellence (funded by Telkom, Comverse, Tellabs, Stortech, Amatole
Telecom Services, Bright Ideas 39 and THRIP), and from Rhodes University.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>James</given-names>
            <surname>Larus</surname>
          </string-name>
          .
          <article-title>Spending Moore's dividend</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>52</volume>
          (
          <issue>5</issue>
          ):
          <fpage>62</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lea</surname>
          </string-name>
          .
          <article-title>Concurrent Programming in Java: Design Principles and Patterns</article-title>
          .
          <source>Prentice Hall, 2nd edition</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Java</given-names>
            <surname>Community</surname>
          </string-name>
          <article-title>Process</article-title>
          . JSR 166:
          <article-title>Concurrency utilities</article-title>
          .
          <source>September</source>
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Philip</given-names>
            <surname>Bishop</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nigel</given-names>
            <surname>Warren</surname>
          </string-name>
          .
          <source>JavaSpaces in Practice. Addison Wesley</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>David</given-names>
            <surname>Gelernter</surname>
          </string-name>
          .
          <article-title>Generative communication in Linda</article-title>
          .
          <source>ACM Trans. Program. Lang. Syst.</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>80</fpage>
          -
          <lpage>112</lpage>
          ,
          <year>January 1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.C.</given-names>
            <surname>Wells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Chalmers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.G.</given-names>
            <surname>Clayton</surname>
          </string-name>
          .
          <article-title>Linda implementations in Java for concurrent systems</article-title>
          .
          <source>Concurrency and Computation: Practice and Experience</source>
          ,
          <volume>16</volume>
          :
          <fpage>1005</fpage>
          -
          <lpage>1022</lpage>
          ,
          <year>August 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Sun</given-names>
            <surname>Microsystems</surname>
          </string-name>
          , Inc.
          <source>Java native interface 5</source>
          .0 specification.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.C.</given-names>
            <surname>Wells</surname>
          </string-name>
          .
          <article-title>Interprocess communication in Java</article-title>
          . In H.R. Arabnia, editor,
          <source>Proc. 2009 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'09)</source>
          , pages
          <fpage>407</fpage>
          -
          <lpage>413</lpage>
          ,
          <string-name>
            <surname>Las</surname>
            <given-names>Vegas</given-names>
          </string-name>
          ,
          <year>July 2009</year>
          . CSREA Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>