<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RuLAM Project:</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivo Anjo</string-name>
          <email>ivo.anjo@ist.utl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Cachopo ESW/INESC-ID Lisboa</string-name>
          <email>joao.cachopo@ist.utl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Superior Técnico (UTL) Rua Alves Redol 9</institution>
          ,
          <addr-line>1000-029 Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The era of multicore processors, capable of running multiple tasks concurrently, has arrived. Sadly, most existing software and even new applications being developed are not ready to take advantage of these new multiprocessing capabilities, and, thus, more processing cores do not translate into better performance when executing these applications. To tackle this problem, we envision the creation of a runtime environment that is able to parallelize automatically existing sequential applications without requiring any manual modification to those applications. To achieve this goal, we are exploring thread-level speculation supported by a Software Transactional Memory (STM), rather than relying on hardware-based mechanisms. In this paper, we present our work in the RuLAM project, which targets speculative parallelization on the Java platform.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The transition to multicore architectures is ongoing. Modern chip designers have
resigned their quest for sequential execution speed, and have shifted their resources to
building multiprocessing machines that can work on many tasks concurrently.</p>
      <p>It is only when all of the available processing cores are working that the new
multicore machines fully realize their potential. Yet, most of the software we run on
our computers today fails to take advantage of multicores, having little, if any,
parallelism.</p>
      <p>
        Regrettably, concurrent programming still presents a far greater challenge than
sequential programming. Furthermore, even though new applications may take
advantage of multicore architectures, a vast majority of existing code is still sequential
and it is not feasible to rewrite it within a reasonable time frame. Thus, an enticing
alternative is to parallelize applications automatically. In fact, there is a considerable
amount of research on parallelizing compilers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that has produced results in the area
      </p>
      <p>
        This work was supported by the FCT (INESC-ID multiannual funding) through the PIDDAC Program
funds and by the RuLAM project (PTDC/EIA-EIA/108240/2008).
of high performance computing for scientific applications. The problem is that
parallel computing is no longer confined to the realms of these applications. The programs
that we want to parallelize today require new approaches and techniques [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        This work explores a more dynamic approach — speculative parallelization — in
which parallelization decisions are optimistic. There have been other proposals for
both hardware [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] and software-based [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] speculative parallelization systems.
We propose a system supported by Software Transactional Memory (STM), which
provides a model similar to database transactions, but over memory read and write
operations. Under STM, a thread can begin a transaction, then proceed to operate in
isolation and finally it can atomically apply its changes at the end of the transaction,
after successful validation. We argue that such a system has several advantages over
hardware-supported approaches. First, because STM-based executions may be
unbounded, we can extend the range of possible parallelizations, thereby increasing the
potential for extracting parallelism from applications. Second, we may apply these
techniques to applications that run on machines that do not support transactional
execution, including all of the current mainstream hardware. Finally, we may leverage
on much of the intense research being done in the area of transactional memory.
      </p>
      <p>The differentiating feature of our approach is that we are targeting long-running
speculative executions, that may span multiple method calls, spawn other speculative
executions, and be able to execute non-transactional operations such as I/O without
aborting and discarding potentially useful work.</p>
      <p>To achieve our targets, we have been working on modifications to the JVSTM
software transactional memory, and on implementing a speculative parallelization
framework to support our research — the JaSPEx framework.</p>
      <p>The remainder of this short paper is organized as follows. In Section 2, we discuss
the usage of software transactional memory for speculative parallelization. Then, in
Section 3, we present the JaSPEx framework, and, finally, in Section 4, we summarize
the current findings and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Speculative parallelization based on STM</title>
      <p>
        In the RuLAM project, our first prototypes were based on the Java Versioned Software
Transactional Memory (JVSTM) [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], which is a pure Java STM library. The JVSTM
is a versioned STM: memory positions keep not only the latest values committed
to them, but also the history of older values written in the past. This arrangement
allows read-only transactions to never conflict with any other concurrent transaction,
favoring applications that have a high read/write transaction ratio.
      </p>
      <p>
        Yet, using an STM for supporting thread-level speculation (TLS) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides a
very different use-case from normal STM usage. In particular, issues such as commit
ordering, the atomicity model provided, and the nesting model supported present
opportunities for optimization. To take advantage of this fact, we are creating a
modified version of the JVSTM that is optimized for TLS usage, and that can be used as
a drop-in replacement for the regular JVSTM in JaSPEx, our parallelization research
framework.
      </p>
      <p>One of our goals is to allow speculative executions to go beyond what most
current TLS systems allow. This includes the possibility of speculative executions
spawning other speculative executions, and so we are also working on replacing the current
linear nesting model on the JVSTM, where concurrency within transactions is
disallowed, with one that allows parallelism when using nested transactions, so that a
hierarchy of parent, sibling and children transactions inside a single top-level
transaction may run concurrently.</p>
      <p>Because the JVSTM is a normal Java library, applications must first be modified to
behave transactionally under its control. This functionality is currently provided by
the transactification part of the JaSPEx framework. Moreover, because a pure-library
STM imposes certain limitations and overheads, we are investigating the possibility
of implementing the STM at the Java Virtual Machine (JVM) level, to improve
performance and allow better cooperation with the JVM.
3</p>
    </sec>
    <sec id="sec-3">
      <title>The JaSPEx parallelization research framework</title>
      <p>We have created the Java Speculative Parallel Executor (JaSPEx) framework as a
testing ground for software TLS parallelization techniques. It is implemented as a Java
library that can run on any Java Virtual Machine, and provides a custom class loader
that intercepts the loading of Java classes in bytecode form, and modifies them to
execute speculatively in parallel.</p>
      <p>Because the JaSPEx framework works directly on the application bytecode, it can
be applied to any language that can be compiled to run on the JVM – not just Java –
and it does not need access to the original source code.
3.1</p>
      <sec id="sec-3-1">
        <title>Transactification of Applications</title>
        <p>Before applying any further changes, applications are modified to run
transactionally under the control of the STM, a step we call the “transactification” of the
application. This step involves intercepting and modifying accesses to class and instance
fields, to arrays (for which currently there is only limited support), and handling of
non-transactional operations, such as calls to native code, I/O operations, and other
operations that cannot be undone by the STM.</p>
        <p>Two strategies are supported for handling non-transactional operations that occur
during a speculative execution: the speculative execution can be aborted, or it can be
dynamically transitioned to a non-speculative execution after validation, allowing
the already accomplished work to be merged with the global application state.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Speculative Execution</title>
        <p>After transactification, the framework identifies the tasks that are candidates for
speculative execution. Multiple task identification strategies can be used for this step.</p>
        <p>Whenever code execution reaches a possible task spawn point, a call is made into
the framework, allowing it to decide if the execution should proceed in parallel or
not, a decision that may also be influenced by the task scheduler.</p>
        <p>
          If the framework decides to proceed with the speculative execution, one or more
tasks are created and queued for execution by an ExecutorService, which keeps
a pool of worker threads. Tasks wait on other tasks for their results in a scheme
similar to fork/join [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] parallelism, but our framework differs from other fork/join-like
approaches by being automatic and not needing programmer input.
3.3
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Task Scheduling</title>
        <p>Also being developed is a scheduler that supports multiple strategies for task
scheduling. Our framework-level scheduler combines static code information with past
speculation statistics to try to minimize speculation conflicts and make better use of the
available machine resources.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future work</title>
      <p>In this paper, we gave a brief presentation of the RuLAM project, its objectives, and
our current work. We argued that an automatic parallelization approach should be
based on an STM, and we briefly described some of the aspects that must be taken
into account by the STM, if it is to be used to support speculative parallelization.</p>
      <p>We also presented JaSPEx, a software TLS framework that we created to
experiment with parallelization techniques. We described some of its functions, such as
transactifying an application and speculative task selection, creation, and scheduling.</p>
      <p>Preliminary results show that some benchmarks can benefit from our approach,
but our implementation has to be further fine-tuned to reduce execution overheads.
As the current JaSPEx framework works on an unmodified JVM, we plan to
investigate the effects of moving some of its functions inside the JVM itself.</p>
      <p>Finally, we hope that our approach will allow us to uncover new, untapped
parallelism in common applications. We also hope to gain insights on why some
applications cannot be parallelized, and how they could be changed to benefit from
parallelization.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Blume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eigenmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Faigin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoeflinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Padua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pottenger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rauchwerger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tu</surname>
          </string-name>
          , et al.
          <article-title>Polaris: The Next Generation in Parallelizing Compilers</article-title>
          .
          <source>In Proceedings of the Seventh Workshop on Languages and Compilers for Parallel Computing</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.T.</given-names>
            <surname>Oplinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.L.</given-names>
            <surname>Heine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.S.</given-names>
            <surname>Lam</surname>
          </string-name>
          .
          <article-title>In search of speculative thread-level parallelism</article-title>
          .
          <source>In 1999 International Conference on Parallel Architectures and Compilation Techniques</source>
          ,
          <year>1999</year>
          . Proceedings, pages
          <fpage>303</fpage>
          -
          <lpage>313</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tuck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ceze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Strauss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Renau</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Torrellas.</surname>
          </string-name>
          <article-title>POSH: a TLS compiler that exploits program structure</article-title>
          .
          <source>In PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming</source>
          , pages
          <fpage>158</fpage>
          -
          <lpage>167</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.K.</given-names>
            <surname>Chen</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Olukotun</surname>
          </string-name>
          .
          <article-title>The Jrpm system for dynamically parallelizing Java programs</article-title>
          .
          <source>In Proceedings of the 30th annual international symposium on Computer architecture</source>
          , pages
          <fpage>434</fpage>
          -
          <lpage>446</lpage>
          . ACM New York, NY, USA,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pickett</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Verbrugge</surname>
          </string-name>
          .
          <article-title>Software thread level speculation for the Java language and virtual machine environment</article-title>
          .
          <source>Languages and Compilers for Parallel Computing</source>
          , pages
          <fpage>304</fpage>
          -
          <lpage>318</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.F.</given-names>
            <surname>Spear</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kelsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dalessandro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.L.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ding</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Fastpath speculative parallelization</article-title>
          .
          <source>In Proceedings of the Workshop on Languages and Compilers for Parallel Computing</source>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cachopo</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rito-Silva</surname>
          </string-name>
          .
          <article-title>Versioned boxes as the basis for memory transactions</article-title>
          .
          <source>Science of Computer Programming</source>
          ,
          <volume>63</volume>
          (
          <issue>2</issue>
          ):
          <fpage>172</fpage>
          -
          <lpage>185</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cachopo. JVSTM - Java Versioned Software Transactional Memory</surname>
          </string-name>
          ,
          <year>2008</year>
          . http://web.ist.utl.pt/joao.cachopo/jvstm/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lea</surname>
          </string-name>
          .
          <article-title>A Java fork/join framework</article-title>
          .
          <source>In Proceedings of the ACM 2000 conference on Java Grande</source>
          , pages
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          . ACM New York, NY, USA,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>