<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Proceedings of the 1st International Workshop on Comparative Empirical Evaluation of Reasoning Systems (COMPARE 2012)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Comparative Empirical Evaluation of Reasoning Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Armin Biere</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geoff Sutcliffe</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Formal Models and Verification, Johannes Kepler University</institution>
          ,
          <addr-line>Altenbergerstr. 69, 4040 Linz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Miami</institution>
          ,
          <addr-line>P.O. Box 248154, Coral Gables, FL 33124-4245</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <volume>30</volume>
      <issue>2012</issue>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Preface</title>
      <p>COMPARE 2012. Copyright © 2012 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.</p>
      <p>This volume contains the proceedings of the 1st International Workshop on Comparative Empirical Evaluation of Reasoning Systems (COMPARE 2012), held on June 30th, 2012 in Manchester, UK, in conjunction with the International Joint Conference on Automated Reasoning (IJCAR).</p>
      <p>It has become accepted wisdom that regular comparative evaluation of reasoning systems helps to focus research, identify relevant problems, bolster development, and advance the field in general. Benchmark libraries and competitions are two popular approaches to this end. The number of competitions has been increasing rapidly of late. At the moment, we are aware of about a dozen benchmark collections and two dozen competitions for reasoning systems of different kinds. It is time to compare notes.</p>
      <p>What are the proper empirical approaches and criteria for effective comparative evaluation of reasoning systems? What are the appropriate hardware and software environments? How should the usability of reasoning systems be assessed, in particular of systems that are used interactively? How should benchmarks and problem collections be designed, acquired, structured, published, and used?</p>
      <p>The aim of the workshop was to advance comparative empirical evaluation by bringing together current and future competition organizers and participants, maintainers of benchmark collections, as well as practitioners and the general scientific public interested in the topic.</p>
      <p>We wish to sincerely thank all the authors who submitted their work for consideration. All submitted papers were peer-reviewed, and we would like to thank the Program Committee members as well as the additional referees for their great effort and professional work in the review and selection process. Their names are listed on the following pages. We are deeply grateful to our invited speakers, Leonardo de Moura (Microsoft Research) and Cesare Tinelli (University of Iowa), for accepting the invitation to address the workshop participants. We thank Sarah Grebing for her help in organizing the workshop and compiling this volume.</p>
      <p>June 2012</p>
      <p>Vladimir Klebanov, Bernhard Beckert, Armin Biere, and Geoff Sutcliffe</p>
    </sec>
    <sec id="sec-2">
      <title>Program Committee</title>
      <p>Bernhard Beckert, Karlsruhe Institute of Technology, Germany
Christoph Benzmüller, Free University Berlin, Germany
Dirk Beyer, University of Passau, Germany
Armin Biere, Johannes Kepler University Linz, Austria
Vinay Chaudhri, SRI International, USA
Koen Claessen, Chalmers Technical University, Sweden
Alberto Griggio, Fondazione Bruno Kessler, Italy
Marieke Huisman, University of Twente, the Netherlands
Radu Iosif, Verimag/CNRS/University of Grenoble, France
Vladimir Klebanov, Karlsruhe Institute of Technology, Germany
Rosemary Monahan, National University of Ireland Maynooth
Michał Moskal, Microsoft Research, USA
Jens Otten, University of Potsdam, Germany
Franck Pommereau, University of Evry, France
Sylvie Putot, CEA-LIST, France
Olivier Roussel, CNRS, France
Albert Rubio, Universitat Politècnica de Catalunya, Spain
Aaron Stump, University of Iowa, USA
Geoff Sutcliffe, University of Miami, USA</p>
    </sec>
    <sec id="sec-3">
      <title>Program Co-Chairs</title>
      <p>Bernhard Beckert, Karlsruhe Institute of Technology, Germany
Armin Biere, Johannes Kepler University Linz, Austria
Vladimir Klebanov, Karlsruhe Institute of Technology, Germany
Geoff Sutcliffe, University of Miami, USA</p>
    </sec>
    <sec id="sec-4">
      <title>Organising Committee</title>
      <p>Bernhard Beckert, Karlsruhe Institute of Technology, Germany
Sarah Grebing, Karlsruhe Institute of Technology, Germany
Vladimir Klebanov, Karlsruhe Institute of Technology, Germany</p>
    </sec>
    <sec id="sec-5">
      <title>Additional Referees</title>
      <p>Sarah Grebing, Karlsruhe Institute of Technology, Germany</p>
    </sec>
    <sec id="sec-6">
      <title>Table of Contents</title>
      <p>Regression Tests and the Inventor's Dilemma (Leonardo de Moura), p. 1
Introducing StarExec: a Cross-Community Infrastructure for Logic Solving (Aaron Stump, Geoff Sutcliffe, and Cesare Tinelli), p. 2</p>
      <sec id="sec-6-1">
        <title>Contributed Papers</title>
        <p>Evaluating the Usability of Interactive Verification Systems (Bernhard Beckert and Sarah Grebing), p. 3
Broadening the Scope of SMT-COMP: the Application Track (Roberto Bruttomesso and Alberto Griggio), p. 18
A Simple Complexity Measurement for Software Verification and Software Testing (Zheng Cheng, Rosemary Monahan, and James Power), p. 28
Benchmarking Static Analyzers (Pascal Cuoq, Florent Kirchner, and Boris Yakobowski), p. 32
The 2nd Verified Software Competition: Experience Report (Jean-Christophe Filliâtre, Andrei Paskevich, and Aaron Stump), p. 36
On the Organisation of Program Verification Competitions (Marieke Huisman, Vladimir Klebanov, and Rosemary Monahan), p. 50
Challenges in Comparing Software Verification Tools for C (Florian Merz, Carsten Sinz, and Stephan Falke), p. 60
Behind the Scene of Solvers Competitions: the "evaluation" Experience (Olivier Roussel), p. 66
Author Index, p. 78</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>