Document Changes: Modeling, Detection, Storage and Visualization (DChanges 2013)

Gioele Barabucci
Department of Computer Science
Università di Bologna
Bologna, Italy
barabucc@cs.unibo.it

Angelo Di Iorio
Department of Computer Science
Università di Bologna
Bologna, Italy
diiorio@cs.unibo.it

Uwe M. Borghoff
Institute for Software Technology
Universität der Bundeswehr München
Neubiberg, Germany
uwe.borghoff@unibw.de

Sonja Maier
Institute for Software Technology
Universität der Bundeswehr München
Neubiberg, Germany
sonja.maier@unibw.de

1. PREFACE
This volume contains the proceedings of DChanges 2013, the first International Workshop on Document Changes: Modeling, Detection, Storage and Visualization. The workshop will be held at the 13th ACM Symposium on Document Engineering (DocEng 2013) in Florence, Italy, in September 2013.

The goal of this workshop is to share ideas, common issues and principles about diff models and algorithms, change tracking, collaborative editing and document versioning. We want to look at these topics from different perspectives and to understand which issues are the most common and which are peculiar to each domain and each approach.

1.1 Keynote
There is a great overlap between document engineering and software engineering with regard to these issues. The keynote, given by Ethan Munson and titled Collaborative Authoring Requires Advanced Change Management, is about such overlap and possible synergies.

The core idea is that authors collaborating on textual documents should have at least the same tools that software engineers use when collaborating on source code, with some important differences. These differences are mainly due to uncertainty and variability (for instance, the authors might not trust each other and might require third-party validation) and to the lack of rigorous validation techniques (like compilers for software engineers). More importantly, non-technical users are not expected to have the same expertise as software engineers in using version control systems. The talk envisions a new generation of versioning and merging/branching tools tailored for non-expert users, with simplified interfaces, more precise algorithms and more precise models for representing changes. It also describes some results on diff and merging algorithms (and in particular the idea of version-aware documents) that are going in this direction.

Bio. Ethan V. Munson is Professor and Co-Chair for Computer Science in the Department of EECS at the University of Wisconsin-Milwaukee. He received a Ph.D. in Computer Science from the University of California, Berkeley in 1994. He also holds an MS in Computer Science (UC Berkeley, 1989) and Bachelor's degrees in Computer Science (UCSD, 1986) and Psychology (UCSD, 1978).

Dr. Munson's research has focused on tools for managing software documents (style sheets, program editors, version control, and build systems) and on human-computer interaction (programming interfaces and system latency). Dr. Munson is a recipient of a National Science Foundation CAREER award, several research grants from industry, and four NSF educational grants. He is active in conference organization, especially with the ACM Symposium on Document Engineering, and was Chair of ACM SIGWEB from 2006 to 2011.

1.2 Research Papers
The core sessions of the workshop are on research papers. We received 12 submissions from all around the world, among which 7 papers were selected after a single-blind review process. The papers cover both practical and theoretical issues and can be clustered around three main topics: XML change management, understanding the evolution of non-textual documents and data structures, and distributed collaborative authoring.

The first session deals with the management of changes in XML documents.

In Merging Uncertain Multi-Version XML Documents, the authors focus on versioning uncertain XML documents. The problem is very challenging considering that unreliable information exists for each contribution and that some contributions cannot be trusted or merged at all. The paper presents a reliable and fast algorithm for merging versions in such a scenario, together with its proof of correctness. It is part of a larger framework, which will be presented at the main conference.

The paper An Algorithm for Transforming XPath Expressions According to Schema Evolution deals with evolving XML schemas and documents. It studies how to automatically update queries on XML documents when these documents change to meet changes in their validating schemas. The authors present a novel algorithm based on tree automata, together with some experimental results. Though limited to only a few XPath axes, the algorithm is very efficient and extensible.

The second session focuses on documents and data structures that are more complex than simple text.

In The Concept Difference for EL-Terminologies Using Hypergraphs, the authors focus on automatic detection of the logical difference between ontologies. The logical difference is defined as the set of queries that produce different answers on the ontologies being compared. Their approach consists of modeling ontologies as hypergraphs, calculating simulations between hypergraphs and converting them back to differences between ontological axioms. The paper presents solid theoretical work and anticipates possible extensions to richer logics.

The paper Staged Evolution with Quality Gates for Model Libraries deals with the evolution of model libraries. The authors put forward their quality staged model evolution theory for model libraries. Their theory is founded on evolution graphs, which offer a structure for model evolution in model libraries through evolution steps. These evolution steps eventually form a sequence, which can be partitioned into stages by quality gates. Each quality gate is defined by a lightweight quality model and respective characteristics fostering reusability.

The paper Identifying Change Patterns in Software History focuses on diffing and versioning source code. The overall goal is to identify patterns of code changes, in order to better understand how a given codebase has evolved. The authors propose a layered approach that works on the AST (abstract syntax tree) representations of versioned files and combines a tree diff algorithm and similarity grouping techniques to cluster low-level changes into higher-level patterns. The approach requires a few customizations to also work on other programming languages. Experimental analyses of two Java projects are also presented in this work.

The last session of papers is on distributed collaborative authoring.

The paper Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing tackles the problem of building distributed editors with CRDT (Conflict-free Replicated Data Type). This approach consists of modeling a document as a sequence of items, each with a globally unique identifier, and providing insert/delete operations on those items. The dynamic generation of these identifiers is a challenging problem, since limited consumption of resources and reliable propagation of identifiers must be guaranteed. The paper presents a novel strategy for such a generation that works well with a large number of users and has a very limited impact on latency.

In Tracking Changes Through EARMARK: a Theoretical Perspective and an Implementation, the authors deal with changes on markup structures. Their goal is to define in a precise and unambiguous way when (and how) the same markup element has to be considered as changed if its content changes. The authors propose a theoretical representation of change tracking information based on FRBR (Functional Requirements for Bibliographic Records) that also provides support for expressing provenance information. Their implementation of the framework is based on EARMARK, a Semantic Web-based meta-model that allows a fine-grained definition of overlapping structures on plain content.

1.3 Round-table Session
The workshop also includes a round-table session. The outcome of the discussion is not reported in these proceedings, since the workshop has not yet been held. Some topics that will be discussed are: distributed editing issues (following up on the keynote), human interpretation of changes and quality of deltas, and identification of editing patterns in other domains like law-making and humanities. Updated details will be published on the workshop web page http://diff.cs.unibo.it/dchanges2013/roundtable/.

Suggestions from the audience will be encouraged throughout the workshop. Our goal is to foster research collaboration and also to identify topics for a second edition. We hope to have more and more interesting editions of DChanges in the future and to gather a lively community of researchers around these themes.

1.4 Acknowledgements
In conclusion, we would like to thank everyone who has expressed interest in DChanges and the organizers of DocEng, in the first place Simone Marinai and Kim Marriott, for giving us the possibility of organizing it and for supporting us continuously.

Our thanks go to the committee members, for their hard work in circulating the call for papers and reviewing papers (perfectly on time!).

Special thanks go to Ethan Munson, for his illuminating keynote.

We wish you a very good read,
The DChanges chairs

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA 3.0). To view a copy of the license, visit http://creativecommons.org/licenses/by-sa/3.0/.
In DChanges 2013, September 10th, 2013, Florence, Italy.
ceur-ws.org Volume 1008, http://ceur-ws.org/Vol-1008/preface.pdf

2. COMMITTEE
The workshop has been organized by four people, from two research groups. They have been helped by a committee of experts from all around the world.

Organizers

• Gioele Barabucci is a research fellow at Università di Bologna. He recently received his PhD with a thesis on diff algorithms and delta models.
• Uwe M. Borghoff is a full professor of Computer Science at Universität der Bundeswehr München. With his research group, he published various papers on algorithms for comparing textual documents and on related topics.
• Angelo Di Iorio is an assistant professor at Università di Bologna. He worked on various systems for document versioning and publishing, and collaborative editing.
• Sonja Maier is a Postdoc at Universität der Bundeswehr München. In her research, she focuses on tool creation and tool integration for (visual) domain-specific languages, and is interested in tracking the evolution of text and diagrams.

Committee members

• Serge Autexier, DFKI Bremen
• Stéphane Ducasse, INRIA Lille Nord Europe research center
• Boris Konev, University of Liverpool
• John Lumley
• Pascal Molli, Université de Nantes - LINA
• Sebastian Rönnau
• Wolfgang Stürzlinger, York University
• Yannis Tzitzikas, University of Crete and FORTH-ICS
• Fabio Vitali, Università di Bologna
• Jean-Yves Vion-Dury, Xerox Research Centre Europe

Additional reviewers:

• Emmanuel Desmontils
• Christina Lantzaki
• Brice Nédelec

Table of Contents

Merging Uncertain Multi-Version XML Documents
Mouhamadou Lamine Ba, Talel Abdessalem and Pierre Senellart

Identifying Change Patterns in Software History
Jason Dagit and Mathew Sottile

The Concept Difference for EL-Terminologies using Hypergraphs
Andreas Ecke, Michel Ludwig and Dirk Walther

An Algorithm for Transforming XPath Expressions According to Schema Evolution
Kazuma Hasegawa, Kosetsu Ikeda and Nobutaka Suzuki

Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing
Brice Nédelec, Pascal Molli, Achour Mostefaoui and Emmanuel Desmontils

Tracking changes Through EARMARK: a Theoretical Perspective and an Implementation
Silvio Peroni, Francesco Poggi and Fabio Vitali

Staged Evolution With Quality Gates for Model Libraries
Alexander Roth, Andreas Ganser, Horst Lichter and Bernhard Rumpe