=Paper= {{Paper |id=Vol-2226/paper6 |storemode=property |title=Quantifying Collaboration in Synchronous Document Editing |pdfUrl=https://ceur-ws.org/Vol-2226/paper6.pdf |volume=Vol-2226 |authors=Adrian Pace,Louis Baligand,Stian Håklev,Jennifer K. Olsen,Nore de Grez,Bram De Wever |dblpUrl=https://dblp.org/rec/conf/swisstext/PaceBHOGW18 }} ==Quantifying Collaboration in Synchronous Document Editing== https://ceur-ws.org/Vol-2226/paper6.pdf
               Quantifying Collaboration in Synchronous Document Editing
Adrian Pace1 , Louis Baligand1 , Stian Håklev1 , Jennifer K. Olsen1 , Nore de Grez2 , Bram De Wever2

                      1
                          École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
                                          2
                                            Ghent University, Ghent, Belgium
                          {adrian.pace, louis.baligand, stian.haklev, jennifer.olsen}@epfl.ch
                                       {nore.degrez, bram.dewever}@ugent.be

                           Abstract

     Collaborative synchronous writing tools like Google Docs and Etherpad let multiple users edit the same document and see each other's edits in near real-time, simplifying collaboration and avoiding merge conflicts. These tools are used extensively across many domains, including education, in both research and industry. The very nature of needing to constantly synchronize state between multiple users means that very granular editing data is automatically captured and stored. In theory, this data could provide important insights into the editing process, the contributions of the different users, how the text developed over time, and other questions relevant to researchers studying writing from different theoretical and methodological angles. However, the extreme granularity of the data (down to individual key presses) makes analysis very complex. Most research on the automatic analysis of collaborative writing to date has focused on asynchronous writing and looked at the "diffs" between one editing session and the next. In this paper, we present a method and a tool to construct informative operations from text data, as well as preliminary metrics for measuring the collaborative writing process. Additionally, our method adds to previous work in that it can be used to assess the writing during the writing process rather than only being applied to an end product.

In: Mark Cieliebak, Don Tuggener and Fernando Benites (eds.): Proceedings of the 3rd Swiss Text Analytics Conference (SwissText 2018), Winterthur, Switzerland, June 2018

Introduction

Collaborative writing is used extensively in many domains, such as families planning a trip, students collaborating on an essay, researchers writing grant proposals, and company staff coordinating their work. Studies have even shown that 85% of university writing and company reports are collaboratively written (Ede and Lunsford, 1990).

Within educational sciences, researchers have looked at the effect of collaborative writing on the quality of the written product, but also at its effect on students' learning. For example, Storch (2005) found evidence that students who write collaboratively produce shorter but better documents with respect to accuracy, complexity, and cogency. However, it is still an open question which collaborative writing processes are effective and how to measure them, which we begin to explore in this paper.

Collaborative writing generates very granular data, which has great potential for further analysis but is very difficult to collate and analyse. An example of a single database entry might be "John inserted the letter 'a' at position 380", and during an editing session, thousands of these entries are generated. Different editors also store the data slightly differently. For example, ShareDB encodes each operation as a JSON object:

{"seq":20,"v":20,"op":[{"p":["text",19],"si":"d"}],"m":{"ts":1526393562757},"d":"doc"}

whereas Etherpad compresses the data about an operation into a string like this:

Z:z>1|2=m=b*0|1+1$\n

In this paper, we introduce a tool written in Python (https://github.com/chili-epfl/FROG-analytics) that can parse the native database formats of several collaborative editing tools and libraries (currently we support Etherpad and ShareDB, but adding support for other formats would require minimal changes). Beyond parsing individual operations, we combine atomic operations into larger meaningful operations, so that a sentence written consecutively, without any pause longer than n seconds and without moving to another location on the page, goes from being a number of atomic events with time-stamps to being a single operation with a start and an end time.
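To make this grouping step concrete, the sketch below shows one way it might be implemented. The event shape (dicts with 'author', 'timestamp', and 'position' keys) and the threshold values are illustrative assumptions, not the tool's exact internals.

```python
def group_events(events, pause=10.0, max_gap=20):
    """Combine atomic writing events into larger operations.  A new
    operation starts whenever the author changes, the pause since the
    previous event exceeds `pause` seconds, or the event happens more
    than `max_gap` characters away from the previous one."""
    operations = []
    current = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if current:
            prev = current[-1]
            if (event["author"] != prev["author"]
                    or event["timestamp"] - prev["timestamp"] > pause
                    or abs(event["position"] - prev["position"]) > max_gap):
                operations.append(current)
                current = []
        current.append(event)
    if current:
        operations.append(current)
    return operations

events = [
    {"author": "ann", "timestamp": 0.0, "position": 0},
    {"author": "ann", "timestamp": 0.4, "position": 1},
    {"author": "ann", "timestamp": 60.0, "position": 40},  # long pause
]
print(len(group_events(events)))  # → 2
```

The first two key presses merge into one operation; the third, arriving after a long pause at a distant position, starts a new one.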




However, to properly assign user intention to atomic events, we need to make decisions about cut-off points: what threshold of n seconds is appropriate for deciding that a sentence was written in two writing events as opposed to a single one? If I am writing a sentence, quickly backtrack to fix a spelling mistake, and continue writing, should this be modelled as a series of small editing operations, or as a single operation whose fundamental goal is to insert the sentence?

To some extent, there can never be a perfect answer to this, since it will depend on the researcher's research questions. If I am investigating editing behaviour at a very granular level, I might need access to all the spelling-mistake corrections and backtracks. However, if I am interested in how people add and reorganize information, coordinate the development of a document, and negotiate ideas, then I might prefer the writing events to be tracked at a larger grain size.

By analyzing authentic data from students collaboratively editing summaries, we have begun to explore whether we can detect natural breakpoints for different operations. We present an overview of the informed decisions we have made, which we believe will be appropriate and useful to the majority of researchers. Nevertheless, our code is designed so that it is very easy to customize these settings as appropriate.

Once we are able to detect these meaningful operations, we can replay the entire history of a document and annotate each meaningful operation with additional contextual information, so that instead of merely stating that "John wrote the following sentence at 3PM", we can add that "John added a sentence to the end of a paragraph that had been authored by Chen Li", or that "John added a sentence while Rachida was editing a paragraph two pages away".

From this contextual data, we can begin to derive metrics and indicators that summarize different phenomena of interest. In the final section, we introduce a set of early indicators that may be predictive of quality of collaboration or of editing style, and we look at whether there is any correlation between the editing preferences of the students and the metrics extracted from their collaboratively edited documents.

The main contribution of our work is to create scores that can provide insight into the collaborative performance of the authors of a document. It is important to note that we do not know whether a high score is good or bad in terms of collaboration. However, our aim was to numerically measure aspects that could help assess the collaboration in the writing process.

We begin by introducing the data we used. We then describe the implementation of our approach. Finally, we define the metrics and look at how they behave on genuine data collected in an experiment.

Literature Review

Technology for collaborative writing

The two main technologies for collaborative writing are wikis, and text editors or word processors that allow for synchronous editing. Wikis do not support synchronous editing. Rather, each user locks a page for a given time, makes the edits he or she wishes, and commits the change, often with an edit summary and a way for others to see a "diff" of the changes. Given the high granularity of this data and its accessibility through public sites such as Wikipedia, there is a large body of research covering collaborative work in wikis. However, because the edits are made asynchronously, the research in this area does not provide insights into the collaborative writing process of students who are working on a document at the same time.

Tools that allow for synchronous editing are typically based on the theory of Operational Transforms (OT; Sun et al., 2004), which requires each edit to be decomposed into a set of atomic operations, such as "add the character M at position 480", delete, move, change format (for rich-text editors), etc. Since the synchronization of data will never be perfectly real-time, we might have two users, where user1 deletes the character at position 480 and user2 inserts a new character at position 480 at the same time. If user2's edit arrives at the server before user1's, the result would be that user1's operation deletes the newly inserted character, which would not correspond to user1's intention.

To prevent this, OT uses the concept of document versions. Each client keeps track of its latest document version and reports it to the server with each operation. If the server receives two operations run against the same document version, it needs to transform them into a single operation that does not cause any conflict (in this case, delete the existing character at position 480 and then insert the new one, thus preserving both user1's and user2's intentions).

To manage the coordinated version numbers, the OT approach requires a central server and might run into scaling problems above a certain number of people in the same document. Conflict-free Replicated Data Types (CRDTs) have been suggested as a solution. These are data types that carry enough context that the merge could happen on any of the clients, without requiring a central coordinating server (André et al., 2013). Currently, our approach is focused on OT data, as that is the most widely supported approach, but it could be extended to support CRDT systems in the future.

Automatic analysis of collaborative writing

One significant thread of research in the field of collaborative writing has centered on providing feedback to teachers in order to assess the collaborative writing efforts and performances of their students.
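The insert-versus-delete conflict from the OT discussion above can be sketched concretely. This is a toy transform covering only the single-character scenario described in the text; a real OT implementation handles many more cases (multi-character operations, tie-breaking between concurrent inserts, etc.).

```python
def transform_delete_against_insert(delete_pos, insert_pos):
    """Transform a pending single-character delete against a concurrent
    single-character insert made at the same document version.  If the
    insert landed at or before the delete's target, that target has
    shifted one position to the right."""
    if insert_pos <= delete_pos:
        return delete_pos + 1
    return delete_pos

# user1 wants to delete the character at position 480; user2's
# concurrent insert at 480 arrives at the server first, so user1's
# delete must be shifted to 481 to remove the intended character.
print(transform_delete_against_insert(480, 480))  # → 481
```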




However, most of the data has been collected at a very coarse granularity over long writing sequences. In the case of Wikipedia, researchers look at a few snapshots during the writing process and analyze the differences within an article at different points in time (Hu et al., 2016).

With Google Docs, a number of papers have conducted fairly sophisticated analyses of collaborative writing based on automatically requesting the entire document snapshot each minute, or even more frequently, and extracting the user actions by calculating the "diffs" between each snapshot (McNely et al., 2012). One example of this approach is WriteProc, based on a taxonomy of the collaborative writing process by Lowry et al. (2004), which traces semantic changes and changing concepts between different versions (Southavilay et al., 2009, 2010, 2013). However, these are also asynchronous documents where authors rarely wrote simultaneously (Wang et al., 2015; Sun et al., 2014).

The only paper we found that looked directly at the individual operations is by Liu et al. (2017), who used regular expressions to classify the compressed Etherpad operations as add, delete, or move, and then quantified the number of the different types of operations over different time windows.

To summarize, existing research has been tied to specific tools (Google Docs API, Etherpad database), and there has not been any attempt to create an abstract intermediate representation that would allow the same higher-level analysis to be applied to data from multiple platforms. Also, most analyses are done by comparing "diffs" from snapshots rather than by directly accessing the operations, which leads to data loss in situations of synchronous editing (e.g., it is impossible to tell who edited what). Finally, the only attempt at looking at the actual operations did not try to extract semantics, but rather focused on quantifying the kinds of operations over time.

In this paper, we work at a much finer granularity and look at individual edits instead of differences between versions of the document at various points in time. Instead of focusing on the semantics, we created metrics based on the location of the authors' contributions in the document and on how balanced the writing styles of the users are.

Data Collection

Context

To acquire authentic synchronous writing data, we organized an experimental session with 50 Master's students in educational sciences. The session consisted of three phases. The students first completed an online individual questionnaire that queried learners' implicit writing beliefs, measured with the Writing Beliefs Inventory (White and Bruning, 2005), and students' individual writing strategies, based on the Writing Style Questionnaire (Kieft et al., 2008).

The students were then randomly split into groups of three (n=17) and asked to produce a synthesis based on three provided sources within 90 minutes. The instructions asked for a synthesis text of 800-1000 words that would summarize the most important information from the source texts in an integrated and well-organized manner. As preparation for this task, each group member was provided with another source text which they needed to summarize in advance.

The students were not allowed to talk during the session, but communicated with each other via chat and comments in Etherpad (the writing environment). Finally, the students were asked to fill in an online questionnaire about their experience. In this paper, we use data from the collaborative writing process that occurred within the second stage of the experiment to conduct an unsupervised analysis of the traces from the collaborative writing sessions.

Data Logs

Our data was collected from students working on a set of Etherpad documents. Etherpad (http://etherpad.org/) is a highly customizable, open-source online editor that provides collaborative editing in real-time. It allows multiple users to edit the same text document through a web page interface.

Etherpad operations are saved in a database as additions, deletions, or formatting of text (e.g., bolding text). Each change (also referred to as a writing event) is defined by its author, a timestamp, the document version (incremented at each writing event), and the modification, which consists of the position in the document at which the event takes place and the characters to delete or add. This allows a view of the document to be reconstructed over time, rather than only keeping the final product.

For our analysis, we are interested in how users collaborate in writing content, so we focus only on the addition and deletion actions taken within the document. Although our analysis focused on Etherpad data, our implementation works with any software that can log the same types of data streams, as long as the writing events are stored at a fine granularity. Depending on the format the database uses to store the changes, a simple parsing function would have to be written to fetch the document changes from the database into our system.

Implementation

Our system is an application written in Python that sends the metrics (discussed below) for a user-defined selection of documents to a server. At regular configurable intervals (5 seconds by default), it looks for changes in the tracked documents and sends the updated metrics.
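As a sketch of what such a parsing function might look like for the ShareDB format shown in the introduction. The interpretation of the fields ('si' for string-insert, 'sd' for string-delete, the last element of the path 'p' as the character offset) follows the example entry; treat it as an assumption rather than a full specification of the format.

```python
import json

def parse_sharedb_op(raw):
    """Convert one raw ShareDB database entry (a JSON string) into flat
    writing-event dicts.  'v' is the document version, 'op' the list of
    atomic operations, and 'm.ts' a millisecond timestamp."""
    entry = json.loads(raw)
    events = []
    for op in entry["op"]:
        kind = "insert" if "si" in op else "delete"
        events.append({
            "doc": entry["d"],
            "version": entry["v"],
            "timestamp": entry["m"]["ts"],
            "position": op["p"][-1],   # last path element: char offset
            "type": kind,
            "text": op.get("si", op.get("sd", "")),
        })
    return events

raw = ('{"seq":20,"v":20,"op":[{"p":["text",19],"si":"d"}],'
       '"m":{"ts":1526393562757},"d":"doc"}')
print(parse_sharedb_op(raw))
```

An Etherpad backend would supply an analogous function for its compressed changeset strings; everything downstream sees only the flat event dicts.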




At each time point, the application collects the relevant document changes that have happened since the last update from the editor database. For each of the updates, relevant information is stored about the position of the event, the text added or deleted, the author, and the timestamp. The information collected at each stage is used to infer the writing operations that occurred and, subsequently, the metrics on the writing process.

Writing Operations

The data collected from Etherpad is very granular, and each data point taken separately has no meaning beyond contributing to our ability to reconstruct the entire document at any point in time. To give more meaning to the writing process, we used the saved changes within the document to calculate meaningful operations that begin to correspond with actual behavioural and cognitive processes.

The small changes that are saved within Etherpad (i.e., additions and deletions) can be grouped together when they occur continuously, to reveal writing behaviors. If users stop for a coffee break or begin editing at another location in the document, a new operation is formed. If the author has taken a short break but not begun a new operation by writing somewhere else in the document, he or she can go back to the piece of text that was being edited and continue editing without starting a new operation.

We classify the operations into four different types:

  • Write: An operation is classified as a Write if the author enters more characters than a set threshold. We consider this type of operation as representing drafting the bulk of the text. It occurs, for example, when authors begin writing an essay and are adding ideas. The Write operation contains mostly addition changes but may include some deletions, as they form part of the writing process.

  • Edit: An Edit is similar to a Write in that it can consist of both addition and deletion changes. However, an operation is classified as an Edit when the number of character changes falls below the threshold for a Write. Edits often occur when the authors review an essay and fix typos or change words.

  • Delete: If an operation consists of an author removing more than a certain number of characters (deletion change), it is classified as a Delete. The idea of this type of operation is to observe when an author does not simply remove a word (which is considered an Edit or part of a Write), but removes a significant amount of characters, such as a whole sentence or paragraph.

  • Paste: If the writer adds more than several characters (addition change) in one single writing event (events are sampled every few milliseconds), the operation is classified as a Paste. This is useful for differentiating a Write from simply copying and pasting text into the document.

In order for the operations to be relevant, we need to carefully select the parameters that distinguish them from each other (e.g., the threshold number of added characters for classifying an addition as a Write rather than an Edit).

To determine the threshold between Write and Edit, we plotted the distribution of the number of Write, Delete, Paste, and Edit operations with respect to the threshold number of added characters for classifying an operation as an Edit. We selected a length of 15 characters because the distribution stays relatively constant from this point. The average number of characters in an English word is 5.1 (Bochkarev et al., 2012), which supports our decision, as it roughly means that deleting or adding fewer than three words is an Edit.

The application also groups the writing events into paragraphs: collections of writing events that are currently located on the same line. This gives more insight into the context in which an operation was written: Is this operation occurring in a paragraph written by a single author? Is it a significant change to the paragraph given the length of the paragraph?

This context is further extended with various details, such as the operation's length compared to the document, whether it is the first operation of the day, and whether other people were writing in the document at the time.

Writing Metrics

To begin to assess the collaborative writing process, we created eight different metrics that can be applied to the document at any point in time (Table 1). The metrics are derived from understanding the type of operation that took place and the context that the operation is within. These metrics fall into three basic categories: time, operations, and integration. Except where noted, each of the metrics has a score between 0 and 1.

The timing metrics are all related to when students were writing in the document. Within this category, the metrics include the count of day breaks (DB), the count of short breaks (SB), and the amount of synchronicity in the document. The number of day breaks tracks whether the document was written across multiple days or in a single day. If there were more than eight hours between any two changes, the count of day breaks is incremented. This metric gives us a rough estimate of the duration of the editing process.

The second timing metric is the number of short breaks (SB) that were taken. The number of short breaks is calculated by counting the instances where there are no changes for at least 10 minutes (but less than 8 hours).
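The classification rules above can be approximated in a few lines. The 15-character threshold comes from the text; the event shape and the exact precedence of the rules are illustrative assumptions, not the tool's precise logic.

```python
def classify_operation(events, char_threshold=15):
    """Classify one grouped operation (a list of consecutive writing
    events by the same author at the same location) as Paste, Write,
    Delete, or Edit.  Each event is a dict with a 'text' string and an
    'is_deletion' flag."""
    added = sum(len(e["text"]) for e in events if not e["is_deletion"])
    removed = sum(len(e["text"]) for e in events if e["is_deletion"])
    # A large addition arriving as a single event suggests copy-paste.
    if len(events) == 1 and not events[0]["is_deletion"] and added > char_threshold:
        return "Paste"
    if added > char_threshold:
        return "Write"
    if removed > char_threshold:
        return "Delete"
    return "Edit"

# One 40-character event: a Paste, not a Write.
print(classify_operation([{"text": "x" * 40, "is_deletion": False}]))  # → Paste
```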




      Table 1: Metrics to assess collaborative writing.

                      Time Metrics
  Day Breaks               Number of days the document
                           was written over
  Short Breaks             Number of pauses longer than
                           10 minutes
  Synchronicity            Percentage of text that was
                           written in a synchronous manner
                    Operation Metrics
  Overall Operations       Proportion each operation was
                           used in the document
  User Operations          Proportion that an author used
                           an operation, normalized by the
                           total use of the operation
                   Integration Metrics
  User Contribution        Balance of contributions
                           between authors
  Paragraph Integration    Interleaved or blocked
                           paragraphs by different authors
  Paragraph Contribution   Balance of contributions within
                           paragraphs between authors

The short breaks are important to measure because they can be an indicator of different writing styles: some people prefer to plan before writing and others do not. These different writing styles may impact the success of the collaboration process, depending upon the alignment of the writing styles between the members of the group.

The final timing metric is the synchronicity (SYN) of the writing in the document. The synchronicity is measured by taking the number of synchronously written characters normalized by the total number of characters written:

    SYN = # of synchronously typed characters / # of characters

A character is classified as synchronous if two or more users are writing within three minutes of each other. When all of the text was written in a synchronous manner, the score is 1, and if there is no overlap between the users' writing, the score is 0.

The distribution of these operations over time, for the whole group or for individual users, helps us to understand the editing flow.

The overall operations (OO) metric captures the division of the different operation types across the entire document. For each type of operation, the proportion is calculated relative to the document total:

    OO_type = # of characters classified as type / # of characters,    type ∈ {delete, edit, paste, write}

A score of 0 indicates that there are no operations of this type in the document, while a score of 1 indicates that all of the operations are of this type. This metric may help us to better understand the writing style of the users.

The second metric in the operations category is the user operations (UO) metric, which reflects the distribution of the different operations across the users in the pad. This metric reflects how much the different users balance the different operations among themselves and, as with the overall operations metric, there is a separate number for each operation:

    UO_type = (1 / log(#authors)) * Σ_{i ∈ authors} prop_type(i) * log(1 / prop_type(i))

For example, if one user does all of the editing, then the score for editing is 0. However, if there is an even distribution among the users, then the score is 1. This metric may be beneficial in understanding the roles that the users take within their group.

The third category of metrics involves the integration of writing within the document, including user contribution, paragraph integration, and paragraph contribution. The user contribution (UC) metric measures the balance of the writing contributions between users. The metric is close to 1 when the participation is equal between the users, and close to 0 when a single author has done the majority of the writing. This metric is important to track in a collaborative setting because it can be indicative of social loafing within the group:

    UC = (1 / log(#authors)) * Σ_{i ∈ authors} prop(i) * log(1 / prop(i))
tween the writing times of the users, the score is 0. The                   log(#authors)
                                                                                             i∈authors
                                                                                                                       prop(i)
synchronicity of authoring is an important measure of the
collaborative writing dynamics, especially when the users              The second integrative metric is the paragraph integra-
work in an otherwise synchronous setting, because it may            tion (PI) within the document. The paragraph integration
offer insights into the roles and dynamics of the group.            measures how interleaved the paragraphs written by differ-
   The second categorization of the metrics involves the            ent users are. The measure will be 1 if the main author of
operation types that are present within the document. As            each paragraph is alternating, or close to 0 if many blocks
mentioned above, to better understand how the users were            in a sequence have the same primary author (for example,
writing their document, we rolled-up the basic additions            one author mostly wrote the four first paragraphs, and the
and deletions stored by Etherpad into four types of opera-          other author wrote the four next ones). This metric may be
tions: Delete, Write, Edit, and Paste. Depending upon the           an indicator of the collaboration that is occurring within the
role of a user in a group or the stage of the writing that          group. When writing text, some group members will divide
the group is currently engaged in, the relative frequency of        the labor and have each member write a different section of
the types of operations may change. The relative frequency          the paper in a more cooperative style. Other groups will




                                                               54
work in a more collaborative style with all group members
contributing to each section.
   PI = (# of author alternations between paragraphs) / (# of paragraphs − 1)
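To make the formula concrete, the PI value can be computed from the ordered list of each paragraph's main author. The following is a minimal sketch rather than the authors' implementation; the function name and input format are our assumptions:

```python
def paragraph_integration(main_authors):
    """Paragraph integration (PI): the fraction of adjacent paragraph
    pairs whose main author differs.

    main_authors: the primary author of each paragraph, in document
    order. Documents with fewer than two paragraphs score 0.0 (an edge
    case the formula leaves undefined).
    """
    if len(main_authors) < 2:
        return 0.0
    alternations = sum(1 for a, b in zip(main_authors, main_authors[1:])
                       if a != b)
    return alternations / (len(main_authors) - 1)
```

With fully alternating main authors (e.g. ["A", "B", "A", "B"]) the score is 1.0; with four paragraphs by one author followed by four by another it is 1/7 ≈ 0.14, matching the blocked example in the text.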
   The final metric, within paragraph integration (PC), is a measure of the extent to which the users in a group contributed to each of the paragraphs equally. It will be close to 0 if a paragraph was written by a single user, and 1 if all authors contributed equally to a paragraph. As with the previous metric, within paragraph integration can be used as a measure of the collaboration between the group members.

Figure 1: Evolution of the PI over the 90-minute assignment on all documents
   PC = Σ_{p ∈ paragraphs} (length(p) / length(total)) (1 / log(#authors in p)) Σ_{i ∈ authors in p} prop_p(i) log(1 / prop_p(i))
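The PC formula above, like UO and UC, is built on a normalized entropy: a distribution over authors scored by Σ prop(i) log(1/prop(i)) and divided by log(#authors). The sketch below encodes that shared form; the per-paragraph dict representation and the choice to score single-author paragraphs as 0 are our assumptions rather than details given in the paper:

```python
import math

def normalized_entropy(shares):
    """Entropy of a share distribution, normalized by log(n) into [0, 1].

    Returns 1.0 for an even split among n > 1 contributors, and 0.0
    when there is a single contributor (where log(1) = 0 would
    otherwise divide by zero). The same form underlies the UO and UC
    metrics.
    """
    n = len(shares)
    if n < 2:
        return 0.0
    h = sum(s * math.log(1 / s) for s in shares if s > 0)
    return h / math.log(n)

def paragraph_contribution(paragraphs):
    """Paragraph contribution (PC): length-weighted normalized entropy
    of each paragraph's per-author character counts.

    paragraphs: one dict per paragraph mapping author -> characters
    contributed to that paragraph (an assumed representation).
    """
    total = sum(sum(p.values()) for p in paragraphs)
    if total == 0:
        return 0.0
    score = 0.0
    for p in paragraphs:
        length = sum(p.values())
        if length == 0:
            continue  # skip empty paragraphs rather than divide by zero
        shares = [c / length for c in p.values()]
        score += (length / total) * normalized_entropy(shares)
    return score
```

Two paragraphs each split evenly between two authors give a PC of 1.0, while paragraphs each written by a single author give 0.0.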


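Returning to the synchronicity (SYN) metric defined earlier: a character counts as synchronous when some other user typed within three minutes of it. The sketch below is one possible operationalization under our assumptions; the per-character (author, timestamp) event format is not Etherpad's native representation, and the quadratic scan could be replaced by a single sweep over time-sorted events:

```python
WINDOW_SECONDS = 180  # "within three minutes of each other"

def synchronicity(events):
    """Synchronicity (SYN): share of typed characters written while at
    least one *other* user typed within the three-minute window.

    events: one (author, unix_timestamp) pair per typed character
    (an assumed input format).
    """
    if not events:
        return 0.0
    synchronous = sum(
        1
        for author, t in events
        if any(other != author and abs(t2 - t) <= WINDOW_SECONDS
               for other, t2 in events)
    )
    return synchronous / len(events)
```

Two users typing a minute apart score 1.0; if one user finishes more than three minutes before the other starts, the score is 0.0.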
    To investigate if our eight metrics are measuring differ-
ent aspects of the document, we checked the correlations
between each of the metrics. We did not find any signifi-
cant correlations between the metrics, indicating that they
are all measuring separate aspects of the document writing process. Separately, each of our eight metrics can provide insights into the group writing process. However, the metrics do not need to be looked at in isolation. By analyzing several of the metrics together, we may be able to detect additional aspects of the collaboration process that would not be evident from a single metric.

Metrics Over Time

Although all of our metrics can be applied to a finished writing product, one of the strengths of our approach is that the different metrics can be calculated in real time, both to eventually adapt to students and to track how the writing process changes as time progresses.
   In order to study the behavior and evolution of our metrics over time, we calculated the metrics of each pad at given intervals. We split the lifespan of the document, i.e. 90 minutes, into 32 evenly spaced time-slices. For each time-slice, we display a box plot showing the spread of the different metrics across the seventeen groups. Below we present the analysis of three different metrics (paragraph integration, paragraph contribution, and synchronicity), and how they develop over time.
   The PI (Fig. 1) measures the degree to which the members of the group interleave their paragraphs. In our data set, we can see that the score begins at 0, when no one has yet written. As the authors begin to add more text, the score rises to about 0.15 by the end of the first half of the session. In the second half of the session, the metric remains relatively constant, indicating that there is no additional integration of paragraphs between group members.

Figure 2: Evolution of the PC over the 90-minute assignment on all documents

   Similarly, the PC metric (Fig. 2), which measures the sharing of operations within a paragraph, also shows a growth pattern over time. Like the PI metric, it begins at 0 and grows to about 0.15 by the end of the session. However, the pattern over time is very different. For the first half of the session, the PC metric remains relatively close to 0, indicating that users work on their own individual part of the document, without attempting to integrate their work.
   In the second half of the session, the score begins to grow at a more linear rate, indicating that there is much more sharing between paragraphs. Although the main author of the paragraphs may have stayed the same (as indicated by the lack of a corresponding rise in the PI score), the group members were looking at each other's work more, as indicated by the higher PC score.

Figure 3: Evolution of the synchronicity over the 90-minute assignment on all documents

   Finally, we investigated how the synchronicity metric changed over time (Fig. 3). Because the groups were formed and were asked to produce their summary within a 90-minute timeframe, we would expect the document to have been written very synchronously. From the data, this is what we find. At the beginning of the document, when the group is just getting started, the score is close to 0. However, in Fig. 3, we can see that it quickly moves up to around 0.4 within a few time-slices. For the remainder of the document, the score remains between 0.6 and 0.8, which is an indicator of high synchronicity.

Discussion and Conclusion

These results give an overall idea of the behavior of the metrics and of their importance in gaining insight into the document writing process. As we can observe, the metrics evolve through time. The initial increase in the PI at the beginning of the document writing process indicates that authors begin by editing separate parts of the document, with very little intra- or inter-paragraph coordination.
   As the writing process proceeds, the authors stop creating new paragraphs and begin editing the existing ones, which naturally leads to more collaboration and textual integration. This insight can help the supervisor of the authors in determining whether each writer wrote their own block of text or contributed to every main aspect of the document.
   In addition, as time increases, the evolution of the PC shows that authors tend to write more inside each other's paragraphs. It would seem they write their own block of text at the beginning and then gradually start collaborating by writing in each other's contributions. The score still remains relatively low, meaning that there is still a main author per paragraph and that the contributions from other authors stay small.
   Moreover, the evolution of the synchronicity score shows that the documents are written by at least two authors at the same time during the overall duration of the assignment, except at the very beginning.
   The aim of our metrics is to provide real-time insights into the way authors contribute to writing an online document. They were designed to analyze whether the authors are collaborating with each other and thus focus on the interaction between the writers. We are currently planning new experiments that will give us more detailed data about how collaborative writing processes manifest themselves in the data.
   We are also planning to analyze the quality of the written documents, as well as the knowledge post-test, to see if we can correlate the metrics proposed in this document with actual differences in quality or learning gains.
   The tool we have developed can provide these metrics live during an editing session, and it will be integrated in our collaborative learning platform FROG (https://github.com/chili-epfl/FROG) to provide live visualizations of the collaborative process in multiple groups for workshop organizers or teachers. In the future, we might consider adding user-facing visualizations or even prompts and adaptive interfaces to support group collaboration.
   Finally, the analyses described in this paper only concern themselves with behaviour and could be applied to any language. However, by adding semantic context to the data – for example, using word2vec-like methods to measure the semantic distance between two users' contributions over time (is there convergence?) – we might be able to further understand the collaborative process.

References

André, L., Martin, S., Oster, G., and Ignat, C.-L. (2013). Supporting adaptable granularity of changes for massive-scale collaborative editing. In Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2013 9th International Conference on, pages 50–59. IEEE.

Bochkarev, V. V., Shevlyakova, A. V., and Solovyev, V. D. (2012). Average word length dynamics as indicator of cultural changes in society. CoRR, abs/1208.6109.

Ede, L. and Lunsford, A. (1990). Singular Texts/Plural Authors: Perspectives on Collaborative Writing. Southern Illinois University Press.

Hu, X., Ng, T.-D. J., Tian, L., and Lei, C.-U. (2016). Automating assessment of collaborative writing quality in multiple stages: The case of wiki. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, LAK '16, pages 518–519, New York, NY, USA. ACM.

Kieft, M., Rijlaarsdam, G., and van den Bergh, H. (2008). An aptitude-treatment interaction approach to writing-to-learn. Learning and Instruction, 18(4):379–390.

Liu, M., Pardo, A., and Liu, L. (2017). Using learning analytics to support engagement in collaborative writing. International Journal of Distance Education Technologies, pages 79–98.

Lowry, P., Curtis, A., and Lowry, M. (2004). A taxonomy of collaborative writing to improve empirical research, writing practice, and tool development. Journal of Business Communication, pages 66–99.

McNely, B. J., Gestwicki, P., Hill, J. H., Parli-Horne, P., and Johnson, E. (2012). Learning analytics for collaborative writing: A prototype and case study. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, LAK '12, pages 222–225, New York, NY, USA. ACM.

Southavilay, V., Yacef, K., and Calvo, R. (2009). WriteProc: A framework for exploring collaborative writing processes. In ADCS 2009: Proceedings of the Fourteenth Australasian Document Computing Symposium, 4 December 2009, pages 543–548.

Southavilay, V., Yacef, K., and Calvo, R. (2010). Process mining to support students' collaborative writing. Educational Data Mining, pages 257–266.
Southavilay, V., Yacef, K., Reimann, P., and Calvo, R. (2013). Analysis of collaborative writing processes using revision maps and probabilistic topic models. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, pages 38–47. ACM.

Storch, N. (2005). Collaborative writing: Product, process, and students' reflections. Journal of Second Language Writing, 14(3):153–173.

Sun, D., Xia, S., Sun, C., and Chen, D. (2004). Operational transformation for collaborative word processing. In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, pages 437–446. ACM.

Sun, Y., Lambert, D., Uchida, M., and Remy, N. (2014). Collaboration in the cloud at Google. In WebSci 2014 - Proceedings of the 2014 ACM Web Science Conference.

Wang, D., Olson, J. S., Zhang, J., Nguyen, T., and Olson, G. M. (2015). DocuViz: Visualizing collaborative writing. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pages 1865–1874, New York, NY, USA. ACM.

White, M. J. and Bruning, R. (2005). Implicit writing beliefs and their relation to writing quality. Contemporary Educational Psychology, 30(2):166–189.



