<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Statistical Modeling by Gesture: A graphical, browser-based statistical interface for data repositories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>James Honaker</string-name>
          <email>jhonaker@iq.harvard.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vito D'Orazio</string-name>
          <email>dorazio@iq.harvard.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Quantitative Social Science, Harvard University</institution>
          ,
          <addr-line>1737 Cambridge St, Cambridge, MA 02138</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We detail our construction of TwoRavens, a graphical user interface for quantitative analysis that allows users at all levels of statistical expertise to explore their data, describe their substantive understanding of the data, and appropriately construct and interpret statistical models. The interface is a browser-based, thin client, with the data remaining in an online repository, and the statistical modeling occurring on a remote server. In our implementation, we integrate with tens of thousands of datasets from the Dataverse repository, and the large library of statistical models available in the Zelig package for the R statistical language. Our interface is entirely gesture-driven, and so easily used on tablets and phones. This, in combination with being browser-based, makes data exploration and quantitative reasoning easily portable to the classroom with minimal infrastructure or technology overhead.</p>
      </abstract>
      <kwd-group>
        <kwd>statistical user interfaces</kwd>
        <kwd>open data</kwd>
        <kwd>directed graphs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>With the proliferation of open data sources and replicable
data repositories comes the promise of increased access to
scienti c information and the democratization of
quantitative knowledge. However, meaningful analysis of this
multitude of data remains outside the grasp of analysts who lack
training in statistical modeling or knowledge of and access to
statistical software platforms. For those with the necessary
expertise, access is still limited by data transfer bottlenecks
and installation and hardware overhead. To reduce
barriers to statistical and quantitative reasoning and to promote
the proliferation of empirical knowledge, we introduce the
TwoRavens interface.</p>
      <p>
        TwoRavens is a gesture-driven, Web-based device for the
analysis of statistical models and the exploration of data. It
All code, documentation, and numerous examples for our
open source TwoRavens software project are available at:
http://datascience.iq.harvard.edu/tworavens
is open-source and integrates with Dataverse repositories for
open archiving of data[
        <xref ref-type="bibr" rid="ref12 ref6 ref7">12, 6, 7</xref>
        ], providing users access to the
more than 50,000 data les currently housed in Dataverse
repositories, as well as new data that users may upload free
of charge. The statistical analysis component is powered by
Zelig, a library for statistical inference in R that provides
access to dozens of statistical models through a common
call structure [
        <xref ref-type="bibr" rid="ref11 ref14">14, 11</xref>
        ]. TwoRavens links the power of Zelig
to one of the largest database collections anywhere, requires
no installation, and is accessible to devices ranging from
mobile phones to tablets to smartboards.
      </p>
      <p>For ease of use, the UI is designed to mirror the common
quantitative work ow. Moving left to right on the interface,
users make intuitive column and row selections or subsets of
the data.1 Users may perform transformations on these
selections using the input box in the top right. In the center,
users specify the intended statistical model in the
framework of a directed graph, and have the option to perform
transformations on selected columns or tag selections with
speci c properties, such as \nominal." Finally, on the right,
users select their statistical model and have the option to
set covariate values on right-hand side variables to obtain
additional bootstrapped simulations of quantities of
interest. Upon estimation, statistical results are displayed using
a combination of graphs and tables.</p>
      <p>In this article we describe the UI and its integration with
Dataverse and Zelig. In future research we intend to build
upon this foundation in two ways. First, as users supply
information about hypothesized relationships in the data,
TwoRavens will synthesize their input and automatically
suggest appropriate modeling decisions, such as functional
form and choice of model. Second, as models are estimated,
the results will be cumulatively recorded across users to form
a coherent map and meta analysis of the statistical results
that exist in Dataverse.
2.</p>
    </sec>
    <sec id="sec-3">
      <title>DESIGN GOALS AND PRINCIPLES</title>
      <p>In designing this software, some key goals were set that
we believe the next generation of statistical tools will need
to achieve. The challenges and solutions to reaching these
goals informed some general design principles, which we feel
are important to discuss and convey for future researchers.
A paramount goal was to keep the interface as a thin client,
keeping both the data itself, and the work of statistical
1Depending on the discipline, columns are known as
variables, elds, features, etc., and rows are known as
observations, cases, units of analysis, etc.
processing, remote from the interface. We also required a
broadly accessible platform that could be productively
used by individuals at all levels of statistical expertise, from
novice users with no statistical training, to experienced
quantitative researchers. Furthermore, to make data accessible
to users with di erent levels of available computational
infrastructure, we worked to make this software device
independent and feasible not only on traditional computers,
which have previously been the sole domain of statistical
software, but tablet, mobile, and other smart devices.
2.1</p>
    </sec>
    <sec id="sec-4">
      <title>Goal: Thin Client</title>
      <p>
        Although we have written an interface for statistical data
analysis, neither the data itself, nor any statistical
analysis, occurs or resides at the client side.2 It is a challenge
to explore data, and to implement statistical models,
without the the immediacy of local interaction. However, having
a thin client with remote data and remote statistical
processing allows some enormous, novel advantages that are
increasingly useful in modern data-intensive science: 1) When
datasets are large, transferring the data to the user for
exploration can be the primary bottleneck for speed, which this
overcomes. 2) Similarly, when datasets are large, the nal
analysis might require distributed processing, which again
requires another transfer of the data. In our architecture,
again, the code for statistical processing is instead placed
close to the original data. 3) When the data has privacy
implications, there are settings where it is allowable to grant
access to summaries and statistical representations of the
data, (or privacy preserving versions of these statistics, such
as those that are provably di erentially private [
        <xref ref-type="bibr" rid="ref19 ref8 ref9">9, 19, 8</xref>
        ]),
but not grant access to individual observations of the data
itself. This architecture, where both the data and the
analysis on the data remains remote, can be an enforceable way
to grant access to private data in these settings.
2.2
      </p>
    </sec>
    <sec id="sec-5">
      <title>Goal: Broadly Accessible</title>
      <p>Statistical consulting is increasingly a necessary service
provided by research universities to aid their research
faculty. Most substantive researchers are heavily invested in
the collection of their data, understand their variables well,
and possess expert substantive knowledge about the
plausibility of various relationships. What they may not know is
the set of plausible statistical models appropriate to their
objectives, the statistical language to describe their goals and
aims, or the software knowledge to implement these choices.
Consultants often provide that link, allowing experts
without statistical training to translate their knowledge and goals
into appropriate statistical software routines, and
interpreting the results back in a substantively meaningful fashion.
Obviously, however, not every user of data has access to a
trained statistician. One overarching goal of the TwoRavens
project is to provide as much of this service as can be
automated. At this stage of the interface, this means providing a
way for users to intuitively communicate all their knowledge
about the data, so an appropriate model can be constructed
for the quantitative task at hand.
2.3</p>
    </sec>
    <sec id="sec-6">
      <title>Goal: Device Independent</title>
      <p>2Or slightly rephrased, we have built a package for statistical
analysis of data, that can not handle data, nor analyze any
statistics.</p>
      <p>Part of expanding the set of users who have access to
quantitative reasoning, means lowering the infrastructure
requirements to statistical analysis. While professional
researchers may have extensive computational resources, many
other settings, especially community colleges and high schools,
have limited resources for instructors and none for students.
Even free and open source statistical packages like R
generally require the expenses of some level of IT support to
install in a networked lab. Moreover, even in well-funded
universities, substantive classes that use quantitative data
may have no classroom lab access. Our goal is to enable
a computationally capable access to statistical exploration
with very minimal infrastructure.
2.4</p>
    </sec>
    <sec id="sec-7">
      <title>Resulting Design Principles</title>
      <p>
        In meeting these goals, we developed the following design
principles for this interface: 1) Browser Based: The
ability to run the software entirely through a browser, with no
additional installed software, is possible because of the thin
nature of the interface, and helps facilitate device
independence as it can run on any device that includes a browser
2) Gesture Driven: Common statistical packages are
normally command line, script, or menu driven, but to
facilitate ease of use across devices and without user expertise,
we have tailored a process that allows full exploration and
setup of statistical models by interacting directly with the
data though gesture 3) Graphical Representation:
Directed graphs, also called probabilistic networks, are
increasing used by statisticians as a representation of complex data
generation processes [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], particularly in structural equation
modeling as well as casual inference [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Using directed
graphs to convey possible relationships between variables
allows novices to convey all of their substantive understanding
of a dataset, substantive experts and teachers to
communicate in a manner conducive to qualitative discussion, and
still allow statisticians to explicitly de ne their hypotheses
of the data generating process. 4) Maximal
computational leverage: Most statistical packages are interactive
in a command driven relationship, waiting for instructions
before creating any analysis product. To facilitate the
architecture of the interface, we instead have an impatient
design that preprocesses any graphs and summary statistics
that we can envision might be needed, so they are already
loaded on the thin client, without needing data interactions
or queries. Much of this preprocessing can be done when the
data is ingested to the repository, before the user has even
discovered the data.
3.
      </p>
    </sec>
    <sec id="sec-8">
      <title>TWORAVENS SOFTWARE</title>
      <p>
        The interface is written in Javascript and incorporates
aspects of interactive graphics, Web-based statistics, and
customized R applications. The interactive visualizations use
Data-Driven Documents (D3), which has been in uenced by
tools such as Protovis and Processing [
        <xref ref-type="bibr" rid="ref17 ref3 ref4">4, 3, 17</xref>
        ]. Although
components of TwoRavens appear as a Web interface for
statistical software (comparable to R- ddle or SAS
OnDemand), its interaction with R is closer to that of
applications created using tools such as RStudio's Shiny [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In
its entirety, TwoRavens is comparable to GleamViz, which
contains an interactive statistical modeling tool and whose
remote servers handle the necessary processing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Where
TwoRavens distinguishes itself from GleamViz is that it does
not require a desktop client and it is a general-purpose
statistical modeling tool, whereas GleamViz is tailored for
modeling infectious diseases. Thus, as a Web-based,
gesturedriven tool for statistical modeling, TwoRavens is unique.
      </p>
      <p>Furthering its distinction from existing statistical
software, it integrates with Dataverse, providing instant
access to tens of thousands of datasets by simply launching
TwoRavens from any Dataverse repository page, and with
Zelig, delivering analysts meaningful yet easily interpretable
statistical estimates and graphics. Given the availability of
open access data in repositories such as Dataverse, the open
source power of R, and the trend towards Web-based,
interactive visualizations, such a device ties together many
threads of modern quantitative computing.
3.1</p>
    </sec>
    <sec id="sec-9">
      <title>Dataverse and Zelig Integration</title>
      <p>
        In the last decade, the Data Science team at Harvard's
Institute for Quantitative Social Science (IQSS) has developed
software infrastructure and tools to facilitate and enhance
data sharing, preservation, citation, reusability and analysis
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Over that time, the team has continuously developed
two software products now widely used by the research
community: Dataverse, a repository infrastructure for sharing
research data, and Zelig, a statistical package for R.
      </p>
      <p>
        Dataverse is \an open source data repository, which
allows one to publish, share, reference, extract, and analyze
research data" [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The Harvard Dataverse Network contains
more than 52,000 studies with more than 700,000 les, and
is the world's largest collection of social science data sets
(http://thedata.org). The primary connection between
TwoRavens and Dataverse is an API accessing metadata
that complies with formatting standards set by the Data
Document Initiative (DDI), an organization that promotes
diligent data management [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition to keeping it thin,
this facilitates the deployment of TwoRavens to any data
repositories that comply with DDI standards.
      </p>
      <p>
        Zelig is a wrapper and interface that allows a large body
of di erent statistical models in the R statistical language
to be used from a uni ed call structure [
        <xref ref-type="bibr" rid="ref11 ref14">11, 14</xref>
        ]. It is also a
modeling architecture that interprets these statistical
models in a substantively meaningful fashion [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. By integrating
with Zelig, TwoRavens has minimal backend manipulations,
other than to map the information that has been entered by
the user into an appropriate Zelig call. Thus, new
functionality in Zelig may be translated to new functionality in
TwoRavens rather seamlessly. Both R and Zelig are open
source and freely available.
      </p>
    </sec>
    <sec id="sec-10">
      <title>THE USER INTERFACE</title>
      <p>Javascript is lightweight, and allows us to run the
interface entirely through an internet browser. For ease of use
and cross-platform portability, all functionality has
gesturedriven capability, so statistical models on datasets stored
in online repositories could be run from a tablet or mobile
device without a keyboard. This ability expands the set of
ways individuals use archived data and quantitative analysis,
including bringing real-time data analysis into the classroom
without using a computer laboratory setting.</p>
      <p>A screenshot of TwoRavens is shown in gure 2. The
work ow of developing a statistical analysis moves from left
to right, rst examining and selecting variables in the dataset,
then constructing a framework of possible relationships, and
then choosing and interpreting an appropriate statistical
model for that framework. In gure 2, the user has
selected a model, tagged ti_cpi as the dependent variable,
and graphed an appropriate statistical model.
4.1</p>
    </sec>
    <sec id="sec-11">
      <title>Left Panel - Data Selection</title>
      <p>
        Each variable in the dataset can be introduced as a node
in the center space by clicking on the variable name in the
left panel. Users may not be familiar with each variable
in the dataset, so on mouseover information pertaining to
that variable appears, such as primary summary statistics,
graphs, metadata and short descriptions of the variable.3
This information is precalculated by Dataverse when a new
dataset is uploaded, and stored in a json le that is
compatible with the DDI schema [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Although not explicitly in the left panel, TwoRavens
includes an option to perform transformations on variables,
such as taking the log of a variable or multiplying it by
some factor, using the input box in the top right. Users may
transform variables in two ways: (1) by manually entering
text into the transformation input box; or (2) by clicking on
the input box, selecting a variable from the drop down list,
and then selecting a transformation from the function list.
In this way, the gesture-driven functionality is preserved,
while allowing additional exibility for users familiar with R
functions.</p>
      <p>Users may be interested in subsetting the dataset to
examine only observations that have speci c values (for example,
only European countries, or only respondents over the age
of 60). For such cases, we include a Subset tab that shows
the distribution of each variable. These distributions are
either a density plot or a bar plot, depending on the variable's
level of measurement and its number of unique values. By
brushing the plots with the pointer, users select ranges of
that variable upon which the data is to be subsetted. The
numbers associated with the range are shown so that users
may be precise in the ranges they specify. All metadata
is remotely recalculated for the new subset using the same
Dataverse ingest routines, and a new space that represents
the subsetted data is added to the carousel in the center
space of the interface.
3For devices not compatible with mouseover, we enable this
feature via click and hold.</p>
    </sec>
    <sec id="sec-12">
      <title>Center Space - Relationship Mapping</title>
      <p>In the focal, central space, the substantive knowledge of
the researcher is easily communicated by drawing a directed
graph representation. With a two- nger click, arrows are
drawn connecting nodes, depicting the possible relationships
between variables. The nodes and arrows are an application
of D3's force layout. In the simulated forces that propel the
visualization, each node has a gravity, which draws all the
nodes together, as well as a charge that repels them from
getting too close. The arrows act as simple springs, and are
removed when clicked. As the graph between the variables
is built up by the researcher, it dynamically rearranges itself
by these simulated forces. The researcher can also physically
drag the pieces around, which acts as an additional force in
the dynamic visualization, or completely turn o the
simulated forces, and place every node manually for complete
control of the representation (using the force toggle icon,
represented by a pin).</p>
      <p>Dependent variables, time, cross-sectional, and nominal
identi ers are properties that may be tagged, by the user,
to a node. Each property is denoted with their own
colored halo. To tag a property, a user mouses over a node in
the directed graph, at which point buttons appears as arcs
around the perimeter of the node. Each arc is colored and
labeled, and clicking on the arc associates that property to
that node. At this point, the halo is colored appropriately
and a legend appears, reminding users what the color of the
halo represents. Additionally, the background color of the
tagged variable in the left panel is changed to re ect the
color of the tagged property.</p>
      <p>The portion of the center space that is visible is actually
just one element of a carousel, and users have the option
to add and remove new workspaces (hereafter, elements).
For example, if a user subsets the data, then an additional
element is added to the carousel that corresponds to the
subsetted data. Users toggle between elements by clicking
a chevron or by clicking and swiping in the direction they
want the carousel to move. Their current element is shown
by highlighting its corresponding dot in the top-center of the
center space. By clicking on the plus sign to the right of the
dots, users are duplicating the current carousel, and placing
its representation at the right of the element array. By
clicking the minus sign, users are dropping this element from the
carousel. To clear a modeling space but not drop it from the
carousel, users may select the Erase icon, represented with
a magnet.
4.3</p>
    </sec>
    <sec id="sec-13">
      <title>Right Panel - Model Implementation</title>
      <p>After examining the data and constructing a diagram of
relationships, in the right panel users begin to investigate
statistical models. The Models tab provides a list of the
available statistical models that can be employed by Zelig.
On mouseover, users see a brief description of the model so
that they have some guidance as to which to select.</p>
      <p>
        The next tab, labeled Set Covar., provides the ability to
interpret any estimated model by means of predicted and
expected values at chosen values of the covariates, as well
as rst di erences created by the changes in the predictions
across changes in the covariates [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. As shown in gure
3, users may choose values of the covariates at which to
interpret the model by means of sliders superimposed on
the densities of the variables. The slider positions initially
default to the mean of each variable, while the scale of the
slider marks each standard deviation away from the mean
within the range of the variable. For bar plots, the value of
the bar is also placed on the slider's scale.
      </p>
      <p>When this information is complete, the researcher can
estimate the model by clicking the Estimate button. At this
time, the information extracted from the user is passed to
an instance of an R application hosted on a remote server.4
This remote application rst builds a formula representation
of the model from the graph connections the researcher has
constructed. In the present version, this is all variables that
have a path to the dependent variable, but more complex
graphs can include intermediate or post-treatment variables
as well as consequences of the dependent variable, which are
4We developed our R application in Rook and host it in
rApache.
useful for forecasting and imputation, but omitted from the
formula for a causally oriented analysis. Using this formula,
the R application calls the applicable statistical model from
the Zelig library. The results from Zelig are asynchronously
returned to the browser interface. As can be seen in gure 4,
the plots that Zelig produces, as well as a table of estimates,
are available for viewing inside the Results tab.</p>
    </sec>
    <sec id="sec-14">
      <title>EXAMPLE APPLICATION</title>
      <p>
        The Quality of Government (QoG) represents one database
for which the primary bene ts of our tool might be
recognized for two target audiences, the novice user and the
classroom instructor. The QoG is a collection of
countryyear datasets whose variables include data on population,
respect for human rights, and political regime type [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Its
primary objective \is to address the theoretical and
empirical problem of how political institutions of high quality can
be created and maintained" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>A simple empirical exploration of this question using the
QoG data, however, can be fraught with di culties for novice
users. Minimally, one needs to be familiar with some
statistical software package and to understand enough statistics
to be able to analyze the data. The QoG data would have
to be downloaded locally, as would the software being used.</p>
      <p>Our tool improves accessibility by reducing or removing
these initial barriers. Users interact with the data in visual,
gesture-based ways, and so the time spent learning how to
use TwoRavens is negligible in comparison to what is
necessary to analyze data with R, for example. Statistical
models are represented visually as a directed graph, and users
may instantaneously view each variable's distribution and
summary statistics by hovering over a node in the graph.
For example, Figure 3 shows a description of the QoG
variable bl_asy25f in the left panel. The variables are listed in
the left panel, and are added and removed from the
modeling space by clicking on the variable name. The modeling
space, the center of the gure, shows a representation of a
user-de ned statistical model.</p>
      <p>For instructional purposes, the QoG represents a type of
dataset that may be used for a class project. It contains
746 columns of data in a time-series, cross-sectional format.
Many of these columns are substantively interesting
dependent variables, while others appear as explanatory variables
in many models in quantitative Political Science. For
instructors to teach how to use data to study questions in
politics, requires some degree of expertise in a statistical
software package, a projector, and a computer with the rights
to that software. For student projects, each student would
have to download the data and some statistical software to
their personal computer, and the instructor would have to
provide materials and guidance on software usage. Most
statistical software is proprietary, so students are often
restricted to labs that are equipped with the necessary licenses.
TwoRavens removes these barriers, and all that is necessary
for the instructor to bring data to the classroom, and for
students to analyze the data, is an internet connection and
a Web browser.
6.</p>
    </sec>
    <sec id="sec-15">
      <title>CONCLUSIONS</title>
      <p>TwoRavens is a Web-based tool for statistical analysis
that is lightweight, broadly accessible, and device
independent. It integrates with Dataverse, providing access to the
tens of thousands of data repositories that exist there, and
leverages the power of Zelig, an R library that provides a
common call structure for a large number of statistical
models. Entirely gesture-driven and intuitive for users of all
levels, TwoRavens reduces barriers to statistical analysis and
promotes the proliferation of empirical research.</p>
      <p>This article details the design of the user interface and
its applicability for use by statistical novices and users not
familiar with or who do not have access to statistical
software. Although integral and foundational to the TwoRavens
project, the UI is only the rst of three layers. In future
research, the TwoRavens project will be adding the model
\selector" and results \accumulator" layers to provide more
automated guidance on model selection and speci cation. The
selector synthesizes the user input and automates
suggestions, such as potential omitted variables, functional form,
and choice of statistical model. The accumulator stores all
models that have been estimated on each dataset, and
provides users with feedback on existing research using that
dataset and datasets judged to be similar.</p>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank Merce Crosas and Gary
King for extensive feedback and ideas, Leonid Andreev, Michael
Heppler and Elizabeth Quigley for continued development
assistance, and Dwayne Liburd for creating our logo.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>[1] Data document initiative</article-title>
          . http://www.ddialliance.org,
          <year>Jun 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Quality of Government. http://www.qog.pol.gu.se/research/, May
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bostock</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heer</surname>
          </string-name>
          .
          <article-title>Protovis: A graphical toolkit for visualization</article-title>
          .
          <source>IEEE Trans. Visualization and Comp</source>
          . Graphics,,
          <volume>15</volume>
          (
          <issue>6</issue>
          ):
          <volume>1121</volume>
          {
          <fpage>1128</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bostock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ogievetsky</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Heer.</surname>
          </string-name>
          <article-title>D3 data-driven documents</article-title>
          .
          <source>IEEE Trans. Visualization and Comp</source>
          . Graphics,,
          <volume>17</volume>
          (
          <issue>12</issue>
          ):
          <volume>2301</volume>
          {
          <fpage>2309</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W. V.</given-names>
            <surname>Broeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gioannini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Quaggiotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Colizza</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Vespignani</surname>
          </string-name>
          .
          <article-title>The gleamviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale</article-title>
          .
          <source>BMC infectious diseases</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Crosas</surname>
          </string-name>
          .
          <article-title>The dataverse network: An open-source application for sharing, discovering and preserving data</article-title>
          .
          <string-name>
            <surname>D-Lib</surname>
            <given-names>Magazine</given-names>
          </string-name>
          ,
          <volume>17</volume>
          (
          <issue>1-2</issue>
          ),
          <year>2011</year>
          . Available at: http://j.mp/12yqVCZ.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Crosas</surname>
          </string-name>
          .
          <article-title>A data sharing story</article-title>
          .
          <source>Journal of eScience Librarianship</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <volume>173</volume>
          {
          <fpage>179</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith.</surname>
          </string-name>
          <article-title>Calibrating noise to sensitivity in private data analysis</article-title>
          .
          <source>In Theory of Cryptography</source>
          , pages
          <volume>265</volume>
          {
          <fpage>284</fpage>
          . Springer Berlin Heidelberg,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Reingold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Rothblum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vadhan</surname>
          </string-name>
          .
          <article-title>On the complexity of di erentially private data release: e cient algorithms and hardness results</article-title>
          .
          <source>In Proceedings of the 41st annual ACM symposium on Theory of computing</source>
          , pages
          <volume>381</volume>
          {
          <fpage>390</fpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Edwards</surname>
          </string-name>
          . Introduction to graphical modelling. Springer,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Imai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>King</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Lau</surname>
          </string-name>
          .
          <article-title>Toward a common framework for statistical analysis and development</article-title>
          .
          <source>Journal of Computational Graphics and Statistics</source>
          ,
          <volume>17</volume>
          (
          <issue>4</issue>
          ):
          <volume>892</volume>
          {
          <fpage>913</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>King</surname>
          </string-name>
          .
          <article-title>An introduction to the Dataverse Network as an infrastructure for data sharing</article-title>
          .
          <source>Sociological Methods and Research</source>
          ,
          <volume>36</volume>
          :
          <fpage>173</fpage>
          {
          <fpage>199</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>King.</surname>
          </string-name>
          <article-title>Restructuring the social sciences: Re ections from Harvard's Institute for Quantitative Social Science</article-title>
          .
          <source>PS: Political Science and Politics</source>
          ,
          <volume>47</volume>
          (
          <issue>1</issue>
          ):
          <volume>165</volume>
          {
          <fpage>172</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Imai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Lau</surname>
          </string-name>
          . Zelig:
          <article-title>Everyone's statistical software</article-title>
          ,
          <year>2007</year>
          . http://zeligproject.org.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wittenberg</surname>
          </string-name>
          .
          <article-title>Making the most of statistical analyses: Improving interpretation and presentation</article-title>
          .
          <source>American journal of political science</source>
          ,
          <volume>44</volume>
          (
          <issue>2</issue>
          ):
          <volume>347</volume>
          {
          <fpage>361</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>Causality: models, reasoning and inference</article-title>
          , volume
          <volume>29</volume>
          . Cambridge Univ Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Processing</surname>
          </string-name>
          .js. https://www.processing.org,
          <year>Jul 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Shiny</surname>
          </string-name>
          . http://shiny.rstudio.com,
          <year>Jul 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith.</surname>
          </string-name>
          <article-title>Privacy-preserving statistical estimation with optimal convergence rates</article-title>
          .
          <source>In Proceedings of the 43rd annual ACM symposium on Theory of computing</source>
          , pages
          <volume>813</volume>
          {
          <fpage>822</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Teorell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Charron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dahlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Holmberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rothstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sundin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Svensson</surname>
          </string-name>
          .
          <article-title>The quality of government basic dataset made from the quality of government dataset</article-title>
          ,
          <year>2013</year>
          . Available at: http://www.qog.pol.gu.
          <source>se [Accessed: 15 May</source>
          <year>2014</year>
          ].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>