=Paper= {{Paper |id=Vol-369/paper-9 |storemode=property |title=Open Data Commons, a License for Open Data |pdfUrl=https://ceur-ws.org/Vol-369/paper08.pdf |volume=Vol-369 |dblpUrl=https://dblp.org/rec/conf/www/MillerSH08 }} ==Open Data Commons, a License for Open Data== https://ceur-ws.org/Vol-369/paper08.pdf
                 OPEN DATA COMMONS, A LICENSE FOR OPEN DATA
             Paul Miller                                       Rob Styles                                    Tom Heath
               Talis                                             Talis                                         Talis
 Knightʼs Court, Solihull Parkway                  Knightʼs Court, Solihull Parkway              Knightʼs Court, Solihull Parkway
     Birmingham, B37 7YB                               Birmingham, B37 7YB                           Birmingham, B37 7YB
      +44 (0) 870 400 5000                              +44 (0) 870 400 5000                          +44 (0) 870 400 5000
     paul.miller@talis.com                              rob.styles@talis.com                         tom.heath@talis.com

ABSTRACT

Attendees at the WWW2007 panel session on Open Data [1, 2] will remember a wide-ranging discussion of the role that easily accessible data could play in endeavors from scholarly publishing [3] to the creation of canonical product catalogs [4]. The authors argued there and subsequently [5] that an effective and flexible licensing framework is needed in moving forward.

Paradoxically, we argue that you need to actively and consciously assert your desire that third parties be able to use data you place online in order for those ‘visible’ and ‘accessible’ data sets to be utilized most effectively.

Significant progress has been made in the past twelve months, with engagement [6] from Creative Commons [7, 8, 9] and others [10] resulting in a license [11] and notion of ‘community norms’ [12] upon which all can build.

Categories And Subject Descriptors

E.m [Data, Miscellaneous]

General Terms

Legal Aspects

Keywords

Open Data, Linked Open Data, Licensing, License, Creative Commons, Science Commons, Open Data Commons.

Copyright is held by the author/owner(s).
LDOW2008, April 22, 2008, Beijing, China.

1. INTRODUCTION

Much attention is currently being paid to the concept of Open Source [13], and to the value its adoption can bring to the development and dissemination of software within a vibrant mixed economy comprising traditionally commercial, open source, and hybrid solutions of various forms. In the academic sector, too, existing models of publication are being challenged by the rise of the philosophically related Open Access [14] movement. Here, as in the software world, the vehement polarization of early protagonists is increasingly giving way to a more pragmatic world view in which various models co-exist to meet a diverse set of requirements.

In scholarly publishing, there has tended to be an unfortunate presumption that rights in the raw data underpinning a paper’s analyses and conclusions will be retained and enforced; that these data will not be shared in order to allow readers to test the results for themselves. More recently, some funders have begun to require that both reports of research and data produced by research be made easily available for re-examination, and organizations such as Creative Commons are taking a serious interest in this area with their Science Commons project.

However, beyond these scholarly disciplines far less attention has been paid to the manner in which data can be used and reused, with only a few projects such as OpenStreetMap [15] really challenging the traditional models of control over creating and accessing the underlying data upon which so many applications rely.

Almost everywhere one looks, now, increasing volumes of data are being published to the Web with the explicit aim of interoperability and a strong but often implicit commitment to openness. Despite this commitment in principle, data is rarely made available in a manner that makes it straightforward to ascertain the uses to which it may subsequently be put by a third party. In small, tightly-knit groups where interchange of data may be governed by existing social norms this may rarely present a problem. However, with data interchange and interoperability reaching Web scale, social norms alone cannot be relied upon to enforce fair and appropriate usage of data. Instead, licenses are required that make explicit the terms under which data can be used. By explicitly granting permissions, the grantor reassures those who may wish to use their data, and takes a conscious step to increase the pool of Open Data available to the web.
In this paper we will briefly outline and contextualize existing work in the field, highlighting the cases in which existing licenses are appropriate and those areas in which they cannot be meaningfully applied. We will then present the work of the Open Data Commons, and describe the rationale behind the Open Data Commons Public Domain Dedication and License.

2. DATA IS NOT A CREATIVE WORK

Discussion of opening access to resources on the web often turns, sooner or later, to the laudable activities of Creative Commons, and we shall look at this effort in a little more detail shortly. It is important to understand at this point, however, that the legal protections upon which Creative Commons (and other similar) licenses rely depend upon national and international legislation around Copyright. Copyright protection applies to acts of creativity (‘creative works’), and categorically does not extend either to databases or to those non-creative parts of their content. Despite numerous cases in which well-meaning individuals or organizations release data onto the Web and apply a Creative Commons or similar license to this, there is no meaningful - or defensible - legal basis to this and they have in effect done little more than sow yet more confusion in this already complex space.

If we are to release large quantities of data onto the Web with the explicit intention that it be used and reused, then a different solution is required.

3. POLARIZING THE OPTIONS

Back in November of 2004 James Boyle published ‘A Natural Experiment’ in the Financial Times [16]. This piece saw him debating the merits of intellectual property rights over data with Thomas Hazlett and Richard Epstein. His primary thrust was that we should be making policy decisions in this area based on empirical data about the economic benefits one way or another, something on which all three protagonists agree.

Much has changed since 2004, not least our understanding of how the web can affect the way we collaborate, share, communicate; it fundamentally affects the way we live. We chat, we blog, we Twitter, we Flickr and we Joost. Content flows from person to person in unprecedented ways and at unprecedented speeds. This changes the nature of the experiment that Boyle talks about.

In Europe we have a right somewhat akin to Copyright, specifically intended to provide protection for aggregations of data: databases. If this European Database Right were working,

“we would expect positive answers to three crucial questions. First, has the European database industry’s rate of growth increased since 1996, while the US database industry has languished? [...] Second, are the principal beneficiaries of the database right in Europe producing databases they would not have produced otherwise? [...] Third, [...] is the right promoting innovation and competition rather than stifling it?”

Boyle’s first two questions centre around the creation of databases and his third, by his own admission, is difficult to measure. If one of our primary goals for the growth of the Internet is to have a web of data that can be linked and accessed across the globe, we may be better served by assessing how companies might make data open rather than closed.

Boyle asks for, and discusses, the empirical evidence of databases being created in the EU and US. The differences in numbers should provide insight into the economic ups and downs, as the EU adopted a robust database right in 1996 while the US ruled against such protection in 1991.

Boyle explains that the US Chamber of Commerce opposes the creation of a database right in the US;

“[The US Chamber of Commerce] believe that database providers can adequately protect themselves with contracts, technical means such as passwords, can rely on providing tied services and so on.”

And therein lies the rub. Without appropriate protection of intellectual property we have only two extreme positions available: locked down with passwords and other technical means; or wide open and in the public domain. Polarising the possibilities in this way forces the creator of data toward one of two extremes, neither of which is likely to address the nuance of their own circumstances and desires.

With only technical and contractual mechanisms for protecting data, creators of databases can only publish them in situations where the technical barriers can be maintained and contractual obligations can be enforced.

We don’t tolerate this with creative works, our photographs, our blog posts and so on. Why would we expect it to make sense for databases? Whether or not it makes sense comes down to whether or not it is beneficial to society. We allow Copyright so that the creator of a work can collect adequate remuneration. We allow patents to enable the recovery of development costs for an invention. Which is database right more like?

The patent is a very broad monopoly. If one had a patent on the clock, a mechanical means of measuring the passage of time, nobody else would be able to make clocks without payment of some fee. Copyright on the other hand is much narrower, only allowing protection for the specific design of particular clocks. Database right in the EU is like Copyright. It is a monopoly, but only on that particular aggregation of the data. The underlying facts are still not protected and there is nothing to stop a second entrant from collecting them independently.

Richard Epstein points to this in his contribution to the Financial Times’ discussion;

“The question is why do databases fall outside [the general principle of copyright], when the costs of compilation are in many cases substantial for the initial party and trivial for anyone who receives judicial blessing to copy the base? In answering this question, it will not do to say, as the Supreme Court said in the well known decision in Feist Publications v. Rural Telephone Service, (1991) that these compilations are not ‘original’ in the sense that it requires no thought to check the spelling of the entries and to put them all in alphabetical order. But that obvious point should be met with an equally obvious rejoinder. If it requires no thought or intelligence to put the information together, then why not ask the second entrant into the market to go through the same drudge work as the first.”
This is exactly what we see happening with OpenStreetMap. The United Kingdom’s national mapping agency, Ordnance Survey, have rights over the map data they have collected. The protection covers the collection of geospatial data that they have created. They are not granted a monopoly in geospatial data.

This leaves a special case of databases, those which are created at low cost as a by-product of normal business. Examples used in Boyle’s article are telephone numbers, television schedules and concert times. Boyle gives us the answer directly;

“the [European] court ruled that the mere running of a business which generates data does not count as “substantial investment” enough to trigger the database right.”

That a database right may not and should not apply in all cases, and that there is a requirement to restrict anti-competitive practices, does not necessarily extend to the conclusion that a right is not required.

It seems that much of the debate around intellectual property rights has focussed on how they are used to keep things closed. We suggested earlier that, at present, we can only keep databases locked away or open them completely; there is scope for considering - and defining - protections that lie somewhere between these two extremes.

4. EXISTING LICENSES

In response to Thomas Hazlett’s contribution to the Financial Times debate, Boyle asks;

“How many databases are now created and maintained entirely ‘free’ and thus escape commercial directories altogether? There are obviously many, both in the scientific and the consumer realm. One can no more omit these from consideration, than one can omit free software from the software market.”

This is an important point, and worthy of consideration. Taking one of the most prevalent free software licenses, the GNU General Public License (GPL) [17], what might that look like for data?

One of the primary functions of the GPL is that it enforces Copyleft: the requirement to license derivative, and even complementary, works under the same license. That is, any commercial software that makes use of GPL code must, under the terms of the license, also be released under the GPL. The viral nature of this license is possible only because of the legal backing of Copyright legislation.

Without a legally recognised Database right, communities have no mechanism to publish openly and still insist upon this kind of Share-Alike agreement for their data.

Consider the impact of this for situations where you might use the idea of promiscuous copying to maintain the availability of data. Promiscuous copying relies on two things: lots of copies being made and lots of copies being available. Without the necessary licensing in place there is no mechanism with which to compel those who have copies to make those available. Public Domain means, by definition, no restriction. There is nothing to prevent someone from taking data released into the public domain and locking it away behind a pay wall or similar restrictive mechanism.

Copyleft is just one position along a spectrum where ‘locked away’ and ‘free as a bird’ sit at opposite ends. What the web shows us is that other business models form crucial parts of the eco-system. Epstein picks up on the controlling aspect of Boyle’s argument:

“They can control their list of subscribers; give them each passwords; charge them based on the amount of the information that is used, or some other agreed-upon formula; and require them not to sell or otherwise transfer the information to third parties without the consent of the data base owner.”

Imagine if this were true of Copyright material on the web. It has been, and still is on the occasional site. But mostly copyright owners are starting to see the value of publishing content online and they are underpinning the delivery of that content to consumers with other business models. Without Copyright, fewer types of business could participate.

Epstein goes on to say:

“The contractual solution is surely preferable, because general publication will allow for use by others that may not offend the copyright law, but which will block the possibility of payment for the costly information that is supplied.”

And again, the very heart of the matter. If we are to encourage those who have large databases to make them open, to post them on the Semantic Web, we must provide them with models and solutions that are preferable to technical barriers and restrictive contracts. Allowing them to pick their own position on the spectrum seems a necessary part of that. You can see any form of protection in two lights. When Boyle says;

“They make inventors disclose their inventions when they might otherwise have kept them secret.”

we say;

“They allow inventors to disclose their inventions when they might otherwise have had to keep them secret.”

In the world of creative works, notions espoused by Lawrence Lessig and others over a number of years are becoming increasingly well understood. A Creative Commons license, for example, is recognized as giving the holder of rights an ability to prospectively grant certain permissions rather than limit use of their work by expecting all comers to request these permissions, again and again. Those rights are not cast aside, removing all opportunities to protect your work, your name, or your potential revenue stream. Rather, you are provided with a means to explicitly declare that your work may be used and reused by others in certain ways without their needing to request permission. Any other use is not forbidden; those uses must simply be negotiated in the 'normal' way... a normal way that also applied to those uses covered by Creative Commons licenses before the advent of those licenses.

Creative Commons licenses are an extension of copyright law, as enshrined in the legal frameworks of various jurisdictions internationally. As such, they don't really work terribly well for a
lot of (scientific, business, whatever) data... but the absence of anything better has led people to apply Creative Commons licenses of various types on data that they wish to share. It will be interesting to see what happens, the first time someone seeks redress in a court, citing the Creative Commons license that they selected as an appropriate protection against abuses of their data.

5. A LICENSE FOR OPEN DATA

Back in 2006, Talis released a first public attempt at an open data license, the Talis Community License [17], and began to use it for some early submissions to the Talis Platform [18]. In building a Platform, we understood from the outset the importance of recognizing - and celebrating - the rights of those contributing their data to the shared pool. The Talis Community License allowed us to do that.

Not long after, Tim O'Reilly wrote;

“One day soon, tomorrow's Richard Stallman will wake up and realize that all the software distributed in the world is free and open source, but that he still has no control to improve or change the computer tools that he relies on every day. They are services backed by collective databases too large (and controlled by their service providers) to be easily modified. Even data portability initiatives such as those starting today merely scratch the surface, because taking your own data out of the pool may let you move it somewhere else, but much of its value depends on its original context, now lost.” [19]

At Talis, we have an interest in seeing large bodies of structured data available for use. Through the Talis Platform, we offer one means whereby such data may be stored, used, aggregated and mined, although we clearly recognize that similar data may very well also be required in diverse contexts.

Recognizing that contributors of such data need to be reassured as to the uses to which we - and others - may put their hard work, we spent some time drafting what was then called the Talis Community License. This draft license is based upon protections enshrined in European Law, and has been used 'in anger' for a while to cover contributions of millions of records to one particular application on the Talis Platform.

Despite interest in open (or 'linked') data, licenses to provide protection (and, of course, to explicitly encourage reuse) are few and far between. Amongst zealous early adopters, there does seem to be a tendency to either (mis)use a Creative Commons license, to say nothing whatsoever, or to cast their data into the public domain. None of these strategies are fit for application to business-critical data.

Building upon our original work on the TCL, we provided funding to lawyers Jordan Hatcher and Charlotte Waelde [10]. They were tasked with validating the principles behind the original license, developing an effective expression of those principles that could be applied beyond the database-aware shores of Europe, and working with us to identify a suitable home in which this new license could be hosted, nurtured, and carried forward for the benefit of stakeholders far outside Talis.

The result of this effort was the Open Data Commons Public Domain Dedication and License [11], itself a fusion of ideas from the Talis Community License, an initial phase of redrafting from Hatcher and Waelde, and a focussed piece of activity to align with a related framework developed within the Science Commons project of Creative Commons at the same time.

The current iteration of the license asks licensors to waive various local protections in order to create a level playing field upon which a set of ‘community norms’ may be documented in order to define a set of shared expectations as to the ways in which the data may subsequently be reused. The first of those community norms is defined on the Open Data Commons site [12], and all concerned expect compatible sets of norms to be created elsewhere in time.

The Public Domain Dedication and License is now available for use, following a period of consultation. At the time of writing, all those concerned in getting to this stage are engaged in the process of placing the wider Open Data Commons initiative itself on a sound footing, creating a safe place in which this license and those to follow it may be maintained and evolved. The Open Knowledge Foundation (OKF) in Cambridge, UK, is to lead by providing that neutral new home, and funders, drafters and other interested parties are united in supporting this move to a sound and sustainable footing [20].

6. CONCLUSIONS AND OUTLOOK

There is a lot still to do, but the interdisciplinary collaboration we’re already seeing with respect to permissive licensing of data for the web means that we can all begin to move forward in lowering the walls of our silos, releasing data to play its part in the Data Web. All of us invest heavily in collecting and curating data, which is traditionally locked away and left to atrophy, failing to achieve anything like its true potential. Appropriately released and sensibly licensed, data held by every one of us can contribute hugely to the promise of the Semantic Web. Here, the whole really is far greater than the sum of its parts.

The current license is available for use. It provides us with the capability to build upon the efforts of those philanthropic contributors to the existing Linking Open Data project [21], and to take the linked data proposition to that broader market of data curators who need more persuasion and reassurance. The opportunity is immense, as is the benefit to the Semantic Web itself.

7. REFERENCES

[1] Building a Semantic Web in which our Data can Participate, WWW2007 Panel Session (May 2007).
    http://www2007.org/panel7.php
[2] Miller, P. 2007. Presentations from WWW2007 Open Data panel now online. In Nodalities weblog.
    http://blogs.talis.com/nodalities/2007/05/presentations_from_www2007_ope.php
[3] Suber, P. 2007. Peter Murray-Rust on open access and open data. In Open Access News.
    http://www.earlham.edu/~peters/fos/2007/05/peter-murray-rust-on-open-access-and.html
[4] Miller, P. 2007. Jamie Taylor Talks with Talis about Metaweb and Freebase. In Nodalities weblog.
    http://blogs.talis.com/nodalities/2007/05/jamie_taylor_talks_with_talis.php
[5] Styles, R. 2007. Open Data Licensing, an unnatural thought. In Nodalities weblog.
    http://blogs.talis.com/nodalities/2007/07/open_data_licensing_an_unnatur.php
[6] Miller, P. 2007. Licensing open data - Creative Commons and Talis have something to say. In Nodalities weblog.
    http://blogs.talis.com/nodalities/2007/12/licensing_open_data_creative_c.php
[7] Creative Commons. http://creativecommons.org/
[8] Wilbanks, J. 2007. Announcing the Protocol for Implementing Open Access Data. In Science Commons weblog.
    http://sciencecommons.org/weblog/archives/2007/12/16/announcing-protocol-for-oa-data/
[9] Steuer, E. 2007. Creative Commons launches CC0 and CC+ Programs. Creative Commons media release.
    http://creativecommons.org/press-releases/entry/7919
[10] Miller, P. 2007. Seeking a license for open data. In Nodalities weblog.
     http://blogs.talis.com/nodalities/2007/09/seeking_a_licence_for_open_dat.php
[11] ODC Public Domain Dedication and License.
     http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/
[12] ODC Community Norms. http://www.opendatacommons.org/odc-community-norms/
[13] Open Data Commons. http://www.opendatacommons.org/
[14] Linked Data definition from Wikipedia. http://en.wikipedia.org/wiki/Linked_Data
[15] OpenStreetMap. http://www.openstreetmap.org/
[16] Boyle, J. 2004. James Boyle: a natural experiment. Financial Times 22 November.
     http://www.ft.com/cms/s/2/4cd4941e-3cab-11d9-bb7b-00000e2511c8.html
[17] The Talis Community License. http://www.talis.com/tdn/tcl/
[18] The Talis Platform. http://www.talis.com/platform/
[19] O’Reilly, T. 2006. Four Big Ideas About Open Source. In O’Reilly Radar weblog.
     http://radar.oreilly.com/archives/2006/07/four-big-ideas-about-open-sour.html
[20] Open Knowledge Foundation. http://www.okfn.org/
[21] Linked Data Project. http://www.linkeddata.org/