n2Mate: Exploiting social capital to create a standards-rich semantic network

David Peterson (BoaB interactive, 2/84 Denham St., Townsville, QLD Australia 4810, +61 7 4724 2933, david@boabinteractive.com.au)
Anne Cregan (National ICT Australia, 223 Anzac Parade, Kensington, NSW Australia 2052, +61 2 8306 0458, anne.cregan@nicta.com.au)
Rob Atkinson (CSIRO Land & Water, Lucas Heights Research Laboratories, Private Mail Bag 7, Bangor, NSW 2234, Australia, rob.atkinson@csiro.au)
John Brisbin (BoaB interactive, 2/84 Denham St., Townsville, QLD Australia 4810, +61 7 4724 2933, john@boabinteractive.com.au)

ABSTRACT
A significant boost on the path towards a web of linked, open data is the establishment and promotion of common semantic resources, including ontologies and other operationalised vocabularies, and their instance data. Without consensus on these, we are hamstrung by the famous "n-squared" mapping problem. In addition, each vocabulary has its own associated attributes to do with why it was developed, what purposes it is best suited for, and how accurate and reliable it is at both a content and a technical level, but most of this information is opaque to the general community.

Our theory is that it is the lack of socially-sensitised processes highlighting who is using what, and why, that has led to the current unmanageable plethora of vocabularies, where it is far easier to build your own vocabulary than to try to find a suitable, reliable existing one.

We therefore suggest that there is considerable value in the development of an online facility that provides a space listing vocabulary and ontology resources together with their associated authority, governance and quality-of-service attributes. Presenting this in a visual form and providing pivotable search facilities enhances recognition and comprehension.

Additionally, and critically, the facility provides a focal point where discourse communities can make authority claims, rate vocabularies on various parameters, register their commitment to or usage of particular vocabularies, and provide feedback on their experiences. Through social interaction, we expect the most solid and useful vocabularies to emerge and form a stable semantic platform for content representation and interlinked knowledge.

Our strategy is to become sufficiently enmeshed in the native information habits of people and their derivative institutions to reveal and collect their standards-seeking needs and activities with a minimum of effort on their part.

This paper describes a pilot facility testing the theory above. Dubbed "n2Mate", it is a novel exploitation of social networking software to provide a lightweight and flexible platform for testing the efficacy of leveraging social networks to link existing registers and 'seed' an information space focussing on the use of standards in online information management.

The paper uses examples from the Australian context to provide clear illustration of the central arguments.

Keywords
Registers, vocabularies, standards, linking density, RDF graph, social networking, knowledge re-use, n2Mate, n-squared

1. SOCIAL AND TECHNICAL CONTEXT
The current emergence of a data web has re-focussed our attention on standards. To be truly effective, the semantic web needs to evolve towards a minimum number of ontologies, highly re-used and densely interlinked, rather than a sparse network with minimal interoperability.

1.1 The standard problem with standards
The project to link open data can be realised through explicit declarations by one data source in relation to another. These "hard" linkages provide a high degree of certainty, but make data maintenance exponentially difficult as the number of hard linkages grows.

Standards, understood as nodes of agreed meaning, provide a more scalable approach to data linking. By agreeing to use the same term to describe similar ideas in our different data, we establish an implicit (semantic) linkage between our data. The project to conceive, negotiate, and promote standards, however, has proven to be even more difficult than the maintenance of hard linkages.

It is often noted, with some irony, that the great thing about standards is that there are so many to choose from... and if you can't find one you like, you can always create your own.

While these sentiments provide excellent platforms for pub-based oratory, the realities are not so easily dismissed. Application designers, knowledge seekers, and agencies with a mandate to interoperate are all too familiar with the significant resource drains that occur when standards are hard to locate, difficult to apply, or confusing to distinguish between.

Standard vocabularies and data definitions have been quietly multiplying in traditional media since ancient Sumer (cf. Wikipedia, "Cuneiform"), but in more recent times the Semantic Web has inspired a hyperbolic growth in contributions to the standards project. For instance, a search in Swoogle on the word "address" returns 12,834 semantic web documents; on "book" it returns 19,601 (at 2008-01-24). For someone seeking to exercise the efficiencies of knowledge reuse, this wealth of choice is simply overwhelming and self-defeating. The current state of affairs reveals semantic fragmentation, not semantic integration and knowledge creation.

Even within a narrow domain like the Australian government, there is a wealth of terminologies and metadata "standards" available for government agencies to consider. It is not clear if a whole-of-government survey of standards has ever been undertaken, but informal observation suggests that there are hundreds of attempts to describe very similar concept spaces.

1.2 Does anyone have a wheel like mine?
People have been trying to standardise themselves in one way or another for quite some time. The most obvious benefit of this instinct toward standardisation is communication efficiency, a direct input to the rate of knowledge creation. By speaking the same language, we can communicate and collaborate far more effectively. Yet the barriers to standardisation appear to take on new forms as fast as we evolve knowledge.

In our present age the benefits of information interoperability are now well understood, if only through their absence. Most people and institutions involved in project scoping, information product development, and online service provision clearly grasp the power of knowledge re-use and the cost efficiencies of standards-based interoperation. This assertion is supported by the existence of an entire government department whose mandate is to promote effective and efficient information sharing, governance structures, tools, methods and re-usable technical components across the Australian Government.

The Australian Government Information Management Office (AGIMO) published a Government Architecture Reference Model (http://www.agimo.gov.au/services/GovDex) that discusses "...a repository of architectural artefacts (including standards, guidelines, designs and solutions) that may be utilised by agencies to deliver an increasing range of Whole of Government services."

In practice, however, we find that the task of identifying and verifying the suitability of existing artefacts is simply too time-consuming. As a consequence, there are a great many ontologies and informal vocabularies used by a very limited number of organisations or agencies, with a great sparsity of intermappings between them, even though there is a very large amount of crossover in terms of content.

More globally, the Linking Open Data (LOD) project [1] holds datasets that currently comprise over 2 billion triples but reveal only about 3 million links (SWEO, 2007), so overall the graph is very sparsely interconnected [2].

1.3 Scalable register networks
As we have argued, there are many technical standards and common policies in use across a wide range of government activities, but the very number of such activities and standards is in itself posing a significant challenge.

AGIMO and others have a role in promoting the use of common approaches, but it is increasingly difficult to track which standards apply to which set of problems.

In general, there is an issue about the scalability of any approach for improving interconnectedness. We believe that the most promising strategy is to utilise registers to hold metadata about standards and their implementation, including records of organisations, projects, standards, and controlled vocabularies (and associated people and roles). A network of such registers, coupled through normal web services mechanisms, has the potential to form a semantic fabric that addresses the business-level needs of people and institutions. Whilst this is potentially a vast undertaking, the bulk of target information already exists, and there are already a great many people actively tasked with identifying, using and promoting standards. These people are likely to be receptive to an effort such as n2Mate.

A network of registers, supported by a "register of registers", addresses the most important questions: who is doing what, which standards are relevant, who can I talk to, what is the governance model for these artefacts, and how trustworthy is the source. Through a richly populated network of registers, these become questions any organisation can rapidly address, and in doing so can promote commonality of approach within and amongst various discourse communities.

1.4 Socially-sensitive metadata
One of the dark secrets of the machine-based knowledge project is the enormous loss of content as we move from people's minds to their documents and datasets. David Snowden, amongst many
others, has pointed to the impossibility of "collecting" knowledge from people without providing a meaningful context:

"Human knowledge is deeply contextual, it is triggered by circumstance and need, and is revealed in action. ... to ask someone what he or she knows is to ask a meaningless question in a meaningless context. Tacit knowledge ... comes about when our skilled performance is punctuated in new ways through social interaction" [3].

A socially-sensitised strategy provides the meaningful context and familiar atmosphere that people require before they can (or will) reveal their knowledge in a useful way.

In many ways the current situation is akin to a train network that has millions of stations (nodes) covering the same area (knowledge domains) but with a great sparsity of tracks (mappings) between stations, and hardly any trains and passengers (services, publishers, agents, users) running on the vast majority of them.

Our experience with efficient rail networks shows that we want to reach a necessary minimum of stations interconnected with an optimised number of tracks, and attract a maximum number of trains to utilise the infrastructure. This obviously gives us a far more robust and useful semantic network to traverse.

In related research, it should be possible to show how the density of interconnectedness in the RDF graph improves the efficiency of machine process operation without producing a debilitating level of ambiguity. We would argue that the degree of interconnectedness implemented between ontologies can be taken as a proxy indicator of interoperability across the knowledge domain.

We suggest there is a cluster of persistent problems in complex information spaces that can be socially characterised as follows:

Who and what:
- Owner: Who owns it?
- Creation: Who created it?
- Maintenance: Who is responsible for maintaining it?
- Domain: Which domains is it relevant to? This will include a number of different ways of considering domains.
- Usage: Who uses it?
- Endorsement: Who endorses it? This will include reporting on various parameters and a rating system.
- Processes: What business, government or other processes is it used in? What role does it play?
- Governance: Who is in charge of it? Who has formally agreed to support, maintain, and implement it?

Quality of service parameters:
- Provenance: What guarantees are there that the information is accurate and verified?
- Currency: How often is it updated? What guarantees are there that it is up to date?
- Availability: What guarantees are there regarding the availability of the vocabulary, including dereferencing considerations?

Other considerations:
- How does it relate to other standards in the space?
- User experiences

2. SOCIAL ARCHITECTURES AND SEMANTIC NETWORKS
The principal social platform techniques we seek to exploit include:
- Popularity rankings: the number of times a standards artefact is referenced (implemented).
- Authority badges: a mechanism to advertise an authority claim over a standards artefact.
- Related to ("Friends of a Standard (FOAS)"): linkages from standards artefacts to their cohort of implementers.
- Trust ratings: showing satisfaction with the custodian of a standards artefact.
- Hero worship: most interlinked, most trusted, most useful.

Each of these techniques has corresponding interface strategies that provide a powerful social platform in which people (and institutional roles) can operate quite naturally. Each also forms a search facet that can be traversed with high-efficiency faceted search and browsing tools.

2.1 Use Case
A simple use case will help us set the stage for describing the technical architecture proposed.

A researcher is preparing her research plan on a section of the Great Barrier Reef (GBR). Although she is an experienced marine scientist, she is new to the GBR and to her host research facility. She suspects she should be using:
- standard naming conventions for the GBR regions;
- standard identifications for the particular reefs;
- standard data sampling techniques appropriate to the Australian tropics;
- standard data formats, enumerators, and vocabularies in her datasets;
- standard citations of agencies, programmes, and people referenced in her work;
- standard metadata fields and vocabularies to describe her research output;
- standard project management practice in reporting on her project's progress.

In the absence of a useful standards locator, it is not likely that she will achieve a high standard of conformance to the norms of her discourse community. In the absence of a socially-sensitised register space, it is not likely that her discourse community is actively sharing its experience and wisdom with standards.

2.2 Instance Data
The facility needs to be designed around a sufficient minimum of predicates that embody the "business logic" of the facility and establish the semantic armature we require for inferencing. We propose the following [shows predicate] as a starting point:
- Organisations are [responsible for] people, projects, standards, and vocabularies
- People are [associated with] projects
- Projects are [implemented by] standards
- Standards are [expressed with] vocabularies
- Trust or utility of standards are [ranked by] people

Using these indicative predicates as a starting point, we can answer a matrix of discovery questions through faceted visualisation. In each search operation, the user can rotate to a facet of interest to continue the discovery process:
- I know someone like me [PersonName] > What projects are they associated with?
- Those projects are like mine [ProjectName] > What standards are used in them?
- Those standards are of interest [StandardName] > How can I decide which one is most appropriate for me?

The logic described here is possible because we have imposed a limited set of predicate types. These types are native to the n2Mate facility. To take advantage of existing social networks that utilise other predicate types, Semantic Web vocabularies such as SIOC [4] and FOAF [5] will be used.

The facility will also consider structured lists of resources, like a list of country names available from the same address, to themselves be a kind of register. For instance, many applications need a list of every valid country name for users to input their address information. The ability to reference an external source that is authoritative, accurate, up-to-date, and reliably available and dereferenceable reduces the need for application maintenance.

The metadata held in these registers can be typed according to existing conceptualisations. For example, the National Data Network (http://www.nationaldatanetwork.org/) draws on ideas from the Metadata Open Forum (http://metadataopenforum.org/) to classify their metadata as: discovery metadata; quality metadata; and definitional metadata.

We note that the semantic register network can also list web services in addition to typical standards artefacts such as ontologies and vocabularies.

We intend to specifically tune this facility to the needs of government and community agencies that have a mandate to participate in the creation and maintenance of highly effective approaches to service improvement.

Semantic interpretation: MOAT
MOAT (Meaning of a Tag) could serve as the basis for giving extended quality of information to free-form folksonomy tagging.
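The core idea behind this approach, namely recording which URI a tagger actually means by a free-form tag so that identical tags from different users become machine-comparable, can be sketched in a few lines. The following is an illustrative, in-memory sketch only: the class name and example URIs are hypothetical, and it does not reproduce the actual MOAT protocol or API.

```python
# Illustrative sketch of the "meaning of a tag" pattern (hypothetical names
# and URIs; not the MOAT protocol itself).

class TagMeaningRegistry:
    """Maps free-form tags to candidate meanings (URIs), per user."""

    def __init__(self):
        # tag -> {meaning URI -> set of users who chose that meaning}
        self._meanings = {}

    def assign(self, tag, uri, user):
        """Record that `user` uses `tag` to mean the resource at `uri`."""
        self._meanings.setdefault(tag.lower(), {}).setdefault(uri, set()).add(user)

    def meanings(self, tag):
        """Candidate URIs for a tag, most widely agreed-upon first."""
        candidates = self._meanings.get(tag.lower(), {})
        return sorted(candidates, key=lambda uri: len(candidates[uri]), reverse=True)

registry = TagMeaningRegistry()
# Two users agree on one meaning; a third uses the same tag differently.
registry.assign("address", "http://example.org/vocab/PostalAddress", "alice")
registry.assign("address", "http://example.org/vocab/PostalAddress", "bob")
registry.assign("address", "http://example.org/vocab/MemoryAddress", "carol")

# The most widely shared meaning surfaces first, so "nodes of agreed
# meaning" emerge from individual tagging choices.
print(registry.meanings("address")[0])  # http://example.org/vocab/PostalAddress
```

In a deployed system these assignments would of course live on shared, distributed servers rather than in process memory; the point of the sketch is only that agreement density is directly computable from who-means-what records.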
This will allow users of the bookmarking system to have the flexibility of folksonomy and the interlinked structure of the Semantic Web. The added benefit is that MOAT (http://moat-project.org/) is a distributed system and can tap into other servers to give extended meaning to free-form tags.

3. IMPLEMENTATION OPTIONS
A demonstrator version of n2Mate can be established using readily available tools and datasets, so that a more detailed critique can be pursued with a minimum of upfront overhead. In this section we discuss some of the more promising approaches.

3.1 Key components
The registration process, and maintaining a network of linked objects, is the function of traditional registry technologies, such as ebXML Registry. Navigating and efficiently querying the contents and relationships, however, is not well supported by this environment. It is proposed to automate the harvesting of object relationships from the "Register of Registers" into a triple-store. This is the same pattern found in data mining, where transactional database content is restructured into generalised, query-oriented structures. For our purposes, automated discovery of patterns is not the focus: fast, efficient visual presentation is essential. Users will be parsing through extensive data structures, and may need to propose and refine their discovery logic in quick, exploratory sorties.

Visualisation and facet search: Gnizr + Solr
We want a tool that thinks natively in URIs and triples. Gnizr (http://code.google.com/p/gnizr/) is an open source front end that handles user account management, bookmarking, tagging, and semantic search. Every object stored by gnizr is a bookmark (URI), and the folksonomy tag interface is SKOS [6] enabled. Solr (http://lucene.apache.org/solr/) is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface.

Solr could be used to facet the data into searchable and browseable components. For example, if users are interested in what ontologies Sun Microsystems is using, they select Sun from the "Who is Using" facet. The other facets instantly re-order and re-number themselves, and the user is free to further refine the results by selecting additional facets. Faceted search visualisation can be negotiated through cluster maps (e.g., Aduna, http://www.aduna-software.org) with a high degree of efficiency.

Triple-store: Sesame
Sesame (http://sourceforge.net/projects/sesame/) could provide the backend triple store, graph manipulation, RDF inferencing, and remote SPARQL [7] endpoint access.

Figure 1: n2Mate conceptual architecture

Policy layer: PLING
The development of robust approaches to policy negotiation is being driven by a W3C Interest Group (http://www.w3.org/Policy/pling/). The n2Mate project could field test various strategies for handling issues of personal privacy, information reuse, and access control.

Trust and governance: POWDER
POWDER (http://www.w3.org/2007/powder/) is the W3C's Protocol for Web Description Resources, currently in development.

Governance is related to the idea of trust. In the context of this project, we want to explore two aspects of governance:
1. How to make it easy for agencies who have a mandate to be an authority for some asset to discharge their duty in an efficient and useful way.
2. How to provide users with a suite of trust measures that will allow them to evaluate the qualities of a particular asset in relation to their needs.

POWDER seeks to develop a mechanism through which structured metadata can be authenticated and applied to groups of web resources. It provides us with a means to both retrieve information about a block of web resources and authenticate that this information may be attributed to the owners of the information.

3.2 Testing the system with existing resources
There are already many semantically rich registers implicit in the operations of government, including the identifiers of government agencies, registers of company names, standards recognised by Standards Australia, legislation and regulations, and management areas for land, water, soils, health, etc. This represents a wealth of entities about which assertions can be made, to create a semantically rich environment.

Semantic Web data can be roughly broken down into three levels [2]:
1. Vocabulary / ontology
2. Individual occurrences of those terms, and actual instances of non-information resources
3. The links that tie the vocabularies to their occurrences

All three of these need to be captured with adequate provenance data to bootstrap n2Mate.

The following web services can be utilised to populate and update information, as well as add important metadata to the Register of Registers component of n2Mate:
- Watson (http://watson.kmi.open.ac.uk/Overview.html): a gateway to the Semantic Web, focusing on semantic data quality, relations between ontologies, and access to semantic data.
- Talis Schema Cache (http://schemacache.test.talis.com/): a cross-linked and navigable index of ontologies and vocabularies.
- Swoogle (http://swoogle.umbc.edu): a search engine for Semantic Web artefacts.
- Sindice (http://sindice.com): indexes the RDF web and pulls out the triples; from there it essentially creates a reverse lookup.
- Falcons (http://iws.seu.edu.cn/services/falcons/): currently indexing 34,566,728 objects (2008-02-01); provides bi-directional resource linking.
- Ping the Semantic Web (http://pingthesemanticweb.com/): archives the location of recently created or updated, web-accessible RDF.

3.3 Data harvesting and processing
n2Mate can leverage existing search engine services, such as those listed above, to collect data instances from target registers and sources. Many of these have, or are developing, APIs that facilitate direct access to their collections and service points.

Where well-formed registers and artefact collections exist already, n2Mate could establish harvesting relationships (presumably through appropriate API arrangements). OWL files, RDF data dumps, and SPARQL endpoints could be pointed to the n2Mate system for automated data fetching and processing.

Additionally, trust algorithms would be created from graph inferencing, metadata and social data to further guide the prospective n2Mate user, allowing them to more quickly determine the best artefact to use in their situation. This will be an evolving process that occurs over time as the quality of data and user interactions flows back and forth.

4. CONCLUSION
The unique aspect of this proposal is that it leverages the hidden formal and informal knowledge networks created by existing business processes, and marries this information with social networking models to provide a useful way of organising and navigating the wealth of available information. It uses the community of people using vocabularies to empower others, starting with the places where agreements already exist.

n2Mate provides a tool that encourages the use of standardised artefacts by exposing existing registers, leveraging social networks, and building a central reference point for users that will assist them to identify relevant semantic assets for their needs, choose amongst them, and feel confident about their utilisation.

Further, research into the strategy proposed should provide contributions to related projects, such as the development of:
- A lightweight mechanism revealing the state of interconnectedness in and between discourse communities.
- A bridging space between government, business, community, academia and science knowledge assets to enhance broadscale interoperability.
- A genetic algorithm to breed, select, and hybridise various standards artefacts such as ontologies, services, and trust authorities.

In conclusion, we suggest that there is currently a significant level of inefficiency in the applied domain of project scoping, information product development, and online service provision, due to the inadequacy and irrelevance of existing knowledge registers. We further suggest that a promising solution strategy involves using the power of social networks, coupled with semantic discovery and visualisation tools, to create a socially-sensitised semantic network of standards registers.

5. REFERENCES
5.1 Citations
[1] C. Bizer, T. Heath, D. Ayers, and Y. Raimond. Interlinking Open Data on the Web (Poster). In 4th European Semantic Web Conference (ESWC2007), pages 802–815, 2007. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
[2] M. Hausenblas, W. Halb, Y. Raimond, and T. Heath. What is the Size of the Semantic Web? Metrics for Measuring the Giant Global Graph, 2007.
[3] Snowden, Dave. Information vs Knowledge. http://www.rkrk.net.au/index.php/Information_Vs_Knowledge
[4] J. Breslin, A. Harth, U. Bojars, and S. Decker. Towards Semantically-Interlinked Online Communities. In Second European Semantic Web Conference, ESWC 2005, Heraklion, Crete, Greece, May 29-June 1, 2005. Proceedings, 2005. http://sioc-project.org/
[5] D. Brickley and L. Miller. FOAF Vocabulary Specification. Namespace Document 2 Sept 2004, FOAF Project, 2004. http://xmlns.com/foaf/0.1/
[6] D. Brickley and A. Miles. SKOS Core Vocabulary Specification 2005-11-02. W3C Working Draft, W3C, November 2005. Updated version at http://www.w3.org/TR/swbp-skos-core-spec
[7] E. Prud'hommeaux and A. Seaborne, eds. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/

5.2 Special thanks
Renato Iannella provided background thinking on the Policy Aware Web. Alan Ruttenberg of the Science Commons and Tom Heath of the Linking Open Data project provided encouragement and wisdom from their broad experience. Steve Matheson from the Australian Bureau of Statistics corroborated our intuition that social platforms could play an important role in standards adoption.