<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Interlinking Distributed Social Graphs *</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Matthew</forename><surname>Rowe</surname></persName>
							<email>m.rowe@dcs.shef.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">OAK Group, Department of Computer Science</orgName>
								<orgName type="institution">University of Sheffield</orgName>
								<address>
									<addrLine>Regent Court, 211 Portobello Street</addrLine>
									<postCode>S1 4DP</postCode>
									<settlement>Sheffield</settlement>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Interlinking Distributed Social Graphs *</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">341A1DAC9A14B0D2A2D46FC7C4ABA5B9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.4 [Information Systems Applications]: General</term>
					<term>Semantic Web</term>
					<term>Social Web</term>
					<term>Linked Data</term>
					<term>RDF</term>
					<term>FOAF</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rise in use of the social web has forced web users to duplicate their identity in fragmented information spaces. Commonly these spaces contain rich identity representations hidden within walled garden data silos. This paper presents work to export social graphs from such data silos as RDF datasets, and provide linkage between these social graphs according to a graph matching paradigm. Our work contributes to the linked data movement by providing a decentralised social graph containing linked data describing fragmented identity components.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>The social web has allowed web users to replicate their offline lives and actions in an online environment. Sharing photos, messaging friends and networking to build relationships both socially and professionally have drawn users into using social web platforms to organise their lives. This has generated rich social data attached to individual web users, describing their online identity. Properties of this identity include biographical information (name, address, date of birth) and social information (relationships, contacts and affiliations). The modern web user has a fragmented identity distributed over multiple platforms and services. Each service contains a different social graph; aggregating these graphs would generate a clear description of the person, who their contacts are with useful contact information, and an identity reference point through a URI.</p><note place="foot" n="*">Copyright is held by the author/owner(s). LDOW2009, April 20, 2009, Madrid, Spain.</note><p>Work by the linked data<ref type="foot" target="#foot_0">1</ref> community has linked together existing accessible social web data sources (Flickr exporter, DBPedia) and published the linked content. Despite social web platforms and services offering advanced functionalities to enhance a person's online identity, exporting the internal social graph is not supported, which inhibits access to useful datasets. Such sites are described as 'data silos', where data is hidden within a walled garden. 
Exported and aggregated social graphs from different services could be used when signing up to a new web service to import the user's existing social network; trust networks could be created based on transitive relationships in the social network; and recommendation systems could retrieve suggestions based on the imported social graph (based on the actions of social network members).</p><p>This aggregation also creates a decentralised description of a person's online identity. The aggregation contains linked data which can be referenced for additional identity information from distributed data silos, thus exporting information from closed data services and opening it up for reuse <ref type="bibr" target="#b6">[7]</ref>.</p><p>In this paper we present our contribution to the linked data project by describing an approach to export social data in a semantic graph format using RDF from a range of social web services, and aggregate the generated RDF by providing links between the exported datasets. The former component relies on mapping XML schemas to the appropriate ontology, whereas the latter is a graph matching problem: providing links between person instances in separate graphs that are found to refer to the same real world person or entity.</p><p>Deciding when a link should be created is difficult because no global unique identifiers exist for people, and relying on a person's name alone is problematic due to name ambiguity. Therefore we utilise as much additional semantic information as possible to aid the linking process. We present our own approach to matching person instances using low-level reasoning, and evaluate its success against two formal graph matching functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">SOCIAL WEB LINKED DATA INITIATIVES</head><p>Representing social data found within social web services and platforms has been investigated by the SIOC (Semantically-Interlinked Online Communities) project<ref type="foot" target="#foot_1">2</ref>. Work by Bojars et al. presented in <ref type="bibr" target="#b0">[1]</ref> demonstrates how social web platforms such as blogging services and social bookmarking platforms can express their content using the SIOC specification, providing semantic formalisations for authors and their generated content. SIOC extends the FOAF (Friend of a Friend) specification <ref type="bibr" target="#b2">[3]</ref>, which is designed to describe personal data consisting of biographical information and to express relationships with other people in a social network. FOAF is well suited to our work, as each social web platform is built on the social model of making relationships and linking people together.</p><p>Although such formalisations exist, it is essential to export information from data silos into a suitable format. Work by Alexandre Passant presented in <ref type="bibr" target="#b9">[9]</ref> describes how information is exported from Flickr<ref type="foot" target="#foot_2">3</ref>, the photo sharing platform, as RDF using the SIOC and FOAF specifications. Similarly, work has been carried out to produce an exporter<ref type="foot" target="#foot_3">4</ref> of RDF using FOAF from the microblogging site Twitter<ref type="foot" target="#foot_4">5</ref>, although the exportation of information is somewhat limited because it does not extract any geographical or additional biographical information for social network members. Recent work by QDos<ref type="foot" target="#foot_5">6</ref> has created a FOAF Builder application<ref type="foot" target="#foot_6">7</ref> capable of providing linked data representations between a person's interests and the DBPedia dataset. 
However, in the context of this paper it fails to aggregate social graphs from data silos, instead providing links that point to the existence of accounts within those silos for a given person. Social data extraction was described by Halpin in <ref type="bibr" target="#b5">[6]</ref> as exporting information into RDF using GRDDL<ref type="foot" target="#foot_7">8</ref> from XHTML, provided social markup existed. The markup had to be mapped to a well defined vocabulary such as FOAF or SIOC depending on the information used. For example, the XFN Microformat<ref type="foot" target="#foot_8">9</ref> would be mapped to FOAF. In our previous work <ref type="bibr" target="#b11">[11]</ref> we describe our methodology for exporting social data from the social networking site Facebook<ref type="foot" target="#foot_9">10</ref>, producing RDF according to the FOAF specification. We extend this work in the context of this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">REQUIREMENTS</head><p>Our approach to aggregating social graphs is split into two distinct stages: The first stage generates RDF using the necessary ontologies from various social web services, and the second stage then aggregates these social graphs. RDF provides a useful formalisation to describe information within each data silo, capturing biographical information and social network information. Where possible our approach must provide links to social network members in separate networks that are the same person. Based on these functionalities we have defined four requirements that our approach must fulfill:</p><p>1. Export social data contained within data silos into the same semantic form.</p><p>2. Link person instances from separate social networks referring to the same real world person.</p><p>3. Maximise the number of correct links while minimising the number of incorrect links.</p><p>4. Publish a decentralised linked social graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">SOCIAL GRAPH EXPORTATION</head><p>Exporting social graphs from walled garden data silos commonly involves the trivial task of mapping XML schemas offered by the web service to a semantic specification. We extend our previous work in <ref type="bibr" target="#b11">[11]</ref> by incorporating OpenID<ref type="foot" target="#foot_10">11</ref> to address person resolution, and enable information linkage. At a low-level this involves the requirement of an OpenID resource for the social graph owner, which is then assigned to the foaf:Person instance in the graph using the foaf:openid relation. For exporting social graphs, we use the FOAF specification to describe available social information in a semantic format. In the remainder of this section we present several exporters developed to extract RDF from several web platforms<ref type="foot" target="#foot_11">12</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Social Networking Sites</head><p>We have created social graph exporters for two social networking sites. The first exporter extracts social data from Facebook by mapping the returned XML response to the FOAF ontology, thus capturing the identity information and the social network, which consists of instances of foaf:Person linked by the foaf:knows property to provide relationship ties. The FOAF ontology is well suited to capturing social information due to its extensive expression and definition of identity information. Thus we consider XML schemas used by social web services to be subsets of the FOAF specification. We found that there were no properties that we could not map from the XML schema to concepts from the FOAF ontology.</p><p>Geographical information including city and country is formalised as an instance of geo:Feature by assigning the person's city and country to the geo:name and geo:inCountry properties from the Geonames ontology<ref type="foot" target="#foot_12">13</ref>. We chose the Geonames ontology due to its adoption by the Semantic Web community as a standard for describing geographical concepts. Each social network member is assigned a URI using the user identification number from the service. Unfortunately the exportation of email addresses and web sites, which would serve as useful dereferencing points, is not allowed. The following is a snippet of the RDF exported from Facebook.</p><p>&lt;foaf:Person rdf:ID="#me"&gt; &lt;foaf:name&gt;Matthew Rowe&lt;/foaf:name&gt; &lt;foaf:givenname&gt;Matthew&lt;/foaf:givenname&gt;</p><p>We applied the same approach when exporting social graphs from the social networking site MySpace<ref type="foot" target="#foot_13">14</ref>. 
As MySpace follows the OpenSocial<ref type="foot" target="#foot_14">15</ref> specification, our exportation tool can easily be adapted to extract social graph information from several other social networking platforms supporting the same specification (Bebo, Orkut, Hi5). As the OpenSocial specification and MySpace's data accessibility were still in development at the time of conducting our work and writing this paper, we were unable to extract rich social network data. This reduced the exportation process to only the person name of each social network member, described using foaf:name, and the user identification number from the site, which was employed as a URI. Each is assigned to an instance of foaf:Person which we bind to the social graph owner using the foaf:knows property. The ability to extract only person names and a URI meant that we were unable to export geographical information.</p></div>
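<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mapping from an XML response to FOAF described in this section can be sketched as follows. This is only an illustration under stated assumptions: the XML layout (friends, user, uid, name), the base URI and the function name export_foaf_triples are hypothetical and do not reproduce the schema of Facebook, MySpace or any other service. Triples are kept as plain Python tuples rather than serialised RDF.</p>
<code lang="python"><![CDATA[
```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI used to mint identifiers from user ids.
BASE_URI = "http://example.org/people/"

# Hypothetical response layout; real services define their own schemas.
SAMPLE_RESPONSE = """
<friends>
  <user><uid>1001</uid><name>Sam Chapman</name></user>
  <user><uid>1002</uid><name>Example Friend</name></user>
</friends>
"""


def export_foaf_triples(xml_text, owner_uri):
    """Map every user element to a foaf:Person known by the graph owner."""
    triples = []
    root = ET.fromstring(xml_text)
    for user in root.findall("user"):
        # The service's user id is reused to build the person URI.
        person_uri = BASE_URI + user.findtext("uid")
        triples.append((person_uri, RDF_TYPE, FOAF + "Person"))
        triples.append((person_uri, FOAF + "name", user.findtext("name")))
        triples.append((owner_uri, FOAF + "knows", person_uri))
    return triples
```
]]></code>
<p>Calling export_foaf_triples(SAMPLE_RESPONSE, BASE_URI + "me") yields three triples per listed user: a type statement, a foaf:name literal and a foaf:knows edge from the owner.</p></div>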
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Microblogging Platforms</head><p>For exporting RDF from the microblogging site Twitter we followed the same methodology as for the exportation of social graphs from the previous social networking sites. Twitter also offers an XML response (without requiring authentication), enabling a mapping to be made from the XML schema to concepts from the FOAF ontology. As Twitter allows access to geographical information we model this data as an instance of geo:Feature, and assign the city and country to the geo:name and geo:inCountry properties respectively. Each social network member is described using an instance of foaf:Person, and the geo:Feature instance is assigned using the foaf:based_near property. We also used the display name of each member of the social network as their URI. Figure <ref type="figure" target="#fig_0">1</ref> shows an example social graph exported from Twitter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Graph Enrichment</head><p>Following the exportation of social graphs from their hosting service, it is essential to enrich the graph where possible. The first stage of the exportation process only creates a geographical reference using the place name, which can cause problems with ambiguity. Therefore we enrich this representation by resolving the place name to a unique identifier. To perform this process we query the Geonames Web Service<ref type="foot" target="#foot_15">16</ref> (which accesses the Geonames dataset) using the geo:name and geo:inCountry properties assigned to the geo:Feature instance for each social network member. This returns a list of possible URIs for the location. We select the most relevant URI from the list and assign it to the geo:Feature instance, together with the latitude and longitude of the location, which are assigned using the geol:lat and geol:long properties from the Geolocations vocabulary <ref type="bibr" target="#b1">[2]</ref>. This additional enrichment offers extra information for person resolution in the graph space, essential for the following social graph aggregation procedures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">SOCIAL GRAPH AGGREGATION</head><p>Aggregating social graphs identifies matching foaf:Person instances in separate graphs and provides links between these instances using the owl:sameAs property. The main challenge we face is deciding if two instances of foaf:Person with the same name in different social networks refer to the same real world person or entity. To make this decision we formalise a graph from each instance of foaf:Person and all outgoing properties and relations in the social network. This graph contains biographical information about the person, which can then be compared using graph matching techniques to derive a similarity measure and therefore the possibility of a match.</p><p>Formally we define the person graph as G = ⟨V, E⟩, where V denotes all the nodes within the graph, represented as resources and literals extracted from the foaf:Person instance in the social network, and E denotes the edges connecting those nodes, represented as semantic relations and properties. Therefore the social network found within a given FOAF file contains a set of person graphs, each describing a different real world entity. Imagine we have two FOAF files F1 and F2, where g ∈ F1 and h ∈ F2 range over the person graphs within F1 and F2 respectively, and a similarity function sim(g, h) that measures the similarity of the two graphs. Our task is to find the graphs in F1 and F2 that achieve the maximum similarity measure whilst exceeding a given threshold, and therefore provide a match. This reduces the aggregation problem to a graph matching task; to investigate this procedure we now present three alternative methods for computing graph similarity. We evaluate and compare the success of each method later.</p></div>
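<div xmlns="http://www.tei-c.org/ns/1.0"><p>The matching task stated above (pick, for each person graph in F1, the graph in F2 with maximum similarity, and keep the pair only if it exceeds a threshold) can be sketched as below. The representation of a person graph as a frozenset of (property, value) pairs, the function names best_matches and toy_sim, and the threshold value used in the usage note are assumptions made for illustration; any of the three similarity methods could be passed in as sim.</p>
<code lang="python"><![CDATA[
```python
def best_matches(graphs_f1, graphs_f2, sim, threshold):
    """Return (g, h, score) for each g whose best-scoring h exceeds threshold."""
    links = []
    for g in graphs_f1:
        best_score, best_h = None, None
        for h in graphs_f2:
            score = sim(g, h)
            if best_score is None or score > best_score:
                best_score, best_h = score, h
        # Only the maximum-similarity candidate is considered for a link.
        if best_score is not None and best_score > threshold:
            links.append((g, best_h, best_score))
    return links


def toy_sim(g, h):
    """Placeholder similarity: shared (property, value) pairs over all pairs."""
    union = g | h
    return len(g & h) / len(union) if union else 0.0
```
]]></code>
<p>For example, with a threshold of 0.5 a person graph holding the name "Sam Chapman" and the location "Sheffield" is linked to an identical graph in the second file and not to a graph holding only the name "Matthew Rowe".</p></div>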
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Node/Edge Overlap</head><p>Graph matching using node and edge overlap as described in <ref type="bibr" target="#b7">[8]</ref> utilises the Jaccard distance <ref type="bibr" target="#b3">[4]</ref> between two graphs to derive a similarity measure, the intuition behind this method being that the fewer edits required to transform one graph into another, the more similar the graphs are. Essentially this method counts the number of edit operations needed to perform the transformation, which is then normalised by the sum of the node and edge counts from each graph. Therefore the edit distance is described as:</p><formula xml:id="formula_0">\mathrm{sim}(g, h) = 1 - 2\,\frac{|V_g \cap V_h| + |E_g \cap E_h|}{|V_g| + |V_h| + |E_g| + |E_h|}</formula><p>When generating the intersection of the node sets of g and h we used the Levenshtein string similarity measure <ref type="bibr" target="#b4">[5]</ref> to derive term similarity. If the similarity measure is above a predefined threshold then the nodes are classed as equivalent. String matching is not required for finding the intersection between the edge sets because their semantic types are formally defined; this reduces the comparison to a trivial binary comparison of objects. It is important to note that for consistency in our work, we used the same Levenshtein string similarity measure when comparing literals in each graph matching method.</p></div>
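<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the node and edge overlap computation is given below, assuming the intersection form of the measure in which the shared nodes and edges are counted twice and divided by the total node and edge counts, with the distance taken as one minus that ratio. For brevity nodes are compared by exact equality here, whereas the text classes nodes as equivalent when their string similarity exceeds a threshold. The function names are hypothetical.</p>
<code lang="python"><![CDATA[
```python
def overlap_similarity(nodes_g, edges_g, nodes_h, edges_h):
    """2 * (shared nodes + shared edges) / (all nodes and edges of both graphs)."""
    total = len(nodes_g) + len(nodes_h) + len(edges_g) + len(edges_h)
    if total == 0:
        return 0.0
    shared = len(nodes_g & nodes_h) + len(edges_g & edges_h)
    return 2 * shared / total


def overlap_distance(nodes_g, edges_g, nodes_h, edges_h):
    """One minus the overlap ratio: 0.0 for identical graphs."""
    return 1.0 - overlap_similarity(nodes_g, edges_g, nodes_h, edges_h)
```
]]></code>
<p>With nodes {"Sam Chapman", "Sheffield"} against {"Sam Chapman", "London"} and the same two edge types in both graphs, three of the eight items are shared, giving an overlap of 0.75 and a distance of 0.25.</p></div>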
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Node Mapping</head><p>Work by <ref type="bibr" target="#b10">[10]</ref> provides a method to match graphs by providing possible mappings between nodes in each graph. In a similar approach to our work, semantic triples are modeled as edges in a graph describing a link between an object and a subject by a predicate taking the form (s, p, o). This approach derives similarity measures between every possible combination of object nodes (with an outgoing edge) and also between every combination of subject nodes (with an incoming edge) in separate graphs. This creates a list of all possible node mapping combinations between two graphs along with the similarity measure of the two nodes. The set of node mappings between two graphs that maximises the cumulative similarity score is chosen. Therefore the similarity between two graphs is defined as:</p><formula xml:id="formula_1">\mathrm{sim}(g, h) = \frac{\sum_{i=1}^{n} \max\left(\mathrm{strsim}(s_g^i, s_h^{\cdot})\right) + \sum_{j=1}^{n} \max\left(\mathrm{strsim}(o_g^j, o_h^{\cdot})\right)}{\#\mathrm{mappings}}</formula><p>The application of the approach in <ref type="bibr" target="#b10">[10]</ref> is to detect links between music datasets for artists and records. In the context of our work we follow a similar line of application by attempting to provide links between members of different social networks, essentially held in separate datasets. Therefore we can adapt this approach for our work by comparing person graphs, and deriving matches based on the best possible node mappings according to the cumulative similarity score. This score, like node/edge overlap, must exceed a predefined threshold for a match to be valid.</p></div>
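<div xmlns="http://www.tei-c.org/ns/1.0"><p>The node mapping score can be approximated with the sketch below. It is a simplification under several assumptions: graphs are lists of (s, p, o) tuples, strsim is stood in for by the ratio from Python's difflib.SequenceMatcher rather than the Levenshtein measure used in this work, each node of g is mapped independently to its best counterpart in h, and the number of mappings is taken as the number of subject and object nodes of g. The function names are hypothetical.</p>
<code lang="python"><![CDATA[
```python
from difflib import SequenceMatcher


def strsim(a, b):
    """Stand-in string similarity in [0, 1]; not the Levenshtein measure."""
    return SequenceMatcher(None, a, b).ratio()


def node_mapping_similarity(triples_g, triples_h):
    """Average of the best subject and object similarities from g into h."""
    subjects_g = {s for s, _, _ in triples_g}
    subjects_h = {s for s, _, _ in triples_h}
    objects_g = {o for _, _, o in triples_g}
    objects_h = {o for _, _, o in triples_h}
    if not (subjects_g and subjects_h and objects_g and objects_h):
        return 0.0
    # For every node in g keep the score of its most similar node in h.
    best_subjects = [max(strsim(s, t) for t in subjects_h) for s in subjects_g]
    best_objects = [max(strsim(o, t) for t in objects_h) for o in objects_g]
    mappings = len(best_subjects) + len(best_objects)
    return (sum(best_subjects) + sum(best_objects)) / mappings
```
]]></code>
<p>Comparing a graph with itself gives 1.0, and the score drops as names and locations diverge. A one-to-one assignment that maximises the cumulative score would need an additional matching step that this sketch leaves out.</p></div>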
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Graph Reasoning</head><p>Due to the semantic structure of the graphs we are comparing, it is possible to utilise semantic metadata to detect a positive match using some basic low-level reasoning. Imagine we have two graphs that we wish to compare. We extract the string literal connected to the foaf:Person instance by the foaf:name property in each graph, thereby returning the person name of each graph. We compare the names using the Levenshtein string metric to derive a match. If the names match we then move on to comparing other properties from each graph to confirm that the foaf:Person instances are in fact referring to the same real world object.</p></div>
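<div xmlns="http://www.tei-c.org/ns/1.0"><p>The name comparison step can be illustrated with a standard dynamic-programming Levenshtein distance. The normalisation to a similarity in [0, 1], the case folding and the threshold of 0.9 are assumptions made for this sketch; the text does not state which normalisation or threshold was used.</p>
<code lang="python"><![CDATA[
```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Edit distance scaled by the longer name, ignoring letter case."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def names_match(a, b, threshold=0.9):
    """Assumed threshold; tune against labelled data before relying on it."""
    return name_similarity(a, b) >= threshold
```
]]></code>
<p>Under these assumptions "Sam Chapman" and "sam chapman" are treated as the same name, while "Sam Chapman" and "Matthew Rowe" are not.</p></div>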
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1">Unique Identifiers</head><p>Unique identifiers, where available, can be exported from social web services and defined using the foaf:homepage, foaf:mbox and foaf:phone properties for the website, email address and telephone number respectively within the social graph. We find the edges in each graph that point to such unique identifiers, and compare them. Our intuition is that a match would provide sufficient confidence to confirm the link between foaf:Person instances in both social networks. However, should an edge only exist in one graph there is not sufficient knowledge to confirm the link; therefore we move on to analysing further semantic information defined in the graph space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.2">Geographical Reasoning</head><p>When deciding if two people from different social networks refer to the same real world entity we rely on geographical information as another useful information source for reaching a match decision. We follow the intuition that the owner of several social networks would not be friends with two or more people who share the same name and live in the same place. For example, a person named "Matthew Rowe" is friends with "Sam Chapman" on Facebook, and friends with "Sam Chapman" on Twitter. It is likely that "Sam Chapman" refers to the same person in each social network. On Facebook "Sam Chapman" is described as living in "Sheffield" and on Twitter "Sam Chapman" is also described as living in "Sheffield". Therefore we believe such additional information is sufficient to confirm that both instances of "Sam Chapman" are the same person, and should therefore be linked.</p><p>We extract the geo:Feature instance attached to each instance of foaf:Person by the foaf:based_near property. In the previous section, we added additional geographical edges to the geo:Feature instance to describe the latitude and longitude of the location, together with a URI obtained from the Geonames Web Service. Given that we wish to compare instances of foaf:Person sharing the same name, we first compare the location URI assigned to each geo:Feature instance. Should the URIs match then we confirm that both foaf:Person instances refer to the same real world entity. However, if the URIs are different we compare the geographical proximity of the locations. Our reasoning behind this comparison is that people divulge different amounts of sensitive information in different social networks. For example, in a walled garden social networking site the user feels safer from prying eyes and would therefore state which suburb they reside in. Conversely, on a microblogging platform the user may only define the city they reside in. 
One method is to derive the geographical distance using the latitude and longitude described by the geol:lat and geol:long properties, and calculate the distance between the two points using the Haversine formula from <ref type="bibr" target="#b13">[13]</ref>. Should the derived distance be less than a predefined threshold then the person instances are deemed to be the same. However, following several experiments we found such a method to be prone to making incorrect matches in rural areas. We therefore decided to use the semantics of the location to better effect.</p><p>We analyse the place names to derive a relation between the locations and discover the semantics of that relation, if one exists. For example, a person named "Matthew Rowe" may reside in "Crookes" whereas another person, also named "Matthew Rowe", may reside in "Sheffield". We derive the locality of Crookes to analyse if there exists a relation to Sheffield. To do this we query DBPedia<ref type="foot">17</ref> using a SPARQL query. This query returns the literal "Districts of Sheffield", the label of the DBPedia category containing all districts in Sheffield. The geo:name property can then be matched against this literal to confirm that the location "Crookes" is a district of "Sheffield" and therefore the graphs should be linked as they refer to the same real world entity.</p><p>We define the similarity function sim() in pseudocode. The sim() function returns three classes of match: match, maybematch, and nomatch. If the foaf:name and geo:Feature URI match in each graph then we are fairly confident that the foaf:Person instances refer to the same real world entity. We also believe that the same level of confidence should be attached to matching a suburb using the checksuburb() function. In both cases we return match. 
However, we are less confident when location information is not available for either person, or when we cannot discover whether the locations of the two foaf:Person instances are related; in these cases we only return a maybematch.</p><formula xml:id="formula_2">sim(g,</formula><p>Only when the foaf:name properties do not match are we confident that the foaf:Person instances refer to different people.</p></div>
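<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three-class decision described in this section can be sketched as below. This is a reading of the prose and not the original pseudocode: the dictionary layout of a person graph, the exact-match name test, the example.org URIs and the small KNOWN_DISTRICTS table that stands in for the DBPedia category lookup are all hypothetical. The unique identifier check from the preceding subsection is included as the first confirmation step.</p>
<code lang="python"><![CDATA[
```python
# Hypothetical stand-in for the DBPedia "Districts of ..." lookup.
KNOWN_DISTRICTS = {"Crookes": "Sheffield"}


def check_suburb(place, city):
    """True when place is recorded as a district of city."""
    return KNOWN_DISTRICTS.get(place) == city


def names_match(a, b):
    """Simplified name test; the text uses a string similarity threshold."""
    return a.strip().lower() == b.strip().lower()


def sim(g, h):
    """Classify a pair of person graphs as match, maybematch or nomatch."""
    if not names_match(g["name"], h["name"]):
        return "nomatch"
    # A shared unique identifier confirms the link outright.
    for key in ("homepage", "mbox", "phone"):
        if g.get(key) and g.get(key) == h.get(key):
            return "match"
    loc_g, loc_h = g.get("location"), h.get("location")
    if not loc_g or not loc_h:
        return "maybematch"  # no location on one side
    if loc_g["uri"] == loc_h["uri"]:
        return "match"
    if check_suburb(loc_g["name"], loc_h["name"]) or \
            check_suburb(loc_h["name"], loc_g["name"]):
        return "match"
    return "maybematch"  # locations could not be related
```
]]></code>
<p>With this sketch, two graphs named "Matthew Rowe" located in "Crookes" and "Sheffield" return match, two graphs sharing a name but lacking location data return maybematch, and graphs with different names return nomatch.</p></div>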
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">PRODUCING LINKED DATA</head><p>Following the previous section we now have a set of matched graphs where each graph refers to the same real world entity or person. We produce links between these representations in the form of a new RDF graph describing the aggregated social graph content. We do not wish to duplicate information contained within each exported social graph, but instead provide links to this information for later reuse (we explain our reasoning behind this decision in the following subsection). For biographical information we aggregate all available properties from each social graph to generate a complete identity representation. For example the Facebook social graph contains identity information such as name and date of birth, whereas the Twitter and MySpace social graphs contain the homepage; aggregating this content defragments the person's identity to generate a complete profile.</p><p>A new social network is created containing the aggregation of individual social networks from each social graph. Matched instances of foaf:Person are merged to create a new instance as follows. For this instance we only include the foaf:name property and an identifier. We use hash values for identifiers according to guidelines described in <ref type="bibr" target="#b12">[12]</ref> due to the relatively small size of the datasets being considered for linkage. Where possible we reuse the identifiers from the available social graphs. As Figure <ref type="figure" target="#fig_1">2</ref> demonstrates, we reuse the identifier from the Twitter social graph for the merged foaf:Person instances. For foaf:Person instances that contain no aggregated content (i.e., that only appear in the Facebook social graph), we simply reuse the identifier from the accompanying social graph (e.g. the user identification number). 
For each merged instance of foaf:Person we include a reference using the owl:sameAs property to the resource containing the instance.</p></div>
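<div xmlns="http://www.tei-c.org/ns/1.0"><p>The merged foaf:Person instance described above, holding only a name, an identifier and owl:sameAs references to the source graphs, could be serialised as in the following sketch. N-Triples is used here only because it is easy to emit with string formatting; the function name and the example.org URIs in the usage note are hypothetical, and the sketch assumes the name contains no characters that need escaping.</p>
<code lang="python"><![CDATA[
```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def merged_person_ntriples(merged_uri, name, source_uris):
    """Emit a minimal merged person with links back to each source graph."""
    lines = [
        "<%s> <%s> <%s> ." % (merged_uri, RDF_TYPE, FOAF_PERSON),
        # Assumes the name has no quotes or backslashes to escape.
        '<%s> <%s> "%s" .' % (merged_uri, FOAF_NAME, name),
    ]
    for uri in source_uris:
        # Biographical detail stays in the source graph; only a link is added.
        lines.append("<%s> <%s> <%s> ." % (merged_uri, OWL_SAME_AS, uri))
    return "\n".join(lines)
```
]]></code>
<p>For instance, merged_person_ntriples("http://example.org/social#sam", "Sam Chapman", ["http://example.org/facebook#1001", "http://example.org/twitter#samchapman"]) produces four statements: one type, one name and two owl:sameAs links.</p></div>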
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Social Graph Control</head><p>By providing linked data representations for each instance of foaf:Person in the aggregated social graph we attempt to minimise the duplication of personal information while offering decentralisation through linked data. This minimisation is an essential component of the defragmentation of identity information. It also passes responsibility for data access to separate locations and therefore separate access policies. For example, if a person wishes their Facebook social graph to remain separate and only accessible by people they know and trust, then access responsibility is delegated to the hosting service. Work by Yeung et al. in <ref type="bibr" target="#b6">[7]</ref> presents a case for the control of social data using trusted hosting services, thereby delegating access responsibility to the hosting party. Recent advancements in access to social data now allow authentication to be controlled by services such as OAuth<note place="foot" n="18">http://oauth.net/</note> and, more recently, FOAF+SSL<note place="foot" n="19">http://esw.w3.org/topic/foaf+ssl</note>. The latter option is particularly well suited to this setting, where users are allowed access to a social graph depending on their authenticated FOAF file containing a trusted person that matches the requested social graph.</p><p>Another motivation behind this design decision was to allow extensibility and the addition of new triples into the aggregated social graph. Imagine that a user has created their aggregated social graph containing exported social data from Facebook and Twitter; meanwhile they have also recently built up a rich social graph of contacts and information on the professional networking site LinkedIn<ref type="foot" target="#foot_16">20</ref>. 
Linking this new social graph to the existing aggregated graph only requires the addition of new triples to the existing structure, which in essence would be either a new foaf:Person instance linked by the foaf:knows property for a new person in the aggregated graph space, or a new owl:sameAs relation pointing to the foaf:Person instance in the exported social graph from LinkedIn.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">EXPERIMENTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Data Sets</head><p>In order to evaluate and compare the success of our graph matching methodology against the two alternative methods we generated three files containing valid RDF according to the FOAF ontology. Each file was obtained from a different web service (Facebook, MySpace, and Twitter) using the exportation methodology described in section 4, and each file holds information related to one web user who has an account with each social web service.</p><p>We analyse the success of the three matching methods when attempting to match instances of foaf:Person in the different datasets by evaluating for type I errors (false positives) and type II errors (false negatives). Positives indicate a match and therefore where the datasets should be interlinked for that concept; negatives indicate where a match should not take place and therefore no linked data should exist. The optimum method should produce neither of these error types.</p><p>Figure <ref type="figure">3</ref> demonstrates how each individual dataset used in the experiment contains possible overlaps which should be linked together. These overlaps consist of social network members from each dataset who are the same person. For example, "Sam Chapman" appears in the Facebook dataset, and "Sam Chapman" also appears in the Twitter dataset, therefore a decision must be made whether a link could be established. The lack of an intersection between the Twitter and MySpace datasets is due to the fragmentation of identity information in each data silo. The user in question, whose information we extracted from each service, used Twitter for professional purposes, Facebook for both professional and social purposes, and MySpace for music and social purposes. Therefore the music and professional elements should not overlap, thus separating the Twitter and MySpace datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Results</head><p>Tables <ref type="table" target="#tab_4">1 and 2</ref> show the results obtained from our analysis together with the gold standard indicated in the final column. As we can see from Table 1, node/edge overlap returns the poorest results by correctly matching the fewest person graphs and also incorrectly matching the most person graphs. Node mapping performs well but still falsely classifies 3 instances of foaf:Person as being non-matches. Graph Reasoning outperforms both previous methods by producing the most correct links between the person graphs. This method performs best because of the large number of triples available in each person graph: both the Facebook and Twitter datasets contain rich social data that can be exported from each service, namely geographical information that can be used to classify positive matches. In the experiments we permitted links to be created for foaf:Person concepts that returned a maybematch when performing graph reasoning.</p><p>The matching of graphs within the Facebook and MySpace datasets yields interesting results. As Table 2 demonstrates, node/edge overlap performed poorly, finding only 2 instances of foaf:Person that intersected the datasets. Node mapping derived 10 positive links between foaf:Person instances, but incorrectly created 107 links. Graph Reasoning did not classify as many correct links as node mapping, yet only falsely generated two links between the datasets. The results from this part of the evaluation raise some interesting points. If we wish to minimise the number of false links, then the most naive method, node/edge overlap, is the most reliable; however, this procedure creates very few correct links. The poor performance of node mapping (107 incorrect links) and graph reasoning (only 5 correct links) is due to the triples available in each dataset. Unlike the Twitter dataset, the MySpace dataset only contains a name property for each instance of foaf:Person. This means that the graph matching task has very few triples, and therefore a limited graph structure, to perform low-level reasoning with, and node mapping must rely heavily on the string similarity metric to derive the mapping, which, as the results demonstrate, is unreliable.</p><p>As mentioned previously when discussing the datasets used in the experiment, the Twitter and MySpace datasets do not contain any overlap. When performing the experiments none of the three graph matching methods produced links between these datasets, therefore we decided to omit the results as there was nothing to present.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">CONCLUSION AND FUTURE WORK</head><p>This paper presents our work investigating the exportation of social data described using semantic ontologies, and the linking of this data where possible. Comparing the outcome of this work with the previously detailed requirements, it is clear that social data has been exported from data silos in a semantic form, and person instances from separate social networks are linked together where possible. Our method to perform low-level reasoning when matching person graphs yields good results in comparison with similar methods by maximising the number of correctly matched person instances and minimising the number of incorrect matches.</p><p>The produced RDF containing linked data describes links between matched person instances. Our decision not to aggregate all biographical information for each social network member is due to the privacy policies that we believe social data must adhere to. Instead, links to the existence of this data are provided. The data contained within the exported social graph can then be controlled by a separate access policy. This fits with recent work to address privacy and trust within the Semantic Web community, where technologies such as FOAF+SSL (to control social graph access) and POWDER<ref type="foot" target="#foot_17">21</ref> (to describe social graph properties) are being adopted.</p><p>Our future work will include additional social graph exportation tools for other web services, and also the release of the aggregation service. We also plan to test our graph matching approach on additional larger datasets, and hope that existing XML schemas expand to allow additional information to be exported.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Twitter Social Graph</figDesc><graphic coords="3,316.81,53.80,259.78,208.69" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Linked Social Graphs</figDesc><graphic coords="6,53.80,53.80,257.76,167.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Get location URI geoid_g from g assigned to geo:Feature Get location URI geoid_h from h assigned to geo:Feature If geoid_g = geoid_h return match Else If geoid_g = null and geoid_h = null return maybematch Else Get city name c_g from g using geo:name Get city name c_h from h using geo:name return checkSuburb(c_g, c_h)</figDesc><table><row><cell>return match</cell></row><row><cell>Else</cell></row><row><cell>return nomatch</cell></row><row><cell>checkSuburb(c_g, c_h) :</cell></row><row><cell>Get districts of city label l_g for c_g</cell></row><row><cell>If strsim(l_g, c_h)</cell></row><row><cell>return match</cell></row><row><cell>Else</cell></row><row><cell>return maybematch</cell></row><row><cell>If mbox_g = mbox_h</cell></row><row><cell>return match</cell></row><row><cell>Get homepage_g from g using foaf:homepage</cell></row><row><cell>Get homepage_h from h using foaf:homepage</cell></row><row><cell>If homepage_g = homepage_h</cell></row><row><cell>17 http://dbpedia.org</cell></row></table><note>h) : Get person name n_g from g using foaf:name Get person name n_h from h using foaf:name If strsim(n_g, n_h) &gt; threshold Get mbox_g from g using foaf:mbox Get mbox_h from h using foaf:mbox</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1</head><label>1</label><figDesc></figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 1 :</head><label>1</label><figDesc>Matching the Facebook and Twitter Datasets</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 2 :</head><label>2</label><figDesc>Matching the Facebook and MySpace Datasets</figDesc><table /></figure>
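<div xmlns="http://www.tei-c.org/ns/1.0"><p>The matching pseudocode listed in the unlabelled table above can be sketched in Python as follows. This is a reading of that listing under several assumptions, not the implementation used in the experiments: person graphs are reduced to plain dictionaries with hypothetical keys (name, mbox, homepage, geo, city), difflib stands in for the unspecified string similarity metric, the threshold of 0.9 is arbitrary, absent properties are never treated as equal, and the district lookup is passed in as a dictionary instead of being queried from an external source.</p>

```python
from difflib import SequenceMatcher

MATCH, MAYBE_MATCH, NO_MATCH = "match", "maybematch", "nomatch"


def strsim(a, b):
    """Case-insensitive similarity in [0, 1] (difflib is a stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check_suburb(city_g, city_h, districts, threshold):
    """Return MATCH if city_h resembles a listed district of city_g."""
    if not city_g or not city_h:
        return MAYBE_MATCH
    for label in districts.get(city_g, []):
        if strsim(label, city_h) >= threshold:
            return MATCH
    return MAYBE_MATCH


def compare(g, h, districts=None, threshold=0.9):
    """Compare two person records held as plain dictionaries."""
    name_g, name_h = g.get("name"), h.get("name")
    if not (name_g and name_h and strsim(name_g, name_h) > threshold):
        return NO_MATCH
    # Identifying properties: equality on either one settles the match.
    for key in ("mbox", "homepage"):
        if g.get(key) is not None and g.get(key) == h.get(key):
            return MATCH
    geo_g, geo_h = g.get("geo"), h.get("geo")
    if geo_g is not None and geo_g == geo_h:
        return MATCH
    if geo_g is None and geo_h is None:
        return MAYBE_MATCH
    return check_suburb(g.get("city"), h.get("city"), districts or {}, threshold)


# Usage with made-up records: same name, no further evidence.
result = compare({"name": "Sam Chapman"}, {"name": "Sam Chapman"})
```

<p>With only a name available on one side, as in the MySpace dataset, this sketch can return nothing stronger than maybematch, which is consistent with the limited reasoning reported for that comparison.</p></div>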
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://linkeddata.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://sioc-project.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.flickr.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://tools.opiumfield.com/twitter/mattroweshow/rdf</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://www.twitter.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://qdos.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">http://foafbuilder.qdos.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://www.w3.org/2004/01/rdxh/spec</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://www.gmpg.org/xfn/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">http://www.facebook.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11" xml:id="foot_10">http://openid.net</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_11">http://ext.dcs.shef.ac.uk/˜u0057/SocialGraphAggregator/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="13" xml:id="foot_12">http://www.geonames.org/ontology/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="14" xml:id="foot_13">http://www.myspace.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15" xml:id="foot_14">http://code.google.com/apis/opensocial/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16" xml:id="foot_15">http://www.geonames.org/export/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="20" xml:id="foot_16">http://www.linkedin.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="21" xml:id="foot_17">http://www.w3.org/TR/2008/WD-powder-dr-20081114/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9.">ACKNOWLEDGEMENTS</head><p>We would like to thank both Harry Halpin and Sam Chapman for allowing their information to be presented in this paper as examples, and Neil Ireson for his meticulous proofreading.</p></div>
			</div>

		</back>
	</text>
</TEI>
