=Paper=
{{Paper
|id=None
|storemode=property
|title= A Qualitative Analysis of Edge Closure in Information Networks
|pdfUrl=https://ceur-ws.org/Vol-710/paper44.pdf
|volume=Vol-710
}}
== A Qualitative Analysis of Edge Closure in Information Networks==
A Qualitative Analysis of Edge Closure in Information Networks
Hareendra Munimadugu Anca Ralescu
School for Electronics & Computer Systems Machine Learning & Computational Intelligence Lab
Machine Learning & Computational Intelligence Lab School of Computing Sciences & Informatics
School of Computing Sciences and Informatics University of Cincinnati
University of Cincinnati Cincinnati, Oh 45221-0030
Cincinnati, OH 45221-0030 anca.ralescu@uc.edu
munimah@mail.uc.edu
Abstract differences exist with the application of the same technique
to the different networks. Pre-evaluating links is exciting;
since it is interesting to be able to predict who someone’s
Social Networks Analysis is one of the cutting edge research friends are before they actually form friendship in the net-
areas which finds applications in Information Retrieval and
Data Mining and which deals with very large amounts of data.
work. In a similar context we can say that a particular user
A considerable amount of research in this field has focused in a network will be able to efficiently develop and retain
on mining useful information patterns in networks. Other ap- friendships because of his position in the network and the
plications have focused primarily on structure of networks. possibility of forming links in this manner. Many methods to
Several models have been proposed to address both issues. evaluate community growth have concentrated on networks
Existing models have been developed and replaced with bet- involving people and tried to explain about their relation-
ter ones. One direction of this research has focused on how ships. However it is also quite interesting to see how some
to implement one method of analysis on several data asso- of these methods apply to similar networks not involving
ciations in order to understand how different data models be- people, rather involving different aspects about people.
have. Responses to such questions account for the underlying
differences in properties of the data considered. In its broad-
est sense, a community is a large set of nodes that have been
collected over a period of time. In the present scenario efforts Keywords
are being made to develop a complete model that can cor-
rectly explain and predict how links are formed in networks
and how a network of nodes dynamically ”progresses” into a
Social Network, Information Network, Edge closure, Link
bigger and more diverse one. If an idea is implemented on Formation, Nodal Analysis, Closure ratio.
networks derived from different domains the results can ac-
count for the underlying differences in the properties of the
data. Such an approach is useful because it helps to find not Problem Formulation
only differences based on domain but also identify similarity
and therefore to correlate one domain with another.
Computational analysis of Social Networks continues to be
a subject of intense interest and. The problem of social ties
is believed to be related to structure and function of social
Introduction networks (Granovetter 1985). Positive and negative links
play a major role in the same (Bogdanov, Larusso, and Singh
The ideas of human relations in a community have received 2010); this interplay has given foundation to many meth-
particular attention since time immemorial. Therefore com- ods developed for efficient determination of influence and
munity structure and expansion are two areas that have been opinion (Leskovec, Huttenlocher, and Kleinberg 2010). The
studied by philosophers, sociologists, and, more recently, by process of link copying which is implicit in all information
computer scientists. It is an exciting venture to try to develop networks and which explains the formation of new links is
a method that can account for the way in which people form known as the directed closure process (Romero and Klein-
and maintain relations with one another. This brings us to berg 2010). The methodologies as mentioned above have
the community structure analysis and link prediction and it been extensively used for understanding information net-
has become one of the important research problems in anal- works consisting of people. Examples of such networks are
ysis of social networks. In this paper we take up one recent Slashdot, twitter, Facebook and several others. Here we try
method of evaluating link formation in social communities to consider information networks not consisting exclusively
and we compare its application to two different networks. of people though they contain relevant and important infor-
In other words we do this by applying the same method on mation about people. Here we are interested to see how this
two different collections of data and comparing results. We methodology applies to information networks having sev-
then identify some possible criteria due to which significant eral different characteristics. We would also like to see how
differences in basic characteristics of networks account for In the data set on Wikipedia, a directed edge exists between
observed differences in network formation. The idea of edge two nodes in the order of connection of one page and a page
closure is one of the recent important developments in pre- in the references list. We shall start with a test data collection
dicting link formation and community structure. Our objec- of a hundred nodes where each node is a YouTube video.
tive here is to implement this method on different data sets or Two data sets are used simultaneously to test our hypothesis.
information networks with inherently distinct properties and The other data set consists of an equal number of Wikipedia
concentrate upon what characteristics of the data sets might links or pages.
be influential in the difference of community structure by
application of the same technique. As stated we focus our
attention upon networks which involve not people but some Remark 1 It is actually not essentially important that both
practical aspects regarding people. One such practical as- data sets contain an equal number of data points because
pect is to determine the interest and opinion of a group of what we are primarily interested is the extent of closure or
people with respect to a certain trend or their attitude to- in other words the percentage of nodes exhibiting this phe-
wards a topic. This type of treatment of networks can be nomenon.
useful to also determine closeness between various domains
of knowledge.
For the sample data set of YouTube, we have a collection
of tuples where each tuple has two nodes and the order in
Much research has been carried out on social networks con-
which they are connected is specified; it is the direction of
sisting of undirected edges. Evaluation of an information
the edge.
network such as YouTube videos is comparatively challeng-
ing because closing of undirected triangles of nodes in a so-
cial network is relatively easier. It is instinctively known that Data points for the Wikipedia information are also com-
the application of the same method will lead to different ob- prised of a collection of tuples. It is very clear that the num-
servation and result in different information networks. The ber total of tuples in the flat file is the number of relations or
short term goal of this research is to study the relevance and the number of edges in the graph.
effect of directed closure on various information networks.
Experimental Design
Towards Problem Resolution
As a starting example we begin with a specific context or
It is a significant fact that in order to carry out such a re- theme in a query in YouTube.
search of an already developed technique, we have var-
ious kinds of online social communities available. So- An edge exists between two nodes in the graph if the source
cial networks (Facebook, Orkut, and connotation networks), video has the destination video in its top ten favorites or sug-
Information networks (YouTube, Wikipedia, Web blogs, gested videos’ list. This edge is going to be directional be-
news blogs), Hybrid networks (twitter) and signed networks cause it points towards one node starting from another as in
(Slashdot, Epinions) are among some examples of what the favorites list.
communities we can possibly consider. However here we
consider the information networks and try to provide a clear An edge exhibits closure if it completes a triangle between
and concise analysis of such networks. We are interested in three nodes in order, or if it is going to be the edge that
how the concept of edge closure is applicable to an informa- completes the triangle (Romero and Kleinberg 2010). As a
tion network. Some of the examples of such a network are working example, we shall consider ”University of Cincin-
E-book repositories, phishing corpus, Wikipedia pages, text nati Engineering” as our theme or our concept for the data
corpus and YouTube. Because YouTube and Wikipedia are analysis. This means that is the query the user is interested
some of the largest networks available on the World Wide in. Four of the related nodes in YouTube are ”Tips to suc-
Web, and also because the information in these networks is ceed in Engineering”, ”Is Engineering right for me?”, ”How
freely available, we take these two information networks for an engineer folds a T-shirt” and ”Advice for Engineering
the purpose of research in this paper. Therefore the data sets Students”.
that we consider for this research are derived from YouTube
videos and Wikipedia pages. These are essentially collection A similar query on Wikipedia yields four of the data points
of tuples. In the data consisting of YouTube videos, each from Wikipedia data which are ”University of Cincinnati
video is taken as a node in the information network. One College of Engineering”, ”University of Cincinnati”, ”grad-
node is directionally connected to another if the first video uate students” and ”University of Cincinnati College of De-
has the second in the list of top ten ’favorites’ or ’sugges- sign, Architecture, Art and Planning”.
tions’. Considering this order is important because the ini-
tial few videos from YouTube hits are going to be those that We can notice that, because the nodes are based on the same
are closely associated with the main or desired result. Thus theme, they might be already linked, but what is more im-
an edge is generated between nodes in a directed graph. portant is whether they exhibit closure.
When the results are taken as nodes in a directed graph, the are retrieved based on words, meaning that a change of a
nodes present in the graph formed from the YouTube re- word might result in a possible difference in context in the
sults are as shown in Table while those present in the graph search. In Wikipedia even though combined meaning of
formed from the Wikipedia results are shown in Table . words is important sometimes a change of any one word
might result in a related but different result. This analysis
applies to both the information networks, but significance in
Table 1: YouTube results words or phrases varies for each network. These are infor-
mation networks and therefore dynamic; when a new node
Node 1 : Tips to succeed in engineering is added to the network, new links form between the node
Node 2 : Is engineering right for me? and its neighbors. However in some cases this leads to the
Node 3 : How an engineering student folds a T-shirt change in links between the existing nodes. One example
Node 4 : Advice for engineering students of this is as follows. When the new node is very closely re-
lated to some of the neighbors there can be a change because
the new node might be included in the top hits, thereby re-
Table 2: Wikipedia results moving an already linked node from the top ten favorites list.
However this is obviously not always possible. Hence within
Node 1 : University of Cincinnati College of the same context the inclusion of a new node in YouTube
Engineering signifies two things: that a new node has really arrived, or
Node 2 : University of Cincinnati that a video moves up in the list of hits of another video,
Node 3 : Graduate students causing a replacement.
Node 4 : University of Cincinnati College of Design,
Architecture, Art and Planning
Conclusion
Directional connectivity exists in these networks in the fol-
lowing manner. In the first example, node 1 has nodes 2 and
3 in its top ten hits. Therefore the directed arrow in the graph The experimentation on online data or text might identify
exists from node 1 to node 2 and also from node 1 to node 3. the closeness between features of the data sets or networks
Similarly node 2 and node 3 have node 4 in their hit list. The in general. It is useful to determine the degree of difference
directed closure is satisfied if node 1 has node 4 in the list. in results for a same theme for different networks. This can
In the example taken the closure does exist because node 4 is be useful to explain the general behavior of a certain net-
indeed present in the hit list of node 1. Similarly, in the sec- work with respect to an idea. In other words, this might
ond example directed connectivity exists between the nodes account for the way a certain network behaves with respect
considered. A similar explanation holds for the second ex- to a theme or context. We might also be able to predict how
ample also. In our testing phase over large data sets we make a network is going to behave with respect to a certain theme,
use of the idea that for large data sets nodal closure can be based on its behavior to similar themes. In order to extend
evaluated based upon analyzing ordered lists prepared for a this we can consider other information networks like phish-
node [4]. This is the additional information needed because ing corpus, E-book repositories, personal, and news and web
we cannot always have time stamps for such data. We shall blogs. There are many networks available online which can
carry out this test of edge closure on both data sets taking be used for such research. After such experimentation it
an equal number of nodes at one time. For the testing ex- might be possible to say that a particular network has simi-
ample the result is that in both cases directional closure is lar results with another in a given context. This can lead to
satisfied. In the data sets we consider, the number of con- saying that the related communities in such networks might
nections between nodes is fairly large. Many of the arrows behave similarly under similar given situations. One inter-
are also bidirectional. The expected result of the experiment pretation is that for a new incoming node the two networks
is that in the case of YouTube there will be a significantly might result in a similar number of links from a particular
more closure overall linkage when compared to Wikipedia community of nodes. It might be possible to group different
pages. This, according to the hypothesis is based upon the networks based on their similarity in behavior towards one
search on a theme or idea. We can give the following rea- particular context. Further work might include other infor-
soning to the expected result. In a search on YouTube the mation networks. One type of networks which offers scope
resultant videos or hits are based upon the overall query or for such research is networks with weighted edges (Kunegis,
collection of phrases or words. The videos will be displayed Lommatzsch, and Bauckhage 2009). It might be possible to
based on the theme of the query, which means combined extend the idea to networks with weights assigned to edges;
meaning of words is important. In such a case there will be it might be very interesting to see if closure is applicable to
emphasis on certain words that are important. Unimportant a network containing mixture of signed links It can also be
words, though a part of query do not affect search results sig- extended to compare several networks at once when consid-
nificantly. We might say that a change of these unimportant ering a common theme, or it is possible to compare other so-
words does not cause the results to vary significantly. But re- cial networks. It will be a more challenging work for signed
sults will change drastically should the main words change. networks. It will aid to clarify the understanding of web
However in case of the results displayed in Wikipedia pages communities and information networks. This can find useful
applications in related fields such as Sociology, Linguistics, Kunegis, J.; Lommatzsch, A.; and Bauckhage, C. 2009.
Mathematics and others. The Slashdot Zoo: Mining a social network with negative
edges. In Proceedings of the 18th international conference
on World wide web, 741–750. ACM.
Leskovec, J.; Huttenlocher, D.; and Kleinberg, J. 2010.
References Signed networks in social media. In Proceedings of the
28th international conference on Human factors in com-
Bogdanov, P.; Larusso, N.; and Singh, A. 2010. Towards puting systems, 1361–1370. ACM.
Community Discovery in Signed Collaborative Interaction Romero, D., and Kleinberg, J. 2010. The Directed Clo-
Networks. In 2010 IEEE International Conference on Data sure Process in Hybrid Social-Information Networks, with
Mining Workshops, 288–295. IEEE. an Analysis of Link Formation on Twitter. Arxiv preprint
Granovetter, M. 1985. Economic Action and Social Struc- arXiv:1003.2469.
ture: The Problem of Embeddedness. American journal of
sociology 91(3):481–510.