=Paper= {{Paper |id=None |storemode=property |title=The Use of Modularity Algorithms as part of the Conceptualization of the Perspectival Form in Large Networks |pdfUrl=https://ceur-ws.org/Vol-1210/datawiz2014_07.pdf |volume=Vol-1210 |dblpUrl=https://dblp.org/rec/conf/ht/RegattieriMM14 }} ==The Use of Modularity Algorithms as part of the Conceptualization of the Perspectival Form in Large Networks== https://ceur-ws.org/Vol-1210/datawiz2014_07.pdf
The use of modularity algorithms as part of the conceptualization of the perspectival form in large networks
Lorena Regattieri, Labic - UFES, Federal University of Espirito Santo, regattie@ualberta.ca
Jean Maicon Medeiros, Labic - UFES, Federal University of Espirito Santo, jeanmrmedeiros@gmail.com
Fabio Malini, Labic - UFES, Federal University of Espirito Santo, fabiomalini@gmail.com

ABSTRACT

How can we identify perspectives in large networks through the application of modularity algorithms? In the digital humanities [1][2], there is a fair amount of scholarly work exploring computational routines to cluster and analyze enormous amounts of data. Recently, social data have become a valuable source for studying collective phenomena: they provide the means to comprehend human collectivity through graph network analysis. In this paper, we describe our approach, in the manner of post-social anthropology [3] and the social sciences, using technical methods: quantitative analysis and modularity optimization. The computational turn is one part of the ongoing process of conceptualizing the "perspectival form"; the other is the semantic analysis of the qualitative data. The technique uses a Python script to extract the hashtag co-occurrence network from a Twitter dataset, which is then analyzed in the open-source software Gephi. Our experiments show how social networks can be unfolded when a sample dataset of hashtags is submitted to this procedure, which is situated in the critical dimension of computational models. The flow of perspectives is discovered when the strategy is followed in new workspaces, creating categories that reveal the points of view underneath the controversy. In conclusion, this study presents a theoretical and methodological framework based on the post-structuralists, a composition that aims to support studies in different fields of the social sciences and humanities.

Categories and Subject Descriptors
D.3.2 [Programming Languages]: Language Constructs and Features – abstract data types, polymorphism, control structures.
I.5.3 [Pattern Recognition]: Clustering – algorithms, similarity measures.
J.4 [Computer Applications]: Social and Behavioral Sciences – Sociology

General Terms
Documentation, Human Factors, Theory, Algorithms and Design.

Keywords
Post-Social Anthropology, Network Science, Amerindian Perspective, Modularity Algorithms, Complex Networks.

1. INTRODUCTION

This paper understands social networking as an anthropological phenomenon. A graph of a social network is a material representation of human relationships. Therefore, both the algorithm that seeks to analyze these relationships and the natural language vocalized in them are in a continuous process of interrelation to interpret the social world. The algorithm alone does not explain these relationships. But neither can collective action, which today generates digital traces [4], be explained solely by the historical social theories of the humanities.

Graph clustering, or community detection [5][6][7][8], in complex networks has a long history of research in machine learning and graph theory [9]. Studies in the field have gained attention from several areas; the most common are found in biology, technology, and physics. Meanwhile, the literature on Natural Language Processing [10][11] and Probabilistic Neural Networks [12] has shown the possibilities of document modeling, text classification, and collaborative filtering for large corpora.

In this paper, we describe a method developed by researchers at the Laboratory of Studies in Images and Cyberculture (LABIC)1, located at the Federal University of Espirito Santo (UFES), Brazil. It is a simple but efficient and distinctive method developed to support studies in the social sciences and humanities. Our novel perspectival framework uses Twitter data publicly available online; a variety of Twitter feeds of 500k+ tweets are drawn on for examples. The method uses Gephi [13] and its algorithms, resulting in visualizations and statistics. It aims to find communities in a network formed by the co-occurrence of hashtags in a tweet; in other words, we build a network of hashtags in order to compose a multiplicity.

1 http://www.labic.net

The relevance of online network sites in the contemporary context makes them a means of interpreting political and collective actions, which is why Twitter is our "field" of work. We consider the social network a rich terrain of dispute, given the many uprisings around the world: #OccupyWallStreet, #15M, #OccupyGezi, #VemPraRua, and #NãoVaiTerCopa. Other social phenomena, like #ClimateChange, can be considered perspectives in progress. While recently proposed methods detect topics in historical and literary corpora using probabilistic topic modeling [14], we aim to present a new methodology that is not just a topic-modeling procedure for digital data but one that reveals the points of view in constant flow: in fact, profiles on a battlefield.

In order to comprehend the layers of text in the digital traces left by humans, we rely on actor-network theory [15]. The main idea is to work on the same level with both the actors and their
attributes. "A network is fully defined by its actors." [16] ANT and network analysis provide the argument for studying digital data without worrying about the standpoint of the individual or the collective. It is possible to move from one level to another, from the parts to the whole, simply by continuously rearranging the actors, or nodes. There is no overlapping; it is a matter of reorganizing one's positioning. The cartography of controversies [17] is the didactic application of ANT; it serves as a range of techniques for exploring public debates. Observation and description are essential to the scholarly work done in this paper. In this meeting between computing methods and post-social anthropology [3], the Latourian socio-technical networks approach will support the process of revealing points of view in [...]

Our methodological framework poaches from Amerindian Perspectivism [18] to find the foundation for our ongoing experiments to compose a "perspectival form" in large networks. Again, they are called large networks because they are made of thousands or even millions of nodes and edges. Most importantly, we comprehend the node as a social profile in the network and the edges, thus, as the link between One and the Others. A network, then, is constituted only by the existence of the other. Eduardo Viveiros de Castro subverts the idea we have of cannibalism, which is guided by the conception that "to cannibalize" the other is to eat the other. He inverts the enunciation, saying that cannibalism is a way out of the self and into the other, for each other. The node, as a profile on the social network, increasingly comes out of the self to "retweet" what is better or worse from another, thereby assuming the point of view of that other (and they are of many types). Nowadays, the other is the element that captures us. We live in an anthropological turn.

In fact, this is our inspiration to reconceive a qualitative-quantitative method of analysis through machine steps, which we know in computing as the algorithm. Applying these procedures to comprehend collective phenomena produces new perspectives and methods. The computer requires the cascade of texts and hashtags collected in our dataset to metamorphose into a grid of numbers [19]. The framework we have been testing is based on the Louvain algorithm [20], which we run to maximize the network modularity.

The use of Twitter, in particular, has led us to a couple of challenges in the text clustering process. As the qualitative research process evolves and the number of tweets grows to millions, categorization and the topology of the network become a problem. "The whole is always smaller than its parts" [16]. A large network offers an illusory representation. It overlaps distinct layers, social groups, and thoughts as if they were part of a single network topology. In theory, the social is crossed by a multiplicity of natures, perspectives, and worldviews produced by different human groups. Hence our hypothesis: every network is, rather, a network of perspectives, which are usually in dispute.

The methodology, which was at first based on data mining and clustering thousands of words, needed a new framework. Given this problem, we created the hashtag network script. After consulting the available literature [21], new possibilities arose, from the initial goal of finding a method to speed up the clustering of words and categories to the use of hashtags to find perspectival forms. Nowadays, the discussions indexed to a hashtag often become themes of hallway conversations. The hashtag, based on our tests, proves to be the better solution for social scientists working with data science. When using the hashtag sign, users segment a topic of interest; more than that, they ally themselves with a point of view on a subject. Once someone has written a tweet and used a hashtag, it is as if the user has already categorized the text for the researcher. In addition, the hashtag represents the existence of a debate that matters, or even just a cause that people aim to call attention to. Either way, the many ways in which people give meaning to points of view by indexing value to a specific word qualify a perspective in the public debate.

Figure 1: The center of the #VemPraRua network, consisting of 125,000 retweets. Only by analyzing the perspectives (the networks around the center) is it possible to understand the different perspectives on the network.

2. THE ANTHROPOLOGICAL THOUGHT AND NETSCIENCE

The substance of our framework lies in how we interpret modules without changing the level or scale of the plane. In online social networks, we argue for the existence of movements and circulation on a flat surface, with no consideration of hierarchy. The node is situated in the terrain of dispute, one that is defined only by its network [16]. In this case, when exploring the dots in the graph, which in our dataset are the hashtags, the actor moves into the network, interacting with others on the same level. This is where we stand with Latour, in a flat ontology.

The approach we reclaim to study online networks is the one inherited from Pierre Clastres [3]. In any case, we propose a descriptive study of a terrain that we understand to be in constant dispute. This allows us to rely once again on the indigenous world, in which violence itself survives, as a reference to problematize the thesis of repulsion and attraction in the modularity algorithm. In short, we make use of the concept of cannibalism, which derives from the complex notion of cannibalism. Applied to the field of hashtags as views, this very cannibalism lives off the perspectival forms within the network
revealing, then, a mode of operationalization. This is a process of maximal reduction between one single node and another, almost like microscopic work to see the minor points of view. "Exchange, or, the circulation of perspectives: exchange of exchange, that is, change." [22]

In data science, complex networks [23] are identified as very large networks, with millions or billions of nodes and edges. Networks of this sort occur in different contexts and can be recognized in nature, society, technology, economics, etc. One of their fundamental characteristics is their evolution over time. Complex systems are made up of many non-identical elements connected by a diversity of interactions. Several networks in nature, ecology, economics, human relationships in social networks, and the web share the same topological structure. They are known as scale-free networks [24]. We will associate this computational concept with Bruno Latour's understanding of networks.

In this sense, actor-network theory (ANT) goes hand in hand with the inquiry we propose. The large networks in this empirical study come from the NET, which we purposely stress in the same way Latour does with ANT. To trace the circulation and interactions of points of view and objects, ANT explores the constitutive connections between actors (the actants), both animate and inanimate, and the generative potential of those interactions. In his own words, “(…) network does not designate a thing out there that would have roughly the shape of interconnected points, much like a telephone, a freeway, or a sewage ‘network’… It qualifies its objectivity, that is, the ability of each actor to make other actors engage in unexpected relations.” [15] More precisely, we consider social profiles as living things. It often happens that in information networks it is not possible to recognize the "form", only the information. By that we mean the profiles that use language like a human component but, notice, are only information, or robots made to act as humans. However, meaning arises from the disparate actions [27].

We ground our theoretical foundations in the connections we perceive between anthropology and post-structuralism. In sum, they are circumscribed in the post-social-anthropological net of authors listed here, together with the Deleuzian concept, drawn from mathematics, in which we find the means to comprehend the multiplicity as a point of view. It creates a new kind of entity that rejects any generalization, the one we know as the ‘rhizome’. A rhizomatic multiplicity therefore does not, in fact, behave as one, because that is not possible when it operates as assemblages of becomings. Here is where Latour meets Deleuze in the notion of the actor-network, one in which the network cannot be one thing, yet again, because anything can be considered a network [22]. Finally, in the next section, building on this interdisciplinary dialogue, we present how Amerindian perspectivism supports our hypothesis in exploring the complex world of large networks, finding a perspectival form within the modularity algorithm.

3. THE PERSPECTIVAL FORM WITHIN THE MODULARITY

We were called into the indigenous world to reflect on network studies, mainly because of a natural notion of multiplicity in indigenous society [26], and primarily because we have long studied, in information networks, a political aspect that we find in the modes of existence peculiar to indigenous society: a way of existence that is substantially minor, minoritarian in character. We are therefore concerned with the mechanisms that inhibit or block the emergence of a totalizing discourse. Indeed, "perspectivism does not state the existence of a multiplicity of points of view, but the existence of the point of view as a multiplicity." [27]

Modularity is one of the possible measures for detecting communities in complex networks. A set of nodes qualifies as a community by its modularity if the fraction of links among them is higher than expected in a reference network called the "null model" [28]. High modularity in a complex network indicates strong community structure; in other words, nodes inside the same community are densely connected to each other and only sparsely connected to nodes in other communities.

The algorithm applied in this paper to find communities, since we use Gephi [13], is the Louvain Method. It performs community detection in weighted graphs and is characterized by a greedy heuristic, local optimization of modularity, high speed (complexity O(n log n), where n is the number of nodes), non-determinism, and a hierarchical partition as output. The Louvain Method is an “algorithm that finds high modularity partitions of large networks in short time and that unfolds a complete hierarchical community structure for the network, thereby giving access to different resolutions of community detection.” [20] Think of the network as a perspective. The nodes that compose such a network will form an alliance, i.e., a covenant relationship between viewpoints. The link between two nodes is exactly the distance between them and, also, the distance between points of view. It turns out, then, that when we apply the algorithm to maximize modularity, the network is partitioned into modules by testing all nodes until moving any node to another module no longer increases the modularity. It is a dimension of alterity, the same as that found in Amerindian perspectivism. "Perspectives encourage you to believe OUT of them." (Roy Wagner) [2] The algorithm repeats this process of exchange and change successively for all nodes. Autophagy is a survival of hashtags in the network. A roundup of alliances.

4. METHODOLOGY

"The object as such: why a perspective is not a representation" [31].

The first step of the method is, of course, to obtain the dataset to be analyzed: a collection of tweets formatted as a comma-separated values (CSV) file. The tool used to collect these tweets is yourTwapperKeeper2. The procedure begins with the choice of a term or hashtag, and the tool does the job of archiving the massive amounts of data. This process provides a historiography of what has been vocalized in relation to the researched expression. With enough data for ethnographic rendering, we can go to the "field", which for us means exploring a database of entities and attributes.

2 http://www.github.com/540co/yourtwapperkeeper

The second step is data processing. As we know, hashtags are one of the most commonly used forms of categorization and indexing among users of social networks such as Twitter and Facebook.
One can say that the hashtag summarizes the content of the tweet, positively or negatively, confirming or contradicting it. This step therefore consists of creating a "Hashtag network" from the tweets previously collected. The Hashtag network is a complex network that links two hashtags whenever they co-occur in the same tweet. It is a weighted network, because the same pair of hashtags can co-occur in more than one tweet. The network is created by a script programmed in our lab, whose output is a CSV file that will be used in the data mining process.

The third step consists of drawing the network and manipulating its structure. To visualize the network, we import it into Gephi. At first, the network looks like a hairball, a completely unintelligible graph. This is when modularity comes into the picture. Before that, however, there is a very important act: we have to delete the "main node", that is, the hashtag that links all nodes. The next move is to run "Modularity", set the parameters of your choice, and wait until the calculation is over. Then we apply the modularity class calculated for each node, thus forming the communities. One way to do so is to color the nodes by modularity class, thereby emphasizing the communities, in our case the topics of discussion. The next important move is to calculate the "Average Weighted Degree", which gives the user a way to size the nodes by their weighted degree. The network is no longer a hairball and the communities are easier to recognize; the biggest nodes in each community define the points of view of that community. Lastly, each community is a network of points of view, and the communities are distributed across Gephi's workspaces. Now, in each workspace, we apply the modularity and calculate the average weighted degree again. The final touch consists of laying out the graph with the "Circular Layout" option; it is also more visually interesting to order the nodes by modularity class. For design purposes, we advise finding the node with the highest degree, which identifies the most prominent point of view of the particular network. By now, for purposes of visualization and exploration, we expect to have a network of hashtags, i.e., the perspectival form of the network.

4.1 The case with the #WorldCup

Figure 2: #worldcup's main perspectives and #england perspectives on #worldcup.

The dataset consists of 271,013 tweets collected between February 4th and May 4th, 2014. This image is a view between acts in the third step of our method, after the first pass of the modularity optimization algorithm and the rearrangement of the nodes with the highest weighted degrees in each perspective. It is an overview of the #worldcup hashtag network in which the main perspectives are emphasized. As we can see, some noise or distortion can be identified in the network, as in #cricket, where the hashtags refer to the cricket world cup, or in #teamfollowback, where users tend to flood their timelines in order to get more followers.

In this perspective of the network (Figure 2), the English topic under discussion is visible. The different subtopics, evident among the nodes, make this clear: the hashtags #epl, #bpl, and #premierleague refer to the English Premier League, i.e., the English national championship; #nufc and #lfc refer to Newcastle United FC and Liverpool FC, both English teams; and last, but obviously not least, the hashtag #rio clearly connects to the main discussion, #worldcup, as the English team is going to train in the city of Rio de Janeiro before the cup.

Figure 3: #qatar perspectives on #worldcup.

After the nodes with the highest weighted degrees are emphasized, human interaction, as research, is truly required to engage in the process of perceiving perspectives. The hashtag #ukraine involves the perspective of the protests and the country's recent history with Russia; multiple hashtags appear in the composition of its points of view. We can identify the following: #crimea, #sanctions, #russiainvadesukraine, and #worldwar3. This perspective also has a fractal element, because we can see the hashtags #wc2018, #2018worldcup, and #worldcup2018, which suggests that people are already expressing concerns about the country that will host the next world cup, in 2018. As for #gymnastics, the perspective lies in the gymnastics world cup that took place in Doha in 2014, which can be seen as noise in our main investigation. And in #qatar, where the 2022 world cup will be hosted, the multiplicity, as point of view, focuses on several discussions involving #humanrights, #workersrights, #slavery, and the like.

4.2 The case with the #ClimateChange

The dataset on climate change was collected between February 2nd and May 5th, 2014. In total, we have exactly 1,048,576 tweets. To analyze the data, we put together a hashtag network of 21,415 nodes.
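
The construction of a hashtag network like the ones used in both cases can be sketched as follows. This is a minimal illustration under stated assumptions, not the script written at the lab, which the paper does not reproduce: the regular expression, the lowercase normalization, the function names `build_hashtag_network` and `write_edge_csv`, and the `Source,Target,Weight` column headers are all assumptions made for the example, and the sample tweets are invented. The optional `main_node` argument drops the hashtag that links all nodes at build time, whereas the method above deletes it later, inside Gephi.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Assumed pattern: in Python 3, \w also matches accented letters,
# so a tag such as #NãoVaiTerCopa is captured whole.
HASHTAG = re.compile(r"#(\w+)")


def build_hashtag_network(tweets, main_node=None):
    """Map each unordered hashtag pair to the number of tweets it co-occurs in."""
    edges = Counter()
    for text in tweets:
        # A set keeps one copy of each hashtag per tweet.
        tags = {"#" + tag.lower() for tag in HASHTAG.findall(text)}
        tags.discard(main_node)
        # Tweets left with fewer than two hashtags produce no pairs.
        for a, b in combinations(sorted(tags), 2):
            edges[(a, b)] += 1
    return edges


def write_edge_csv(edges, handle):
    """Write the weighted edge list as CSV (header names are an assumption)."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])


# Invented sample tweets, for illustration only.
tweets = [
    "#worldcup #qatar #humanrights",
    "Workers deserve better #Qatar #HumanRights #WorldCup",
    "#worldcup #england #rio",
    "just watching #worldcup",
    "no hashtags here",
]
edges = build_hashtag_network(tweets, main_node="#worldcup")

buffer = io.StringIO()
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```

The last two sample tweets contribute nothing to the network, which illustrates why a co-occurrence procedure of this kind only sees tweets that carry at least two hashtags.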
The number of hashtags provides a sample of the "heat" of the debate online. In Figure 4, the modularity has been computed only once, and the graph displays the partition of the network into modules. The points of view with the highest weighted degree are: #carbonbubble, #energy, #obama, #tcot, #nsa, #gree#, #news, #ows, #truth, #obama, #bbcnews, #fracking, #travel, #jobs, #earthday, #organic, #climate and #climate2014. Who is what in this network? Appearances can be deceptive, but a few interesting revelations already appear. For instance, #tcot means Top Conservatives on Twitter; this perspective has a longer reach in the network because it is allied with the American Tea Party.
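
The two quantities used to read a partition like this one, the weighted degree of each hashtag and the modularity of the partition, can be sketched in plain Python. This is a hedged illustration and not Gephi's implementation: the edge weights and the two community labels below are invented toy data (they are not results from the dataset), and the function names are hypothetical. The modularity follows the standard definition for weighted undirected graphs, Q = sum over communities c of [ w_in(c) / m - (s(c) / 2m)^2 ], where m is the total edge weight, w_in(c) the weight of edges inside c, and s(c) the sum of the weighted degrees of the nodes in c.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum of the weights of the edges incident to each node."""
    degree = defaultdict(float)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


def modularity(edges, community):
    """Modularity Q of a partition, given as a node -> community label mapping."""
    total = sum(edges.values())          # m: total edge weight
    internal = defaultdict(float)        # w_in(c): weight inside each community
    strength = defaultdict(float)        # s(c): summed weighted degree per community
    for (a, b), weight in edges.items():
        if community[a] == community[b]:
            internal[community[a]] += weight
    for node, degree in weighted_degree(edges).items():
        strength[community[node]] += degree
    return sum(
        internal[c] / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


# Toy network: two dense groups of hashtags joined by a single weak link.
toy_edges = {
    ("#tcot", "#teaparty"): 3,
    ("#tcot", "#obama"): 2,
    ("#teaparty", "#obama"): 1,
    ("#anthropocene", "#energy"): 2,
    ("#anthropocene", "#food"): 1,
    ("#energy", "#food"): 1,
    ("#obama", "#energy"): 1,
}
toy_partition = {
    "#tcot": "A", "#teaparty": "A", "#obama": "A",
    "#anthropocene": "B", "#energy": "B", "#food": "B",
}

degrees = weighted_degree(toy_edges)
ranking = sorted(degrees, key=degrees.get, reverse=True)
q_split = modularity(toy_edges, toy_partition)
q_single = modularity(toy_edges, {node: "all" for node in toy_partition})
print(ranking[0], degrees[ranking[0]])
print(round(q_split, 4), round(q_single, 4))
```

Placing every hashtag in one community gives a modularity of zero, while the two-group split scores higher, which is the sense in which a module is "denser than expected". A modularity optimizer such as the Louvain Method searches for the partition that makes this score as large as it can.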

Figure 4: #climatechange perspectives.

Still, note that we have designed the perspectival forms in order to demonstrate visually the capacity of some points of view to establish more regimes of alliance. The orange network of points of view has a high internal modularity, clearly echoing the language of the American Republican Party. At the same time, the green network maintains a link to the orange network; the points of view embedded in this green network include #globalwarming and #deniers. No wonder this perspectival form preserves its alliance with the American conservative party.

The blue network proposes a perspectival form of the Anthropocene. A hashtag itself, #anthropocene reflects the current reality of the concerns raised by the notion of Gaia. It brings in issues like #energy, #food, and #weather, a dimension of the ecological crisis: the reflection of man before the outburst of Gaia. The blue network has links to different perspectival forms, such as #cdnpoli, a network of points of view involving the environmental crisis in Canada. There we can find #KXL and #KeystoneXL, the hashtags used in the oil debate.

Figure 5: The blue network arises as a perspectival form with high modularity.

5. CONCLUSION

In this paper we have presented theoretical references from Post-Social Anthropology and Complex Networks to support our methodological framework for studies of social information data. Twitter is a rich field of production; it can generate alarming discussions about the need to debate ecological crises, as with the hashtag #climatechange. There is a social memory within the hashtag, which is why this research explored points of view through the hashtags in the network. However, the hashtag is also a fictional character that brings together a collective memory and puts it to act in the public space, influencing our understanding of what we take to be reality. This is not a simulacrum 2.0; it is a practice that activates a mode of human existence, the fictional, to expand our critical capacity.

In the case of #climatechange, we confirmed the existence of a variety of networks within the large network: different perspectives that are completely distinguishable, as shown by the distance between #actonclimate and #teaparty. The analysis of #worldcup assembles the perspectival form as a multiplicity, inviting us to dig into the point of view and emphasizing that it is not possible to generalize the network. This procedure, which analyzes the co-occurrence of hashtags in a dataset of tweets, leaves behind tweets with no hashtags or with only one hashtag. This is a limitation of the method, but it also keeps the method focused on its main goal: to study the connections between the hashtags of a tweet and to perceive the perspectival form that these connections give rise to in a complex network.

We describe the interrelation of algorithms and the humanities, which together compose a powerful tool that enables a routine of data mining, processing, and visualization of social information. Applying our research methodology has supported our hypothesis, since it indicates that there is a variety of points of view; a more detailed study of a network therefore needs to take the perspectives of the network into account. It is also important to note that perspectives converge in the same direction, so each group is well defined by the side it defends. Our method indicates that research on informational networks, such as studies of degree, sentiment, hubs, and authorities, that does not take into account the perspectives in dispute in the networks will always tend to reach conclusions that privilege the richest nodes, those with the most connections. For future work, we plan to refine our
