<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Co-posting Author Assortativity in Reddit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Cauteruccio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enrico Corradini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgio Terracina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Ursino</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Virgili</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DEMACS, University of Calabria</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DII, Polytechnic University of Marche</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the context of social networks, a renowned paper by Newman introduced the notion of "assortativity", also known as "assortative mixing". Closely akin to the concept of homophily, it measures how much a node tends to associate with other nodes that are similar to it. Degree centrality is the most commonly used similarity metric for evaluating assortativity between nodes, but several others could be considered. Assortativity has been investigated in depth in much past research on different social platforms. However, Reddit was not among the social networks taken into account, even though it is a very popular social medium. In this paper, we aim to find out whether a form of assortativity exists in Reddit; in particular, we focus our analysis on co-posters, i.e., authors posting content on the same subreddit.</p>
      </abstract>
      <kwd-group>
        <kwd>Reddit</kwd>
        <kwd>Co-posters</kwd>
        <kwd>Assortativity</kwd>
        <kwd>Social Network Analysis</kwd>
        <kwd>Degree Centrality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Assortativity and degree assortativity were introduced in a renowned paper by
Newman [17]. There, the author defines a measure of assortativity for networks
and shows that real social networks are often assortative, whereas technological and
biological networks tend to be disassortative. He also models an assortative
network and uses it for analytic and numerical studies. From this analysis,
he finds that assortative networks tend to percolate more easily than
disassortative ones and that they are more robust to node removal. Another important
study of social network assortativity was proposed in [18]. In that
paper, the authors confirm the results of [17] and analyze the relation between
clustering and assortativity in communities inside a social network. Recently, a
detailed overview of assortative mixing in complex networks was presented in [19],
whose authors investigate assortativity, and in particular degree
assortativity, in different kinds of complex network. The concept of assortativity in social
networks is a specific case of homophily. It comes from the famous homophily
principle "similarity breeds connection" [13], which can be applied to network ties
of every type. The result is that people's personal networks are homogeneous
with respect to many sociodemographic, behavioral, and intrapersonal characteristics.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). This volume is published
and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.</p>
      <p>After the famous paper of Newman, many researchers started to investigate
assortativity in social networks. In spite of this, there are several
platforms (many of them famous) where assortativity has not yet been investigated.
One of them is Reddit<sup>3</sup>. This is a heterogeneous crowd-sourced news aggregator
and online social network, originally self-declared as "the front page of the Internet".
It was founded in 2005 and, in a few years, has become an ecosystem of 430M+
average monthly active users<sup>4</sup>. In Reddit, users can post their content as text,
images or links to external resources. Submitted content (also simply called
posts) can be read by other users and discussed via comments. Users can
subscribe to multiple subreddits in order to receive the latest content on their front
pages. An important feature of Reddit is voting, which is the mechanism
affecting the visibility and the ranking of both posts and comments.</p>
      <p>This paper aims at filling the gap mentioned above and presents some
analyses we performed in order to evaluate assortativity in Reddit. For this
purpose, we first built a dataset with all the posts published in Reddit from
January 1st, 2019 to September 1st, 2019. Starting from this dataset, we built
a suitable social network representing
co-posting activities in Reddit. Then, we carried out several investigations on
this network and compared their results with the ones
obtained on a corresponding null model. At the end of this task,
we found that Reddit is assortative with respect to degree centrality, as far as the
co-posting relationship is concerned, and we formulated a hypothesis that explains
this result.</p>
      <p>The outline of this paper is as follows: In Section 2, we describe related
literature. In Section 3, we illustrate the dataset we used for our analyses. In
Section 4, we perform our investigation on assortativity in Reddit. Finally, in
Section 5, we draw our conclusions and have a look at future developments of
our research.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>As previously pointed out, [17] and [18] can be considered the founding
works on the notion of assortativity. After these papers, the authors of [7] modeled
biological, technological and online social networks starting from microscopic
mechanisms of growth. Exploiting this model to perform statistical evaluations,
they found that the statistical properties of biological, technological and
online social networks are in good agreement with those of the real-world social
networks of scientists co-authoring papers in condensed matter physics. Here,
assortativity plays a key role. Indeed, the authors show that online social
networks are generally assortative, whereas the majority of technological and
biological networks appear to be disassortative with respect to degree centrality.
The investigation of [7] was expanded in [10], whose authors analyzed
assortativity and disassortativity for different kinds of network. Specifically,
they considered the same network categories highlighted in [17, 18, 7]. They
confirmed both the disassortativity of biological and technological networks and
the assortativity of real social networks, analogously to what was shown in [7].
Contrary to the widespread belief and the results of [7], they found that
not all online social networks are assortative. Almost all the results of [10]
were confirmed in [9].</p>
      <p><sup>3</sup> https://www.reddit.com</p>
      <p><sup>4</sup> https://www.redditinc.com/</p>
      <p>Online social networks simulating real-life activities show the opposite
behavior. The authors of [23] analyzed assortativity and other network parameters
on both standard social graphs and interaction graphs. They showed that the
latter present a higher assortativity than the former. The authors of [6] present
a study on degree assortativity for co-author networks. In [8], an interesting
investigation on the relationship between assortativity and centrality is presented.
Here, the authors study the relation between the degree-degree correlation coefficient
and the BC-BC (i.e., Betweenness Centrality-Betweenness Centrality) one.
In [11], a detailed study on the relation between Shannon entropy and degree
assortativity was presented. Here, the authors defined a general class of
degree-degree correlated networks and obtained the corresponding Shannon entropy
starting from some suitable parameters. They found that the maximum entropy
does not typically correspond to neutral networks but to either assortative or
disassortative ones.</p>
      <p>In [4], the authors investigated the assortativity of psychological states in
real-world social networks and online social networks. Specifically, they wanted
to check whether online social networks tend to be assortative, as happens
for real-world social networks. The authors of [12] further analyzed assortativity
on Twitter. They crawled this network and obtained 41.7 million user profiles,
1.47 billion social relations, 4,262 trending topics, and 106 million tweets. They
took into consideration several network parameters, such as degree distribution,
diameter, reciprocity of user friendship declaration, homophily and assortativity.
The authors of [3] proposed an interesting application of degree assortativity.
They exploited this measure, along with several others, to classify YouTube
users into spammers, promoters, and legitimate users.</p>
      <p>The concept of assortativity was also expanded along several directions. For
instance, the authors of [22] extended and evaluated this concept on a weighted
social network representing research collaborations. Another interesting
extension is the concept of type assortativity, which defines a way to measure if and
how a social graph belonging to a single type exhibits homophily. In [2], the
authors used paths, walks and random walks to define the concept of higher order
assortativity and showed that classical assortativity can be considered a
particular case of the newly proposed notion. They also presented several examples
and applications to airline networks and Enron e-mail networks.</p>
      <p>Assortativity was also considered in several other analyses, such as node
classification and network robustness measurements. The authors of [16] used
assortativity to improve the prediction of node attributes, based on the fact that
this measure provides information about each node, given its neighbors. This
approach is particularly useful in those situations where data is inaccurate or
missing. In [20], the authors measured the robustness of network communities by
means of a metric called "community assortativity", based on the classical notion
of assortativity. Finally, in [5], the authors expanded the concept of assortativity
from social networks to social internetworking systems, i.e. systems where two
or more social networks interact with each other through common users called
bridges.</p>
    </sec>
    <sec id="sec-3">
      <title>Dataset description</title>
      <p>The data used in our investigation was downloaded from pushshift.io,
a website well known as a Reddit data source. We obtained all the posts
found on Reddit from January 1st, 2019 to September 1st, 2019. All content
posted in a month was added to the dataset at the end of the next month. We
had a total of 150,795,895 posts available for our analyses. Each post had the
following set of attributes provided by pushshift.io: id, subreddit, title,
author, created_utc, score, num_comments and over_18.</p>
      <p>The server used for our experiments was equipped with 16 Intel Xeon E5520
CPUs and 96 GB of RAM, and ran the Ubuntu 18.04.3 operating system. We used Python
3.6 for the analyses, along with the Pandas library for ETL operations on data
and the NetworkX library for operations
on networks. While performing ETL operations, we found that our dataset contained
posts written by authors who had since left Reddit, and we decided to delete these posts. After
this activity, we had 122,568,630 posts in total. In this cleaned data,
the number of authors who wrote these posts was
12,464,188, and the number of subreddits they posted in was 1,356,069.</p>
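      <p>The cleaning step described above can be sketched as follows. This is a minimal illustration in plain Python rather than the Pandas pipeline actually used. The <monospace>[deleted]</monospace> marker for authors who left Reddit and the helper names <monospace>clean_posts</monospace> and <monospace>summarize</monospace> are assumptions made for this example, not details given in the paper.</p>
      <preformat><![CDATA[
```python
# Assumption: posts by authors who left Reddit carry this placeholder in
# the "author" field; the paper does not state the exact marker.
DELETED_AUTHOR = "[deleted]"


def clean_posts(posts):
    """Return only the posts whose author is still identifiable."""
    return [post for post in posts if post["author"] != DELETED_AUTHOR]


def summarize(posts):
    """Count posts, distinct authors and distinct subreddits."""
    return {
        "posts": len(posts),
        "authors": len({post["author"] for post in posts}),
        "subreddits": len({post["subreddit"] for post in posts}),
    }


# Toy records using a subset of the attributes listed for the dataset.
toy_posts = [
    {"id": "a1", "subreddit": "python", "author": "alice"},
    {"id": "a2", "subreddit": "python", "author": "[deleted]"},
    {"id": "a3", "subreddit": "rust", "author": "bob"},
    {"id": "a4", "subreddit": "rust", "author": "alice"},
]

cleaned = clean_posts(toy_posts)
stats = summarize(cleaned)
```
]]></preformat>
      <p>On the toy records, one post is dropped and the summary counts the remaining posts, authors and subreddits, mirroring the three totals reported for the cleaned dataset.</p>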
    </sec>
    <sec id="sec-4">
      <title>Analyzing author assortativity</title>
      <p>This section represents the core of our paper. In fact, it aims at verifying if a
form of assortativity exists in Reddit. To do so, we focused on co-posters, i.e.
authors who post on the same subreddit.</p>
      <p>The co-posting network P is the support network we defined to perform our
analyses. Formally speaking, P = ⟨N, E⟩.</p>
      <p>Here, N is the set of the nodes of P; there is a node n<sub>i</sub> ∈ N for each author
a<sub>i</sub> who posted at least once. There is an edge (n<sub>i</sub>, n<sub>j</sub>, w<sub>ij</sub>) ∈ E if the authors a<sub>i</sub>
and a<sub>j</sub> (associated with the nodes n<sub>i</sub> and n<sub>j</sub>, respectively) posted at least once
in the same subreddit. The weight w<sub>ij</sub> indicates the number of subreddits having at least
one post of a<sub>i</sub> and, simultaneously, at least one post of a<sub>j</sub>.</p>
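      <p>The definition above can be illustrated with a small sketch. It uses only the Python standard library instead of NetworkX, and the function name <monospace>build_coposting_network</monospace>, the input format (author, subreddit) and the toy data are assumptions introduced here for illustration.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations


def build_coposting_network(posts):
    """Build the weighted co-posting network from (author, subreddit) pairs.

    Returns (nodes, weights): nodes is the set of authors, and weights maps
    an unordered author pair to the number of subreddits they share.
    """
    authors_by_subreddit = defaultdict(set)
    nodes = set()
    for author, subreddit in posts:
        nodes.add(author)
        authors_by_subreddit[subreddit].add(author)

    weights = defaultdict(int)
    for authors in authors_by_subreddit.values():
        # Every pair of authors seen in this subreddit shares it once.
        for a_i, a_j in combinations(sorted(authors), 2):
            weights[(a_i, a_j)] += 1
    return nodes, dict(weights)


toy_posts = [
    ("alice", "python"), ("bob", "python"),
    ("alice", "rust"), ("bob", "rust"), ("carol", "rust"),
    ("alice", "python"),  # repeated posts do not change the weight
    ("dave", "golang"),   # no co-poster, so dave stays isolated
]
nodes, weights = build_coposting_network(toy_posts)
```
]]></preformat>
      <p>In the toy data, alice and bob share two subreddits, so their edge has weight 2, while dave is a node with no edges. Since every pair of authors in a subreddit is enumerated, the cost grows quadratically with the number of authors per subreddit.</p>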
      <p>The number of nodes of P is 12,464,188, which is exactly the
number of authors in our dataset. The arcs of P, on the other hand,
number about 925 billion. The density of the network is 0.00596,
while the average clustering coefficient is 0.43753.</p>
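      <p>A quick arithmetic check of the reported density, using only the figures quoted above, can be sketched as follows. The arc count is the rounded value of 925 billion, so the result is approximate, and the variable names are ours. Two common conventions are shown because the paper does not state which one it used.</p>
      <preformat><![CDATA[
```python
n_nodes = 12_464_188          # nodes of the co-posting network
n_arcs = 925_000_000_000      # "about 925 billion" arcs (rounded figure)

# Density relative to the number of ordered node pairs.
density_ordered_pairs = n_arcs / (n_nodes * (n_nodes - 1))

# Density relative to the number of unordered node pairs.
density_unordered_pairs = 2 * n_arcs / (n_nodes * (n_nodes - 1))
```
]]></preformat>
      <p>With these inputs, the ordered-pair convention gives a value near 0.00595, close to the reported 0.00596, while the unordered-pair convention gives roughly twice that. This suggests, as our reading only, that the reported density counts ordered node pairs.</p>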
      <p>Our first task was to evaluate the degree centrality of the nodes of P.
In Figure 2, we show the corresponding distribution.</p>
      <p>As we can see from this figure, degree centrality follows a power law; this
result is aligned with the theory underlying this form of centrality [21]. The
maximum value of degree centrality is 1,820,412, while the minimum one is 0.</p>
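      <p>As a hedged illustration of this step, the sketch below computes the degree of each node and the resulting degree distribution from an edge list. It assumes that degree centrality is the unnormalized number of neighbors, which is consistent with the integer maximum and minimum reported above; the function name and toy data are ours.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def degree_distribution(nodes, edges):
    """Return each node's degree and how many nodes have each degree."""
    degree = {node: 0 for node in nodes}
    for a_i, a_j in edges:
        degree[a_i] += 1
        degree[a_j] += 1
    return degree, Counter(degree.values())


toy_nodes = ["alice", "bob", "carol", "dave"]
toy_edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
degree, distribution = degree_distribution(toy_nodes, toy_edges)
```
]]></preformat>
      <p>Plotting the distribution on log-log axes is a common way to inspect whether it is compatible with a power law.</p>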
      <p>In order to check for the possible existence of assortativity in Reddit, we sorted
the authors according to their degree centrality, in descending order. We then
partitioned the resulting list into intervals. Specifically, we took intervals with
equal width<sup>5</sup> {I<sub>1</sub>, I<sub>2</sub>, …, I<sub>40</sub>}, each made up of 312,500 authors. As a
consequence, I<sub>k</sub>, 1 ≤ k ≤ 39, contained all the authors in the
interval (312,500 · (k − 1), 312,500 · k], open on the left and closed on the right, of the
sorted list. The interval I<sub>40</sub> contained all the authors in the interval
(12,187,500, 12,464,188].</p>
      <p><sup>5</sup> The last interval was slightly narrower than the other ones.</p>
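      <p>The partitioning just described can be sketched as follows. The function name <monospace>partition_into_intervals</monospace>, the tie-breaking rule and the toy degrees are assumptions for illustration; the final lines only redo the arithmetic on the sizes reported in the text.</p>
      <preformat><![CDATA[
```python
def partition_into_intervals(degree, width):
    """Sort authors by descending degree and cut the list every `width` items."""
    # Ties are broken by author name only to make the toy output reproducible.
    ranked = sorted(degree, key=lambda author: (-degree[author], author))
    return [ranked[start:start + width] for start in range(0, len(ranked), width)]


toy_degree = {"alice": 5, "bob": 3, "carol": 3, "dave": 1, "erin": 0}
intervals = partition_into_intervals(toy_degree, width=2)

# Arithmetic for the sizes reported in the text.
n_authors, interval_width = 12_464_188, 312_500
n_intervals = -(-n_authors // interval_width)  # ceiling division
last_interval_size = n_authors - interval_width * (n_intervals - 1)
```
]]></preformat>
      <p>With 12,464,188 authors and a width of 312,500, this arithmetic yields 40 intervals, the last of which holds 276,688 authors.</p>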
      <p>First of all, we considered the first interval, i.e. I<sub>1</sub>. For each interval I<sub>k</sub>,
1 ≤ k ≤ 40, we determined how many authors of I<sub>1</sub> are connected through an
arc to at least one author of I<sub>k</sub>. The results obtained are reported in Figure 3.
Then, we determined the percentage of the authors of I<sub>k</sub> connected with at least
one author of I<sub>1</sub>. The results obtained are reported in Figure 4.
The analysis of Figures 3 and 4 clearly shows a close correlation, i.e. a sort
of backbone, among the authors with the highest degree centrality.</p>
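      <p>The two quantities computed for each interval can be sketched as below. The adjacency representation (a dictionary mapping each author to the set of its neighbors), the function name <monospace>cross_interval_counts</monospace> and the toy data are assumptions for illustration, not the authors' implementation.</p>
      <preformat><![CDATA[
```python
def cross_interval_counts(adjacency, source_interval, intervals):
    """Relate every interval I_k to a fixed source interval.

    Returns two lists aligned with `intervals`:
    - how many authors of the source interval have a neighbor in I_k;
    - the percentage of authors of I_k with a neighbor in the source interval.
    """
    source = set(source_interval)
    counts, percentages = [], []
    for interval in intervals:
        members = set(interval)
        from_source = sum(
            1 for a in source if not adjacency.get(a, set()).isdisjoint(members)
        )
        into_source = sum(
            1 for a in members if not adjacency.get(a, set()).isdisjoint(source)
        )
        counts.append(from_source)
        percentages.append(100.0 * into_source / len(members))
    return counts, percentages


toy_adjacency = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": set(),
}
toy_intervals = [["alice", "bob"], ["carol", "dave"]]
counts, percentages = cross_interval_counts(
    toy_adjacency, toy_intervals[0], toy_intervals
)
```
]]></preformat>
      <p>In the toy example, both authors of the first interval reach each interval, while only half of the second interval (carol, but not dave) is connected to the first one.</p>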
      <p>We compared our findings with the ones obtained through a null model, in
order to verify the statistical significance of our results against an unbiased random
scenario. In particular, we shuffled all the arcs between the nodes of P (which, in
our case, represent co-postings), in order to build the null model. In this way,
we left unchanged all the features of P, except for the distribution of co-posting
relationships, which was unbiasedly random in the null model. Next, we repeated
all the previous analyses on the null model. Figures 5 and 6 show the obtained
results. The comparison between these two figures and Figures 3 and 4
highlights the similarity of the distributions represented therein. Many of the
intervals that obtained the highest values in Figures 3 and 4 continue to reach
the highest values in Figures 5 and 6. However, in the null model, the values are
much smaller. So, we can conclude that the behaviors observed are not random,
but intrinsic to Reddit.</p>
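      <p>One possible reading of this null model is sketched below: the nodes and the number of arcs are kept, while the arcs are placed between node pairs chosen uniformly at random. This is an assumption on our part; the paper does not specify the shuffling procedure, and a degree-preserving rewiring would be another plausible interpretation. The function name and toy data are hypothetical.</p>
      <preformat><![CDATA[
```python
import random
from itertools import combinations


def shuffled_null_model(nodes, edges, seed=0):
    """Return as many arcs as `edges`, placed between random node pairs."""
    rng = random.Random(seed)
    # Enumerating all pairs only works for toy inputs; a network with
    # millions of nodes would need pair sampling with rejection instead.
    all_pairs = list(combinations(sorted(nodes), 2))
    return rng.sample(all_pairs, len(edges))


toy_nodes = ["alice", "bob", "carol", "dave", "erin"]
toy_edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
null_edges = shuffled_null_model(toy_nodes, toy_edges, seed=42)
```
]]></preformat>
      <p>The same interval-based analyses can then be repeated on the shuffled arcs and compared with those on the original network.</p>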
      <p>However, this is not enough to prove the existence of degree assortativity
for co-posters in Reddit. Indeed, we must check whether this trend also holds for
authors with an intermediate degree centrality and for those with a low degree
centrality.</p>
      <p>For this reason, we have to repeat for all intervals the tasks previously performed for I<sub>1</sub>.
Due to space constraints, we consider only the interval I<sub>20</sub>, as the representative
of the intermediate degree centrality author intervals, and the interval I<sub>39</sub>, as
the representative of the low degree centrality author intervals<sup>6</sup>.</p>
      <p>Figure 7 shows the number of authors of I<sub>20</sub> connected to at least one author
of I<sub>k</sub>. Figure 8 shows the percentage of the authors of I<sub>k</sub> connected with at least
one author of I<sub>20</sub>. These figures clearly highlight the existence of a correlation
between the authors with an intermediate degree centrality.
Here too, we compared the results with the null model. Figures 9 and 10
present the results obtained. Comparing them with Figures 7 and 8, we note
that, again, the behaviors observed are not random, but are a feature of
Reddit.</p>
      <p>Finally, Figure 11 reports the number of authors of I<sub>39</sub> connected to at
least one author of I<sub>k</sub>, while Figure 12 shows the percentage of the authors of
I<sub>k</sub> connected with at least one author of I<sub>39</sub>. Here too, a close correlation exists
between the authors with a low degree centrality. We compared these results with
the ones obtained through the null model, reported in Figures 13 and 14. Again,
this comparison confirms that the behavior observed is a property intrinsic
to Reddit. The existence of a backbone among the authors with a high (resp.,
intermediate, low) degree centrality leads us to conclude that Reddit is indeed
assortative with respect to degree centrality, as far as the co-posting relationship
is concerned.</p>
      <p><sup>6</sup> We did not choose I<sub>40</sub> because it contains fewer authors than the
other intervals.</p>
      <p>Fig. 9. Number of authors of I<sub>20</sub> connected to at least one author of I<sub>k</sub> in the null
model.</p>
      <p>This important finding can be explained through the concept of karma and
the posting rules existing in Reddit. Indeed, each user is associated with a karma,
i.e. a score taking her past "reputation" into account. Users with high karma are
generally very active and often submit high-quality content, appreciated by
others. So, they likely have a high degree centrality. In other words, we can recognize
a direct correlation between karma and degree centrality for authors. Reddit's
posting rules state that each subreddit is associated with a minimum threshold of
karma that authors must have to post on it [14, 15, 1]. This threshold is dynamic
and changes over time. When it is low, all users can post on that subreddit.
When it becomes moderate, users with low karma (and possibly low degree
centrality) cannot post on it. When it becomes high, only users with high karma
(and possibly high degree centrality) can post on it. In this way, users are segmented
into groups with homogeneous degree centrality.</p>
      <p>In this paper, we have presented several investigations that we performed to
evaluate assortativity in Reddit. First, we have built a dataset comprising all
posts found in Reddit from January 1st, 2019 to September 1st, 2019. Then,
we have constructed a co-posting network that served as the reference
structure on which to perform our analyses. Afterwards, we have carried out several
investigations on both the co-posting network and a corresponding null model.
Finally, we have compared the results obtained and we have found that Reddit is
assortative with respect to degree centrality, as far as the co-posting relationship
is concerned.</p>
      <p>In the future, we plan to extend this work in several directions. For example,
we plan to evaluate the possible existence of other forms of assortativity or
disassortativity in Reddit. They could involve, for instance, centrality measures
other than degree centrality, or user activities other than posting. In addition,
we plan to investigate other issues analyzed in other social platforms and not
yet investigated in Reddit.</p>
      <p>Fig. 13. Number of authors of I<sub>39</sub> connected to at least one author of I<sub>k</sub> in the null
model.</p>
      <p>5. F. Buccafurri, G. Lax, A. Nocera, and D. Ursino. Internetworking assortativity in Facebook. In Proc. of the International Conference on Social Computing and its Applications (SCA 2013), pages 335–341, Karlsruhe, Germany, 2013. IEEE Computer Society.</p>
      <p>6. M. Catanzaro, G. Caldarelli, and L. Pietronero. Assortative model for social networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 70(3):037101–037104, 2004. The American Physical Society.</p>
      <p>7. M. Catanzaro, G. Caldarelli, and L. Pietronero. Social network growth with assortative mixing. Physica A: Statistical Mechanics and its Applications, 338(1):119–124, 2004. Elsevier.</p>
      <p>8. K.I. Goh, E. Oh, B. Kahng, and D. Kim. Betweenness centrality correlation in social networks. Physical Review E, 67(1):017101, 2003. APS.</p>
      <p>9. H.B. Hu and X.F. Wang. Evolution of a large online social network. Physics Letters A, 373(12):1105–1110, 2009. Elsevier.</p>
      <p>10. H.B. Hu and X.F. Wang. Disassortative mixing in online social networks. EPL (Europhysics Letters), 86(1):18003, 2009. IOP Publishing.</p>
      <p>11. S. Johnson, J.J. Torres, J. Marro, and M.A. Muñoz. Entropic origin of disassortativity in complex networks. Physical Review Letters, 104(10):108702, 2010. APS.</p>
      <p>12. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proc. of the International Conference on World Wide Web (WWW'10), pages 591–600, Raleigh, NC, USA, 2010. ACM.</p>
      <p>13. M. McPherson, L. Smith-Lovin, and J.M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415–444, 2001. JSTOR.</p>
      <p>14. J. Meese. It belongs to the Internet: Animal images, attribution norms and the politics of amateur media production. M/C Journal, 17(2):1–3, 2014. M/C.</p>
      <p>15. D. Morrison and C. Hayes. Here, have an upvote: Communication behaviour and karma on Reddit. Informatik, pages 2258–2268, 2013. Gesellschaft für Informatik eV.</p>
      <p>16. D. Mulders, C. de Bodt, J. Bjelland, A. Pentland, M. Verleysen, and Y.-A. de Montjoye. Inference of node attributes from social network assortativity. Neural Computing and Applications, pages 1–21, 2019. Springer Nature Switzerland AG.</p>
      <p>17. M.E.J. Newman. Assortative mixing in networks. Physical Review Letters, 89(20):208701, 2002. APS.</p>
      <p>18. M.E.J. Newman and J. Park. Why social networks are different from other types of networks. Physical Review E, 68(3):036122, 2003. APS.</p>
      <p>19. R. Noldus and P. Van Mieghem. Assortativity in complex networks. Journal of Complex Networks, 3(4):507–542, 2015. Oxford University Press.</p>
      <p>20. D. Shizuka and D.R. Farine. Measuring the robustness of network community structure using assortativity. Animal Behaviour, 112:237–246, 2016. Elsevier.</p>
      <p>21. M. Tsvetovat and A. Kouznetsov. Social Network Analysis for Startups: Finding connections on the social web. 2011. O'Reilly Media, Inc.</p>
      <p>22. M. Vaanunu and C. Avin. Homophily and nationality assortativity among the most cited researchers' social network. In Proc. of 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 584–586, Barcelona, Spain, 2018. IEEE Computer Society.</p>
      <p>23. C. Wilson, B. Boe, A. Sala, K.P.N. Puttaswamy, and B.Y. Zhao. User interactions in social networks and their implications. In Proc. of the ACM European Conference on Computer systems (EuroSys'09), pages 205–218, Nuremberg, Germany, 2009. ACM.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>K.E.</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <article-title>Ask me anything: what is Reddit?</article-title>
          <year>2015</year>
          . Emerald.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Arcagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stefani</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Torriero</surname>
          </string-name>
          .
          <article-title>Higher order assortativity in complex networks</article-title>
          .
          <source>European Journal of Operational Research</source>
          ,
          <volume>262</volume>
          (
          <issue>2</issue>
          ):
          <fpage>708</fpage>
          –
          <lpage>719</lpage>
          ,
          <year>2017</year>
          . Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Almeida</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          .
          <article-title>Detecting spammers and content promoters in online video social networks</article-title>
          .
          <source>In Proc. of the International Conference on Research and Development in Information Retrieval (SIGIR '09)</source>
          , pages
          <fpage>620</fpage>
          –
          <lpage>627</lpage>
          , Boston, MA, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Bollen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goncalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ruan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Mao</surname>
          </string-name>
          .
          <article-title>Happiness is assortative in online social networks</article-title>
          .
          <source>Artificial Life</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>237</fpage>
          –
          <lpage>251</lpage>
          ,
          <year>2011</year>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>F.</given-names>
            <surname>Buccafurri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nocera</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ursino</surname>
          </string-name>
          .
          <article-title>Internetworking assortativity in Facebook</article-title>
          .
          <source>In Proc. of the International Conference on Social Computing and</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>