=Paper=
{{Paper
|id=Vol-2667/paper22
|storemode=property
|title=Analysis of open data of a social network in order to identify deviant communities
|pdfUrl=https://ceur-ws.org/Vol-2667/paper22.pdf
|volume=Vol-2667
|authors=Rostislav Mikherskii,Dmitry Kuznetsov
}}
==Analysis of open data of a social network in order to identify deviant communities==
Rostislav Mikherskii, Physico-technical Institute, V.I. Vernadsky Crimean Federal University, Simferopol, Russia, mrm03@mail.ru

Dmitry Kuznetsov, Physico-technical Institute, V.I. Vernadsky Crimean Federal University, Simferopol, Russia, dimabrayankuznetsov@mail.ru

Abstract: A system for analyzing open data of the social network Vkontakte has been developed and implemented in software. Two ways of identifying deviant communities are proposed. The first relies on the number of community subscribers blocked by the social network for violating its rules. The second relies on the presence of common subscribers between the studied community and a community that is known for certain to be deviant. It is experimentally established that the second method of identifying deviant communities gives the better result.

Keywords: big data, open data, social network

===I. INTRODUCTION===

Analysis of open data from social networks is a significant area in the field of big data processing. In particular, an important task for both law enforcement agencies and social network administrators is to identify communities of these networks that disseminate socially dangerous content. Many recent works have been devoted to this problem. The work [1] develops a method for assessing the degree of connectedness of user profiles of social networks based on open data, where the degree of connectedness is understood as the probability that the profile owners have met in real life. In [2], methods that detect the demographic attributes of a user from their profile and messages are reviewed. In [3,4], forms of deviant behavior of users of the Russian-language segment of the Internet are examined in detail. In particular, [4] shows that the main reasons for deviant behavior in social networks are virtuality and anonymity. In [5], the main methods of social network analysis are reviewed, on the basis of foreign sources, in relation to the task of identifying suspicious and criminal communities.

To study social networks in terms of social relationships, the Social Network Analysis (SNA) method is often used. The SNA method is described in detail in [6-8]. In this method, the objects of research are nodes and the relationships between them. Nodes can be communities, users of social networks, etc. The connections between these nodes can be money transfers, communication, friendship, etc. This method has been successfully used to study the organizational structure of the Al-Qaida terrorist network [9], to study the network of terrorist organizations operating in India [10], and to analyze the topological structure of criminal networks, in particular the network of methamphetamine traffic [11]. These studies are mainly motivated by the need to find effective methods to undermine criminal or terrorist organizations.

Anorexia-oriented online communities have been studied in [12-14]. A wide range of issues was studied in these works, including the construction and management of member identities, the processes of social recognition, the emergence of group norms, and the use of linguistic markers. Similar studies have been conducted for groups promoting suicidal behavior [15].

In [16-20], the categorization of pornographic content and the frequency of its use were studied.

In [21], the authors also focused on consumption networks for adult content, which is present in many online social networks and on the Internet as a whole. The authors investigated how such communities interact with the entire social network. They found that a few small, closely connected communities are responsible for much of the production of content. The produced content is distributed through the rest of the network mainly directly or through bridge communities, reaching at least 450 times more users. That work also carried out a demographic analysis of the networks of producers and consumers of adult content. It showed that several key users can easily be identified in order to radically curb the distribution of pornographic content.

The issue of community polarization in social networks was studied in detail in [22]. That work proposed a new polarization metric based on the analysis of the boundary of a pair of (potentially polarized) communities, which better reflects the concepts of antagonism and polarization.

Cyber aggression, as a form of deviant behavior in the Internet environment, was studied in detail in [23-28]. This socio-psychological phenomenon has many forms, the main ones being trolling, cybermobbing and astroturfing.

As can be seen from the above review, the search for deviant communities is an important task both for scientists who study such communities and for law enforcement agencies. Unfortunately, deviant communities are most often identified manually, and often only in response to user complaints.

The aim of this work was to develop a methodology for identifying deviant communities in the social network Vkontakte in automatic mode. To achieve this goal, two options for searching for such communities have been proposed.

===II. RESULTS===

In the first option, the following algorithm for searching for such communities is proposed and implemented in software. For the studied community, the number <math>l</math> of subscribers blocked by the social network for breaking the rules and the total number <math>L</math> of subscribers of this community are determined. The coefficient <math>k = l/L</math> is then computed. It is assumed that if the coefficient <math>k</math> is greater than some critical value <math>k_d</math>, then the community under study is deviant.

The above algorithm was implemented in the Python programming language. While running this program, 50,704 communities of the Vkontakte social network were randomly selected. To reduce the influence of statistical error, only communities with 100 or more total subscribers were selected from the general list. Owing to system and API limitations, only communities with a small number of subscribers were considered. The coefficient <math>k</math> was calculated for each of these communities, and all communities were then sorted in descending order of this coefficient. Table I presents the first 20 communities from the list.

{| class="wikitable"
|+ TABLE I. COMMUNITIES WITH A HIGH PERCENTAGE OF BLOCKED SUBSCRIBERS
! № !! Community identification number !! Number of subscribers in the community !! Number of blocked subscribers !! Percentage of blocked subscribers from the total number of community subscribers, <math>k \cdot 100\%</math>
|-
| 1 || 172017411 || 104 || 101 || 97.1154
|-
| 2 || 171896750 || 122 || 114 || 93.4426
|-
| 3 || 41398959 || 107 || 98 || 91.5888
|-
| 4 || 125043269 || 1017 || 904 || 88.8889
|-
| 5 || 19613748 || 960 || 852 || 88.75
|-
| 6 || 176328754 || 226 || 193 || 85.3982
|-
| 7 || 148023353 || 495 || 419 || 84.6465
|-
| 8 || 188941498 || 530 || 438 || 82.6415
|-
| 9 || 23811356 || 1116 || 921 || 82.5269
|-
| 10 || 164252296 || 152 || 123 || 80.9211
|-
| 11 || 150230769 || 198 || 157 || 79.2929
|-
| 12 || 130381011 || 200 || 157 || 78.5
|-
| 13 || 154988787 || 410 || 317 || 77.3171
|-
| 14 || 155397881 || 847 || 654 || 77.2137
|-
| 15 || 149830913 || 107 || 81 || 75.7009
|-
| 16 || 170030633 || 577 || 428 || 74.1768
|-
| 17 || *** || 174 || 129 || 74.1379
|-
| 18 || 164288533 || 153 || 113 || 73.8562
|-
| 19 || 143657800 || 424 || 312 || 73.5849
|-
| 20 || 157513161 || 420 || 309 || 73.5714
|}

In order to prevent propaganda of deviant communities, in this table and in Table II the identification numbers of all such communities are replaced by the symbols “***”. As can be seen from this list, it contains only one deviant community (No. 17). This community was classified as deviant because it contains pornographic material.

Thus, the hypothesis that the percentage of blocked users is greater in deviant communities than in non-deviant communities has not been experimentally confirmed. Furthermore, some communities are clearly abandoned, and they can contain many banned users because they lack moderation and new subscribers. Other communities can be related to advertising or temporary events. They are still not deviant, even though the Vkontakte rules restrict the creation of such communities, treating them as communities with inappropriate content.

The second option for searching for deviant communities is based on the following algorithm. One community is found that is known for certain to be deviant. For this community, the list of subscribers is obtained. For each of these subscribers, the communities to which that user is subscribed are determined. For each of the communities in the resulting list, the number of subscribers who are also subscribers of the studied deviant community is counted. It is assumed that a sufficiently large number of communities from this list will also be deviant. This algorithm was also implemented in the Python programming language.

To test the performance of this program, the deviant community “Mom Anarchy”, with identification number 177615404, was chosen. This community is engaged in popularizing the ideas of anarchism and has 32097 subscribers. The data processing time was 18 hours. Followers of this community are also subscribed to 940512 other communities. All of them were sorted in descending order by the number of users who are also subscribed to the Mom Anarchy community. Table II presents the first 20 communities from this list.

{| class="wikitable"
|+ TABLE II. COMMUNITIES WHOSE SUBSCRIBERS ARE ALSO SUBSCRIBERS OF THE MOM ANARCHY COMMUNITY
! № !! Community identification number !! Number of subscribers in the community !! Number of subscribers who are also subscribers of the Mom Anarchy community
|-
| 1 || *** || 5539982 || 15035
|-
| 2 || 91050183 || 9356399 || 12924
|-
| 3 || *** || 707327 || 12712
|-
| 4 || 159146575 || 1162785 || 12521
|-
| 5 || *** || 563784 || 11987
|-
| 6 || *** || 4403183 || 11644
|-
| 7 || *** || 2768306 || 11317
|-
| 8 || *** || 2508543 || 11246
|-
| 9 || 57846937 || 11275065 || 11224*
|-
| 10 || *** || 2684988 || 11154
|-
| 11 || *** || 2586853 || 11145
|-
| 12 || 150550417 || 937052 || 10916
|-
| 13 || 149094324 || 2076903 || 10832*
|-
| 14 || 30316056 || 1809325 || 10451
|-
| 15 || 66678575 || 4976245 || 10299
|-
| 16 || 12353330 || 3555825 || 10167
|-
| 17 || 154168174 || 1264550 || 10145
|-
| 18 || 173556111 || 641480 || 10005
|-
| 19 || *** || 3802683 || 9576
|-
| 20 || 133180305 || 3116645 || 9540
|}

As can be seen from this table, 9 of the 20 communities in the presented list are deviant. The main reasons for classifying these communities as deviant are propaganda of violence, criticism of the existing constitutional system, and the use of profanity. These results show that the output of the algorithm should be treated as an assumption rather than a final prediction: a human reviewer still has to examine each community to reach a conclusion about its content. For now, the main aim of this algorithm is to narrow the search for deviant communities.

Furthermore, this algorithm makes it possible to consider communities with a larger number of subscribers. However, the API has a strong impact on the performance of the algorithm, so such systems have limited portability. Nevertheless, the core idea of this system is to show the dependence between the number of blocked users and community content.

The scientific novelty of this work lies in the proposed algorithm, which helps to identify deviant communities. Although the current algorithm can only suggest what a community's content may be, it could be improved with additional algorithms and tools, such as image recognition tools and text analyzers. A holistic recognition system could therefore be developed to make more accurate predictions about deviant communities in social networks with an open API.

===III. CONCLUSION===

Thus, the second way of identifying deviant communities is much more effective than the first. This technique for identifying deviant communities in automatic mode can be applied not only to the social network Vkontakte but also to other social networks. We also note that the second method can be applied not only to search for deviant communities, but also to search for communities related to the studied community, for example in marketing research. In such studies, it is possible to determine the interests of community users and, accordingly, to build a policy for attracting new users to this community.

Another possible use of this method is to conduct an advertising campaign for a certain community. In this case, the community to be advertised is chosen as the studied community, a list of communities associated with it is determined, and advertising messages are placed in those communities.

It should also be noted that machine learning methods, such as artificial immune systems [29-31] or convolutional neural networks [32-41], may be useful in the search for deviant communities. However, even when machine learning is used, the method of identifying deviant communities by the presence of common subscribers between the studied community and a community that is known for certain to be deviant will not lose its relevance. This is primarily because the results of this method are highly transparent to interpret, in contrast to machine learning methods, which are often a black box whose results are hard to explain.

Thus, this study proposes a new method that makes it possible to search for deviant communities quickly, cheaply and efficiently.

===ACKNOWLEDGMENT===

In conclusion, we would like to thank Marina Vsevolodovna Glumova, Director of the Physico-technical Institute of the V. I. Vernadsky Crimean Federal University, and Victor Vasilyevich Milyukov, head of the Department of computer engineering and modeling of the Physico-technical Institute of the V. I. Vernadsky Crimean Federal University, for their assistance in organizing research.

===REFERENCES===

[1] V. Kataeva, I. Pantyukhin and I. Yurin, “Methods for assessing the degree of connectivity of social network user profiles based on open data,” Open education, vol. 21, no. 6, pp. 14-22, 2017.

[2] A. Gomzin and S. Kuznetsov, “Methods for constructing socio-demographic profiles of Internet users,” Proceedings of the ISP RAS, vol. 27, no. 4, pp. 129-142, 2015.

[3] A. Baklantseva, “Transformation of social norms and deviations in the Russian-language Internet,” News of universities in the North Caucasus region. Social sciences, vol. 3, pp. 21-25, 2014.

[4] D. Cherenkov, “Deviant behavior in social networks: causes, forms, consequence,” Nauka-Rastudent.Ru, vol. 7, no. 19, pp. 29, 2015.

[5] M. Basarab, I. Ivanov, A. Kolesnikov and V. Matveev, “Detection of illegal activities in cyberspace based on the analysis of social networks: algorithms, methods and tools (review),” Cybersecurity issues, vol. 4, no. 17, pp. 11-19, 2016.

[6] L.C. Freeman, “The development of social network analysis: A study in the sociology of science,” Social Networks, vol. 27, no. 4, pp. 377-384, 2005.

[7] A. Hopkins, “Graph theory, social networks and counter terrorism,” Univ. of Massachusetts Dartmouth, pp. 22, 2010.

[8] L.C. Freeman, “Centrality in social networks conceptual clarification,” Social networks, vol. 1, no. 3, pp. 215-239, 1979.

[9] V. Krebs, “Mapping networks of terrorist cells,” Connections, vol. 24, no. 3, pp. 43-52, 2002.

[10] P. Choudhary and U. Singh, “A survey on social network analysis for counter-terrorism,” Int. Journal of Computer Applications, vol. 112, no. 9, pp. 24-29, 2015.

[11] J. Xu, H. Chen, “The topology of dark networks,” Communications of the ACM, vol. 51, no. 10, pp. 58-65, 2008.

[12] J. Gavin, K. Rodham and H. Poyer, “The presentation of “pro-anorexia” in online group interactions,” Qualitative Health Research, vol. 18, no. 3, pp. 325-333, 2008.

[13] J.D.S. Ramos, A.D.F PereiraNeto and M. Bagrichevsky, “Pro-anorexia cultural identity: characteristics of a lifestyle in a virtual community,” Interface (Botucatu), vol. 15, no. 37, pp. 447-460, 2011.

[14] N. Boero and C.J. Pascoe, “Pro-anorexia communities and online interaction: Bringing the pro-ana body online,” Body & Society, vol. 18, no. 2, pp. 27-57, 2012.

[15] S.M. Haas, M.E. Irr, N.A. Jennings and L.M. Wagner, “Communicating thin: A grounded model of Online Negative Enabling Support Groups in the pro-anorexia movement,” New Media & Society, vol. 13, no. 1, pp. 40-57, 2010.

[16] M. Schuhmacher, C. Zirn and J. Volker, “Exploring youporn categories, tags, and nicknames for pleasant recommendations,” Workshop on Search and Exploration of X-Rated Information. ACM, pp. 27-28, 2013.

[17] G. Tyson, Y. Elkhatib, N. Sastry and S. Uhlig, “Are People Really Social in Porn 2.0?” Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM), pp. 436-444, 2015.

[18] G.M. Hald and A. Stulhofer, “What types of pornography do people use and do they cluster? Assessing types and categories of pornography consumption in a large-scale online sample,” Journal of Sex Research, pp. 1-11, 2015.

[19] G.M. Hald, N.N. Malamuth, T. Lange, “Pornography and sexist attitudes among heterosexuals,” Journal of Communication, vol. 63, no. 4, pp. 638-660, 2013.

[20] G.M. Hald, “Gender differences in pornography consumption among young heterosexual danish adults,” Archives of sexual behavior, vol. 35, no. 5, pp. 577-585, 2006.

[21] M. Coletto, L.M. Aiello, C. Lucchese and F. Silvestri, “On the Behaviour of Deviant Communities in Online Social Networks,” Proceedings of the 10th International AAAI Conference on Web and Social Media (ICWSM), pp. 72-81, 2016.

[22] P.H.C. Guerra, Jr.W. Meira, C. Cardie, R. Kleinberg, “A measure of polarization on social media networks based on community boundaries,” Proceedings of the 7th International AAAI Conference on Web and Social Media (ICWSM), pp. 215-224, 2013.

[23] J.S. Chibbaro, “School counselors and the cyberbully: interventions and implications,” Journal of Professional School Counseling, vol. 11, no. 1, pp. 65-68, 2007.

[24] R. Gable, J. Snakenborg and R. Van Acker, “Cyberbullying: Prevention and Intervention to Protect Our Children and Youth,” Preventing School Failure, vol. 55, no. 2, pp. 88-95, 2011.

[25] W. Heirman and M. Walrave, “Cyberbullying: Predicting Victimisation and Perpetration,” Children & Society, vol. 25, pp. 59-72, 2011.

[26] J.S. Donath, “Identity and Deception in the Virtual Community,” Communities in Cyberspace, London: Routledge, pp. 26, 1999.

[27] N.E. Willard, “From Cyberbullying and Cyberthreats: Responding to the Challenge of Online Social Aggression, Threats, and Distress,” Champaign, IL: Research Press, pp. 303, 2007.

[28] R.A. Vnebrachnykh, “Trolling as a form of social aggression in virtual communities,” Bulletin of the Udmurt University. Philosophy. Sociology. Psychology. Pedagogy, vol. 1, pp. 48-51, 2012.

[29] R. Mikherskii, “Application of an artificial immune system for visual pattern recognition,” Computer Optics, vol. 42, no. 1, pp. 113-117, 2018. DOI: 10.18287/2412-6179-2018-42-1-113-117.

[30] G. Luh, “Face recognition based on artificial immune networks and principal component analysis with single training image per person,” Immune Computation, vol. 2, no. 1, pp. 21-34, 2014.

[31] D. Dasgupta, S. Yu and F. Nino, “Recent advances in artificial immune systems: Models and applications,” Applied Soft Computing, vol. 11, no. 2, pp. 1574-1587, 2011. DOI: 10.1016/j.asoc.2010.08.024.

[32] Y. Li, X. Zhang and D. Chen, “CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit, pp. 1091-1100, 2018.

[33] M. Kalayeh and M. Shah, “Training Faster by Separating Modes of Variation in Batch-Normalized Models,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 42, no. 6, pp. 1483-1500, 2020. DOI: 10.1109/TPAMI.2019.2895781R.

[34] A. Farrugia and C. Guillemot, “Light Field Super-Resolution Using a Low-Rank Prior and Deep Convolutional Neural Networks,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 42, no. 05, pp. 1162-1175, 2020. DOI: 10.1109/TPAMI.2019.2893666.

[35] Z. Zhou, J. Shin, L. Zhang, S. Gurudu, M. Gotway and J. Liang, “Fine-tuning convolutional neural networks for biomedical image analysis: Actively and incrementally,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 7340-7349, 2017.

[36] C. Lian, M. Liu, J. Zhang and D. Shen, “Hierarchical Fully Convolutional Network for Joint Atrophy Localization and Alzheimer's Disease Diagnosis Using Structural MRI,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 42, no. 04, pp. 880-893, 2020. DOI: 10.1109/TPAMI.2019.2895781.

[37] A. Bulat and G. Tzimiropoulos, “Hierarchical Binary CNNs for Landmark Localization with Limited Resources,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 42, no. 02, pp. 343-356, 2020. DOI: 10.1109/TPAMI.2018.2866051.

[38] V.A. Sindagi and V.M. Patel, “A survey of recent advances in CNN-based single image crowd counting and density estimation,” Pattern Recognit. Lett., vol. 107, pp. 3-16, 2017.

[39] K. He, G. Gkioxari, P. Dollar and R. Girshick, “Mask R-CNN,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 42, no. 02, pp. 386-397, 2020. DOI: 10.1109/TPAMI.2018.2844175.

[40] S. Lin, R. Ji, C. Chen, D. Tao and J. Luo, “Holistic CNN Compression via Low-Rank Decomposition with Knowledge Transfer,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 41, no. 12, pp. 2889-2905, 2019. DOI: 10.1109/TPAMI.2018.2873305.

[41] I. Rocco, R. Arandjelovic and J. Sivic, “Convolutional Neural Network Architecture for Geometric Matching,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 41, no. 11, pp. 2553-2567, 2019. DOI: 10.1109/TPAMI.2018.2865351.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020), Data Science
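The two search options described in Section II can be illustrated with a short Python sketch. It is not the authors' program: the function names, the in-memory dictionaries that stand in for data collected through the Vkontakte API, the community with id 1, and the threshold passed as `k_d` are all hypothetical, since the paper gives neither its code nor a value for the critical coefficient.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def blocked_ratio_ranking(
    communities: Dict[int, Tuple[int, int]], min_subscribers: int = 100
) -> List[Tuple[int, float]]:
    """First option: rank communities by the coefficient k = l / L.

    `communities` maps a community id to (L, l), i.e. the total number of
    subscribers and the number of blocked subscribers. Communities with
    fewer than `min_subscribers` subscribers are skipped, as in the paper.
    """
    ranking = [
        (community_id, blocked / total)
        for community_id, (total, blocked) in communities.items()
        if total >= min_subscribers
    ]
    # Highest share of blocked subscribers first.
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


def flag_by_threshold(ranking: List[Tuple[int, float]], k_d: float) -> List[int]:
    """Return ids of communities whose coefficient k exceeds k_d."""
    return [community_id for community_id, k in ranking if k > k_d]


def shared_subscriber_ranking(
    seed_id: int,
    seed_subscribers: Iterable[int],
    subscriptions: Dict[int, Set[int]],
) -> List[Tuple[int, int]]:
    """Second option: count subscribers shared with a known deviant community.

    `seed_subscribers` lists the users subscribed to the seed community and
    `subscriptions` maps a user id to the set of communities that user follows.
    The result is a list of (community id, shared subscriber count), sorted
    in descending order of the count.
    """
    counts: Counter = Counter()
    for user in seed_subscribers:
        for community_id in subscriptions.get(user, set()):
            if community_id != seed_id:  # do not count the seed itself
                counts[community_id] += 1
    return counts.most_common()


if __name__ == "__main__":
    # First two rows of Table I plus a hypothetical small community (id 1)
    # that the 100-subscriber filter removes.
    sample = {172017411: (104, 101), 171896750: (122, 114), 1: (50, 49)}
    ranking = blocked_ratio_ranking(sample)
    print(ranking)
    print(flag_by_threshold(ranking, k_d=0.95))  # 0.95 is an assumed threshold

    # Toy subscription data; 99 plays the role of the seed community.
    toy_subscriptions = {1: {10, 20, 99}, 2: {10, 99}, 3: {10, 20, 99}}
    print(shared_subscriber_ranking(99, [1, 2, 3], toy_subscriptions))
```

In a real run the dictionaries would be filled from the social network's API, and, as the paper stresses, the communities at the top of either ranking are only candidates that a human reviewer still has to check.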