Identifying Fake Profile in Online Social Network
Himanshi Gupta Nagariya, Neha Dhanotiya, Shruti Joshi and Sarika Jain

National Institute of Technology, Kurukshetra, India


                   Abstract
                   Online Social Networks (OSNs) involve a huge number of people from all over the world and
                   have become a big part of their lives. People use social networks to share their feelings,
                   to make new friends, to set up new businesses, and to connect with friends and family.
                   Online Social Networks benefit individuals in many ways, but they also have disadvantages:
                   many people use these networks to cause harm to others by creating fake accounts. To tell
                   fake accounts from genuine ones, we can use machine learning algorithms, which predict
                   and classify data through trained models. It is sometimes difficult to choose between the
                   results of different models, and a hybrid approach that combines several machine learning
                   algorithms can make this task easier. In our work we compared 8 different combinations of
                   classification algorithms and calculated their accuracy on the dataset of an Online Social
                   Network. We used combinations of Random Forest, Support Vector Machine (SVM), Logistic
                   Regression, KNN, and Decision Trees. After comparing the result of each hybrid approach,
                   we concluded that the best accuracy was obtained by the combination of SVM, Logistic
                   Regression, and Neural Network. We therefore propose a model for the detection of fake
                   accounts based on the hybrid approach that gave the best accuracy among all combinations.

                   Keywords
                   Online Social Network, Fake Account Detection, Feature Extraction, Spammer.


ACI’21: Workshop on Advances in Computational Intelligence at ISIC 2021, February 25–27, 2021, New Delhi, India
EMAIL: himanshi100497@gmail.com (H. Nagariya); nehadhanotiya1612@gmail.com (N. Dhanotiya);
shrutijoshijma@gmail.com (S. Joshi); jasarika@nitkkr.ac.in (S. Jain)
ORCID: 0000-0002-7432-8506 (S. Jain)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

    Machine Learning is a branch of artificial intelligence (AI) that gives a system the ability to act
without being programmed explicitly. It is used in many fields, such as Google cars, recommendation
engines, friend suggestions in social media networks, shopping apps, cybercrimes, etc.
    Machine Learning has brought a phenomenal change in the way data is extracted and interpreted,
replacing older statistical techniques. Machine learning techniques are classified into Reinforcement,
Supervised, and Unsupervised Machine Learning.
    Our work is concerned with the classification algorithms that come under Supervised Machine
Learning. Classification is a supervised learning approach in which the machine learns from the input
data and then classifies the testing data according to its training data.
    Although classification algorithms (Support Vector Machine, Logistic Regression, Decision Tree,
Random Forest, Artificial Neural Network) can be used separately and individually, in our system we
develop a hybrid model: combining two or three machine learning models has helped in increasing the
accuracy of the model and its predictive power. Which hybrid model will perform better is not known in
advance; it also depends on the dataset provided and on the feature selection. A hybrid model is
developed in a two-stage manner: the first stage uses clustering or classification techniques for
pre-processing
of data, and the second stage uses the output of the first stage to build the second-stage predictive
classifier. Such a model can be built with different supervised or unsupervised learning algorithms,
but in our work we developed the model using classification algorithms of supervised learning. Our main
contribution is to propose a hybrid approach of machine learning algorithms and to compare hybrids of
different classification algorithms. Eight different experiments were conducted, and the accuracies
thus obtained were compared.
    The total number of users of online social networking sites is continuously increasing, and with it
the number of fake accounts. As of September 2019, Facebook had 2.45 billion monthly active users
worldwide. According to Alexa, Facebook is the third most visited website after Google and YouTube. A
survey found that there are more female accounts in the world than the total female population, from
which we can infer how many fake profiles have been created. According to an April 2018 statistics
report, Twitter had more than 336 million active accounts, while Facebook was the leader with 2,196
million users worldwide. Of Facebook's 2.45 billion monthly active users in September 2019, India had
the most, with 270 million users. Approximately 1.62 billion people log on to Facebook daily, and among
these, 83 million accounts are fake. These statistics were given by Facebook in its Wall Street reports
(source: Zephoria Digital Marketing). Figure 1 shows the monthly active users of various OSNs in the
year 2019.

Figure 1: Monthly active users in different OSNs in year 2019

1.1.    Motivation

    As the number of people using OSNs increases, so does the creation of fake social media accounts.
The main motivation for identifying these fake accounts is the cybercrime rate: such accounts are
created primarily to commit cyber robbery or to commit cybercrime anonymously, and their number has
increased significantly over the last few years. Fake account owners also try to take advantage of
people's kindness by composing fake messages and spreading false news through these accounts in order
to steal money from innocent people. In addition, people create multiple accounts that do not belong to
any real person, merely to raise votes in an online voting system or to receive referral incentives, as
in online games.
    The detection of fake accounts in OSNs attracts many researchers, so several detection algorithms
have been developed using machine learning techniques and various account-related features. Spammers,
however, also find ways to counter such techniques, so detection mechanisms must be sophisticated and
require the continuous development of new approaches to spam detection. The main difficulties in
detecting fake accounts are achieving high accuracy and a short response time in the analysis of
account characteristics.

1.2.    Challenges

    Modeling a Fake Profile Detection System is an old problem, but because of the many challenges it
presents, there still exist many gaps that have been identified and need to be worked upon. The
challenges are listed below:
    • The data is not readily available: accounts on online social networks are highly private and
      protected, so the networking sites do not reveal any account information, in order to maintain
      confidentiality and keep the trust of their users.
    • There is a lot of overlap between genuine and fake accounts: at times the feature sets of
      legitimate and fake
      accounts overlap, and this poses a considerable setback when training the neural network to learn
      the pattern that differentiates between them.
    • The number of parameters to process: the enormous number of parameters between learning and
      decision making is a major obstacle in developing systems for detecting fake accounts.
    • Selection of optimal features (variables) is a big challenge: optimal feature selection needs to
      be handled with care, as the performance of the whole system depends on which features it takes
      into consideration for classifying fake and genuine accounts. At times it is really perplexing to
      decide on these optimal features.
    • Ability to handle noise in the data: noise means missing or incorrect data, which poses
      challenges while processing the dataset. There is no means by which we can make up for this lost
      information, as such systems aren't partition tolerant, so this adversely affects the outcome.
    • Heterogeneity in features.
    • Single user multiple accounts.
    • Fake activity often resembles a legitimate transaction: at times fake account activities closely
      resemble legitimate ones. Hence, it becomes difficult to recognize them and stop them before they
      make it to completion.

1.3.   Gaps Identified

    • The evaluation of the proposed features can be extended by testing on different social
      networking sites such as Facebook and Twitter, as most previous research was done on only one
      site among Facebook, Twitter, LinkedIn, Myspace, etc.
    • Existing systems do not work accurately in real time when the features change.
    • Identification of rumor sources on social media by using content-based features.

1.4. Related Work

    Fake Profile Detection is an old problem, and a lot of work has been done on providing an optimal
solution. But because the behavior of fake accounts keeps evolving with time, and an enormous number of
challenges and gaps are still left to tackle, this problem still has a lot of significance. To study
the work already done on Fake Account Detection, we searched articles and research papers in two major
kinds of sources: i) general online indexing websites and ii) publisher databases. Examples of the
former are ResearchGate, Towards Data Science, IEEE Xplore, and Google Scholar; examples of the latter
are Scopus, Springer, ACM Digital Library, and Elsevier databases.
    The major machine learning techniques we considered for the comparative analysis of Fake Account
Detection are Neural Network, Support Vector Machine, Random Forest, and Hybrid Models.
    Yang et al. trained an SVM using the ground truth obtained from RenRen for detecting fake accounts.
Using simple features such as the frequency of friend requests, accepted requests, and the per-account
clustering coefficient, they trained the classifier and obtained a 99% true-positive rate (TPR) and a
0.7% false-positive rate (FPR). Íntegro extracts low-cost features from user-level activities to train
a classifier that identifies unidentified victims in the social graph, and uses feature-based
detection.
    A different hybrid approach was introduced by Mateen et al., who used content-based features such
as the total number of tweets, hashtag ratio, and URL ratio, together with some graph-based features,
on a Twitter dataset. They also compared J48, Decorate, and Naïve Bayes, of which Decorate was the best
performer. Somya et al.'s approach was quite different from the others: they tried to detect fake
accounts on the user's homepage using a Chrome extension that runs on the user's side. Along with this,
they used a Petri-net-based solution, running in the Pn2 simulator environment, to identify the source
of malicious content.
    Using a support vector machine and a neural network, Khaled et al. [20] obtained 98% accuracy and
compared the accuracy obtained by the hybrid of SVM and NN.
BalaAnand et al. [3] achieved 90.3% accuracy using a random forest classifier, support vector machine,
and k-nearest neighbor method. For their work, Gupta et al. [7] selected a Twitter dataset and used a
labeled dataset with specific user and tweet features. They used a hybrid of naive algorithms to
classify, cluster, and make highly accurate decisions.

1.5.    Organization

    In our work we have implemented various algorithms to find the most efficient one. To do so, we
conducted several experiments and compared their results. The rest of this paper has three sections,
which are briefly described below.
    This section is followed by Section 2, System Architecture, in which the flow diagram and
architecture of our work are introduced and described in brief.
    Section 3, Experimental Results, shows the modules implemented so far along with pseudocode,
discusses the results produced by our system, and presents the outputs generated on various inputs as
graphs for better understanding. The algorithm of the technique is also mentioned in this section.
    Section 4, Conclusion, summarizes the proposed solution, i.e. the combination of techniques that is
more efficient than the others and gives better accuracy.

2. System Architecture

    Although fake profile detection is a robust field, it has many challenges and gaps, which we have
discussed and on which we have based our work. There are many existing solutions to fake profile
detection, but all of them have some drawback. A lot of work has already been done in this field, and a
lot more needs to be done, such as improving the response time and preventing fake accounts instead of
detecting them and dealing with their aftermath. Our work aims to deliver a system with the highest
accuracy, which will hence be effective in preventing such fake profiles, by implementing and comparing
different algorithms. This is done with an ensemble machine learning technique, which speeds up the
training of neural networks and helps them make decisions faster. Efficient parameter selection is also
one of the major objectives of this work, for which we select six features manually; this gives better
control over the output of the neural network. The proposed solution uses a hybrid of machine learning
techniques, combining their advantages and using one to cancel out the weaknesses of the other, and
hence delivers an efficient and cost-effective system.
    In our proposed system we aim to design a hybrid system using an artificial neural network, a
support vector machine, and logistic regression that can precisely and accurately detect fake profiles
in online social networks. The goal of the work is to maximize accuracy and minimize the time required
by using this hybrid approach of Neural Network, Support Vector Machine, and Logistic Regression.
    Figure 2 depicts the flowchart of our system. The dataset is partitioned into two sets, the train
dataset and the test dataset, in the ratio 4:1. The train dataset goes into the Support Vector Machine
and Logistic Regression classifiers, where classes are predicted. These classifiers are then combined
in a voting classifier, where the final class decision is made. The output of the voting classifier,
i.e. the train data together with the class predicted by the voting classifier, is fed to the Neural
Network classifier as input. After training has been completed, we get a trained system on which the
test dataset is run to find the accuracy of the system.
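The data flow described for Figure 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the real account features, and the hyperparameters, the helper name `add_vote`, and the choice of hard voting are assumptions made only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the six account features (assumption: the real data is not used here).
X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)

# 4:1 split into train and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale the features; the scaler is fitted on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Stage 1: SVM and Logistic Regression combined in a voting classifier.
voter = VotingClassifier(
    estimators=[("svm", SVC(random_state=0)), ("lr", LogisticRegression(max_iter=1000))],
    voting="hard",
)
voter.fit(X_train_s, y_train)


def add_vote(X_scaled):
    """Append the voting classifier's predicted class as one extra feature column."""
    return np.column_stack([X_scaled, voter.predict(X_scaled)])


# Stage 2: the neural network is trained on the features plus the voted class.
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
nn.fit(add_vote(X_train_s), y_train)

# Run the test data through both stages and measure accuracy.
y_pred = nn.predict(add_vote(X_test_s))
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.4f}")
```

Here the second stage sees the voted class for the same rows the voting classifier was trained on, which mirrors the description above; a stricter stacking setup would generate those predictions out-of-fold.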
Figure 2: Flowchart of proposed system

    Figure 3 depicts the architecture of the proposed system. The first step is the collection of data
from the social networking site on which fake accounts are to be detected; in our work we collect the
data from web sources. The data is then preprocessed using feature extraction techniques; in our work
we select the features manually. Next, the classifiers are trained and their results are passed to the
voting classifier. The neural network classifier is then trained and tested, and the final result
labels each account as fake or real.

Figure 3: Architecture of proposed system

2.1.    Algorithm for System

Algorithm
INPUT: The dataset from CSV files
OUTPUT: Accuracy
1. Read dataset: Read genuineusers.csv and fake users.csv and append them in a list named x, and
   make a list y for labelling the class. Return x, y.
2. Feature Extraction: Convert non-integer features in the dataset to integers. Store the selected
   6 features in list x, overwriting it. Return x.
3. Split the data into training data and test data using 5-fold cross-validation and store them
   separately in x_train, x_test, y_train, y_test.
4. Scale the X data (x_train, x_test).
5. Use an ensemble classifier (voting classifier) with SVM and Logistic Regression.
6. Store the result in the y_pred variable and return y_pred.
7. Repeat step 3 with y_pred and x_test.
8. The output from step 3 is given to the Neural Network, and the output is stored in y_pred.
9. Testing: Evaluate the trained model against the test data. The output is a graph of
   True_Positive_Rate against False_Positive_Rate with accuracy, i.e. the ROC curve.
10. Print the classification accuracy on the testing dataset.
    Plot the confusion matrix.
    Print the execution time.
11. Exit
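Steps 1 to 3 of the algorithm (reading and labelling the data, converting the non-integer feature to an integer, and splitting into 5 folds) can be sketched as below. This is only an illustration using the Python standard library: the inline CSV rows are made-up sample values, the column name `lang`, the label convention (0 = genuine, 1 = fake), and the helper names are assumptions, and no shuffling is done before splitting.

```python
import csv
import io

# Made-up sample rows standing in for the genuine and fake CSV files (assumption).
GENUINE_CSV = """statuses_count,followers_count,friends_count,favourites_count,listed_count,lang
120,300,180,45,2,en
980,1500,400,210,12,it
"""
FAKE_CSV = """statuses_count,followers_count,friends_count,favourites_count,listed_count,lang
3,1,900,0,0,en
7,0,1200,0,0,es
"""

NUMERIC = ["statuses_count", "followers_count", "friends_count",
           "favourites_count", "listed_count"]


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_dataset(genuine_rows, fake_rows):
    """Steps 1-2: label the rows and keep six integer features per account."""
    rows = genuine_rows + fake_rows
    y = [0] * len(genuine_rows) + [1] * len(fake_rows)  # 0 = genuine, 1 = fake
    lang_codes = {}  # language string -> integer code, in order of first appearance
    x = []
    for row in rows:
        code = lang_codes.setdefault(row["lang"], len(lang_codes))
        x.append([int(row[name]) for name in NUMERIC] + [code])
    return x, y


def k_fold_indices(n_samples, k=5):
    """Step 3: yield (train_idx, test_idx) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


x, y = build_dataset(read_rows(GENUINE_CSV), read_rows(FAKE_CSV))
folds = list(k_fold_indices(10, k=5))
```

Because the genuine and fake rows are concatenated in order, a real run would shuffle the rows (or use a stratified split) before forming the folds, so that every fold contains both classes.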
3. Experimental Result

    No proposal can be modeled into a system without some experiments to support it. In this section
we include the results and outputs produced by our system under various inputs and parameters.

3.1. Implementation Details

    Each phase of our proposed system is briefly described in this section, and the results at each
stage are also provided.

3.1.1. Data Collection

    The model needs data to work upon. The dataset can be collected from various online platforms and
can also be created by using a crawler. We collected two datasets online from the well-known websites
Kaggle and GitHub, but we worked on the dataset collected from Kaggle, in which we use two CSV files
corresponding to fake and genuine users. Figure 5 shows a sample of the CSV file. The code for reading
both files is:

   import pandas as pd

   genuineusers = pd.read_csv("users.csv")
   fakeusers = pd.read_csv("fusers.csv")

3.1.2. Data Preprocessing

    Data pre-processing is used to achieve better results from any machine learning model by cleaning
the raw data. We import libraries that rescale or clean our data: numpy, pandas, and scikit-learn, and
from sklearn we import preprocessing to clean our data.
    In the next part of data pre-processing we use feature extraction. We first tried principal
component analysis, then a genetic algorithm, and after that we selected the features manually. We
compared the results obtained in these three ways and got the best result from the manual selection of
features. The features we selected manually are:

         statuses_count
         followers_count
         friends_count
         favourites_count
         listed_count
         lang_code

    The language code feature is of string type, so we convert it into an integer. When the feature
extraction function is called, it prints the extracted feature names and summarizes each extracted
feature by printing the count, mean, std, min, quartiles, max, etc.
    Figure 4 shows the data distribution in each column or feature in terms of count, mean, standard
deviation, minimum and maximum values, and the 25th, 50th, and 75th percentiles of the data points when
taken in ascending order.

Figure 4: Data distribution in each column

3.1.3. Training of Classifiers

    As we use a hybrid approach in our proposed system, we experimented with six techniques, i.e. SVM,
RF, LR, DTC, NN, and KNN, and finalized the techniques that give the best result: Support Vector
Machine, Logistic Regression, and Neural Network. First we train on our data using the support vector
machine independently, then we train on our data using Logistic Regression independently, and after
analyzing the results of both classification techniques, we merge the two to check their accuracy
together; the hybrid of both techniques gives us the best result. After training on the data with both,
the voting classifier is used to get the best result from the two and pass on a single value. We then
use 5-fold cross-validation to avoid overfitting: in k-fold cross-validation the dataset is divided
into k folds, where one fold is used for validation or testing while the others are used for training.
After the score of each fold is obtained, the final estimated score is printed; we got 0.91, and the
accuracy on the testing dataset is 99.56%. After that, the confusion matrix is plotted, which gives 261
true positives, 7 false negatives, 29 false positives, and 267 true negatives. We then plot the
normalized confusion matrix, which gives all four values (TP, TN, FP, FN) in percentage form, along
with precision, recall, F1 score, and support, which are the evaluation criteria. The recall is 0.98
for fake accounts and 1.00 for genuine accounts, the F1 score for both is 0.99, and the overall
accuracy is 0.99.

Figure 5: Sample of CSV file

3.1.4. Training of Neural Network

    Figure 6 shows the training of the neural network. Each line corresponds to one round of forward
and backward propagation, called an epoch. For this instance, we have taken the number of epochs to be
10 and the total number of layers to be 3; it took approximately minutes and seconds to train the
system with final accuracy and loss value to be respectively.

Figure 6: Training of Neural Network

    We now present the output produced by several hybrid techniques. We collected two datasets, D1 and
D2, which differ in size: D2 is larger than D1. D2 contains approx. 3500 rows, while D1 contains
approx. 1500 rows. The results obtained with different algorithms differ between the two datasets, and
D2 gives a system with lower accuracy than D1.
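The evaluation criteria named in Section 3.1.3 (precision, recall, F1 score, accuracy) and the true- and false-positive rates used for the ROC curve all follow from the four confusion-matrix counts. The sketch below shows the standard definitions; the function name and the example counts are illustrative assumptions, not values from our experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts (positive class = fake)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also the true-positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false-positive rate
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "fpr": fpr,
    }


# Illustrative counts only (assumption), not taken from the tables in this paper.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to any row of a table that reports TP, FP, FN, and TN to cross-check the accuracy column.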
 Table 1:
 Comparison of Accuracy (Support Vector Machine,
 Decision Tree Classifier, Logistic Regression
 Random Forest, Neural Network)

 Hybrid Techniques          D2            D1
    SVM+RF+NN             91.94%        96.01%
    SVM+LR+NN             97.3%         99.56%
     RF+LR+NN             93.32%        95.79%
 SVM+DTC+LR+NN            96.34%        99.33%
 SVM+DTC+RF+NN            92.87%        95.79%
   SVM+DTC+NN             91.48%        96.45%
 SVM+RF+KNN+NN            92.31%        97.12%
                                                       Figure 8: Confusion matrix of proposed hybrid
                                                       model
    As we can see there is an accuracy
 difference between both datasets used by
                                                   Table 2 shows the results of the seven
 different algorithms so further, we will be
                                                   experiments that we performed using
 working and showing results for only
                                                   different combination of classification
 dataset, D1. We are using two csv files one
                                                   algorithms like Support Vector Machine,
 is of genuine users and other one is of fake
                                                   Random Forest, Logistic Regression, KNN
 users.
                                                   with Neural Network. In the above table we
    Figure 7 shows the accuracy of each of
                                                   can see that SVM, Log Reg, and NN is
 our experimental model in ascending order
                                                   giving the maximum of true positive true
 and the model with highest accuracy being
                                                   negative resulting in maximum accuracy of
 our trained system.
                                                   all.

Table 2:
Results of combination of several techniques

Hybrid Techniques    TP   FP   FN   TN   Accuracy (%)
SVM+RF+NN            56    3    0   54   96.01
SVM+LR+NN            55    1    0   57   99.56
RF+LR+NN             55    4    0   54   95.79
SVM+DTC+LR+NN        54    2    0   57   99.33
SVM+DTC+RF+NN        55    4    0   54   95.79
SVM+DTC+NN           55    5    0   53   96.45
SVM+RF+KNN+NN        55    4    0   54   97.12

Figure 7: Accuracy of models in ascending order

   Figure 8 shows the confusion matrices for our proposed hybrid model, which give us the summary of true positives, true negatives, false positives and false negatives without normalization.

4. Conclusion
   Most system designs for fake account detection are either graph-based or feature-based, and they use graph analysis techniques or machine learning techniques to identify accounts as fake or real. In our proposed framework we use a feature-based dataset and select the features manually. This approach is based on user-level activities and the user's account details. We compare hybrid approaches built from different classification algorithms: the classifiers are passed to a voting classifier, and the result of the voting classifier is then passed to a neural network. We obtained the highest accuracy in detecting fake accounts by training and testing the dataset on different hybrid combinations of classification algorithms. The results show that combining the classification algorithms increases their accuracy.
                                                          Systems and Networks (DSN). IEEE,
5. References                                             2017.
                                                     [12] Dhawan, Sanjeev. "Implications of
                                                          Various      Fake     Profile    Detection
[1] Joshi, Shruti, et al. "Identifying Fake
                                                          Techniques in Social Networks." IOSR
     Profile in Online Social Network: An
                                                          Journal of Computer Engineering (IOSR-
     Overview and Survey." International                  JCE), AETM'16 (2016): 49-55.
     Conference on Machine Learning, Image           [13] Gurajala, Supraja, et al. "Fake Twitter
     Processing, Network Security and Data
                                                          accounts: profile characteristics obtained
     Sciences. Springer, Singapore, 2020.                 using an activity-based pattern detection
 [2] Mohanty, Sachi, et al. Recommender
                                                          approach." Proceedings of the 2015
     System with Machine Learning and
                                                          International Conference on Social
     Artificial Intelligence. Wiley-Scrivener,            Media & Society. 2015.
     2020.                                           [14] Xiao, Cao, David Mandell Freeman, and
 [3] Balaanand, Muthu, et al. "An enhanced
                                                          Theodore Hwa. "Detecting clusters of
     graph-based semi-supervised learning                 fake accounts in online social networks."
     algorithm to detect fake users on Twitter."          Proceedings of the 8th ACM Workshop
     The Journal of Supercomputing 75.9
                                                          on Artificial Intelligence and Security.
     (2019): 6085-6105.                                   2015.
 [4] Boshmaf, Yazan, et al. "Integro:
                                                     [15] Adikari, Shalinda, and Kaushik Dutta.
     Leveraging Victim Prediction for Robust
                                                          "Identifying fake profiles in linkedin."
     Fake Account Detection in OSNs." NDSS.               arXiv preprint arXiv:2006.01381 (2020).
     Vol. 15. 2015.                                  [16] Al-Qurishi, Muhammad, et al. "A
[5] Erşahin, Buket, et al. "Twitter fake account
                                                          prediction system of Sybil attack in
     detection." 2017 International Conference            social network using deep-regression
     on Computer Science and Engineering                  model." Future Generation Computer
     (UBMK). IEEE, 2017.                                  Systems 87 (2018): 743-753.
 [6] Mateen, Malik, et al. "A hybrid approach
                                                     [17] Masood, Faiza, et al. "Spammer detection
     for spam detection for Twitter." 2017 14th           and fake user identification on social
     International Bhurban Conference on                  networks." IEEE Access 7 (2019):
     Applied Sciences and Technology                      68140-68152.
     (IBCAST). IEEE, 2017.                           [18] Cresci, Stefano, et al. "Fame for
 [7] Gupta, Arushi, and Rishabh Kaushal.
                                                          sale:Efficient detection of fake Twitter
     "Improving spam detection in online social           followers." Decision Support Systems 80
     networks." 2015 International conference             (2015): 56-71.
     on cognitive computing and information          [19] Yang, Zhi, et al. "Uncovering social
     processing (CCIP). IEEE, 2015.                       network sybils in the wild." ACM
 [8] Rahman, Sazzadur, et al. "Detecting
                                                          Transactions on Knowledge Discovery
     malicious      Facebook       applications."         from Data (TKDD) 8.1 (2014): 1-29.
     IEEE/ACM transactions on networking             [20] Khaled, Sarah, Neamat El-Tazi, and
     Hoda MO Mokhtar. "Detecting fake
     accounts on social media." 2018 IEEE
     International Conference on Big Data
     (Big Data). IEEE, 2018.
[21] Gupta, Aditi, and Rishabh Kaushal.
     "Towards detecting fake user accounts in
     facebook." 2017 ISEA Asia Security and
     Privacy (ISEASP). IEEE, 2017.
[22] Benevenuto, Fabricio, et al. "Detecting
     spammers on twitter." Collaboration,
     electronic messaging, anti-abuse and
     spam conference (CEAS). Vol. 6. No.
     2010. 2010.
[23] Stein, Tao, Erdong Chen, and Karan
     Mangla. "Facebook immune system."
     Proceedings of the 4th workshop on
     social network systems. 2011.