Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015)
     July 27th, 2015 - Buenos Aires, Argentina


A Constrained Multi-view Clustering Approach to Influence Role Detection
Chengyao Chen¹, Dehong Gao², Wenjie Li¹, Yuexian Hou³
¹Department of Computing, The Hong Kong Polytechnic University, Hong Kong
²1688cn, Alibaba.INC (Hangzhou), China
³School of Computer Science and Technology, Tianjin University, China
cscchen@comp.polyu.edu.hk, dehong.gdh@alibaba-inc.com, cswjli@comp.polyu.edu.hk, yxhou@tju.edu.cn

Abstract

Twitter has provided people with an effective way to communicate and interact with each other. It is undisputed that people's influence plays an important role in disseminating information over the Twitter social network. A substantial body of research on finding influential users has been reported in the literature. However, it has never really sought to distinguish and analyze different influence roles, which are of great value for various marketing purposes. In this paper, we go a step further and detect five recognized influence roles of Twitter users with regard to a particular topic. We explore three views of features, related to topic, sentiment and popularity respectively. On that basis, we propose a novel constrained multi-view influence role clustering approach that groups potentially influential Twitter users into five categories. Experimental results demonstrate the effectiveness of the proposed approach.

1 Introduction
Nowadays, Twitter has become one of the most popular social media platforms for people to share information and communicate with each other. It creates more and more business opportunities through a variety of online marketing activities [Anagnostopoulos et al., 2008]. In recent years, an increasing number of enterprises have started to attach importance to locating favorable influential users. These enterprises leverage such users' opinions to attract potential customers or improve sales. Understanding social influence over large-scale networks is therefore crucial to business marketing management.

All influential users exert influence on others. Nevertheless, [Brown and Hayes, 2008] has verified that the ways people influence others vary and produce different effects:
- Some users always strongly praise a product and persuade others to buy it.
- Some change others' opinions on a product with professional analysis.
- Some promptly inform others of the latest news about a product.
- Some promote the product through their own popularity.

Clearly, different influential users play different influence roles. Meanwhile, a company may have different objectives at different promotion stages and therefore needs users with different influence roles to match them [Brown and Hayes, 2008]. For example, a company that aims to improve brand awareness may want to choose users with high popularity. In contrast, a company whose product quality is questioned by customers may do better to invite domain experts who have the professional knowledge to explain and convince. Selecting influential users whose influence roles suit a specific marketing objective is more effective than merely seeking the most influential users in general.

Despite the importance of influence roles, previous work mostly focuses on two directions. The first is measuring the general influence power of a user on others from the network structure [Cha et al., 2010; Weng et al., 2010]. The second is maximizing influence propagation, which helps companies find the proper set of people to promote products [Kempe et al., 2003; Chen et al., 2009]. Without exception, these studies treat all influence as being of the same type. Because they ignore how different influence roles affect different marketing objectives, they will inevitably hinder companies from devising more suitable marketing strategies. This motivates us to further analyze and detect the different influence roles of users, which could extend previous work toward achieving different marketing goals. [Chen et al., 2014] proposed the idea of distinguishing different types of influential users, but it did not offer a complete study of how to detect them.

Table 1. Five categories of influence role.
Role Category             Influence Manner                        Marketing Effect
Enthusiast                Support and defend products             Improve sales
Information Disseminator  Publish product information             Enhance brand memorability
Expert                    Gather facts and professional opinions  Improve reputation
Celebrity                 Popular among people                    Improve awareness
Others                    Show no obvious influence               None

Copyright © 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.


To better characterize influence roles, we define five distinct categories with reference to the definitions in the WOMMA influencer guidebook (www.womma.org/influencers): enthusiast, information disseminator, expert, celebrity and others. Table 1 summarizes them briefly. As the table shows, one's influence role is largely determined by his/her behaviors and personal characteristics, and not solely by how much influence he/she has. Previous work measures users' influence mainly from social connections. Instead, we identify three aspects that help distinguish influence roles:
- the interest in a topic (e.g., enthusiasts, information disseminators and experts pay more attention to the topic than the other two roles);
- the attitude towards the topic (e.g., an enthusiast always praises, while an expert sometimes praises and sometimes does not);
- the popularity over the social network (e.g., a celebrity has more followers).

Accordingly, we extract three views of features from users' posts and profiles for influence role detection, i.e., the topic view, the sentiment view and the popularity view.

We also note that each view can only partially reflect the influence role from its own perspective. Because the views complement each other, however, together they provide more complete information about influence roles. Based on the three-view user representations, we propose a novel Constrained Multi-view Influence Role Clustering (CMIRC) approach. It is built on an optimization framework and partitions influential users into five recognized categories. Unlike existing multi-view clustering approaches, CMIRC allows the cluster numbers of different views to differ, which gives more flexibility for integrating data from multiple views. It connects the local clustering information of each individual view with the global multi-view clustering results through a local-global mapping mechanism.

Another advantage of CMIRC is that it can incorporate prior knowledge within a semi-supervised learning framework. In practice, a company commonly already knows, or can easily identify, the influence roles of a small number of users. This information can then serve as prior knowledge for finding many other users that meet the company's needs. To let prior knowledge guide the clustering, we apply two kinds of group-level constraints, same-cluster constraints and different-cluster constraints. They define which groups of users must, or must not, be in the same cluster. The experimental results demonstrate the effectiveness of CMIRC compared with other single-view and multi-view clustering approaches.

2 Influence Role Detection

2.1 Three-View User Representation

Topic-view Representation
The topic view is motivated by the intuition that different roles pay attention to a topic to different degrees and with different focuses. First, a word such as "iPhone" is selected as the topical word. We then use mutual information to extract, as keywords, the K words most relevant to the topical word that co-occur with it within a window of size two. Together these keywords form the topic profile, which gives a more complete picture of the topic than the topical word alone. For each user, a tf-idf weighted topical vector built from all of his/her tweets captures his/her word distribution over the extracted keywords.

Sentiment-view Representation
The sentiment view reveals the attitudes a user prefers when expressing his/her opinions. It helps differentiate among three roles:
- the enthusiast, who often posts tweets with positive sentiment;
- the disseminator, whose tweets are mainly neutral;
- the expert, whose opinions may be either positive or negative.

To measure users' sentiment, we use the AFINN lexicon (http://www2.compute.dtu.dk/~faan/data/). AFINN assigns each word an integer value between -5 and +5 that denotes its sentiment polarity and strength. The positive (negative) sentiment score of a tweet aggregates the sentiment strengths of all the positive (negative) words it contains. A user's sentiment-view representation is then the average positive-sentiment score and the average negative-sentiment score over all his/her tweets.

Popularity-view Representation
Apart from a user's interest in and attitude towards a topic, his/her popularity (or authority) can also indicate the influence role to some extent. Three features are selected:
- the number of followers;
- the number of followees;
- a binary value indicating whether the account is verified.

The popularity view tends to distinguish users with different levels of popularity, such as celebrities and enthusiasts.

2.2 Constrained Multi-view Influence Role Clustering
To make better use of data collected from multiple sources, multi-view clustering approaches partition data into clusters by integrating features from multiple views. They have been successfully applied to image recognition, text mining and other tasks [Bickel et al., 2004; Cai et al., 2013; Liu et al., 2013]. These approaches share a common assumption: the features of each single view are sufficient for clustering, yet exploiting the rich information across views can improve clustering performance. They therefore usually assume that every view has the same cluster number as the final multi-view clustering. From the previous analysis, however, we believe that for influence role detection it is more reasonable and practical to let the cluster numbers of different views differ. As a result, the clustering results in each view will also differ from the final clustering results. To this end, we develop the Constrained Multi-view Influence Role Clustering (CMIRC) approach. CMIRC groups data into a different number of clusters in each individual view (i.e., local clusters). It uses a mapping matrix to bridge the gap between these single-view clusters and the multi-view clusters (i.e., global clusters). The introduction of the mapping matrix is one of the main contributions of this work.

Another advantage of this approach is its semi-supervised framework, which allows prior knowledge to be incorporated easily. For example, a small number of users whose influence roles are manually labeled can serve as prior knowledge to guide the clustering of the others. To incorporate this prior knowledge into CMIRC, we employ two kinds of group-level constraints [Law et al., 2004], which define which groups of users must, or must not, be in the same cluster:
- The same-cluster ($SC$) constraints comprise several groups of users. The users in each group must belong to the same cluster, whether local or global.
- The different-cluster ($DC$) constraints comprise several group pairs. The users in the two groups of a pair cannot be in the same cluster.

To describe our approach, let us start with a variant of the K-means clustering algorithm that utilizes data from multiple sources [Cai et al., 2013]. Let $U = \{u_1, u_2, \dots, u_n\}$ represent $n$ Twitter users. Each user $u_i$ is represented by $m$ views of features, $X_i = \{X_i^1, X_i^2, \dots, X_i^m\}$. The $j$-th element $X_i^j$ holds the features of view $j$ and is a row vector with $d_j$ elements. A typical multi-view clustering task can then be formulated as the following optimization problem:

$$\min_{P,C}\ \sum_{j=1}^{m}\sum_{i=1}^{n} \alpha_j \left\| X_i^j - P_i^j C^{jT} \right\|_2
\quad \text{s.t.}\ \sum_{k=1}^{K_j} P_{ik}^j = 1,\ P_{ik}^j \in \{0,1\},\ \forall i = 1,2,\dots,n,\quad \sum_{j=1}^{m} \alpha_j = 1$$

As in K-means, $P_i^j \in \mathbb{R}^{1\times K_j}$ is the cluster indicator of user $u_i$ in view $j$, and it also represents the local clustering results. $K_j$ denotes the cluster number in view $j$, and $C^j \in \mathbb{R}^{d_j\times K_j}$ holds that view's cluster centers. $\alpha_j$ is a factor that balances the weight of view $j$. Suppose the cluster numbers in all $m$ views are the same, i.e., $K_j = t$, where $t$ is the global cluster number. Then $P_i^j$ should be consistent across all views, which means the local clustering results in every view equal the global clustering results.

Under our assumption, however, the cluster number differs from view to view, so the global clustering results cannot be derived directly from $P_i^j$. To connect local and global clustering, we decompose the local clustering result $P_i^j$ of view $j$ into the product of a global cluster assignment $G_i \in \mathbb{R}^{1\times t}$ and a mapping matrix $M^j \in \mathbb{R}^{t\times K_j}$, i.e., $P_i^j = G_i M^j$.

Figure 1. Illustration of global and local cluster mapping (global cluster $G_i \in \mathbb{R}^{1\times t}$ times mapping matrix $M^j \in \mathbb{R}^{t\times K_j}$ gives local cluster $P_i^j \in \mathbb{R}^{1\times K_j}$).

Figure 1 shows how $u_i$'s local cluster in view $j$ corresponds to his/her global cluster. Suppose $G_i$ places $u_i$ in the first global cluster, and the mapping matrix maps the first global cluster to the second local cluster. Then $u_i$ is found in the second local cluster.

Besides the mapping matrix, CMIRC also integrates two types of constraints. Both are defined over user groups $ug_i = \{u_1, u_2, \dots, u_{n(ug_i)}\}$, where $n(ug_i)$ is the number of users in group $ug_i$:
- Same-Cluster constraints are a set of user groups, $SC = \{ug_1, ug_2, \dots, ug_l\}$. The users in each $SC$ group must be assigned to the same cluster.
- Different-Cluster constraints are a set of user group pairs, $DC = \{p_1, p_2, \dots, p_r\}$ with $p_k = \langle ug_i, ug_j \rangle$. The users in the two groups of a pair in $DC$ must belong to different clusters.

All users in $SC$ and $DC$ together form $U_{con}$. Unlike pair-wise constraints, group-level constraints let us assign a cluster to a whole group at once rather than to users one by one. This reduces the computational complexity of the optimization procedure introduced below.

Finally, CMIRC partitions the users $U$ into $t$ clusters using $m$-view features under the $SC$ and $DC$ constraints. It is formulated as the following optimization problem:

$$\min_{G,M,C}\ \sum_{j=1}^{m}\sum_{i=1}^{n} \alpha_j \left\| X_i^j - G_i M^j C^{jT} \right\|_2 \qquad (1)$$
$$\text{s.t.}\ \sum_{k=1}^{t} G_{ik} = 1,\ G_{ik} \in \{0,1\},\quad \sum_{j=1}^{m} \alpha_j = 1,$$
$$\sum_{k=1}^{K_j} M_{ik}^j = 1,\ \forall i = 1,\dots,t,\quad \sum_{i=1}^{t} M_{ik}^j \ge 1,\ \forall k = 1,\dots,K_j,\quad M_{ik}^j \in \{0,1\},$$
$$G_a = G_b,\ \forall ug_k \in SC,\ \forall u_a, u_b \in ug_k,\ u_a \ne u_b,$$
$$G_a \ne G_b,\ \forall \langle ug_q, ug_p \rangle \in DC,\ \forall u_a \in ug_q,\ \forall u_b \in ug_p$$

Here $G_i$ is the global cluster assignment of user $u_i$ and follows the 1-of-K coding scheme. $C^j$ contains the local cluster centers of the $j$-th view, and $M^j$ is the mapping matrix. $M^j$ must satisfy two conditions: every local cluster is mapped to by at least one global cluster, and every global cluster is mapped to one and only one local cluster.

To solve this optimization problem, we rewrite the objective function of Equation (1) as Equation (2) and apply the iterative updating process below.

$$O = \min_{G,M,C}\ \sum_{j=1}^{m} \alpha_j H^j \qquad (2)$$

Here $H^j = \mathrm{Tr}\{(X^{jT} - C^j M^{jT} G^T)\, D^j\, (X^{jT} - C^j M^{jT} G^T)^T\}$. $X^j \in \mathbb{R}^{n\times d_j}$ stacks the rows $X_i^j$, and $G \in \mathbb{R}^{n\times t}$ stacks the rows $G_i$. $D^j$ is the diagonal (degree) matrix derived from $E^j$:

$$d_{ii}^j = \frac{1}{2\left\| e_i^j \right\|_2},\ \forall i = 1,2,\dots,n, \quad \text{and} \quad E^j = X^j - G M^j C^{jT} \qquad (3)$$

where $e_i^j$ denotes the $i$-th row of $E^j$.

Fix $G$, $M^j$, $D^j$ and update local cluster centers $C^j$
As stated before, $G_i$ and $M^j$ together represent the local clustering results. In this step, the local cluster centers are
updated by minimizing the distances from users to their                     are constrained by Same-Cluster as the influence roles of
corresponding clusters. It is solved by differentiating the                 these clusters.
objective function in Equation (2) for each view with respect
to 𝐶 𝑗 . The optimal solution of 𝐶 𝑗 is obtained by setting the             3   Experiments and Discussion
derivation to zero, which gives us                                          Four topics about well-known electronic products, “iPhone”,
                   𝑇                   𝑇
        𝐶 𝑗 = 𝛼𝑗 𝑋𝑗 𝐷 𝑗 𝐺 𝑗 𝑀 𝑗 (𝛼𝑗 𝑀 𝑗 𝐺 𝑇 𝐷 𝑗 𝐺𝑀 𝑗 )−1  (4)               “Samsung Galaxy”, “Xbox” and “PlayStation” are selected
 Fix 𝑀 𝑗 , 𝐶 𝑗 , 𝐷 𝑗 and update global cluster assignment 𝐺                to construct the experimental datasets. We collect the tweets
   We update 𝐺 through each row of its, 𝐺𝑖 in the following                 that contain the topical word like “iPhone” from 3rd to 30th
order. First we update 𝐺𝑖 for the users who are not                         April 2014. Among users who post these tweets, the ones
                                                                            who have more than 500 followers and have been re-tweeted
constrained separately, and then update 𝐺𝑖 for the users who
                                                                            at least once are regarded as influential users. The size of an
are in 𝑈𝑐𝑜𝑛 together. In particular, if the user 𝑢𝑖 is not
                                                                            influential user pool ranges from 4912 (for Samsung Galaxy)
constrained, we locate the local cluster for each user through
                                                                            to 90906 (for iPhone). To be consistent, 4912 influential users
the mapping matrix. Then, what we need to do is to find                     are sampled for each topic. These users together with their
𝐺𝑖 from its limited solutions that minimize the sum of                      tweets and account information are used in the experiments.
distances between it and the center of its assigned local                      Due to the lack of annotated datasets, for each topic we
cluster for each view, as presented in the Equation (5).                    randomly select 200 from 4912 influential users and invite
                       𝑚
                                   𝑗         𝑇                              human annotators to label their influence roles for evaluation
      𝐺𝑖 = argmin ∑ 𝛼𝑗 ‖𝑋𝑖 − 𝐺𝑖 𝑀 𝑗 𝐶 𝑗 ‖                    (5)            purpose by providing users’ posts and their account
                 𝐺𝑖                              2
                       𝑗=1                                                  information. The numbers of the annotated users across five
   Constrained by 𝑆𝐶 and 𝐷𝐶 , we give each user group in                    influence roles are presented in Table 2. We randomly choose
𝑈𝑐𝑜𝑛 a global cluster assignment, i.e., 𝐺𝑐𝑜𝑛(𝑢𝑔𝑖 ) , a row vector           1/5 users of each influence role to build the constraints
that represents the assignment for users in user group 𝑢𝑔𝑖 in               required by CMIRC, and the rest are used for evaluation.
𝑆𝐶. By concatenating the assignment vectors for each user                                Table 2. Evaluation data on four topics
group, we form a certain number of candidate assignment                            Role              Information
                                                                                         Enthusiast                 Expert Celebrity Others
matrixes that guarantee the 𝐷𝐶 constraints in column. From                   Topic                   Disseminator
all these candidates, the one that minimizes the objective                     iPhone        9            31          13        20    127
                                                                               Galaxy       21            32          15        19    113
function in Equation (6) is defined as 𝐺𝑐𝑜𝑛 ,
                                                                                Xbox        20            25          14        15    126
                               𝑛(𝑢𝑔 )   𝑗   𝑇                      𝑇
𝐺𝑐𝑜𝑛 = argmin ∑𝑚    𝑙                             𝑗 𝑗
               𝑗=1 ∑𝑖=1 ∑𝑘=1 ‖𝑋𝑢𝑔𝑖𝑘 − 𝐺𝑐𝑜𝑛(𝑢𝑔𝑖 ) 𝑀 𝐶 ‖
                            𝑖                                                PlayStation    13            29          15        14    129
          𝐺𝑐𝑜𝑛                                                          2
                                                     (6)                       We compare CMIRC with (1) Baseline K-means
         𝑗
where 𝑋𝑢𝑔𝑖𝑘 are the j-view features of user 𝑢𝑘 who is in                    clustering (BKC) and Constrained K-means clustering (CKC)
group 𝑢𝑔𝑖 . Then the global cluster assignment 𝐺𝑘 for user                  that concatenates three views together; (2) two existing multi-
𝑢𝑘 in user group 𝑢𝑔𝑖 is regarded as 𝐺𝑐𝑜𝑛(𝑢𝑔𝑖) .                             view clustering approaches, i.e., Multi-view K-means
                                                                            Clustering (MKC) [Cai et al., 2013] and Negative Matrix
 Fix 𝐺, 𝐶 𝑗 , 𝐷 𝑗 a, and update global and local cluster                   Factorization (NMF) based Multi-view Clustering (NMFMC)
      mapping matrix 𝑀 𝑗                                                    [Liu et al., 2013]. To further understand the contribution of
     𝑗
   𝑀 is the mapping matrix between global and local clusters.               each view, we also compare with (3) Constrained Single-
For each view, based on the constraints for 𝑀 𝑗 , we construct              View K-means Clustering (CSCtopic, CSCsentiment and CSCaccount)
candidate mapping matrices and the possible local cluster assignments obtained by transforming the global cluster assignment $G$. The candidate that assigns users to the best local clusters, i.e., the one that minimizes the overall distance over all users, is selected as the updated mapping matrix:

$$M^{j} = \arg\min_{M^{j}} \sum_{i=1}^{n} \left\| X_i^{j} - G_i M^{j} \left(C^{j}\right)^{T} \right\|_2 \qquad (7)$$

 Fix $G$, $C^{j}$, $M^{j}$ and update $D^{j}$
  $D^{j}$ is introduced to aid in solving the optimization problem in Equation (4), and it can be calculated directly from $G$, $C^{j}$ and $M^{j}$ according to Equation (3).
  Of the four steps in a CMIRC iteration, three are convex problems in a single variable, and it can be proved that each of them is guaranteed to converge to an optimal solution. Once the global clusters are ready, we select the labels of the users who

and (4) Constrained Two-View K-means Clustering (CMIRCts, CMIRCsa and CMIRCta). In addition, (5) CMIRC without constraints (MIRC) is also compared. Three commonly used metrics are used to evaluate performance: macro-average precision (MP), macro-average recall (MR), and macro-average F-measure (MF).
  For CMIRC, we vary the cluster number of each view from 2 to 5 and keep the setting with the best F-measure. For the topics “iPhone” and “PlayStation”, (3, 2, 3) for the topic view, sentiment view and popularity view is the best setting, while for the topics “Samsung Galaxy” and “Xbox”, (3, 5, 5) is the best one. The per-view cluster numbers of the two-view CMIRC variants and of MIRC are set to the same values as for CMIRC. For BKC, CKC and CSC, the cluster number is set to the global cluster number, 5.
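Equation (7) selects $M^{j}$ by comparing candidate mapping matrices. The following is a minimal Python sketch of that update for one view, not the authors' implementation. It assumes (our assumption, not stated in this section) that $M^{j}$ is a 0/1 matrix with exactly one 1 per row, so that every global cluster is mapped to one local cluster of view $j$, and that the rows of $(C^{j})^{T}$ are the local centroids. The function names `update_mapping` and `mapping_matrix` are hypothetical.

```python
import itertools
import math


def update_mapping(X, g, centroids):
    """Sketch of the M^j update of Equation (7) for a single view j.

    X         -- n feature vectors X_i^j of view j (lists of floats)
    g         -- n global cluster ids: g[i] = k means G_i has its 1 in column k
    centroids -- K_j local centroids of view j (the rows of (C^j)^T)

    Assumption (not stated in the paper): M^j is a 0/1 matrix with exactly
    one 1 per row, encoded as a tuple m where m[k] is the local cluster
    that global cluster k is mapped to.
    """
    num_global = max(g) + 1
    num_local = len(centroids)
    best_m, best_cost = None, math.inf
    # Enumerate all K_j ** K candidate mappings (at most 5 ** 5 = 3125 here).
    for m in itertools.product(range(num_local), repeat=num_global):
        # G_i M^j (C^j)^T picks the centroid of local cluster m[g_i].
        cost = sum(math.dist(x, centroids[m[k]]) for x, k in zip(X, g))
        if cost < best_cost:  # keep the first candidate with the smallest cost
            best_m, best_cost = m, cost
    return best_m, best_cost


def mapping_matrix(m, num_local):
    """Expand the tuple encoding into the explicit K x K_j 0/1 matrix M^j."""
    return [[1 if col == m_k else 0 for col in range(num_local)] for m_k in m]


# Toy example: two global clusters, local centroids listed in "swapped" order.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
g = [0, 0, 1, 1]
centroids = [[10.0, 10.5], [0.0, 0.5]]
m, cost = update_mapping(X, g, centroids)
print(m, cost, mapping_matrix(m, len(centroids)))  # (1, 0) 2.0 [[0, 1], [1, 0]]
```

With the cluster numbers used in the experiments (five global clusters, at most five local clusters per view), the loop visits at most 3125 candidates. Under the one-1-per-row assumption, each user's distance depends only on the local cluster chosen for its own global cluster, so the same minimizer could also be found one global cluster at a time; the brute-force loop simply mirrors the candidate-enumeration description.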




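                                       Table 3: Performance evaluation
------------------------------------------------------------------------------------------------------------------------------
                                     iPhone                 Galaxy                 Xbox                   PlayStation
Group                   Approach     MP     MR     MF       MP     MR     MF       MP     MR     MF       MP     MR     MF
------------------------------------------------------------------------------------------------------------------------------
Combined view           BKC          0.2551 0.3526 0.1551   0.2725 0.2575 0.1443   0.2063 0.2340 0.0901   0.2964 0.3095 0.1467
                        CKC          0.3366 0.3518 0.2649   0.3529 0.3314 0.1526   0.2419 0.2152 0.1585   0.2803 0.3407 0.2026
------------------------------------------------------------------------------------------------------------------------------
Multi-view              NMFMC        0.3627 0.4371 0.3465   0.3497 0.3568 0.2154   0.2874 0.2839 0.2770   0.3892 0.3170 0.2812
                        MKC          0.3404 0.2983 0.1979   0.4132 0.3253 0.3155   0.3035 0.2960 0.2436   0.3333 0.3393 0.2328
------------------------------------------------------------------------------------------------------------------------------
                        CMIRC        0.4670 0.5056 0.4020   0.4914 0.3417 0.3616   0.4338 0.3337 0.3207   0.4031 0.3676 0.3531
                        MIRC         0.4200 0.3731 0.3730   0.4012 0.3126 0.3298   0.3352 0.3074 0.2914   0.3752 0.3477 0.3065
------------------------------------------------------------------------------------------------------------------------------
Constrained Single-view CSCtopic     0.2667 0.4552 0.2546   0.2367 0.3585 0.1542   0.1957 0.1898 0.1530   0.2745 0.2176 0.1809
                        CSCsentiment 0.2357 0.2527 0.1341   0.2256 0.1087 0.1417   0.2141 0.2036 0.1436   0.2044 0.2064 0.1007
                        CSCaccount   0.2628 0.3525 0.2270   0.3108 0.3179 0.2868   0.3236 0.1160 0.1376   0.2520 0.2088 0.1133
------------------------------------------------------------------------------------------------------------------------------
Constrained Two-view    CMIRCts      0.2812 0.3240 0.1746   0.2977 0.2065 0.1879   0.4183 0.2761 0.2781   0.2971 0.2156 0.1903
                        CMIRCsa      0.2988 0.4559 0.3050   0.4386 0.3435 0.2917   0.2850 0.2630 0.2439   0.2466 0.2231 0.1993
                        CMIRCta      0.3555 0.4230 0.3220   0.4066 0.3330 0.2987   0.3908 0.2952 0.2879   0.3419 0.2578 0.2474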
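MP, MR and MF in Table 3 are macro-averaged over influence roles. In several rows MF is lower than both MP and MR (e.g., BKC on “iPhone”: 0.2551, 0.3526 and 0.1551), which a harmonic mean of MP and MR could not produce; this suggests that MF is the mean of per-role F-measures. The Python sketch below computes the three metrics that way and, for the baselines without constraints (BKC, MKC and NMFMC), picks the one-to-one cluster-to-role assignment that maximizes MF by brute force over role permutations. It is one straightforward reading of the evaluation protocol, not the authors' code; the function names `macro_prf` and `best_role_mapping` are hypothetical.

```python
import itertools


def macro_prf(y_true, y_pred, roles):
    """Macro-averaged precision, recall and F-measure (MP, MR, MF).

    P, R and F are computed per role and then averaged, so MF is the mean of
    per-role F-measures (an assumption inferred from the values in Table 3).
    """
    precisions, recalls, fmeasures = [], [], []
    for r in roles:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == r and p == r)
        n_pred = sum(1 for p in y_pred if p == r)
        n_true = sum(1 for t in y_true if t == r)
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true if n_true else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fmeasures.append(f)
    k = len(roles)
    return sum(precisions) / k, sum(recalls) / k, sum(fmeasures) / k


def best_role_mapping(clusters, y_true, roles):
    """Map cluster ids one-to-one onto roles so that MF is maximised.

    Brute force over role permutations (5! = 120 for five clusters);
    assumes there are no more clusters than roles.
    """
    cluster_ids = sorted(set(clusters))
    best, best_mf = None, -1.0
    for perm in itertools.permutations(roles, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        y_pred = [mapping[c] for c in clusters]
        mf = macro_prf(y_true, y_pred, roles)[2]
        if mf > best_mf:
            best, best_mf = mapping, mf
    return best, best_mf


# Toy usage with hypothetical labels: cluster 1 holds the "a" users.
mapping, mf = best_role_mapping([1, 1, 0, 0], ["a", "a", "b", "b"], ["a", "b"])
print(mapping, mf)  # {0: 'b', 1: 'a'} 1.0
```

Because the mapping is chosen with the ground-truth labels, the resulting MF is an optimistic, best-case score for the baselines without constraints.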
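The balancing weights $\alpha_j$ of Equation (8) have a closed form: since $\alpha_j dis_j$ takes the same value $c$ for every view and the weights sum to one, $\alpha_j = (1/dis_j) / \sum_{l} (1/dis_l)$. The Python sketch below computes $dis_j$ as a sum of pairwise $\ell_2$ distances and then the normalized weights. The input format (one list of per-user feature vectors per view, users in the same order) and the function names are assumptions for illustration. Using the average pairwise distance instead of the sum gives the same $\alpha_j$, because every view has the same $n$ and the factor $n(n-1)/2$ cancels in the normalization.

```python
import itertools
import math


def pairwise_distance_sum(X):
    """dis_j of Equation (8): sum of ||X_i^j - X_k^j||_2 over all pairs i < k."""
    return sum(math.dist(a, b) for a, b in itertools.combinations(X, 2))


def view_weights(views):
    """Weights alpha_j with alpha_1 dis_1 = ... = alpha_V dis_V and sum 1.

    Assumption: `views` holds one list of per-user feature vectors per view,
    with users in the same order in every view.
    """
    inverse = [1.0 / pairwise_distance_sum(X) for X in views]
    total = sum(inverse)
    # alpha_j = c / dis_j with the common constant c = 1 / sum_l (1 / dis_l)
    return [v / total for v in inverse]


# Toy example: view B is twice as spread out as view A, so it gets half the weight.
view_a = [[0.0], [1.0], [2.0]]
view_b = [[0.0], [2.0], [4.0]]
print(view_weights([view_a, view_b]))  # about 2/3 and 1/3
```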

   The other parameter, $\alpha_j$, is set so that all single views contribute in a balanced way to the final clustering results. We compute $\alpha_j$ from the average $\ell_2$-norm distance $dis_j$ between a user and all other users in view $j$, and make $\alpha_j$ inversely proportional to $dis_j$. That is,

$$dis_j = \sum_{i=1}^{n} \sum_{k=i+1}^{n} \left\| X_i^{j} - X_k^{j} \right\|_2$$
  and
$$\alpha_1 dis_1 = \alpha_2 dis_2 = \alpha_3 dis_3, \quad \text{s.t.} \ \sum_{j=1}^{3} \alpha_j = 1 \qquad (8)$$

   This gives (0.177, 0.621, 0.202), (0.086, 0.588, 0.326), (0.093, 0.604, 0.303) and (0.180, 0.618, 0.202) for the topics “iPhone”, “Samsung Galaxy”, “Xbox” and “PlayStation”, respectively. The parameters $\alpha$ of the two-view CMIRC variants are set analogously. The parameters that balance the relative weights of the different views in MKC and NMFMC are also tuned for their best performance. The constraints are used in the same way in all the constrained approaches: we assign the labels of the constrained users as the roles of the corresponding clusters. For BKC, MKC and NMFMC, we choose the cluster-to-role assignment that maximizes MF. We repeat the experiments for all the approaches 10 times with random initialization and report the average performance in Table 3. The proposed CMIRC achieves the best MP and MF on all four topics and the best MR on three of them; only on “Galaxy” do CSCtopic, NMFMC and CMIRCsa obtain a slightly higher MR.
   We also note that all multi-view clustering approaches outperform the baseline BKC in MF, and CMIRC beats CKC. This demonstrates the power of multi-view clustering and verifies that representing data in different views works for influence role detection. Furthermore, among the multi-view clustering approaches, CMIRC and even MIRC without constraints obtain higher MF than NMFMC and MKC. This supports our assumption that each view represents only partial information and that, by employing these insufficient views together, we can infer better global clustering results. Meanwhile, CMIRC performs better than MIRC, which lacks prior knowledge; this shows that building appropriate constraints to model a company's demands for different influence roles is important.
   Moreover, by comparing the constrained clustering approaches that use one, two and three views, we observe that performance improves as more views are involved. This shows that the topic, sentiment and popularity views are all necessary to identify influence roles. Among the three single-view constrained K-means clustering approaches, it is difficult to tell which view is better. However, when comparing the three two-view constrained clustering approaches, we find that the combination of the topic view and the popularity view performs the best in MF, followed, on three of the four topics, by the combination of the sentiment view and the popularity view. This makes the importance of user popularity in identifying influence roles clear. Meanwhile, the topic view and the sentiment view remain important and necessary to supplement the popularity view.
   To give a more intuitive picture of what the users in each influence role look like, Tables 4 to 6 present the cluster centers, which illustrate the characteristics of each role in each view. We present the five most representative (popular) words used by the users in each role, and the ratio of the average positive score and the positive score of all the users belonging to the same role; the ratio is larger when people are in general more positive. We also give the average numbers of followers and followees, and the proportion of verified accounts, for reference. The general observations from the topic view analysis are: (1) enthusiasts and celebrities tend to share their own experiences and assessments with words like “buy” and “love”; (2) experts, who care more about specific aspects, tend to mention detailed words such as “charger” and “battery”; (3) general words like “news” and “mobile” are often used by information disseminators, who pass the latest news to people. From the sentiment view analysis, we observe a clear trend that enthusiasts in general express more positive sentiment, while information disseminators hold more neutral sentiment. We can also see that the popularity of celebrities is very high, and that popularity alone is enough to pick out celebrities easily.

4 Conclusion
In this work, we address the issue of influence role detection.




We propose a Constrained Multi-view Influence Role Clustering (CMIRC) approach to partition Twitter users into five clusters with three views of features (i.e., the topic view, sentiment view and popularity view). In CMIRC, different cluster numbers are allowed for different views, and constraints are used to model the prior information. The results indicate the effectiveness of our proposed approach. In the future, we will continue to explore more features to capture users' actual marketing effects on their followers.
                                  Table 4: Role characteristics on topic view
Topic         Enthusiast                            Information Disseminator                 Expert
iPhone        love, real, gaming, screen, battery   news, apple, charger, battery, selling   news, apple, charger, battery, selling
Galaxy        win, chance, space, buy, s5 chanlle   fingerprint, android, 5s, tech, launch   fingerprint, android, 5s, tech, launch
Xbox          play, game, enter, buy, lol           360, ps4, Microsoft, tv, coming          white, china, flaw, security, sales
PlayStation   Game, play, win, lol, awesome         Xbox, sony, coming, update, release      sales, code, confirm, communiyy, console
                      Table 5: Role characteristics on sentiment view
Topic         Sentiment    Enthusiast   Information Disseminator   Celebrity   Expert
iPhone        Positive     1.0702       3.34E-05                   1.0702      0.0816
              Negative     0.0772       3.74E-05                   0.0772      0.1680
Galaxy        Positive     2.0          2.33E-08                   0.7978      0.9997
              Negative     3.25E-09     1.58E-08                   0.112       0.0003
Xbox          Positive     1.1178       3.78E-07                   0.8545      0.0871
              Negative     0.083        2.45E-07                   0.0898      1.110
PlayStation   Positive     3.0          1.1426                     0.0002      1.1426
              Negative     7.38E-10     0.0852                     0.0004      0.0852
                      Table 6: Role characteristics on popularity view
Topic         Popularity   Enthusiast   Information Disseminator   Celebrity   Expert
iPhone        Follower     1800         1800                       129968      1800
              Followee     857          857                        1866        857
              isVerified   0            0                          0.9999      0
Galaxy        Follower     3157         2893                       63537       2994
              Followee     997          1032                       1538        995
              isVerified   0            0                          0.9999      0
Xbox          Follower     3326         4066                       185459      2666
              Followee     905          1077                       1448        968
              isVerified   0            1.05E-06                   1           0
PlayStation   Follower     1680         1680                       169180      1680
              Followee     365          365                        738         365
              isVerified   2.24E-07     2.24E-07                   1           2.24E-07

Acknowledgments
The work described in this paper was supported by grants from the Research Grants Council of Hong Kong (PolyU 5202/12E and PolyU 152094/14E) and a grant from the National Natural Science Foundation of China (61272291).

References
[Anagnostopoulos et al., 2008] Aris Anagnostopoulos, Ravi Kumar, and Mohammad Mahdian. Influence and correlation in social networks. In SIGKDD 2008, pages 7-15. ACM, 2008.
[Bickel et al., 2004] Steffen Bickel and Tobias Scheffer. Multi-view clustering. In ICDM 2004, pages 19-26. IEEE, 2004.
[Brown and Hayes, 2008] Duncan Brown and Nick Hayes. Influencer Marketing: Who Really Influences Your Customers? Butterworth-Heinemann, Oxford, 2008.
[Cai et al., 2013] Xiao Cai, Feiping Nie, and Heng Huang. Multi-view K-means clustering on big data. In IJCAI 2013, pages 2598-2604. AAAI Press, 2013.
[Cha et al., 2010] Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, and Krishna P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In ICWSM 2010, pages 10-17. AAAI Press, 2010.
[Chen et al., 2009] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In SIGKDD 2009, pages 199-208. ACM, 2009.
[Chen et al., 2014] Chengyao Chen, Dehong Gao, Wenjie Li, and Yuexian Hou. Inferring topic-dependent influence roles of Twitter users. In SIGIR 2014, pages 1203-1206. ACM, 2014.
[Kempe et al., 2003] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In SIGKDD 2003, pages 137-146. ACM, 2003.
[Law et al., 2004] Martin H. C. Law, Alexander Topchy, and Anil K. Jain. Clustering with soft and group constraints. In Structural, Syntactic, and Statistical Pattern Recognition, pages 662-670, 2004.
[Liu et al., 2013] Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In SIAM 2013, pages 252-260, 2013.
[Weng et al., 2010] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. TwitterRank: Finding topic-sensitive influential twitterers. In WSDM 2010, pages 261-270. ACM, 2010.

