<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Social Influence Analysis based on Facial Emotions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pankaj Mishra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafik Hadfi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takayuki Ito</string-name>
          <email>ito.takayuki@nitech.ac.jp</email>
          <email>rafikg@itolab.nitech.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering Nagoya Institute of Technology</institution>
          ,
          <addr-line>Gokiso, Showa-ku, Nagoya, 466-8555</addr-line>
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>15</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>In this paper, we propose a method to analyse the social correlation among a group of people in small gatherings such as business meetings or group discussions. Within such networks, the correlation is built based on the tracked facial emotions of all the individuals in the network. The facial emotional feature extraction is based on the active appearance model, whereas the approach for emotion detection lies in the dynamic and probabilistic framework of deep belief networks. Combining the active appearance model with deep belief networks for emotion recognition gives a higher recognition performance than other methods. The analysis of changes in the facial emotions of all the individuals in the group helps us understand the hidden correlations among them, which are not observable with the naked eye. Finally, we evaluate the system by comparing its results with the ground truth of a scripted discussion. The results obtained by our system effectively reflect the emotion propagation in the scripted discussion.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, studying the social relationships within a group of people has
become an interesting topic. However, analysing the social relationships among the
individuals in a social network is a challenging task, wherein multiple people need to
be tracked in a real work environment. In any workgroup, people have a tendency
to come together and form virtual groups, and the actions of the influential
person are then propagated to the others in these virtual groups. The goal of this work is to analyse
the ways in which these group members induce their actions (emotion, speech, etc.) on
other members. The analysis of social behaviour propagation in a social network is the
integration of social theories with computational methods. For instance, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed
a system to analyse social interactions in real working environments by tracking
human location and body pose. Similarly, in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the analysis is based on
tracking speaking and writing patterns. Moreover, the propagation of influence in huge
social networks, like Twitter or Facebook, has been extensively studied by tracking the online
activities of the users, as discussed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. From the existing works, it
can be said that the analysis of social correlation has focused mainly on tracking pose, user
activities, etc. However, very few attempts have been made to analyse social correlations
on the basis of facial emotions.
      </p>
      <p>
        In this paper, we propose a novel method to analyse the social correlation among
a group of people, based on the facial emotions of all the individuals. Since
a person's facial emotions in a discussion can give a clearer insight into the social
relationship between any two individuals, our algorithm should discover a clearer
relationship among all the individuals. Also, in small gatherings, groupings
are formed on the basis of concepts like homophily, confounding, or influence. The homophily
and confounding concepts are based on the background information of people, such as
profession, knowledge of the topic of discussion, age, etc. However, the analysis of
influence is purely based on tracking the actions and the corresponding reactions among
the participants in the network; an action or reaction could be a pose, a dialogue, an emotion,
etc. We focus on tracking the emotions and the communication sequence of all the
participants to analyse the influential correlation among them. The emotion detection is
implemented using the deformable Active Appearance Model (AAM) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
a deep learning algorithm, the Deep Belief Network (DBN), whereas the social correlation
is built using the algorithm discussed in the later sections. Briefly, we
propose an algorithm to discover the influential correlations among all
the participants in the network, wherein we use AAM and DBN for the emotion
detection and the proposed correlation-building method for the social relationships. This work could find many
applications, such as negotiation, viral marketing, surveys, etc.
      </p>
      <p>We have two main contributions. Firstly, an emotion diffusion parameter is
introduced, which defines the strength of the emotion induction between any two nodes at
every instance of time. The second contribution is the definition of influential correlation in
terms of the weights of a weighted graph, where each weight is associated with the orientation of influence
('+' or '-'); the sign '-' or '+' represents the direction of the induced emotion. Apart
from this, the frontal video dataset of a scripted discussion, which is used in this paper
as a testing dataset, can also be adopted to build a similar baseline system. The rest of
the paper is organised as follows. Section 2 provides an overview of the methodology
of our system, along with the data pre-processing, the facial emotion detection, and the
building of the social influential correlation. Section 3 presents the experimental results that
validate our proposed algorithm. Finally, we conclude and discuss future work in
section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        The overall architecture of our system is divided into two main modules: the facial
emotion detection (facial feature extraction and facial emotion recognition) and the
social influence detection. As mentioned before, the algorithm for building the social
correlation is based on the facial emotions of the individuals. So, the system is fed with video
data of a business meeting or a group discussion, and it in turn outputs the hidden
correlations among the individuals. Also, in order to validate the algorithm, the change in one's
emotion with respect to the other participants should be reflected. The final social influential
correlation among the individuals is represented in the form of a weighted graph, wherein
the weights denote the strength of the influence and the sign denotes the orientation of the
influence. Firstly, the frontal video data of the discussion needs to be pre-processed
before examining the correlations. The pre-processing includes splitting the video into frames and
locating and cropping the frontal face image in all the frames. Tracking the frontal face
was realised using the Viola-Jones algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These processed frontal frames for
all the participants, along with the communication sequence of all the participants in
each frame, are recorded. The communication sequence tracks whether an
individual is a listener or a speaker in a particular frame, based on the metadata of
the discussion. In the following subsections, we discuss the two main modules of our
proposed system: the facial emotion detection and the social influence detection.
      </p>
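      <p>As an illustration of this pre-processing step, the following sketch splits a video into frames and crops the frontal face with a Viola-Jones (Haar cascade) detector. It is only a minimal sketch: the use of OpenCV's bundled cascade file, the file paths, the one-frame-per-second sampling and the crop size are assumptions made for the illustration.</p>
      <preformat>
# Minimal pre-processing sketch: split the video into frames and crop the
# frontal face with a Viola-Jones (Haar cascade) detector. Paths, sampling
# rate and crop size are illustrative assumptions.
import cv2

def extract_face_crops(video_path, out_size=(128, 128)):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    crops = []
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # keep roughly one frame per second, matching the per-second labels
        if frame_idx % int(fps) == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces):
                x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
                crops.append(cv2.resize(gray[y:y + h, x:x + w], out_size))
        frame_idx += 1
    cap.release()
    return crops
</preformat>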
      <sec id="sec-2-1">
        <title>Facial Feature Extraction</title>
        <p>
          In this section, we discuss the whole process of extracting the facial features of all the individuals;
the emotion recognition is discussed in the next subsection. Firstly, we locate the 68 facial feature
points in the frontal frames; the feature points cover the nose, eyes, mouth and eyebrows. Then, around these detected facial points,
we extract the shape and appearance feature vectors. The shape feature vector denotes
the co-ordinates of all the facial features, whereas the appearance features can be Gabor
descriptors, Local Binary Pattern (LBP) descriptors, etc., as discussed in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. However, most
of the works in the past [
          <xref ref-type="bibr" rid="ref12 ref2 ref4">4, 2, 12</xref>
          ] were based on the deformable model named the Active
Appearance Model (AAM). AAM is a computer vision algorithm to track the facial feature
points on the human face. Additionally, AAM provides a compact statistical representation
of the shape and appearance variation of the human face. So, a trained AAM was
used to align the human face, and later to extract the shape and appearance
features. The implementation of AAM was based on the Menpo [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] python libraries, and
was trained using two publicly available databases, the FERA database [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and LFPW
database [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Further, this trained AAM can be used to track the face, and to extract
three types of holistic features around the 68 tracked landmark points. These three
holistic features are the similarity normalised shape (s-pts), the similarity normalised
appearance (s-app) and the canonical appearance (c-app) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], where s-pts is the shape feature
vector and s-app and c-app are the appearance feature vectors; the appearance feature
vectors are scale-invariant feature transform (SIFT) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] descriptors. Further, in order to
reduce the number of AAM fitting iterations, the landmarks of the current frame are initialised
with the landmarks of the previous frame.
        </p>
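        <p>A rough sketch of this training and per-frame fitting with the Menpo libraries is given below. The class and method names (HolisticAAM, LucasKanadeAAMFitter, fit_from_bb, fit_from_shape) and their arguments are assumptions that may differ between menpofit versions; the sketch only illustrates the idea of seeding each fit with the previous frame's landmarks.</p>
        <preformat>
# Assumed sketch of AAM training and per-frame fitting with menpo/menpofit;
# exact class names and keyword arguments vary across menpofit versions.
import menpo.io as mio
from menpofit.aam import HolisticAAM, LucasKanadeAAMFitter

# train on images annotated with 68-point landmarks (e.g. an LFPW-style set)
training_images = list(mio.import_images('path/to/trainset/', verbose=True))
aam = HolisticAAM(training_images, diagonal=150, scales=(0.5, 1.0))
fitter = LucasKanadeAAMFitter(aam, n_shape=[5, 15], n_appearance=[50, 100])

def track_sequence(frames, initial_bbox):
    """Fit the AAM frame by frame, seeding each fit with the previous shape."""
    shapes, previous_shape = [], None
    for image in frames:
        if previous_shape is None:
            result = fitter.fit_from_bb(image, initial_bbox)        # first frame
        else:
            result = fitter.fit_from_shape(image, previous_shape)   # warm start
        previous_shape = result.final_shape                         # 68 points (s-pts)
        shapes.append(previous_shape)
    return shapes
</preformat>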
        <p>
          Facial Emotion Recognition. The feature vectors from the AAM tracking are used for
training a classifier that labels each frame with the emotion it carries.
Basically, an emotion is recognised by the presence of one or more specific facial action
units, as defined in the Facial Action Coding System (FACS) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]; a combination of these action units defines one of the 7 basic emotions,
namely happy, sad, disgust, anger, contempt, surprise and fear. The DBN
implementation is similar to the models discussed in [
          <xref ref-type="bibr" rid="ref16 ref9">9, 16</xref>
          ], except for the number of inputs
and outputs of the DBN. The DBN was implemented using the Theano python libraries; we
adapted the Theano deep belief network code for MNIST classification to construct our
deep belief network. Later, the DBN was trained with the CK+ dataset [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], where the DBN
takes the combined feature vectors as input and one of the 7 emotions as the output. At the
end, if there is an abrupt emotion change or a missing emotion, this may be due to a
wrong prediction. So, we replace that emotion with the most recent emotion,
considering both the past and future frames of the individual, thus increasing the recognition
accuracy. These emotion labels associated with all the frames are then used for the correlation
building discussed in the next subsections.
        </p>
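        <p>This post-processing step can be illustrated with a short sketch: an emotion label that disagrees with both its neighbouring frames is treated as a misprediction and replaced by the neighbouring label. The one-frame window and the handling of missing labels are assumptions made for the illustration.</p>
        <preformat>
# Sketch of the label smoothing step: isolated, abrupt label changes and
# missing predictions are replaced by the neighbouring (most recent) label.
def smooth_emotion_labels(labels):
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        prev_label, next_label = smoothed[i - 1], labels[i + 1]
        if labels[i] is None:                       # missing prediction
            smoothed[i] = prev_label
        elif labels[i] != prev_label and labels[i] != next_label and prev_label == next_label:
            smoothed[i] = prev_label                # isolated outlier
    return smoothed

# e.g. ['happy', 'happy', 'angry', 'happy'] becomes ['happy', 'happy', 'happy', 'happy']
</preformat>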
      </sec>
      <sec id="sec-2-2">
        <title>Social Influence</title>
        <p>The social correlation among a connected group of people can be explained mainly
by three phenomena: homophily, confounding and social influence. The concepts of
homophily and confounding are more related to the tendency of an individual to follow
other individuals that have similar characteristics, profession, etc. However, analysing
social influence is a tedious task, wherein the actions of an individual can induce
similar actions in the most related individuals, such as friends, colleagues, etc. In our
proposed methodology, we track two different actions for the influence analysis, namely the
communication sequence (whether the individual is speaking or listening) and the facial
emotions. However, our correlation building algorithm is mainly based on the tracked
emotions, whereas the communication sequence (C) is used as a signal of interaction
initiation, because the interaction-initiating individual is more significant than the
non-initiators for tracking the correlation. The proposed algorithm for the correlation
analysis is based on two kinds of data, namely the tracked actions (E &amp; C) and the background
information (I) of all the individuals (nodes).</p>
        <p>
          The information database I consists of details such as age, gender, profession,
seniority, etc., for all the individuals in the network. As mentioned before, influence is
built among the most related individuals, or the individuals having the highest affinity
among them. In order to capture the affinity (Aij) between any two individuals i and
j, we label all the individuals based on their importance, a.k.a. node centrality. The
node centrality (ρ) is calculated using the details in the information database (I) of the
network. Although there are many different methods to calculate the value of ρ in a
network, as discussed in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], in our considered domain the mentioned methods do not
suffice for our purpose of ranking the individuals. So, we decide ρ for all the individuals on
the basis of profiling and clustering the nodes based on the information in I. The intuition
behind the ρ of an individual is that, if an individual has superior knowledge
of the topic of discussion, or is a senior member, then this individual's opinion will
most likely be the choice of the others. Also, if some individuals belong to the same
organisation, are colleagues, or have family ties, then they tend to form implicit virtual
groups.
        </p>
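        <p>A purely illustrative way to derive ρ from such a profile database is sketched below; the attributes and the scoring rule are hypothetical and only stand in for the profiling and clustering described here.</p>
        <preformat>
# Hypothetical illustration of assigning the node centrality rho from the
# background-information database I; attributes and scoring are assumptions.
profiles = {
    'p1': {'organisation': 'org1', 'senior': True,  'topic_knowledge': 1},
    'p2': {'organisation': 'org1', 'senior': False, 'topic_knowledge': 3},
    'p3': {'organisation': 'org2', 'senior': True,  'topic_knowledge': 2},
}

def node_centrality(profile):
    # seniority and topic knowledge both raise a node's centrality
    return profile['topic_knowledge'] + (2 if profile['senior'] else 0)

rho = {node: node_centrality(p) for node, p in profiles.items()}
</preformat>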
        <p>
          Basically, ρ reflects the confounding and homophily in the considered network.
Further, ρ could be set by analysing the network's background information or, in the case of
larger networks, by clustering profiles or using algorithms similar to customer profiling [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
can be used. The whole method of social influence detection can be divided into two basic
steps, (i) emotion propagation tracking and (ii) social influence analysis, as discussed
in the next subsections.
        </p>
        <p>Emotion Propagation Tracking. The emotion propagation tracking algorithm is based
on the recorded emotions of all the individuals per second and the node centrality (ρ) of all
the individuals. In our work, we consider the emotion induction by tracking the
communication sequence of all the individuals. That is, we track the emotion change of all the
individuals when one of the individuals is active (speaking) or inactive (listening). Apart
from this, we track the emotions of all the individuals irrespective of the active or inactive
individuals. The first scenario is action induces emotion (a → e) and the other scenario
is emotion induces emotion (e → e). An a → e correlation is said to have occurred
if the action of a node has influenced the emotions of the other nodes, whereas an e → e
correlation is said to have occurred when the emotion of a node influences the emotions
of the other nodes. The diffusion of emotions in these scenarios is represented by an
emotion diffusion parameter (ΔE), calculated using equation 1,</p>
        <p>ΔE = (τe · ωe) / T (1)</p>
        <p>ωe = te / Ne (2)</p>
        <p>where τe is the time interval of emotion e, and ωe is a node's emotion coefficient
for e. T is the total number of frames considered for the emotion diffusion calculation ΔE. The ωe is
calculated using equation 2, and is pre-calculated for all the nodes for every emotion.
Here te is the total number of time instances for which the node has the emotion e, and Ne is the total
number of time instances for which all the nodes have the emotion e.</p>
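        <p>A small sketch of equations (1) and (2) is given below: ωe is pre-computed per node and per emotion, and ΔE weights the duration τe of an emotion in the considered window by ωe and normalises it by the T frames considered. The data layout (per-second label sequences keyed by node) is an assumption made for the illustration.</p>
        <preformat>
# Sketch of equations (1) and (2); labels[node] is assumed to be that node's
# per-second emotion label sequence.
def emotion_coefficient(labels, node, emotion):
    """omega_e = t_e / N_e (equation 2)."""
    t_e = sum(1 for lab in labels[node] if lab == emotion)
    n_e = sum(lab == emotion for seq in labels.values() for lab in seq)
    return t_e / n_e if n_e else 0.0

def emotion_diffusion(labels, node, emotion, start, end):
    """Delta_E = tau_e * omega_e / T (equation 1) over the window [start, end)."""
    window = labels[node][start:end]
    tau_e = sum(1 for lab in window if lab == emotion)   # duration of emotion e
    T = end - start                                      # frames considered
    return tau_e * emotion_coefficient(labels, node, emotion) / T
</preformat>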
        <p>
          Other than this, we define the direction of the emotion diffusion by δ, which can be
+1 or -1. The δ for the five emotions neutral, happy, surprise, sad and angry is
+1, +1, +1, -1 and -1 respectively. These values are based on the discussion in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The emotion
diffusion in both scenarios is calculated using algorithms 1 and 2. Let us first
discuss algorithm 1 for the a → e scenario, which accepts the list of passive agents (agents
whose action is an emotion or no emotion) for every active node (an agent whose action is speech
or both), and gives ΔE1 and δ1 for all combinations of passive and active nodes
as output.
        </p>
        <p>Algorithm 1 Emotion Propagation (a → e)
1: procedure EMOTION DIFFUSION
2: Input: a: active node
3: P: list of passive nodes
4: Output: ΔE1 and δ1 for all passive nodes
5: // where δ is +1 or -1
6: for all p ∈ enumerate(P) do
7:   for all e ∈ Emotions E do
8:     {ΔE1e,ap} ← (τe,ap · ωe,p) / Ta
9:   // the maximum of ΔE1e is the final ΔE1
10:  // e is the emotion having the maximum ΔE1e
11:  ΔE1ap ← max({ΔE1e,ap})
12:  δ1ap ← δa · δe,p
13: End</p>
        <p>Algorithm 2 Emotion Propagation (e → e)
1: procedure EMOTION DIFFUSION
2: Input: A: list of all nodes
3: B: list of all nodes
4: Output: ΔE2 and δ2 for all pairs of nodes
5: (where δ is +1 or -1)
6: for all a ∈ enumerate(A) do
7:   for all b ∈ enumerate(B) do
8:     for all e ∈ Emotions E do
9:       {ΔE2e,ab} ← (τe,ab · ωe,b) / T
10:      (where T is 1)
11:      (the maximum of ΔE2e is the final ΔE2)
12:      (e is the emotion having the maximum ΔE2e)
13:    ΔE2ab ← max({ΔE2e,ab})
14:    δ2ab ← δa · δe,b
15: End</p>
        <p>In the equation on line 8 of algorithm 1, we calculate the ΔE1e values for each emotion e carried
by the node p for the time interval τe, during the interval Ta for which the node a was active. Then,
the final ΔE1 value is the maximum value in the set {ΔE1e}, thus identifying the emotion e that
was induced by the node a on the node p. The final δ1 is calculated by multiplying the δ of
a and of e, which gives the direction of the emotion induction. Similarly, in algorithm
2, the emotion diffusion ΔE2 and δ2 are calculated for all pairs of nodes, irrespective
of the active node, with T = 1. At the end of this step, we obtain four values for all pairs of nodes: ΔE1 and
ΔE2, denoting the emotion diffusion parameters, and δ1 and δ2, denoting the direction
of the emotion diffusion. We then build the influential
correlation among the nodes, as discussed in the next subsection.</p>
        <p>Social Influence Analysis. From algorithms 1 and 2, we obtain the emotion diffusions
ΔE1, ΔE2, δ1 and δ2 for both scenarios. Then, we calculate the influence of each
node on the other nodes based on algorithm 3. The objective is to find the social
influence amongst all the agents in the network in terms of the weights of a weighted
graph. The input to this algorithm is the list of ΔE1, ΔE2, δ1 and δ2 for all pairs of
nodes. Based on these values, the weight Wij between agents i and j is calculated using
the equation on line 7, where the sum of ΔE1 and ΔE2 for i and j is divided by the
summation of ΔE1 over all the neighbouring nodes of j. Finally, the orientation of
the influence is associated with Wij by multiplying by δij. Thus, as output we obtain a
weighted graph, wherein the weights represent the influence and the signs '-' and '+' represent
the orientation of the influence being induced.</p>
        <p>Algorithm 3 Influence Calculation
1: procedure INFLUENCE
2: Input: ΔE1 and ΔE2 for each pair of nodes
3: δ1 and δ2 for each pair of nodes
4: Output: W, the weight for each pair of nodes
5: for each pair of nodes i and j do
6:   δij ← δ1ij · δ2ij
7:   Wij ← ((ΔE1ij + ΔE2ij) / Σk∈N(j) ΔE1kj) · δij
8: // N(j) denotes the neighbours of node j
9: End</p>
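        <p>As a concrete illustration of how the outputs of algorithms 1 and 2 are combined by algorithm 3, the sketch below computes the signed weights from pair-indexed diffusion values. The data layout (dictionaries keyed by node pairs and a neighbours map) and the sign table for δ are assumptions consistent with the description above, not an exact reproduction of our implementation.</p>
        <preformat>
# Sketch of algorithm 3: combine the diffusion values of both scenarios into
# signed edge weights. delta_E1 and delta_E2 map (i, j) pairs to diffusion
# values, sigma maps each pair to the product of its direction terms, and
# neighbours[j] lists the nodes connected to j.
EMOTION_SIGN = {'neutral': 1, 'happy': 1, 'surprise': 1, 'sad': -1, 'angry': -1}

def influence_weights(delta_E1, delta_E2, sigma, neighbours):
    weights = {}
    for (i, j), e1 in delta_E1.items():
        # denominator: sum of Delta_E1 over the neighbouring nodes of j
        denom = sum(delta_E1[(k, j)] for k in neighbours[j] if (k, j) in delta_E1)
        if denom:
            # W_ij = (Delta_E1_ij + Delta_E2_ij) / sum_k Delta_E1_kj, signed by sigma_ij
            weights[(i, j)] = (e1 + delta_E2[(i, j)]) / denom * sigma[(i, j)]
    return weights
</preformat>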
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Results and Discussion</title>
      <p>In this section, we describe the evaluation of our baseline system, which analyses the
social influence in the network based on the facial expressions of all the
participants in the network. In order to test our proposed algorithm, we need an input frontal
video dataset of a small network to detect the emotions. However, such a dataset was not
available; therefore, we recorded a scripted discussion of five participants through video
conferencing, referred to as participants A, B, C, D and E. In the scripted
discussion, the role of each node is defined, and the action-reaction of each pair of nodes at each
time stamp is pre-defined to build the ground truth of the data. Additionally, the video
data should be able to showcase the influence caused by emotion and by action (speaking).
Therefore, we structured the script so that each participant performs the action (speaks) for
approximately 90 seconds in a sequence, yielding a total of 450 seconds of video. Further,
we assume that the participants A, B and C belong to the same organisation and D and E
belong to another organisation. Also, A and D are the seniors in their respective groups, and B
has the highest knowledge of the topic of discussion. Based on these assumptions about
seniority, knowledge of the topic, etc., the ρ of A, B, C, D and E were set to 2, 5, 4, 5
and 3 respectively. Similarly, the calculation of ρ for any larger network can be done
by comparing the basic information of all the participants in the network. In the following
subsections, we discuss the results and then validate them.</p>
      <sec id="sec-3-1">
        <title>Results</title>
        <p>
          The input video of length 450 seconds is first converted into frames, and the other
pre-processing steps are applied. Using the trained AAM, we extract the three types of feature
vectors from each frontal frame, where the frontal face in the frame is detected by the Viola-Jones
algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The approximate dimensions of the extracted feature vectors
s-pts, s-app and c-app are 136 (the x and y coordinates of the 68 landmark points), 27000 and
27000 (approximately), respectively. Then, these feature vectors are used to detect the
emotion carried by each frame, by classifying it with the trained DBN classifier. In our
baseline system, we only considered 5 emotions including the neutral emotion, namely
neutral, happy, sad, surprise and angry. After classification, we obtain the emotion label
associated with every frame; table 1 lists the count of emotion labels detected per
second for all five persons in the discussion. The accuracy of the emotion detection
was found to be 82% over the five emotions. If any irregularity, sudden change or
missing emotion is found in the emotion labels of a frame, we set the emotion to that of
the neighbouring frames. Further, the social influential correlation is built on the basis
of these changes in the emotion labels for every second, considering both scenarios.
        </p>
        <p>The calculated values of ΔE1 and ΔE2 are listed in tables 3 and 4. Using these
values, we calculate the influence in the network based on algorithm 3. Finally, the
result is represented in the form of a weighted graph, whose weights are listed in table
2. The discussion of the results and the validation of our work are presented in the next
subsection.</p>
        <p>(Table 1: count of emotion labels, neutral, happy, surprise, angry and sad, detected per second for each node A, B, C, D and E.)</p>
        <p>In order to validate our methodology, we compared our results with the ground truth
of the fed data, i.e., the emotion propagation (changes in emotions) in the fed video. In our
work, the influence is calculated considering two environments, namely a → e and
e → e. Figure 1 depicts the emotion propagation in the a → e environment and in the e → e environment.
For both scenarios, the figures depict the change in emotion of each node A, B, C, D and E,
represented in different colours over time, where the y-axis denotes the 5 detected
emotion labels normalised along the axis and the x-axis denotes the time
in seconds. Let us first consider the emotion propagation in the a → e environment, depicted
in figures 1a to 1e. From these figures, it can be said that a change in the emotion of B
most often induces a positive emotion change in C. Additionally, B induces a negative
emotion change in A, i.e., A is happy when B is sad. Similarly, it can be said that the
emotions of E are most often influenced by the emotions of D. Further, in order to avoid decisions
based on locally induced emotions (i.e., when the emotion propagation is
analysed only for the 90-second interval during which a participant was active), we also
considered the induction of emotion in the e → e environment, wherein the overall influence
over the whole 450 seconds is considered, irrespective of the active agent. The emotion
propagation in figure 1f depicts the e → e environment, from which it can be said that
B induces emotions in C and A, while the induction of emotion in the reverse direction is not
reciprocated in the graphical figure. A similar correlation is also observed between node
D and node E.</p>
        <p>Now, from the results obtained in table 2, which are the weights of a
weighted graph, it can be said that node B influences node A in the opposite orientation
with weight -2.4, whereas node B influences C to a greater extent, that is, 3.13. A similar
correlation is observed between node D and node E: node D influences node E with
weight 2.5, although the reverse correlation is not observed in either case. Other
than this, node B influences node D in the negative sense. It can also be said that, if the value
of a weight is small, meaning it carries little induction, then that edge or correlation can
be ignored or removed from the network, which reduces the complexity of any other
search algorithms applied to the network. By analysing the obtained results, it can be
said that the emotion propagation in the scripted discussion is very well represented in
terms of the weights of the weighted graph. Therefore, it can also be said that our algorithm
is effective at measuring the influence in the network.</p>
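        <p>For illustration, the pairwise weights quoted above can be assembled into a weighted directed graph, for instance with networkx, so that the total influence exerted by each node can be read off directly; the snippet below only uses the weights mentioned in the text and is not an exhaustive reproduction of table 2.</p>
        <preformat>
# Building an influence graph from the weights quoted in the text
# (B to A: -2.4, B to C: 3.13, D to E: 2.5).
import networkx as nx

G = nx.DiGraph()
for src, dst, w in [('B', 'A', -2.4), ('B', 'C', 3.13), ('D', 'E', 2.5)]:
    G.add_edge(src, dst, weight=w)

# total magnitude of influence exerted by each node; B comes out as the most
# influential node in this small example
exerted = {n: sum(abs(d['weight']) for _, _, d in G.out_edges(n, data=True)) for n in G}
</preformat>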
        <p>(Figure 1: emotion propagation over time; panels (a) to (e) correspond to the active nodes A to E in the a → e scenario, and panel (f) to the e → e scenario with no designated active node.)</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>We introduced a system that can track the diffusion of emotions in a social network, which
can help in extracting the social influence correlation among the nodes of the network.</p>
      <p>Intuitively, the measure of social correlation gives an idea of how the change in the
emotions of one person affects the emotions of another person. If a particular node in the
network brings about a high degree of emotional change in the network, it can be said that this
node is an influential node in the network. Facial emotions were classified by training
a DBN on the basis of the facial features extracted using a trained deformable
model (AAM); we then built the influential correlation among the participants using the proposed
algorithm. The emotion diffusion parameters were calculated and used to build
the correlation in terms of the weights of a weighted graph. The proposed method was
evaluated against a scripted discussion video recorded in our laboratory, wherein the predicted
correlations were able to reflect the known ground truth of the scripted discussion.</p>
      <p>As future work, we plan to extend our method to larger real-life networks and to
incorporate emotions from voice data along with the facial emotions. The resulting
emotions can depict human correlation more accurately. Apart from this, we plan
to build a concrete method to calculate the node centrality, and to build the social
associations in any anonymous network using social network mining techniques.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This work has been partially supported by the project "Large-scale Consensus
Support System based on Agents Technology" in the research area "Intelligent Information
Processing Systems Creating Co-Experience Knowledge and Wisdom with
Human-Machine Harmonious Collaboration" of the JST CREST projects.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Anagnostopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahdian</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Influence and correlation in social networks</article-title>
          .
          <source>In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>7</fpage>
          -
          <lpage>15</lpage>
          . ACM (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ashraf</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucey</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambadar</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prkachin</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solomon</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          :
          <article-title>The painful face-pain expression recognition using active appearance models</article-title>
          .
          <source>Image and vision computing</source>
          <volume>27</volume>
          (
          <issue>12</issue>
          ),
          <fpage>1788</fpage>
          -
          <lpage>1796</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ugarte</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aghajan</surname>
          </string-name>
          , H.:
          <article-title>Discovering social interactions in real work environments</article-title>
          .
          <source>In: Automatic Face &amp; Gesture Recognition and Workshops (FG</source>
          <year>2011</year>
          ), 2011 IEEE International Conference on. pp.
          <fpage>933</fpage>
          -
          <lpage>938</lpage>
          . IEEE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cootes</surname>
            ,
            <given-names>T.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Edwards</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
          </string-name>
          , C.J.:
          <article-title>Active appearance models</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence (6)</source>
          ,
          <fpage>681</fpage>
          -
          <lpage>685</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          :
          <article-title>Profile analysis: Multidimensional scaling approach</article-title>
          .
          <source>Practical Assessment, Research &amp; Evaluation</source>
          <volume>7</volume>
          (
          <issue>16</issue>
          ),
          <year>n16</year>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ekman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenberg</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          :
          <article-title>What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS)</article-title>
          . Oxford University Press (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ghamen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caplier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Positive and negative expressions classification using the belief theory</article-title>
          .
          <source>International Journal of Tomography &amp; Statistics</source>
          <volume>17</volume>
          (
          <issue>S11</issue>
          ),
          <fpage>72</fpage>
          -
          <lpage>87</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Happy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Routray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Robust facial expression classification using shape and appearance features</article-title>
          .
          <source>In: Advances in Pattern Recognition (ICAPR)</source>
          , 2015 Eighth International Conference on. pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . IEEE (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osindero</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teh</surname>
            ,
            <given-names>Y.W.:</given-names>
          </string-name>
          <article-title>A fast learning algorithm for deep belief nets</article-title>
          .
          <source>Neural computation 18(7)</source>
          ,
          <fpage>1527</fpage>
          -
          <lpage>1554</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>O.H.</given-names>
          </string-name>
          :
          <article-title>Implementing the viola-jones face detection algorithm</article-title>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>In: Computer vision</source>
          ,
          <year>1999</year>
          .
          <source>The proceedings of the seventh IEEE international conference on. vol. 2</source>
          , pp.
          <fpage>1150</fpage>
          -
          <lpage>1157</lpage>
          . Ieee (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lucey</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanade</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saragih</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambadar</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matthews</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The extended cohnkanade dataset (ck+): A complete dataset for action unit and emotion-specified expression</article-title>
          .
          <source>In: Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          ,
          <year>2010</year>
          IEEE Computer Society Conference on. pp.
          <fpage>94</fpage>
          -
          <lpage>101</lpage>
          . IEEE (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Alabort-i Medina</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Antonakos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Booth</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snape</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafeiriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Menpo: A comprehensive platform for parametric image alignment and visual deformable models</article-title>
          .
          <source>In: Proceedings of the ACM International Conference on Multimedia</source>
          . pp.
          <fpage>679</fpage>
          -
          <lpage>682</lpage>
          . MM '14, ACM, New York, NY, USA (
          <year>2014</year>
          ), http://doi.acm.org/10.1145/2647868.2654890
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Sagonas</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzimiropoulos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafeiriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pantic</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>300 faces in-the-wild challenge: The first facial landmark localization challenge</article-title>
          .
          <source>In: Computer Vision Workshops (ICCVW)</source>
          ,
          <year>2013</year>
          IEEE International Conference on. pp.
          <fpage>397</fpage>
          -
          <lpage>403</lpage>
          . IEEE (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A survey of models and algorithms for social influence analysis</article-title>
          .
          <source>In: Social network data analytics</source>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>214</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Terusaki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stigliani</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Emotion detection using deep belief networks</article-title>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Valstar</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pantic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The first facial expression recognition and analysis challenge</article-title>
          .
          <source>In: Automatic Face &amp; Gesture Recognition and Workshops (FG</source>
          <year>2011</year>
          ), 2011 IEEE International Conference on. pp.
          <fpage>921</fpage>
          -
          <lpage>926</lpage>
          . IEEE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , D.:
          <article-title>Probabilistic graphical models for human interaction analysis</article-title>
          .
          <source>Tech. rep.</source>
          ,
          <source>IDIAP</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>