Simulating Pairwise Communication for Studying Its Impact on Community Public Opinion

Grygoriy Zholtkevych¹, Olena Muradyan², Kostiantyn Ohulchanskyi¹, and Sofiia Shelest¹

¹ School of Math and Comp. Sci, V.N. Karazin Kharkiv National University, 4 Svobody Sqr, Kharkiv, 61022, Ukraine; http://math.univer.kharkov.ua
g.zholtkevych@karazin.ua; {spaceiseternal,soniashelest}@gmail.com
² School of Sociology, V.N. Karazin Kharkiv National University, 4 Svobody Sqr, Kharkiv, 61022, Ukraine; http://sociology.karazin.ua
o.s.muradyan@karazin.ua

Abstract. Communication on the fly, made possible by modern information and communication technology, is a characteristic feature of modern society. This style of communication has significantly changed social life and provided new ways of forming public opinion. New social practice poses new challenges for specialists studying social phenomena. One of these problems is the reliability of public opinion measurements, which are always based on indirect assessments. Unfortunately, indirect assessments depend essentially on the assumptions accepted during their development. The changes noted above, brought about by technological progress, require new mathematical assumptions to underlie public opinion measurements. This paper aims to draw attention to this situation and to begin the movement toward a rigorous theory of public opinion measurements based on mathematical models of social phenomena that are adequate to the features of modern communication processes. It seems that the authors' first results are consistent with the hypotheses of a number of sociologists working in this area.

Keywords: public opinion · behaviour · microstate · macrostate · communication rate · indirect measurement · simulation

1 Introduction

Modern information and communication technology (ICT) has essentially changed communication processes between communities and their members and has generated new social phenomena. Social networks are among these phenomena.
Their emergence and rapid development have markedly changed the character of communication streams.

In this context, an important point is the effect of combining social networks with personal telecommunication microelectronic devices (tablets, smartphones). This effect significantly speeds up communication, shortens it in time, and makes it discrete and as non-linear as possible. The cognitive and motivational results of such communication are still unknown to sociologists. Nevertheless, it can be argued that such discrete and non-linear communication significantly diversifies the sources of potential influence on the opinion formation of the communication participants and, at the same time, increases the frequency of such impact, potentially with a different result. Public opinion formation becomes a process that is difficult to predict because we need to take into account a significantly larger number of factors than before. At the same time, owing to the specificity of the new segment of the communicative space (non-linearity, discreteness and high communication rate), the effect of these factors is characterised by more and more complex interconnections.

Reliable sociological information about public opinion is especially in demand in modern society. The reason for this is not a growing need to predict changes in public opinion effectively (this need has traditionally been high since the fifties of the twentieth century). The reason is associated with an increase in such threats to social stability and security as information terrorism, manipulative technologies affecting society, and the active use of fake news in the political sphere. Today public opinion needs not only to be predicted but also to be protected against the above threats. This, in turn, requires the development of a technique of sociological measurement based on a dependable quantitative theory. This necessity is also caused by the high cost of evidential public opinion measurements.
Improving public opinion measurements by increasing the frequency of spot measurements, so as to record in time all possible deviations and shifts in public opinion, seems to have little prospect because of the high cost of such measurements and their technical complexity.

Based on the foregoing, it can be said that developing a theory of measuring public opinion, which would substantiate and improve polling tools so that they can grasp the peculiarities of the formation of public opinion in modern conditions, is a relevant challenge not only for social science but also for mathematics and computer science.

We believe that simulation of public opinion formation within the modern information and communication environment is the starting point for this theory. This paper is our attempt to attract the attention of researchers in the fields of ICT, mathematics and sociology to this challenge.

It should be stressed that the idea of using mathematical modelling as a tool for studying social processes is not novel. In the middle of the twentieth century, the use of relational and statistical models for understanding social processes was proposed by a number of scientists (N. Rashevsky [13], A. Rapoport [12] and others). The idea was further developed with the inception of network science (see [1]). Today there are many works devoted to the simulation of social networks and to studying social dynamics with the corresponding methods. But we could not find a paper that presents a simulation model for studying public opinion measurement. This is what motivated our research.

2 Model of Communication in Network

In this paper, we consider a special class of homogeneous multicomponent discrete-time dynamical systems whose components interact only pairwise via a network of channels.
The components of such a system are entities modelling members of the community being studied, and channels are entities modelling stable pairwise communications between the members of this community.

2.1 Modelling Assumptions

The goal of this subsection is to formulate the modelling assumptions explicitly.

Assumption 1. A network being simulated is a multicomponent discrete-time dynamical system with a constant set of components called members. Some pairs of components communicate stably and, in this case, we say that there is an information channel between the members of such a pair.

Assumption 2. The property of an information channel to be active is a random Boolean variable. Moreover, for different channels, the corresponding variables are independent.

Assumption 3. Each system component estimates the claim in the focus of community interest using an element of the set C = {RED, GREEN, BLUE}, corresponding to a negative, neutral, or positive estimation respectively. The mapping associating a claim estimation with each system component is called below a microstate of the system.

Assumption 4. The personal estimations of the members of a community (a system microstate) are not of interest and are considered unavailable for direct observation. Only the occupancy measure of the elements of C is available for direct observation. The value of this measure for an element of C is the ratio between the number of community members that hold the corresponding opinion and the total number of community members. This measure is considered below as a macrostate of the system.

Assumption 5. At each time point, a member of the community participates in at most one communication.

2.2 Specification of Network Model

Taking into account Assumption 1, an undirected simple finite graph G = (N, E) is the most natural mathematical structure for modelling pairwise communications in a community.
The node set N of the graph models the members of a community, and the edge set E of the graph models the stable information channels between them.

Assumption 2 can be ensured by associating a random Boolean variable activated_e with each edge e ∈ E. To do this it is sufficient to specify a function activation_rate : E → [0, 1] and to think of its value activation_rate(e) as the activation probability of the channel corresponding to the edge e. In other words, we set activation_rate(e) = Pr(activated_e = true). Thus, we come to the following definition.

Definition 1. A network model is a triple ⟨N, E, activation_rate⟩ where N and E are respectively the sets of nodes and edges of some undirected simple finite graph G = (N, E) and activation_rate : E → [0, 1] is the channel activation rate function.

Assumption 3 leads to the following definition of a microstate and micro-dynamics for the system class being studied.

Definition 2. A node colouring of the graph G in accordance with the colour set C is a microstate of the system. Thereby, the system micro-dynamics is a discrete-time stochastic process explaining the observed sequences of system microstates.

Assumption 4 leads us to the concepts of a macrostate and macro-dynamics.

Definition 3. Let c : N → C be a microstate of the system. Then the function $\bar{c}$ : C → [0, 1] is the macrostate corresponding to c if, for each x ∈ C, it is defined as follows³

\[ \bar{c}(x) = \frac{1}{|N|} \sum_{n \in N} [c(n) = x] . \]

Thereby, the system macro-dynamics is a discrete-time stochastic process explaining the observed sequences of system macrostates.

2.3 Simulation Framework Concept

Based on the above assumptions and definitions, a prototype framework has been developed for simulating community dynamics with various kinds of pairwise communications. The general specification of a simulation process is presented as a UML activity diagram in Fig. 1.
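As an illustration of Definitions 1 and 3, the following minimal Python sketch builds a small network model and computes the macrostate of a microstate. It is not the authors' framework: it uses only the standard library rather than NetworkX, and the helper names build_network and macrostate, the random-graph construction, the edge probability and the constant activation rate are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations

COLOURS = ("RED", "GREEN", "BLUE")


def build_network(n_nodes, edge_probability, activation_rate, rng):
    """Return (nodes, edges, rates) for a random simple undirected graph.

    The random construction and the constant activation rate are
    illustrative assumptions, not part of Definition 1.
    """
    nodes = list(range(n_nodes))
    # Each unordered pair becomes a stable channel with the given probability.
    edges = [e for e in combinations(nodes, 2) if rng.random() < edge_probability]
    # activation_rate : E -> [0, 1], here the same value for every edge.
    rates = {e: activation_rate for e in edges}
    return nodes, edges, rates


def macrostate(microstate):
    """Occupancy measure of Definition 3: the share of nodes of each colour."""
    counts = Counter(microstate.values())
    total = len(microstate)
    return {colour: counts[colour] / total for colour in COLOURS}


rng = random.Random(0)
nodes, edges, rates = build_network(10, 0.3, 0.5, rng)

# A microstate is a colouring of the nodes (Definition 2).
microstate = {n: "GREEN" for n in nodes}
microstate[0] = "RED"
microstate[1] = "BLUE"

print(macrostate(microstate))
```

With one RED, one BLUE and eight GREEN nodes, the macrostate assigns 0.1 to RED, 0.8 to GREEN and 0.1 to BLUE, whatever the edge set is: the macrostate depends only on the colouring, not on the channels.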
For the realisation of the general specification of the simulation process (Fig. 1), the language Python 3 [11] and the library NetworkX 2.2 [9] have been used.

[Fig. 1: UML activity diagram. Actions shown: set nodes and edges parameters; build network; build and save initial microstate; choose communicated pairs; perform communication protocol; renew microstate; save macrostate. Guards: [simulation has completed], [otherwise].]

Fig. 1. General specification of a simulation process

To construct a framework providing the presented simulation process, we propose the conceptual model shown as a UML class diagram in Fig. 2. This model is based on an undirected simple graph whose nodes are instances of the class Node and whose edges are instances of the class Edge. The attribute estimation of the class Node is intended for saving the current value of a microstate for the corresponding node. The association state gives access to the internal description of a node state. This description is abstract at the framework level. Similarly, the attribute activation_rate of the class Edge is intended for saving the value activation_rate(e) for the Edge instance that models the edge e.

³ In the formula of Definition 3, the Iverson bracket is used (see, for example, [5]). The value of [c(n) = x] equals 1 if c(n) = x, and otherwise it equals 0.

[Fig. 2: UML class diagram. Enumeration Colour with literals RED, GREEN, BLUE. Class Node with the association state to one State. Abstract entity State with operations setColour(colour: Colour), getColour(): Colour and estimate(): Colour. Class Edge with attribute activation_rate: Real, operation weight(colours: Colour[2]): Real[3], the association incidence to two Node instances and the association protocol to one Protocol. Abstract entity Protocol with operation communicate(states: State[2]): State[2]. Constraints: 0 < activation_rate <= 1; for all colours c', c'': weight(c',c'')[0] >= 0, weight(c',c'')[1] >= 0, weight(c',c'')[2] >= 0 and weight(c',c'')[0] + weight(c',c'')[1] + weight(c',c'')[2] = 1.]

Fig. 2. Conceptual model of the simulated net

2.4 Pairwise Communication Model

Above we focused on modelling the structure of a network; in this subsection, we pass to modelling the interaction (or the communication, in the case of a social network) between nodes of the network.
The association between instances of the class Edge and abstract entities classified as Protocol is provided for specifying such interaction (see Fig. 2). Since Assumption 5 is accepted, we need some method to form the set of interacting pairs of nodes. We propose to use the method specified by Algorithm 1.

Algorithm 1: Method for forming communicating pairs
Data: a simple undirected graph G = (N, E)
Result: the subset SELECTED of E representing communicating pairs
    /* initialise the target set and auxiliary sets */
 1  SELECTED := ∅; AVAILABLE := ∅; FORBIDDEN := ∅;
    /* activate communication channels */
 2  foreach e ∈ E do
 3      choose randomly True or False with probabilities activation_rate(e) and 1 − activation_rate(e) respectively;
 4      if True is selected then
 5          add e into AVAILABLE
 6      else
 7          add e into FORBIDDEN
 8      end
 9  end
    /* form communicating pairs */
10  while AVAILABLE ≠ ∅ do
11      choose randomly an element e ∈ AVAILABLE in accordance with the uniform distribution on AVAILABLE;
12      delete e from AVAILABLE;
13      if for some e′ ∈ SELECTED, e and e′ are incident then
14          add e into FORBIDDEN
15      else
16          add e into SELECTED
17      end
18  end

The following proposition establishes properties of the method.

Proposition 1. The method presented by Algorithm 1 has the following properties:
1. a computation with respect to Algorithm 1 halts for any input data after a finite number of steps;
2. after a computation with respect to Algorithm 1 halts, the sets SELECTED and FORBIDDEN are disjoint;
3. after a computation with respect to Algorithm 1 halts, the set SELECTED does not contain incident edges;
4. adding to the set SELECTED an edge that was added to the set FORBIDDEN in loop 10–18 violates the property claimed in item 3.

Proof. The first item of the proposition is true because the set AVAILABLE decreases (see line 12 of Algorithm 1) at each iteration of loop 10–18. The validity of the second item of the proposition is ensured by branching 13–17.
The validity of the third item of the proposition is ensured by line 14. The validity of the fourth item of the proposition is ensured by branching 13–17. □

We suggest that any communication protocol can be represented by a UML sequence diagram as in Fig. 3.

[Fig. 3: UML sequence diagram with lifelines theEdge: Edge, theEdge.incidence[0]: Node and theEdge.incidence[1]: Node. Messages shown: states[0] = getState(); states[1] = getState(); communicate(states) returning newStates; setState(newState[0]); setState(newState[1]); estimate(newStates[0]) returning estimation; estimate(newStates[1]) returning estimation.]

Fig. 3. Model of a pairwise communication protocol

Finally, the method estimate(state: State) of the abstract entity State (see Fig. 2) is intended to renew the current microstate.

3 Computational Case Studies

In this section, we present and discuss the results of simulation for four kinds of systems: models A-IR and B-IR, which are called below models of components with an instant response, and models A-LR and B-LR, which are called below models of components with a lazy response.

3.1 Realisation of the Method communicate(. . . )

The above classification of the models being studied is based on the general scheme of the interaction process modelled by the method communicate(. . . ) of the abstract entity Protocol. We assume that the communication corresponding to an edge e ∈ E is modelled by the weight function w_e on C × C, so that w_e(colour′, colour″) is a random value in the outcome set {nobody, first, second}. The outcome is interpreted as follows:

– w_e(colour′, colour″) = nobody means that the participants of the communication preserve their opinions;
– w_e(colour′, colour″) = first means that the first participant of the communication preserves his opinion, but the second one does not;
– w_e(colour′, colour″) = second means that the second participant of the communication preserves his opinion, but the first one does not.

Based on this assumption, we propose to use the following abstraction specified by Algorithm 2.
Algorithm 2: The scheme of the method communicate(. . . )
Data: an edge e ∈ E, the weight function w_e corresponding to e
Result: the pair of new node states (newFirstState, newSecondState)
 1  firstState := e.incidence[0];
 2  secondState := e.incidence[1];
 3  choose randomly outcome from {nobody, first, second} in accordance with the distribution w_e(firstState.getColour(), secondState.getColour());
 4  if outcome = nobody then
 5      newFirstState := firstState;
 6      newSecondState := secondState
 7  else if outcome = first then
 8      newFirstState := firstState;
 9      create newSecondState in accordance with a concrete algorithm
10  else /* outcome = second */
11      create newFirstState in accordance with a concrete algorithm;
12      newSecondState := secondState
13  end
14  return (newFirstState, newSecondState)

Remark 1. Note that everywhere below we use the weight function defined as follows:
1. w_e(c, c) = {nobody = 1.0, first = 0.0, second = 0.0} for any c ∈ C;
2. w_e(c′, c″) = w_e(c″, c′) for all c′, c″ ∈ C;
3. w_e(GREEN, c)[nobody] = 0.0, w_e(GREEN, c)[first] = 0.1, and w_e(GREEN, c)[second] = 0.45 for any c ∈ {RED, BLUE};
4. w_e(RED, BLUE)[nobody] = 0.1, w_e(RED, BLUE)[first] = 0.45, and w_e(RED, BLUE)[second] = 0.45.

3.2 Systems of Components with Instant Response

The model of a system of components with instant response (below, the IR-model) is based on the following model of a state, called SimpleState (see Fig. 4).

[Fig. 4: UML class diagram. Enumeration Colour with literals RED, GREEN, BLUE. Class SimpleState with attribute colour: Colour and operation estimate(): Colour, realising the abstract entity State with operation estimate(): Colour. Constraint: inv: self.estimate() = self.colour.]

Fig. 4. Model of a simple state

The IR-model realises lines 9 and 11 of Algorithm 2 as follows:

    if outcome = nobody then
        newFirstState = firstState
        newSecondState = secondState
    if outcome = first then
        newFirstState = firstState
        newSecondState = firstState
    if outcome = second then
        newFirstState = secondState
        newSecondState = secondState

Simulation Experiment for the IR-model.
The simulation experiment was carried out at the initial macrostate $\bar{c}_0$ defined as follows: $\bar{c}_0$(RED) = 0.1, $\bar{c}_0$(GREEN) = 0.8, and $\bar{c}_0$(BLUE) = 0.1. The typical simulation results are shown in Fig. 5.

[Fig. 5: plot; horizontal axis: number of iterations, from 0 to 100; vertical axis from 0.0 to 1.0.]

Fig. 5. A typical behaviour of the IR-model

Error Estimation for the IR-Model. We assume that the measurement of the system is performed sequentially by observing a fixed number of system components at each step. Thus, the measurement rate depends on the number of components observed in one step. More precisely, our assumption is that the measurement procedure under study is sequential and is represented by Algorithm 3.

We estimate the measurement error by using the Kullback-Leibler divergence [7, 6, 3] D($\bar{c}$ || c∗), where $\bar{c}$ is the real system macrostate and c∗ is the measured system macrostate at the end of the simulation. Recall that the Kullback-Leibler divergence D is computed by the formula

\[ D(\bar{c} \,\|\, c^{*}) = \sum_{x \in C} \bar{c}(x) \cdot \log_2 \frac{\bar{c}(x)}{c^{*}(x)} \]

and estimates the minimal information quantity needed to correct an error.

As mentioned above, the measurement speed depends on the number k of nodes observed during one simulation cycle. A small value of k corresponds to a slow measurement and a big value of k corresponds to a fast one. In Fig. 6, the dynamics of the error estimate for slow (the blue curve with k = 20) and fast (the green curve with k = 250) measurements are presented.
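The error estimate D($\bar{c}$ || c∗) can be computed directly from two macrostates. The following Python sketch is a minimal illustration, not the authors' code: the function name kl_divergence, the dictionary representation of a macrostate and the sample measured macrostate are assumptions. It follows the usual conventions that a term with $\bar{c}$(x) = 0 contributes nothing and that the divergence is infinite when c∗(x) = 0 while $\bar{c}$(x) > 0.

```python
import math

COLOURS = ("RED", "GREEN", "BLUE")


def kl_divergence(real, measured):
    """Kullback-Leibler divergence D(real || measured) in bits."""
    total = 0.0
    for colour in COLOURS:
        p = real[colour]
        q = measured[colour]
        if p == 0.0:
            # Convention: 0 * log(0 / q) = 0.
            continue
        if q == 0.0:
            # The measurement missed a colour that is actually present.
            return math.inf
        total += p * math.log2(p / q)
    return total


# The real macrostate is the initial macrostate of the experiment;
# the measured one is a hypothetical value chosen for illustration.
real = {"RED": 0.1, "GREEN": 0.8, "BLUE": 0.1}
measured = {"RED": 0.2, "GREEN": 0.6, "BLUE": 0.2}

print(kl_divergence(real, real))      # 0.0: a perfect measurement
print(kl_divergence(real, measured))  # positive: the measurement is distorted
```

The divergence is zero only when the measured macrostate coincides with the real one, which is why it can serve as an error estimate.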
Algorithm 3: Measurement procedure
Data: a model of a system, a number k of nodes observed per one simulation cycle
Result: the measured macrostate c∗
 1  N[RED] = N[GREEN] = N[BLUE] = 0;
 2  foreach simulation cycle do
 3      choose randomly k nodes from the nodes not chosen yet;
 4      increase each of N[RED], N[GREEN] and N[BLUE] by the number of nodes of the corresponding colour in the sample
 5  end
 6  N = N[RED] + N[GREEN] + N[BLUE];
 7  c∗(RED) = N[RED]/N;
 8  c∗(GREEN) = N[GREEN]/N;
 9  c∗(BLUE) = N[BLUE]/N;
10  return c∗

[Fig. 6: plot with three curves labelled 20, 100 and 250; horizontal axis: relative time, from 0.0 to 20.0; vertical axis from 0.0 to 1.0.]

Fig. 6. Error for the measurement with k = 20, for the measurement with k = 100, and for the measurement with k = 250

Looking at Fig. 6, one can see that the error estimate increases as the measurement rate increases. This suggests that there may exist some lower bound for the precision of a measurement.

3.3 Systems of Components with Lazy Response

The LR-model is based on the following model of a state, called LazyState (see Fig. 7).

[Fig. 7: UML class diagram. Enumeration Colour with literals RED, GREEN, BLUE. Class LazyState with attributes colour: Colour and balance: Integer and operations setColour(colour: Colour), getColour(): Colour and estimate(): Colour, realising the abstract entity State with operations setColour(colour: Colour), getColour(): Colour and estimate(): Colour. Constraints: post: self.colour = self.estimate(balance); inv: self.getColour() = self.colour; inv: self.estimate() = self.colour.]

Fig. 7. Model of a lazy state

Unlike the previous model, the model considered in this subsection is more inertial. This is provided by the method estimate(), which uses the function P_m(x), and by the field balance, which equals the difference between the numbers of BLUE-arguments and RED-arguments (see Fig. 7). The function P_m(x) is defined as

\[ P_m(x) = \begin{cases} \dfrac{x^3}{m^3}\left(1 - \dfrac{x}{2m}\right) & \text{if } 0 \le x < m, \\[2ex] \dfrac{1}{2} + \dfrac{1}{\pi}\arctan\dfrac{\pi (x - m)}{m} & \text{if } x \ge m. \end{cases} \]

This function provides the model inertness. Its value equals the probability that the corresponding system node is not green.
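To make the shape of P_m(x) concrete, the following Python sketch evaluates it, assuming the reading P_m(x) = (x³/m³)(1 − x/(2m)) for 0 ≤ x < m and P_m(x) = 1/2 + (1/π) arctan(π(x − m)/m) for x ≥ m. The function name not_green_probability is hypothetical, and passing a non-negative argument (for instance the absolute value of the balance) is an assumption, since only x ≥ 0 is covered by the definition.

```python
import math


def not_green_probability(x, m):
    """Evaluate P_m(x) for x >= 0 and a positive inertia parameter m."""
    if x < 0 or m <= 0:
        raise ValueError("P_m(x) is defined here only for x >= 0 and m > 0")
    if x < m:
        # Polynomial branch: starts at 0 and grows slowly for small x.
        return (x ** 3 / m ** 3) * (1 - x / (2 * m))
    # Arctangent branch: equals 1/2 at x = m and approaches 1 as x grows.
    return 0.5 + math.atan(math.pi * (x - m) / m) / math.pi


# Compare a light system (m = 2) with a heavy one (m = 50).
for m in (2, 50):
    values = [round(not_green_probability(x, m), 4) for x in (0, 1, 2, 5, 50)]
    print(m, values)
```

Under this reading both branches give 1/2 at x = m and have the same slope 1/m there, so the function is smooth at the junction. For the same argument x, a larger m gives a smaller probability of leaving the GREEN colour, which is the inertia effect described in the text.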
We assume that the current balance of the node determines this probability.

The LR-model realises lines 9 and 11 of Algorithm 2 as follows:

    if outcome = nobody then
        newFirstState = firstState
        newSecondState = secondState
    if outcome = first then
        newFirstState = firstState
        if firstState.colour = RED then
            newSecondState.balance = secondState.balance − 1
        if firstState.colour = GREEN then
            newSecondState.balance = secondState.balance
        if firstState.colour = BLUE then
            newSecondState.balance = secondState.balance + 1
    if outcome = second then
        newSecondState = secondState
        if secondState.colour = RED then
            newFirstState.balance = firstState.balance − 1
        if secondState.colour = GREEN then
            newFirstState.balance = firstState.balance
        if secondState.colour = BLUE then
            newFirstState.balance = firstState.balance + 1

The positive parameter m controls the system inertia and in a certain sense can be considered as a mass. This interpretation is illustrated by Fig. 8.

We should note that the behaviour of the measurement error is similar to that for the IR-model. For this reason we omit the corresponding figure.

4 Conclusion

Thus, the paper has proposed a framework for simulating pair-chatting in communities. The simulation results show that our fears associated with a fundamental change in social behaviour caused by the widespread use of modern information and communication technologies are not groundless. Moreover, these changes have led to a violation of the basic assumptions on which the mathematics of sociological measurements is based. The main argument in favour of such a conclusion is the observed evidence for the existence of a positive lower bound for measurement errors.

The effect demonstrated here by simulation has been mentioned in the works of sociologists devoted to the survey method. Their reasoning is informal and far from mathematical.
In the context of this reasoning, sociologists noted the existence of distortion effects always present in such measurements. One can mention, for example, the book of Walter Lippmann [8] and the article of Pierre Bourdieu [2]. One can also refer to the Noelle-Neumann hypothesis [10] about the spiral of silence, which illustrates the contradiction of the internal processes of the functioning of public opinion and the problems of understanding and overcoming this contradiction by sociological means.

[Fig. 8: two plots, a) m = 2 and b) m = 50; horizontal axis: number of iterations, from 0 to 1000; vertical axis from 0.0 to 1.0.]

Fig. 8. Behaviours of the LR-model

If this hypothesis is confirmed, we will have to admit that the assumption of complete observability [4, p. 14] is wrong for intensively communicating communities. In other words, for studying such communities we need to use models similar to quantum rather than classical models of physical systems. Of course, this does not mean that the mathematics of quantum theory is adequate for describing the dynamics of intensively communicating communities. Hence the challenge is to find an adequate mathematical language for studying this class of systems.

Summing up our discussion, we can formulate the following problems for top-priority research:
1. conduct a detailed study of the dependence of the behaviour of the LR-model on the parameter m;
2. establish the dependence of the measurement error on the rate of the measurement;
3. generalise the obtained results to communications more complicated than pairwise ones;
4. build a simulation model for communities exposed to external influences;
5. establish the character of the dependencies between the parameters of the external influence and the system behaviour;
6. find out whether a community exposed to external influences is a system managed by these influences.

If all these studies give a positive result, then the problem of ensuring a certain community behaviour with limited resources for external influence on the system can be set.

References

1. Barabási, A.: Network Science. Cambridge University Press (2018)
2. Bourdieu, P.: The three forms of theoretical knowledge. Social Science Information 12(1), 53–80 (1973)
3. Cover, T., Thomas, J.: Elements of Information Theory. Wiley-Interscience, 2nd edn. (2006)
4. Holevo, A.: Probabilistic and Statistical Aspects of Quantum Theory. Scuola Normale Superiore Pisa (2011)
5. Knuth, D.: Two notes on notation. American Mathematical Monthly 99(5), 403–422 (1992)
6. Kullback, S.: Information Theory and Statistics. John Wiley & Sons (1959)
7. Kullback, S., Leibler, R.: On information and sufficiency. Annals of Mathematical Statistics 22(1), 79–86 (1951)
8. Lippmann, W.: Public Opinion. Harcourt, Brace and Company, New York (1922)
9. NetworkX, https://networkx.github.io/ (accessed 30.12.2019)
10. Noelle-Neumann, E.: The theory of public opinion: The concept of the spiral of silence. In: Anderson, J. (ed.) Communication Yearbook, vol. 14, pp. 256–308. Sage Publications, Inc., Thousand Oaks, CA, US (1991)
11. Python, https://www.python.org/ (accessed 25.12.2018)
12. Rapoport, A.: Contributions to the theory of random and biased nets. Bulletin of Mathematical Biophysics 19, 257–277 (1957)
13. Rashevsky, N.: Mathematical Theory of Human Relations: An Approach to Mathematical Biology of Social Phenomena. Principia Press, Bloomington, 2nd edn. (1949)