Bayesian Click Model and Methods of Estimating Its Parameters

Andriy Sverstiuk(a,b), Taras Dubyniak(b), Oleksandra Manziy(c), Andriy Senyk(c), Pavlo Ohloblin(c)

a I. Horbachevsky Ternopil National Medical University, Maidan Voli, 1, Ternopil, 46002, Ukraine
b Ternopil Ivan Puluj National Technical University, Rus'ka str. 56, Ternopil, 46001, Ukraine
c Lviv Polytechnic National University, S. Bandera str. 12, Lviv, 79013, Ukraine

Abstract
In this article, mathematical support for the analysis of user click activity data is developed. Based on the Bayesian click model and existing methods of estimating its parameters, a software product was created in the Python programming language that predicts the relevance of web documents from click logs. Experiments were conducted to establish the conditions under which the smallest prediction error is achieved.

Keywords
Mathematical methods, click model, data analysis, prediction, prediction error, algorithmic support, click logs, graph, probabilistic parameters, Bayesian network, machine learning.

1. Introduction

One of the metrics of user feedback in search and marketing systems is the Click-Through Rate (CTR). The value of this indicator conveys how interested users are in particular search results, so the tasks of calculating, evaluating and predicting this metric are important and relevant [1-3]. For the mathematical modeling and analysis of clicks, so-called click models have been developed, which are described by sets of probabilistic relations [4]. Research on this topic is presented in the works of scientists from Canada (Zhe Gao and Qigang Gao, Faculty of Computer Science, Dalhousie University) and Korea (Kyungwon Kim, Eun Kwon and Jaram Park, AI Center of Samsung Research, Samsung Electronics Co., Ltd.), among others [5-10].

Examples of the most common software products, services and online tools used to analyze user click activity data include:
1. Google Keyword Planner.
2. Microsoft Keyword Planner.
3. Facebook Campaign Planner.
4. LinkedIn Campaign Manager.
5. SellerApp.
6. Reddit Ads Dashboard.
7. Pinterest Ads Manager.

The development of new mathematical and software algorithms for analyzing and predicting user click activity is therefore an urgent task. This paper proposes the use of the Bayesian click model and the implementation, on its basis, of effective mathematical methods for the analysis, evaluation and prediction of user actions, in order to improve the quality and efficiency of search services and to analyze user interaction with ads so that the most relevant of them can be selected.

ITTAP'2023: 3rd International Workshop on Information Technologies: Theoretical and Applied Problems, November 22-24, 2023, Ternopil, Ukraine
EMAIL: sverstyuk@tdmu.edu.ua (Andriy Sverstiuk); d_taras@ukr.net (Taras Dubyniak); oleksandra.s.manzii@lpnu.ua (Oleksandra Manziy); andrij.p.senyk@lpnu.ua (Andriy Senyk); pavlo.ohloblin.mpmkm.2022@lpnu.ua (Pavlo Ohloblin).
ORCID: 0000-0001-8644-0776 (Andriy Sverstiuk); 0000-0003-1529-6951 (Taras Dubyniak); 0000-0002-6480-2307 (Oleksandra Manziy); 0000-0002-1614-512X (Andriy Senyk); 0009-0002-4515-1390 (Pavlo Ohloblin).
Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).
2. Mathematical grounding

Since CTR is biased by the position of a result in the search output (the lower a result appears in the list, the lower its observed CTR), the task of unbiased CTR prediction arises, that is, computing a value that does not depend on the position of the result in the output. Click models are used to calculate such unbiased CTR values: they make it possible to estimate the CTR of a web document in terms of probabilities. Numerically, CTR is the ratio of the number of clicks on a banner or link to the total number of its impressions:

$CTR = \dfrac{clicks}{impressions} \cdot 100\%$

The main examples of such models are the following.

Random Click Model (RCM). This is the simplest model, described by a single equation:

$P(C_u = 1) = \rho$, (1)

where $C_u$ is the random event of a click on URL $u$; every URL is clicked with the same probability $\rho$.

Position-Based Model (PBM) [9]. More complex click models are based on the following hypothesis:

$A_u = 1,\ E_u = 1 \Leftrightarrow C_u = 1$, (2)

where $A_u$ is the random event that the user becomes interested in the document, and $E_u$ is the random event that the user examines its preview (for example, a snippet on the search page). Relation (2) means that the user clicks on document $u$ if and only if he has examined its snippet and is interested in it. The random variables $A_u$ and $E_u$ are independent. PBM is based on the assumption that the probability of examining a snippet depends on the rank, that is, the position of the document in the output: the larger the rank (the lower the document appears in the list), the lower the probability that its snippet is examined. The positional model describes this with the following relations:

$P(C_u = 1) = P(E_u = 1) \cdot P(A_u = 1)$, (3)
$P(A_u = 1) = a_u$, (4)
$P(E_u = 1) = \gamma_{r_u}$, (5)

where $r_u$ is the rank of document $u$.

Cascade Model (CM). The cascade click model is based on the assumption that the user views the search results strictly from top to bottom and makes a click decision for each viewed document [11, 12]. After selecting the desired URL, the user no longer views the documents below it, regardless of their position. With $r$ denoting the rank, CM is described by the following relations:

$C_r = 1 \Leftrightarrow E_r = 1 \text{ and } A_r = 1$, (6)
$P(A_r = 1) = a_u$, (7)
$P(E_1 = 1) = 1$, (8)
$P(E_r = 1 \mid E_{r-1} = 0) = 0$, (9)
$P(E_r = 1 \mid C_{r-1} = 1) = 0$, (10)
$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 0) = 1$. (11)
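To make the cascade hypothesis concrete, the following minimal sketch simulates the click vector of a single session under relations (6)-(11); the attractiveness values passed to it are purely illustrative.

```python
import random

def simulate_cascade_session(attractiveness):
    """Simulate one session under the cascade model, relations (6)-(11).

    attractiveness: list of a_u values for the ranked documents.
    The user scans top-down (E_1 = 1) and examination passes to the
    next rank only while no click has occurred; the first click ends
    the session, per relation (10).
    """
    clicks = [0] * len(attractiveness)
    for r, a_u in enumerate(attractiveness):
        if random.random() < a_u:   # A_r = 1 with probability a_u
            clicks[r] = 1           # (6): C_r = 1 since E_r = 1 and A_r = 1
            break                   # (10): nothing below a click is examined
    return clicks

# Hypothetical attractiveness values for a five-document result page
print(simulate_cascade_session([0.4, 0.3, 0.2, 0.2, 0.1]))
```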
Dynamic Bayesian Network (DBN). In [12], the authors proposed the Dynamic Bayesian Network, a click model that extends the cascade model (CM). For a fixed query, the DBN is described by the following relations:

$C_i = 1 \Leftrightarrow E_i = 1 \text{ and } A_i = 1$, (12)
$P(A_i = 1) = a_u$, (13)
$P(S_i = 1 \mid C_i = 1) = s_u$, (14)
$C_i = 0 \Rightarrow S_i = 0$, (15)
$S_i = 1 \Rightarrow E_{i+1} = 0$, (16)
$P(E_{i+1} = 1 \mid E_i = 1, S_i = 0) = \gamma$, (17)
$E_i = 0 \Rightarrow E_{i+1} = 0$, (18)

where $i$ stands for the rank (position) of the document in the search results. $C_i$ is a binary observable variable showing whether a click occurred on the document at position $i$. The hidden variable $E_i$ records whether the user examined the snippet, $A_i$ shows whether the user was interested in the document at position $i$, and $S_i$ shows whether the user was satisfied with the search result after visiting the web page at position $i$.

DBN is based on the following assumptions. A click occurs if and only if the user has examined a URL and is interested in it (12). The probability that a document interests the user depends only on the document itself (13). As in the cascade model, the user browses URLs linearly from top to bottom until he decides to stop. Once the user clicks and visits a URL, there is a certain probability that he will be satisfied with the result of his search (14); if he does not visit the web page, he cannot be satisfied with it (15). If the user is satisfied with the URL he visited, he stops searching (16). If he is not satisfied, there is a probability $\gamma$ that he will examine the next URL and a probability $(1 - \gamma)$ that he will stop searching (17); in other words, $\gamma$ is a measure of user "persistence". If the user has not read the document snippet at position $i$, he does not examine the documents at lower positions (18). The parameters $a_u$ and $s_u$ are beta-distributed.

The essence of training click models is to estimate their parameters from a data set, the so-called click log. This data set contains information about user search queries, search results, and clicks on each of the results in the output. After estimating the parameters of the model, conclusions can be drawn about the click behavior of users.

For DBN, the estimated parameters are $a_u$ and $s_u$, and together they describe the relevance of document $u$. The parameter $a_u$ describes hypothetical relevance, as it measures the likelihood of a click given the URL. The parameter $s_u$ equals the probability that the user is satisfied after following the link; therefore $s_u$ should be understood as a "ratio" between actual and hypothetical relevance. For a dynamic Bayesian network, the parameter $a_u$ is equivalent to the CTR that document $u$ would have in the first position of the search results. Therefore, in what follows, the main attention is paid to the calculation of $a_u$, since it is the CTR forecast for document $u$. Moreover, this CTR prediction is unbiased, because $a_u$ does not depend on the rank (position) of document $u$ in the result list.

The EM algorithm and the Forward-Backward algorithm [13] are applied for DBN training, that is, for estimating the parameters $a_u$ and $s_u$ from click logs; these algorithms are used when latent variables are present in the model. It is important to clarify that in the maximization step of the EM algorithm, the updated values of $a_u$ and $s_u$ are calculated by the maximum a posteriori method, which is a generalization of the maximum likelihood method. The easiest way to obtain the updated parameters is to compute them in closed form (analytically) using the theory of conjugate distributions. According to Bayes' theorem,

$p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{\int_{\Theta} p(x \mid \theta)\, p(\theta)\, d\theta}$, (19)

where, for some parameter $\theta$, $p(\theta \mid x)$ is the posterior distribution, $p(x \mid \theta)$ is the likelihood function, and $p(\theta)$ is the prior distribution. If the likelihood function and the prior distribution are conjugate, then $p(\theta \mid x)$ belongs to the same family of distributions as $p(\theta)$. Since in the Bayesian click model the likelihood has a Bernoulli distribution and the prior probability has a beta distribution, the posterior probability also has a beta distribution, but with different parameters (hyperparameters). The updated hyperparameters $\alpha$ and $\beta$ of the posterior beta distribution are calculated as follows:

$\alpha_{post} = \alpha_{prior} + \sum_{i=1}^{n} x_i, \qquad \beta_{post} = \beta_{prior} + \sum_{i=1}^{n} (1 - x_i)$, (20)

where $x_i$ $(i = \overline{1, n})$ are independent observations.
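The update (20) takes one line of code per hyperparameter. The following sketch computes the posterior beta hyperparameters from a vector of Bernoulli observations; the Beta(1, 1) prior and the click vector are illustrative.

```python
import numpy as np

def beta_posterior(alpha_prior, beta_prior, x):
    """Conjugate update (20): Bernoulli likelihood with a beta prior."""
    x = np.asarray(x)
    alpha_post = alpha_prior + x.sum()        # alpha + number of successes
    beta_post = beta_prior + (1 - x).sum()    # beta + number of failures
    return alpha_post, beta_post

# Illustrative: a uniform Beta(1, 1) prior and ten click observations
alpha, beta = beta_posterior(1, 1, [1, 0, 0, 1, 1, 0, 1, 0, 0, 0])
print(alpha, beta)                            # Beta(5, 7) posterior
```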
The only input parameter fixed before DBN training is $\gamma$, the probability that the user continues to view the results at positions $i+1, \dots, n$ provided that $S_i = 0$. In [12] (Figure 3), the root mean square error of predicting the CTR of the document at position 1 is plotted as a function of $\gamma$; the best CTR prediction at position 1 was obtained for $\gamma = 0.9$. At the same time, the click model with $\gamma = 1$ gives only a slightly worse forecast, while the estimation of the DBN parameters becomes much simpler. Indeed, when $\gamma = 1$, the user continues to examine the search results until he is satisfied. This means that the last click in the list of all clicks gave the user a satisfactory result, and he stops reading the snippets of documents at lower positions:

$E_1 = \dots = E_i = 1, \quad E_{i+1} = \dots = E_n = 0$, (21)

where $i$ is the position of the last clicked document. As follows from (21), for $\gamma = 1$ there is no uncertainty in the variables $E_i$, and therefore there is no need to apply the EM and Forward-Backward algorithms: the parameters $a_u$ and $s_u$ are estimated by simple counting [12].

Thus, the estimation of the parameter $a_u$ by the algorithm described above is the essence of using the Bayesian click model (DBN) for the unbiased prediction of the CTR of document $u$. This work describes the software created by the authors, which implements the Bayesian click model for CTR prediction. Research has also been conducted to establish the conditions under which the smallest prediction error is achieved.

3. Algorithmic support

A detailed analysis of existing development tools was carried out, and the decision was made to build the Bayesian click model using the Python programming language as the main tool, in particular its PyAgrum, NumPy, Pandas and Matplotlib libraries included in the Anaconda distribution. The Tkinter library was used to create the graphical interface.

In addition, the software product Bayes Server was used as a tool for modeling conventional and dynamic Bayesian networks, causal models and influence diagrams. One of the most important capabilities of this product is the calculation of the values of hidden variables. The default inference algorithm in Bayes Server is Relevance Tree. When calculating hidden variables, one can either compute the values of specific variables or all of them at once; it is also possible to compute the value of the likelihood function. Another important feature of Bayes Server is parameter learning, for which training data sets must additionally be connected. Bayes Server provides extensive options for editing models and for querying model parameters, including multiple queries over large datasets.

Description of model inputs

Click logs are used to train click models. These are data sets that contain information about search sessions. A search session is a session during which a user submits a search query to the system and receives a result. After receiving the search results, the user follows one or more links until he either finds the document he needs or ends the session because no relevant documents were found. Thus, each search session is characterized by three attributes (a streaming sketch of such session records is given after this list):
 - the search query;
 - the list of results;
 - the click vector, which takes values from {0; 1} and records whether each specific document was clicked.
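A minimal sketch of the generator-based streaming of such sessions is given below. The exact record layout of real competition logs differs, so the tab-separated "Q"/"C" format used here is a hypothetical simplification.

```python
import gzip

def iter_sessions(path):
    """Stream search sessions from a *.gz click log without loading it
    into memory. Hypothetical simplified layout: tab-separated records,
    where 'Q' rows carry a query id and ten "url,domain" pairs, and
    'C' rows carry the id of a clicked URL.
    """
    session = None
    with gzip.open(path, mode="rt", encoding="utf-8") as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if fields[1] == "Q":                    # a new query record
                if session is not None:
                    yield session
                urls = [pair.split(",")[0] for pair in fields[3:13]]
                session = {"query": fields[2], "urls": urls,
                           "clicks": [0] * len(urls)}
            elif fields[1] == "C" and session:      # a click record
                url = fields[2]
                if url in session["urls"]:
                    session["clicks"][session["urls"].index(url)] = 1
        if session is not None:
            yield session
```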
Logs published in open access on the Kaggle resource as part of the international competition The Personalized Web Search Challenge were used to build and train the Bayesian click model. The data were formatted as illustrated in Table 1.

Table 1. Format of click logs

query | url0 | url1 | ... | url9 | click0 | click1 | ... | click9
q_1 | d_{1,0} | d_{1,1} | ... | d_{1,9} | c_{1,0} | c_{1,1} | ... | c_{1,9}
q_1 | d_{1,1} | d_{1,0} | ... | d_{1,9} | c_{1,1} | c_{1,0} | ... | c_{1,9}
... | ... | ... | ... | ... | ... | ... | ... | ...
q_n | d_{n,0} | d_{n,1} | ... | d_{n,9} | c_{n,0} | c_{n,1} | ... | c_{n,9}

Here $q_i$ $(i = \overline{1, n})$ is a search query; $d_{ij}$ $(j = \overline{0, 9})$ is a document (URL) associated with the query $q_i$; $c_{ij}$ is a Boolean variable associated with the document $d_{ij}$: if $c_{ij}$ equals 0, the document was not clicked; if it equals 1, it was clicked.

The original click logs are archived in the *.gz format. To convert them into the format presented in Table 1, a separate Python script was created that converts the archive to *.csv or *.sql. It relies on the generator mechanism, which allows reading large and very large data arrays. From all the information available in the archive, only the following fields are selected:
 - the session id;
 - the search query id;
 - an identifier consisting of two numbers, where the first is an anonymized address of a document and the second is its domain; this pair is read as one number and interpreted as the URL of the document.

In addition, the log file lists the URLs that were clicked during the search session (Fig. 2). URLs from this list receive the value 1 in the click vector; the rest receive 0. In this way a data array is formed that looks like Table 1. The parsing script can export the final data either to a database or to a *.csv file.

Figure 2. Clicked URLs listed in the click log file
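Once the logs are in the Table 1 layout, the position bias that motivates the click model is easy to observe directly. The following minimal sketch assumes a parsed file clicks.csv with the columns of Table 1 and computes the raw (biased) CTR of each position:

```python
import pandas as pd

# Assumption: clicks.csv follows the Table 1 layout
# (query, url0..url9, click0..click9).
df = pd.read_csv("clicks.csv")

# Raw (biased) CTR per position: the mean of each click column.
position_ctr = df[[f"click{j}" for j in range(10)]].mean()
print(position_ctr)  # typically decreases with position - the position bias
```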
4. Results: construction of the Bayesian click model for unbiased CTR prediction. Model training

A Bayesian network is a directed acyclic graph (Fig. 3). To build one, it is necessary to define a set of vertices (nodes) and specify the connections (arcs, edges) between them. The node A from which an edge originates is called the parent, $A \in pa(B)$, while the node B is called the child. Child nodes conditionally depend on their parents; nodes without parents are conditionally independent. Each vertex of the graph corresponds to a random variable with a finite number of mutually exclusive states, and the probabilities of each state are specified.

Figure 3. Scheme of the Bayesian click model

The Bayesian click model assumes the existence of four binary variables:
 - E - the user viewed a document in the SERP;
 - A - the user is interested in the document;
 - C - the user clicked on the document;
 - S - the user was satisfied with the contents of the document.

So the Bayesian network has four vertices. The connections between them are built according to the dependencies described by equations (12)-(15). To implement equations (16)-(18), the concept of a Bayesian network must be extended to a Dynamic Bayesian Network (DBN). A DBN is a regular Bayesian network with the concept of time, which makes it possible to model time series or sequences of dependent events. When modeling clicks in search networks, search sessions with 10 documents in the output are considered. Since the events E and S for documents in adjacent positions are inextricably linked, a Dynamic Bayesian Network is used to model clicks within the entire session.

The sequence of DBN construction in Bayes Server consists of the following steps [14]:
1. Create a node (Node, Discrete Node (temporal)).
2. Name the node (Name field) and specify the states of the corresponding variable ("Y", "N").
3. Repeat steps 1-2 for all 4 nodes.
4. Specify the connections between nodes (Links).
5. Set the probability distributions for all states of each variable, using relations (12)-(18) (Fig. 4).

Figure 4. Probability distribution for variable C: $P(C \mid pa(C))$

Let us take a closer look at step 4. Since a Dynamic Bayesian Network is being created, the connections between nodes in Bayes Server have an attribute called temporal order. This attribute is a non-negative integer and, in terms of the click model, means the following: if the temporal order of the connection from node C to node S is zero, then S at position i depends on C at the same position; if the temporal order equals one, then S at position i + 1 conditionally depends on C at position i. Thus, to construct the DBN in Bayes Server, additional links with temporal order 1 were created between the variables S and E, as well as between E and E. According to formulas (16)-(18), this means that the variable E for the document at position i conditionally depends on the states of the variables E and S at position i - 1. The final view of the dynamic Bayesian network built in Bayes Server is presented in Fig. 5.

Figure 5. Dynamic Bayesian network (click model) built in Bayes Server

In the PyAgrum library [15], the DBN construction algorithm consists of the following steps (a minimal code sketch is given below, after the inference comparison):
1. Initialize an instance of the model class (pyAgrum.BayesNet()).
2. Define the variables (pyAgrum.LabelizedVariable()), naming them and setting their states ("Y" and "N", equivalent to "Yes" and "No").
3. Create the network nodes from the defined variables (pyAgrum.BayesNet().add()).
4. Initialize the directed connections between nodes (pyAgrum.BayesNet().addArc()).
5. Set the probability distributions for the states of each variable (pyAgrum.BayesNet().cpt()).
6. Initialize the inference mechanism for the model.

In Bayesian networks, inference is the calculation of the values of hidden (latent) variables based on the values of the observed variables and the probability distributions of the hidden variables. The probabilities of the states of a variable specified before the research are called prior probabilities, while the probabilities calculated after the research are called posterior probabilities. To determine the posterior probabilities of the latent variables, their conditional and unconditional probabilities must be calculated. The PyAgrum library provides several algorithms for calculating posteriors, such as Lazy Propagation [16], Variable Elimination, the Shafer-Shenoy algorithm, Gibbs sampling, etc. The main algorithm in PyAgrum is Lazy Propagation, so it was used to calculate the DBN hidden variables. It is based on the belief propagation algorithm, which is widely used in machine learning, including for neural networks, Bayesian networks, and Markov models [17].

Lazy Propagation computes all posterior distributions in a Bayesian network using a junction tree $T = (\mathcal{C}, \mathcal{S})$, where $\mathcal{C}$ is the set of cliques of the tree and $\mathcal{S}$ is the set of separators. The tree T is built by moralizing and triangulating the graph of the Bayesian network G = (V, E).
When the tree is constructed, each clique $C \in \mathcal{C}$ is assigned the set of distributions $P(X \mid pa(X))$ of all variables $X$ such that $\{X\} \cup pa(X) \subseteq C$. The set of distributions corresponding to the clique C is denoted by $\Phi_C$. When evidence X = x is established for some variable X, all distributions in the tree that contain X in their domain are restricted to X = x; the algorithm removes the rest.

After the tree T is initialized, the message propagation algorithm runs. Messages are propagated from one clique to another through each separator $S \in \mathcal{S}$ in two directions: from the leaves to the root of the tree and back. The message $\Phi_{A \to B}$ transmitted from clique A to clique B is a set of probability distributions, obtained by combining the distributions of A with the messages received from its other neighbours and marginalizing out the variables not shared with B:

$\Phi_{A \to B} = \sum_{A \setminus B} \Big( \Phi_A \cup \bigcup_{C \in adj(A) \setminus \{B\}} \Phi_{C \to A} \Big)$, (22)

where adj(A) is the set of cliques adjacent to A. Variables for which the posterior distribution is not required and for which no evidence is set can be excluded from the calculations. When the propagation of all messages is complete, the posterior distribution of a variable Y can be computed from any clique or separator containing Y. Let $\Phi$ be the set of distributions from which the posterior of Y is calculated. Then the algorithm for computing $P(Y \mid \varepsilon)$, where $\varepsilon$ is the evidence, is as follows:
1. Find the relevant distributions $R_Y$ in $\Phi$ (Algorithm 3.5, [16]).
2. Eliminate from $R_Y$ every variable $X \in \bigcup_{\phi \in R_Y} dom(\phi)$, $X \neq Y$ (Algorithm 2.3, [16]).
3. Let $\Phi^Y$ be the resulting set of distributions; then

$P(Y \mid \varepsilon) = \dfrac{\prod_{\phi \in \Phi^Y} \phi}{\sum_{Y} \prod_{\phi \in \Phi^Y} \phi}$. (23)

In PyAgrum, to specify this inference method for a newly created Bayesian network, it is only necessary to initialize an instance of the pyAgrum.LazyPropagation(bn) class, where bn is an instance of the pyAgrum.BayesNet() class. After the structure of the model is defined, PyAgrum can generate a visualization of the Dynamic Bayesian Network in svg format. Fig. 6 shows the model created by means of the library.

Figure 6. The model created using the tools of the PyAgrum library

After constructing the DBN, both in Bayes Server and in Python, the posterior probabilities were found for the variables $A_i$ and $S_i$, where $i = \overline{0, 9}$ are time slices (for the DBN, the numbering of time slices starts from zero). For this, the variables $C_i$ were defined as observed with the state $C_i = "N"$ for all time slices $i = \overline{0, 9}$. In Bayes Server, the states of the observed variables are specified in special tables, while in PyAgrum there are two ways to set them: setEvidence() or pyAgrum.lib.dynamicBN.plotFollow(). The plotFollow() method immediately generates a plot of the given variables, so it was chosen for the inference. In PyAgrum the inference method is Lazy Propagation, in Bayes Server it is Relevance Tree. Both algorithms belong to the family of exact inference algorithms, but the developers of Bayes Server do not reveal the details of how their algorithm works. After configuring the models, the calculations were performed; the results are shown in Figs. 7-8. It is worth noting that in these cases $a_u = s_u = 0.5$.

Figure 7. Visualization of the posterior distributions of $A_i$ and $S_i$ in PyAgrum

Figure 8. Visualization of the posterior distributions of $A_i$ and $S_i$ in Bayes Server

Figures 7-8 show that the calculation results are identical, despite the different inference methods.
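As an illustration of construction steps 1-6, a minimal single-slice sketch under the pyAgrum 1.x API is given below. It encodes relations (12), (14) and (15) for one position only; the full model additionally unrolls ten time slices with the temporal links (16)-(18), and the examination probability and parameter values here are illustrative.

```python
import pyAgrum as gum

# One time slice of the click model: E, A -> C -> S.
bn = gum.BayesNet("click_slice")
for name in ("E", "A", "C", "S"):
    bn.add(gum.LabelizedVariable(name, name, ["N", "Y"]))
bn.addArc("E", "C")
bn.addArc("A", "C")
bn.addArc("C", "S")

a_u, s_u = 0.5, 0.5                   # illustrative parameter values
bn.cpt("E").fillWith([0.1, 0.9])      # illustrative examination probability
bn.cpt("A").fillWith([1 - a_u, a_u])  # (13): P(A = "Y") = a_u
# (12): C = 1 iff E = 1 and A = 1 - a deterministic CPT
for e in ("N", "Y"):
    for a in ("N", "Y"):
        clicked = 1.0 if (e == "Y" and a == "Y") else 0.0
        bn.cpt("C")[{"E": e, "A": a}] = [1 - clicked, clicked]
# (14), (15): P(S = 1 | C = 1) = s_u, and no click means no satisfaction
bn.cpt("S")[{"C": "N"}] = [1.0, 0.0]
bn.cpt("S")[{"C": "Y"}] = [1 - s_u, s_u]

ie = gum.LazyPropagation(bn)          # the library's default exact inference
ie.setEvidence({"C": "N"})            # observed: no click at this position
ie.makeInference()
print(ie.posterior("A"))              # posterior interest given no click
```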
Therefore, the tool for building a DBN for CTR prediction can be either the Bayes Server API, which is available as Python libraries, or PyAgrum directly. It was established empirically that the calculation of posterior probabilities in PyAgrum is faster, so Bayes Server remained only a secondary tool in the study of Bayesian networks.

To predict CTR, it is necessary to calculate the parameter $a_u$, which is initialized at the query level, that is, it is determined only by the query-document pair. The parameter $a_u$ is equivalent to the CTR of document $u$ if it were in the first position; therefore the value of $a_u$ is the unbiased CTR prediction for $u$.

To calculate the parameters $a_u$ and $s_u$, the two-step iterative Expectation-Maximization algorithm [13] was used. Suppose that for a fixed search query $q$, $n$ search sessions are investigated. Let $A_i^j$, $S_i^j$, $E_i^j$ be the hidden variables at position $i$, $i = \overline{1, 10}$, in search session $j$, $j = \overline{1, n}$, and denote the document at position $i$ in session $j$ by $d_i^j$. Then one iteration of the algorithm consists of two steps.

Step 1. Maximization. Given the posterior distributions $Q(A_i^j)$ and $Q(S_i^j)$, the updated parameters $a_u$, $s_u$ are found as follows:

$a_u = \arg\max_a \sum_{j=1}^{n} \sum_{i=1}^{10} I(d_i^j = u) \left[ Q(A_i^j = 0)\ln(1 - a) + Q(A_i^j = 1)\ln(a) \right] + \ln P(a)$;

$s_u = \arg\max_s \sum_{j=1}^{n} \sum_{i=1}^{10} I(d_i^j = u, C_i^j = 1) \left[ Q(S_i^j = 0)\ln(1 - s) + Q(S_i^j = 1)\ln(s) \right] + \ln P(s)$,

where $I$ is the indicator function and $P(a)$, $P(s)$ are prior beta distributions taken with parameters (1, 1). The maximum a posteriori estimate at this step can be calculated in two ways: by numerical optimization (gradient descent, Newton's method) or analytically (in closed form). For numerical optimization, the optimize module of the scipy library provides the minimize_scalar() method. For the analytical calculation of $a_u$, $s_u$, the theory of conjugate distributions is used. Denote by $\alpha_a$, $\beta_a$ the hyperparameters for $a_u$ and by $\alpha_s$, $\beta_s$ the hyperparameters for $s_u$. Then the updated $a_u$ and $s_u$ are calculated as follows:

$a_u = \dfrac{\alpha_a - 1 + \sum_{j=1}^{n}\sum_{i=1}^{10} I(d_i^j = u)\, Q(A_i^j = 1)}{\alpha_a + \beta_a - 2 + \sum_{j=1}^{n}\sum_{i=1}^{10} I(d_i^j = u)\left[ Q(A_i^j = 1) + Q(A_i^j = 0) \right]}$;

$s_u = \dfrac{\alpha_s - 1 + \sum_{j=1}^{n}\sum_{i=1}^{10} I(d_i^j = u, C_i^j = 1)\, Q(S_i^j = 1)}{\alpha_s + \beta_s - 2 + \sum_{j=1}^{n}\sum_{i=1}^{10} I(d_i^j = u, C_i^j = 1)\left[ Q(S_i^j = 1) + Q(S_i^j = 0) \right]}$.

Since the closed-form computation requires less programming time and is more accurate, this method was chosen for the maximization step.

Step 2. Expectation. $a_u$, $s_u$ become the new parameters for A and S, respectively, and the new posterior distributions $Q(A_i^j)$ and $Q(S_i^j)$ are calculated. Iterations are performed until the parameters $a_u$, $s_u$ converge.

Model creation in Python is implemented in the model_setup function, whose input parameters are the initial values of $a_u$ and $s_u$ and the value of the input parameter $\gamma$. Model training in Python is implemented in the train function, which accepts the following input parameters:
 - model - the DBN created with PyAgrum;
 - sessions_number - the number of training sessions for the model;
 - df_with_clicks - a Pandas dataframe that contains click logs in the format shown in Table 1;
 - max_iterations (optional) - the number of iterations of the EM algorithm (default max_iterations = 60).

After the training process is complete, the function returns matrices a and s that contain the corresponding parameters for each session and each URL in the session.
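The closed-form update above reduces to a few NumPy operations. The sketch below is a minimal implementation for a single document u, assuming the expectation step has already produced the posteriors Q(A = 1) and Q(S = 1) as arrays; the names and shapes are illustrative rather than the authors' actual train function.

```python
import numpy as np

def maximization_step(doc_ids, clicks, q_attr, q_sat, u,
                      alpha_a=1.0, beta_a=1.0, alpha_s=1.0, beta_s=1.0):
    """Closed-form MAP update of a_u and s_u for one document u.

    doc_ids, clicks, q_attr, q_sat: arrays of shape (n_sessions, 10)
    holding the document at each position, the observed clicks, and
    the posteriors Q(A=1), Q(S=1) from the expectation step.
    With Beta(1, 1) priors the estimates are simple ratios, so u is
    assumed to occur (and be clicked) in the sessions at least once.
    """
    at_u = (doc_ids == u)                 # I(d_i^j = u)
    clicked_u = at_u & (clicks == 1)      # I(d_i^j = u, C_i^j = 1)

    a_num = alpha_a - 1 + q_attr[at_u].sum()
    a_den = alpha_a + beta_a - 2 + at_u.sum()     # Q(A=1)+Q(A=0) = 1
    s_num = alpha_s - 1 + q_sat[clicked_u].sum()
    s_den = alpha_s + beta_s - 2 + clicked_u.sum()
    return a_num / a_den, s_num / s_den
```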
A graphical user interface (GUI) was also created for the program implementing the Bayesian click model, to combine the input and output information of the model. The visual shell is presented in Fig. 9.

Figure 9. GUI for the DBN

To start the learning process, the following input data must be set:
 - Parameter $\gamma$ - the input parameter of the Bayesian click model: the probability that the user continues the search if the previously viewed document was not relevant to him;
 - Training data path - the folder in which the click logs are stored, with exactly one unique search query per file;
 - Search query amount - the sample of data that will be used to train the model;
 - Documents - "all unique URLs" if $a_u$ is calculated for all documents on the search page; if $a_u$ must be calculated only for a specific list of documents, the item "URLs from a config file" should be selected;
 - Number of training sessions - can be specified as a single number or as a sequence of unique integers; in the second case, training is carried out sequentially in several stages with a different number of training sessions in each.

During model training, the calculated parameters are recorded in the corresponding table. By default it has only two columns, "Query" and "Document"; during execution, the table changes dynamically after the "Evaluate" button is activated. In addition to calculating and displaying the results in the table, these results can be saved to a separate file, in .csv format or as a database. There is also an option to save only the values $a_u$ as a matrix, without the corresponding URLs and queries.

Figure 10. GUI for the DBN (results of learning)

Checking the accuracy of the estimated parameters and studying the calculation error

Since $a_u$ and $s_u$ are estimated by an iterative algorithm, the stopping criterion of this algorithm must be studied. For this purpose, an experiment was conducted that consists of the following stages:
1. Randomly select 5 URLs associated with different search queries.
2. Set a fixed number of iterations of the EM algorithm (max_iterations = 100).
3. Compute the values $a_u$ for all five URLs over a fixed number of training sessions for the given number of iterations. For each URL, perform two runs of the calculation, with initial values of 0.1 and 0.9, and record all intermediate values of $a_u$ calculated at each iteration.
4. Plot the convergence graphs of $a_u$ for different numbers of training sessions.

Figure 11. Dependence of the difference between the $a_u$ estimates on the number of iterations of the EM algorithm

Figure 11 shows that as the number of training sessions grows, the initial values of $a_u$ have less and less influence on the final estimate. For four out of five documents, the change in the differences slows down once the number of iterations exceeds 60; for $u = d_3$ a similar result is achieved only with a large number of sessions. In further experiments, the maximum number of iterations of the EM algorithm is therefore set to 60.

As stated previously, $a_u$ as a measure of the attractiveness of a URL $u$ is equivalent to the CTR of $u$ if it were in the first position of the output. Therefore, to study the accuracy of CTR prediction by the Bayesian model, an experiment was conducted according to the protocol originally proposed in [12], which was also used in [18]. The experiment consists of the following steps:
1. View all sessions that share a search query.
2. Consider a URL that appeared both in position 1 and in positions 2-10.
3. All sessions where the URL appeared in position 1 are considered test sessions.
4. The remaining sessions are considered training sessions.
5. In the test sessions, calculate the CTR of the URL directly by the formula.
6. On the training sessions, train the model and estimate the parameter $a_u$.
7. Compare the test CTR with $a_u$ and calculate the error.
8. Average the errors over all such query-document pairs, weighting them by the number of test sessions.

Thus, the weighted mean square error for this experiment was calculated by the following formula:

$MSE = \dfrac{\sum_{i=1}^{n} w_i (CTR_i - a_{u_i})^2}{\sum_{i=1}^{n} w_i}$

As part of this experiment, the averaging was performed over different numbers of query-document pairs (10, 20, 30, 40, 50) for a better understanding of the dynamics of the error. The results are shown in the graphs in Fig. 12.

Figure 12. Dependence of the errors on the number of training sessions

Comparing the graphs in Fig. 12, it can be concluded that as the number of query-document pairs used for averaging increases, the graphs become smoother: the change in the error becomes smaller as the number of studied URLs grows. In addition, there is a downward trend: as the number of sessions increases, the mean square error decreases, which is natural. On the right of Fig. 12 are the graphs of the square root of the mean square error (RMSE), which show the average deviation of the predicted CTR from the real one depending on the number of training sessions.

It is also important to remember that this click model requires the input parameter $\gamma$, the probability that the user continues the search given that the previous result did not satisfy him; the estimates of $a_u$ and $s_u$ therefore also depend on $\gamma$. In the previous experiments $\gamma$ was equal to 0.9. To find the optimal value of this parameter, an experiment similar to the previous one was conducted, but with the number of training sessions and $n$ fixed and the error presented as a function of the input parameter $\gamma$ (Fig. 13).

Figure 13. Dependence of the root mean square error on $\gamma$

An unexpected conclusion follows from the results of the experiment: the best CTR forecast is achieved at $\gamma$ = 1, which corresponds to the SDBN (Simplified DBN) specification. A similar effect was also observed in [18]. In other words, for the studied click log, SDBN predicts CTR better than the general DBN; according to these click logs, users are extremely persistent in their search for information. Since the best prediction is achieved for $\gamma$ = 1, an experiment was conducted to calculate the weighted root mean square error at $\gamma$ = 1 and compare it with the error at $\gamma$ = 0.9.

Figure 14. The root mean square error: comparison for $\gamma$ = 0.9 and $\gamma$ = 1

Comparing the forecasts for the two values of $\gamma$, we conclude that the optimal DBN parameter for predicting CTR on these click logs is $\gamma$ = 1, for any number of training sessions (Fig. 14).
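For completeness, the weighted error used throughout these experiments can be computed as in the following minimal sketch; the per-pair CTR values, predictions and test-session counts in the example are hypothetical.

```python
import numpy as np

def weighted_rmse(ctr_test, a_pred, weights):
    """Weighted MSE/RMSE between test CTRs and predicted a_u values,
    with weights equal to the number of test sessions per pair."""
    ctr_test, a_pred, w = map(np.asarray, (ctr_test, a_pred, weights))
    mse = np.sum(w * (ctr_test - a_pred) ** 2) / np.sum(w)
    return mse, np.sqrt(mse)

# Hypothetical query-document pairs: test CTRs, predictions, session counts
print(weighted_rmse([0.30, 0.12, 0.45], [0.28, 0.20, 0.40], [120, 35, 60]))
```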
5. Conclusion

1. The Bayesian click model was considered, including the probabilistic relations that describe the model, the semantics of the model parameters, and possible methods of parameter estimation.
2. An overview of available software products and services for analyzing and predicting user click behavior was conducted.
3. Using modern information technologies and programming languages, in particular the Python programming language, the PyAgrum library, and the Bayes Server software, a click model was built.
4. The performance of various algorithms for calculating hidden variables was compared. An application was created in the Python programming language that estimates the unknown model parameters from click logs.
5. Studies of the accuracy of CTR prediction by the click model were conducted. In particular, the dependence of the prediction error on the number of training sessions, as well as on the value of the input parameter $\gamma$, was illustrated, and the optimal value of the input parameter was found experimentally.

In future studies, it is planned to develop a Bayesian click model for medical calculators for cardiac diagnostics [19, 20], biosensor systems [21, 22] and geoinformation systems [23]. This approach will make it possible to take into account the probabilistic relations describing the proposed models, the semantics of their parameters, and possible methods of their estimation.

6. References

[1] Y. Yang, P. Zhai, Click-Through Rate Prediction in Online Advertising: A Literature Review. SSRN Electronic Journal, 2022.
[2] M. Khvostivskyy, H. Osukhivska, L. Khvostivska, T. Lobur, D. Velychko, S. Lupenko, T. Hovorushchenko, Mathematical modelling of daily computer network traffic. ITTAP 2021, CEUR Workshop Proceedings, Ternopil, Ukraine, November 16-18, 2021, vol. 3039, pp. 107-111.
[3] V. Zhukovskyy, S. Shatnyi, N. Zhukovska, A. Sverstiuk, Neural network clustering technology for cartographic images recognition. EUROCON 2021 - 19th IEEE International Conference on Smart Technologies, Proceedings, 2021, pp. 125-128. doi: 10.1109/EUROCON52738.2021.9535544.
[4] G. Zhou, X. Zhu, C. Song, Y. Fan, H. Zhu, X. Ma, Y. Yan, J. Jin, H. Li, K. Gai, Deep interest network for click-through rate prediction, in: Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 1059-1068.
[5] H. Zhang, J. Yan, Y. Zhang, CTR prediction models considering the dynamics of user interest, IEEE Access, vol. 8, 2020, pp. 72847-72858.
[6] G. Zhou, N. Mou, Y. Fan, Q. Pi, W. Bian, C. Zhou, X. Zhu, K. Gai, Deep interest evolution network for click-through rate prediction, in: Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 5941-5948.
[7] Z. Xiao, L. Yang, W. Jiang, Y. Wei, Y. Hu, H. Wang, Deep multi-interest network for click-through rate prediction, in: Proc. 29th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2020, pp. 2265-2268.
[8] X. Li, C. Wang, B. Tong, J. Tan, X. Zeng, T. Zhuang, Deep time-aware item evolution network for click-through rate prediction, in: Proc. 29th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2020, pp. 785-794.
[9] Y. Feng, F. Lv, B. Hu, F. Sun, K. Kuang, Y. Liu, Q. Liu, W. Ou, MTBRN: Multiplex target-behavior relation enhanced network for click-through rate prediction, in: Proc. 29th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2020, pp. 2421-2428.
[10] Q. Pi, W. Bian, G. Zhou, X. Zhu, K. Gai, Practice on long sequential user behavior modeling for click-through rate prediction, in: Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 2671-2679.
[11] A. Chuklin, I. Markov, M. de Rijke, Click Models for Web Search. Synthesis Lectures on Information Concepts, Retrieval, and Services, no. 3, 2015, pp. 1-115.
[12] O. Chapelle, Y. Zhang, A Dynamic Bayesian Network Click Model for Web Search Ranking, in: Proc. 18th Int. Conf. on World Wide Web (WWW 2009), 2009, pp. 1-10.
[13] T. Krishnan, G. McLachlan, The EM algorithm, in: Handbook of Computational Statistics, 2012, pp. 139-172.
[14] The official site of Bayes Server. URL: https://www.bayesserver.com/.
[15] The official site of PyAgrum. URL: https://pyagrum.readthedocs.io/en/1.1.1/.
[16] A. Madsen, F. Jensen, Lazy propagation: A junction tree inference algorithm based on lazy evaluation. Artificial Intelligence, 1999, no. 1-2, pp. 203-245.
[17] A. Madsen, C. Butz, Exploiting Semantics in Bayesian Network Inference Using Lazy Propagation, 2015, pp. 3-15.
[18] A. Grotov et al., A Comparative Study of Click Models for Web Search. Lecture Notes in Computer Science, Cham, 2015, pp. 78-90.
[19] S. Lupenko, I. Lytvynenko, A. Sverstiuk, A. Horkunenko, B. Shelestovskyi, Software for statistical processing and modeling of a set of synchronously registered cardio signals of different physical nature. CEUR Workshop Proceedings, 2021, vol. 2864, pp. 194-205.
[20] V. Martsenyuk, A. Sverstiuk, A. Klos-Witkowska, A. Horkunenko, S. Rajba, Vector of diagnostic features in the form of decomposition coefficients of statistical estimates using a cyclic random process model of cardiosignal, in: Proc. 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS 2019), vol. 1, pp. 298-303. doi: 10.1109/IDAACS.2019.8924398.
[21] V. Martsenyuk, A. Sverstiuk, I. Gvozdetska, Using Differential Equations with Time Delay on a Hexagonal Lattice for Modeling Immunosensors. Cybernetics and Systems Analysis, 2019, 55(4), pp. 625-637. doi: 10.1007/s10559-019-00171-2.
[22] V. Martsenyuk, A. Klos-Witkowska, A. Sverstiuk, Stability Investigation of Biosensor Model Based on Finite Lattice Difference Equations. Springer Proceedings in Mathematics and Statistics, 2020, vol. 312, pp. 297-321. doi: 10.1007/978-3-030-35502-9_13.
[23] T. Vilkys, V. Rudzinskas, O. Prentkovskis, N. Višniakov, P. Maruschak, Evaluation of failure pressure for gas pipelines with combined defects. Metals, 2018, 8(5), p. 346.