<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>N. Kanakaris I. Varlamis. Detection of fake news campaigns using graph
convolutional networks. International Journal of Information Management Data Insights</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.jjimei.2022.100104</article-id>
      <title-group>
        <article-title>Evaluation of methods for intellectual analysis of manipulative content in the information space</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vsevolod Senkivskyy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Pikh</string-name>
          <email>iryna.v.pikh@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alona Kudriashova</string-name>
          <email>alona.v.kudriashova@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Andriiv</string-name>
          <email>roman.andriiv@icloud.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nazarii Senkivskyi</string-name>
          <email>nazarii.y.senkivskyi@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Akademichna St.</institution>,
          <addr-line>Berezhany, Ternopil region, Ukraine, 47501</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera Str., 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>2533</volume>
      <fpage>91</fpage>
      <lpage>103</lpage>
      <abstract>
        <p>The modern information space is characterized by a high level of manipulative content, which poses a serious threat and significantly affects public opinion, political processes and the level of trust in media resources. Determining the optimal methods for evaluating the veracity of information messages is an extremely urgent task that requires the use of intelligent data analysis and efficient means of counteraction. The attention is focused on the use of machine learning algorithms and artificial intelligence to detect manipulations in text, video and audio materials. Based on the developed expert survey methodology, logistic regression, decision trees and SVM (Support Vector Machine) are identified among the known methods of intelligent analysis of manipulative content. To determine the most efficient approach, a Pareto set of mutually non-dominated criteria is formed, which includes accuracy, execution speed, resource capacity and interpretability. A model of the priority influence of criteria on the message evaluation process is developed, which allows determining their weight coefficients as the basis for experimental calculations. In order to select the optimal method, the method of linear convolution of criteria is applied, which ensures the determination of the Pareto-optimal solution among the proposed alternatives. To implement this approach, alternative options are generated by varying the criteria and forming combinations of their efficiency degrees that correspond to the methods of evaluating the message probability. Based on the method of hierarchy analysis, a program calculation of normalized weights of criteria and utility functions for the corresponding alternatives is performed. The obtained final values of the combined functionalities of alternative options allow determining the optimal method for evaluating the probability of information messages. 
The results of the study demonstrate the dependency of the method selection on the requirements for accuracy and explainability of solutions. The obtained data can be used to develop automated systems for monitoring and analysing the information space, which will contribute to increasing the level of information security and efficient counteraction to manipulative content.</p>
      </abstract>
      <kwd-group>
        <kwd>manipulative content</kwd>
        <kwd>information message</kwd>
        <kwd>intelligence analysis</kwd>
        <kwd>machine learning</kwd>
        <kwd>information security</kwd>
        <kwd>method of linear convolution of criteria</kwd>
        <kwd>alternative option</kwd>
        <kwd>pairwise comparison matrix</kwd>
        <kwd>utility function</kwd>
        <kwd>Pareto-optimal method</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the context of growing information threats, there is an increasing need to develop and improve
methods of intelligent analysis that allow for the efficient detection, identification and evaluation of
false content. Research in this area is aimed at developing algorithmic and analytical approaches
that detect signs of manipulation, analyse information sources and determine their reliability
level [1, 2]. The use of modern data analysis technologies, in particular machine learning and
artificial intelligence methods, increases the accuracy of information flow evaluation and prevents
the negative impact of destructive content on society [3]. The
development of artificial intelligence, machine learning, linguistic analysis and neural networks
contributes to the improvement of methods for processing text and multimedia data [4, 5]. The
publication [6] analyzes the efficiency of artificial intelligence and machine learning methods in
cybersecurity, evaluating their ability to detect threats and anomalies. Various approaches, their
advantages, disadvantages and development prospects are considered. A feature of the issues studied
in this paper is the coverage of approaches in the work [7], which combine computational solutions
and propose strategies to combat disinformation, in particular, the use of machine learning-based
methods for automatic classification of disinformation. Machine learning methods use statistical
approaches [8, 9] to identify patterns and anomalous deviations in large data sets, which allows
security analysts to timely detect potential threats, including previously unknown attacks. Deep
learning, as a subfield of machine learning, demonstrates high efficiency in increasing the accuracy
of recognizing malicious patterns due to multi-level information processing, which is especially
important for the analysis of complex cyber threats, including image recognition, traffic analysis,
and decryption of threatening messages.</p>
      <p>The efficiency evaluation of such methods can be based on the Pareto principle, which allows
selecting optimal solutions based on the criteria of accuracy, speed, and resource consumption. For
example, deep learning models [10, 11] demonstrate high accuracy, but require significant computing
resources, while classical machine learning methods are less expensive, but may be inferior in the
ability to recognize complex manipulative structures. The selection of a specific approach depends
on the characteristics of the information environment in which the analysis is carried out, as well as
on the needs for accuracy and speed of data processing [12].</p>
      <p>In view of the above, the goal of the study is to evaluate the efficiency of modern methods for
intelligent analysis of manipulative content by comparative analysis of their accuracy, speed,
resource capacity, and interpretability. Special attention is paid to the criteria for selecting the
optimal method, which allows increasing the efficiency of information security and minimizing the
negative impact of disinformation on the society.</p>
      <p>Strategies for evaluating methods of intelligent analysis of manipulative content based on
multicriteria optimization are proposed. This work analyzes modern methods of detecting manipulative
content, identifies key criteria for their evaluation, ranks alternatives using the method of linear
convolution of criteria and determines the optimal method. The results obtained can be used to
develop automated systems for monitoring the information space, which will help reduce the risks
of spreading disinformation and increase the level of information security.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>Research and analysis of manipulative content in the information space covers a wide range of
methods and approaches aimed at identifying, assessing and neutralizing the impact of false
information. This includes improving fake detection algorithms, developing multi-criteria analysis
systems, and integrating modern machine learning technologies to improve the analysis accuracy.
The use of efficient methods is critical for protecting various areas of activity exposed to the risks of
disinformation and manipulation.</p>
      <p>Although most research focuses on detecting fake news [13] based on its content or using the
user interaction with news on social networks, there is a growing interest in proactive intervention
strategies to counter the spread of disinformation and its impact on society [14]. However, the
selection of the optimal method remains an open question, since the efficiency of algorithms
significantly depends on the characteristics of the input data, the context of their application, the
level of adaptability to new manipulative strategies, and the performance of computing resources.</p>
      <p>As shown in the article [15], multimodal methods for detecting fake news based on semantic
information have achieved great success. However, these methods use only the deep features of
multimodal information, which leads to a large loss of real information at the surface level.</p>
      <p>For a well-grounded selection of the optimal method, a set of interrelated criteria is taken into
account by the authors. In this context, the use of multi-criteria analysis methods plays a special role,
such as the method of linear convolution of criteria, which allows evaluating alternative approaches
by key parameters. The formation of the Pareto set allows one to isolate mutually non-dominated
methods and determine the best option based on the calculation of utility functions.</p>
      <p>The study [16] presents a taxonomy of models, machine learning and deep learning functions,
used to detect fake news based on the content analysis. To solve this problem, machine learning
models are used that allow for automated identification of unreliable information. At the same time,
the work did not propose efficient methods for improving the accuracy of classification or optimizing
models, which leaves open the questions of their adaptation to new types of manipulative content
and increasing resistance to changes in the characteristics of fake news.</p>
      <p>The use of neural network technologies for detecting fake data is considered in the work [17].
The efficiency of the methods is assessed by comparative analysis of their strategies, approaches to
error estimation and the accuracy level on different data sets. Our goal is to help researchers in
determining relevant criteria and selecting the optimal method for solving specific tasks of intelligent
analysis of manipulative content.</p>
      <p>A detailed review of methods for generating and detecting deep fakes is presented in the study
[18]. Open challenges faced by detection systems are considered, and possible options for
overcoming them are proposed, in particular using deep learning. The main emphasis is placed on
the need to create efficient manipulative content detection systems that are able to adapt to new
challenges in the field of artificial intelligence. However, there are no experimental results and
practical evaluation of the proposed methods in real-world application scenarios. The work [19] is
devoted to the analysis of the problem of large language model radicalization. Semantic
vulnerabilities and the learning inadequacy based on human feedback are studied. At the same time,
the attention is focused mainly on theoretical aspects and does not contain practical
recommendations for the direct implementation of protective mechanisms.</p>
      <p>The work [20] is focused on the problem of manipulation by artificial intelligence and
recommends criteria for assessing the level of the system manipulability. The main attention is paid
to the analysis of algorithmic ethics and the need to create transparent decision-making systems that
are not susceptible to manipulation by users. However, there are no clear metrics or practical
methods for assessing the manipulability level.</p>
      <p>To solve this problem, improved approaches to fake news detection are proposed by the authors,
which combine traditional machine learning methods with the optimal selection of data
preprocessing. It is assumed to use the method of linear convolution of criteria to integrate various
performance indicators of models related to manipulative content. The proposed approach will
contribute to increasing the reliability of classification and the balance between accuracy and
computational efficiency. Summarizing, the research highlights a wide range of threats related to
manipulative content and artificial intelligence. This emphasizes the importance of a comprehensive
approach to countering such threats, which involves improving detection methods, applying
multicriteria analysis, and integrating modern machine learning technologies.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Material and methods</title>
      <sec id="sec-3-1">
        <title>3.1. Pareto principle and selection of a criteria set</title>
        <p>The Pareto principle, or 80/20 rule, is a fundamental approach to analysing efficiency and optimality
in various fields, including multi-criteria analysis and decision-making. It states that in many
processes, approximately 80% of the results are due to 20% of the factors. In the context of
optimization and efficiency analysis, this principle is used to identify the most important parameters
that have the greatest impact on the final result [21, 22].</p>
        <p>In multi-criteria analysis, the Pareto principle is applied through the concept of Pareto
optimality, which defines a set of solutions that cannot be improved on one criterion without
worsening another. This means that no solution in the Pareto set is dominant over another, and the
selection of a particular option depends on priorities or additional conditions set by the researcher
or the receiving party.</p>
        <p>The formation of a Pareto set of criteria involves analysing the set of possible solutions and
selecting those that are Pareto-efficient. If a solution is not inferior to the others in any criterion and
has an advantage in at least one, it is considered Pareto-optimal and is included in the corresponding
set. In the case of a large number of criteria and a large space of candidate solutions, evolutionary
search methods or stochastic approaches can be used.</p>
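        <p>The formation of a Pareto set described above can be sketched in code. The following is a minimal illustration with hypothetical scores for three methods on four criteria, each rescaled so that a larger value is better (the figures are placeholders, not the study's data):</p>
        <preformat>
```python
# Sketch: forming the Pareto set of mutually non-dominated methods.
# The scores are hypothetical placeholders; every criterion is oriented
# so that larger is better (resource capacity is inverted accordingly).
scores = {
    "logistic_regression": (0.86, 0.92, 0.90, 0.88),  # accuracy, speed, low resource use, interpretability
    "svm":                 (0.90, 0.75, 0.70, 0.60),
    "decision_tree":       (0.84, 0.88, 0.85, 0.95),
}

def dominates(a, b):
    """a dominates b: no worse on every criterion, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_set(options):
    """Keep only the options not dominated by any other option."""
    return {
        name: vec
        for name, vec in options.items()
        if not any(dominates(other, vec) for o, other in options.items() if o != name)
    }

print(sorted(pareto_set(scores)))
```
        </preformat>
        <p>With these placeholder scores, no method dominates another, so all three remain in the Pareto set; this is exactly the situation that motivates the later weighted convolution of criteria.</p>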
        <p>The criteria influencing the process of determining the optimal method for detecting
manipulative content are presented in Table 1.</p>
        <p>A feature of the Pareto set is its adaptability to a specific task: it allows one to identify compromise
solutions that take into account different aspects of the problem, and provides a basis for further
analysis and decision-making. Since none of the options is unambiguously the best according to all
criteria, the selection of the final solution is often made taking into account additional priorities or
weight factors that represent the significance of each criterion in a specific context.</p>
        <p>Figure 1 presents a diagram of the efficiency of the criteria assigned to the methods of detecting
manipulative content in the information space.</p>
        <p>[Figure 1: Efficiency of methods. Bar chart of the importance of criteria (74%–92%) for Logistic regression, SVM and Decision Tree, by the criteria Accuracy, Speed, Resource capacity and Interpretability.]</p>
        <p>Thus, the use of the Pareto principle in multi-criteria intelligent analysis of manipulative content
allows for efficient resource allocation, finding optimal solutions, and minimizing trade-offs between
conflicting requirements. This is especially relevant in complex text message processing systems,
where it is necessary to take into account the interaction of many factors and their influence on the
final result.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Formation of an expert group</title>
        <p>The use of expert reviews in the evaluation of methods for intellectual analysis of manipulative
content allows one to obtain a quantitative assessment of the importance degree of each of the
criteria that form a set of values of factors influencing the process quality [22]. Here the scale rating
method is used, which provides quantitative assessments of the importance degree of each factor in
a certain set relative to the scale of its basic (reference) values. The estimates of the relative
importance of each factor are expressed in points on a certain scale, most commonly a 100-point
scale, where the maximum possible importance corresponds to a score of 100 points and the
minimum possible one to 0 (zero) points.</p>
        <p>When processing expert data, the survey results are summarized in a table in which a_ij is the
relative importance of the parameter x_i from the point of view of the j-th expert, expressed by the
corresponding score or rank value (Table 2), and m is the number of experts evaluating the
importance of the factors x_i.</p>
        <p>The values a_ij and C_i can be expressed quantitatively in points or ranks. In the first case the
value C_i is called the average score (average value) of the criterion x_i; in the second case it is the
average rank. Summing the values a_ij in the rows and dividing the result by m gives the average
rankings of the factors C_1, C_2, ..., C_n, which, in turn, serve as an indicator of the generalized
opinion about the importance of the factors (with ranks, the smaller the sum in a row, the more
important the role the corresponding factor plays). The opposite picture occurs with respect to the
sums in the columns.</p>
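        <p>The tabulated processing of expert scores described above reduces to averaging over experts. A brief sketch with an invented score matrix (three experts, four factors, 100-point scale):</p>
        <preformat>
```python
# Sketch of the scale-rating summary: a[j][i] is the score that the j-th
# expert assigns to factor x_i; the data are illustrative only.
a = [
    [80, 60, 90, 40],  # expert 1
    [70, 55, 85, 50],  # expert 2
    [75, 65, 95, 45],  # expert 3
]
m = len(a)  # number of experts

# Average score C_i of each factor over the m experts; on a 100-point
# score scale, a larger C_i means a more important factor (with ranks,
# the ordering is reversed).
C = [sum(row[i] for row in a) / m for i in range(len(a[0]))]
print(C)
```
        </preformat>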
        <p>Since the indicator of the generalized opinion C_i and the reference value essentially differ only
in their purpose, in what follows, for simplicity of reasoning, one will consider the centre of grouping
of scale scores, a concept that covers the previous two.</p>
        <p>The method of searching for the centre of grouping of expert data on the rating scale for any
distribution law uses the average, or weighted average, score value. This approach (especially the use
of the weighted average value) allows the centre of grouping to be determined objectively with a
sufficient degree of approximation [20]. However, with a large range of scale values, taking all values
into account without exception can, as will be shown below, lead to a significant shift of the centre
of grouping.</p>
        <p>Suppose the centre of score grouping for a given distribution of experts over the values provided
by them is denoted by C. The following relation is valid:
C = F(W, h, θ), (<xref ref-type="bibr" rid="ref2">2</xref>)
where k is the number of experts in the group; h is the step for searching a grouping area; W is a
range of score values for which the corresponding number of experts is not less than θk at the given
step (0 &lt; θ &lt; 1).</p>
        <p>
          Suppose one has a scale with values i, where i = 0, 1, 2, ..., n. Then m_i is the number of experts
who put the i-th value. If the group includes k experts, then ∑ m_i = k. In the first step of the search
(h = 1) for the centre of grouping, pairs of values are defined that satisfy the expression
∑_{i ∈ W_h} m_i ≥ θk. (<xref ref-type="bibr" rid="ref3">3</xref>)
In this case, the following three logical options are possible:
1) no pair of values at this step satisfies the relation (<xref ref-type="bibr" rid="ref3">3</xref>). Then the searching step of the grouping
area increases by one, that is, the "weighting" area expands, and the procedure is repeated;
2) there is exactly one area on the scale at this searching step that satisfies the relation (<xref ref-type="bibr" rid="ref3">3</xref>). In
this case, the grouping area is identified and the centre of grouping is determined as the
weighted average of all values belonging to this area:
C = ∑_{i ∈ W_h} i ⋅ m_i / ∑_{i ∈ W_h} m_i; (<xref ref-type="bibr" rid="ref4">4</xref>)
3) there are several areas satisfying the relation (<xref ref-type="bibr" rid="ref3">3</xref>). Then the grouping area is defined as follows:
the left boundary is the smallest value over all these areas and the right boundary is their largest
value, respectively. The centre of grouping is determined as the weighted average of all values
belonging to the grouping area:
C = ∑_{i ∈ G} i ⋅ m_i / ∑_{i ∈ G} m_i, (<xref ref-type="bibr" rid="ref5">5</xref>)
where G is the set of all scale values belonging to the grouping area.
        </p>
        <p>Table 3. Distribution of expert scores over the scale: value i: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12; number of experts m_i: 8, 6, 7, 3, 4, 7, 4, 5, 1, 2, 2, 1.</p>
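        <p>The search for the centre of grouping can be sketched in code. The windowing rule below (widen contiguous areas until at least one holds θk experts, merge qualifying areas, then take the weighted average) is one plausible reading of the described procedure; the expert counts are those of the worked example:</p>
        <preformat>
```python
# Sketch of the grouping-centre search; the exact windowing rule is one
# plausible reading of the described procedure, not a confirmed spec.
def weighted_centre(values, counts):
    """Weighted average of scale values, weighted by expert counts."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)

def find_grouping_area(values, counts, theta=0.5):
    """Widen contiguous areas of the scale until at least one holds
    theta * k experts; merge the qualifying areas, return (area, centre)."""
    k = sum(counts)
    for width in range(2, len(values) + 1):  # step h = 1, 2, ... gives width h + 1
        hits = [
            (values[s], values[s + width - 1])
            for s in range(len(values) - width + 1)
            if sum(counts[s:s + width]) >= theta * k
        ]
        if hits:
            lo = min(h[0] for h in hits)  # several areas merge into one
            hi = max(h[1] for h in hits)
            i0, i1 = values.index(lo), values.index(hi)
            return (lo, hi), weighted_centre(values[i0:i1 + 1], counts[i0:i1 + 1])
    return (values[0], values[-1]), weighted_centre(values, counts)

# Expert distribution from the worked example: scale values 1..12.
values = list(range(1, 13))
counts = [8, 6, 7, 3, 4, 7, 4, 5, 1, 2, 2, 1]
area, centre = find_grouping_area(values, counts, theta=0.5)
```
        </preformat>
        <p>Whatever the exact windowing convention, the weighted averages reported in the example check out: over the area 1-8 the centre is 183/44 ≈ 4.16, and over the whole scale it is 246/50 ≈ 4.92.</p>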
        <p>Suppose the total set of criteria for evaluating the process efficiency is 8. Using expert
evaluation, a subset of four factors should be selected (Table 3). To solve the problem, a score scale
and the number of experts corresponding to each of its values, with a total of k = 50 experts, are
used. One takes θ = 0.5, since the 50% comparison is the most common in the expert evaluation
method. The search step is set to h = 1 and the quantitative composition of experts in each area
during the first step is calculated (Table 4).</p>
        <p>
          According to the last table, no area satisfies the relation (<xref ref-type="bibr" rid="ref3">3</xref>) at this step. Therefore, the
searching step is increased (h = 2) and the expanded areas are calculated (Table 5).
        </p>
        <p>
          As can be seen from the data presented, at this step there is also no area satisfying the relation
(<xref ref-type="bibr" rid="ref3">3</xref>). Therefore, the searching step is increased again and the procedure is repeated until four optimal
influencing criteria are determined (Fig. 2).
        </p>
        <p>
          As follows from the calculations, there is exactly one area that satisfies the relation (<xref ref-type="bibr" rid="ref3">3</xref>), namely
the area 1-8. It becomes the grouping area.
        </p>
        <p>
          Using the relation (<xref ref-type="bibr" rid="ref4">4</xref>), the value of the centre of grouping is calculated:
        </p>
        <p>C = (1 × 8 + 2 × 6 + 3 × 7 + 4 × 3 + 5 × 4 + 6 × 7 + 7 × 4 + 8 × 5) / (8 + 6 + 7 + 3 + 4 + 7 + 4 + 5) = 183/44 ≈ 4,
with a weighted average value over the entire scale of 246/50 ≈ 5.</p>
        <p>Thus, the method of generalizing scale scores during group examination ensures the formation
of the most reliable indicator of generalized opinion for any distribution of expert data – the centre
of grouping of scores, which, with a symmetric distribution of scores, coincides with the weighted
average value.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Method of linear convolution of criteria</title>
        <p>The multilevel models of the priority influence of technological indicators on the performance
level of publishing and printing processes obtained in the work [24] serve as the basis for deciding
on the importance of a particular factor and its "contribution" to the overall quality of the resulting
product. In mathematical terms, one has the necessary initial data on the degree of each indicator's
influence on the process, but these data are insufficient for full practical implementation. What is
essential is knowledge not only of the conditional importance of a technological factor, but also of
how much effort should be spent on each of the indicators, in their interaction, to achieve proper
process efficiency.</p>
        <p>The issues outlined in this paper concern the generation of alternative options based on methods
for processing text information messages and selecting the optimal solution in accordance with the
specified principle. This task belongs to the field of multi-criteria optimization, which involves
making a reasoned decision on selecting the most efficient method for evaluating the message
reliability.</p>
        <p>To solve this problem, it is necessary to determine a set of criteria, the composition and content
of which are formed on the basis of the Pareto principle [25]. The main idea of the approach is that
the criteria included in the set of Pareto-optimal solutions cannot be completely dominated by other
criteria. That is, there is no criterion that would surpass all the others in all indicators at the same
time. Within the framework of the study, based on expert evaluation, mutually non-dominated
criteria are selected that form the Pareto set, namely: accuracy, execution speed, resource capacity
and interpretability.</p>
        <p>According to the methods of decision theory [26, 27], multi-criteria optimization on a set of
alternatives X in the presence of objective functions f(x) = (f_1(x), ..., f_m(x)) involves constructing
models of utility functions and determining their maximum values: f_i(x) → max, i = 1, ..., m.</p>
        <p>The process of multi-criteria selection of the optimal alternative is based on the method of linear
convolution of criteria, which involves a linear combination of the partial target functionals
f_1, ..., f_m into a single generalized functional:
F(x, α) = ∑_{i=1}^{m} α_i ⋅ f_i(x) → max, x ∈ X, α ∈ Λ, (<xref ref-type="bibr" rid="ref6">6</xref>)
where Λ = {α = (α_1, ..., α_m); α_i &gt; 0; ∑_{i=1}^{m} α_i = 1}.</p>
        <p>In general, the process of multi-criteria alternative selection in decision-making is based on the
following basic assumptions [25]:
• the set of alternatives X is a finite set of elements X = {x_1, x_2, ..., x_n} that the decision-maker
can list;
• the evaluation of alternatives is carried out by utility functions u_i, where u_i : X → R, i = 1, ..., m;
• the decision-maker uses criteria ordered by their priority.</p>
        <p>A set of criteria K_1, K_2, K_3, K_4 is formed. The alternative options for the process are the
methods for evaluating the probability of information messages, denoted by M1, M2, M3. The above
allows one to formalize the decision-making process by reducing a set of criteria to a single target
function, the value of which determines the optimal solution.</p>
        <p>Factor weights w_i are identified with the numerical values of the corresponding utility
functions. To select an alternative, the theorem of multi-criteria utility theory is used: if the criteria
are independent in utility and preference, then there is a utility function
U(x) = ∑_{i=1}^{m} w_i ⋅ u_i(y_i), (<xref ref-type="bibr" rid="ref7">7</xref>)
which serves as a criterion for selecting the optimal option. Here U(x) is the multicriteria utility
function (0 ≤ U(x) ≤ 1) of the alternative x; u_i(y_i) is the utility function of the i-th criterion
(0 ≤ u_i(y_i) ≤ 1); y_i is the value of the alternative x by the criterion K_i; w_i is the weight of the
i-th criterion, moreover 0 &lt; w_i &lt; 1 and ∑_{i=1}^{m} w_i = 1.</p>
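        <p>The convolution just described reduces to a weighted sum followed by an argmax over the alternatives. A minimal sketch with illustrative weights and per-criterion utilities (not the study's values):</p>
        <preformat>
```python
# Sketch of the linear convolution of criteria: F_j = sum_i w_i * u_ij.
# The weights and utilities are illustrative; the weights are positive
# and sum to 1, as required for the set Lambda.
w = [0.40, 0.25, 0.20, 0.15]  # K1..K4: accuracy, speed, resource capacity, interpretability

u = {                          # u[method][i]: utility of the (i+1)-th criterion
    "M1": [0.50, 0.30, 0.40, 0.20],
    "M2": [0.30, 0.45, 0.35, 0.30],
    "M3": [0.20, 0.25, 0.25, 0.50],
}

def convolution(weights, utilities):
    """Generalized functional: weighted sum of partial utilities."""
    return sum(wi * ui for wi, ui in zip(weights, utilities))

F = {method: convolution(w, utilities) for method, utilities in u.items()}
best = max(F, key=F.get)  # the alternative with the maximal combined functional
```
        </preformat>
        <p>Changing the weight vector shifts the winner, which is why the weights themselves have to be justified by the expert procedure rather than picked arbitrarily.</p>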
        <p>Additional notations are introduced: w_i is the initial weight of the i-th criterion, determined on
the basis of an expert evaluation of its influence on the priority of the selected methods; v_ij are the
expertly established percentage values of the importance of the i-th criterion during the formation
of the j-th alternative. The following condition must be met for each criterion:
∑_{i=1}^{4} v_ij = 100%, j = 1, 2, 3.
The evaluation of the alternatives by the efficiency degrees v_ij is represented in Table 6. Processing
these degrees yields the utility functions u_ij of each i-th criterion (i = 1, ..., 4). Finally, the
multi-criteria evaluation of the utility of the j-th alternative is
U_j = ∑_{i=1}^{4} w_i ⋅ u_ij, j = 1, 2, 3. (<xref ref-type="bibr" rid="ref8">8</xref>)</p>
        <p>Considering the degrees of influence or efficiency of the criteria in the different variants, that is,
when applied to each of the selected methods, matrices of pairwise comparisons are constructed
according to the method of hierarchy analysis. Processing these matrices yields the corresponding
utility functions u_ij, namely: K_1 – u_11, u_12, u_13; K_2 – u_21, u_22, u_23; K_3 – u_31, u_32,
u_33; K_4 – u_41, u_42, u_43.</p>
        <p>An important addition to the previous considerations: the criteria of the Pareto set obviously
form a new autonomous group, which requires the calculation of the current weight values w_i on
the basis of the initial numerical priorities. The obtained values will be used for the final calculation
of the target functions.</p>
        <p>
          The formal representation of the pairwise comparison matrix of the initial weights, the relation
(<xref ref-type="bibr" rid="ref9">9</xref>), is presented in Table 7.
        </p>
        <p>The sign " / " in the matrix elements means a comparison of the starting weight values of the
criteria.</p>
        <p>Processing the matrix will allow calculating the normalized weight values of the criteria, which
will become the initial data for generating alternative options and determining the optimal method
for evaluating the probability of text information messages.</p>
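        <p>Processing such a pairwise comparison matrix into normalized weights is commonly done in the analytic hierarchy process with the geometric-mean procedure. A sketch under the assumption of illustrative starting priorities, so that the matrix is perfectly consistent (each entry is the ratio of two starting values):</p>
        <preformat>
```python
import math

# Sketch: normalized criterion weights from a pairwise comparison matrix.
# The starting priorities are illustrative placeholders; entry A[i][j]
# compares criterion i with criterion j as the ratio of their values.
start = [5, 3, 2, 1]  # illustrative starting priorities of K1..K4
A = [[si / sj for sj in start] for si in start]

# Geometric-mean method of the analytic hierarchy process: take the
# geometric mean of each row, then normalize the means to sum to 1.
gm = [math.prod(row) ** (1 / len(row)) for row in A]
weights = [g / sum(gm) for g in gm]
print(weights)
```
        </preformat>
        <p>For a perfectly consistent matrix this simply recovers the starting priorities up to normalization; with real expert matrices a consistency check (e.g. Saaty's consistency ratio) is usually added before the weights are accepted.</p>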
        <p>
          The final multi-criteria evaluations of the utility of the alternatives M1, M2, M3, obtained
on the basis of the formula (<xref ref-type="bibr" rid="ref8">8</xref>), are expressed by the relation (<xref ref-type="bibr" rid="ref10">10</xref>):
F_1 = w_1 ⋅ u_11 + w_2 ⋅ u_21 + w_3 ⋅ u_31 + w_4 ⋅ u_41;
F_2 = w_1 ⋅ u_12 + w_2 ⋅ u_22 + w_3 ⋅ u_32 + w_4 ⋅ u_42;
F_3 = w_1 ⋅ u_13 + w_2 ⋅ u_23 + w_3 ⋅ u_33 + w_4 ⋅ u_43.
        </p>
        <p>
          As noted earlier, the indicator for selecting the optimal method among the alternative options for
evaluating the probability of messages is the option (method) for which the value of the utility
function of the combined partial target functionals in the relation (<xref ref-type="bibr" rid="ref6">6</xref>) reaches its maximum.
        </p>
        <p>Since the experimental implementation of the above theoretical approaches requires information
about the levels of the criteria preferences, a multi-level graphical model is designed (Fig. 3), which
visually reproduces the essence of the criteria and the priority of their influence on the evaluation of
text information messages. The preferences of the criteria in the model are reproduced on the basis
of the diagram (Fig. 1).</p>
        <p>To form the model, the essence of the criteria identified for the study is briefly summarized.
Thus, methods for text message processing are characterized by: accuracy, which determines the
ability of the method to correctly classify or analyse text data; interpretability, which reflects the
clarity and explainability of the results, which is critically important in areas where transparency of
decision-making is required; execution speed, which affects the performance of the system and
depends on the algorithmic complexity of the calculations; resource capacity, which determines
the amount of memory and computing power used, which is especially important when working
with large amounts of text data.</p>
        <p>Thus, an algorithm for generating and calculating alternative options is constructed, suitable for
studying the process of message evaluation using modern methods and functional criteria. The
essence of its theoretical foundations is based on mathematical calculations of modelling theory,
methods of hierarchy analysis and decision-making, and operations research theory.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment, results and discussion</title>
      <p>As part of further research, the refined normalized weighted priorities of the model criteria are
calculated (Fig. 3), necessary for practical implementation (Table 6). To do this, using the scale of
relative importance of objects [27, 28], a pairwise comparison matrix of the criteria K1-K4 is formed, taking into
account the levels of their preference in the priority model.</p>
      <p>Processing the matrix results in obtaining the initial numerical preferences of the criteria
expressed in conventional units, namely: K1 (accuracy) – 220 c.u.; K2 (interpretability) – 130 c.u.;
K3 (execution speed) – 80 c.u.; K4 (resource capacity) – 50 c.u. At the same time, the normalized weight
values of the criteria are calculated: w1 = 0.46; w2 = 0.28; w3 = 0.16; w4 = 0.10.</p>
      <p>As a result of processing the matrix, the following are obtained: the maximum eigenvalue of the
matrix λmax, the consistency index IU = 0.01 and the consistency ratio WU = 0.01.</p>
      <p>The calculation results correspond to the permissible limits of reliability, the essence of which is
as follows. The assessment of the obtained solution is determined by the consistency index, the value
of which is given by the formula IU = (λmax − n)∕(n − 1), where n is the number of objects. The
consistency index value is compared with the reference values of the consistency indicator, the
so-called random index (WI), which depends on the number of objects being compared. In this case,
the results are considered satisfactory if the index value does not exceed 10% of the reference value
WI. Comparing the obtained value IU and the reference value WI = 0.9 for four criteria, and
checking the inequality IU &lt; 0.1 × WI, one obtains: 0.01 &lt; 0.1 × 0.9. This confirms the reliability of
the obtained results. Additionally, the results are evaluated by the consistency ratio WU = IU∕WI. One
obtains WU = 0.01. The results are considered satisfactory if WU ≤ 0.1. Therefore, one has a
sufficient level of process convergence and proper consistency of expert judgments regarding
pairwise comparisons of criteria.</p>
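      <p>The consistency check described above can be sketched as follows. The pairwise comparison matrix here is hypothetical (the matrix from Table 6 is not reproduced in the text); WI = 0.9 is the random index for n = 4 used above.</p>

```python
import numpy as np

# Hypothetical pairwise comparison matrix for four criteria (Saaty scale).
A = np.array([
    [1.0, 2.0, 3.0, 5.0],
    [1/2, 1.0, 2.0, 3.0],
    [1/3, 1/2, 1.0, 2.0],
    [1/5, 1/3, 1/2, 1.0],
])
n = A.shape[0]

# Priority vector via the geometric-mean approximation.
g = A.prod(axis=1) ** (1.0 / n)
w = g / g.sum()

lam_max = float(np.mean((A @ w) / w))  # estimate of the maximum eigenvalue
IU = (lam_max - n) / (n - 1)           # consistency index
WI = 0.9                               # random index for n = 4
WU = IU / WI                           # consistency ratio

# Expert judgments are acceptable when IU < 0.1 * WI, i.e. WU <= 0.1.
print(round(lam_max, 3), round(IU, 3), round(WU, 3))
```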
      <p>Normalized weight coefficients will be applied when calculating the utility functions of the
criteria used in alternative approaches to message evaluation.</p>
      <p>Additionally, possible groups of combinations of the efficiency shares of the criteria in alternative
options are presented (Table 9), expressed in percentages [25,29].</p>
      <p>Based on the data obtained, a basic table for the method of linear convolution of criteria is formed
(Table 10).</p>
      <sec id="sec-4-1">
        <title>Utility functions of the criteria</title>
        <p>The matrix for the criterion K2 “Interpretability” looks like this (Table 12). As a result of
processing the matrix, the following data are obtained: λmax = 3.03; IU = 0.01; WU = 0.03. The utility
functions of the criterion K2 “Interpretability” in the alternative options are expressed by the following
values: u21 = 0.576; u22 = 0.341; u23 = 0.081.</p>
        <p>The criterion K3 “Execution speed” will form a similar matrix (Table 13).</p>
      </sec>
      <sec id="sec-4-2">
        <title>Combined functionals of the alternative options</title>
        <p>As a result of processing the matrix for the criterion K4 “Resource capacity”, the following are
obtained: λmax = 3.00; IU = 0.00; WU = 0.00. The utility functions of the criterion are expressed by the
following values: u41 = 0.6; u42 = 0.2; u43 = 0.2.</p>
        <p>
          The calculation indicators, in particular the maximum eigenvalues λmax, the consistency
indices IU and the consistency ratios WU, meet the requirements presented above.
Substituting the weight coefficients and the values of the utility functions of the criteria into the relation
(
          <xref ref-type="bibr" rid="ref10">10</xref>
          ), the final values of the combined functionals of the alternative options are obtained (
          <xref ref-type="bibr" rid="ref11">11</xref>
          ).
 F1 = 0.46 ⋅ 0.186 + 0.28 ⋅ 0.576 + 0.16 ⋅ 0.093 + 0.1 ⋅ 0.6;
 F2 = 0.46 ⋅ 0.097 + 0.28 ⋅ 0.341 + 0.16 ⋅ 0.626 + 0.1 ⋅ 0.2; (
          <xref ref-type="bibr" rid="ref11">11</xref>
          )
 F3 = 0.46 ⋅ 0.715 + 0.28 ⋅ 0.081 + 0.16 ⋅ 0.279 + 0.1 ⋅ 0.2.
        </p>
        <p>Finally, one gets: F1 = 0.322; F2 = 0.260; F3 = 0.416.</p>
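        <p>The arithmetic of the linear convolution reduces to a matrix-vector product; the short check below reproduces the reported values 0.322, 0.260 and 0.416 from the weights and utility functions given in the text.</p>

```python
import numpy as np

# Normalized criterion weights for K1..K4, as calculated above.
w = np.array([0.46, 0.28, 0.16, 0.10])

# Utility functions u[i, j]: rows are criteria K1..K4, columns alternatives M1..M3.
U = np.array([
    [0.186, 0.097, 0.715],   # K1 accuracy
    [0.576, 0.341, 0.081],   # K2 interpretability
    [0.093, 0.626, 0.279],   # K3 execution speed
    [0.600, 0.200, 0.200],   # K4 resource capacity
])

# F[j] = sum_i w_i * u_ij, i.e. the combined functional of alternative j.
F = U.T @ w
print([round(float(f), 3) for f in F])   # [0.322, 0.26, 0.416]
best = int(np.argmax(F)) + 1             # alternative M3 has the maximum value
```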
        <p>Among the selected alternative methods for evaluating the probability of information messages,
the third option, i.e. the method M3 – SVM (support vector machine), is considered the most efficient,
since it is characterized by the maximum value of the combined functional F3. The theoretical
justification of this option indicates its ability to provide a high level of reliability of the obtained
score. Important parameters that determine the method efficiency are its ability to accurately
evaluate the veracity of messages and to distribute computational resources efficiently. It is the
combination of the criteria "accuracy" and "resource capacity" that makes this option the best
among the alternatives.</p>
        <p>The functional model of the algorithm of the probable software solution is shown in Fig. 4.</p>
        <p>This model will take into account a multifactor analysis of the characteristics of input data and
adaptively select the most effective approach depending on the context. It also ensures integration
with machine learning systems to enhance the accuracy of assessment and reduce the rate of
false-positive results. For this purpose, machine learning algorithms such as classification methods,
regression, and neural networks are utilized, allowing the detection of hidden patterns in large data
sets. Additionally, the model can apply reinforcement learning techniques to dynamically adjust
evaluation criteria based on changing conditions in the information environment.</p>
        <p>Thus, based on the results of the conducted study, the most effective method for assessing the
credibility of messages is the Support Vector Machine (SVM). This method is based on finding a
hyperplane that maximally separates different data classes in the feature space, making it optimal
for binary classification tasks, such as distinguishing reliable information from disinformation or
manipulative content. However, the application of SVM for message credibility assessment also has
certain limitations. One of the key challenges is the complexity of interpreting the results, as SVM
does not provide clear explanations regarding which specific text features influenced the
classification. Additionally, this method is sensitive to imbalanced datasets, where one category of
messages significantly outweighs the other, which can distort classification results.</p>
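        <p>The sensitivity to imbalanced datasets noted here is commonly mitigated by weighting the minority class more heavily in the hinge loss. Below is a minimal numpy sketch of this remedy, not the authors' implementation; the toy "message features" and class sizes are invented for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Imbalanced toy set: 200 reliable messages (label -1) vs 20 manipulative
# ones (label +1), each reduced to two hypothetical numeric text features.
X = np.vstack([rng.normal([-1.0, -1.0], 0.5, (200, 2)),
               rng.normal([1.0, 1.0], 0.5, (20, 2))])
y = np.hstack([-np.ones(200), np.ones(20)])

# Per-sample weights inversely proportional to class frequency
# counteract the dominance of the majority class.
counts = {-1.0: 200.0, 1.0: 20.0}
s = np.array([len(y) / (2 * counts[c]) for c in y])

# Full-batch sub-gradient descent on the weighted hinge loss of a linear SVM.
w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(300):
    m = y * (X @ w + b) < 1                       # margin violators
    w -= lr * (lam * w - (s[m, None] * y[m, None] * X[m]).sum(0) / len(y))
    b -= lr * (-(s[m] * y[m]).sum() / len(y))

pred = np.sign(X @ w + b)
minority_recall = float((pred[y == 1] == 1).mean())
```

Without the weighting term s, the separating hyperplane drifts toward the minority class and recall on the manipulative messages degrades, which is exactly the distortion described above.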
        <p>It should be emphasized that the final selection of the optimal option within the method of linear
convolution of criteria can be improved by developing an appropriate software application, which
will allow automating the process of calculating the values of combined functionals for various
alternative methods and will provide the ability to take into account a wide range of options for
combinations of criteria efficiency degrees.</p>
        <p>The application of machine learning methods in this process is extremely important, as they
enable the analysis of large volumes of data, the identification of hidden dependencies between
criteria, and the automatic adjustment of weight coefficients to improve assessment accuracy.
Furthermore, the use of machine learning will facilitate the model's adaptation to changing
conditions and enhance its ability to self-learn, ensuring a more flexible and reliable approach to
selecting the optimal evaluation method. A promising direction for future research is the refinement
of the proposed approach through the integration of advanced artificial intelligence methods and
hybrid algorithms that combine classical statistical models with neural network technologies. This
will improve the system’s adaptability to dynamic changes in the information environment, increase
its resistance to manipulative influences, and provide a more accurate and well-founded assessment
of message credibility in real-time.</p>
        <p>In conclusion, it is worth emphasizing that the research is based on the concept of the relationship
between the selection of the optimal method for evaluating the probability of information messages
and the combination of efficiency shares of the applied criteria. This approach makes it possible to
identify patterns that affect the accuracy and validity of the intellectual analysis of manipulative
content in the information space. The recommended algorithmic approach to the automated solution
of the problem increases the objectivity of the evaluation, reducing the influence
of subjective factors. In addition, the proposed solution provides flexible parameter
settings, which allows adapting the system to the conditions and requirements of a specific
research task, contributing to its efficiency and practical value.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The results of the study confirm the relevance of the problem of evaluating methods of intellectual
analysis and detecting manipulative content in the information space. Determining the most efficient
method is a key task for ensuring the information security and countering disinformation. Given the
constantly growing threats in the information environment, combating manipulation and
disinformation requires not only the improvement of technical analysis methods but also the
development of strategies. This includes educational programs, increasing media literacy among the
population, and active collaboration between government institutions and scientific organizations,
which will enable effective resistance to manipulative content and ensure sustainable information
security.</p>
      <p>The study proposes an approach based on the formation of a Pareto set of mutually
non-dominated criteria, which allows one to objectively evaluate the efficiency of machine learning
methods, in particular logistic regression, decision trees and the support vector machine (SVM). The
generation of alternatives is carried out by identifying options determined by the selected methods,
taking into account expertly established efficiency degrees of key criteria, such as accuracy,
execution speed, resource capacity and interpretability.</p>
      <p>The application of the hierarchy analysis method to the developed model of the priority
influence of criteria on the process of analysis, evaluation and detection of manipulative content
provides the calculation of the utility functions of the criteria for each of the methods. As a result,
the combined functionals of the alternative options are formed, which act as the main optimization
criterion. The use of multi-criteria analysis methods, in particular the method of linear convolution
of criteria, provides a balanced approach to selecting the optimal method for evaluating the
probability of information messages, contributing to the accuracy and validity of the decisions
made.</p>
      <p>The results of the study indicate that the selection of the optimal method largely depends on the
specific requirements for the evaluation system. In particular, in cases where accuracy is the priority
criterion, it is advisable to use the support vector machine (SVM), while to ensure high data processing
speed it is more expedient to use decision trees. Logistic regression, in turn, provides a balance
between accuracy and interpretability, which makes it an efficient tool for analysing text messages.</p>
      <p>In view of the prospects for further development, the approach proposed in the study makes it
possible to adapt the evaluation system to the changing conditions of the information environment,
integrate additional analysis criteria and use modern machine learning algorithms to automate the
content evaluation process. The results obtained can be used to develop automated information space
monitoring systems, which will contribute to the timely detection of manipulative content and
increase the level of information security.</p>
      <p>The proposed approaches can be integrated into software solutions for intelligent analysis and
evaluation of the level of manipulative content in the information space, which is especially relevant
in the context of the growing information influence on the public opinion and political processes.
Further research should be directed at expanding the approaches by using the latest methods, in
particular deep learning and hybrid models, which can potentially provide significantly higher
efficiency in detecting manipulative content.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] E. Aïmeur, S. Amri, G. Brassard. Fake news, disinformation and misinformation in social media: a review. Social Network Analysis and Mining, 13(1) (2023) 30. doi:10.1007/s13278-023-01028-5.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] L. Chen, J. Chen, C. Xia. Social network behavior and public opinion manipulation. Journal of Information Security and Applications, 64 (2022) 103060. doi:10.1016/j.jisa.2021.103060.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] M. Basol, J. Roozenbeek, S. Linden. Good news about bad news: Gamified inoculation boosts confidence and cognitive immunity against fake news. Journal of Cognition, 3(1) (2020) 2. doi:10.5334/joc.91.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] G. Rampersad, A. Turki. Fake news: Acceptance by demographics and culture on social media. Journal of Information Technology &amp; Politics, 17(1) (2020) 1-11. doi:10.1080/19331681.2019.1686676.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Jones-Jang, T. Mortensen, J. Liu. Does media literacy help identification of fake news? Information literacy helps, but other literacies don't. American Behavioral Scientist, 65(2) (2021) 371-388. doi:10.1177/0002764219869406.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] M. Ozkan-Okay, E. Akin, Ö. Aslan, S. Kosunalp, T. Iliev, I. Stoyanov. A comprehensive survey: Evaluating the efficiency of artificial intelligence and machine learning techniques on cyber security solutions. IEEE Access, 12 (2024) 12229-12256. doi:10.1109/ACCESS.2024.3355547.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] D. Caled, M. J. Silva. Digital media and misinformation: An outlook on multidisciplinary strategies against manipulation. Journal of Computational Social Science, 5 (2022) 123-159. doi:10.1007/s42001-021-00118-8.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. R. Sahoo, B. B. Gupta. Multiple features based approach for automatic fake news detection on social networks using deep learning. Applied Soft Computing, 100 (2021) 106983. doi:10.1016/j.asoc.2020.106983.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S. Mishra, P. Shukla, R. Agarwal. Analyzing machine learning enabled fake news detection techniques for diversified datasets. Wireless Communications and Mobile Computing, 2022(1) (2022) 1575365. doi:10.1155/2022/1575365.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] P. Dhiman, A. Kaur, C. Iwendi, S. K. Mohan. A scientometric analysis of deep learning approaches for detecting fake news. Electronics, 12(4) (2023) 948. doi:10.3390/electronics12040948.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] T. Hovorushchenko, V. Alekseiko, V. Shvaiko, J. Ilchyshyna, A. Kuzmin. Information system for earth's surface temperature forecasting using machine learning technologies. Computer Systems and Information Technologies, 4 (2024) 51-58. doi:10.31891/csit-2024-4-7.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Z. Khanam, B. N. Alwasel, H. Sirafi, M. Rashid. Fake news detection using machine learning approaches. IOP Conference Series: Materials Science and Engineering, 1099(1) (2021) 012040. doi:10.1088/1757-899X/1099/1/012040.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] A. Naitali, M. Ridouani, F. Salahdine, N. Kaabouch. Deepfake attacks: Generation, detection, datasets, challenges, and research directions. Computers, 12(10) (2023) 216. doi:10.3390/computers12100216.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>