=Paper=
{{Paper
|id=Vol-3942/S_09_Korbicz
|storemode=property
|title=
Application of SAS Text Miner for the analysis of citizens' appeals in the system of social protection and social security
|pdfUrl=https://ceur-ws.org/Vol-3942/S_09_Korbicz.pdf
|volume=Vol-3942
|authors=Józef Korbicz,Oleksii Sholokhov,Roman Koval,Oleksii Zarudnyi
}}
==
Application of SAS Text Miner for the analysis of citizens' appeals in the system of social protection and social security
==
Application of SAS Text Miner for the analysis of citizens' appeals in the system of social protection and social security⋆
Józef Korbicz1, Oleksii Sholokhov2,∗, Roman Koval3, Oleksii Zarudnyi3
1 University of Zielona Góra, 9 Licealna Street, Zielona Góra, 65-417, Republic of Poland
2 Taras Shevchenko National University of Kyiv, 64/13 Volodymyrska Street, Kyiv, 01601, Ukraine
3 Institute of Telecommunications and Global Information Space of the National Academy of Sciences of Ukraine, 13 Chokolovsky Blvd., Kyiv, 03186, Ukraine
Abstract
Social protection and social security have always been among the most pressing issues for every
social stratum, and the war has made them especially important: not only the well-being of citizens
and the balanced development of society, but also national security depends on the effectiveness of
state policy in this area. During the war, spending on social protection and social security has
increased significantly and will continue to grow despite limited budgetary funding. Special
attention therefore needs to be paid to the targeting of these funds and to control over the
targeting of state assistance. Under wartime conditions, sociological research, surveys, and
personal reception of citizens become much more difficult. Because a significant share of the
population uses social networks and the digital platforms of state institutions and organizations,
studying the online environment is a promising way of working with citizens' appeals. Information
from Internet sources makes it possible to investigate problems that are significant for different
social groups and to analyze the moods and expectations of the population. At present, however, the
social security system has practically no software products designed to analyze the textual
information presented in citizens' appeals.
The work proposes a method of building an analytical model for studying the social protection and
social security problems that require special attention from the state, using tools for analyzing
textual information from Internet sources and for building classification models.
Keywords
Text clustering, linguistic rules, intelligent data analysis, social protection and social security, information
technology
1. Problems of automation and processing of citizens' appeals in the
social sphere
Information and analytical activity in the conditions of deepening digitalization of society is
becoming an increasingly important component of the system of social protection and social
security, which in turn, as noted by domestic and foreign experts [14-16], requires its constant
modernization, introduction of modern models, methods and information technology. The
introduction of the "Unified Information System of the Social Sphere" [17] was a new step towards
the end-to-end digitalization of the pension system and social protection of the population. The
purpose of the introduction of the System is to "ensure integral automation of processes in the social
sphere by optimizing and developing electronic information interaction of the subjects of the Unified
System aimed at ensuring transparency of the social sphere, digitalization of the social support
market and increasing the level of its availability for persons who need it" [17].
8th International Scientific and Practical Conference Applied Information Systems and Technologies in the Digital Society
AISTDS'2024, October 1, 2024, Kyiv, Ukraine
∗ Corresponding author.
J.Korbicz@issi.uz.zgora.pl (J. Korbicz); gyroalex@knu.ua (O. Sholokhov); roman.koval.science@gmail.com (R. Koval);
oleksii.zarudnyi@gmail.com (O. Zarudnyi)
ORCID: 2338-9598-800 (J. Korbicz); 0000-0002-8676-3724 (O. Sholokhov); 0009-0003-3821-3378 (R. Koval); 0009-0008-7462-3899
(O. Zarudnyi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
The development of the Unified Information System of the Social Sphere [1] involves the creation
of a unified information and reference environment for recipients of social support. An important
place is occupied by the subsystem for working with citizens' appeals: in January-September 2024
alone, the Pension Fund of Ukraine registered 504,856 appeals from citizens, of
which 229,537 (or 45.5 percent) were electronic appeals [2].
Therefore, developing methods, models, and information technologies for analyzing
textual information from citizens' electronic appeals to institutions of social protection and social
security and from Internet sources, and for identifying the issues that are most important for those
who need state support, is urgent and of practical importance [18-20].
2. Statement of the research problem
The paper proposes a method of using text analytics tools to build an analytical model for the
classification of text information in the task of analyzing citizens' appeals to the Pension Fund of
Ukraine.
3. Methods and results
In the course of the study, the practical task of determining the social protection and social
security needs of residents of different regions of Ukraine and of refugees was considered. SAS Text Miner
tools [21-23] were used to analyze text information.
The input data are electronic appeals from citizens received through the web portal of
electronic services of the Pension Fund of Ukraine and the state institution "Government Contact
Center" [2]. Materials from Internet publications, both state and non-state and differing in subject
matter and audience, were also examined; 162 of them were selected (the names of the sources and
their addresses are presented in Table 1).
Table 1
List of Internet sources, information from which was used for analysis
N | Name of the source | Resource address | Number of texts
1 | UkrInform | https://www.ukrinform.ua/rubric-society | 50
2 | Public. News | https://suspilne.media | 25
3 | Website of the international scientific publication "Financial and credit activity: problems of theory and practice" | https://fkd.net.ua | 7
4 | The newspaper "Government Courier", the official printed publication of the Cabinet of Ministers of Ukraine | https://ukurier.gov.ua/uk/articles | 30
5 | The official website of the Kyiv Regional Council of Professional Unions | http://korps.com.ua | 5
6 | Official website of the National Bank of Ukraine | https://knpf.bank.gov.ua | 10
7 | The official site of the magazine "Forbes Ukraine" | https://forbes.ua | 15
8 | Website of the electronic publication "Sudovo-yuridychna Gazeta" | https://sud.ua | 20
Based on the analysis of texts related to issues of social protection and social security posted on
the specified Internet resources and contained in electronic appeals, six clusters were obtained.
The first cluster includes texts that contain issues related to the pension reform. The most
characteristic words and phrases for this cluster were: "reform", "insurance payments", "insurance
experience", "mandatory pension savings".
The second cluster includes words and phrases describing the issue of accrual and payment of
pensions and social benefits by the Pension Fund of Ukraine: "timely payment of pensions",
"voluntary contributions to pension insurance", "minimum pension", "indexation of pensions",
"increase of pensions", "housing subsidy", "financing of current payments", "recalculation of pensions
for working pensioners".
The third cluster summarizes the problems of social protection of internally displaced persons.
The most characteristic are such words and phrases as "IDPs", "identification", "liberated territories",
"payments to displaced persons", "inhabitants of the occupied Crimea", "UN World Food Program",
"temporarily uncontrolled territories".
The fourth cluster includes words and phrases describing problems related to losses due to
military conflict: "military serviceman", "policeman", "combat zone", "missing person", "loss of
breadwinner", "family members of the deceased".
The fifth cluster covers issues of social protection and social security of refugees,
in particular, "pension abroad", "work outside Ukraine", "proportional calculation of insurance
experience", "insurance experience received in other countries".
The sixth cluster summarizes issues related to the victims of the accident at the Chernobyl NPP:
"accident", "ChNPP", "Chernobyl".
Based on the preliminary analysis of the texts of the appeals, a corpus of texts was formed, a
fragment of which is given in Table 2.
Table 2
Frequency matrix of terms for the corpus of texts formed from electronic appeals of citizens
(number of mentions of each term in documents d1-d10)
Marking | Term | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10
t1 | court | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0
t2 | allowances | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 0
t3 | military | 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0
t4 | monetary support | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0
t5 | pension | 0 | 1 | 0 | 1 | 2 | 2 | 1 | 0 | 1 | 1
t6 | law enforcement officers | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
t7 | the former | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
t8 | accident | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1
t9 | Chernobyl Nuclear Power Plant | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 1
t10 | Ukraine | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0
t11 | received | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0
t12 | service | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0
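As an illustration of how a term frequency matrix of this shape can be assembled, the following minimal Python sketch counts term occurrences per document. The function name `build_term_document_matrix`, the toy documents, and the whitespace tokenization are assumptions made for the example; they are not the parsing performed by SAS Text Miner, which also handles stemming and multi-word terms.

```python
from collections import Counter

def build_term_document_matrix(documents, vocabulary):
    """Return {term: [count in d1, count in d2, ...]} for the given vocabulary."""
    # Naive tokenization: lowercase and split on whitespace (an assumption;
    # a real text parser would also stem words and detect multi-word terms).
    counts_per_doc = [Counter(doc.lower().split()) for doc in documents]
    return {term: [counts[term] for counts in counts_per_doc] for term in vocabulary}

# Hypothetical toy documents, not taken from the study's corpus.
documents = [
    "court ruling on pension allowances",
    "military pension and military service",
    "accident pension after the accident",
]
vocabulary = ["court", "military", "pension", "accident"]

matrix = build_term_document_matrix(documents, vocabulary)
for term, row in matrix.items():
    print(f"{term:10s}", row)
```

Each row of the result corresponds to a term and each position in the row to a document, which mirrors the layout of Table 2.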
To reduce the dimensionality and sparsity of the frequency matrix of the corpus of texts,
singular value decomposition (SVD) was used [3-5]. Documents usually use a fairly small set of
terms that describe a certain subject area. Therefore, if exactly the first k diagonal elements of
the diagonal matrix of singular values (S) are kept and the rest are set to zero, the SVD gives the
best rank-k approximation of the original matrix in the least-squares sense. The values in S are
ordered, namely $s_1 \ge s_2 \ge \dots \ge s_k$, so keeping the first two values means setting all
the others to zero.
On the basis of the obtained matrix S, it is possible to calculate the percentage that each
dimension, described by the corresponding singular value, contributes to the explanation of the
data (Table 3). The percentage contribution of a dimension is calculated as the square of its
singular value divided by the sum of the squares of all singular values, multiplied by 100%.
As can be seen from Table 3, if only the first two dimensions are kept, a total
of 66.16% of the data variability is explained.
Table 3
Analysis of the obtained singular values
Dimension number | Singular value | Square of the singular value | Contribution to the explanation of data variability, % | Cumulative contribution, %
1 | 5.1435 | 26.45 | 45.61 | 45.61
2 | 3.4526 | 11.92 | 20.55 | 66.16
3 | 2.7696 | 7.67 | 13.23 | 79.38
4 | 2.3736 | 5.63 | 9.71 | 89.11
5 | 1.7711 | 3.13 | 5.41 | 94.51
6 | 1.2251 | 1.5008 | 2.58 | 97.09
7 | 1.029 | 1.0588 | 1.82 | 98.92
8 | 0.684 | 0.4678 | 0.81 | 99.73
9 | 0.371 | 0.1376 | 0.23 | 99.96
10 | 0.1352 | 0.0182 | 0.03 | 100
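The percentage contributions can be recomputed from the singular values alone. The sketch below is a minimal illustration in Python; the function name `explained_variability` is an assumption, and the input list simply repeats the singular values reported in Table 3.

```python
# Singular values as reported in Table 3 (decimal commas written as points).
singular_values = [5.1435, 3.4526, 2.7696, 2.3736, 1.7711,
                   1.2251, 1.029, 0.684, 0.371, 0.1352]

def explained_variability(values):
    """Return (share_percent, cumulative_percent) lists for the singular values."""
    squares = [s * s for s in values]
    total = sum(squares)
    # Share of each dimension: squared singular value over the sum of squares.
    shares = [100.0 * sq / total for sq in squares]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative

shares, cumulative = explained_variability(singular_values)
for k, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"dimension {k}: {share:5.2f}%  cumulative {cum:6.2f}%")
```

With these inputs the first two dimensions together account for roughly 66% of the variability, in line with the value given in the text; small differences in the last digit can arise from rounding of the published singular values.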
In this case, all documents can be placed in two-dimensional space, and the clusters
that they form can be determined according to the degree of similarity and belonging to a certain topic (Fig. 1).
Figure 1: Location of terms in two-dimensional space.
As can be seen from Fig. 1, the first dimension explains 45.61% of the data variability and the second
dimension explains 20.55%. As a result, three thematic clusters were formed,
which included documents based on the similarity of the use of terms [6-9].
The SAS Text Miner system was used in this study. When using the SAS Text Miner software, a
technological project is built in which the following steps are performed:
1. Loading data.
2. Text parsing.
3. Text filtering.
4. Text clustering.
The technological process of analyzing the corpus of texts for the purpose of their clustering is
presented in fig. 2.
Figure 2: Technological process of text corpus analysis in the SAS Text Miner system.
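The four steps listed above can be mimicked in plain Python to make the data flow concrete. This is only a sketch under stated assumptions: the stop-word list, the sample texts, the cosine threshold of 0.25, and the single-pass "leader" clustering are all illustrative stand-ins and do not reproduce the algorithms of the SAS Text Miner nodes.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "and", "to", "a", "in", "for", "on"}  # illustrative list

def parse(text):
    """Step 2: split a text into lowercase tokens, trimming punctuation."""
    return [tok.strip('.,;:"()') for tok in text.lower().split()]

def filter_terms(tokens):
    """Step 3: drop stop words and empty tokens."""
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.25):
    """Step 4: put each vector into the first cluster whose leader is similar enough."""
    leaders, labels = [], []
    for vec in vectors:
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(idx)
                break
        else:
            # No sufficiently similar leader: the vector starts a new cluster.
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels

# Step 1: "loading" is simulated with hypothetical in-memory texts.
texts = [
    "Indexation of the minimum pension",
    "Recalculation of the pension for working pensioners",
    "Payments to displaced persons from the occupied territories",
    "Identification of displaced persons",
]
vectors = [Counter(filter_terms(parse(t))) for t in texts]
labels = cluster(vectors)
print(labels)
```

In this toy run the two pension texts share one cluster and the two texts about displaced persons share another; the result depends on the chosen threshold and on the order of the texts.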
The constructed rules for the corresponding clusters are generated in the form of the following
program code:
F_TextCluster_cluster_ =1 ::
(OR
, "reform"
, "insurance"
, (AND, (OR, "payments", "seniority"))
, "accumulation"
, (AND, (OR, "pensionable", "mandatory"))
)
F_TextCluster_cluster_ =2 ::
(OR
, "voluntary"
, (AND, (OR, "payments", "pension"))
, "timely"
, (AND, (OR, "contributions", "pension", "insurance", "recalculation"))
, "pension"
, (AND, (OR, "minimum", "index", "increment"))
, "subsidy"
, (AND, (OR, "residential"))
, "current"
, (AND, (OR, "payment", "funding"))
)
F_TextCluster_cluster_ =3 ::
(OR
, "identification"
, (AND, (OR, "refugee", "displaced person", "payments"))
, "resident"
, (AND, (OR, "Crimea", "uncontrolled", "territory", "temporary"))
, "UN"
, (AND, (OR, "global", "food", "program"))
)
F_TextCluster_cluster_ =4 ::
(OR
, (AND, (OR, "serviceman", "military", "policeman"))
, "zone"
, (AND, (OR, "combat", "actions"))
, (AND, (OR, "missing", "missing"))
, "deceased"
, (AND, (OR, "loss", "breadwinner", "members", "family"))
)
F_TextCluster_cluster_ =5 ::
(OR
, "pension"
, (AND, (OR, "border", "borders", "others", "countries"))
, "experience"
, (AND, (OR, "calculation", "insurance", "proportional"))
)
F_TextCluster_cluster_ =6 ::
(OR
, "accident"
, (AND, (OR, "CHAES", "nuclear", "power plant"))
, "Chernobyl"
)
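To show how Boolean keyword rules of this kind can be applied to a text, the following Python sketch evaluates nested OR/AND expressions against the set of words in a document. The rule encoding as tuples, the function names `matches` and `classify`, and the two simplified rules are assumptions made for illustration; in particular, how SAS Text Miner interprets the AND groups that follow a keyword is not specified here, so the sketch should not be read as its rule engine.

```python
def matches(rule, tokens):
    """Evaluate a nested ("OR"/"AND", operand, ...) rule against a set of tokens."""
    if isinstance(rule, str):
        return rule in tokens
    operator, *operands = rule
    results = (matches(op, tokens) for op in operands)
    if operator == "OR":
        return any(results)
    if operator == "AND":
        return all(results)
    raise ValueError(f"unknown operator: {operator}")

# Simplified, hypothetical rules inspired by the keywords of clusters 1 and 6.
RULES = {
    1: ("OR", "reform", ("AND", "insurance", ("OR", "payments", "seniority"))),
    6: ("OR", "accident", "chernobyl", ("AND", "nuclear", "power", "plant")),
}

def classify(text):
    """Return the sorted list of cluster numbers whose rule fires for the text."""
    tokens = set(text.lower().split())
    return sorted(c for c, rule in RULES.items() if matches(rule, tokens))

print(classify("Insurance payments after the pension reform"))
print(classify("Benefits for liquidators of the Chernobyl accident"))
print(classify("Housing subsidy"))
```

A text can satisfy several rules at once or none at all, so a practical classifier built this way needs a policy for ties and for unmatched texts.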
The statistical characteristics of the built classification model based on linguistic rules were
calculated separately for the training and test data sets: the ratio is 70% for training and 30% for
testing, i.e. 114 and 48 texts, respectively.
The results are summarized in Table 3.
Table 3
Statistical characteristics of the classification model of the studied texts
Statistics | Training data set | Test data set
TP (true positive) | 30 | 11
TN (true negative) | 67 | 26
FP (false positive) | 10 | 6
FN (false negative) | 7 | 5
MISC, % (proportion of incorrectly classified values) | 15 | 23
Gini | 0.82 | 0.71
ROC | 0.79 | 0.67
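The misclassification rates in the table follow directly from the confusion-matrix counts. The short Python sketch below recomputes them; the function name `classification_metrics` is an assumption, and precision and recall are additional derived measures that the authors do not report.

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive summary measures from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "total": total,
        # Share of incorrectly classified texts, in percent.
        "misclassification_pct": 100.0 * (fp + fn) / total,
        # Derived here for illustration; not reported in the paper.
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

training = classification_metrics(tp=30, tn=67, fp=10, fn=7)
test = classification_metrics(tp=11, tn=26, fp=6, fn=5)

for name, metrics in (("training", training), ("test", test)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

The totals of 114 and 48 texts match the 70/30 split described above, and the rounded misclassification rates agree with the MISC row of the table.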
The image of the ROC curve for the text information classification model based on linguistic rules
is presented in Fig. 3.
[Figure 3 legend: ROC characteristics of the model on the training set; ROC characteristics of the model on the test set; reference line corresponding to a 50/50 chance of the occurrence of the event.]
Figure 3: ROC curve for the built classification model based on linguistic rules.
The constructed linguistic rules were used to cluster news texts that were published on the
Internet from September 2023 to September 2024. In total, about 10,000 texts on social protection
and social security of Ukrainians were downloaded and processed.
After clustering the texts, the number of texts belonging to contributors from each region
was calculated for each cluster. The obtained values were normalized on a scale from 0 to 100
according to formula (1):
$$\mathrm{popularity}_i = \frac{n_i}{\max(n_i \mid \forall i)}, \qquad (1)$$
where $\mathrm{popularity}_i$ is the popularity of the texts of the corresponding cluster for the i-th region, $n_i$
is the number of texts for the i-th region, and $\max(n_i \mid \forall i)$ is the maximum number of texts over all regions.
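A minimal Python sketch of this normalization is given below. It assumes that the ratio from formula (1) is multiplied by 100 and rounded to a whole number to obtain the 0 to 100 scale; the function name `normalize_popularity` and the region counts are hypothetical and are not the study's data.

```python
def normalize_popularity(counts):
    """Scale per-region text counts to 0-100 relative to the largest count."""
    largest = max(counts.values())
    if largest == 0:
        # No texts in any region: every popularity value is zero.
        return {region: 0 for region in counts}
    # Assumption: ratio from formula (1) times 100, rounded to an integer.
    return {region: round(100 * n / largest) for region, n in counts.items()}

# Hypothetical counts of texts per region for one cluster.
counts = {"Region A": 250, "Region B": 125, "Region C": 40}
popularity = normalize_popularity(counts)
print(popularity)
```

Under this scheme the region with the most texts in a cluster always receives the value 100, and the other regions are expressed relative to it.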
The results of the calculations are presented in Table 4.
Table 4
Results of cluster analysis of textual information on issues of social protection and social security by
regions of Ukraine
Columns: name of the region, followed by the popularity of the texts of the corresponding cluster:
Cluster 1 (pension reform);
Cluster 2 (accrual and payment of pensions and social benefits by the Pension Fund of Ukraine);
Cluster 3 (problems of social protection of internally displaced persons);
Cluster 4 (issues related to losses due to military conflict);
Cluster 5 (issues of social protection and social security of refugees);
Cluster 6 (issues related to victims of the accident at the Chernobyl nuclear power plant).
Vinnytsia region: 94 65 24 72 79
Volyn region: 87 57 20 100 63
the city of Kyiv: 82 49 32 37 26 33
the city of Sevastopol: - - - - - -
Dnipropetrovsk region: 58 39 43 33 14
Donetsk region: 27 32 59 37
Zhytomyr region: 94 73 19 34 62 88
Transcarpathian region: 67 45 29 40 75
Zaporizhzhia region: 58 39 90 30
Ivano-Frankivsk region: 87 66 24 63 72
Kyiv region: 84 42 28 37 37 100
Kirovohrad region: 92 88 32 73 46
Autonomous Republic of Crimea: - 1 1 - - -
Luhansk region: 22 8
Lviv region: 73 45 20 60 57
Mykolayiv region: 76 70 64 47 18
Odesa region: 32 24 27 13 13
Poltava region: 75 63 32 73 42 77
Rivne region: 100 64 17 81 100
Sumy region: 92 100 52 43 30
Ternopil region: 50 56 24 63
Kharkiv region: 47 35 100 15 9
Kherson region: 71 62 89
Khmelnytskyi region: 87 55 28 78 73
Cherkasy region: 87 47 30 50 55 74
Chernihiv region: 81 58 24 50 29
Chernivtsi region: 83 31 25 1 61
The results of the analysis presented in the table can be visualized using SAS Enterprise
Guide 7.1 tools (Figs. 4-9).
Figure 4: Cluster 1 - popularity of texts on the topic "Pension reform" by regions of Ukraine.
Figure 5: Cluster 2 - the popularity of texts on the topic "Questions related to the pension fund in
general" by regions of Ukraine.
Figure 6: Cluster 3 - popularity of texts on the topic "Problems related to IDPs" by regions of Ukraine.
Figure 7: Cluster 4 - the popularity of texts on the topic "Issues related to the military and police"
by regions of Ukraine.
Figure 8: Cluster 5 - the popularity of texts on the topic "Questions regarding the payment of
pensions abroad" by regions of Ukraine.
Figure 9: Cluster 6 - the popularity of texts on the topic "Issues related to pensions for victims of the
accident at the ChAES" by regions of Ukraine.
4. Declaration on Generative AI
The authors have not employed any Generative AI tools.
5. Conclusion
The proposed method of textual information analysis using text mining tools is designed for automated
processing of large volumes of texts on a certain topic. Text analytics makes it possible to deepen
knowledge of the subject area by using unstructured data. In this study, the problem of
dimensionality and sparsity of the frequency matrix of the corpus of texts is solved using a key
result of linear algebra, singular value decomposition (SVD). A frequency weighting operation was
performed beforehand, which helped to partially solve the problem of uneven high-frequency terms
by making them less influential. This made it possible to obtain high-quality classification
of textual information.
Therefore, intelligent analysis of large volumes of textual data makes it possible to identify the
most important problems that require a priority solution and to find out for which categories of the
population they are most relevant. The obtained results can be further used in planning the
social expenditures of budgets of different levels and in the model of actuarial calculations.
The proposed approach can improve the quality of forecasts in modern conditions, when complete
information about the investigated process or phenomenon is lacking or the information is distorted.
References
[1] Shapovalova T. The concept and content of social protection and social security of the
population in modern Ukraine. Economic analysis. 2022. Volume 32. No. 3. P. 123-130.
https://doi.org/10.35774/econa2022.03.123 (ukr)
[2] Gren T. I. Peculiarities of implementation of the policy of social protection of territories in war
conditions. Academic notes of TNU named after V.I. Vernadskyi. Series: Public management
and administration. 2022. Volume 33 (72) No. 6. P. 81-84. https://doi.org/10.32782/TNU-2663-
6468/2022.6/13 (ukr)
[3] Expenditures on social assistance. URL:
https://mof.gov.ua/uk/expenditures_on_social_assistance (ukr)
[4] Smush-Kulesha M., Fedorova A., Moysa B. Social rights in Ukraine during the war. Report on
needs assessment. Council of Europe. 2022, 64 p. URL : https://rm.coe.int/needs-assessment-ua-
2/1680a9b408 (ukr)
[5] On the approval of the Regulation on the Unified Information System of the Social Sphere.
Resolution of the Cabinet of Ministers of Ukraine dated April 14, 2021 No. 404. URL:
https://zakon.rada.gov.ua/laws/show/404-2021-п#Text (ukr)
[6] Report on appeals of citizens for 9 months of 2024. URL: https://www.pfu.gov.ua/2167929-zvit-
pro-zvernennya-gromadyan-za-9-misyatsiv-2024-roku/ (ukr)
[7] Sharma S., Jain A. Role of sentiment analysis in social media security and analytics. WIREs
Data Mining and Knowledge Discovery. Vol. 10, Issue 5. https://doi.org/10.1002/widm.1366
[8] Shkurko O. V. Types of linguistic text analysis: study guide. Dnipro: Alfred Nobel University,
2018. 119 p. (ukr)
[9] Perebijnis V. I. Statistical methods for linguists: study guide. Vinnytsia: Nova Kniga,
2013. 176 p. (ukr)
[10] Lande D. V. Elements of computer linguistics in legal informatics. Kyiv: NDIIP National
Academy of Sciences of Ukraine, 2014. 168 p. (ukr)
[11] Find the information that matters using natural language processing (NLP). URL:
https://www.sas.com/ru_ua/software/visual-text-analytics.html
[12] Survey of Text Mining I: Clustering, Classification, and Retrieval / Ed. by MW Berry. Springer,
2003. 261 p.
[13] Aggarwal CC, Zhai C. Mining Text Data. Springer, 2012. 527 p.
[14] Text Cluster Node Results. URL:
https://documentation.sas.com/?docsetId=tmref&docsetTarget=n1d7r58qug6sefn162cu6cqx0nq
4.htm&docsetVersion=14.3&locale=en
[15] Emerging Technologies of Text Mining: Techniques and Applications / Ed. by HA Do Prado, E.
Ferneda. Idea Group Reference, 2007. 358 p.
[16] Valls Martínez, MdC, Santos-Jaén, JM, Amin, F.-u., Martín-Cervantes, PA Pensions, Aging and
Social Security Research: Literature Review and Global Trends. Mathematics 2021, No. 9, 3258.
https://doi.org/10.3390/math9243258
[17] Social Protection Systems. Ed. E. Schüring, M. Loewe. Elgar Publishing. 2021. 776 p.
https://doi.org/10.4337/9781839109119
[18] Official website of the Ministry of Digital Transformation of Ukraine. URL :
https://thedigital.gov.ua (ukr)
[19] On the approval of the Regulation on the Unified Information System of the Social Sphere.
Resolution of the Cabinet of Ministers of Ukraine dated April 14, 2021 No. 404. URL :
https://zakon.rada.gov.ua/laws/show/404-2021-п#Text (ukr)
[20] Gladun A. Ya., Rogushina Yu. V. Data mining: searching for knowledge in data: a tutorial. Kyiv:
ADEF-Ukraine, 2016. 451 p. (ukr)
[21] Lytvyn V.V., Pasichnyk V.V., Nikolskyi Yu.V. Analysis of data and knowledge: study guide.
Lviv: Magnolia 2006, 2017. 276 p. (ukr)
[22] Analysis and processing of data flows by means of computational intelligence: monograph / Ye.
V. Bodyanskyi et al. Lviv: Lviv Polytechnic Publishing House, 2016. 235 p. (ukr)
[23] Text analytics using SAS Text Miner: course notes. NC.: SAS Institute, 2014. 218 p.
[24] Getting Started with SAS® Text Miner 12.1 URL:
https://support.sas.com/documentation/onlinedoc/txtminer/12.1/tmgs.pdf
[25] Matignon R. Data Mining Using SAS Enterprise Miner. URL: https://www.amazon.com/Data-
Mining-Using-Enterprise-Miner/dp/0470149019
[26] Sharma S., Jain A. Role of sentiment analysis in social media security and analytics. WIREs
Data Mining and Knowledge Discovery. Vol. 10, Issue 5. https://doi.org/10.1002/widm.1366
[27] Find the information that matters using natural language processing (NLP). URL:
https://www.sas.com/ru_ua/software/visual-text-analytics.html