<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>J. Korbicz);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Application of SAS Text Miner for the analysis of citizens' appeals in the system of social protection and social security</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Józef Korbicz</string-name>
          <email>J.Korbicz@issi.uz.zgora.pl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Sholokhov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Koval</string-name>
          <email>roman.koval.science@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Zarudnyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Telecommunications and Global Information Space of the National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>13 Chokolovsky Blvd., Kyiv, 03186</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>64/13 Volodymyrska Street, Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Zielona Góra</institution>
          ,
          <addr-line>9 Licealna Street, Zielona Góra, 65-417</addr-line>
          ,
          <country>Republic of Poland</country>
        </aff>
      </contrib-group>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Social protection and social security have always been among the most pressing issues for all social strata without exception, and under wartime conditions this sphere has acquired special importance: not only the well-being of citizens and the balanced development of society, but also national security, depend on the effectiveness of state policy in this area. During the war, spending on social protection and social security has increased significantly and will continue to grow despite limited budget funding. Special attention therefore needs to be paid to the targeting of these funds and to control over the targeted use of state assistance. In wartime, conducting sociological research, surveys, and in-person reception of citizens becomes much more difficult. Since a significant share of the population uses social networks and the digital platforms of state institutions and organizations, studying the online environment is a promising direction for working with citizens' appeals. Information from Internet sources makes it possible to investigate problems that are significant for different social groups and to analyze the moods and expectations of the population. At present, however, the social security system has practically no software products designed to analyze the textual information in citizens' appeals. The paper proposes a method for building an analytical model for studying social protection and social security problems that require special attention from the state, using tools for analyzing textual information from Internet sources and building classification models.</p>
      </abstract>
      <kwd-group>
        <kwd>Text clustering</kwd>
        <kwd>linguistic rules</kwd>
        <kwd>intelligent data analysis</kwd>
        <kwd>social protection and social security</kwd>
        <kwd>information technology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>modern models, methods and information technology. The introduction of the "Unified Information System of the Social Sphere" [17] was a new step towards the end-to-end digitalization of the pension system and social protection of the population. The purpose of introducing the System is to "ensure integral automation of processes in the social sphere by optimizing and developing electronic information interaction of the subjects of the Unified System aimed at ensuring transparency of the social sphere, digitalization of the social support market and increasing the level of its availability for persons who need it" [17].</p>
      <p>8th International Scientific and Practical Conference Applied Information Systems and Technologies in the Digital Society AISTDS'2024, October 1, 2024, Kyiv, Ukraine.</p>
      <p>
        The development of the Unified Information System of the Social Sphere [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] involves the creation of a unified information and reference environment for recipients of social support. The subsystem for working with citizens' appeals occupies an important place in it: in January-September 2024 alone, the Pension Fund of Ukraine registered 504,856 appeals from citizens, of which 229,537 (or 45.5 percent) were electronic appeals [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Therefore, developing methods, models, and information technologies for analyzing textual information from citizens' electronic appeals to institutions of social protection and social security and from Internet sources, and for identifying the issues that matter most to those who need state support, is an urgent task of practical importance [18-20].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Statement of the research problem</title>
      <p>The paper proposes a method of using text analytics tools to build an analytical model for the
classification of text information in the task of analyzing citizens' appeals to the Pension Fund of
Ukraine.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods and results</title>
      <p>In the course of the study, the practical task of determining the need for social protection and social
security of residents of different regions of Ukraine and refugees was considered. SAS Text Miner
tools [21-23] were used to analyze text information.</p>
      <p>
        The input data are electronic appeals from citizens received through the web portal of electronic services of the Pension Fund of Ukraine and the state institution "Government Contact Center" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Materials from Internet publications, both state and non-state and differing in subject matter and audience, were also examined; 162 of them were selected (the names of the sources and references to them are presented in Table 1).
      </p>
      <p>Based on the analysis of texts related to issues of social protection and social security posted on the specified Internet resources and in electronic appeals, six clusters were obtained.</p>
      <p>The first cluster includes texts that contain issues related to the pension reform. The most
characteristic words and phrases for this cluster were: "reform", "insurance payments", "insurance
experience", "mandatory pension savings".</p>
      <p>The second cluster includes words and phrases describing the issue of accrual and payment of
pensions and social benefits by the Pension Fund of Ukraine: "timely payment of pensions",
"voluntary contributions to pension insurance", "minimum pension", "indexation of pensions",
"increase of pensions", "housing subsidy", "financing of current payments", "recalculation of pensions
for working pensioners".</p>
      <p>The third cluster summarizes the problems of social protection of internally displaced persons.
The most characteristic are such words and phrases as "IDPs", "identification", "liberated territories",
"payments to displaced persons", "inhabitants of the occupied Crimea", "UN World Food Program",
"temporarily uncontrolled territories".</p>
      <p>The fourth cluster includes words and phrases describing problems related to losses due to military conflict: "military serviceman", "policeman", "combat zone", "missing person", "loss of breadwinner", "family members of the deceased".</p>
      <p>The fifth cluster covers issues of social protection and social security of refugees, in particular "pension abroad", "work outside Ukraine", "proportional calculation of insurance experience", "insurance experience received in other countries".</p>
      <p>The sixth cluster summarizes issues related to the victims of the accident at the Chernobyl NPP:
"accident", "ChNPP", "Chernobyl".</p>
      <p>Based on the preliminary analysis of the texts of the appeals, a corpus of texts was formed, a fragment of which is given in Table 2.</p>
      <p>
        To reduce the dimensionality and sparsity of the term frequency matrix of the text corpus, singular value decomposition (SVD) was used [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3-5</xref>
        ]. Documents usually use a fairly small set of terms that describe a certain subject area. Therefore, if exactly the first k diagonal elements of the diagonal matrix of singular values S are kept and the rest are set to zero, SVD gives an optimal rank-k approximation of the original matrix. The singular values in S are ordered, namely <inline-formula><tex-math>\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r</tex-math></inline-formula>, so keeping the first two values means setting all the others to zero.
      </p>
      <p>Using the obtained matrix S, one can calculate the percentage that each dimension, described by the corresponding singular value, contributes to the explanation of the data (Table 3). The value in the column "Percentage of value contribution to the explanation of data variability" is calculated as the square of the singular value divided by the sum of the squares of all singular values, multiplied by 100%.</p>
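      <p>As an illustration, the calculation of these percentages can be sketched in Python as follows. The function name and the singular values are hypothetical and are not taken from Table 3; the sketch only reproduces the arithmetic described above.</p>
      <preformat><![CDATA[
```python
def explained_variability(singular_values):
    """Return the percentage of data variability explained by each dimension.

    Each percentage is the squared singular value divided by the sum of all
    squared singular values, multiplied by 100.
    """
    squares = [s * s for s in singular_values]
    total = sum(squares)
    if total == 0:
        raise ValueError("at least one singular value must be non-zero")
    return [100.0 * sq / total for sq in squares]


# Hypothetical singular values in descending order (not the values of Table 3).
example_singular_values = [4.0, 2.0, 2.0, 1.0]
shares = explained_variability(example_singular_values)

# Cumulative share explained when only the first two dimensions are kept.
first_two = shares[0] + shares[1]
print([round(p, 2) for p in shares], round(first_two, 2))
```
]]></preformat>
      <p>Summing the first k percentages gives the share of variability retained when only k dimensions are kept, which is how a cumulative figure for two dimensions would be obtained.</p>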
      <p>As can be seen from Table 3, if only the first two dimensions are kept, a total of 66.16% of the data variability is explained.</p>
      <p>In this case, all documents can be placed in a two-dimensional space, and the clusters that they form according to their degree of similarity and topic can be determined (Fig. 1).</p>
      <p>
        As can be seen from Fig. 1, the first dimension explains 45.61% of the data variability and the second dimension explains 20.55%. As a result, three thematic clusters were formed, each grouping documents by the similarity of the terms they use [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6-9</xref>
        ].
      </p>
      <p>The SAS Text Miner system was used in this study. In SAS Text Miner, the analysis is organized as a project in which the following steps are performed:</p>
      <sec id="sec-3-1">
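        <list list-type="order">
          <list-item><p>Loading data.</p></list-item>
          <list-item><p>Text parsing.</p></list-item>
          <list-item><p>Text filtering.</p></list-item>
          <list-item><p>Text clustering.</p></list-item>
        </list>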
        <p>The process flow for analyzing the corpus of texts for the purpose of clustering is presented in Fig. 2.</p>
        <p>The constructed rules for the corresponding clusters are generated in the form of the following
program code:</p>
        <preformat>F_TextCluster_cluster_ =1 ::
(OR
, "reform"
, "insurance"
, (AND, (OR, "payments", "seniority"))
, "accumulation"
, (AND, (OR, "pensionable", "mandatory"))
)
F_TextCluster_cluster_ =2 ::
(OR
, (AND, (OR, "payments", "pension"))
, "voluntary"
, "timely"
, "pension"
, (AND, (OR, "contributions", "pension", "insurance", "recalculation"))
, (AND, (OR, "minimum", "index", "increment"))
, "subsidy"
, (AND, (OR, "residential"))
, "current"
, (AND, (OR, "payment", "funding"))
)
F_TextCluster_cluster_ =5 ::
(OR
, "pension"
, (AND, (OR, "border", "borders", "others", "countries"))
, "experience"
, (AND, (OR, "calculation", "insurance", "proportional"))
)
F_TextCluster_cluster_ =6 ::
(OR
, "accident"
, (AND, (OR, "CHAES", "nuclear", "power plant"))
, "Chernobyl"
)</preformat>
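        <p>To illustrate how such Boolean rules can be applied to a text, a minimal Python sketch is given below. It is not the SAS Text Miner implementation: the tuple encoding of the rules, the function names, and the simple whole-word matching without stemming are assumptions made for this example only. The rule contents follow the listing for clusters 1 and 6.</p>
        <preformat><![CDATA[
```python
import re


def normalize(text):
    """Lowercase the text and pad it with spaces for whole-word matching."""
    return " " + " ".join(re.findall(r"\w+", text.lower())) + " "


def matches(rule, padded_text):
    """Evaluate a nested rule: a term (str) or a tuple ("OR" | "AND", ...)."""
    if isinstance(rule, str):
        # A term matches when it occurs as a whole word or whole phrase.
        return normalize(rule) in padded_text
    operator, *operands = rule
    results = (matches(operand, padded_text) for operand in operands)
    if operator == "OR":
        return any(results)
    if operator == "AND":
        return all(results)
    raise ValueError(f"unknown operator: {operator!r}")


def classify(text, rules):
    """Return the sorted cluster numbers whose rule matches the text."""
    padded_text = normalize(text)
    return sorted(c for c, rule in rules.items() if matches(rule, padded_text))


# Assumed tuple encoding of the rules for clusters 1 and 6.
RULES = {
    1: ("OR", "reform", "insurance",
        ("AND", ("OR", "payments", "seniority")),
        "accumulation",
        ("AND", ("OR", "pensionable", "mandatory"))),
    6: ("OR", "accident",
        ("AND", ("OR", "CHAES", "nuclear", "power plant")),
        "Chernobyl"),
}

print(classify("Benefits for liquidators of the Chernobyl accident", RULES))
print(classify("Insurance seniority after the pension reform", RULES))
```
]]></preformat>
        <p>A text may satisfy the rules of several clusters at once, so the sketch returns a list of cluster numbers rather than a single label.</p>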
        <p>The statistical characteristics of the built classification model based on linguistic rules were
calculated separately for the training and test data sets: the ratio is 70% for training and 30% for
testing, i.e. 114 and 48 texts, respectively.</p>
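        <p>The results are summarized in Table 3. The ROC curve for the text information classification model based on linguistic rules is presented in Fig. 3.</p>
      </sec>
      <sec id="sec-3-2">
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Statistical characteristics of the classification model of the studied texts</p>
          </caption>
          <table>
            <thead>
              <tr><th>Statistics</th><th>Training</th></tr>
            </thead>
            <tbody>
              <tr><td>TP (true positive)</td><td>30</td></tr>
              <tr><td>TN (true negative)</td><td>67</td></tr>
              <tr><td>FP (false positive)</td><td>10</td></tr>
              <tr><td>FN (false negative)</td><td>7</td></tr>
              <tr><td>MISC, % (proportion of incorrectly classified values)</td><td>15</td></tr>
              <tr><td>Gini</td><td>0.82</td></tr>
              <tr><td>ROC</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Legend of Fig. 3: ROC characteristics of the model on the training set; ROC characteristics of the model on the test set; reference line (50 percent probability of the occurrence of the event).</p>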
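        <p>The reported misclassification rate can be checked from the confusion-matrix counts. The following Python sketch assumes that the values 30, 67, 10, and 7 correspond to TP, TN, FP, and FN on the training set of 114 texts; the function name and the additional metrics (precision, recall) are illustrative and are not reported in the paper.</p>
        <preformat><![CDATA[
```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute basic classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "total": total,
        # Share of correctly classified texts, in percent.
        "accuracy_pct": 100.0 * (tp + tn) / total,
        # Share of incorrectly classified texts, in percent (MISC, %).
        "misclassification_pct": 100.0 * (fp + fn) / total,
        # Fraction of predicted positives that are truly positive.
        "precision": tp / (tp + fp),
        # Fraction of actual positives that the model detects.
        "recall": tp / (tp + fn),
    }


# Counts assumed to be the training-set values of the statistics table.
training = confusion_metrics(tp=30, tn=67, fp=10, fn=7)
print(training["total"], round(training["misclassification_pct"], 1))
```
]]></preformat>
        <p>Under this assumption the counts sum to 114, the size of the training set, and the misclassification rate is about 14.9%, which agrees with the rounded value of 15%.</p>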
        <p>The constructed linguistic rules were used to cluster news texts that were published on the Internet from September 2023 to September 2024. In total, about 10,000 texts on social protection and social security of Ukrainians were downloaded and processed.</p>
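        <p>After clustering the texts, the number of texts belonging to contributors from a certain region was calculated for each cluster. The obtained values were normalized to a scale from 0 to 100 according to formula (1):
          <disp-formula><label>(1)</label><tex-math>P_i = \frac{n_i}{\max(n_i \mid \forall i)} \cdot 100,</tex-math></disp-formula>
          where <inline-formula><tex-math>P_i</tex-math></inline-formula> is the popularity of the texts of the corresponding cluster for the i-th region, <inline-formula><tex-math>n_i</tex-math></inline-formula> is the number of texts for the i-th region, and <inline-formula><tex-math>\max(n_i \mid \forall i)</tex-math></inline-formula> is the maximum number of texts over all regions.</p>
        <p>The results of the calculations are presented in Table 4.</p>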
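        <p>The normalization to the 0-100 scale can be sketched in Python as follows. The region names and counts are hypothetical, and the sketch assumes that each count is divided by the maximum count over all regions and multiplied by 100.</p>
        <preformat><![CDATA[
```python
def normalize_popularity(counts_by_region):
    """Scale per-region text counts to 0-100 relative to the largest count."""
    max_count = max(counts_by_region.values())
    if max_count == 0:
        # No texts in any region: every popularity score is zero.
        return {region: 0.0 for region in counts_by_region}
    return {
        region: 100.0 * count / max_count
        for region, count in counts_by_region.items()
    }


# Hypothetical numbers of texts of one cluster per region.
counts = {"Region A": 47, "Region B": 50, "Region C": 29}
popularity = normalize_popularity(counts)
print(popularity)
```
]]></preformat>
        <p>With this scaling the region with the largest number of texts in a cluster always receives the value 100, and the other regions are expressed relative to it.</p>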
        <p>Table 4. Popularity of the texts of the corresponding cluster</p>
      </sec>
      <sec id="sec-3-3">
        <title>Name of the region</title>
        <p>Vinnytsia region; Volyn region; the city of Kyiv; the city of Sevastopol; Dnipropetrovsk region; Donetsk region; Zhytomyr region; Transcarpathian region; Zaporizhzhia region; Ivano-Frankivsk region; Kyiv region; Kirovohrad region; Autonomous Republic of Crimea; Luhansk region; Lviv region; Mykolayiv region; Odesa region; Poltava region; Rivne region; Sumy region; Ternopil region; Kharkiv region; Kherson region; Khmelnytskyi region; Cherkasy region; Chernihiv region; Chernivtsi region.</p>
        <p>Cluster 1 (pension reform): 94, 87, 82, 58, 27, 94, 67, 58, 87, 84, 92</p>
        <p>The results of the analysis presented in the table can be visualized using SAS Enterprise Guide 7.1 (Figs. 4-9).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The proposed method of textual information analysis using text mining tools is designed for automated processing of large volumes of texts on a certain topic. Text analytics makes it possible to deepen knowledge of the subject area by using unstructured data. In this study, the problem of dimensionality and sparsity of the term frequency matrix of the text corpus is solved using a key result of linear algebra, singular value decomposition (SVD). A frequency weighting operation was performed beforehand, which helped to partially solve the problem of unevenness of high-frequency terms by making them less influential. This made it possible to obtain high-quality classification of textual information.</p>
      <p>Therefore, intelligent analysis of large volumes of textual data makes it possible to identify the most important problems that require a priority solution and to find out for which categories of the population they are most relevant. The obtained results can be further used in planning social expenditures of budgets of different levels and in actuarial calculation models. The proposed approach can improve the quality of forecasts in modern conditions, when complete information about the investigated process or phenomenon is unavailable or the information is distorted.</p>
      <p>[14] Text Cluster Node Results. URL: https://documentation.sas.com/?docsetId=tmref&amp;docsetTarget=n1d7r58qug6sefn162cu6cqx0nq4.htm&amp;docsetVersion=14.3&amp;locale=en</p>
      <p>[15] Emerging Technologies of Text Mining: Techniques and Applications / Ed. by H. A. do Prado, E. Ferneda. Idea Group Reference, 2007. 358 p.</p>
      <p>[16] Valls Martínez M. d. C., Santos-Jaén J. M., Amin F.-u., Martín-Cervantes P. A. Pensions, Aging and Social Security Research: Literature Review and Global Trends. Mathematics. 2021. Vol. 9, No. 24. 3258. https://doi.org/10.3390/math9243258</p>
      <p>[17] Social Protection Systems / Ed. by E. Schüring, M. Loewe. Elgar Publishing, 2021. 776 p. https://doi.org/10.4337/9781839109119</p>
      <p>[18] Official website of the Ministry of Digital Transformation of Ukraine. URL: https://thedigital.gov.ua (ukr)</p>
      <p>[19] On the approval of the Regulation on the Unified Information System of the Social Sphere. Resolution of the Cabinet of Ministers of Ukraine dated April 14, 2021 No. 404. URL: https://zakon.rada.gov.ua/laws/show/404-2021-п#Text (ukr)</p>
      <p>[20] Gladun A. Ya., Rogushina Yu. V. Data mining: searching for knowledge in data: a tutorial. Kyiv: ADEF-Ukraine, 2016. 451 p. (ukr)</p>
      <p>[21] Lytvyn V. V., Pasichnyk V. V., Nikolskyi Yu. V. Analysis of data and knowledge: a training manual. Lviv: Magnolia 2006, 2017. 276 p. (ukr)</p>
      <p>[22] Analysis and processing of data flows by means of computational intelligence: monograph / Ye. V. Bodyanskyi et al. Lviv: Lviv Polytechnic Publishing House, 2016. 235 p. (ukr)</p>
      <p>[23] Text analytics using SAS Text Miner: course notes. Cary, NC: SAS Institute, 2014. 218 p.</p>
      <p>[24] Getting Started with SAS® Text Miner 12.1. URL: https://support.sas.com/documentation/onlinedoc/txtminer/12.1/tmgs.pdf</p>
      <p>[25] Matignon R. Data Mining Using SAS Enterprise Miner. URL: https://www.amazon.com/Data-Mining-Using-Enterprise-Miner/dp/0470149019</p>
      <p>[26] Sharma S., Jain A. Role of sentiment analysis in social media security and analytics. WIREs Data Mining and Knowledge Discovery. Vol. 10, Issue 5. https://doi.org/10.1002/widm.1366</p>
      <p>[27] Find the information that matters using natural language processing (NLP). URL: https://www.sas.com/ru_ua/software/visual-text-analytics.html</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Shapovalova</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>The concept and content of social protection and social security of the population in modern Ukraine</article-title>
          .
          <source>Economic analysis</source>
          .
          <year>2022</year>
          . Volume
          <volume>32</volume>
          . No.
          <issue>3</issue>
          . P.
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          . https://doi.org/10.35774/econa2022.03.123 (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Gren</surname>
            <given-names>T. I.</given-names>
          </string-name>
          <article-title>Peculiarities of implementation of the policy of social protection of territories in war conditions</article-title>
          .
          <source>Academic notes of TNU named after V.I. Vernadskyi. Series: Public management and administration</source>
          .
          <year>2022</year>
          . Volume
          <volume>33</volume>
          (72). No.
          <issue>6</issue>
          . P.
          <fpage>81</fpage>
          -
          <lpage>84</lpage>
          . https://doi.org/10.32782/TNU-2663-6468/2022.6/13 (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>Expenditures on social assistance</article-title>
          . URL: https://mof.gov.ua/uk/expenditures_on_social_assistance (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Smush-Kulesha</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fedorova</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moysa</surname>
            <given-names>B.</given-names>
          </string-name>
          <article-title>Social rights in Ukraine during the war</article-title>
          .
          <source>Report on needs assessment. Council of Europe</source>
          .
          <year>2022</year>
          . 64 p. URL: https://rm.coe.int/needs-assessment-ua2/1680a9b408 (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>On the approval of the Regulation on the Unified Information System of the Social Sphere</article-title>
          . Resolution of the Cabinet of Ministers of Ukraine dated April 14,
          <year>2021</year>
          No. 404. URL: https://zakon.rada.gov.ua/laws/show/404-2021-п#Text (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>Report on appeals of citizens for 9 months of 2024</article-title>
          . URL: https://www.pfu.gov.ua/2167929-zvit-pro-zvernennya-gromadyan-za-9-misyatsiv-2024-roku/ (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sharma</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            <given-names>A.</given-names>
          </string-name>
          <article-title>Role of sentiment analysis in social media security and analytics</article-title>
          .
          <source>WIREs Data Mining and Knowledge Discovery</source>
          . Vol.
          <volume>10</volume>
          , Issue
          <issue>5</issue>
          . https://doi.org/10.1002/widm.1366
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Shkurko</surname>
            <given-names>O. V.</given-names>
          </string-name>
          <source>Types of linguistic text analysis: a teaching manual</source>
          . Dnipro: Alfred Nobel University,
          <year>2018</year>
          . 119 p. (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Perebijnis</surname>
            <given-names>V. I.</given-names>
          </string-name>
          <source>Statistical methods for linguists: a training manual</source>
          . Vinnytsia: Nova Kniga,
          <year>2013</year>
          . 176 p. (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Lande</surname>
            <given-names>D. V.</given-names>
          </string-name>
          <source>Elements of computer linguistics in legal informatics</source>
          . Kyiv: NDIIP National Academy of Sciences of Ukraine,
          <year>2014</year>
          . 168 p. (ukr)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <article-title>Find the information that matters using natural language processing (NLP)</article-title>
          . URL: https://www.sas.com/ru_ua/software/visual-text-analytics.html
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <source>Survey of Text Mining I: Clustering, Classification, and Retrieval</source>
          / Ed. by M. W. Berry. Springer,
          <year>2003</year>
          . 261 p.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Aggarwal</surname>
            <given-names>CC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>Mining Text Data</article-title>
          . Springer,
          <year>2012</year>
          . 527 p.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>