<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Determinants of social trust: analysis using machine learning methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tamara Merkulova</string-name>
          <email>tamara.merkulova@karazin.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanna Bohdanova</string-name>
          <email>hanna.bohdanova@gmail.com</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>5</volume>
      <issue>1</issue>
      <fpage>1</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>This paper presents the results of testing individual-based and society-based hypotheses of interpersonal trust and of clarifying the relationship between institutional trust and individual and societal characteristics, using machine learning methods on the latest data of the World Values Survey (2017-2021). The initial sample consisted of 70,867 respondents. These data were used to develop models of interpersonal and institutional trust. Factors that can be considered determinants of social trust were studied using classification models (for both interpersonal and institutional trust) and cluster analysis (for trust in government). Classification recognizes the class (a level of trust) to which a respondent belongs according to a range of factors (predictors). We defined two classes in accordance with the responses: people who trust strangers (or the government) and people who do not. Classification models were developed with various sets of predictors (determinants of trust): individual characteristics, societal indicators, and a mixed composition of determinants. The best results for interpersonal trust as well as for trust in government were obtained in classification models with mixed sets of predictors. Cluster analysis clarified which individual and societal characteristics are associated with a high or low level of trust in government. The results of this research can, to a certain extent, serve as arguments in favor of the multilevel approach to the determinants of social trust, which takes into account the essential role of individual and societal factors for both interpersonal and institutional trust.</p>
      </abstract>
      <kwd-group>
        <kwd>Interpersonal trust</kwd>
        <kwd>Institutional trust</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Clustering</kwd>
        <kwd>Classification models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>I Introduction</title>
      <p>The study of trust, its origins, and its relationship with the development of society and
the economy is a broad area of interdisciplinary research carried out within various
scientific schools. Social trust is often referred to as the keystone of social capital
(Newton, 2004), (Rothstein &amp; Stolle, 2008) and is considered a powerful resource for
socio-economic development that increases stability, fairness, and harmony in society
(Roth, 2006), (Bjørnskov, 2012), (Algan &amp; Cahuc, 2013).</p>
      <p>Social trust is commonly divided into two types: interpersonal trust and institutional
trust. Interpersonal trust comprises in-group trust (trust between members of a group, for
instance, family members, friends, or colleagues) and trust in strangers, which is
considered generalized trust (Kwon, 2019).</p>
      <p>Studies of the factors that determine interpersonal trust are based on the ideas of the
individual-based theory and the society-based theory (Algan &amp; Cahuc, 2013),
(Delhey &amp; Newton, 2005), (Kwon, 2019). The first considers interpersonal trust an
individual property that is determined by individual characteristics such as education,
gender, age, and income. The society-based theory assumes that interpersonal trust is a
property of society and depends on social, economic, cultural, national, historical, and
other factors that characterize society as a whole.</p>
      <p>Both theories have arguments for and against them, obtained in numerous studies. As
noted in (Newton, 2004), although many investigations focus on social trust at the
individual level, social trust is not closely associated with individual characteristics
such as sex, income, or education. Using data from the third wave of the World Values
Survey, that study showed that social trust is closely related to a range of societal
indicators connected with the development of democracy and sustainability.</p>
      <p>Studies of the factors that underpin generalized trust in society have revealed four
macro-level indicators that influence trust: economic inequality, civic participation, ethnic
homogeneity, and institutional quality (Delhey &amp; Newton, 2005), (Charron &amp; Rothstein,
2014), (Rothstein &amp; Uslaner, 2005), (Roth, 2006). In (Rothstein &amp; Uslaner, 2005), (Roth,
2006) income inequality is considered an essential determinant of a low level of
interpersonal trust. High-trust countries are at the same time high-income countries with
good governance, a low level of income inequality, and ethnic homogeneity
(Delhey &amp; Newton, 2005). This combination of factors is most pronounced in the Nordic
countries. The analysis of European regions in (Charron &amp; Rothstein, 2014) showed that
the quality of institutions is the most essential factor determining the regional dispersion
of trust within a country, whereas economic inequality, civic participation, and ethnic
homogeneity are not very important for explaining the variation in trust.</p>
      <p>The data presented in (Tsai, Laczko, &amp; Bjørnskov, 2011) do not support the hypothesis
that social diversity (ethnic, linguistic, and religious) leads to a decrease in the level of
trust, at least in the short term. The results highlight the complex interaction of the many
factors that determine generalized trust in society. Arguments in favor of a positive
influence of the state on trust are discussed in (Robbins &amp; Blaine, 2011). The state creates
an environment that can enhance social trust; in particular, the public allocation of
resources and property rights institutions have a positive effect on generalized trust.</p>
      <p>Studies that test individual-based hypotheses of social trust provide evidence that
interpersonal trust is associated with individual characteristics. In (Adwere-Boamah
&amp; Hufstedler, 2015) the authors present regression models that support the assumption
that education and sex are essential factors of interpersonal trust. The study (Almakaeva,
Welzel, &amp; Ponarin, 2018) revealed that human empowerment can be considered a
moderator of individual-level determinants of trust.</p>
      <p>Research on trust also covers cultural, religious, and moral factors that can be
essential determinants of social trust. The influence of the Protestant tradition is
discussed in (Delhey &amp; Newton, 2005). Statistical results that support the assumption
that religion is a significant factor are presented in (Uslaner, 2002).</p>
      <p>Institutional trust shows whether citizens have confidence in institutions. Citizens
evaluate institutions according to their expectations of the effectiveness and fairness that
institutions should demonstrate.</p>
      <p>“Citizens expect institutions to perform efficiently, effectively, fairly, and ethically in
accordance with the roles assigned to them by law or with social norms in the eyes of
citizens” (Kwon, 2019, p. 28).</p>
      <p>Thus, citizens' trust in institutions is based a) on the ability of institutions to perform
the functions assigned to them in accordance with law and social norms (competence of
institutions), and b) on citizens' acceptance of institutional operations on moral grounds.</p>
      <p>Therefore, institutional trust has two dimensions: the competence dimension is
associated with the efficiency and effectiveness of institutions (this can be represented by
macroeconomic indicators), and the value dimension includes fairness, transparency,
non-corruption, and other moral values (Kwon, 2019). Trust in government is one of the
most important types of institutional trust from the perspective of the legitimacy of
government and other political institutions (Knah, 2016).</p>
      <p>Thus, the influence of various individual and group (social) characteristics on social
trust has not been completely clarified and requires further research. Machine learning
provides tools for studying this problem on the large data set of the World Values
Survey, which includes direct questions on trust in strangers and trust in government.</p>
      <p>Our tasks include testing individual-based and society-based hypotheses of
interpersonal trust and clarifying the relationship between institutional trust and individual
and societal characteristics on the latest data of the World Values Survey.</p>
    </sec>
    <sec id="sec-3">
      <title>II Methodology and Data</title>
      <p>This study uses data from the World Values Survey (World Values Survey,
2017-2021). The World Values Survey (WVS) is an international research program that
analyzes a wide range of indicators across social, political, economic, religious, and
cultural groups. The project evaluates the impact of values on the social, political, and
economic development of countries. Waves of the survey are repeated every 5 years. In this
study, we used the data of the 7th wave, which took place in 80 countries of the world in
2017-2021.</p>
      <p>The WVS data were used to build models of interpersonal and institutional
trust. The initial sample consisted of 70,867 respondents. Hypotheses about the
determinants of these types of trust were tested using machine learning methods.</p>
      <p>Interpersonal trust</p>
      <p>At the first stage, when constructing models of interpersonal trust (“Generally speaking,
would you say that most people can be trusted or that you need to be very careful in
dealing with people?”), only individual characteristics were used as predictors: "Sex",
"Age", "Education", "Satisfaction_with_life" (on a scale from 1, “completely dissatisfied”,
to 10, “completely satisfied”), "Employment_status" (binarized: 1 if the respondent works
(full-time, part-time, or self-employed) and 0 if they do not work (a retiree, a student, a
housewife, etc.)), "Satisfaction_with_financial_situation_of_household" (on a scale from 1,
“completely dissatisfied”, to 10, “completely satisfied”), "Marriage", and "Religion"
(“How important is God in your life?”, on a scale from 1, “not at all important”, to 10,
“very important”).</p>
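      <p>The binarization of "Employment_status" described above can be sketched as follows.
This is a minimal illustration: the status labels and the helper name binarize_employment
are assumptions made for the example and are not the actual WVS answer codes.</p>
      <preformat>
```python
# Hypothetical status labels; the real WVS answer codes may differ.
WORKING_STATUSES = {"full-time", "part-time", "self-employed"}


def binarize_employment(status):
    """Return 1 if the respondent works and 0 otherwise."""
    return 1 if status.strip().lower() in WORKING_STATUSES else 0


# Made-up answers used only to show the mapping.
responses = ["Full-time", "Retired", "Self-employed", "Student", "Housewife"]
employment_status = [binarize_employment(answer) for answer in responses]
print(employment_status)
```
</preformat>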
      <p>Classification models were used to identify the presence of a relationship between
individual characteristics and interpersonal trust.</p>
      <p>Then, we expanded the range of predictors by adding factors that can be considered
characteristics of society and institutions: "Corruption" (How would you place your views
on corruption in your country on a 10-point scale where 1 means “there is no corruption
in my country” and 10 means “there is abundant corruption in my country”), "Migration"
(How would you evaluate the impact of the people from other countries who come to live
in [your country] - the immigrants on the development of [your country]?), "Security"
(Could you tell me how secure do you feel these days?), "Democratically" (How
important is it for you to live in a country that is governed democratically?).</p>
      <p>After that, we compared the quality of classification models constructed for two sets of
predictors.</p>
      <p>Institutional trust</p>
      <p>The study used an indicator of trust in government (“How much confidence do you have
in the government: is it a great deal of confidence, quite a lot of confidence, not very much
confidence or none at all?”). The following hypotheses were tested:
1) Institutional trust depends on individual characteristics;
2) Institutional trust depends on the characteristics of society and the quality of
institutions;
3) Institutional trust depends on a mixed composition of predictors.</p>
      <p>The same set of individual characteristics was used for the models of both interpersonal
trust and institutional trust. The following institution-related features were used:
"Corruption", "Security", and "Democracy". These features reflect citizens' opinions on
the degree to which each feature is realized in their country.</p>
      <p>We also added another indicator to the characteristics of society and the quality of
institutions: "Ethnic_group". This feature indicates the ethnic group of the respondent.
The answer options are: 1. White; 2. Black; 3. South Asian Indian, Pakistani, etc.;
4. East Asian Chinese, Japanese, etc.; 5. Arabic, Central Asian; 6. Other.</p>
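      <p>Feature names such as “Ethnic group_White” that appear later in the paper suggest that
this categorical indicator was expanded into one binary column per answer option. The
sketch below shows one possible way to do this; the function name one_hot and the exact
column naming are assumptions made for illustration.</p>
      <preformat>
```python
ETHNIC_GROUPS = [
    "White",
    "Black",
    "South Asian Indian, Pakistani, etc.",
    "East Asian Chinese, Japanese, etc.",
    "Arabic, Central Asian",
    "Other",
]


def one_hot(value, categories, prefix="Ethnic group"):
    """Map one categorical answer to a dict of 0/1 indicator columns."""
    return {f"{prefix}_{category}": int(value == category) for category in categories}


# A made-up respondent who chose the first answer option.
row = one_hot("White", ETHNIC_GROUPS)
print(row["Ethnic group_White"], sum(row.values()))
```
</preformat>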
      <p>For institutional trust, classification and clustering models were built, in order to
identify the relationship between said trust and the identified predictors.</p>
      <p>Data processing and analysis were performed using Python.</p>
    </sec>
    <sec id="sec-4">
      <title>III Results and analysis</title>
      <sec id="sec-4-1">
        <title>1. Interpersonal trust. Classification problem.</title>
        <p>Data classification is the process of analyzing structured or unstructured data and
organizing it into categories based on file type, contents, and other metadata (Bowles,
2015).</p>
        <p>The most common machine learning methods for classification are logistic regression,
the naive Bayes classifier, support vector machines, k-nearest neighbors, and neural
networks (Horwood, 1994), (MacKay, 2005).</p>
      </sec>
      <sec id="sec-4-2">
        <title>1.1. Classification problem for interpersonal trust and individual characteristics.</title>
        <p>To solve the classification problem for interpersonal trust and individual
characteristics, we built a machine learning model. In this model, eight individual
characteristics were used as predictors: "Sex", "Age",
"Education","Satisfaction_with_life", "Employment_status",
"Satisfaction_with_financial_situation_of_household", "Marriage", "Religion".</p>
        <p>In the original data set, some respondents declined to answer some questions. This
resulted in missing data, so after excluding such cases, 65,039 responses remained in the
data set.</p>
        <p>For each classification problem, the original dataset was divided into training (80%)
and test (20%) sets. The training sample in this model contains data about 52,031
respondents, the test sample - 13,008 respondents.</p>
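        <p>The sizes of the two samples follow directly from the 80/20 split of 65,039 responses.
The sketch below reproduces the arithmetic with a shuffled index split; the function name,
the rounding rule (the test share is rounded up), and the random seed are assumptions, since
the paper does not state how the split was implemented.</p>
        <preformat>
```python
import math
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into training and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * n_rows)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(65039)
print(len(train_idx), len(test_idx))  # 52031 13008
```
</preformat>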
        <p>We used 5 machine learning methods for modeling and the resulting models were
compared in terms of accuracy.</p>
        <p>Accuracy is one of the metrics for evaluating classification models. It is used to
determine which model best identifies relationships and patterns between variables in a
dataset based on input or training data. The accuracy of the model is calculated as follows:
<disp-formula><tex-math>\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}</tex-math></disp-formula></p>
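        <p>For binary classification, accuracy can also be calculated in terms of positives and
negatives as follows:
<disp-formula><tex-math>\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}</tex-math></disp-formula></p>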
        <p>Here TP – True Positive (an outcome where the model correctly predicts the positive
class);</p>
        <p>TN – True Negative (an outcome where the model correctly predicts the negative
class);</p>
        <p>FP – False Positive (an outcome where the model incorrectly predicts the positive
class);</p>
        <p>FN – False Negative (an outcome where the model incorrectly predicts the negative
class).</p>
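        <p>As a small worked example of these definitions, the sketch below counts TP, TN, FP,
and FN for a made-up set of labels and predictions and then computes accuracy from them.
The function names and the data are illustrative assumptions, not the authors' code or
results.</p>
        <preformat>
```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up labels (1 = trusts, 0 = does not trust) and predictions.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))  # 2 4 1 1 0.75
```
</preformat>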
        <p>Given that the exact nature of error is irrelevant, we can restrict ourselves to only
considering accuracy as our performance metric.</p>
        <p>Note that all the methods used to build the classification model gave close estimates of
accuracy (77% - 78.5%). Neural network classifier showed the best accuracy (78.5%) on
the test set (see Table 1).</p>
        <p>When applying the logistic regression model, the significance of the coefficients of
the variables was tested (Table 2). We used the p-values of the regression coefficients to
test the null hypothesis that the coefficients are zero. All p-values were lower than the
p-value threshold of 0.1, so the null hypothesis is rejected for every coefficient, which
means that all exogenous variables affect the endogenous variable in some way. Here the
endogenous variable is interpersonal trust.</p>
        <p>Table 2 (fragment). P-value: 0.0670; 0.000; 0.000.</p>
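        <p>The significance check can be illustrated with a two-sided Wald test, which is the
test commonly reported for logistic regression coefficients. Whether the authors used
exactly this test is an assumption, and the coefficient and standard error below are
made-up numbers, not values from Table 2.</p>
        <preformat>
```python
import math

ALPHA = 0.1  # significance threshold used in the paper


def wald_p_value(coefficient, std_error):
    """Two-sided p-value for the null hypothesis that the coefficient is zero."""
    z = coefficient / std_error
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimate for one predictor.
p_value = wald_p_value(coefficient=0.11, std_error=0.06)
is_significant = ALPHA > p_value
print(round(p_value, 4), is_significant)
```
</preformat>
        <p>A predictor is kept when its p-value is below the threshold and is a candidate for
exclusion otherwise.</p>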
      </sec>
      <sec id="sec-4-3">
        <title>1.2. Classification problem for interpersonal trust and mixed composition of predictors</title>
        <p>In this task, individual characteristics and characteristics of society and institutions
were used as predictors of interpersonal trust. These are: "Sex", "Age", "Education",
"Satisfaction_with_life", "Employment_status",
"Satisfaction_with_financial_situation_of_household", "Marriage", "Religion" and
"Corruption", "Migration", "Security", "Democracy". After excluding missing data points,
13,608 responses remained to build this model.</p>
        <p>Calculations have shown that all exogenous variables are significant in terms of
influence on the endogenous variable, since their p-values are close to zero (see Table 3).
The best accuracy estimate (80%) was shown by the Support Vector Machines classifier
(see Table 4).</p>
        <p>The ROC curve plots the share of correctly classified positive examples (the true
positive rate) against the share of negative examples incorrectly classified as positive
(the false positive rate), with the model threshold varying as an implicit variable. A
quantitative summary of a ROC curve is the Area Under Curve (AUC). This estimate can be
obtained directly by calculating the area of the polygon bounded on the right and at the
bottom by the coordinate axes and on the top left by the experimentally obtained points.
One can calculate the AUC, for example, using the numerical trapezoidal method:
<disp-formula><tex-math>\mathrm{AUC} = \int_0^1 f(x)\,dx \approx \sum_{i} \frac{y_i + y_{i+1}}{2}\,(x_{i+1} - x_i)</tex-math></disp-formula>
where x_i and y_i denote the false positive rate and the true positive rate at the i-th
point of the curve.</p>
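        <p>The trapezoidal calculation can be sketched in a few lines. The example builds the
ROC points by lowering the threshold one observation at a time and then sums the
trapezoids. The scores and labels are made up, the function names are hypothetical, and the
sketch assumes that all scores are distinct.</p>
        <preformat>
```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _score, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Sum the areas of the trapezoids under consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_points(y_true, scores))
print(round(auc, 3))  # 0.889
```
</preformat>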
        <p>The ROC curve of the binary logistic regression model we obtained is shown in the
figure; AUC = 0.72.</p>
        <p>Note that all methods show higher accuracy for models with a mixed composition of
predictors than for models with only individual features.</p>
      </sec>
      <sec id="sec-4-4">
        <title>2. Institutional trust. Classification problem for Government Trust and Personality</title>
      </sec>
      <sec id="sec-4-6">
        <title>2.1. Classification task with a set of individual characteristics of the respondents.</title>
        <p>Trust in government is one of the most important indicators of institutional trust. In this
section, we used the same data set of individual characteristics as for the interpersonal
trust models. The sample includes 63,360 respondents.</p>
        <p>We built several machine learning models with this feature set. Let’s discuss the first
one, namely a Logistic Regression model.</p>
        <p>The p-value of the "Sex" variable turned out to be higher than 0.1, so we excluded
this variable from the predictors of institutional trust, since it has no statistically
significant effect on trust in the government (Table 5).</p>
        <p>Table 5 (fragment). P-value: Sex, 0.2224; Age, 0.000.</p>
        <p>The accuracy of the models built by different methods is very low (Table 4), which
casts doubt on the suitability of these models (Table 6) (Idris, 2016).</p>
        <p>For binary logistic regression, the default threshold is 0.5. In many problems, a much
better result may be obtained by adjusting the threshold. We conducted such an analysis
and found that the logistic regression model shows the best accuracy at a threshold of 0.47
(Fig. 2).</p>
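        <p>A threshold search of this kind can be sketched as a simple scan over candidate
values. The probabilities, labels, candidate grid, and function names below are assumptions
made for illustration and do not reproduce the authors' data or the value 0.47.</p>
        <preformat>
```python
def accuracy_at_threshold(y_true, probabilities, threshold):
    """Accuracy when probabilities at or above the threshold are labelled 1."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for t, p in zip(y_true, predictions) if t == p)
    return correct / len(y_true)


def best_threshold(y_true, probabilities, candidates):
    """Return the first candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda th: accuracy_at_threshold(y_true, probabilities, th))


# Made-up labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
probabilities = [0.20, 0.35, 0.48, 0.49, 0.70, 0.40, 0.55, 0.52]
candidates = [round(0.30 + 0.01 * i, 2) for i in range(41)]  # 0.30 ... 0.70

threshold = best_threshold(y_true, probabilities, candidates)
default_accuracy = accuracy_at_threshold(y_true, probabilities, 0.5)
tuned_accuracy = accuracy_at_threshold(y_true, probabilities, threshold)
print(threshold, default_accuracy, tuned_accuracy)
```
</preformat>
        <p>In practice the threshold would be selected on training or validation data and only
then checked on the test set, so that the reported accuracy is not optimistic.</p>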
        <p>All methods give higher accuracy of the models in comparison with the results from
the previous section (Table 8), although this level of accuracy is still insufficient.</p>
      </sec>
      <sec id="sec-4-8">
        <title>2.3. Classification Problem for Institutional Trust and Mixed Composition of Predictors</title>
        <p>In this section, we examined the relationship of institutional trust with individual
characteristics and with indicators associated with society and institutions. The data set
consists of 13,556 responses.</p>
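        <p>We excluded the following variables: “Sex”, “Employment_status”, and
“Ethnic_group”, since the p-values of these indicators turned out to be higher than 0.1
(Table 9).</p>
        <table-wrap>
          <label>Table 9 (continued)</label>
          <caption>
            <p>P-values of the coefficients in the logistic regression model for institutional
trust and mixed composition of predictors (characteristics of society and institutions).</p>
          </caption>
          <table>
            <thead>
              <tr><th>Variables</th><th>P-value</th></tr>
            </thead>
            <tbody>
              <tr><td>Corruption</td><td>0.0011</td></tr>
              <tr><td>Migration</td><td>0.000</td></tr>
              <tr><td>Security</td><td>0.000</td></tr>
              <tr><td>Democratically</td><td>0.0036</td></tr>
              <tr><td>Ethnic group_Arabic, Central Asian</td><td>1.000</td></tr>
              <tr><td>Ethnic group_Black</td><td>1.000</td></tr>
              <tr><td>Ethnic group_East Asian Chinese, Japanese, etc</td><td>1.000</td></tr>
              <tr><td>Ethnic group_South Asian Indian, Pakistani, etc</td><td>1.000</td></tr>
              <tr><td>Ethnic group_White</td><td>1.000</td></tr>
            </tbody>
          </table>
        </table-wrap>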
        <p>This set of predictors provides a significant increase in the accuracy of the models for
all methods (Table 10).</p>
        <p>The ROC curve of the binary logistic regression model is shown in Figure 3. The AUC
is 0.772.</p>
        <p>Fig. 3. ROC curve for a logistic regression model for interpersonal trust and mixed
composition of predictors</p>
      </sec>
      <sec id="sec-4-9">
        <title>3. Institutional trust. Clustering problem.</title>
        <p>Cluster analysis in Data Mining allows one to find a group of objects that are similar to
each other in a cluster, but differ from objects in other clusters. In our study, we applied
this method to identify differences in the values of the characteristics of responses that
belong to different clusters according to the criterion of trust in the government.</p>
        <p>Methods such as the "Elbow Method" or the "Silhouette Method" can be used to determine
the number of clusters (Rousseeuw, 1987). The Elbow method consists of plotting the
relationship between the number of clusters and the sum of squares within the clusters
(Within Cluster Sum of Squares, WCSS) and then selecting the number of clusters at
which the change in WCSS begins to level out (Figure 4).</p>
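        <p>The quantity behind the Elbow method can be sketched with a small K-means routine
that reports the WCSS for several numbers of clusters. The points are made up, the function
names are hypothetical, and the deterministic farthest-point initialization is an assumption
chosen to keep the example reproducible; it is not necessarily what the authors used.</p>
        <preformat>
```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(points, k):
    """Farthest-point initialization (an assumption made for reproducibility)."""
    centroids = [points[0]]
    while k > len(centroids):
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def k_means(points, k, iterations=20):
    """Plain K-means: assign points to the nearest centroid, then update centroids."""
    centroids = initial_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


def wcss(points, centroids, labels):
    """Within Cluster Sum of Squares for a given clustering."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


# Made-up two-dimensional points forming two compact groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
wcss_by_k = {}
for k in range(1, 5):
    centroids, labels = k_means(points, k)
    wcss_by_k[k] = wcss(points, centroids, labels)
for k, value in wcss_by_k.items():
    print(k, round(value, 3))
```
</preformat>
        <p>On such data the WCSS drops sharply from one to two clusters and changes little
afterwards, which is the pattern the Elbow method looks for.</p>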
        <p>As can be seen in Figure 4, these are the points 2, 4, and 7. To refine the result, we
apply the silhouette method.</p>
        <p>The silhouette value represents a measure of how similar a data point is to its own
cluster when compared to all other clusters (Figure 5).</p>
        <p>Fig. 5. Graphic implementation of the Silhouette method.</p>
        <p>To select the optimal number of clusters using this method, one needs to select the
maximum value of this indicator. As you can see in the figure, the optimal number of
clusters is 2.</p>
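        <p>The selection rule can be illustrated by computing the mean silhouette value for two
candidate partitions of the same made-up points and keeping the one with the larger value.
The function name, the data, and the convention that a single-point cluster gets a
silhouette of 0 are assumptions made for this sketch.</p>
        <preformat>
```python
import math


def mean_silhouette(points, labels):
    """Average silhouette value over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # a single-point cluster gets a silhouette of 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance to the own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up points and two candidate partitions (2 and 3 clusters).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels_2 = [0, 0, 0, 1, 1, 1]
labels_3 = [0, 0, 0, 1, 1, 2]
silhouette_2 = mean_silhouette(points, labels_2)
silhouette_3 = mean_silhouette(points, labels_3)
print(round(silhouette_2, 3), round(silhouette_3, 3))
```
</preformat>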
        <p>We used a K-means-based clustering algorithm to partition the data into clusters.
Initially, we included the full set of factors as features: the individual characteristics
and the characteristics of society and the quality of institutions. Then, we excluded factors
with weak variability, and only eight factors remained: “Trust_the_government”,
“Employment_status”, “Marriage”, “Corruption”, “Religion”, “Migration”,
“Ethnic group_East Asian Chinese, Japanese, etc”, “Ethnic group_White”.</p>
        <p>Cluster centroids are presented in Table 11.</p>
        <p>It is important to take a closer look at the differences in the average values of the
factors. The first cluster includes respondents with low trust in the government. They are
of the White ethnic group and are more religious. The respondents with higher confidence in
the government belong to the “East Asian Chinese, Japanese, etc” ethnic group and are less
religious than the respondents in the first cluster. For the rest of the indicators,
differences of comparable scale between the cluster means are not visible.</p>
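        <p>Comparing clusters by their centroids amounts to comparing per-cluster means of each
factor. The sketch below shows this comparison on made-up rows; the values, the 0/1 coding
of “Trust_the_government”, and the function name cluster_means are illustrative assumptions
and do not reproduce Table 11.</p>
        <preformat>
```python
def cluster_means(rows, labels, features):
    """Mean of every feature within every cluster (the cluster centroids)."""
    means = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == cluster]
        means[cluster] = {f: sum(r[f] for r in members) / len(members) for f in features}
    return means


# Made-up rows for illustration only; they are not WVS data.
rows = [
    {"Trust_the_government": 0, "Religion": 9, "Ethnic group_White": 1},
    {"Trust_the_government": 0, "Religion": 8, "Ethnic group_White": 1},
    {"Trust_the_government": 1, "Religion": 3, "Ethnic group_White": 0},
    {"Trust_the_government": 1, "Religion": 4, "Ethnic group_White": 0},
]
labels = [0, 0, 1, 1]
features = ["Trust_the_government", "Religion", "Ethnic group_White"]

means = cluster_means(rows, labels, features)
gaps = {f: abs(means[0][f] - means[1][f]) for f in features}
print(means)
print(gaps)
```
</preformat>
        <p>Because the factors are measured on different scales, the gaps are easier to compare
after the features have been standardized.</p>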
      </sec>
    </sec>
    <sec id="sec-5">
      <title>IV Conclusions</title>
      <p>The results of modeling can be summarized in the following conclusions.</p>
      <p>Interpersonal trust. Classification models recognize the class (a level of trust)
to which an object belongs according to a range of factors (predictors). We defined two
classes in accordance with the responses (people who trust strangers and people who do not).</p>
      <p>The classification problem was solved using two sets of predictors: individual
characteristics ("Sex", "Age", "Education", "Satisfaction_with_life", "Employment_status",
"Satisfaction_with_financial_situation_of_household", "Marriage", "Religion") and the
mixed composition, which includes societal characteristics ("Corruption", "Migration",
"Security", "Democracy") in addition to the individual ones. With both predictor sets, all
five machine learning methods gave close and sufficiently high accuracy estimates. The mixed
composition, however, increased classification accuracy from 77%-78.5% (individual-set
models) to 78.3%-80% (mixed-composition models).</p>
      <p>Trust in government. Trust in government is one of the most important indicators of
institutional trust. The classification problem was solved using 3 sets of predictors:
individual characteristics, societal indicators, and the mixed composition. All the
predictors in these sets were the same as in interpersonal trust models.</p>
      <p>As expected, the first set did not lead to satisfactory models: all the machine
learning methods gave very low accuracy (about 60%). Therefore, the assumption that
institutional trust cannot be explained at the individual level alone was supported for the
case of trust in government.</p>
      <p>However, classification models with only societal characteristics did not provide
satisfactory results either. Although this set of predictors increased the accuracy of the
models (to 68%), it did not reach an acceptable value.</p>
      <p>Finally, only the mixed composition models showed a higher accuracy estimate. The
best results (76.7%) were provided by the Support Vector Machines and K-Nearest
Neighbors methods. This value is high enough to regard the modeling results as quite
satisfactory.</p>
      <p>Cluster analysis in Data Mining allows one to find a group of objects that are similar to
each other in a cluster, but differ from objects in other clusters. In our study, we applied
this method to identify differences in the values of the characteristics of responses that
belong to different clusters according to the criterion of trust in the government.</p>
      <p>We used the K-means-based clustering algorithm to partition the data into clusters.
Cluster analysis of eight factors (“Trust_the_government”, “Employment_status”,
“Marriage”, “Corruption”, “Religion”, “Migration”, “Ethnic group_East Asian Chinese,
Japanese, etc.”, “Ethnic group_White”) divided the set of respondents into 2 clusters. It
is important to emphasize the differences between the clusters in the average values of the
factors. First of all, there is a significant gap between the clusters in the factor “Trust
in government”.</p>
      <p>The first cluster includes respondents with low trust in the government. They are of the
White ethnic group and are more religious. The second cluster includes respondents
with high confidence in the government who belong to the “East Asian Chinese, Japanese,
etc.” ethnic group and are less religious than the respondents in the first cluster. For the
rest of the indicators, differences of comparable scale between the cluster means are not
visible.</p>
      <p>The results of this research can to a certain extent serve as arguments in favor of the
multilevel approach to social trust determinants, taking into account the essential role of
individual and societal factors for both interpersonal and institutional trust.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>