=Paper=
{{Paper
|id=Vol-2170/paper10
|storemode=property
|title=ATAnalysis – Toward a Psycholinguistic Method to Analyze Video Textual Information
|pdfUrl=https://ceur-ws.org/Vol-2170/paper10.pdf
|volume=Vol-2170
|authors=Helder Yukio Okuno,Flavio Carvalho,Gustavo Paiva Guedes,Marcelle Torres Alves Okuno
|dblpUrl=https://dblp.org/rec/conf/vldb/OkunoCGO18
}}
==ATAnalysis – Toward a Psycholinguistic Method to Analyze Video Textual Information==
LADaS 2018 - Latin America Data Science Workshop
ATAnalysis - Toward a psycholinguistic method to analyze
video textual information
Helder Yukio Okuno¹, Flávio Carvalho¹, Gustavo Paiva Guedes¹, Marcelle Torres²
¹ CEFET/RJ - Centro Federal de Educação Tecnológica Celso Suckow da Fonseca
Av. Maracanã, 229 - Rio de Janeiro - RJ - Brazil.
² Núcleo de Avaliação da Conjuntura
Escola de Guerra Naval (EGN) – Rio de Janeiro, RJ – Brazil
helder.okuno@eic.cefet-rj.br, flavio.carvalho@eic.cefet-rj.br,
gustavo.guedes@cefet-rj.br, torres.m.a@hotmail.com
Abstract. Political statements of world leaders may affect many lives, so it is
important to study what they express through language. We propose a method
for the psycholinguistic analysis of statements extracted from videos. To show its
relevance and surface some interesting information, we conducted experiments on
video subtitles of world leaders Donald Trump and Kim Jong-un amid an imminent
agreement that could lead to peace on the Korean peninsula. Results suggest less
security in the statements of the North Korean leader while threatening to unleash
an “unimaginable strike” on US territory. Moreover, the US president shows
less honesty when saying he hopes never to use the nuclear arsenal. This approach
may be useful in future studies to reveal what the language used by candidates
can show.
1. Introduction
Social network applications allow users to share ideas and report on events, while
the platform that provides the service stores sentiment expressions in many formats (e.g.,
written and video records). For example, YouTube¹ has established itself as a social
network providing user-generated content for entertainment and information [Moghavvemi
et al., 2018]. YouTube uses content popularity (“thumbs-up” button), comments and the
number of subscribers to observe social aspects [de Arantes et al., 2015]. By offering users
the ability to upload, view, rate, share, and comment on videos, it became an important
social medium that also enables people to engage more directly with political issues [Howard
et al., 2011].
Social networks are also being used to propagate the positions and statements of
world leaders. With the attention, credibility and other resources that they have at their
disposal, what they say matters because it can show how they intend to set the tone of
their administration [Cohen, 1995]. It is therefore relevant to watch what has been said
by the protagonists of one of the most drastic wars of words and rhetorical attacks on the
international scene.
Donald Trump (DJT) and Kim Jong-un (KJU), after more than a year of threats
and insults, agreed to meet in a summit to discuss the denuclearization of the Korean
peninsula. It could be the first direct presidential talks between the United States and North
Korea since the end of the Korean War in 1953, and the beginning of a path toward peace
in Northeast Asia. The world leaders face a larger and more decisive international crisis,
not only for their own nations but also for the future of humanity.
¹ https://www.youtube.com/
In this work, we propose a method for the psycholinguistic analysis of oral
discourse extracted from videos. Some global leaders, such as KJU, do not use written
language on social networks. However, there are speeches recorded in video format that
can be found on YouTube. In this scenario, we conducted experiments using automatic
YouTube subtitle extraction to collect the subtitles of DJT and KJU videos. Next, we
submitted these subtitles to the 2015 version of the Linguistic Inquiry and Word Count
(LIWC2015) [Pennebaker et al., 2015] program for text analysis. The results show higher
authenticity and confidence in DJT speeches than in KJU speeches. On the other hand, KJU
speeches show higher analytical thinking. All the values related to analytical thinking
were very high in KJU speeches.
This work is structured as follows. After this introductory section, section 2 presents
work related to how computational linguistic analysis can be applied to obtain
information about world leaders. Section 3 explains some aspects of LIWC2015, an
available tool for computational linguistic analysis of text. In section 4, we
describe how we obtained the data. The results are presented in section 5. The conclusion
and future work are given in section 6.
2. Related Work
Recent studies investigating the language of DJT’s statements as a political figure
can be found in the literature [Ahmadian et al., 2017, Savoy, 2017]. By adopting
LIWC2015, Jordan and Pennebaker [2017] set out to determine how US candidates
differed across linguistic style categories by analyzing the language of political figures
from DJT back to George Washington. They found that, compared to other politicians, DJT
scored very low in analytic thinking in the analysis of documents in text format. They
also noticed that presidents and presidential candidates have been
becoming less analytic [Jordan and Pennebaker, 2017].
Regarding psycholinguistic behavior in Korean leaders’ statements, it is worth
mentioning a study using the Korean Key Words in Context (KrKwic²) and Korean Linguistic
Inquiry and Word Count (K-LIWC) tools [Chung and Park, 2010]. It examined the literary
characteristics of the inaugural addresses of two presidents, Moo-Hyun Roh (2003–2008)
and Myung-Bak Lee (2008–2013), from linguistic, quantitative and psychological
perspectives. According to the authors, the approach is significant as an investigative
tool to support discovering the relationship between rhetorical substance and style, and
also the characteristics of presidents’ political and social viewpoints.
Our approach in this work differs from those previous publications in that it uses
exclusively speeches and statements from content available in video format on YouTube.
In KJU’s case, the process also involves automatic translation before the text is submitted
to LIWC2015. Another difference is that it compares the speeches of two distinct current
world leaders who have been trading threats in recent times, and DJT has recently
publicly agreed to meet KJU.
² https://www.leydesdorff.net/krkwic/
3. Computational Text Analysis
Content analysis is a methodology that uses categorization and classification of
communication such as speech, written text, pictures, audio or video [Bryman and Bell, 2015].
All around the world, an increasing number of people are using communication and social
network applications, continuously generating text, image, audio and video files
that record many events, feelings and emotions. As a result, researchers are using
computer-assisted methods, such as text analysis, in social science studies to address many
issues. In addition, official documents, technical reports, theatre play scripts,
lesson plans, books and a variety of other texts are of social scientific interest [Brier and
Hopp, 2011].
LIWC2015 is an available tool for computerized text analysis. It consists of the
main program, which has a text analysis module with a user interface, and the dictionary
file. This dictionary is used to associate the words of any given text with categories
of important linguistic, psychological, and social processes.
In addition to the dictionary developed in 2001, two more were produced for the
English language: one in 2007, containing 4,500 words; the other in 2015, with a total
of 6,400 words. The 2015 version also brings summary variables, which are percentiles
based on other categories, rather than raw frequencies. The summary categories refer to
aspects like analytical thinking (Analytic), clout, authenticity (Authentic) and emotional
tone (Tone) [Pennebaker et al., 2015].
Analytic is derived from a high use of nouns, articles, and prepositions, indicating
how far people use words that suggest thought patterns that are formal, logical and
hierarchical. Clout reflects confidence, which is indicated by a higher use of first person
plural pronouns and words related to social processes, like the ones in the categories
“family” and “friend”, and by a lower use of first person singular pronouns, negations
and swear words. Authentic relates to people revealing themselves in an authentic or
honest way, being more personal, humble and vulnerable (e.g., using words such as I, me,
my and present tense verbs). Emotional tone combines values from the categories of
positive and negative emotions into a single summary variable.
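The dictionary-based step described in this section can be illustrated with a minimal sketch. It is not LIWC2015 itself: the real dictionary is a licensed resource, so `TOY_DICTIONARY`, its three categories and the sample sentence are hypothetical, and the matching is exact-word only. The sketch only shows the idea of reporting, per category, the percentage of words in a text that belong to it.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (assumption): stands in for the licensed
# LIWC2015 dictionary, which is not reproduced here.
TOY_DICTIONARY = {
    "posemo": {"peace", "hope", "great"},
    "negemo": {"threat", "war", "fury"},
    "we": {"we", "our", "us"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return, for each category, the percentage of tokens that belong to it."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return {c: 100.0 * counts[c] / len(tokens) for c in dictionary}

# Hypothetical sample sentence, not taken from any analyzed speech.
sample = "We hope for peace, but the threat of war remains."
result = category_percentages(sample)
print(result)
```

Raw percentages of this kind correspond to categories such as posemo and negemo; the summary variables are percentile-based and are not reproduced by this sketch.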
4. Materials and procedures used to obtain psycholinguistic behavior in statements
For this study, we used speeches and statements of DJT and KJU from YouTube videos.
Figure 1 illustrates the process used in this study. It starts with the transcription
of videos containing DJT and KJU speeches using a free web application (DownSub³)
that can download and save subtitles directly from YouTube.
After the extraction of subtitles, the Korean text was submitted to a computer-assisted
translation from Korean to English using a free online tool named NAVER⁴. Translation
was chosen so that a single method of computerized text analysis could be used at the end,
since translation to English does not significantly interfere with the accuracy or
comprehensiveness of methods for text analysis [Reis et al., 2015]. In Table 1, we present
basic information about the text content submitted to LIWC2015.
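As a rough illustration of the subtitle-preparation step, the sketch below turns a downloaded subtitle file into plain text and counts its words, as reported in the Words column of Table 1. It assumes the subtitles are saved in the common SRT layout (cue number, timing line, caption text); the source does not state the file format, and the function names and example captions are hypothetical. The translation step for Korean subtitles is not covered.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:04,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_text(srt_content):
    """Drop cue numbers, timing lines and blank lines; join the caption text.

    Caveat: a caption line made only of digits is also dropped.
    """
    kept = []
    for line in srt_content.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

# Hypothetical captions, not taken from any analyzed video.
example_srt = """1
00:00:01,000 --> 00:00:04,000
we will not be intimidated

2
00:00:04,500 --> 00:00:07,000
by threats of any kind
"""

text = srt_to_text(example_srt)
print(text)
print(word_count(text))
```

The resulting plain text is what would then be passed to the text analysis program, after translation where needed.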
³ Available at downsub.com
⁴ Available at translate.naver.com; a free online language translation service.
Figure 1. Automatic subtitles of the DJT videos were extracted by DownSub and
then analyzed with LIWC2015. The process was the same for KJU videos, except that
the subtitles had to be translated from Korean into English using Naver Translator
before being analyzed with LIWC2015.
Table 1. Text content from statements of DJT and KJU from YouTube videos
submitted to LIWC2015.
Code | Speech | Date | Words
DJT1 | Trump meeting North Korean defectors | Feb 2018 | 322
DJT2 | Trump’s address to the UN General Assembly | Sep 2017 | 494
DJT3 | Trump about North Korean missile launch | Nov 2017 | 762
DJT4 | Trump Speech in the State of the Union | Feb 2018 | 957
DJT5 | Trump announcing new sanctions against North Korea | Sep 2017 | 637
DJT6 | Trump speaking at the South Korean National Assembly | Nov 2017 | 2014
KJU1 | New Year Speech 2018 | Jan 2018 | 4529
KJU2 | Reply to Trump’s first speech | Sep 2017 | 605
KJU3 | New Year’s Speech 2017 | Jan 2017 | 3959
KJU4 | Speech National Army Day | Feb 2018 | 1385
5. Results
Table 2 shows the values found in the text content analysis of DJT and KJU addresses using
LIWC2015. The selected categories reflect, along with the percentage values of words
associated with positive and negative emotions, the four summary categories: analytic, clout,
authentic and tone.
Figure 2 plots the values of the authentic category on the X axis against the clout
values on the Y axis, with the analytic value shown as the size of the plot marker. In
DJT’s discourse style, lower values were observed for the variable reflecting analytical
thinking, together with higher values of authenticity and confidence. In the speeches of
KJU, on the other hand, all the values related to analytical thinking were very high.
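The comparison above can be checked with simple per-leader averages of the Table 2 values. The averages are an illustrative summary added here and are not reported by the authors; the names `SCORES` and `leader_means` are hypothetical.

```python
from statistics import mean

# (Analytic, Clout, Authentic) per speech, copied from Table 2
# (decimal commas written as decimal points).
SCORES = {
    "DJT1": (8.47, 87.47, 60.98),
    "DJT2": (78.37, 67.91, 13.47),
    "DJT3": (40.82, 77.27, 56.07),
    "DJT4": (83.07, 87.48, 15.21),
    "DJT5": (63.85, 66.98, 41.95),
    "DJT6": (88.38, 77.02, 25.40),
    "KJU1": (97.28, 67.77, 25.84),
    "KJU2": (96.43, 73.50, 9.29),
    "KJU3": (96.63, 74.61, 32.72),
    "KJU4": (97.36, 83.34, 24.68),
}

def leader_means(scores, prefix):
    """Average each summary variable over the speeches whose code starts with prefix."""
    rows = [values for code, values in scores.items() if code.startswith(prefix)]
    return tuple(mean(column) for column in zip(*rows))

for leader in ("DJT", "KJU"):
    analytic, clout, authentic = leader_means(SCORES, leader)
    print(f"{leader}: Analytic={analytic:.1f} Clout={clout:.1f} Authentic={authentic:.1f}")
```

On these values the mean Analytic score is about 60 for DJT and about 97 for KJU. The gap in mean Clout is only a few points, and DJT’s Analytic scores range from 8.47 to 88.38, so the averages hide considerable variation between speeches.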
Analyzing the results, during the New Year’s speech (KJU1), which provides the
country’s guidelines for the year, KJU had his lowest Clout value, suggesting uncertainty
regarding the goals of the country in 2018. Speaking with less confidence than in his
other speeches, KJU stated that the entire United States territory is within range
of North Korean nuclear missiles, emphasizing a secured powerful deterrence against the
nuclear threat from the United States.
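The "lowest Clout" reading can be reproduced by ordering the four KJU speeches on the Table 2 values. This is a small sketch for checking the ordering; `KJU_SCORES` and `rank_by` are hypothetical names.

```python
# (Clout, Tone) for the four KJU speeches, copied from Table 2
# (decimal commas written as decimal points).
KJU_SCORES = {
    "KJU1": (67.77, 61.36),
    "KJU2": (73.50, 16.34),
    "KJU3": (74.61, 49.76),
    "KJU4": (83.34, 84.40),
}

def rank_by(scores, index):
    """Return speech codes ordered from lowest to highest value of one column."""
    return sorted(scores, key=lambda code: scores[code][index])

clout_ranking = rank_by(KJU_SCORES, 0)
tone_ranking = rank_by(KJU_SCORES, 1)
print("Clout, lowest to highest:", clout_ranking)
print("Tone, lowest to highest:", tone_ranking)
```

KJU1 comes first in the Clout ordering, and KJU4 comes last in both orderings.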
Table 2. Percentage values found in the text content analysis of DJT and KJU
addresses using LIWC2015.
Speech | Analytic | Clout | Authentic | Tone | posemo | negemo
DJT1 | 8.47 | 87.47 | 60.98 | 4.13 | 1.55 | 3.73
DJT2 | 78.37 | 67.91 | 13.47 | 29.15 | 4.66 | 4.45
DJT3 | 40.82 | 77.27 | 56.07 | 72.46 | 3.67 | 1.18
DJT4 | 83.07 | 87.48 | 15.21 | 36.83 | 3.87 | 3.24
DJT5 | 63.85 | 66.98 | 41.95 | 58.44 | 3.92 | 2.2
DJT6 | 88.38 | 77.02 | 25.4 | 33.48 | 3.62 | 3.18
KJU1 | 97.28 | 67.77 | 25.84 | 61.36 | 4.22 | 2.34
KJU2 | 96.43 | 73.5 | 9.29 | 16.34 | 2.81 | 3.47
KJU3 | 96.63 | 74.61 | 32.72 | 49.76 | 3.33 | 2.05
KJU4 | 97.36 | 83.34 | 24.68 | 84.4 | 5.2 | 1.88
The most recent speech of the North Korean leader (KJU4), just on the eve of
the Winter Olympic Games, also known as the Olympics for Peace, showed higher values in
Tone and Clout, representing his most confident and positive declaration. In that speech,
KJU declared North Korea a global military power and called for a permanent high military
alert to ensure sovereignty and national security against any foreign threat, revealing
his behavior even in an apparent moment of progress in reducing military tensions in the
region. Even with the pause in North Korean nuclear launches and the recent opening
for diplomatic negotiations on peace and security in Northeast Asia, the North Korean
strategy and alert remain the same.
Figure 2. Graphical representation of selected percentage values from LIWC2015
categories, with analytic values as the size of the plot marker, the X axis representing
the authentic values and the Y axis the clout values.
Also, comparing the speeches of both leaders, it is possible to observe that KJU
maintains a more analytical style while DJT appears more intuitive, suggesting that he
is more impulsive in making decisions. For example, in December 2017, the US Senate
Foreign Relations Committee examined DJT’s allusion to the use of military force or a
preventive strike against North Korea, made when DJT said North Korean threats would be met
with “fire and fury like the world has never seen”. Some US Senators drew attention to
the words and personality of the US president, raising the discussion on the importance of
the US Congress imposing limits on unilateral decisions by DJT that could harm US security
interests [Reif, 2017].
6. Conclusion
We conducted experiments in which we extracted subtitles and performed psycholinguistic
analysis on text from an important world leader who does not use social networks, such as
KJU. This demonstrated the relevance of the approach, and some interesting information
could be obtained by examining the dimensions of each speech. Studies in the field of
international relations could benefit from the identification of text dimensions like
thinking style, clout, and authenticity to see how these expressions in words by leaders may
suggest subsequent political aggression.
International relations scholars agree that states opt for developing nuclear
weapons when they face security threats and find no other solution [Sagan, 1997]. This
can essentially explain part of one of the main North Korean policies, the byungjin line,
which pursues both nuclear and economic development [Han and Joo, 2014]. However, states
are willing to remain non-nuclear once potential threats are resolved [Sagan, 1997].
Given the recent military tensions in Northeast Asia, this work sought to use
LIWC2015 to identify the psycholinguistic behavior of world leaders DJT and KJU by
analyzing their recent speeches. The results of LIWC2015 suggest that KJU is less secure
in discourses in which he threatens the entire US territory with nuclear weapons. On the
other hand, he reveals more confidence and sense of certainty in describing his country as
a global military power and in demanding continued readiness from its military, even at a
moment that the international community sees as conducive to advances in diplomatic
negotiations for the denuclearization of the Korean peninsula.
In relation to DJT, results show his narrative style as more intuitive, suggesting
impulsiveness in his decision making. Also, during the State of the Union speech, a high
value for clout was observed (i.e., speaking confidently with a sense of certainty) when
he emphasized the need to increase national defense spending to allow the modernization
and reconstruction of the US nuclear arsenal. A lower value for authenticity,
which may suggest a lesser degree of honesty, was observed when DJT stated his hopes
of never needing to use the nuclear arsenal.
In future studies, we intend to analyze the verbal expression of candidates in Brazilian
presidential election debates, looking at how these candidates compare on three
dimensions: thinking style, clout, and authenticity. We also intend to analyze tone of voice
and non-verbal expressions in videos as a complementary approach. The results can be
used to compare the emotions expressed verbally to the ones that appear in expressions
and gestures.
Acknowledgments
The authors thank FAPERJ for partial funding support.
References
Ahmadian, S., Azarshahi, S., and Paulhus, D. L. (2017). Explaining Donald Trump
via communication style: Grandiosity, informality, and dynamism. Personality and
Individual Differences, 107:49–53.
Brier, A. and Hopp, B. (2011). Computer assisted text analysis in the social sciences.
Quality & Quantity, 45(1):103–128.
Bryman, A. and Bell, E. (2015). Business research methods. Oxford University Press,
USA.
Chung, C. J. and Park, H. W. (2010). Textual analysis of a political message: the inaugural
addresses of two Korean presidents. Social science information, 49(2):215–239.
Cohen, J. E. (1995). Presidential rhetoric and the public agenda. American Journal of
Political Science, pages 87–107.
de Arantes, M. V. S., de Figueiredo, F., and Almeida, J. (2015). Uma caracterização dos
padrões de navegação de usuários em uma aplicação social de streaming de vídeo. In
IV Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2015).
Han, T. S. and Joo, J. K. (2014). Can North Korea catch two rabbits at once: Nuke and
economy? One year of the Byungjin line in North Korea and its future. The Korean
Journal of Defense Analysis, 26(2):134–136.
Howard, P. N. et al. (2011). The Arab Spring’s cascading effects. Pacific Standard, 23.
Jordan, K. N. and Pennebaker, J. W. (2017). The exception or the rule: Using words to
assess analytic thinking, Donald Trump, and the American presidency. Translational
Issues in Psychological Science, 3(3):312.
Moghavvemi, S., Sulaiman, A., Jaafar, N. I., and Kasem, N. (2018). Social media as a
complementary learning tool for teaching and learning: The case of YouTube. The
International Journal of Management Education, 16(1):37–42.
Pennebaker, J. W., Boyd, R. L., Jordan, K., and Blackburn, K. (2015). The development
and psychometric properties of LIWC2015. Technical report.
Reif, K. (2017). Senate examines launch authority. Arms Control Today, 47(10):30–31.
Reis, J. C., Gonçalves, P., Araújo, M., Pereira, A. C., and Benevenuto, F. (2015). Uma
abordagem multilíngue para análise de sentimentos. In IV Brazilian Workshop on
Social Network Analysis and Mining (BraSNAM 2015).
Sagan, S. D. (1997). Why do states build nuclear weapons? Three models in search of a
bomb. International security, 21(3):54–86.
Savoy, J. (2017). Trump’s and Clinton’s style and rhetoric during the 2016 Presidential
Election. Journal of Quantitative Linguistics, pages 1–22.