<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The language effect in phishing susceptibility</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joakim Kävrestad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rickard Pettersson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcus Nohlberg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Skövde</institution>
          ,
          <addr-line>Högskolevägen 1, 541 28 Skövde</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <fpage>162</fpage>
      <lpage>167</lpage>
      <abstract>
        <p>Phishing has been, and remains, one of the most common types of social engineering. It is the act of tricking users, via e-mail, into performing actions they would not normally perform. Since phishing uses technical measures to trick users, it is a socio-technical phenomenon that must be understood from the technical as well as the social side. While phishing and phishing susceptibility have been researched for decades, the effect of language ability on phishing susceptibility is under-researched. In this paper, we conducted a survey in which Swedes rated their English ability before classifying e-mails in Swedish and English as fraudulent or legitimate. The results show that the respondents' English ability does affect their ability to correctly identify legitimate e-mails, adding another piece to the puzzle of phishing susceptibility.</p>
      </abstract>
      <kwd-group>
        <kwd>phishing</kwd>
        <kwd>susceptibility</kwd>
        <kwd>foreign</kwd>
        <kwd>language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Social engineering, the act of deceiving end-users, has become one of the most devastating attacks
against computer systems. Attackers manipulate human users to circumvent technical security
measures and gain access to login credentials, social security numbers, credit card
information or the system itself [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Attackers see humans as the easiest way into a network,
and the human factor is involved in 95% of security incidents in companies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Phishing is a common online threat and a type of social engineering that has been around
since 1995 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Attackers have thus used phishing for a long time, and
organisations are actively trying to combat it using detection tools, information campaigns and user
training. Even so, attackers continue to use phishing successfully, causing millions of dollars in damages.
      </p>
      <p>
        Phishing is a complex matter in which technology is used to deceive users into performing actions
they would not normally perform, making it a socio-technical system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is also a phenomenon that does not
appear to go away. User susceptibility to phishing is widely researched, and the research suggests that users
do not recognize phishing to a satisfactory degree [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ]. The aim of this study is to examine an aspect of phishing
susceptibility that has received little attention from the research community: how well
Swedish users detect phishing e-mails in their native language compared to in English.
As such, the paper responds to the need, described in the literature [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ], for a greater understanding of the social elements of phishing. Language is an increasingly
important factor because many organizations are multilingual. For instance, many Swedish organizations now have English
as the first language within the organization, and the same is seen in many other nations.
      </p>
      <p>The rest of this paper is structured as follows: Section 2 presents the methodology used in the study,
Section 3 presents the results, and Section 4 concludes the paper and provides directions for future work.</p>
      <p>6th International Workshop on Socio-Technical Perspective in IS development (STPIS’20), June 08–09, 2020, Online</p>
      <p>E-mail: joakim.kavrestad@his.se (J. Kävrestad); marcus.nohlberg@his.se (M. Nohlberg)</p>
      <p>ORCID: 0000-0003-2084-9119 (J. Kävrestad); 0000-0001-5962-9995 (M. Nohlberg)</p>
      <p>© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
The study was carried out using an online survey, distributed via SurveyMonkey, in which Swedish
social network users were asked to classify 32 e-mails as phishing or legitimate. The e-mails were
divided into four categories, with eight e-mails in each, as follows:
• Legitimate Swedish e-mails (LS)
• Fraudulent Swedish e-mails (FS)
• Legitimate English e-mails (LE)
• Fraudulent English e-mails (FE)
The participants were also asked to rate their English proficiency on a six-level scale based on the
Common European Framework of Reference for Languages (CEFRL) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The CEFRL is a guideline for describing achievements in a foreign language and
divides language capability into six levels. The levels were presented to the survey respondents as
follows (translated from Swedish):
• Beginner - You can present yourself and use simple words and phrases
• Basic - You can understand phrases and the most common words and you can communicate in
simple contexts
• Intermediate - You can handle most situations that arise when travelling in countries where
English is used
• Upper intermediate - You can understand the main parts of complex texts and, to a certain
degree, interact fluently and spontaneously
• Advanced - You can read and understand a large portion of long and demanding texts and use
English spontaneously, flexibly and efficiently in social contexts
• You can, without problem, understand everything you read or hear and express yourself fluently
in almost any situation</p>
      <p>
        The survey was designed to mimic an authentic situation as far as possible and therefore
asked the respondents to classify 32 e-mails as phishing or legitimate.
To introduce an element of gamification, inspired by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the participants were told that they would receive a score based on the number of correct
classifications they made. The e-mails used in the survey were collected from open sources on the
Internet and designed to be hard to classify. The survey was pilot tested for one week to ensure that
it was understandable to the participants. Once completed, the survey was distributed on social networks.
      </p>
      <p>
        For data analysis, the participants were divided into two groups based on their reported English ability.
Respondents who placed themselves in one of the two highest levels formed one group (A), and the
remaining respondents formed the other group (B). The cut-off has no theoretical basis; it was chosen to
make the group sizes as equal as possible. Mean and median values were then calculated for the two groups
and four variables to describe their central tendencies.
      </p>
      <p>Table 1: Mean and median scores for group A and group B in each category (LS, FS, LE, FE), with p-values for the differences between the groups</p>
      <p>
        As suggested by [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Shapiro-Wilk tests were used, in combination with visual inspection of the gathered data, to assess
the distribution of the variables. The variables were found not to be normally distributed, and the
Mann-Whitney U-test was therefore used as the primary test of whether identified differences between
groups were statistically significant [12]. To validate the results of the significance test, a T-test was
also used for the same purpose, providing increased validity through triangulation [13]. The T-test is
parametric and not considered appropriate for data that are not normally distributed, but it can be argued
to be robust in this case given the sample size [14]. The conventional significance level of 5% (p&lt;0.05)
was used throughout this study.</p>
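      <p>The Mann-Whitney U-test described above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis code: tied scores receive averaged ranks, and the p-value uses the large-sample normal approximation with a tie correction but no continuity correction, so it will differ slightly from library defaults (for example scipy.stats.mannwhitneyu, which applies a continuity correction by default). The function names and the example scores are hypothetical and are not the study's data.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter
from itertools import groupby


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    ranks = [0.0] * len(values)
    pairs = sorted(enumerate(values), key=lambda pair: pair[1])
    next_rank = 1  # lowest rank available to the next group of equal values
    for _, group in groupby(pairs, key=lambda pair: pair[1]):
        indices = [index for index, _ in group]
        shared_rank = next_rank + (len(indices) - 1) / 2
        for index in indices:
            ranks[index] = shared_rank
        next_rank += len(indices)
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value).

    The p-value uses the normal approximation with a tie correction and
    no continuity correction, which is only reasonable for larger groups.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over every run of t equal values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Made-up LE scores (0-8) for two small groups; NOT the study's data.
group_a = [8, 7, 7, 6, 8, 5, 7, 6]
group_b = [5, 6, 4, 7, 5, 3, 6, 5]
u, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, p = {p:.3f}")
```
      </preformat>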
      <p>To further analyze the relationship between the ability to detect phishing e-mails and perceived
language skill, correlation tests were performed for the variables where statistically significant differences
were identified. Because of the above-mentioned concerns with the distribution of the data, Kendall’s Tau
was used as the primary correlation test, and the parametric Pearson’s r was used for validation. Both
tests return a value between -1 and 1, where 1 signifies a perfect positive correlation and -1 signifies
a perfect negative correlation [15].
</p>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The survey was answered by 152 respondents, and the collected data was used to calculate four scores
for each participant. Each score reflected the number of correct classifications the participant made
in one of the following categories, and was therefore a number between 0 and 8:
• Legitimate Swedish e-mails (LS)
• Fraudulent Swedish e-mails (FS)
• Legitimate English e-mails (LE)
• Fraudulent English e-mails (FE)</p>
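      <p>The per-category scoring described above can be illustrated with a short sketch. The record layout (one (category, is_phishing, judged_phishing) tuple per e-mail) and the function name score_participant are hypothetical; the paper does not describe how the survey responses were stored or exported.</p>
      <preformat preformat-type="code">
```python
CATEGORIES = ("LS", "FS", "LE", "FE")


def score_participant(answers):
    """Count correct classifications per category (0-8 with eight e-mails each).

    `answers` is a hypothetical list of (category, is_phishing, judged_phishing)
    tuples, one per e-mail shown to the participant.
    """
    scores = {category: 0 for category in CATEGORIES}
    for category, is_phishing, judged_phishing in answers:
        if is_phishing == judged_phishing:  # correct classification
            scores[category] += 1
    return scores


# Hypothetical participant: all 32 e-mails judged correctly except one
# legitimate English e-mail that was flagged as phishing.
answers = [(c, c.startswith("F"), c.startswith("F")) for c in CATEGORIES for _ in range(8)]
answers[16] = ("LE", False, True)
print(score_participant(answers))  # {'LS': 8, 'FS': 8, 'LE': 7, 'FE': 8}
```
      </preformat>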
      <p>The respondents were divided into two groups based on their perceived English ability. Respondents
who ranked themselves in one of the two highest categories were placed in group A (n=78), and those who
ranked themselves in one of the four lowest categories were placed in group B (n=74). An overview of the
mean and median scores in the different categories for the two groups is presented in Table 1.</p>
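      <p>The binary split into groups A and B can be sketched as below, assuming the six self-rating levels listed in Section 2 are coded 1 (Beginner) to 6 (highest level). The coding, the placeholder label "Level 6" (the sixth level has no name in the text) and the helper names are assumptions for illustration. The same ordinal codes are what a correlation test over the full scale would use.</p>
      <preformat preformat-type="code">
```python
# Six self-rating levels, coded 1-6 in the order presented in Section 2.
# The sixth level's name was not preserved, so "Level 6" is a placeholder.
LEVELS = ("Beginner", "Basic", "Intermediate", "Upper intermediate", "Advanced", "Level 6")


def level_code(label):
    """Map a level label to its ordinal code (1 = Beginner ... 6 = highest)."""
    return LEVELS.index(label) + 1


def assign_group(code):
    """Group A: the two highest levels (codes 5-6); group B: the four lowest (1-4)."""
    if code not in range(1, 7):
        raise ValueError(f"level code out of range: {code}")
    return "A" if code >= 5 else "B"


ratings = ["Advanced", "Intermediate", "Level 6", "Basic"]
print([assign_group(level_code(r)) for r in ratings])  # ['A', 'B', 'A', 'B']
```
      </preformat>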
      <p>The data in Table 1 show that the participants perform well in the two fraudulent categories,
with mean values around 6.5 (of 8). This means that, on average, 81% of the phishing e-mails were
correctly identified as phishing in this study. The mean values indicate a very small difference (about 0.1)
between the language groups, and the p-values are far higher than 0.05, showing that the identified
difference may very well be due to chance. As such, the study does not suggest that perceived
English ability affects the ability to correctly identify phishing e-mails.</p>
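      <p>The 81% figure follows from a mean of around 6.5 correct classifications out of eight e-mails in each fraudulent category:</p>
      <preformat preformat-type="math">
```latex
\frac{6.5}{8} = 0.8125 \approx 81\,\%
```
      </preformat>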
      <p>As seen in Table 1, participants with a high perceived English proficiency score higher when it comes
to correctly identifying legitimate e-mails (mean difference of 1.24), and slightly higher when it comes to
identifying English phishing e-mails. The Mann-Whitney U-test was used to determine whether the observed
tendencies were significant. With p-values below 0.05 indicating significance, the result for
legitimate English e-mails is statistically significant, and this result is confirmed by the T-test.</p>
      <p>For the variables where significant results were observed, correlation testing was performed. The
correlation tests use the full six-level language proficiency scale and can therefore capture nuances that
may be missed by the arbitrarily assigned binary variable for language proficiency. The results of
the correlation tests between English proficiency and the ability to correctly classify
legitimate English e-mails were as follows:
• Kendall’s Tau: 0.228 (p=0.00)
• Pearson’s r: 0.279 (p=0.00)</p>
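      <p>Both coefficients can be sketched in pure Python as below. The paper does not state which Kendall variant was used; this sketch assumes tau-b, which adjusts for tied pairs (common with ordinal scales such as the six-level self-rating, and the default of scipy.stats.kendalltau). It returns coefficients only; the p-values reported above would in practice come from a library routine such as scipy.stats.kendalltau or scipy.stats.pearsonr. The example data are made up.</p>
      <preformat preformat-type="code">
```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # Pairs untied in x = concordant + discordant + pairs tied only in y (and vice versa).
    denominator = math.sqrt((concordant + discordant + tied_y_only)
                            * (concordant + discordant + tied_x_only))
    return (concordant - discordant) / denominator


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up data: self-rated level (1-6) and LE score (0-8); NOT the study's data.
levels = [2, 3, 3, 4, 5, 5, 6, 6]
le_scores = [4, 6, 5, 5, 7, 6, 8, 7]
print(round(kendall_tau_b(levels, le_scores), 3), round(pearson_r(levels, le_scores), 3))
```
      </preformat>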
      <p>Both tests produce positive coefficients of around 0.25, with p-values that round to 0.00, meaning that a positive
correlation is identified and is significant at the level adopted in this study. This suggests that English ability
is correlated with the ability to correctly identify legitimate e-mails in English, although the modest size of the
correlation coefficients also suggests that other factors, beyond the scope of this study, play a role.
</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>The aim of this study was to identify how the perceived English ability of Swedish participants affects
their ability to correctly identify e-mails as legitimate or phishing. A survey containing
32 e-mails in four different categories was used, and participants were invited to classify the e-mails
as fraudulent or legitimate. The participants received a score in each category and were then grouped by
their self-reported English ability.</p>
      <p>In summary, the survey suggests that Swedish-speaking users who are good at English are better at
correctly identifying legitimate English e-mails as legitimate. However, perceived language skill does
not seem to affect the ability to detect phishing e-mails in this survey. Still, the survey does suggest
that language proficiency is an important factor in determining whether e-mails are legitimate, which
should be considered in future attempts to prevent phishing.</p>
      <p>
        Another insight from this survey is that the participants score rather high in terms of correctly
identifying phishing e-mails: the mean score is around 6.5 for both Swedish and English phishing e-mails,
meaning that the mean percentage of correct answers was 81%. This is almost 10% higher than the
results reported by [16], and far better than [17] and [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where about 70% of the participants fell for
the phishing experiments conducted. The difference in results can have various explanations. One may,
of course, be that the sample examined in this study is better at phishing detection than the samples in
other studies; cultural aspects are known to affect security behaviour [18, 19]. Another explanation
may be that this study, like [16], uses a survey methodology in which participants
are aware that they are being tested on phishing and are focused on finding phishing e-mails, whereas [17] and
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] performed observation studies that take the users' normal computer use into account. This
suggests that awareness is a major factor in users' ability to detect phishing e-mails. While this
discussion questions the use of the survey methodology for phishing susceptibility studies, it is hard
to see another suitable method for measuring differences between controlled groups. And while the
success rate reported in this paper should perhaps be interpreted with care, the paper does
identify that language ability affects the ability to correctly identify legitimate e-mails as legitimate.
      </p>
      <p>Another conclusion from this study is that English ability affects the participants' ability to
correctly identify legitimate English e-mails but does not affect their ability to identify phishing e-mails.
This result could suggest that the participants use cues unrelated to language (e.g. sender addresses and
link addresses) to identify phishing e-mails. As such, the study identifies an interesting conundrum.
Looking for language mistakes is common advice on how to detect fraudulent e-mails, but if you
receive e-mails in a language you are not fluent in, that advice is perhaps not very helpful. As the world,
and the organizations in it, become more global, one can argue that better ways are needed to
assist users in detecting fraudulent e-mails.</p>
      <p>This study contributes to the knowledge of phishing susceptibility and shows that language
skill is an important factor when users identify e-mails as legitimate or fraudulent. In comparison to other
phishing susceptibility studies, the paper also suggests that awareness is a key factor in
phishing susceptibility, an insight that is useful to practitioners who design and implement security
measures. While phishing and susceptibility to phishing are well researched, the problem remains. As
discussed in this paper, phishing research is difficult not only due to the complex nature of the problem
but also due to ethical restrictions. The need for future work focusing on ways to combat
phishing is pressing. Future studies could focus on finding ethically sound methods for
in-depth studies of phishing susceptibility. Another direction for future work could be survey-based
studies that identify other demographic aspects that affect phishing susceptibility.</p>
      <p>[12] P. E. McKnight, J. Najab, Mann-Whitney U test, The Corsini Encyclopedia of Psychology (2010) 1–1.</p>
      <p>[13] Y. S. Lincoln, E. G. Guba, Naturalistic inquiry, 1985.</p>
      <p>[14] G. Norman, Likert scales, levels of measurement and the “laws” of statistics, Advances in Health Sciences Education 15 (2010) 625–632.</p>
      <p>[15] M. M. Mukaka, A guide to appropriate use of correlation coefficient in medical research, Malawi Medical Journal 24 (2012) 69–71.</p>
      <p>[16] S. Sheng, M. Holbrook, P. Kumaraguru, L. F. Cranor, J. Downs, Who falls for phish? A demographic analysis of phishing susceptibility and effectiveness of interventions, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp. 373–382.</p>
      <p>[17] A. Mihelič, M. Jevšček, S. Vrhovec, I. Bernik, Testing the human backdoor: Organizational response to a phishing campaign, Journal of Universal Computer Science 25 (2019) 1458–1477.</p>
      <p>[18] K.-L. Thomson, R. Von Solms, L. Louw, Cultivating an organizational information security culture, Computer Fraud &amp; Security 2006 (2006) 7–11.</p>
      <p>[19] A. Da Veiga, J. H. Eloff, A framework and assessment instrument for information security culture, Computers &amp; Security 29 (2010) 196–207.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Salahdine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kaabouch</surname>
          </string-name>
          ,
          <article-title>Social engineering attacks: A survey</article-title>
          ,
          <source>Future Internet</source>
          <volume>11</volume>
          (
          <year>2019</year>
          )
          <fpage>89</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Sherman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <article-title>Phishing in an academic community: A study of user susceptibility and behavior</article-title>
          ,
          <source>Cryptologia</source>
          <volume>44</volume>
          (
          <year>2020</year>
          )
          <fpage>53</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Chiew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S. C.</given-names>
            <surname>Yong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>A survey of phishing attacks: their types, vectors and technical approaches</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>106</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lacey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Salmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Glancy</surname>
          </string-name>
          ,
          <article-title>Taking the bait: a systems analysis of phishing attacks</article-title>
          ,
          <source>Procedia Manufacturing</source>
          <volume>3</volume>
          (
          <year>2015</year>
          )
          <fpage>1109</fpage>
          -
          <lpage>1116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Parsons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCormac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pattinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Butavicius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jerram</surname>
          </string-name>
          ,
          <article-title>Phishing for the truth: A scenario-based experiment of users' behavioural response to emails</article-title>
          , in: IFIP International Information Security Conference, Springer,
          <year>2013</year>
          , pp.
          <fpage>366</fpage>
          -
          <lpage>378</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M. V.</given-names>
            <surname>Marques</surname>
          </string-name>
          ,
          <article-title>Phishing through time: A ten year story based on abstracts</article-title>
          , in:
          <source>ICISSP</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hinds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Joinson</surname>
          </string-name>
          ,
          <article-title>Exploring susceptibility to phishing in the workplace</article-title>
          ,
          <source>International Journal of Human-Computer Studies</source>
          <volume>120</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abbasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Zahedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Phishing susceptibility: The good, the bad, and the ugly</article-title>
          ,
          in:
          <source>2016 IEEE Conference on Intelligence and Security Informatics (ISI)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] Council of Europe,
          <article-title>Common European Framework of Reference for Languages: Learning, Teaching, Assessment</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Hale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Gamble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gamble</surname>
          </string-name>
          ,
          <article-title>Cyberphishing: A game-based platform for phishing awareness testing</article-title>
          , in:
          <source>2015 48th Hawaii International Conference on System Sciences</source>
          , IEEE
          ,
          <year>2015</year>
          , pp.
          <fpage>5260</fpage>
          -
          <lpage>5269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pala</surname>
          </string-name>
          ,
          <article-title>Type I error rate and power of three normality tests</article-title>
          ,
          <source>Pakistan Journal of Information and Technology</source>
          <volume>2</volume>
          (
          <year>2003</year>
          )
          <fpage>135</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>