<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Method Of Recognition Sarcasm In English Communication With The Application Of Information Technologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Serhii Trystan</string-name>
          <email>serhii.trystan@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olha Matiushchenko</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maryna Naumenko</string-name>
          <email>mv.naumenko@ukr.net</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky Ave. 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents a software application for recognizing sarcasm in English communication. Natural language processing (NLP) is combined with machine learning, and the application is implemented in the Python programming language. The method is compared with known algorithms and models. Its advantages are shown to be simplicity of implementation and a recognition speed that matches the pace of live communication.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial intelligence</kwd>
        <kwd>language recognition</kwd>
        <kwd>machine learning</kwd>
        <kwd>sarcasm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Human language is extremely complex and
contains a large number of linguistic
constructions. Language recognition and
translation are well-developed areas of machine
learning. However, live human language
contains elements such as humor, irony, puns,
aphorisms, and sacredness, which are not always
correctly recognized even by native speakers, so
recognizing them with intelligent information
technology is a difficult task. At the same time,
virtual translators must recognize a person's live
speech in a reasonable amount of time (preferably
in real time) and convey its content and its
emotional and logical implications to the user.
Today, the universal language of communication is
English. Sarcasm is one of the most complex language
constructions, because a sarcastic sentence asserts
one thing while the opposite is meant [1]. Recognizing
sarcasm in communication is therefore quite a challenge.</p>
      <p>At present, there are many
electronic translators designed to facilitate the
communication process.</p>
      <p>
        Live language recognition is based on
electronic dictionaries and Data Science (DS)
technologies [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ]. Within DS, Natural Language Processing (NLP)
is a distinct field that studies the problems of
computer analysis and synthesis of natural language.
      </p>
      <p>
        For artificial intelligence, analysis means
understanding language, and synthesis means
generating intelligent text [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ].
      </p>
      <p>Solving the problems associated with the
analysis and synthesis of language structures will
mean creating a more convenient form of
interaction between computer and human, as
well as ensuring communication through
electronic translators.</p>
      <p>
        Examples of NLP in use include such services and
applications as Siri (the assistant for Apple operating
systems: iOS, watchOS, macOS,
HomePod and tvOS) [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ], Cortana (the virtual
assistant in Windows) [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ], and the Gmail Spam Filter
(a service that analyzes mail and selects spam) [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ].
      </p>
      <p>It should be noted that many applications
currently implement NLP.</p>
      <p>However, simple solutions are needed that
can quickly recognize sarcasm in live
communication.</p>
      <p>
        The theoretical basis of this work comprises works
[
        <xref ref-type="bibr" rid="ref6">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ], as well as works on applied statistical analysis [9]
and applied analysis of text data in Python [10].
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Main part</title>
    </sec>
    <sec id="sec-3">
      <title>3.1. Substantiation of the problem solution</title>
      <p>The task to be solved in the article is the
automatic recognition of sarcasm in live English
communication.</p>
      <p>The main tools for solving this problem are:</p>
      <list list-type="order">
        <list-item><p>Statistical and mathematical methods of big data processing [<xref ref-type="bibr" rid="ref6">7</xref>]–[10];</p></list-item>
        <list-item><p>A dataset for model training [11];</p></list-item>
        <list-item><p>The Python programming language, version 3.8.1 [12];</p></list-item>
        <list-item><p>The Jupyter Notebook development environment;</p></list-item>
        <list-item><p>A set of libraries (sklearn, re, pandas, numpy, nltk (Natural Language Toolkit), matplotlib).</p></list-item>
      </list>
    </sec>
    <sec id="sec-4">
      <title>3.2. Choosing the dataset and normalizing it</title>
      <p>The dataset was chosen from Kaggle.com. The
required dataset is called “News Headlines
Dataset For Sarcasm Detection. High quality
dataset for the task of Sarcasm Detection” [11].
This dataset contains news headlines
collected from two sites: TheOnion and
HuffPost. Each record consists of three
attributes, namely:</p>
      <list list-type="order">
        <list-item><p>is_sarcastic (1 if the entry is sarcastic, and 0 if it is not);</p></list-item>
        <list-item><p>headline (the title of the page);</p></list-item>
        <list-item><p>article_link (a link to the page from which the title was taken).</p></list-item>
      </list>
      <p>Figure 1 shows the structure of this dataset.</p>
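      <p>The record structure described above can be illustrated with a minimal sketch. It assumes that the dataset file stores one JSON object per line; the two sample records and the helper name load_records are hypothetical and are not taken from the dataset.</p>
      <preformat>
```python
import json

# Hypothetical sample records; the real dataset file is assumed to
# store one JSON object per line with these three attributes.
SAMPLE_LINES = [
    '{"is_sarcastic": 1, "headline": "Example Satirical Headline", "article_link": "https://example.org/a"}',
    '{"is_sarcastic": 0, "headline": "Example regular headline", "article_link": "https://example.org/b"}',
]


def load_records(lines):
    """Parse JSON Lines text into a list of dictionaries."""
    return [json.loads(line) for line in lines if line.strip()]


records = load_records(SAMPLE_LINES)
labels = [record["is_sarcastic"] for record in records]
headlines = [record["headline"] for record in records]
print(labels, headlines)
```
</preformat>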
      <p>As can be seen from Figure 1, the first step in
creating an information technology for sarcasm
recognition is to normalize the "headline" column,
which consists of converting all letters to
lowercase and removing dots and extra spaces.</p>
      <p>Figure 2 shows the script for normalizing the
“headline” column.</p>
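      <p>The normalization step can be sketched as follows. This is not the script from Figure 2 but a minimal illustration under the assumption that "removing spaces" means collapsing redundant whitespace; the function name normalize_headline is hypothetical.</p>
      <preformat>
```python
def normalize_headline(text):
    """Lowercase a headline, remove dots and collapse extra spaces."""
    # Lowercase first, then turn every dot into a space so that words
    # separated only by a dot do not get glued together.
    text = text.lower().replace(".", " ")
    # split() without arguments drops leading, trailing and repeated
    # whitespace; join() puts single spaces back between the words.
    return " ".join(text.split())


print(normalize_headline("  Local Man   Wins. Again...  "))
```
</preformat>
      <p>With pandas, the same function could be applied to the whole column through the Series.apply method.</p>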
    </sec>
    <sec id="sec-5">
      <title>3.3. Stemming words</title>
      <p>The next stage of the method is word
stemming. In the field of natural language
processing, there are cases when two or more
words have a common root. Stemming reduces
all related word forms to a single common
stem.</p>
      <p>There are two main stemming algorithms:
Porter's algorithm and Lancaster's algorithm [9].
The developed script uses Porter's algorithm
because it is less aggressive to word forms.
Lancaster's algorithm truncates words heavily,
which can make the resulting stems hard to
interpret; this is impractical when
recognizing such a complex linguistic
phenomenon as sarcasm. Figure 3 shows the
use of the Porter algorithm with respect to the
“headline” column.</p>
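      <p>A minimal sketch of Porter stemming with the nltk library is given below. It assumes that tokens are separated by whitespace (as they are after normalization); the function name stem_headline and the sample phrase are hypothetical.</p>
      <preformat>
```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_headline(text):
    """Stem every whitespace-separated token with the Porter algorithm."""
    return " ".join(stemmer.stem(token) for token in text.split())


print(stem_headline("running dogs"))
```
</preformat>
      <p>The Lancaster algorithm is available in the same package as nltk.stem.LancasterStemmer, so the two can be compared on the same tokens.</p>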
    </sec>
    <sec id="sec-6">
      <title>3.4. Converting text to numbers</title>
      <p>The next step of the method is to convert the
text into a meaningful numerical representation,
which will be used by machine learning
algorithms for prediction. Figure 4 shows the
use of the TfidfVectorizer class from the
sklearn.feature_extraction.text
module.</p>
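      <p>The conversion can be sketched as follows. The three headlines are hypothetical, already normalized and stemmed examples, and the vectorizer is used with its default settings, which is an assumption and not necessarily the configuration shown in Figure 4.</p>
      <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already normalized and stemmed headlines.
headlines = [
    "local man win lotteri again",
    "scientist discov new planet",
    "man discov he is local",
]

vectorizer = TfidfVectorizer()
# Learn the vocabulary and return a sparse document-term matrix:
# one row per headline, one column per vocabulary term.
features = vectorizer.fit_transform(headlines)

print(features.shape)
print(sorted(vectorizer.vocabulary_))
```
</preformat>
      <p>Each row of the resulting matrix is the TF-IDF weight vector of one headline and can be passed directly to a scikit-learn classifier.</p>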
      <p>Once the data has been converted to values
that can be used for machine learning, a
machine learning model must be chosen.</p>
    </sec>
    <sec id="sec-7">
      <title>3.5. Choosing a machine learning model</title>
    </sec>
    <sec id="sec-8">
      <title>3.6. Machine learning</title>
      <p>Logistic regression was chosen as the base
machine learning model. Logistic regression is
a machine learning classification algorithm that
is used to predict the probability of a categorical
dependent variable. In logistic regression, the
dependent variable is a binary variable
containing data encoded as 1 (yes, success) or 0
(no, failure). Since our problem is a binary
classification problem (1 - sarcasm, 0 - not
sarcasm), logistic regression is a suitable model
[9].</p>
      <p>Figure 5 presents a graph of the logistic
regression function.</p>
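      <p>The logistic function on which this model is based can be sketched as follows. The function names sigmoid and predict_label and the decision threshold of 0.5 are illustrative assumptions, not values taken from the article.</p>
      <preformat>
```python
import math


def sigmoid(z):
    """Logistic function mapping any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(z, threshold=0.5):
    """Return 1 (sarcasm) when the probability reaches the threshold."""
    return 1 if sigmoid(z) >= threshold else 0


# z stands for the weighted sum of the features of one headline.
for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), predict_label(z))
```
</preformat>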
      <p>Another metric for evaluating the quality of
the model is the ROC curve (one of the most
popular quality measures in binary
classification problems) [9].</p>
      <p>Figure 8 shows the ROC curve obtained
in this work.</p>
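      <p>How a ROC curve and the area under it are computed can be sketched with scikit-learn. The labels and scores below are hypothetical and serve only to illustrate the calls; they are not the results shown in Figure 8.</p>
      <preformat>
```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted sarcasm probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Each threshold yields one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
# The area under the curve summarizes the ranking quality in one number.
auc = roc_auc_score(y_true, y_score)

print(list(zip(fpr, tpr)))
print(auc)
```
</preformat>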
      <p>Before training the model, the dataset is divided
into training and test samples with the following
parameters: 30 percent of the entire sample
goes to the test set, and the remaining 70 percent
to the training set (Figure 6).</p>
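      <p>The 70/30 split can be sketched with the train_test_split function from scikit-learn. The feature rows and labels are hypothetical placeholders, and the fixed random_state is an assumption added only to make the split repeatable.</p>
      <preformat>
```python
from sklearn.model_selection import train_test_split

# Hypothetical feature rows and labels standing in for the real dataset.
features = [[i] for i in range(10)]
labels = [0, 1] * 5

X_train, X_test, y_train, y_test = train_test_split(
    features,
    labels,
    test_size=0.3,    # 30 percent of the records go to the test set
    random_state=42,  # assumed seed, used here only for repeatability
)

print(len(X_train), len(X_test))
```
</preformat>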
      <p>The accuracy of the model is checked using
cross-validation (a resampling procedure). This
method was chosen for its simplicity and because
it gives a less biased, less
optimistic assessment of the quality of the model
than other methods.</p>
      <p>Figure 7 shows the accuracy of the model as
verified by cross-validation.</p>
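      <p>Cross-validation of a TF-IDF plus logistic regression model can be sketched as follows. The toy corpus and its labels are hypothetical, and the number of folds (cv=3) is an assumption; the article does not state how many folds were used.</p>
      <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; 1 marks sarcasm, 0 marks no sarcasm.
headlines = [
    "oh great another monday", "wow what a brilliant idea",
    "sure that will definitely work", "oh fantastic more rain",
    "yeah right nice job genius", "oh lovely traffic again",
    "city council approves budget", "team wins championship game",
    "new bridge opens downtown", "school adds science program",
    "company reports quarterly earnings", "museum opens new exhibit",
]
labels = [1] * 6 + [0] * 6

# The pipeline refits the vectorizer inside every fold, so no
# information from the held-out fold leaks into training.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
scores = cross_val_score(model, headlines, labels, cv=3, scoring="accuracy")

print(scores, scores.mean())
```
</preformat>
      <p>The mean of the fold scores is the figure usually reported as the cross-validated accuracy.</p>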
      <p>After the machine learning model has been
designed, it must be tested.</p>
    </sec>
    <sec id="sec-9">
      <title>3.7. Model testing</title>
      <p>Figure 9 shows an example of testing the
model on the phrase: "Oh, have I touched that
tiny ego of yours?".</p>
      <p>The information technology rightly
determined that this sentence is sarcastic.</p>
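      <p>Classifying a single phrase with a trained model can be sketched as follows. The training phrases are hypothetical and far too few for a meaningful result, so the printed label illustrates only the calling sequence, not the behavior of the model from the article.</p>
      <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; 1 marks sarcasm, 0 marks no sarcasm.
train_headlines = [
    "oh great another monday",
    "oh sure that tiny plan will work",
    "wow have you always been this brilliant",
    "city council approves budget",
    "team wins championship game",
    "museum opens new exhibit",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_headlines, train_labels)

phrase = "Oh, have I touched that tiny ego of yours?"
# predict returns the class label; predict_proba returns one
# probability per class, ordered as in model.classes_ (0, then 1).
label = int(model.predict([phrase])[0])
probability = float(model.predict_proba([phrase])[0][1])
print(label, round(probability, 3))
```
</preformat>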
      <p>It is also necessary to compare the results
with known algorithms and implementations.
This comparison is shown in Table 1.</p>
      <p>Thus, we can conclude that the developed
method is effective and can be applied to the
recognition of sarcasm in live communication
in English.</p>
    </sec>
    <sec id="sec-conclusions">
      <title>4. Conclusions</title>
      <p>1. Recognition of linguistic constructions of
natural language is a difficult task at the intersection
of such scientific areas as AI, ML, and philology.</p>
      <p>2. Recognizing sarcasm in live language is
one of the most difficult tasks, because speakers
mask their sarcasm.</p>
      <p>3. Recognition of sarcasm should be done
quickly (commensurate with the pace of
communication), so solutions should be simple
and effective.</p>
      <p>4. The obtained solution has a degree of
recognition of 0.83 and, in contrast to more
powerful solutions, is quite fast.</p>
      <p>5. The article presents the results of a study
of ML and NLP as applied to the problem
of identifying and classifying sarcasm.</p>
    </sec>
    <sec id="sec-10">
      <title>5. References</title>
      <p>[1] "Definition of SARCASM" [Online]. Available: www.merriam-webster.com. [Accessed 29 June 2021].</p>
      <p>[9] Atienza R. Advanced Deep Learning with TensorFlow 2 and Keras: Apply DL, GANs, VAEs, deep RL, unsupervised learning, object detection and segmentation, and more. Packt Publishing Ltd, 2020. p. 512.</p>
      <p>[10] "News Headlines Dataset For Sarcasm Detection" [Online]. Available: https://www.kaggle.com/rmisra/newsheadlines-dataset-for-sarcasm-detection. [Accessed 29 June 2021].</p>
      <p>[11] Guttag J. V. Introduction to Computation and Programming Using Python: With Application to Understanding Data, 2017. p. 447.</p>
      <p>[12] Oleksandr Laptiev, Savchenko Vitalii, Serhii Yevseiev, Halyna Haidur, Sergii Gakhov, Spartak Hohoniants. The new method for detecting signals of means of covert obtaining information. 2020 IEEE 2nd International Conference on Advanced Trends in Information Theory (IEEE ATIT 2020) Conference Proceedings, Kyiv, Ukraine, November 25-27. pp. 176-181.</p>
      <p>[13] Oleg Barabash, Oleksandr Laptiev, Valentyn Sobchuk, Ivanna Salanda, Yulia Melnychuk, Valerii Lishchyna. Comprehensive Methods of Evaluation of Distance Learning System Functioning. International Journal of Computer Network and Information Security (IJCNIS). Vol. 13, No. 3, Jun. 2021. pp. 62-71, DOI: 10.5815/ijcnis.2021.03.06.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[2] "About Data Science / Data Science Association" [Online]. Available: www.datascienceassn.org. [Accessed 3 April</source>
          <year>2021</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Goldberg</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>A primer on neural network models for natural language processing</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <year>2016</year>
          , No 57, pp.
          <fpage>345</fpage>
          -
          <lpage>420</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bhat</surname>
            <given-names>H. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lone</surname>
            <given-names>T. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paul</surname>
            <given-names>Z. M.</given-names>
          </string-name>
          <article-title>Cortana-intelligent personal digital assistant: a review</article-title>
          .
          <source>International Journal of Advanced Research in Computer Science</source>
          ,
          <year>2017</year>
          , No
          <volume>8</volume>
          (
          <issue>7</issue>
          ), pp.
          <fpage>55</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Fumera</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pillai</surname>
            <given-names>I.</given-names>
          </string-name>
          , Roli F.
          <article-title>Spam filtering based on the analysis of text information embedded into images</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2006</year>
          , No.
          <volume>7</volume>
          (
          <issue>12</issue>
          ), pp.
          <fpage>2699</fpage>
          -
          <lpage>2720</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Liddy</surname>
            ,
            <given-names>E.D.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Natural Language Processing</article-title>
          .
          <source>In Encyclopedia of Library and Information Science</source>
          , 2nd Ed. NY. Marcel Dekker, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[7] Practical statistics for Data Science specialists: Translated</source>
          from English / P. Bryus,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bryus</surname>
          </string-name>
          . Petersburg,
          <year>2018</year>
          . 304 p.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Bengfort</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilbro</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okheda</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <source>Applied Text Data Analysis in Python. Machine Learning and the Creation of Natural Language Processing Applications</source>
          , St. Petersburg: Peter,
          <year>2020</year>
          . 368 p.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>