<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>of Transitivity Relation in Natural Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petro Zdebskyi</string-name>
          <email>petro.v.zdebskyi@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Berko</string-name>
          <email>Andrii.Y.Berko@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera Street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Osnabrück University</institution>
          ,
          <addr-line>Friedrich-Janssen-Str. 1, Osnabrück, 49076</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The motivation of this work is the data-centric approach: improving model accuracy by improving data quality rather than model architecture. The idea is to enrich a dataset with transitive relations to help a machine learning model learn such dependencies. Alongside enriching the dataset, we investigate how well a previously trained model captures such relations. The study is therefore divided into two main parts: investigating the dataset and investigating a machine learning model trained on it. It was found that the existing model captures transitive dependencies well. It was also found that the “entailment” relation is more directional than “contradiction” and “neutral”.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Inference</kwd>
        <kwd>Recognizing Textual Entailment</kwd>
        <kwd>transitive relation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The NLI task is a directional relation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We therefore also check the relation in the opposite direction: instead of making the
inference from the first sentence to the second, we swap them and make the inference from the second
sentence to the first.
      </p>
      <p>The model used is RoBERTa, a large deep learning model trained by Facebook and
fine-tuned on the MultiNLI dataset. It is an improved version of BERT: performance was
improved by training the model longer on more data, removing the next sentence
prediction objective, and dynamically changing the masking pattern applied to the training data. In short,
RoBERTa is an improved training procedure for BERT, and it achieves better results on various
benchmarks without multi-task fine-tuning for GLUE or additional data for SQuAD.</p>
      <p>The study has two steps. First, we check whether the second sentence of some samples is the same as
the first sentence of other samples (using strict and fuzzy matching) in order to create a dataset for
analysing transitivity. Second, we check RoBERTa's ability to learn the transitive relation on the
dataset created in the first step.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works. Transformer architecture</title>
      <p>
        The Transformer is a neural network architecture designed for natural language processing (NLP)
tasks. It has become one of the most widely used architectures in NLP. It is based on a
self-attention mechanism, which lets the model look at different parts of the input sequence and so
better capture the context and relationships between words [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. Unlike earlier
NLP models such as convolutional and/or recurrent neural networks, the Transformer is faster and more
efficient because it processes the entire sequence in parallel [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4-7</xref>
        ].
      </p>
      <p>It is composed of an encoder and a decoder, each of which is made up of a stack of identical layers.
The encoder takes the input sequence and produces a sequence of hidden states, while the decoder takes
the encoder output and produces the final output sequence. Both the encoder and the decoder use
self-attention to process the input and produce the output.</p>
      <p>The Transformer can handle variable-length input sequences. Because self-attention by itself is
insensitive to the order of the tokens, order information is supplied by positional
encoding, which encodes the position of each word in the input sequence as a vector.</p>
      <p>The Transformer has achieved state-of-the-art performance on a wide range of NLP tasks, including
machine translation, language modeling, question answering, and sentiment analysis. Many popular
NLP models, such as BERT, GPT-3, and RoBERTa, are based on the Transformer architecture.</p>
      <p>Positional encoding is a technique used in NLP to encode the position of tokens in a
sequence. Sequences of tokens are often processed by models such as the Transformer, which
are based on attention mechanisms. Since these models have no built-in notion of order, they require an
additional signal to represent the order of the sequence. Positional encoding provides it by adding an
encoding vector to the embedding of each token. The
encoding vector is calculated from the position of the token in the sequence and the dimension of
the embedding space, which allows the model
to differentiate between tokens based on their position. For tasks such as language
modeling, machine translation, and text classification it is important to capture both the semantics of
the tokens and their position in the sequence, and the Transformer with positional encoding can
achieve this.</p>
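      <p>As an illustration, the sinusoidal encoding proposed in [3] can be sketched as follows. This is a minimal pure-Python sketch and not the implementation used in this study; the function names positional_encoding and add_positions and the toy embedding values are assumptions made for the example.</p>
      <preformat>
```python
import math


def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k + 1) shares one wavelength.
            angle = pos / (10000 ** ((i // 2) * 2 / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings):
    """Add the encoding of each position to the token embedding at that position."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, enc)] for emb, enc in zip(embeddings, pe)]


# Two identical token embeddings become distinguishable once positions are added.
tokens = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
encoded = add_positions(tokens)
```
      </preformat>
      <p>Even dimensions use the sine and odd dimensions the cosine of the same angle, so every position receives a distinct vector of the same size as the embedding.</p>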
    </sec>
    <sec id="sec-3">
      <title>2.1. Encoder and Decoder</title>
      <p>Encoder-decoder is a type of neural network architecture commonly used in natural language
processing and machine translation tasks. It consists of two main components: an encoder and a
decoder.</p>
      <p>As shown in Fig. 1, the encoder takes an input sequence and converts it into a fixed-length vector
representation, which captures the essential information of the input. This vector is passed on to the
decoder, which generates an output sequence based on the input vector.</p>
      <p>Such an architecture is well suited for tasks where the input and output have different lengths because
it can handle sequences of variable length for both input and output. Examples of such tasks are machine
translation and text summarization.</p>
      <p>
        The encoder contains 6 identical layers stacked together [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">1-7</xref>
        ]. Each of them has two sub-layers: a
multi-head self-attention mechanism and a fully connected feed-forward network. There is a residual connection around
each of the two sub-layers, followed by a normalization layer [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref8 ref9">8-12</xref>
        ].
      </p>
      <p>The decoder takes the vector from the encoder and generates an output sequence. Like the encoder, the decoder often has
residual connections and normalization.</p>
      <p>In classical sequence-to-sequence models, the encoder is typically a stack of recurrent or convolutional neural networks that process the
input sequence one token at a time and produce a sequence of hidden states. It takes an input sequence,
such as a sentence in one language, and transforms it into a fixed-length vector representation called a
context vector.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Scaled Dot-Product Attention</title>
      <p>Scaled Dot-Product Attention is one of the main components of the Transformer. It is the attention
mechanism that helps the model generate a better output sequence by concentrating on the most
relevant parts of the input sequence. It computes a weighted sum of values based on a set of key-value
pairs, where the weights are computed by taking the dot product of the query with each key and then
applying a softmax function to normalize the weights. To keep the dot products from growing too large
in magnitude, which would push the softmax into regions with extremely small gradients, the dot product
is divided by the square root of the dimension of the key vectors. The resulting
weighted sum is then used as input to the next layer in the network.</p>
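      <p>The attention function is computed on a set of queries simultaneously, packed together into a matrix Q. The keys are
packed into a matrix K and the values into a matrix V. With d_k denoting the dimension of the key vectors, the formula for scaled dot-product attention is:
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \tag{1}
\]</p>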
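      <p>Formula (1) can be illustrated with a small pure-Python sketch. The matrices are represented as lists of rows; the function names and the toy values of Q, K and V are assumptions made for the example, and a practical implementation would use optimized matrix routines instead.</p>
      <preformat>
```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(Q, K, V)
```
      </preformat>
      <p>In this toy case the query matches the first key more closely, so the first value vector receives the larger weight in the output.</p>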
      <p>The two most commonly used attention functions are additive attention and dot-product (multiplicative)
attention. The two are similar in theoretical complexity, but dot-product attention is faster and more space-efficient
in practice because it can be implemented with highly optimized matrix multiplication code.</p>
      <p>For small values of d_k the two approaches perform similarly, but additive attention performs
better than dot-product attention without scaling for larger values of d_k. For large values of d_k the dot products grow
large in magnitude and push the softmax function into regions where it has extremely small gradients.</p>
      <p>Such an attention mechanism is very useful in NLP tasks where the input sequences can be very long
because it allows the model to focus only on the most relevant parts of the sequence, which
improves the accuracy of the model.</p>
    </sec>
    <sec id="sec-5">
      <title>2.3. Multi-Head Attention</title>
      <p>Multi-head attention is commonly used in neural network models for NLP tasks such as machine
translation and text classification. It allows the model to analyse distinct parts of the
sequence simultaneously, which improves its ability to capture complex relationships between parts of the input.</p>
      <p>Instead of performing a single attention function with d_model-dimensional keys, values,
and queries, it is better to linearly project the queries, keys, and values h times with different learned
projections to d_k, d_k, and d_v dimensions respectively. The attention function is then performed in parallel on each of these projected versions of the queries, keys and
values, yielding d_v-dimensional output values.</p>
      <p>The weighted sum of the input sequence is then computed for each head, and the results are
concatenated to form a single output vector. After that, the multi-head attention output is passed
through a feed-forward neural network layer. Multi-head attention allows the model to jointly attend to information
from different representation subspaces at different positions. With a single attention head,
averaging inhibits this.</p>
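      <p>The projection, parallel attention and concatenation steps can be sketched as follows. This is a minimal pure-Python illustration under simplifying assumptions: the projection matrices are fixed toy matrices rather than learned parameters, and the names multi_head_attention, pick_first, pick_last and W_o are hypothetical.</p>
      <preformat>
```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on matrices given as lists of rows."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_t)]
    return matmul([softmax(row) for row in scores], V)


def multi_head_attention(X, heads, W_o):
    """heads is a list of (W_q, W_k, W_v) projection triples, one per head."""
    head_outputs = [attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v))
                    for W_q, W_k, W_v in heads]
    # Concatenate the per-head outputs along the feature axis, then project.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(X))]
    return matmul(concat, W_o)


# Toy setting: d_model = 4, h = 2 heads, d_k = d_v = 2.
X = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
pick_first = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
pick_last = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
heads = [(pick_first, pick_first, pick_first), (pick_last, pick_last, pick_last)]
W_o = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = multi_head_attention(X, heads, W_o)
```
      </preformat>
      <p>Each head here looks at a different two-dimensional slice of the input, which mimics attending to different representation subspaces.</p>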
    </sec>
    <sec id="sec-6">
      <title>Application of Attention in the Model</title>
      <p>There are three different ways in which multi-head attention is used in the Transformer:</p>
      <list list-type="bullet">
        <list-item>
          <p>In the encoder-decoder attention, the queries come from the previous decoder layer, while the memory keys and
values come from the output of the encoder. Specifically, at each time step in the decoder, the attention
mechanism computes an attention weight vector over the encoded input sequence. This allows
each position in the decoder to attend to all positions in the input sequence. This is very similar to the
common encoder-decoder attention approaches in sequence-to-sequence models.</p>
        </list-item>
        <list-item>
          <p>The encoder contains self-attention layers. In a self-attention layer all of the keys, values and
queries come from the same place, in this case the output of the previous layer in the encoder.
Each head of the attention mechanism attends to a different part of the input
sequence and computes an attention weight vector. Each position in the encoder can attend to
all positions in the previous layer of the encoder.</p>
        </list-item>
        <list-item>
          <p>Similarly, self-attention in the decoder allows each position in the decoder to attend to all positions in the decoder up to and including that
position. At each time step, the decoder attends to the previously generated tokens in the target
sequence to compute an attention weight vector. This is implemented inside the scaled dot-product
attention by masking out all values in the input of the softmax that correspond to illegal connections, which prevents leftward
information flow.</p>
        </list-item>
      </list>
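      <p>The masking used in the decoder self-attention can be illustrated with a short sketch. It assumes that the raw attention scores are already given as a square matrix (a list of rows); the function name causal_attention_weights is hypothetical.</p>
      <preformat>
```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax where position i may only attend to positions 0..i."""
    weights = []
    for i, row in enumerate(scores):
        # Entries to the right of the diagonal are set to -inf before the
        # softmax, so they receive exactly zero weight.
        masked = [s if i >= j else -math.inf for j, s in enumerate(row)]
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
weights = causal_attention_weights(scores)
```
      </preformat>
      <p>With equal scores, the first position attends only to itself, while later positions spread their weight evenly over themselves and all earlier positions.</p>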
      <p>
        By using multi-head attention in these ways, the Transformer can capture complex relations between the input and the output,
which is why it is one of the most successful models in NLP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>3. Methods and technology</title>
    </sec>
    <sec id="sec-8">
      <title>3.1. BERT</title>
      <p>This section gives details of the BERT implementation. The framework has two steps:
pre-training and fine-tuning. During pre-training, the model is trained on
unlabeled data over different pre-training tasks. For fine-tuning, the model is first
initialized with the pre-trained parameters, and all of the parameters are then adjusted using
labeled data from the downstream task. Each downstream task has its own fine-tuned model, even though
all of them are initialized with the same pre-trained parameters.</p>
      <p>A distinctive feature of BERT is that it uses the same architecture for different tasks:
there is only a slight difference between the pre-trained architecture and the final downstream architecture.</p>
      <p>
        The BERT model architecture is a multi-layer bidirectional Transformer encoder. The
base BERT model has the same model size as OpenAI's first-generation GPT model. However, the
GPT Transformer uses constrained self-attention, in which every token can only attend to the context to its left, while
BERT uses bidirectional self-attention. BERT is trained on a task called Masked Language Modeling
(MLM), where it is given a sentence with some of the words randomly masked and is asked to predict
the masked words based on the context of the other words in the sentence [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This helps the model to learn
the relationships between words and to understand the meaning of a sentence [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. BERT is also trained
on a task called Next Sentence Prediction (NSP), where it is given a pair of sentences and is asked to
predict whether the second sentence follows the first sentence in the document or not.
      </p>
      <p>To make the model work with a variety of downstream tasks, the input representation can represent either a single sentence or a pair of sentences.
A sequence in this context is the sequence of tokens passed to the
model, which can be one sentence or two sentences packed together.</p>
      <p>
        The embeddings used in BERT are WordPiece embeddings. There is a special marker ([CLS]) at the start of
each sequence. Sentences are differentiated in two ways. First, they are separated by a special
token ([SEP]). Second, a learned embedding is added to every token, indicating whether it belongs to
the first sentence or the second [
        <xref ref-type="bibr" rid="ref11 ref13 ref14 ref15 ref4">4, 11, 13-15</xref>
        ].
      </p>
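      <p>The packing of one or two sentences into a single input sequence can be sketched as follows. The sketch is a simplification: it uses whitespace tokens instead of WordPiece sub-words, returns segment ids rather than the learned segment embeddings themselves, and the function name pack_bert_input is hypothetical.</p>
      <preformat>
```python
def pack_bert_input(tokens_a, tokens_b=None):
    """Build the token sequence and segment ids for one or two sentences."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        second = list(tokens_b) + ["[SEP]"]
        tokens += second
        # Tokens of the second sentence get segment id 1.
        segment_ids += [1] * len(second)
    return tokens, segment_ids


premise = ["i", "was", "in", "paris"]
hypothesis = ["i", "was", "abroad"]
tokens, segment_ids = pack_bert_input(premise, hypothesis)
```
      </preformat>
      <p>For an NLI sample the premise forms the first segment and the hypothesis the second, so the model sees both sentences in one sequence.</p>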
    </sec>
    <sec id="sec-9">
      <title>3.2. RoBERTa</title>
      <p>RoBERTa (Robustly Optimized BERT Pre-training Approach) is a variant of the BERT
(Bidirectional Encoder Representations from Transformers) model. Compared to BERT, it is pre-trained
on a much larger dataset (160 GB of text data). It uses dynamic masking, which randomly masks out
different tokens at each training epoch. RoBERTa is trained with the masked language modeling objective
only, without Next Sentence Prediction.</p>
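      <p>The idea of dynamic masking can be sketched as below. This is a simplified illustration, not the RoBERTa training code: every selected token is replaced by a mask token, the masking rate of 0.15 is an assumed default, and the function name dynamic_mask is hypothetical.</p>
      <preformat>
```python
import random


def dynamic_mask(tokens, rng, mask_rate=0.15, mask_token="[MASK]"):
    """Return a freshly masked copy of tokens and the positions that were masked."""
    n_to_mask = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = mask_token
    return masked, positions


sentence = "the model is trained on more data for a longer period of time".split()
rng = random.Random(0)
# A new mask is drawn every epoch instead of being fixed once during preprocessing.
epochs = [dynamic_mask(sentence, rng) for _ in range(3)]
```
      </preformat>
      <p>Because the mask is drawn again on every call, the same sentence can expose different prediction targets in different epochs.</p>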
      <p>
        In the original BERT implementation, random replacement and masking are performed once at the beginning and
kept fixed during training, although in practice the data is duplicated, so the mask for each sentence is not
always the same [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Next Sentence Prediction (NSP) is a binary classification task that predicts whether two
sentences follow one another in the text. Positive examples are created by extracting consecutive
sentences from the training text, while negative examples are created by pairing sentences from different documents.
      </p>
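      <p>The construction of NSP training examples can be sketched as follows. The sketch assumes that each document is already split into sentences and that at least two documents are available; the function name make_nsp_examples and the toy documents are hypothetical.</p>
      <preformat>
```python
import random


def make_nsp_examples(documents, rng):
    """Create (sentence_a, sentence_b, is_next) triples from sentence-split documents."""
    examples = []
    for doc_index, doc in enumerate(documents):
        other_docs = [d for i, d in enumerate(documents) if i != doc_index]
        for first, second in zip(doc, doc[1:]):
            # Positive example: two consecutive sentences of the same document.
            examples.append((first, second, True))
            # Negative example: the second sentence comes from another document.
            examples.append((first, rng.choice(rng.choice(other_docs)), False))
    return examples


documents = [["A1", "A2", "A3"], ["B1", "B2"]]
examples = make_nsp_examples(documents, random.Random(0))
```
      </preformat>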
      <p>
        The Next Sentence Prediction task was designed to improve performance on tasks such as natural
language inference, which require understanding the relationship between sentences [
        <xref ref-type="bibr" rid="ref12 ref16 ref17 ref18 ref5">5, 12, 16-18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-10">
      <title>3.3. MultiNLI dataset</title>
      <p>The MultiNLI (Multi-Genre Natural Language Inference) dataset is a popular benchmark dataset for
natural language understanding tasks in machine learning. It has 433,000 examples, which makes it one
of the largest datasets for natural language inference. MultiNLI is more complex because it draws data
from 10 genres of written and spoken English. This allows models to be evaluated on nearly the full
complexity of the language and provides an explicit setting for assessing domain adaptation.</p>
      <p>The MultiNLI dataset consists of sentence pairs drawn from various genres, including fiction,
government reports, and conversational speech. Each sentence pair is labeled with one of three
relationship types: entailment, contradiction, or neutral.</p>
      <p>
        Many NLP tasks depend on natural language understanding (NLU) to succeed [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19-21</xref>
        ]. A lot of work
has been done to advance applied natural language understanding tasks so that models can succeed
at them. In general, a model must be accurate in NLU as well as in additional machine learning
tasks such as memory access or structured prediction. This makes it difficult to
assess correctly how well NLP models understand the meaning of language [22-27].
      </p>
      <p>The entailment relationship indicates that the first sentence logically entails the second sentence,
meaning that if the first sentence is true, then the second sentence must also be true. The contradiction
relationship indicates that the first sentence contradicts the second sentence, meaning that if the first
sentence is true, then the second sentence must be false. The neutral relationship indicates that there is
no logical relationship between the two sentences [28-36].</p>
      <p>The methodology for data collection in MultiNLI is very similar to that of SNLI. A premise sentence is taken
from an existing source, and an annotator is asked to write a new sentence (hypothesis) for it. The dataset was
created to encourage research into natural language understanding across multiple genres, rather than
just a single genre, which was the case with many previous datasets. This makes the dataset more challenging
and more realistic for models.</p>
      <p>The text of the MultiNLI premise sentences comes from 10 sources of freely available text, chosen
to be diverse and to represent American English. The sentence pairs in the MultiNLI dataset were
drawn from ten different genres, including telephone conversations, travel guides, and government
documents. This ensures that the dataset covers a wide range of topics and writing styles [37-41].</p>
      <p>
        A gold label was assigned to each pair of sentences; it represents the majority vote among
the label assigned by the original annotator and the four labels assigned by the verifying annotators.
For a small number of sentence pairs there was no consensus. Such pairs should not be used in
standard evaluations; they are included in the distributed corpus but have a dash as the gold label. As
shown in Fig. 5, the dataset contains the sentence pair, the genre, the gold label and all labels assigned by the annotators [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
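      <p>The assignment of the gold label can be sketched as a majority vote. This is an illustrative sketch rather than the corpus tooling; the function name gold_label is hypothetical, and a strict majority (more than half of the votes) is assumed as the consensus criterion.</p>
      <preformat>
```python
from collections import Counter


def gold_label(annotator_labels):
    """Return the majority label of the annotators, or "-" when there is no consensus."""
    label, votes = Counter(annotator_labels).most_common(1)[0]
    # With five annotators a majority means at least three identical votes.
    return label if votes * 2 > len(annotator_labels) else "-"


agreed = gold_label(["entailment", "entailment", "neutral", "entailment", "contradiction"])
split = gold_label(["entailment", "entailment", "neutral", "neutral", "contradiction"])
```
      </preformat>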
    </sec>
    <sec id="sec-11">
      <title>Properties of binary relations</title>
      <p>A binary relation can be defined as follows. Let A and B be any sets. A binary relation R from
A to B is a subset of the Cartesian product A × B, written R ⊆ A × B.</p>
      <p>A relation R on a set A is called reflexive if (a, a) ∈ R for every element a ∈ A. In the graph of the relation, every
vertex has a self-loop.
A relation is called symmetric if for all a, b ∈ A, (a, b) ∈ R implies (b, a) ∈ R.</p>
      <p>A relation is called antisymmetric if for all a, b ∈ A, (a, b) ∈ R and (b, a) ∈ R imply a = b. So between distinct vertices
there is at most one edge.
A relation is called transitive if for all a, b, c ∈ A, (a, b) ∈ R and (b, c) ∈ R imply (a, c) ∈ R.</p>
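      <p>For a finite relation stored as a set of pairs, these properties, together with transitivity, can be checked directly. The following is a small sketch; the function names and the example relation (divisibility on the set {1, 2, 3}) are chosen only for illustration.</p>
      <preformat>
```python
def is_reflexive(R, A):
    """(a, a) is in R for every element a of A."""
    return all((a, a) in R for a in A)


def is_symmetric(R):
    """(a, b) in R implies (b, a) in R."""
    return all((b, a) in R for a, b in R)


def is_antisymmetric(R):
    """(a, b) in R and (b, a) in R imply a == b."""
    return all(a == b or (b, a) not in R for a, b in R)


def is_transitive(R):
    """(a, b) in R and (b, c) in R imply (a, c) in R."""
    return all((a, d) in R for a, b in R for c, d in R if b == c)


A = {1, 2, 3}
divides = {(a, b) for a in A for b in A if b % a == 0}
```
      </preformat>
      <p>Divisibility on this set is reflexive, antisymmetric and transitive but not symmetric, which mirrors the directional character of entailment discussed in this paper.</p>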
    </sec>
    <sec id="sec-12">
      <title>Textual Entailment as a Directional Relation</title>
      <p>
        In the textual entailment task, the relation between premise and hypothesis is considered directional. For example,
take “Last year I was in Paris” as the premise and “I was abroad a year ago” as the hypothesis. In this example we
can clearly see that the relation is directional: we can entail the hypothesis from the premise, but not
the other way around, because the fact that someone was abroad does not mean that they were in Paris. Only a few authors
have exploited the directional character of the entailment relation, which means that if T → H holds, it is unlikely
that the reverse H → T also holds. From a logical point of view, the entailment relation is akin to
implication, which, contrary to equivalence, is not symmetric [
        <xref ref-type="bibr" rid="ref2 ref8 ref9">2, 8, 9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-13">
      <title>4. Checking how many hypotheses are premises in other samples</title>
      <p>The idea is to check whether there are hypotheses (second sentences) that are premises (first sentences)
in other samples. This allows us to create a dataset for validating how well models capture the
transitive relation. Strict matching is used, but fuzzy matching can be utilized as well. Only a sample of the
dataset was checked because the dataset is huge and calculations on
the whole dataset would take a lot of time.</p>
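      <p>One possible way to implement both kinds of matching is sketched below. It is not necessarily the procedure used in this study: strict matching is done here with a dictionary lookup, fuzzy matching with a pairwise comparison based on difflib.SequenceMatcher from the Python standard library, and the function names, the similarity threshold of 0.9 and the sample sentences are assumptions.</p>
      <preformat>
```python
from collections import defaultdict
from difflib import SequenceMatcher


def strict_links(samples):
    """Return (i, j) pairs where the hypothesis of sample i equals the premise of sample j."""
    by_premise = defaultdict(list)
    for j, (premise, _hypothesis) in enumerate(samples):
        by_premise[premise].append(j)
    return [(i, j) for i, (_premise, hypothesis) in enumerate(samples)
            for j in by_premise.get(hypothesis, []) if i != j]


def fuzzy_links(samples, threshold=0.9):
    """Pairwise variant: link samples whose sentences are similar above the threshold."""
    links = []
    for i, (_premise, hypothesis) in enumerate(samples):
        for j, (premise, _hyp) in enumerate(samples):
            if i != j and SequenceMatcher(None, hypothesis, premise).ratio() >= threshold:
                links.append((i, j))
    return links


samples = [("I have been to Paris", "I have been to France"),
           ("I have been to France", "I visited France"),
           ("It rains", "The street is wet")]
strict = strict_links(samples)
fuzzy = fuzzy_links(samples)
```
      </preformat>
      <p>The dictionary lookup needs a single pass over the samples, whereas the pairwise fuzzy comparison grows quadratically with the number of samples, which is why working on a sample of the dataset is attractive for fuzzy matching.</p>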
      <p>= (
⋅ 
)2
(2)</p>
      <p>The time per iteration on our computer was 0.0028, so the calculation time can be plotted as a function of the sample size. Even
for half of the dataset, 84 hours of calculations would be needed.</p>
    </sec>
    <sec id="sec-14">
      <title>5. Discussion and conclusions</title>
    </sec>
    <sec id="sec-15">
      <title>5.1. Validation of model entailment with transitivity on data</title>
      <p>Of all these rules, only the first can be considered strict. If there is entailment in the first pair and
entailment in the second pair, then there should be an entailment relation between the premise of the first pair and
the hypothesis of the second pair. For example, if the first pair is “I have been to Paris” and “I have been to France”, and the
second pair is “I have been to France” and “I visited France”, then “I have been to Paris” and “I visited
France” form an entailment. The other labels can be explained similarly.</p>
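      <p>The strict rule (entailment followed by entailment gives entailment) can be used to derive new samples, as in the following sketch. Only this rule is implemented, strict matching of sentences is assumed, and the function name entailment_chains is hypothetical.</p>
      <preformat>
```python
def entailment_chains(samples):
    """Return (premise_1, hypothesis_2, "entailment") triples for chained entailment pairs."""
    entailed_by = {}
    for premise, hypothesis, label in samples:
        if label == "entailment":
            entailed_by.setdefault(premise, []).append(hypothesis)
    chains = []
    for premise, hypothesis, label in samples:
        if label != "entailment":
            continue
        # The hypothesis of this pair is the premise of another entailment pair.
        for final in entailed_by.get(hypothesis, []):
            if final != premise:
                chains.append((premise, final, "entailment"))
    return chains


samples = [("I have been to Paris", "I have been to France", "entailment"),
           ("I have been to France", "I visited France", "entailment"),
           ("I have been to France", "I dislike travelling", "neutral")]
chains = entailment_chains(samples)
```
      </preformat>
      <p>Each derived triple can then be passed to the model, and its prediction compared with the expected label.</p>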
      <p>The following rules were created based on domain knowledge:</p>
      <p>[Tables of sample sentence pairs and labels. Surviving cell text: “Absolutely.”, “entailment”, “Absolutely. Definitely.”, “I'm afraid not, sir.”, “you know and he you know i don't know”, “Not at all-or at least I don't think so.”, “uh-huh that's true”, “that's true yeah”.]</p>
      <p>In these tables, wrong classifications are highlighted with a grey background. Accuracy can be used as the
metric here because the classes are balanced. Accuracy equals 75% (6/8 × 100). It is calculated
on only 8 samples, so it cannot be considered reliable from a statistical point of view, but the sample size is
restricted by computational complexity. A closer look at the results shows
that the 2 errors occur for samples that were already misclassified in the previous table. Samples of the other classes
are classified correctly. So the model has trouble only with samples on which it failed even on the original
dataset, and for these samples it is genuinely hard to decide that the two sentences are a
contradiction. In conclusion, the model in general captured the transitive relation, except for
the samples on which it fails on the original dataset. A possible improvement is to validate
the samples of class “contradiction” in the dataset the model was trained on.</p>
    </sec>
    <sec id="sec-16">
      <title>5.2. Model entailment from hypotheses to premises</title>
      <p>
        We decided to check what happens if the first and second sentences are swapped, so that the second sentence
is treated as the premise and the first one as the hypothesis [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Because the dataset is huge, a random
subset of 10000 samples was taken from it (the calculation time was more than 2 hours).
      </p>
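      <p>The comparison of the two directions can be sketched as follows. The classifier is passed in as a function; toy_predict below is a hypothetical word-overlap stand-in used only to make the example runnable and is not the RoBERTa model evaluated in this paper. The function name per_class_accuracy and the sample sentences are likewise assumptions.</p>
      <preformat>
```python
from collections import defaultdict


def per_class_accuracy(samples, predict, reverse=False):
    """Accuracy per gold label; with reverse=True the two sentences are swapped."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for premise, hypothesis, gold in samples:
        first, second = (hypothesis, premise) if reverse else (premise, hypothesis)
        total[gold] += 1
        correct[gold] += int(predict(first, second) == gold)
    return {label: correct[label] / total[label] for label in total}


def toy_predict(premise, hypothesis):
    """Hypothetical stand-in: entailment only if every hypothesis word occurs in the premise."""
    return "entailment" if set(hypothesis.split()).issubset(premise.split()) else "neutral"


samples = [("a dog runs in the park", "a dog runs", "entailment"),
           ("a man sleeps", "a man sleeps", "entailment"),
           ("a cat sits", "the weather is nice", "neutral")]
forward = per_class_accuracy(samples, toy_predict)
backward = per_class_accuracy(samples, toy_predict, reverse=True)
```
      </preformat>
      <p>Dividing the forward accuracy of a class by its backward accuracy gives the factor by which accuracy decreases when the direction is reversed.</p>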
      <p>The table shows that accuracy decreased by a factor of 2.97, 1.17 and 1.26 for the classes similar
(entailment), neutral, and contradiction respectively. From this we can infer that the similar (entailment)
class is the most directional relation, followed by contradiction and then neutral. These numbers are
intuitive: if one sentence contradicts another, the contradiction usually also holds in the reverse
order, from the second sentence to the first. For the class “neutral” accuracy drops even less than for
“contradiction” because neutral sentence pairs are even more often neutral in the other direction. For the
“entailment” class the direction is very strong, which the numbers support: accuracy
became lower approximately by a factor of 3.
      </p>
    </sec>
    <sec id="sec-17">
      <title>6. References</title>
      <p>[22] V. Lytvyn, S. Kubinska, A. Berko, T. Shestakevych, L. Demkiv, Y. Shcherbyna, Peculiarities of Generation of Semantics of Natural Language Speech by Helping Unlimited and Context-Dependent Grammar, CEUR Workshop Proceedings, Vol-2604 (2020) 536-551.</p>
      <p>[23] O. Bisikalo, Y. Ivanov, N. Karevina, Encoding of Natural Language Information on the Basis of the Power Set. In: 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2018 - Proceedings, 2018, 2, pp. 17–20.</p>
      <p>[24] O. Bisikalo, I. Bogach, V. Sholota, The Method of Modelling the Mechanism of Random Access Memory of System for Natural Language Processing. In: Proceedings - 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering, TCSET 2020, 2020, pp. 472–477.</p>
      <p>[25] O.V. Bisikalo, O.M. Vasilevskyi, Evaluation of uncertainty in the measurement of sense of natural language constructions, International Journal of Metrology and Quality Engineering, 2017, 8.</p>
      <p>[26] S. Albota, Linguistically Manipulative, Disputable, Semantic Nature of the Community Reddit Feed Post. In: CEUR Workshop Proceedings, Vol-2870 (2021) 769-783.</p>
      <p>[27] S. Albota, Semantic analysis of the Reddit vaccination news feed in the Reddit social network. In: Computer science and information technologies: proceedings of IEEE 16th International conference CSIT, Lviv, Ukraine, 22–25 September, 2021, pp. 56–59.</p>
      <p>[28] S. Albota, War Implications in the Reddit News Feed: Semantic Analysis. In: Computer science and information technologies: proceedings of IEEE 17th International conference CSIT, Lviv, Ukraine, 10–12 November, 2022, pp. 99–102.</p>
      <p>[29] I. Khomytska, V. Teslyuk, The Multifactor Method Applied for Authorship Attribution on the Phonological Level, CEUR Workshop Proceedings, Vol-2604 (2020) 189-198.</p>
      <p>[30] I. Khomytska, V. Teslyuk, I. Bazylevych, Y. Kordiiaka, Machine Learning and Classical Methods Combined for Text Differentiation, CEUR Workshop Proceedings, Vol-3171 (2022) 1107-1116.</p>
      <p>[31] I. Khomytska, V. Teslyuk, K. Prysyazhnyk, N. Hrytsiv, The Lehmann-Rosenblatt test applied for determination of statistical parameters of Charles Dickens's authorial style. In: Computer Science and Information Technologies (CSIT): Proceedings of IEEE XVIth Scientific and Technical Conference, Lviv, Ukraine, 22–25 Sept. 2021, Vol. 2, pp. 64–67.</p>
      <p>[32] I. Khomytska, V. Teslyuk, I. Bazylevych, Yu. Kordiiaka, Machine learning and classical methods combined for text differentiation, CEUR Workshop Proceedings, Vol-3171 (2022) 1107-1116.</p>
      <p>[33] I. Khomytska, V. Teslyuk, I. Bazylevych, The statistical parameters of Ivan Franko’s authorial style determined by the chi-square test. In: Computer Science and Information Technologies (CSIT): Proceedings of IEEE XVIIth Scientific and Technical Conference, Lviv, Ukraine, 10–12 November, 2022, pp. 73–76.</p>
      <p>[34] V. Husak, O. Lozynska, I. Karpov, I. Peleshchak, S. Chyrun, A. Vysotskyi, Information System for Recommendation List Formation of Clothes Style Image Selection According to User’s Needs Based on NLP and Chatbots, CEUR Workshop Proceedings, Vol-2604 (2020) 788-818.</p>
      <p>[35] O. Veres, I. Rishnyak, H. Rishniak, Application of Methods of Machine Learning for the Recognition of Mathematical Expressions, CEUR Workshop Proceedings, Vol-2362 (2019) 378-389.</p>
      <p>[36] Yu.M. Furgala, B.P. Rusyn, Peculiarities of melin transform application to symbol recognition. In: 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering, TCSET, 2018, pp. 251-254.</p>
      <p>[37] E. Fedorov, O. Nechyporenko, Method for Recognizing Linguistic Constructions Based on Stochastic Neural Networks, CEUR Workshop Proceedings, Vol-3171 (2022) 104-115.</p>
      <p>[38] S. Kubinska, R. Holoshchuk, S. Holoshchuk, L. Chyrun, Ukrainian Language Chatbot for Sentiment Analysis and User Interests Recognition based on Data Mining, CEUR Workshop Proceedings, Vol-3171 (2022) 315-327.</p>
      <p>[39] K. Smelyakov, A. Chupryna, D. Darahan, S. Midina, Effectiveness of Modern Text Recognition Solutions and Tools for Common Data Sources, CEUR Workshop Proceedings, Vol-2870 (2021) 154.</p>
      <p>[40] R. Martsyshyn, et al. Technology of speaker recognition of multimodal interfaces automated systems under stress. In: International Conference: The Experience of Designing and Application of CAD Systems in Microelectronics, CADSM, 2013, pp. 447-448.</p>
      <p>[41] P. Zhezhnych, O. Markiv, Recognition of tourism documentation fragments from web-page posts. In: 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering, TCSET, 2018, pp. 948-951.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nangia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <article-title>A broad-coverage challenge corpus for sentence understanding through inference</article-title>
          .
          <source>arXiv preprint arXiv:1704.05426</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tatar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Serban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Mihis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Textual entailment as a directional relation</article-title>
          .
          <source>Journal of Research and Practice in Information Technology</source>
          <volume>41</volume>
          (
          <issue>1</issue>
          ) (
          <year>2009</year>
          )
          <fpage>53</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , et al.
          <article-title>Attention is all you need</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          .
          <source>arXiv preprint arXiv:1907.11692</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Albritton</surname>
          </string-name>
          ,
          <source>ICS 241: Discrete Mathematics II</source>
          , Waterloo,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zdebskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Burov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Rybchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kravets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lozynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Holoshchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kubinska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dmytriv</surname>
          </string-name>
          ,
          <article-title>Intelligent System for Semantically Similar Sentences Identification and Generation Based on Machine Learning Methods</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2604</volume>
          (
          <year>2020</year>
          )
          <fpage>317</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Kalouli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Real</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>De Paiva</surname>
          </string-name>
          ,
          <article-title>Explaining simple natural language inference</article-title>
          . In:
          <source>Proceedings of the 13th Linguistic Annotation Workshop</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poliak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sakaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Van Durme</surname>
          </string-name>
          ,
          <article-title>Uncertain natural language inference</article-title>
          .
          <source>arXiv preprint arXiv:1909.03042</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poliak</surname>
          </string-name>
          ,
          <article-title>A survey on recognizing textual entailment as an NLP evaluation</article-title>
          .
          <source>arXiv preprint arXiv:2010.03061</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>You</surname>
          </string-name>
          , et al.
          <article-title>Large batch optimization for deep learning: Training BERT in 76 minutes</article-title>
          .
          <source>arXiv preprint arXiv:1904.00962</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L. Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kasai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Probing across time: What does RoBERTa know and when?</article-title>
          <source>arXiv preprint arXiv:2104.07885</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Khairova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shapovalova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mamyrbayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sharonova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mukhsina</surname>
          </string-name>
          ,
          <article-title>Using BERT model to Identify Sentences Paraphrase in the News Corpus</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , Vol-
          <volume>3171</volume>
          (
          <year>2022</year>
          )
          <fpage>38</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zweigenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jacquemart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Grabar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Habert</surname>
          </string-name>
          ,
          <article-title>Building a text corpus for representing the variety of medical language</article-title>
          ,
          <source>Studies in Health Technology and Informatics</source>
          <volume>84</volume>
          (
          <year>2001</year>
          )
          <fpage>290</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Livinska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Makarevych</surname>
          </string-name>
          ,
          <article-title>Feasibility of Improving BERT for Linguistic Prediction on Ukrainian Corpus</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2604</volume>
          (
          <year>2020</year>
          )
          <fpage>552</fpage>
          -
          <lpage>561</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.</given-names>
            <surname>Lytvyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pukach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vovk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kholodna</surname>
          </string-name>
          ,
          <article-title>Identification and Correction of Grammatical Errors in Ukrainian Texts Based on Machine Learning Technology</article-title>
          .
          <source>Mathematics</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <elocation-id>904</elocation-id>
          . https://doi.org/10.3390/math11040904
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sharonova</surname>
          </string-name>
          ,
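          <string-name>
            <given-names>I.</given-names>
            <surname>Kyrychenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gruzdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tereshchenko</surname>
          </string-name>
          ,
          <article-title>Generalized Semantic Analysis Algorithm of Natural Language Texts for Various Functional Style Types</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>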
          , Vol-
          <volume>3171</volume>
          (
          <year>2022</year>
          )
          <fpage>16</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.</given-names>
            <surname>Shynkarenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Demidovich</surname>
          </string-name>
          ,
          <article-title>Natural Language Texts Authorship Establishing Based on the Sentences Structure</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , Vol-
          <volume>3171</volume>
          (
          <year>2022</year>
          )
          <fpage>328</fpage>
          -
          <lpage>337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuropiatnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shynkarenko</surname>
          </string-name>
          ,
          <article-title>Automation of Template Formation to Identify the Structure of Natural Language Documents</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>179</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Shynkarenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Demidovich</surname>
          </string-name>
          ,
          <article-title>Authorship Determination of Natural Language Texts by Several Classes of Indicators with Customizable Weights</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>832</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>O.</given-names>
            <surname>Kuropiatnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shynkarenko</surname>
          </string-name>
          ,
          <article-title>Text Borrowings Detection System for Natural Language Structured Digital Documents</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2604</volume>
          (
          <year>2020</year>
          )
          <fpage>294</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>