<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>COLINS-</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Olga Cherednichenko and Olga Kanishcheva</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University “Kharkiv Polytechnic Institute”</institution>
          ,
          <addr-line>2, Kyrpychova str., Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>5</volume>
      <fpage>22</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>In this work, we demonstrate how different readability formulas perform on our Ukrainian-language corpus of medical texts (UKRMED). UKRMED contains three types of texts in the medical domain divided by their complexity: “Complex texts”, “Moderate texts”, and “Simple texts”. This research aims to (1) demonstrate the use of the most commonly used readability formulas on written health information in Ukrainian, (2) compare and contrast these formulas across texts of different complexity (simple, moderate, and complex), (3) investigate the medical text features that can be used for text simplification and classification of medical texts, and (4) prepare recommendations for using these formulas to evaluate the readability of medical texts in Ukrainian.</p>
      </abstract>
      <kwd-group>
        <kwd>Text simplification</kwd>
        <kwd>readability formulas</kwd>
        <kwd>reading indexes</kwd>
        <kwd>medicine text corpus</kwd>
        <kwd>Ukrainian</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The perception of a text is very important when it comes to a special domain (for example,
medicine, military science, mathematics, etc.) or a text in a foreign language. The task of assessing
text complexity is often posed in general terms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Texts need to be simplified so that
people whose education level is insufficient (for example, children) or who are not native speakers
can perceive them more easily. However, in ordinary life, we often have to deal with texts that are hard
to perceive even for educated native speakers, for example, texts on medical topics such as official
medical protocols, drug descriptions, and medical records. The situation is aggravated by a huge
increase in information on the Internet. Internet users look for information on a medical topic and
often read low-quality but clearly written texts. Blogs, forums, posts on social networks are becoming
a source of information for many people. People do not read official medical literature because of the
difficulty in perceiving such texts. Accordingly, we began our study of the complexity of medical
texts in Ukrainian in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        A text corpus is an important resource for learning a language. In our research, we are faced with a
lack of text resources in the Ukrainian language. This is especially important for learning the language
of special domains, such as medicine. This is the reason for the formation of our UKRainian
MEDicine text corpus – UKRMED, which is described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. UKRMED was formed specifically to
study the complexity of the perception of medical texts in Ukrainian. In work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we suggested that
all texts can be divided into three categories: simple, moderate, and complex. This is due to the
different perceptions of these texts. The purpose of this study is to evaluate the complexity level of
texts from our UKRMED corpus using various readability metrics. This will allow us to evaluate the
hypothesis that the texts in our corpus represent three groups of perception complexity.
      </p>
      <p>
        An important feature of studying medical texts is that the simplification or explanation of such
texts for ordinary people will help them to properly prepare for examination or a visit to a doctor,
properly organize taking medicine, and consult a specialist in case of important symptoms. Based on
our previous studies [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], we can highlight that in Ukrainian medical texts there are a lot of
borrowed words, Latin terms, and special collocations. This gives reasons to study the features of
Ukrainian texts in the medical domain. Figure 1 shows the basic elements for readability. In our work,
we focused only on the analysis of the medical text style, namely the sentence analysis and medical
lexis.
      </p>
      <p>This research seeks to (1) show how the most commonly used readability formulas work with
Ukrainian texts in the medical domain, (2) compare and contrast these formulas across various
texts (blogs, protocols, and wiki texts), (3) investigate the medical text features that can be used
for text simplification and classification of medical texts, and (4) prepare recommendations for using
these formulas to evaluate the readability of medical texts in Ukrainian.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Readability formulas</title>
      <p>
        Various readability indices are used to measure text complexity [
        <xref ref-type="bibr" rid="ref1 ref4 ref5">1, 4, 5</xref>
        ]. The analysis shows that
readability ratings allow us to assess how well a text suits a specific target group, to
characterize the age of its readers, and to estimate how non-native speakers will perceive it. When a
text is too complicated or difficult to read, its message may not be understood. On the other hand, when
a text is too simple, the audience may become bored. In either case, the readability of the text affects the
degree of interaction and the perception of the message.
      </p>
      <p>
        Therefore, studying the complexity of texts is important. Many researchers have examined the issue in depth [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4,
5, 6, 7</xref>
        ]. The task of text simplification is quite wide. Paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is focused on text simplification for
congenitally deaf people. Authors [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] study complex-simple sentence pairs from the Newsela corpus.
Newsela is the largest collection of professionally written simplifications for tasks of text
simplification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The complexity of the perception of questionnaires is studied by the authors [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
In their work, they use multidimensional analytical methods.
      </p>
      <p>
        Consumer health informatics is a field that provides health information to improve healthcare
decision-making [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Such works as [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ] are devoted to the evaluation of the readability of
texts on medical topics.
      </p>
      <p>
        Studies of readability metrics occupy a special place among the research related to text
simplification [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15, 16, 17</xref>
        ]. We can notice that some authors pay attention to the
readability issues of medical texts [
        <xref ref-type="bibr" rid="ref12 ref14">12, 14, 18</xref>
        ]. In study [19], a method for assessing the difficulty
of words was adapted to make it more suitable for medical Swedish. Paper [20] underlines that
poor health literacy is known to have a negative impact on medical outcomes. Its authors assess the readability
of online ophthalmic literature by applying validated readability formulas: the Flesch Reading Ease
Score, the Simple Measure of Gobbledygook, and the Flesch-Kincaid Grade Level [20].
      </p>
      <p>There are many formulas that measure the readability of text. Any readability formula represents
the method of measuring or predicting the difficulty level of text. Following the deep literature
analysis, we can highlight the most popular readability formulas.</p>
      <p>Readability formulas are widely used to evaluate written
information. However, we emphasize that the results of such formulas vary
considerably with the features of the language or domain. These variations make
reading grade level estimates uncertain to interpret. Next, we consider the most commonly used
readability evaluation methods (https://readable.com/features/readability-formulas/).</p>
      <p>Let us consider the set of readability formulas chosen for the evaluation of our text corpus.</p>
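      <p>
        The Flesch Reading Ease index [
        <xref ref-type="bibr" rid="ref17 ref4">4, 17</xref>
        ]. It is computed based on the average number of syllables
per word and the average number of words per sentence (1). The higher the reading score, the easier a piece of text is to read. Nowadays, the Flesch test is one of the
most widely used, most tested, and reliable readability formulas [21].
        <disp-formula><label>(1)</label><tex-math>\mathrm{FRE} = 206.835 - 1.015 \cdot \frac{\text{words}}{\text{sentences}} - 84.6 \cdot \frac{\text{syllables}}{\text{words}}.</tex-math></disp-formula>
      </p>
      <p>Flesch-Kincaid Grade Level [21]. It computes readability based on the average number of
syllables per word and the average number of words per sentence (2). The score indicates a
grade-school level.
        <disp-formula><label>(2)</label><tex-math>\mathrm{FKGL} = 0.39 \cdot \frac{\text{words}}{\text{sentences}} + 11.8 \cdot \frac{\text{syllables}}{\text{words}} - 15.59.</tex-math></disp-formula>
      </p>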
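      <p>The two Flesch formulas, (1) and (2), can be illustrated with a minimal Python sketch. It assumes the standard English-calibrated coefficients and approximates a Ukrainian syllable by one vowel letter; the function names and the vowel-based syllable counter are illustrative assumptions, not the implementation used for UKRMED.</p>
      <preformat><![CDATA[
```python
# Sketch of the two Flesch formulas, (1) and (2), computed from raw counts.
# Assumption: one Ukrainian vowel letter corresponds to one syllable.
UKRAINIAN_VOWELS = set("аеєиіїоуюя")


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel letters (at least 1)."""
    vowels = sum(1 for ch in word.lower() if ch in UKRAINIAN_VOWELS)
    return max(1, vowels)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease with the standard English coefficients."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level with the standard English coefficients."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
```
]]></preformat>
      <p>As a matter of arithmetic, at 20 words per sentence score (1) becomes negative once the average exceeds about 2.2 syllables per word, so texts with many long words can yield values below zero.</p>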
      <p>
        Gunning's Fog Index [
        <xref ref-type="bibr" rid="ref4">4, 21</xref>
        ]. It is a weighted average of the number of words per sentence, and
the number of long words per word.
      </p>
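      <p>The text above does not state the Gunning Fog formula itself. The sketch below assumes its commonly cited form, 0.4 times the sum of the average sentence length and the percentage of long words, where a long word is usually taken to have three or more syllables. Whether exactly this variant was applied to UKRMED is an assumption.</p>
      <preformat><![CDATA[
```python
def gunning_fog(words: int, sentences: int, long_words: int) -> float:
    """Commonly cited Gunning Fog form (assumed, not taken from the paper).

    words      -- total number of words in the text
    sentences  -- total number of sentences
    long_words -- number of words counted as long (often: 3+ syllables)
    """
    average_sentence_length = words / sentences
    percent_long_words = 100.0 * (long_words / words)
    return 0.4 * (average_sentence_length + percent_long_words)


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 sentences, 10 long words.
    print(gunning_fog(100, 5, 10))
```
]]></preformat>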
      <p>
        The Coleman–Liau Readability Formula (Coleman–Liau index) [
        <xref ref-type="bibr" rid="ref4">4, 21</xref>
        ]. This index is
calculated with the following formula:
      </p>
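      <p>
        <disp-formula><label>(3)</label><tex-math>\mathrm{CLI} = 0.0588 \cdot L - 0.296 \cdot S - 15.8.</tex-math></disp-formula>
      </p>
      <p>Here, L is the average number of letters per 100 words and S is the average number of sentences per 100
words.</p>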
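      <p>A minimal Python sketch of the Coleman–Liau index follows. It derives L and S from raw counts of letters, words, and sentences; the function name and the choice of inputs are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau index from raw counts.

    L is the average number of letters per 100 words,
    S is the average number of sentences per 100 words.
    """
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


if __name__ == "__main__":
    # Hypothetical counts: 500 letters, 100 words, 5 sentences.
    print(round(coleman_liau(500, 100, 5), 2))
```
]]></preformat>
      <p>Because the index relies on letters rather than syllables, it needs no syllable counter, which makes it straightforward to apply to Cyrillic text.</p>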
      <p>
        Dale–Chall readability formula [
        <xref ref-type="bibr" rid="ref4">4</xref>
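        ] is based on the following equation:
        <disp-formula><label>(4)</label><tex-math>\text{Raw Score} = 0.1579 \cdot \mathrm{PDW} + 0.0496 \cdot \mathrm{ASL},</tex-math></disp-formula>
        where Raw Score is the reading grade of a reader who can comprehend the text at 3rd grade or below,
PDW is the percentage of difficult words, and ASL is the average sentence length in words.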
      </p>
      <p>
        The FORCAST readability formula [
        <xref ref-type="bibr" rid="ref4">4, 21</xref>
        ]. The formula is:
      </p>
      <p>
        <disp-formula><label>(5)</label><tex-math>\text{Grade level} = 20 - \frac{N}{10},</tex-math></disp-formula>
        where N is the number of single-syllable words in a 150-word sample.
      </p>
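      <p>The FORCAST computation can be sketched in Python as follows. The sketch assumes that a word with at most one Ukrainian vowel letter counts as a single-syllable word; this heuristic and the function names are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
# Assumption: a word with at most one vowel letter is single-syllable.
UKRAINIAN_VOWELS = set("аеєиіїоуюя")


def is_single_syllable(word: str) -> bool:
    """Heuristic check based on the number of vowel letters."""
    vowels = sum(1 for ch in word.lower() if ch in UKRAINIAN_VOWELS)
    return vowels <= 1


def forcast_grade(sample_words: list) -> float:
    """FORCAST grade level for a sample of exactly 150 words."""
    if len(sample_words) != 150:
        raise ValueError("FORCAST expects a 150-word sample")
    n = sum(1 for word in sample_words if is_single_syllable(word))
    return 20 - n / 10


if __name__ == "__main__":
    # Hypothetical sample: 90 one-syllable and 60 two-syllable words.
    sample = ["так"] * 90 + ["лікар"] * 60
    print(forcast_grade(sample))
```
]]></preformat>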
      <p>
        The Automated Readability Index (ARI) [
        <xref ref-type="bibr" rid="ref4">4, 21</xref>
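        ]. This index is calculated as
        <disp-formula><label>(6)</label><tex-math>\mathrm{ARI} = 4.71 \cdot \frac{\text{characters}}{\text{words}} + 0.5 \cdot \frac{\text{words}}{\text{sentences}} - 21.43,</tex-math></disp-formula>
        where characters is the number of letters and numbers.
      </p>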
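      <p>A minimal Python sketch of the ARI computation is given below. It counts letters and digits with <monospace>str.isalnum</monospace>, which also accepts Cyrillic letters; the function names are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
def count_characters(text: str) -> int:
    """Count letters and digits, ignoring spaces and punctuation."""
    return sum(1 for ch in text if ch.isalnum())


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI from raw counts of characters, words, and sentences."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


if __name__ == "__main__":
    print(count_characters("Доза 5 мг."))
    # Hypothetical counts: 500 characters, 100 words, 5 sentences.
    print(round(automated_readability_index(500, 100, 5), 2))
```
]]></preformat>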
      <p>
        We created the UKRMED corpus, the UKRainian MEDicine text corpus, with a focus on three
categories of written medical information that differ in complexity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Therefore, using the formulas
presented above and considering their applicability to medical texts, we intend to evaluate the texts
from our UKRMED corpus and confirm our assumptions about the different complexity of the
collected texts.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments with readability formulas on our corpus</title>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data description</title>
      <p>A corpus is generally required to provide data for the study of language issues. The
information about our data is given in Table 1.</p>
      <p>
        UKRMED is created to study medical text simplification and for experiments with readability
metrics. In our previous works [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], we calculated some feature indices for our text corpus. We tried
to keep the collection balanced: the text length in tokens is 363,539 for Simple texts, 320,209 for
Moderate texts, and 329,837 for Complex texts, which are quite similar.
      </p>
      <p>
        As a result of our experiments, we calculated statistical features on the lexical, syntactic, and
paragraph levels. Also, we received parts of speech categorization for our three categories and
analyzed them. More information about UKRMED is presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Analysis of different features of the UKRMED corpus</title>
      <p>
        Based on an analysis of various publications on determining text complexity [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4, 18, 21</xref>
        ],
we have identified a number of properties, the values of which we calculated for our corpus of
medical texts. These properties are presented in Table 2. All properties are divided into several
categories: phonological, morphological, syntactic, and inter-sentential features.
      </p>
      <p>Connectors, such as and, therefore, and hence, indicate long and elaborate sentences as well as
an advanced structure of the text (#connectors).</p>
      <p>Argumentative discourse connectors are a subset of discourse connectors that indicate a higher
level of reasoning and argumentation (#argumentative_connectors).</p>
      <p>Connectors per sentence: #connectors / #sentences.</p>
      <p>Argumentative connectors per sentence: #argumentative_connectors / #sentences.</p>
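      <p>The connectors-per-sentence feature can be sketched in Python as follows. The sketch assumes single-word connectors, a simple punctuation-based sentence splitter, and a caller-supplied connector list; the function name and the example list are hypothetical and do not reproduce the list used for UKRMED.</p>
      <preformat><![CDATA[
```python
import re


def connector_ratio(text: str, connectors: set) -> float:
    """Return #connectors / #sentences for single-word connectors."""
    # Naive sentence split on ., ! and ? (an assumption of this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    hits = sum(1 for token in tokens if token in connectors)
    return hits / len(sentences) if sentences else 0.0


if __name__ == "__main__":
    example = "It rained and we stayed. Therefore we read. We slept."
    print(connector_ratio(example, {"and", "therefore", "hence"}))
```
]]></preformat>
      <p>The same function yields the argumentative variant when it is given only the argumentative connectors.</p>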
      <p>
        The indicators (features) shown in Table 2 are described in detail in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ];
below we give a brief description of some of them. The morphological diversity is calculated as
      </p>
      <p>The verb sophistication measure (VSM) estimates the number of sophisticated verbs in
relation to the total number of verbs.</p>
      <p>Lexical sophistication reflects percentage of sophisticated or advanced words in a text. There
are different definitions of sophisticated vocabulary. We consider that the word is sophisticated in
case its frequency rank is over 3000.</p>
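      <p>Guiraud's corrected TTR (GTTR) is calculated as
        <disp-formula><tex-math>\mathrm{GTTR} = \frac{T}{\sqrt{N}},</tex-math></disp-formula>
        where T is the number of types (distinct words) and N is the number of tokens in the text.</p>
      <p>Carroll’s lexical diversity measure, or Carroll’s corrected type-token ratio (CTTR), is calculated as
        <disp-formula><tex-math>\mathrm{CTTR} = \frac{T}{\sqrt{2N}}.</tex-math></disp-formula>
      </p>
      <p>The D measure is based on the predicted decrease of the TTR according to the size of the text.</p>
      <p>The Measure of Textual Lexical Diversity (MTLD) evaluates lexical diversity in another way.
MTLD is designed to reduce the effect of text length. It is calculated as the mean length
of strings in a text that have a given TTR value.</p>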
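      <p>The type-token measures named above can be sketched in Python. The sketch assumes the standard definitions, TTR as types over tokens, GTTR as types over the square root of tokens, and CTTR as types over the square root of twice the tokens; the function name is an illustrative assumption.</p>
      <preformat><![CDATA[
```python
import math


def lexical_diversity(tokens: list) -> dict:
    """Compute TTR, GTTR, and CTTR for a non-empty list of tokens."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "TTR": n_types / n_tokens,
        "GTTR": n_types / math.sqrt(n_tokens),
        "CTTR": n_types / math.sqrt(2 * n_tokens),
    }


if __name__ == "__main__":
    # Hypothetical sample: 8 tokens, 4 distinct types.
    print(lexical_diversity(["a", "b", "a", "c", "d", "a", "b", "c"]))
```
]]></preformat>
      <p>Dividing by a square root of the text length, rather than by the length itself, softens the drop in the ratio that longer texts otherwise show.</p>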
      <p>For all the features from Table 2, we obtained values for each text category of our dataset
(Table 3). In Table 3, we kept only those features that differ depending on the category of texts. The
remaining values are very close and do not change with the category of texts.
For example, the morphological diversity per text category is shown in Figure 2.</p>
      <p>The values in Table 3 show that the most important features are lexical sophistication,
mean sentence length, argumentative discourse connectors, the measure of textual lexical
diversity, and argumentative connectors per sentence. The other indicators also differ, but to a lesser extent.
Therefore, these features could be used for medical text classification or text simplification.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3. Experiments with readability formulas for the medical domain</title>
      <p>In this section, we analyze the obtained results and interpret them for our data and
domain. First, it should be noted that we formed our corpus in a certain way, dividing it into three
categories (genres). We assumed that texts from the category "Simple texts" would be easy to
understand, as they are taken from blogs, forums, etc. and are written in lively and simple language for
most readers. Texts from the category "Complex texts" would be the most difficult, since they are
clinical protocols, medical scientific articles, etc. Texts from the category "Moderate texts" would fall
somewhere between simple and complex: they are Wikipedia articles, which are often simple enough but
sometimes complex, depending on the article author. It should also be noted that none of the formulas
was adapted for the Ukrainian language.</p>
      <p>We calculated the Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman–Liau index, Dale–Chall
readability formula, FORCAST readability formula, and Automated Readability Index (ARI)
for all text categories of our corpus. All values are given in Table 4.</p>
      <p>Consider the Flesch-Kincaid Grade Level. We obtained very low, negative values.
This means that the texts are very difficult for the majority of people.</p>
      <p>To confirm our results, we used the site LeStCor (http://www.lestcor.org/). This resource
was created to calculate different readability indices for the Russian language. Because
Ukrainian and Russian are closely related languages, we used this resource for our experiments too. We
received the following message: “Very difficult to read. Best understood by university graduates”. So,
all texts from all categories are very difficult. However, our hypothesis was confirmed, because we
obtained the lowest value for the “Simple texts” category and the highest value for the “Complex texts” category.</p>
      <p>The Gunning Fog Index, Coleman–Liau index, Dale–Chall readability formula, and ARI show the
same trend. Only the FORCAST readability formula does not fit the common tendency and has the
highest value for the “Moderate texts”.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4. Analysis of difficult lexica in the corpus</title>
      <p>After obtaining the results of the experiments on our corpus using readability formulas, we decided
to mark in our corpus the elements that cause the reader the greatest difficulty in perceiving and
understanding the text. We asked volunteers (master's students) to label the words, phrases, and
sentences in the texts that were difficult to understand.</p>
      <p>We have not yet managed to process all the texts in our corpus, but for the first experiments, we
received 140 texts – moderate category, 143 texts – simple, and 148 texts – moderate. The results of
analyzing the marked elements in these documents are
presented in Table 5.</p>
      <p>After we removed the duplicate words and phrases, the number of words decreased, but still there
are quite a lot of them (Table 6).</p>
      <p>A detailed analysis of Tables 5 and 6 shows that the category "Simple texts" is indeed
the easiest to understand: it has the fewest complex words, phrases, and sentences. The most difficult
category of texts is "Complex texts", but after removing duplicates, the category "Moderate texts" is
closer to "Complex texts". The labeling results confirmed the assumptions that we used when
forming this corpus. The gradation of the text categories is correct.</p>
      <p>In this work, we decided to focus not on all the complex elements that were involved in the
markup, but only on words. First, they prevail in all categories and cause more difficulty for
the reader, and second, they are then actively used to simplify the text.</p>
      <p>Here are examples of complex words that were highlighted during the labeling (Table 7).</p>
      <p>Table 7 shows the top 10 words that readers found difficult to understand. When
analyzing all the words, we identified three categories of complex words: 1 – abbreviations,
2 – special medical terms, 3 – noise words, i.e. commonly used words that were marked by
mistake.</p>
      <p>If we consider these words from the point of view of their further use in the process of simplifying
the text, then both abbreviations and medical special terms can be explained using available
definitions, external dictionaries of medical vocabulary, and other linguistic resources.</p>
      <p>We decided to check whether complex words occur in the marked phrases and sentences. Perhaps they are what
causes the reader's difficulty in perceiving a phrase or a whole sentence. To do this, we checked
how often complex words occur in the phrases and sentences of a particular category.</p>
      <p>Table 8 shows that complex words are found in phrases more often than in sentences, but at the
same time, relative to the total number of phrases and sentences for each category of texts, this is a
fairly small percentage. The ratio of the complex words found in phrases and sentences to
the total number of complex words is no more than 0.06%. Therefore, we can conclude that
individual complex words (abbreviations, special medical terms, etc.) do not have a large impact on
the complexity of the phrase and sentence.</p>
      <p>Consider an example of a sentence that is difficult to understand:
«Зазвичай уражаються метастазами последовательнокаждая група, але нерідко бувають
винятки і метастази можуть бути знайдені в проміжній або базальної групі, а
епіпараколіческіе лімфатіческіеузли залишаються інтактнимі.По топографії
лімфометастазов раку слепойі висхідної ободової кишки для радикального видалення зон
регіонарного метастазірованіянеобходіма правобічна геміколектомія з резекцією…».</p>
      <p>("Usually affected by metastases sequentially each group, but often there are exceptions and
metastases can be found in the intermediate or basal group, and epiparakolicheskie lymph nodes
remain intact. According to the topography of lymphometastases of cancer of the cecum and
ascending colon for radical removal of areas of regional metastasis, a right-sided hemicolectomy
with resection is required.")</p>
      <p>In this sentence, the complex word «резекцією» («resection») was found, but if we look at the
whole sentence, we see that it contains many other compound words that are
long and difficult for a person who is not a specialist in medicine. These other
difficult words are not in our list of difficult words. Perhaps this is because the evaluators did not
mark up the texts correctly, or perhaps because this sentence is so incomprehensible and complex that
the evaluators decided to place it among the complex sentences without singling out individual complex
words in it.</p>
    </sec>
    <sec id="sec-8">
      <title>4. Discussion and future work</title>
      <p>Linguistically complex tasks, such as medical text understanding, are the most challenging
because they require linguistic intuition. This task is rather complicated and depends on many
factors such as language, subject area, etc. For the Ukrainian language, this direction is only
beginning to develop, and therefore there are no substantial results in this area yet. Our experiments showed
that readability formulas can help us in this task, and we must also look for other methods to
test the complexity of medical texts.</p>
      <p>The quality of text corpora is the key to obtaining good results. The problem is that there is a lack of
texts from which to form corpora in specific areas, such as medicine. Research on the Ukrainian language is also in its early
stages. Our UKRMED corpus still has many shortcomings, but it is the first step towards
solving the problem of simplifying Ukrainian medical texts.</p>
      <p>We collected the texts under the assumption that three categories of text complexity can be
distinguished. The main idea of our research is that the simplification of a medical text depends on the
complexity of this text and on the stakeholder who studies it. Therefore, we evaluated all texts with
readability metrics.</p>
      <p>In our work, we analyzed the readability formulas most commonly used in health care literature.
Readability estimates obtained with these formulas were compared for different genres in the medical
domain. We relied on our own perception of medical texts to divide them into three
categories. The readability formulas sometimes produced very similar results and sometimes did not.
However, all texts are, in general, very difficult to understand.</p>
      <p>In the future, we plan to mark up our corpus as follows. Each document will have three types of markup:
the first type will mark complex lexis (medical terms), the second will mark sentences that are difficult to
understand, and the third will contain a text complexity label (easy, intermediate, or difficult).
Such markup will allow a qualitative classification of our texts, as well as preparatory work to
identify the complex elements of a medical text for its further simplification.</p>
    </sec>
    <sec id="sec-9">
      <title>5. Acknowledgements</title>
      <p>We would like to thank the master's students of the
National Technical University “KhPI” for their help in preparing the UKRMED corpus.</p>
    </sec>
    <sec id="sec-10">
      <title>6. References</title>
      <p>[18] G. Leroy, S. Helmreich, J. R. Cowie, The influence of text characteristics on perceived and
actual difficulty of health information. International Journal of Medical Informatics 79(6) (2010)
438–449. doi:10.1016/j.ijmedinf.2010.02.002.
[19] E. Abrahamsson, T. Forni, M. Skeppstedt, M. Kvist, Medical text simplification using synonym
replacement: Adapting assessment of word difficulty to a compounding language, Association
for Computational Linguistics (ACL), 2015, pp. 57–65. doi:10.3115/v1/w14-1207.
[20] M. R. Edmunds, R. J. Barry, A. K. Denniston, Readability assessment of online ophthalmic
patient information. JAMA Ophthalmology 131(12) (2013) 1610–1616.
doi:10.1001/jamaophthalmol.2013.5521.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vajjala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Meurers</surname>
          </string-name>
          ,
          <article-title>Readability assessment for text simplification: From analysing documents to identifying sentential simplifications</article-title>
          , ITL - International
          <source>Journal of Applied Linguistics</source>
          <volume>165</volume>
          (
          <issue>2</issue>
          ) (
          <year>2014</year>
          )
          <fpage>194</fpage>
          -
          <lpage>222</lpage>
          . doi:10.1075/itl.165.2.04vaj.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Cherednichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kanishcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babkova</surname>
          </string-name>
          ,
          <article-title>Complex term identification for Ukrainian medical texts</article-title>
          ,
          <source>Proceedings of the 1st International Workshop on Informatics &amp; Data-Driven Medicine (IDDM 2018)</source>
          , Vol.
          <volume>2255</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Cherednichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kanishcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakovleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Arkatov</surname>
          </string-name>
          ,
          <article-title>Collection and Processing of a Medical Corpus in Ukrainian</article-title>
          .
          <source>Proceedings of the 4 Int. Conf. On Computational Linguistics and Intelligent Systems (COLINS)</source>
          , volume I:
          <article-title>Main Conference CEUR-WS</article-title>
          . Vol.
          <volume>2604</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>272</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Kurdi</surname>
          </string-name>
          ,
          <article-title>Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL</article-title>
          ,
          <source>Journal of Data Mining and Digital Humanities</source>
          <year>2020</year>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <source>Automatic Text Simplification. Synthesis Lectures on Human Language Technologies</source>
          .
          <volume>10</volume>
          (
          <issue>1</issue>
          ) 2017
          <fpage>1</fpage>
          -
          <lpage>137</lpage>
          . doi:10.2200/S00700ED1V01Y201602HLT032.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Scarton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          , L. Specia,
          <article-title>Text simplification from professionally produced corpora</article-title>
          .
          <source>Proceedings of the LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3504</fpage>
          -
          <lpage>3510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferrés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marimon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>AbuRa'ed</surname>
          </string-name>
          ,
          <article-title>YATS: Yet another text simplifier</article-title>
          .
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          , Springer Verlag, Vol.
          <volume>9612</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>335</fpage>
          -
          <lpage>342</lpage>
          . doi:10.1007/978-3-319-41754-7_32.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fujita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Takahashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Iwakura</surname>
          </string-name>
          ,
          <article-title>Text simplification for reading assistance</article-title>
          .
          <source>Association for Computational Linguistics (ACL)</source>
          ,
          <year>2003</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          . doi:10.3115/1118984.1118986.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Scarton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Paetzold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Specia</surname>
          </string-name>
          ,
          <article-title>Text simplification from professionally produced corpora</article-title>
          .
          <source>Proceedings of the LREC 2018 - 11th International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3504</fpage>
          -
          <lpage>3510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Peter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Whelan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Pfund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Meyers</surname>
          </string-name>
          ,
          <article-title>A text comprehension approach to questionnaire readability: An example using gambling disorder measures</article-title>
          .
          <source>Psychological Assessment</source>
          <volume>30</volume>
          (
          <issue>12</issue>
          ) (
          <year>2018</year>
          )
          <fpage>1567</fpage>
          -
          <lpage>1580</lpage>
          . doi:10.1037/pas0000610.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Flaherty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hoffman-Goetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Arocha</surname>
          </string-name>
          ,
          <article-title>What is consumer health informatics? A systematic review of published definitions</article-title>
          .
          <source>Informatics for Health and Social Care</source>
          <volume>40</volume>
          (
          <issue>2</issue>
          ) (
          <year>2015</year>
          )
          <fpage>91</fpage>
          -
          <lpage>112</lpage>
          . doi:10.3109/17538157.2014.907804.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alotaibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alyahya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Al-Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alageel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abanmy</surname>
          </string-name>
          ,
          <article-title>Readability of Arabic Medicine Information Leaflets: A Machine Learning Approach</article-title>
          . In:
          <source>Procedia Computer Science</source>
          , Elsevier B.V.
          , Vol.
          <volume>82</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>122</fpage>
          -
          <lpage>126</lpage>
          . doi:10.1016/j.procs.2016.04.017.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Leroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kauchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajanarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Y.</given-names>
            <surname>Romero Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Colina</surname>
          </string-name>
          ,
          <article-title>NegAIT: A new parser for medical text simplification using morphological, sentential and double negation</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>69</volume>
          (
          <year>2017</year>
          )
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          . doi:10.1016/j.jbi.2017.03.014.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kauchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Leroy</surname>
          </string-name>
          ,
          <article-title>Moving beyond readability metrics for health-related text simplification</article-title>
          .
          <source>IT Professional</source>
          <volume>18</volume>
          (
          <issue>3</issue>
          )
          (
          <year>2016</year>
          )
          <fpage>45</fpage>
          -
          <lpage>51</lpage>
          . doi:10.1109/MITP.2016.50.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Crossley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McNamara</surname>
          </string-name>
          ,
          <article-title>Text readability and intuitive simplification: A comparison of readability formulas</article-title>
          .
          <source>Reading in a Foreign Language</source>
          <volume>23</volume>
          (
          <issue>1</issue>
          ) (
          <year>2011</year>
          )
          <fpage>84</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Štajner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Orăsan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mitkov</surname>
          </string-name>
          ,
          <article-title>What Can Readability Measures Really Tell Us About Text Complexity?</article-title>
          <source>Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA)</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Kung</surname>
          </string-name>
          ,
          <article-title>Language modeling by clustering with word embeddings for text readability assessment</article-title>
          .
          <source>Proceedings of the International Conference on Information and Knowledge Management, Association for Computing Machinery</source>
          , Vol.
          <volume>Part F131841</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>2003</fpage>
          -
          <lpage>2006</lpage>
          . doi:10.1145/3132847.3133104.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>