<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dataset for Multi-Document Automatic Summarization of News Articles and Forum Threads</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Volodymyr Taranukha</string-name>
          <email>taranukha@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetiana Horokhova</string-name>
          <email>t.horokhova@kubg.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yaroslav Linder</string-name>
          <email>yaroslav.linder@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Borys Grinchenko Kyiv University</institution>
          ,
          <addr-line>Bulvarno-Kudriavska st., 18/2, Kyiv, 04053</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Volodymirska st., 64, Kyiv, 01033</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>15</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>The problem of semi-automatic dataset creation for multi-document summarization and forum thread summarization is analyzed. Aspects specific to Slavic languages are underlined. Dedicated algorithms for this purpose were designed and tested. Due to the non-smooth nature of the optimization problem, genetic algorithms were suggested. Some new and interesting results were obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>dataset creation</kwd>
        <kwd>automatic summarization</kwd>
        <kwd>multi-document summarization</kwd>
        <kwd>forum thread summarization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The extreme proliferation of modern electronics (first and foremost, mobile phones) has made
electronic data sources widely available to all kinds of users. In response to this tendency, multiple
organizations, newspapers, and forums jumped at the chance to provide their own point of view, spin a
narrative, push advertisements, etc. This extended and enhanced data flow often takes the form of text
with some images, and there is too much data in the usual data flow aimed at a single person.</p>
      <p>
        In this research automatic summarization is suggested as a tool to solve the problem of locating
and distilling information. Among the areas of application for automatic summarization, two stand out as
more problematic: multi-document summarization and forum thread summarization. Because of the high
expense of human-written summaries, the generation of large-scale multi-document
summarization datasets for training has been hampered. There was an attempt [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to train abstractive
sequence-to-sequence models with citations and search engine results as input documents on a huge
corpus of Wikipedia text. As far as quality goes, there is a notable loss in these results
compared to those achieved in single-document summarization. So, a dedicated dataset for training
multi-document summarization is sorely needed.
      </p>
      <p>The WWW discussion forums come in a variety of flavors, each with its own topic and
community. User-generated content on web forums is an excellent source of information.</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>
        In the case of question-and-answer sites like Quora, the opening post is a question and the responses are answers
to that question. The best answer in these forums may be chosen by the forum community via voting.
On the other hand, there is no such thing as “the best answer” in discussion forums where people
share their thoughts and experiences. Furthermore, discussion threads on a single topic might easily
contain dozens or hundreds of individual posts, making it difficult to identify the important
information in the thread, especially when using a mobile device to visit the forum. In this research,
extractive summarization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is proposed to extract salient units of text from a source and then
concatenate them to generate a shorter version of the discussion. Sentences are commonly utilized as
summarizing units in most summarization tasks, but for this task it is expected that
posts will be better suited as basic units for summaries of discussion threads.
      </p>
      <p>While there are many differences between the two tasks (multi-document summarization and forum
summarization), there are also some similarities due to the enclosed nature of articles in data sources
(or posts in threads), so there is an option to exploit said similarities on top of problem-specific
features. This paper describes useful elements for building the required dataset. The main point of the
research is to provide tools for semi-automatic preliminary summary generation that will help to
create the summaries required for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        To assess summarization systems, human-written reference summaries are often utilized. The TIPSTER Text
Summarization Evaluation Conference [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], NIST Document Understanding Conference [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and the NIST
Text Analysis Conference [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] all employed benchmarks based on reference summaries.
      </p>
      <p>
        A reference summary is a subset of text units picked from the source document for extractive
summarization, which is the approach researched in this study. Depending on the task, these units might be
sentences [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], utterances [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], or forum posts [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The first and last approaches are examined in this
work. Summarization is a very subjective task: the substance of the summary varies, as does the
length of the summary produced. Experts who write summaries frequently disagree on what
information should be included in the summary.
      </p>
      <p>To address this problem, the DUC 2005 assessment approach was established to account for
diversity in human-generated reference summaries. As a result, for each of the 50 topics, at least four
distinct summaries were developed. Each topic in the NIST TAC Guided Summarization Task
received up to four alternative reference summaries.</p>
      <p>
        When it comes to establishing a reference, specialists writing abstractive summaries are typically
asked to create a summary of a certain length for a specific document or document collection. As a
result, a corpus of reference summaries is usually produced for abstractive discussion thread
summarization [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. While abstractive summaries are not the best option when it comes to the
evaluation of extractive summaries, they can be used in conjunction with variations of ROUGE [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
metrics to make them useful. There is ROUGE 2.0 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], specifically designed with the ability to handle
synonym substitution, which humans often use in abstractive summarization tasks.
      </p>
      <p>
        The key feature is the agreement between human experts on the content of an extractive summary.
It can be measured using the percentage of common decisions and the proportions of selected and
non-selected units by the experts. The agreement is then calculated in terms of effect size (a number
measuring the strength of the relationship between two variables in a population). A useful measure is
Fleiss’ κ [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]:
κ = (Pr(a) − Pr(e)) / (1 − Pr(e)),     (1)
      </p>
      <p>where Pr(a) is the measured agreement (the percentage of common decisions) and Pr(e) is the expected
agreement based on the proportion of selected and non-selected units by the experts.</p>
      <p>A negative κ indicates structural disagreement. If κ = 0, there is no agreement between the
experts (observed agreement is as good as random). A positive κ up to 0.2 indicates slight agreement; if
0.2 &lt; κ &lt; 0.4, fair agreement; if 0.4 ≤ κ &lt; 0.6, moderate agreement; if 0.6 ≤ κ &lt; 0.8,
substantial agreement; and if 0.8 ≤ κ ≤ 1, strong agreement.</p>
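      <p>As an illustration of equation (1), the following is a minimal sketch of Fleiss’ κ (the function name and the input layout, one row of expert counts per text unit and one column per category, are assumptions of this example):</p>

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of experts who put unit i into category j.

    Every row must sum to the same number of raters n (with n of at least 2).
    """
    N = len(counts)      # number of rated units
    n = sum(counts[0])   # raters per unit
    k = len(counts[0])   # number of categories

    # Pr(a): mean pairwise agreement over all units.
    pr_a = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N

    # Pr(e): chance agreement from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    pr_e = sum(p * p for p in p_j)

    return (pr_a - pr_e) / (1 - pr_e)
```

      <p>For example, three experts labelling two units unanimously give κ = 1, while a 2-vs-1 split on every unit gives a negative κ (structural disagreement).</p>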
      <p>
        For the purposes of this research an extensive search was performed, but not a single paper was
found in which the agreement was higher than “moderate”. Among the available papers, the highest scores
go to single-document news article summaries (“moderate”) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Multi-document summarization is
expectedly worse. There is no research on conversation transcripts and the like, but even worse marks are
expected there. For now, the most direct way to resolve this issue is to use voting based
on the number of experts in favor of a certain segment. Also, little to no good data was found on the
summarization of forum threads. All the abovementioned problems are exacerbated when it comes to
Slavic languages, Ukrainian and Russian in particular.
      </p>
      <p>
        Recent neural-network-based research has almost entirely superseded non-neural approaches. There
are papers on both extractive and abstractive approaches [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16-19</xref>
        ] to this matter. Though it is crucial to point
out that a machine-learning-based approach works well if and only if there is a sufficiently large and
sufficiently comprehensive dataset. Moreover, some modern approaches, such as T5 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and GPT-3
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], will not work without very significant investments from third parties.
      </p>
      <p>
        So, for the purposes of this research non-neural approaches were chosen, especially for extractive
multi-document summarization. There are some papers on this topic, covering both extractive and
abstractive approaches, such as [
        <xref ref-type="bibr" rid="ref22 ref23 ref24">22-24</xref>
        ], albeit most of them are significantly outdated.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Initial analysis</title>
      <p>Since there is no good dataset to build on, the problem was reformulated in a roundabout manner:
how can one develop a feature-based method that will produce good semi-finished summaries to
reduce the future workload on experts?</p>
      <p>Subsequently, this question was divided into several other questions and a number of assumptions.
Basic assumptions are the following:
1. A thread is a sequence of small documents forming a discussion, with each user having a
different point of view on the topic of discussion.
2. News flow is a set of documents representing different points of view corresponding to
different editorial policies of news agencies. Often such a set can form a discussion if the topic
stays the same during a notable period.</p>
      <p>According to the assumptions research questions were formulated:
1. Do basic assumptions stand?
2. What is the best form to represent a preliminary multi-document news summary?
3. What is the best form to represent a preliminary thread summary?
4. What is the best length of a preliminary multi-document news summary?
5. What is the best length of a preliminary thread summary?
6. What are the characteristics of articles that are selected by humans to be included in the
preliminary multi-document news summary?
7. What are the characteristics of the posts that are selected by humans to be included in the
preliminary thread summary?
8. What are the major qualities important for humans in a preliminary multi-document news
summary?
9. What are the major qualities important for humans in a preliminary thread summary?
To answer these questions, a small poll was carried out among students of Taras
Shevchenko National University of Kyiv. The answers were as follows:
1. Yes, basic assumptions look relevant and logical.
2. For the multi-document news summary, the best approach is to sort articles by relative relevance to the
topic and reduce each subsequent article by removing irrelevant and repetitive parts.
3. For a thread summary, the best way is to include whole posts if the thread is short enough. For
long threads, it’s useful to define the topic by the first post and reduce the content of each subsequent
post by removing irrelevant and repetitive parts.
4. The best length of preliminary multi-document news summary corresponds to a single screen.
It was noted that it’s useful to include hyperlinks to detailed articles.
5. The best length of a preliminary thread summary is from 5 to 7 posts.
6. The most important characteristic for an article to be selected into a preliminary
multi-document news summary is relevance.
7. The most important characteristic for a post to be selected into a preliminary thread summary
is relevance.
8. The major quality for a preliminary multi-document news summary is representativeness.
9. The major qualities for thread summary are representativeness and readability.</p>
      <p>The answers point to a notable difference in source structure: it’s much easier to obtain a
readable summary from a set of articles as long as one of them serves as a basis. For thread
summarization, it’s much harder to get a consistent (i.e. readable) summary.</p>
      <p>These answers, combined with the lack of a notable dataset (especially in the Ukrainian language), forced
the development of a sequence of methods based on a priori heuristics derived from the period predating
the current resurgence of neural networks. Of the two key qualities, readability looks like the one that is
harder to achieve. So, readability was analyzed further.</p>
      <p>A text differs from a set of grammatically correct sentences by a certain number of connections
between the sentences that are part of it. These connections are of a different nature.</p>
      <p>It is necessary to specify what types of connectivity are observed in the text:
• structural coherence;
• logical coherence;
• semantic closeness.</p>
      <p>The structural coherence between the elements is most often formed by using special words or
grammatical forms to connect the elements of the text. Structural coherence is defined by the
following means:
• anaphora;
• elliptical structures;
• repetition of structural elements of the text, namely phrases and words;
• usage of conjunctions.</p>
      <p>The logical coherence of the text is ensured at the level of interpretation, although it has certain
syntactical features. For example, the words «якщо»(“if”), «то»(“then”), «інакше» (“otherwise” or
“else”) create logical connections in the text that are clearly different from structural connections as
they were defined above. As for the task at hand, given the incomplete tools for text interpretation, most of the
emphasis is placed on structural coherence. There are several reasons for this.</p>
      <p>First, logical coherence is based on connections such as “explanation”, “cause”, “consequence”,
and so on. The main problem lies in the fact that not all of these connections, and not in all cases, have
clear markers in the form of appropriate vocabulary or grammatical structures.</p>
      <p>Second, logical coherence is formed by interpretation and is therefore subjective. Thus, it should
be considered for each reader separately, and for this reason it is not invariant.</p>
      <p>Third, logical coherence is formed through interpretation and therefore requires a full
understanding of the text. Unfortunately, this requirement is beyond the capabilities of any modern
machine-learning model. Thus, the algorithm is mainly based on the analysis of structural coherence,
especially since this often also allows finding logical coherence, as logical coherence is frequently
accompanied by structural coherence.</p>
      <p>Semantic closeness is a special type of coherence and will be discussed below.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Algorithms to create dataset</title>
      <p>
        There were two complex algorithms tested for the purpose of creation of preliminary summaries.
The first was developed in IRTC IT &amp; S in department 165 during “Pattern computer” research
program [
        <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Only some features and the usage of the first algorithm are described in this paper. The
second algorithm was hand-tailored using relatively modern instruments and is described below.
      </p>
    </sec>
    <sec id="sec-5">
      <title>4.1. Initial processing</title>
      <sec id="sec-5-1">
        <title>Processing pipeline</title>
        <p>A pipeline was established for the purpose of initial processing:
• subsystem of morphological analysis;
• subsystem of partial parsing;
• subsystem for simplified anaphora resolution.</p>
        <p>
          A big Ukrainian dictionary (over 100,000 words) was used for morphological analysis. For
out-of-dictionary words a heuristic algorithm was used [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. The main purpose of this analysis is to find
the canonical forms of words and some grammatical characteristics such as gender, case, tense, etc.
        </p>
        <p>
          Having canonical forms greatly improves frequency estimates for text elements and also allows
function words to be processed correctly. For English texts it is possible to use stemmers, but for Slavic
languages (Ukrainian and Russian in particular) it is notably more efficient to use dedicated
morphological dictionaries. On the basis of the received morphological data, a primitive syntactic
analysis is carried out: adjectives (and participles) are associated with the corresponding nouns, and
nouns are associated with the corresponding verbs. Morphological features are used for anaphora
resolution. Only then is a simple semantic-closeness analysis performed: among the
alternatives, the word whose meaning is closest in semantic similarity to the words of the
context is selected. To determine the semantic similarity, the semantic database WordNet localized to
Ukrainian language [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] was used. Other parts of semantic closeness such as implied inference are
ignored because they are considered too complex to analyze.
        </p>
        <p>This pipeline is used in both algorithms, though for the new one minor alterations were
made to fit it into modern programming languages and libraries.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4.2. Important element detection</title>
      <p>The importance of text elements is determined by how much the user is interested in them and how
important they are for presenting the content of the text. To evaluate the importance of a term (word), the
simple tf-idf statistic is used.</p>
      <p>The tf-idf value increases proportionally to the number of times a word appears in the document
and is offset by the number of documents in the corpus that contain the word, which helps to adjust
for the fact that some words appear more frequently in general.</p>
      <p>tf-idf<sub>d</sub>(t) = f<sub>t,d</sub> · log(N / n<sub>t</sub>),     (2)
where f<sub>t,d</sub> is the raw count of term t in document d, N is the total number of documents in the selected
set, and n<sub>t</sub> is the number of documents containing term t.</p>
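      <p>Equation (2) can be sketched as follows (representing documents as lists of canonical word forms, and the function name, are assumptions of this example):</p>

```python
import math
from collections import Counter

def tf_idf_scores(documents):
    """Per-document tf-idf as in equation (2); each document is a list of canonical word forms."""
    N = len(documents)
    # n_t: number of documents in which each term occurs.
    n_t = Counter(term for doc in documents for term in set(doc))
    result = []
    for doc in documents:
        f = Counter(doc)  # raw term counts f_{t,d}
        result.append({t: f[t] * math.log(N / n_t[t]) for t in f})
    return result
```

      <p>Note that a term occurring in every document of the set receives a score of zero, which is how this schema promotes diversity.</p>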
      <p>This particular weighting schema promotes diversity, though it is important to underline the mutual
influence of terms (words). Input documents may contain multiple synonym usages,
which can water down the observed saliency and thus exclude important elements from the summary.</p>
      <p>
        To avoid this problem it’s useful to calculate lexical chains [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] in texts using WordNet [
        <xref ref-type="bibr" rid="ref29">29</xref>
          ]. For
the purposes of this research lexical chains were used; in order to simplify the process, the scores of the
chains were calculated inside each text independently.
      </p>
      <p>Homogeneity<sub>d</sub>(Ch) = (n<sub>d</sub>(Ch) − 1) / Length<sub>d</sub>(Ch),     (3)
where n<sub>d</sub>(Ch) is the number of distinct terms occurring in the chain for the document and Length<sub>d</sub>(Ch) is the length
of the chain (the total number of occurrences of different terms in the chain) in the document.</p>
      <p>Chains were tested with a quality criterion:</p>
      <p>Score<sub>d</sub>(Ch) = Length<sub>d</sub>(Ch) · Homogeneity<sub>d</sub>(Ch),     (4)</p>
      <p>Score<sub>d</sub>(Ch) &gt; Avg(Score<sub>d</sub>(Ch)) + 2σ,     (5)
where Avg(Score<sub>d</sub>(Ch)) is the average of all chain scores in the particular document and σ
is their standard deviation.</p>
      <p>Initially the sentences received scores based both on chain scores and on tf-idf scores:</p>
      <p>Score<sub>d</sub>(S) = ∑<sub>t(S),Ch(S)</sub> tf-idf<sub>d</sub>(t) · Score<sub>d</sub>(Ch),     (6)
where the summation includes only terms from the sentence S, and only if said term is also included in a
relevant chain.</p>
      <p>This approach by itself penalized shorter posts and shorter articles, for they often have shorter lexical
chains.</p>
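      <p>Equations (3)–(6) can be sketched as follows (chain extraction itself is omitted; representing a chain as the list of its term occurrences in a document, and the helper names, are assumptions of this example):</p>

```python
from statistics import mean, pstdev

def chain_score(chain_occurrences):
    """Equations (3) and (4): homogeneity and score of one lexical chain in one document."""
    length = len(chain_occurrences)         # Length_d(Ch)
    distinct = len(set(chain_occurrences))  # n_d(Ch)
    homogeneity = (distinct - 1) / length   # equation (3)
    return length * homogeneity             # equation (4)

def strong_chains(chains):
    """Equation (5): keep chains scoring above the per-document mean by 2 sigma."""
    scores = [chain_score(ch) for ch in chains]
    threshold = mean(scores) + 2 * pstdev(scores)
    return [ch for ch, s in zip(chains, scores) if s > threshold]

def sentence_score(sentence_terms, tf_idf, chain_of_term):
    """Equation (6): sum tf-idf(t) * Score(Ch) over sentence terms that belong to a chain."""
    return sum(
        tf_idf[t] * chain_score(chain_of_term[t])
        for t in sentence_terms
        if t in chain_of_term and t in tf_idf
    )
```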
    </sec>
    <sec id="sec-7">
      <title>4.3. Genetic algorithm</title>
      <sec id="sec-7-1">
        <title>Algorithm description</title>
        <p>A fairly standard genetic algorithm was used in the research.</p>
        <p>The chromosome was defined as a list of Boolean values corresponding to the sentence numbers that may be included
in the final document (summary). An element having the value “True” indicates that the sentence will
be included in the summary; the value “False” corresponds to a sentence that is not included in
the summary. The number of values equal to “True” must not exceed the summary sentence allotment.</p>
        <p>A chromosome can mutate by flipping the value of a randomly chosen list item. Two
chromosomes can perform cross-over in several ways:
1. Equal uniform random selection per element from the parents.
2. Single-point cross-over, where everything before (and including) the point is taken from one
parent and everything else from the other.
3. Dual-point cross-over, where the head and tail are taken from one parent and the middle part between
the two cut points is taken from the other parent.</p>
        <p>Each version of cross-over has its own influence on performance and evaluation results.</p>
        <p>After the cross-over is performed, either padding or trimming can happen. If the summary is
shorter than necessary, the most salient unused sentences are added to the chromosome. If the
summary is longer than necessary, the least salient sentences are removed from it.</p>
        <p>The algorithm works as follows:</p>
        <p>The first generation of chromosomes is generated at random and placed into an empty list L1.
For each of G generations:
    L2 = {}
    Random mutations are imposed on the chromosomes M times.
    Mutants are placed into L2.
    For all chromosomes on the list L1:
        For all chromosomes on the list L1:
            A pair of chromosomes creates offspring by cross-over.
            Descendants are put into L2.
    The chromosomes in L2 are sorted by rating and the P best are selected.
    If the combined score of population L1 equals the combined score of population L2: abort the algorithm.
    L1 = L2,
where G is the number of generations, P is the power (size) of the chromosome set, and M is the mutability parameter.</p>
        <p>The rating is calculated as the total sum of individual term scores combined with a global coherence
score, in accordance with the principles laid out in the Initial analysis section.</p>
        <p>The first version of cross-over was implemented in the algorithm developed in Department 165.
The second and third ones were implemented during this research.</p>
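        <p>The generational loop described in this section might be sketched as follows (the parameter defaults, the rating interface, and the reduction of padding/trimming to random trimming are simplifying assumptions of this example; dual-point cross-over is shown):</p>

```python
import random

def evolve(rating, n_sentences, allotment, generations=50, pop=20, mut=5):
    """Genetic search for an extractive summary.

    rating: function mapping a chromosome (list of Booleans) to a score;
    allotment: maximum number of True values (sentences in the summary).
    """
    def trim(ch):
        # Enforce the allotment by dropping randomly chosen selected sentences.
        on = [i for i, v in enumerate(ch) if v]
        while len(on) > allotment:
            ch[on.pop(random.randrange(len(on)))] = False
        return ch

    def mutate(ch):
        ch = ch[:]
        i = random.randrange(len(ch))
        ch[i] = not ch[i]  # flip one randomly chosen gene
        return trim(ch)

    def crossover(a, b):
        # Dual-point cross-over: head and tail from a, middle part from b.
        i, j = sorted(random.sample(range(len(a)), 2))
        return trim(a[:i] + b[i:j] + a[j:])

    l1 = [trim([random.random() > 0.5 for _ in range(n_sentences)])
          for _ in range(pop)]
    for _ in range(generations):
        l2 = [mutate(random.choice(l1)) for _ in range(mut)]
        l2 += [crossover(a, b) for a in l1 for b in l1]
        l2.sort(key=rating, reverse=True)
        l2 = l2[:pop]  # the pop best are selected
        if sum(map(rating, l1)) == sum(map(rating, l2)):
            break  # combined scores are equal: abort
        l1 = l2
    return max(l1, key=rating)
```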
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5. First series of numerical experiments</title>
      <p>For each version of cross-over, experiments were performed with Ukrainian texts, and the results
were evaluated by hand. Each experiment consisted of 10 runs of multi-document summarization
and 10 runs of forum thread summarization. During each run of multi-document
summarization, 7 different articles collected from the source (https://www.ukr.net/news/politics.html)
were processed. During each run of thread summarization, a thread with at least 7 posts from
https://replace.org.ua/forum/9/ (“Український форум програмістів → Обговорення”, i.e. “Ukrainian programmers’ forum → Discussion”) was
processed. Each time a score was manually assigned to the summary based on perceived performance,
ranging from 1 (“entirely unsatisfactory”) to 5 (“good selection for future work”). The average scores
and their variances are presented in Table 1. Initially it was expected that the first type of cross-over would
perform best, because that is true for many other problems solved using genetic
algorithms. But in this experiment things went the other way: the first type of cross-over ended up as the
worst of the three in both tasks. There is also an unexpected difference between the performance of the
second and third types of cross-over across the two tasks. Usually a consistent difference
in performance is expected across the board as long as the task stays the same.</p>
    </sec>
    <sec id="sec-9">
      <title>6. Algorithm adjustments and second series of numerical experiments</title>
      <p>Due to the unexpected behavior of the summarization algorithm, several hypotheses were put forward
and tested. They mostly revolved around notions of continuity, document (post) boundaries, and
hidden features of human perception. For example, in most cases the results of multi-document
summarization retained the majority of sentences from the most salient (important) document, while less
important documents ended up as small chunks mashed together at the end of the summary. Nevertheless,
this did not prevent testers from giving relatively high marks to such a summary. At the same time,
destruction (chunking) of the first post in the thread summarization task was a surefire way to generate a
low expert score regardless of the relative value (contribution) of the abovementioned post to the general
discussion quality in the thread. Moreover, chunking of any post carried a significant negative impact on the
score of the generated summary, while exclusion of the same post often resulted in a notably milder expert
score penalty. To address the abovementioned issues, some changes were introduced to the summarization
algorithm. First, the first post in a thread was made mandatory regardless of its actual contribution to the
summary. Second, a chunking penalty was inserted into the final calculations.</p>
      <p>Penalty<sub>d</sub> = Selected<sub>d</sub> / Total<sub>d</sub>,
where Selected<sub>d</sub> is the number of selected sentences from the document and Total<sub>d</sub> is the total number of
sentences in the document.</p>
      <p>Score<sub>d</sub>(S) = ∑<sub>t,Ch</sub> tf-idf<sub>d</sub>(t) · Score<sub>d</sub>(Ch) · Penalty<sub>d</sub>,</p>
      <p>
        This resulted in a higher score if fewer posts were cut into pieces during the general optimization. Observation
of the results often showed that the algorithm retained the initial boundaries of posts in exchange for removing
some posts entirely. Additional changes were also introduced to the genetic algorithm to improve the quality
of summaries and the speed of convergence. This approach has some similarities to the chromosome reuse
strategy [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] but instead of a chromosome library it uses memory about ancestral behavior directly.
      </p>
    </sec>
    <sec id="sec-10">
      <title>6.1. Chromosomes with memory</title>
      <p>The chromosome was defined as a list corresponding to the sentence numbers that will be included
in the final document (summary). In comparison to the standard chromosome, an extra field was introduced
for each variable (gene). This field contains information about recent changes in the variable and is used
during mutation and cross-over. There are two rules which influence the behavior of chromosomes with
a memory field:
1. If a new (mutant) chromosome would be created by flipping a Boolean variable that was recently in the
opposite state, then another variable is picked for flipping.
2. If two chromosomes undergoing cross-over have too many (above a certain threshold)
variables going in opposite directions (as indicated by the respective memory fields), the
cross-over does not produce offspring. If necessary, another pair of chromosomes will undergo cross-over.</p>
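      <p>The two rules above might be sketched as follows (a single Boolean memory bit per gene, the function names, and the threshold handling are assumptions of this example):</p>

```python
def mutate_with_memory(genes, memory, position):
    """Rule 1: avoid flipping a gene that was recently flipped the other way."""
    if memory[position]:
        # Pick the first position whose memory bit is clear instead.
        candidates = [i for i, m in enumerate(memory) if not m]
        if not candidates:
            return genes  # every gene changed recently; skip the mutation
        position = candidates[0]
    genes[position] = not genes[position]
    memory[position] = True  # remember the recent change
    return genes

def may_cross(memory_a, memory_b, genes_a, genes_b, threshold):
    """Rule 2: forbid cross-over when too many genes move in opposite directions."""
    opposite = sum(
        1
        for ma, mb, ga, gb in zip(memory_a, memory_b, genes_a, genes_b)
        if ma and mb and ga != gb
    )
    return threshold >= opposite
```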
      <p>For the purposes of this research the extra memory field was represented by Boolean variables, but in
the general case the memory field can be implemented as a set of integer or even real variables.</p>
      <p>First of all, this optimization is intended to boost convergence by avoiding the recalculation of summary
quality when the algorithm is clearly cycling the values of the same (sub)set of variables.</p>
      <p>While in general this optimization can be used with any kind of cross-over, for the purposes of this
research it was applied to dual-point cross-over, as it showed the best average performance. All other
versions of cross-over were removed from consideration.</p>
    </sec>
    <sec id="sec-11">
      <title>6.2. Final evaluation results</title>
      <p>The evaluation results for the fourth and fifth versions of preliminary summary generation are
presented in Table 2.</p>
      <p>As shown in the table, the introduction of strict order and the chunking penalty was most beneficial for
thread summarization. Nevertheless, none of the tweaks to the algorithm allowed perfect scores.
Also, regardless of the tweaks to the algorithm, forum thread summarization works definitely worse
in comparison to multi-document summarization. On the issue of performance, the results were not
very conclusive. In most cases the algorithm converges notably faster, but there were some cases
when the algorithm ran for the full number of generations and failed to achieve good results.
This is also reflected in the notable growth of score variance.</p>
    </sec>
    <sec id="sec-12">
      <title>7. Conclusions and future work</title>
      <p>Several experiments were performed toward the future development of dataset(s) for the evaluation
of summaries. It was shown that genetic algorithms can produce good results (rated as acceptable by
human experts) for semi-automatic summary generation for a multi-document summarization dataset. For
a thread summarization dataset it can be beneficial to combine different approaches. The relatively
high variance in scores points to a mix of good and poor results, so it can be presumed that picking
the n best candidates from several methods would alleviate the abovementioned problems. The odd
behavior of the optimization algorithms probably stems from the nature of thread discussions as a
medium. Unlike journalists, forum writers discuss matters directly and often quote each other,
creating complex and convoluted chains of reasoning. Moreover, they sometimes do not quote directly
but instead make indirect references or draw conclusions from the subject under discussion. Such
passages are difficult to analyze not only for the algorithm but also for the human assistants.
Future work will center on securing enough manpower to build datasets for Slavic languages, first
and foremost a dataset for the Ukrainian language.</p>
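      <p>The idea of picking the n best candidates from several methods can be sketched as follows; the length-based quality score used here is a hypothetical stand-in for whatever rating (automatic or expert) is available:</p>

```python
def pick_n_best(candidates_by_method, score, n=3):
    """Pool candidate summaries from several methods and keep the n best.

    Keeping only the top of the pooled set trims away each method's worst
    runs, which should reduce the score variance seen with any one method.
    """
    pooled = [(score(cand), method, cand)
              for method, cands in candidates_by_method.items()
              for cand in cands]
    pooled.sort(key=lambda t: t[0], reverse=True)
    return pooled[:n]

# Hypothetical candidates from two generation methods; score = length.
demo = {
    "genetic": ["short", "a much longer candidate summary"],
    "greedy": ["medium length text", "tiny"],
}
top = pick_n_best(demo, score=len, n=2)
```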
    </sec>
    <sec id="sec-13">
      <title>8. Acknowledgements</title>
      <p>The authors would like to acknowledge the following people for their contributions to the research:
Prof. A.V. Anisimov of the Faculty of Computer Science and Cybernetics, Taras Shevchenko National
University of Kyiv, for useful suggestions on the nature of natural-language texts and for general
support; and the staff members of dpt. 165 of IRTC IT&amp;S, Kyiv, for the libraries and support
provided during this research.</p>
    </sec>
    <sec id="sec-14">
      <title>9. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          ,
          <article-title>Hierarchical transformers for multi-document summarization</article-title>
          .
          <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , Florence, Italy, July 28 - August 2,
          <year>2019</year>
          , pp.
          <fpage>5070</fpage>
          -
          <lpage>5081</lpage>
          URL: https://aclanthology.org/P19-1500.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          et al.,
          <article-title>Graph-based neural multi-document summarization</article-title>
          .
          <source>Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)</source>
          , Vancouver, Canada, August 3 - August 4,
          <year>2017</year>
          , pp.
          <fpage>452</fpage>
          -
          <lpage>462</lpage>
          , URL: https://aclanthology.org/K17-1045.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.J.</given-names>
            <surname>Liu</surname>
          </string-name>
          et al.,
          <article-title>Generating Wikipedia by Summarizing Long Sequences</article-title>
          .
          <year>2018</year>
          . URL: https://arxiv.org/abs/1801.10198
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Durrett</surname>
          </string-name>
          ,
          <article-title>Neural extractive text summarization with syntactic compression</article-title>
          .
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing</source>
          , Hong Kong, China, November 3-7,
          <year>2019</year>
          , pp.
          <fpage>3292</fpage>
          -
          <lpage>3303</lpage>
          . URL: https://aclanthology.org/D19-1324.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Mani</surname>
          </string-name>
          , et al.,
          <article-title>The TIPSTER SUMMAC text summarization evaluation</article-title>
          .
          <source>In: Proceedings of Ninth Conference of the European Chapter of the Association for Computational Linguistics</source>
          ,
          <year>1999</year>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>85</lpage>
          . https://doi.org/10.3115/977035.977047
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nenkova</surname>
          </string-name>
          ,
          <article-title>Automatic text summarization of newswire: Lessons learned from the document understanding conference</article-title>
          .
          <year>2005</year>
          . URL: https://www.aaai.org/Papers/AAAI/2005/AAAI05-228.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>El-Haj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kruschwitz</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <article-title>University of Essex at the TAC 2011 MultiLingual Summarisation Pilot</article-title>
          .
          <year>2011</year>
          . URL: http://repository.essex.ac.uk/8920/1/UoEssex.proceedings.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jing</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>McKeown</surname>
          </string-name>
          ,
          <article-title>Cut and paste based text summarization</article-title>
          .
          <source>In Proceedings of 1st Meeting of the North American Chapter of the Association for Computational Linguistics</source>
          .
          <year>2000</year>
          . p.
          <fpage>178</fpage>
          -
          <lpage>185</lpage>
          URL: https://aclanthology.org/A00-2024.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          et al.,
          <article-title>Keep meeting summaries on topic: Abstractive multi-modal meeting summarization</article-title>
          .
          <source>In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , July
          <year>2019</year>
          , pp.
          <fpage>2190</fpage>
          -
          <lpage>2196</lpage>
          . URL: https://aclanthology.org/P19-1210.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.A.</given-names>
            <surname>Grozin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.F.</given-names>
            <surname>Gusarova</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.V.</given-names>
            <surname>Dobrenko</surname>
          </string-name>
          ,
          <article-title>Feature selection for language independent text forum summarization</article-title>
          .
          <source>In: Proceedings of International Conference on Knowledge Engineering and the Semantic Web</source>
          , Springer,
          , September
          <year>2015</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Barker</surname>
          </string-name>
          et al.,
          <article-title>The SENSEI annotated corpus: Human summaries of reader comment conversations in on-line news</article-title>
          .
          <source>In: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          , September
          <year>2016</year>
          . pp.
          <fpage>42</fpage>
          -
          <lpage>52</lpage>
          . URL: https://aclanthology.org/W16-3605.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ganesan</surname>
          </string-name>
          ,
          <article-title>ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks</article-title>
          .
          <year>2018</year>
          . URL: https://arxiv.org/pdf/1803.01937.pdf
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>ROUGE: A package for automatic evaluation of summaries</article-title>
          .
          <source>In: Text Summarization Branches Out</source>
          , Barcelona, Spain, July
          <year>2004</year>
          , pp.
          <fpage>74</fpage>
          -
          <lpage>81</lpage>
          . URL: https://aclanthology.org/W04-1013.pdf
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.L.</given-names>
            <surname>Fleiss</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <article-title>The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability</article-title>
          .
          <source>Educational and Psychological Measurement</source>
          (
          <year>1973</year>
          ), Vol.
          <volume>33</volume>
          , pp.
          <fpage>613</fpage>
          -
          <lpage>619</lpage>
          . https://doi.org/10.1177/001316447303300309
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhal</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <article-title>Automatic text summarization by paragraph extraction</article-title>
          .
          <source>Intelligent Scalable Text Summarization</source>
          . (
          <year>1997</year>
          ) URL: https://aclanthology.org/W97-0707.pdf
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Al-Sabahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zuping</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Nadher</surname>
          </string-name>
          ,
          <article-title>A hierarchical structured self-attentive model for extractive document summarization (HSSAS)</article-title>
          .
          <source>IEEE Access</source>
          ,
          <volume>6</volume>
          ,
          <year>2018</year>
          . pp.
          <fpage>24205</fpage>
          -
          <lpage>24212</lpage>
          . URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8344797
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yao</surname>
          </string-name>
          et al.,
          <article-title>Deep reinforcement learning for extractive document summarization</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>284</volume>
          ,
          <year>2018</year>
          . pp.
          <fpage>52</fpage>
          -
          <lpage>62</lpage>
          . URL: https://www.researchgate.net/publication/322715462_Deep_Reinforcement_Learning_for_Extractive_Document_Summarization
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          et al.,
          <article-title>Improving neural abstractive document summarization with explicit information selection modeling</article-title>
          .
          <source>In: Proceedings of the 2018 conference on empirical methods in natural language processing</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1787</fpage>
          -
          <lpage>1796</lpage>
          . URL: https://aclanthology.org/D18-1205.pdf
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <article-title>Abstractive document summarization with a graph-based attentional neural model</article-title>
          .
          <source>In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</source>
          , July
          <year>2017</year>
          . Volume
          <volume>1</volume>
          : Long Papers
          . pp.
          <fpage>1171</fpage>
          -
          <lpage>1181</lpage>
          . URL: https://aclanthology.org/P17-1108.pdf
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] T5. URL: https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] GPT-3 URL: https://openai.com/blog/gpt-3-apps/</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Haghighi</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Vanderwende</surname>
          </string-name>
          ,
          <article-title>Exploring content models for multi-document summarization</article-title>
          .
          <source>In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          , June
          <year>2009</year>
          , pp.
          <fpage>362</fpage>
          -
          <lpage>370</lpage>
          . URL: https://aclanthology.org/N09-1041.pdf
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>C.Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>From single to multi-document summarization</article-title>
          .
          <source>In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</source>
          , July
          <year>2002</year>
          , pp.
          <fpage>457</fpage>
          -
          <lpage>464</lpage>
          . URL: https://aclanthology.org/P02-1058.pdf
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          <article-title>, Multi-document summarization using cluster-based link analysis</article-title>
          .
          <source>In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. July</source>
          <year>2008</year>
          . pp.
          <fpage>299</fpage>
          -
          <lpage>306</lpage>
          . URL: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.222.6018&amp;rep=rep1&amp;type=pdf
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gritsenko</surname>
          </string-name>
          ,
          <article-title>Zakluchnuy zvit pro vukonannya DCNTP “Obrazny konpyuter” [Final report on completion of STSTP “Pattern computer”]</article-title>
          . IRTC IT&amp;S, Kyiv,
          <year>2010</year>
          . 44 p. URL: http://obrazcomp.irtc.org.ua/Pressa/Zvit/Zvit_OK.pdf
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.V.</given-names>
            <surname>Anisimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.N.</given-names>
            <surname>Romanik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.Yu.</given-names>
            <surname>Taranukha</surname>
          </string-name>
          ,
          <article-title>Evrisiticheskiye algoritmy dlya opredeleniya kanonicheskih form i gramaticheskih harakteristic slov [Heuristic Algorithms for Determination of Canonical Forms and Grammatical Characteristics of Words]</article-title>
          .
          <source>Cybernetics and Systems Analysis</source>
          , Vol.
          <volume>40</volume>
          , Iss.
          <issue>2</issue>
          (
          <year>2004</year>
          ), pp.
          <fpage>3</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <article-title>Verb semantics and lexical selection</article-title>
          .
          <source>In: 32nd. Annual Meeting of the Association for Computational Linguistics</source>
          , (
          <year>1994</year>
          ) New Mexico State University, Las Cruces, New Mexico pp.
          <fpage>133</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Anisimov</surname>
          </string-name>
          , et al.,
          <article-title>Ukrainian WordNet: creation and filling</article-title>
          .
          <source>In: International Conference on Flexible Query Answering Systems September</source>
          <year>2013</year>
          . pp.
          <fpage>649</fpage>
          -
          <lpage>660</lpage>
          . Springer, Berlin, Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Barzilay</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Elhadad</surname>
          </string-name>
          ,
          <article-title>Using lexical chains for text summarization</article-title>
          .
          <source>Advances in automatic text summarization</source>
          ,
          <year>1999</year>
          . pp.
          <fpage>111</fpage>
          -
          <lpage>121</lpage>
          . URL: https://academiccommons.columbia.edu/doi/10.7916/D8086DM3/download
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Acan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tekol</surname>
          </string-name>
          ,
          <article-title>Chromosome reuse in genetic algorithms</article-title>
          .
          <source>In Genetic and evolutionary computation conference</source>
          , July 2003 Springer, Berlin, Heidelberg. pp.
          <fpage>695</fpage>
          -
          <lpage>705</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>