Towards Efficient model for Automatic Text
                                 Summarization
                     Yetunde O. Folajimi                                                       Tijesuni I. Obereke
               Department of Computer Science                                           Department of Computer Science
                     University of Ibadan.                                                    University of Ibadan.
                      +2348056648530                                                           +2348137462256
                 yetundeofolajimi@gmail.com                                                  tobereke@gmail.com


ABSTRACT                                                                     program takes in a text and outputs a short version of the text
                                                                             retaining the important parts only. The essence of text
Automatic text summarization aims at producing summary from a                summarization is to bring out the salient parts of a text.
document or a set of documents. It has become a widely explored
                                                                             The method used for automatic text summarization can either be
area of research as the need for immediate access to relevant and
                                                                             extractive or abstractive. Extractive summarization method
precise information that can effectively represent huge amount of
                                                                             involves picking important sentences from a document while
information. Because relevant information is scattered across a
                                                                             abstractive method of summarization involves the use of linguistic
given document, every user is faced with the problem of going
                                                                             methods to analyze and interpret a document, the system then
through a large amount of information to get to the main gist of a
                                                                             looks for another way to portray the content of the document in a
text. This calls for the need to be able to view a smaller portion of
                                                                             short form and still pass across the main gist of the document.
large documents without necessarily losing the important aspect
                                                                             Also the input of a text summarization system can either be single
of the information contained therein. This paper provides an
                                                                             or multiple. Single document summarization involves
overview of current technologies, techniques and challenges in
                                                                             summarizing a single text while Multi-document summarization
automatic text summarization. Consequently, we discuss our
                                                                             involves summarizing from more than one source text.
efforts at providing an efficient model for compact and concise
documents summarization using sentence scoring algorithm and a               Automatic text summarization is one of the many applications of
sentence reduction algorithm. Based on comparison with the well-             Natural Language Processing. It can be used for question and
known Copernic summarizer and the FreeSummarizer, our system                 answering, information retrieval among other things. Earlier
showed that the summarized sentences contain more relevant                   methods of text summarization used statistical methods that
information such that selected sentences are relevant to the query           assigned scores to sentences or words in a sentence, and these
posed by the user                                                            methods are inefficient because they didn’t consider the context of
                                                                             words, which made the resulting summaries, incoherent. More
CS Concepts                                                                  research unveiled approaches that do not score sentences for
• Computing methodologies ➝Artificial intelligence ➝Natural                  extraction, but merged lots of knowledge bases to enable them
language processing ➝Information extraction • Information                    know the part of speech of words in a sentence but do not
Systems ➝Information Retrieval ➝Retrieval tasks and goals                    consider keywords identification to identify important parts of
                                                                             documents..
➝Summarization
                                                                             Automatic text summarization system helps saves time and effort
Keywords                                                                     that one would have used to scan a whole document, it also helps
Automatic text summarization, extractive summary, sentence                   increase productivity and with the amount of research that has
scoring, sentence reduction, query-based summarization.                      been done in automatic text summarization, summaries are
                                                                             available in different languages [1].
1. INTRODUCTION                                                              This paper presents the current technologies and techniques as
The area of automatic text summarization has become a widely                 well as prevailing challenges in automatic text summarization,
explored area of research because of the need for immediate                  consequently, we propose a model for improving text
access to information at this age where the amount of information            summarization by using a method that combines sentence scoring
on the World Wide Web is voluminous. The problem is not the                  algorithm with sentence reduction.
availability of information but users have access to more than
enough information than they need, they are also faced with the
problem of digging through that large amount of information to               2. SENTENCE EXTRACTION METHODS
get what they really need.
                                                                             FOR TEXT SUMMARIZATION
Automatic text summarization is a process whereby a computer                  A system that scans a document in machine-readable form then
                                                                             selects from the sentences in the Article the ones that carry
                                                                             important information was proposed in [2]. The significance
                                                                             factor of a sentence is derived from an analysis of its words,
CoRI’16, Sept 7–9, 2016, Ibadan, Nigeria.                                    whereby the frequency of words occurrence is a useful
                                                                             measurement of word significance and that the relative position of
                                                                             words within a sentence having given values of significance
                                                                             furnishes a useful measurement for determining the significance
                                                                             of sentences. In [2], a justification was made about measure of
                                                                             significance based on how frequent some words occurred by


                                                                        53
pointing out that an author when trying to express his thoughts on          their system include the presence of uppercase words, length of
a subject repeats some words.                                               sentence, structure of phrase and position of words. The author
Another research in IBM pointed out that the position of a                  assumed the following:
sentence can be used to find areas of a document containing                 s = a certain sentence, S = the sentences in the summary, and F 1, ,
important information [3]. There, it was shown that sentences that          Fk = the features.
occur in the initial or final parts of a paragraph contain important
information. By analyzing a sample of 200 paragraphs, it was
discovered that in most paragraphs the headings came first and in
few it came last.
                                                                                                                                       -- (1)
Unlike the method used in [2], which used only the frequency of
word occurrence to produce extracts, [4] analyzed using cue                 In equation 1, Sentences are scored based on these features and
words, title and heading words, sentence location and key method            the formula is used to calculate the score, the highest ranking
individually and together. The justification of using cue method is         sentences are extracted.
that sentences containing words like “most importantly”, “in this           Th naïve-bayes classifier was also used in DimSum [8], which
paper” indicate sentence importance. For key method, scores were            used term frequency (tf) which is the number of times that a word
assigned to frequently occurring words in the document. For title           appears in a sentences and inverse document frequency (idf)
method, sentences are scored based on how much of the title or              which is the number of sentences in which a word occurs, to know
heading words it contains and for the sentence location, the                words that hold point at the key concepts of a document.
importance of a sentence is determined using position as criteria
like words at the beginning of a paragraph are considered                   3.2       Rich Features and Decision Trees
important. His results showed that the best match between                   Decision trees are powerful and popular tools for classification
automatic and human-written abstracts was accomplished when                 and prediction. It is a classifier in the form of a tree structure. The
sentence location, cue words and title words are considered.                following nodes make up the tree:
2.1        Beyond Sentence Extraction                                                Decision node: specifies a test on a single attribute,
A method that involved the removal of irrelevant phrases from                        Leaf node: indicates the value of the target attribute,
sentences extracted for summary was introduced in [5]. The first                     Arc/edge: split of one attribute,
step involves the generation of a parse tree, followed by grammar                    Path: a disjunction of test to make the final decision
checking so as to know which of the nodes of the tree can be
deleted, it then checks the parts of the sentences that contains            In [9], the authors concentrated on text position by making an
information relating to the main topic. After doing all the above it        effort to determine how sentence position affects the selection of
then removes the unnecessary parts of the sentences leaving                 sentences. The justification for the focus on position method is
behind a concise and coherent summary.                                      that texts are in a particular discourse structure, and that sentences
                                                                            containing ideas related to the topic of a document are always in
Motivated by the fact that automatic summarizers cannot always
                                                                            specifiable locations (e.g. title, abstracts, etc). They also
identify where the main gist of a document lies and the way text is
                                                                            mentioned that discourse structure significantly varies over
generated is poor, [6] introduced a cut and paste method which
                                                                            domains, so therefore the position method cannot be easily
involved six operations:
                                                                            defined.
      I.   Sentence reduction, where unnecessary phrases are
                                                                            A sentence reduction algorithm that is based on decision tree was
           removed from the sentences,
                                                                            introduced in [10]. The algorithm proposed used semantic
     II.   Sentence combination, where sentences are combined
                                                                            information to aid the process of sentence reduction and decision
           together,
                                                                            tree to handle the fact that the orders of original sentences change
 III.      Syntactic transformation, involves rearrangement of
                                                                            after they are reduced. They extended Knight and Marcu’s
           words or phrases,
                                                                            sentence compression algorithm [11], which was also based on
 IV.       Lexical paraphrasing, phrases are substituted with
                                                                            decision tree by adding semantic information to theirs. To achieve
           paraphrases,
                                                                            this, they used a Parser to parse the original sentences and by
     V.    Generalization and specification, substituting phrases
                                                                            using WordNet, they enhanced the syntax tree gotten with
           with general/specific description,
                                                                            semantic information.
 VI.       Reordering, rearrangement of the sentences extracted
           for summary.                                                     3.3       Hidden Markov Models
                                                                            A hidden Markov model is a tool for denoting probability
3.         MACHINE LEARNING METHODS                                         distributions over sequences of observations. If we represent the
Various machine learning techniques have been exploited in                  observation at time t by the variable Yt, we assume that the
automatic text summarization. Some of the techniques used                   observation are sampled at discrete, equally-spaced time intervals,
include: Naïve-Bayes method, Rich Features and Decision Trees               so t can be an integer-valued time index. The two defining
method, Hidden Markov model, Log-linear models and Neural                   properties of hidden Markov model are: the assumption that the
Networks and Third Party Features.                                          observation at time t was generated by some process whose state
                                                                            St is hidden from the observer and the assumption that the state of
3.1        Naïve-Bayes Method                                               the hidden process satisfies the Markov property i.e. given the
                                                                            value of St-1, the current state St is independent of all the states
Naïve-Bayes method was first used in [7] by using Bayesian
                                                                            prior to t-1 [12].
classifier to determine if a sentence should be extracted or not.
The system was able to learn from data. Some features used by


                                                                       54
Two sentence reduction algorithms were proposed in [13]. Both
were template-translation based which means that they don’t need
syntactic parser to represent the original sentences for reduction.
One was founded on example-based machine-translation which
does a good job of in the area of sentence reduction. On the other
hand in specific cases, the computational complexity can be
exponential. While the second one was an addition to the
template-translation algorithm through the application of Hidden
Markov model, the model employs the set of template rules that
was learned from examples to overcome the problem of
computational complexity.                                                                        Figure 1: Text summarizer Architecture
2.4.4     Log-Linear Models                                                  3.1 Sentence Scoring Module
Log-Linear models are generally used in Natural Language
                                                                                                                                          Extracted
processing. The flexibility of this model is its major benefit; it                        User Input              Sentence
                                                                                                                                          Sentences
allows the use of rich set of features.                                                                           Extraction

In [14], log-linear models were used to bring to null the
assumption that existing systems were feature independent.                                                              Sentence
Consequently, it was also showeshown empirically that using                                                              resemblance to
log-linear models produced better extracts than naïve-bayes                                                              query
model. The conditional log-linear model used by the author can be                                                       Cue Phrases
                                                                                                                        Word frequency
stated as follow:
                                                                                                       Figure 2: Sentence scoring module
                                                                             In the sentence scoring module, there are two major steps
                                                                             involved:
                                                    …………..(2)
                                                                                      1.      Preprocessing:
Let c = label, s = item we want to label, fi = i-th feature, λi = the
corresponding feature weight and Z (s) = ∑c exp (∑i λi fi (c,s)).            This step involves the removal of stop-word and tokenization;
                                                                             stop-words are extremely common words (e.g. a, the, for). For this
3.4 Neural Networks and Third Party Features                                 part, a stoplist which is a list of stop-words is used. Tokenization
                                                                             involves breaking the input document into sentences.
The automatic text summarization system developed in [15] had
learning ability. This was done through combination of a                         2.       Sentence scoring: after the document has been broken into
statistical approach, extraction of keywords, neural network and                          group of sentences. As seen in Figure 1 above, sentences
unsupervised learning. The process used involved three steps: step                        are extracted based on three important features; sentence
one involved removal of stop words like “a” and stemming which                            resemblance to query, cue phrases and word frequency.
is done by removing suffixes and prefixes to convert a word to its                       Sentence resemblance to query: This is modelled after
stem. Step two involves keywords are extracted by computing the                           sentence resemblance to title which calculates a score based
matrix of the term frequency against the inverse document                                 on the similarity between a sentence and the title of a
frequency, the most frequent terms listed are the keywords to be                          document. So sentence resemblance to query calculates a
extracted for the summary. For the final step, the model checks                           score based on the similarity between a sentence and the
for stop words again to be sure that no stop word is selected as                          user query which means that any sentence that is similar to
keyword after which it selects sentences containing keywords to                           the query or includes words in the query are considered
be added to the summary.                                                                  important. And the score will be calculated using the
NetSum [16] was the first to use neural network ranking algorithm                         following formula:
and third-party datasets for automatic text summarization. The
authors trained a system that learned from a train set containing            Score =                                       ----(1)
labels of best sentences from which features are extracted. From             Where nQW = number of words in query
the train set, the system learns how features are distributed in the
best sentences and it gives a result of ranked sentences for each                            Cue Phrases: the justification of using this feature is
document, the ranking is done using RAnkNet [17]                                              that the presence of some words likes “significantly”,
                                                                                              “Since” point to important gist in a document and a
3. SENTENCE SCORING AND SENTENCE                                                              score is assigned to such sentences. The score is
REDUCTION MODELS                                                                              computed using:
Sentence score is a value that determines the sentences that are             Score =                                  ----------(2)
relevant to the input text. As shown in Figure 1, in our
architecture, the input to the system is a single document.                          Word frequency, is a useful measurement of significance
Sentence scoring occurs at the first stage; significant sentences are                 because it is revealed in [2] that an author tend to repeat
identified and extracted. The second stage involves the sentence                      certain words when trying to get a point across. So sentences
reduction module; the extracted sentences from the sentence                           that contain frequently occurring words are considered to be
scoring module are processed, grammar checking and removal of                         significant. The algorithm involves:
target structures is done.


                                                                        55
       I.   Breaking sentence into tokens                                   syntactic parsing, grammar checking, removal of target structures
      II.   For each token, if the token already exists in array,           and display of output.

                               Increment its count,
                               Else add token to array and
                    Set initial count to 1.
The boolean formula below is used to decide the sentences to be
selected for further processing:
(SrqScore >= 0.5 || (CpScore >= 0.1 && WfScore >=3) ---(3)
Where SrqScore is Sentence resemblance to query Score, CpScore
is Cue phrase score and WfScore is Word Frequency score.
3.2          Sentence Reduction Module


Figure 3: Sentence reduction module
In the sentence scoring module, the original document and the
extracted sentences from the sentence scoring module is processed           Figure 3: Activity Flow
so as to remove irrelevant phrases from the document to make the            4.1 Implementation Resources
summary concise, the sentence reduction algorithm is described in
                                                                            The following resources were used for the implementation of our
details in [18]. The processing involves:
                                                                            summarization system:
Syntactic Parsing
                                                                                     Java: it was considered a good choice for developing
Stanford parser, a syntactic parser is used to analyze the structure
                                                                                      the summarization system because it makes more
of the sentences in the document and a sentence parse tree is
                                                                                      efficient use of memory needed for a system of this
produced. The other stages involved in the sentence reduction
                                                                                      nature, it is faster and more effective, and also provides
module add more information to the parse tree, these information
                                                                                      more data structures to handle the different data types
aids the final decision to be made.
                                                                                      used in the implementation of the proposed algorithm.
Grammar checking
We go back and forth on the sentence parse tree, node by node to                     Stanford Parser: The Stanford parser was used as the
identify parts of the sentence are important and must not be                          tagging tool in the sentence reduction module of our
removed to make the sentence grammatically correct. For                               implementation. The Stanford parser analyses the
example, in a sentence, the main verb, the subject, and the                           sentences and provides us with the parts of speech of the
object(s) are essential if they exist.                                                words in a sentence, as well as the class e.g. adverbial
                                                                                      phrase, adjectival phrases, etc., different parts of the
Removal of Target Structures
For this research work we will be using the main clause algorithm                     sentence belongs to.
for sentence reduction. In this algorithm, the main clause of a                      Gate (General Architecture for Text Engineering):
sentence is obtained and in that main clause we identify the target                   We used GATE in the development our algorithm to
structures which are the structures to be removed and they are                        integrate the Stanford parser and its other modules such
adjectives, adverbs, appositions, parenthetical phrase, and relative                  as the sentence splitter and tokenizer which properly
clauses. A reduced sentence is gotten after the targeted structures                   handles complexities that may occur in long articles
has been identified and removed from the sentence parse tree.                         than ordinary tokenizers cannot handle properly. The
Summary Construction                                                                  integration of the parser and this other modules was
We once again go back and forth on the sentence parse tree and                        used to create an application pipeline which was then
see if the reduced sentences are grammatically correct. The                           used as a plugin in the implementation and development
reduced sentences are then merged together to give the final                          of our summarisation system.
summary.
After the sentence reduction module carries out all four steps, a           5. RESULTS AND DISCUSSION
concise and coherent summary is expected as output.                         To test our summarisation system, we obtained random articles
                                                                            online from Wikipedia to use as our input document. The article is
4. IMPLEMENTATION                                                           then saved in a text (.txt) file to be used in our system. Wikipedia
The system allows a user to input a document; which is then                 references are disregarded during our extraction and not included
prepossessed and the sentences ranked. The high ranked sentences            in the content of the article to use as input document for our
are then extracted for further processing. The result is then viewed        summarisation system, as they do not provide any value of
by the user. As shown in figure 3, the flow of activities includes          importance to the overall article and the summary we want to
uploading of input file, preprocessing, assignment of scores,               generate. For this task, we selected an article about Web 2.0,


                                                                       56
however, it is important to note that any article could be selected        [4] Edmundson, H. P. (1969). New methods in automatic
and used.                                                                      extracting. Journal of the ACM, 16(2):264–285. [2, 3, 4]

                                                                           [5] Jing, H. (2000). Sentence Reduction for Automatic Text
                                                                               Summarization, In Proceedings of the Sixth Applied
                                                                               Natural Language Processing Conference, pp. 310-315.
                                                                               Association for Computational Linguistics.
                                                                           [6] Jing, H. and McKeown, K.R. (2000). Cut and Paste Based
                                                                               Text Summarization, In Proceedings of the 1st Meeting of
                                                                               the North American Chapter of the Association for
                                                                               Computational Linguistics, pp. 178-185. Association for
                                                                               Computational Linguistics.
                                                                           [7] Kupiec, J., Pedersen, J., and Chen, F. (1995). A trainable
                                                                               document summarizer, In Proceedings of the 18th annual
                                                                               international ACM SIGIR conference on Research and
                                                                               development in information retrieval, pp. 68-73. ACM.
Figure 4: GUI interface of our summarization system.                       [8] Larsen, B. (1999). A trainable summarizer with knowledge
We evaluated our summarization system using 3 standard                          acquired from robust NLP techniques. Advances in
parameters; (1) precision (the fraction of the returned summary                 Automatic Text Summarization, pp. 71.
that is relevant to the query posed by the user); (2) recall (the          [9] Lin, C. Y., and Hovy, E. (1997). Identifying topics by
fraction of the relevant information contained in the document                  position, In Proceedings of the fifth conference on Applied
that was returned by the summarization system); and (3) F-                      natural language processing, pp. 283-290. Association for
measure (the weighted harmonic mean of precision and recall).                   Computational Linguistics
We had a human subject read two articles, one was the web 2.0              [10] Le, N. M., & Horiguchi, S. (2003). A new sentence reduction
article and the other was a Blackberry passport review.                         based on decision tree model, In Proceedings of the 17th
The web 2.0 article had 179 sentences, the summary presented by                 Pacific Asia Conference on Language, Information and
the subject contained 110 sentences and our system produced a                   Computation, pp. 290-297.
summary containing 112 sentences. 92 sentences were present in             [11] Knight, K., and Marcu, D. (2002). Summarization beyond
both the summary made by the subject and the summary made by                    sentence extraction: A probabilistic approach to sentence
our system.                                                                     compression, Artificial Intelligence, Vol. 139, No. 1, pp. 91-
The second document, a blackberry passport review contained                     107.
129 sentences. In the summary presented by the subject we had 40           [12] Ghahramani, Z. (2001). An Introduction to Hidden Markov
sentences and our summarizers’ summary, we had 57 sentences.                    Models and Bayesian Networks, International Journal of
27 sentences were present in both the summary made by the                       Pattern Recognition and Artificial Intelligence, Vol 15, No.
subject and the summary made by our system. Based on                            1, pp. 9-42.
comparison with the well-known Copernic summarizer which
                                                                           [13] Nguyen, M. L., Horiguchi, S., Shimazu, A., & Ho, B. T.
produces summary based on statistical and linguistic methods and
                                                                                (2004). Example-based sentence reduction using the hidden
the FreeSummarizer that produces summary based on specified
                                                                                markov model, ACM Transactions on Asian Language
number of sentences, our system gave high recall values of 83.6%
                                                                                Information Processing (TALIP), Vol. 3, No. 2, pp. 146-158.
and 67.5% respectively; an indication that the sentences in our
system’s summary contain more relevant information such that               [14] Osborne, M. (2002). Using maximum entropy for sentence
selected sentences are relevant to the query pos. ed by the user.               extraction, In Proceedings of the ACL-02 Workshop on
In conclusion, this work can be extended to multi-document                      Automatic Summarization, Vol. 4, pp. 1-8. Association for
summarization whereby the source text is more than one; the                     Computational Linguistics.
resulting summary may contain a larger amount of information               [15] Yong, S. P., Abidin, A. I., & Chen, Y. Y. (2005). A Neural
when compared to a single document summarization. Also, it can                  Based Text Summarization System, In Proceedings of the 6th
be extended to other languages apart from English language.                     International Conference of DATA MINING, pp. 45-50
7. REFERENCES                                                              [16] Svore, K. M., Vanderwende, L., & Burges, C. J. (2007).
[1] Wells, M. (2009). Advantages of automatic text                              Enhancing Single-Document Summarization by Combining
    summarization, http://ezinearticles.com/?3-Advantages-of-                   RankNet and Third-Party Sources, In EMNLP-CoNLL, pp
    Automatic-Text-Summarization&id=3465270 downloaded                          448–457.
    on 11 September, 2014.                                                 [17] Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M.,
[2] Luhn, H. P. (1958). The automatic creation of literature                    Hamilton, N., & Hullender, G. (2005). Learning to rank
    abstracts, IBM Journal of research and development, Vol. 2,                 using gradient descent. In Proceedings of the 22nd
    No. 2, pp. 159-165.                                                         international conference on Machine learning (pp. 89-96).
[3] Baxendale, P. (1958). Machine-made index for technical                      ACM.
    literature - an experiment. IBM Journal of Research
    Development, 2(4):354–361                                              [18] Silveira, S. B., and Branco, A. (2014). Sentence Reduction
                                                                                Algorithms to Improve Multi-document


                                                                      57