<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Demo: Text Titling Application</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cédric Lopez</string-name>
          <email>lopez@lirmm.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Violaine Prince</string-name>
          <email>prince@lirmm.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathieu Roche</string-name>
          <email>mroche@lirmm.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIRMM</institution>
          ,
          <addr-line>Univ. Montpellier 2</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper deals with an application allowing the automatic titling of texts. The process consists of four stages: corpus acquisition, determination of candidate sentences for titling, extraction of noun phrases from the candidate sentences, and finally the choice of the title.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>In this paper, we present an application that implements an
automatic approach for providing a title to a document, one that
meets the different characteristics of human-written titles. Thus,
when a title is absent, for instance in e-mails without a subject line
or when a file name must be chosen for saving, the described method
saves the user time by conveying
the content at a single glance. In addition, it is designed
to meet at least one of the criteria of the W3C standards.
Indeed, titling web pages is one of the key fields of web page
accessibility, as defined by associations for the disabled.
The goal is to enhance page readability. Moreover, a
relevant title is important for webmasters because it improves
the indexing of web pages. Note that titling should not
be confused with automatic summarization, text
compression, or indexing, although it has several
points in common with them. This is detailed in the 'related
work' section.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        While many applications have emerged in the NLP domain,
it seems that no application has been built to automatically
title textual documents. As for articles, it has been observed that
elements appearing in the title are often present in the body
of the text [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Recent work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] supports this idea and
shows that the coverage rate of the words present in titles
is very high in the first sentences of a text.
      </p>
      <p>
        A title is not exactly the smallest possible abstract. While
a summary, the most condensed form of a text, has to give an
outline of the text contents that respects the text structure,
a title indicates the subject treated in the text without
revealing all the content. Text compression could be
useful for titling if a strong compression could be undertaken,
resulting in a single relevant word group. Text compression
methods (e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) could be used to choose a word group
that obeys title constraints. However, one has to heavily
prune compression results to select the relevant group [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>A title is not an index: a title does not necessarily contain
keywords (and indexes are keywords), and might present a
partial or total reformulation of the text (which an index does
not).</p>
      <p>A title is thus a full-fledged entity with its own functions, and
titling has to be sharply distinguished from summarizing
and indexing.</p>
    </sec>
    <sec id="sec-3">
      <title>THE AUTOMATIC TITLING APPLICATION</title>
    </sec>
    <sec id="sec-5">
      <title>The process</title>
      <p>The overall process for automatically titling a
document consists of the following steps:</p>
      <p>
        Step 0: Corpus Acquisition. Determining the
characteristics of the texts to be titled.
      </p>
      <p>
        Step 1: Candidate Sentences Determination. We
assume that any text contains at least a few sentences
that can provide the relevant material for titling. In
this article, we suppose that the terms used in the title
can be located in the first sentences of the text [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Step 2: Extracting Candidate Noun Phrases (NP) for
Titling. This step extracts the NPs of the candidate sentences,
among which the most relevant one will be selected. A first
preselection keeps the longest NPs, similarly to
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], namely those of length Lmax and Lmax − 1,
where Lmax is the length of the longest local candidate. This
technique avoids pruning interesting candidates too
quickly. These candidates are called NPmax.
      </p>
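      <p>The preselection of Step 2 can be sketched as follows. This is a minimal illustration, not the application's actual code: it assumes that the noun phrases have already been extracted by some chunker, that length is counted in words, and that the two retained lengths are Lmax and Lmax − 1. The function name preselect_longest and the sample phrases are hypothetical.</p>
      <preformat>
```python
def preselect_longest(noun_phrases):
    """Keep the noun phrases whose word count is Lmax or Lmax - 1.

    Assumption of this sketch: length is measured in whitespace-separated words.
    """
    if not noun_phrases:
        return []
    lengths = [len(phrase.split()) for phrase in noun_phrases]
    l_max = max(lengths)
    # Retain candidates of length Lmax and Lmax - 1 (the NPmax set).
    return [p for p, n in zip(noun_phrases, lengths) if n >= l_max - 1]


# Hypothetical noun phrases taken from the first sentences of a text.
candidates = [
    "the automatic titling",
    "a relevant title",
    "the first sentences of the text",
    "titles",
    "the online automatic titling application",
]
np_max = preselect_longest(candidates)
print(np_max)
```
      </preformat>
      <p>Keeping two lengths rather than only the maximum leaves more than one candidate for the scoring step, in line with the remark that candidates should not be pruned too quickly.</p>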
      <p>
        Step 3: Selecting a Title by the ChTitres Approach,
based on the use of the TF-IDF measure [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This measure
evaluates how important a word is to a text
within a corpus. The importance of a word increases
proportionally to the number of times it appears in the
document (TF) but is offset by the frequency of the
word in the corpus (IDF). We use the TF-IDF
measure to calculate the score of every NPmax. This
score can be either the maximal TF-IDF obtained for a word
of the NP (TMAX) or the sum of the TF-IDF of
all the words of the NP (TSUM). If a Named Entity
is located among the NPmax, then our approach
favors selecting this NP as the title. Other methods are
detailed in the online application (see Fig. 1).
      </p>
      <p>Subtitles are determined with the same methods as
titles. The difference lies in the calculation
of the TF-IDF measure: the IDF is not computed from
the frequency of the word in the different documents
of the corpus but from the word
frequency in the segments of the document. This
method can thus be seen as a local titling process.</p>
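      <p>The TMAX and TSUM scores of Step 3 can be illustrated with the short sketch below. It is written under stated assumptions and is not the ChTitres implementation: the IDF variant log(N / df), the token lists, and the names tf_idf_scores, t_max, t_sum and pick_title are all hypothetical, and the preference given to Named Entities is not modelled.</p>
      <preformat>
```python
import math
from collections import Counter


def tf_idf_scores(document, corpus):
    """Return a TF-IDF weight for every word of `document`.

    `document` is a list of tokens and `corpus` a list of token lists.
    The IDF form log(N / df) is an assumption of this sketch.
    """
    term_counts = Counter(document)
    n_docs = len(corpus)
    scores = {}
    for word, count in term_counts.items():
        df = sum(1 for doc in corpus if word in doc)
        scores[word] = count * math.log(n_docs / df) if df else 0.0
    return scores


def t_max(np_tokens, scores):
    """TMAX: the highest TF-IDF among the words of the noun phrase."""
    return max(scores.get(word, 0.0) for word in np_tokens)


def t_sum(np_tokens, scores):
    """TSUM: the sum of the TF-IDF of all the words of the noun phrase."""
    return sum(scores.get(word, 0.0) for word in np_tokens)


def pick_title(np_candidates, scores, scorer):
    """Return the candidate noun phrase with the highest score."""
    return max(np_candidates, key=lambda phrase: scorer(phrase, scores))


# Toy corpus: the first entry is the document to be titled.
corpus = [
    ["titling", "application", "for", "short", "texts", "titling"],
    ["summarization", "of", "short", "texts"],
    ["indexing", "of", "web", "pages"],
]
scores = tf_idf_scores(corpus[0], corpus)
np_candidates = [["titling", "application"], ["short", "texts"]]
print(pick_title(np_candidates, scores, t_max))
print(pick_title(np_candidates, scores, t_sum))
```
      </preformat>
      <p>For the local variant used for subtitles, the same functions could be called with the segments of a single document passed as the corpus argument, so that the IDF reflects word frequency across segments rather than across documents.</p>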
    </sec>
    <sec id="sec-6">
      <title>The application</title>
      <p>The application is available in English and French:
http://www.lirmm.fr/ lopez/. This online application,
developed with HTML, PHP, and MySQL, has a user-friendly
interface. On the screenshot (see Fig. 1), all parts of the
interface are annotated with a letter.</p>
      <p>A: The user chooses which method to
apply in order to title a given document (TMAX, TSUM,
and other methods). Clicking the name of a
method displays an explanation of it.</p>
      <p>B: The user chooses which method will
be applied for subtitling.</p>
      <p>C: The text area receives the text to title.
Text blocks must be separated by 'carriage returns' in
order to be subtitled.</p>
      <p>D: Some application examples (specialized texts) are
offered in a list.</p>
      <p>E: A link for switching to French mode.</p>
      <p>The result page returns all the titles (and subtitles)
produced by the methods chosen on the interface page.</p>
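      <p>As an illustration of how text blocks separated by carriage returns could be turned into segments for subtitling, here is a minimal sketch. The way the online application actually splits its input is not documented here, so the function split_blocks and its handling of CR, LF and CRLF line breaks are assumptions.</p>
      <preformat>
```python
import re


def split_blocks(text):
    """Split raw input into non-empty blocks on line breaks (CR, LF or CRLF)."""
    parts = re.split(r"[\r\n]+", text)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical content of the text area: two blocks separated by line breaks.
raw = "First block about titling.\r\n\r\nSecond block about evaluation.\n"
blocks = split_blocks(raw)
print(blocks)
```
      </preformat>
      <p>Each resulting block could then be titled separately, which corresponds to the local titling process used for subtitles.</p>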
      <p>Ten experts evaluated our methods (240 titles were
evaluated across 3 corpora). For every text, six titles
were suggested, drawn from the titles determined
by the TMAX and TSUM methods as well as the real title
TR. Three other titles (A1, A2, A3) are drawn at random
from the list of noun phrases that were extracted but
rejected by the process. Identical titles obtained with
different approaches are not given. Comparing the evaluation of
rejected NPs with that of selected ones allows, in particular, the
accuracy of the selection process to be estimated.</p>
      <p>For every 'candidate' title, the expert has to assess its
relevance to the document contents on the following scale:
Very relevant (C1), Relevant (C2), I don't know (C3), not
very relevant (C4), not relevant at all (C5). Each of
these Cn judgements is assigned a numerical value: −2 for
C5, −1 for C4, 0 for C3, +1 for C2, and +2 for C1. The final
score obtained for a title is the mean of the grades given
by the experts (see Fig. 2). Thus, the higher the value, the more
accurate the NP as a title.</p>
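      <p>The computation of a title's final score can be sketched as below, assuming the scale runs from −2 (C5) to +2 (C1) and that the score is the plain arithmetic mean of the experts' grades. The names GRADE_VALUES and title_score and the sample judgements are hypothetical and do not come from the evaluation data.</p>
      <preformat>
```python
from statistics import mean

# Numerical value of each judgement, assuming a scale from -2 to +2.
GRADE_VALUES = {"C1": 2, "C2": 1, "C3": 0, "C4": -1, "C5": -2}


def title_score(judgements):
    """Return the mean grade of a title from a list of C1..C5 judgements."""
    return mean(GRADE_VALUES[judgement] for judgement in judgements)


# Hypothetical judgements given by five experts for one candidate title.
example = ["C1", "C2", "C2", "C3", "C4"]
print(title_score(example))
```
      </preformat>
      <p>With this convention, a positive score means that the experts leaned towards 'relevant', and a negative score that they leaned towards 'not relevant'.</p>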
      <p>Based on this expert evaluation, the experiments show
that the e-mail titles returned by our tool are relevant (0&lt;score&lt;1,
see Table 1). In particular, the evaluation results show that
automatic e-mail titles (0.61) are judged more relevant than the real titles
(0.57) with TMAX.</p>
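      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>CORPORA</th>
              <th>TR</th>
              <th>TMAX</th>
              <th>TSUM</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>E-mails</td>
              <td>0.57</td>
              <td>0.61</td>
              <td>0.52</td>
            </tr>
            <tr>
              <td>Fora</td>
              <td>1.15</td>
              <td>0.42</td>
              <td>0.58</td>
            </tr>
            <tr>
              <td>Mailing lists</td>
              <td>1.8</td>
              <td>0.81</td>
              <td>0.43</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>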
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>The experts confirmed that the titles built by our
automatic titling tool are relevant. A possible benefit of
an automatic method is that it may build a more relevant title
than a 'real' one, and it saves time for a heavy
e-mail writer.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Belhaoues</surname>
          </string-name>
          . Titrage automatique de pages web.
          <source>Master Thesis</source>
          , University Montpellier II, France,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bourigault</surname>
          </string-name>
          . Lexter, un logiciel d'extraction de terminologie.
          <article-title>Application à l'acquisition des connaissances à partir de textes</article-title>
          .
          <source>PhD thesis</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          .
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information Processing and Management 24, pages 513–523</source>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Teufel</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Moens</surname>
          </string-name>
          .
          <article-title>Sentence extraction and rhetorical classification for flexible abstracts</article-title>
          .
          <source>In AAAI Spring Symposium on Intelligent Text Summarisation</source>
          , pages
          <volume>16</volume>
          –
          <fpage>25</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yousfi-Monod</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Prince</surname>
          </string-name>
          .
          <article-title>Sentence compression as a step in summarization or an alternative path in text shortening</article-title>
          .
          <source>In Coling'08</source>
          , UK., pages
          <volume>139</volume>
          –
          <fpage>142</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zajic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dorr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          .
          <article-title>Automatic headline generation for newspaper stories</article-title>
          . Workshop on Text Summarization (
          <article-title>ACL 2002</article-title>
          and
          <article-title>DUC 2002 meeting on Text Summarization)</article-title>
          . Philadelphia.,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>