<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Methods and Techniques for Information Extraction by Text Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Altigran Soares da Silva</string-name>
          <email>alti@icomp.ufam.edu.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eli Cortez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto de Computação, Universidade Federal do Amazonas, Manaus</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The growing use of text files for information exchange, such as HTML pages, XML documents, e-mail, blog posts, tweets, RSS and SMS messages, has raised many problems related to how to properly exploit the information implicitly contained therein. In particular, problems related to Information Extraction from such sources have motivated many studies in various scientific communities, in areas such as Databases, Data Mining, Information Retrieval and Artificial Intelligence. In this tutorial, we present recent methods and techniques from the literature that address the problem of Information Extraction by Text Segmentation (IETS), which consists of extracting values of interest, organized into semi-structured records (e.g., postal addresses, bibliographic citations, classified ads), that are implicitly present in textual sources. We will discuss the most recent major approaches proposed in the literature, with particular emphasis on probabilistic methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list />
  </back>
</article>