<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Knowledge TV: A Proposal for Data Integration in the Knowledge TV Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>José Carlos Almeida Patrício Junior</string-name>
          <email>jcapjunior@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natasha Correia Queiroz Lino</string-name>
          <email>natasha@di.ufpb.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidade Federal da Paraíba</institution>
          ,
          <addr-line>João Pessoa - PB -</addr-line>
          <country country="BR">Brasil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidade Federal da Paraíba</institution>
          ,
          <addr-line>João Pessoa - PB -</addr-line>
          <country country="BR">Brasil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents Mining Knowledge TV, a module for data mining that is part of the Knowledge TV (KTV) Project. KTV proposes the specification of a semantic layer that is embedded in a Digital TV (DTV) environment, improving the way that content is accessed by other applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Mining</kwd>
        <kwd>Digital TV</kwd>
        <kwd>Digital TV personalisation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Interactive Digital TV [
        <xref ref-type="bibr" rid="ref1 ref2 ref8">1,2,8</xref>
        ] is a new stage of TV technology,
which intends to support the convergence of digital technologies
through a systematic change from analog to digital equipment
and infrastructure. This change affects the whole production
chain, especially the consumption of final content.
      </p>
      <p>
        In this scenario, this paper presents the specification of
Mining Knowledge TV (MKTV), which focuses on the integration
of data mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] technology with semantic aspects, most of
them derived from research on AI Knowledge Representation and
the Semantic Web [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4,5,6</xref>
        ]. MKTV is being developed in
the context of the Brazilian Digital TV System (SBTVD) and is
part of the project's goal of giving interactive DTV a semantic layer.
Among other aspects, it aims to provide a rich
knowledge base of data descriptions, resources, services,
applications and the relations among such elements.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. MINING KNOWLEDGE TV - MKTV</title>
      <p>
        The main aim of MKTV is to implement a knowledge discovery in
databases (KDD) environment that focuses on data mining and semantic
information on the Knowledge TV platform [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This solution will
provide previously unknown information to DTV applications that
use the SBTVD Ginga middleware [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], so that they can use it
to address issues such as information overload,
personalization, targeted merchandising and so on.
      </p>
      <p>
        The mining process will be carried out on data from many
sources, mainly the Service Information (SI) metadata table,
which follows the MPEG-2 standard in the Ginga DTV environment.
This standard is used to represent information about TV programs,
services and multimedia interaction. Examples of such information
are channels, program schedules and program classification.
User behaviour is also an important information source because it
indicates, for example, the channels usually watched, together with
the start time and total watching period. The useful content obtained
by means of data mining will be semantically enriched through the use of
ontologies and then provided as a service to developers of NCL or Java
applications. This is possible because Ginga supports the development
of applications in both languages on its architecture. More information
about the Ginga architecture can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>The data mining process acts on all these sources and generates
new information that is semantically enriched by means of a
domain ontology. This semantic enrichment enables better analysis
and makes the meaning of the discovered knowledge more explicit.
The enriched knowledge is provided as a service, which NCL or Java
developers can use to implement more powerful and sophisticated
applications.</p>
    </sec>
    <sec id="sec-3">
      <title>3. ARCHITECTURE DESCRIPTION</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 Investigation of solution for data mining</title>
      <p>Brazilian DTV is a new environment of technological
convergence that is extremely susceptible to change. It is not yet
completely standardized and is constantly being updated. These
characteristics impose restrictions that we must consider during
architectural modelling. The main restrictions are:</p>
      <sec id="sec-4-1">
        <title>The small processing capacity of the set-top box;</title>
      </sec>
      <sec id="sec-4-2">
        <title>Reduced and unstable space for persistence of information;</title>
      </sec>
      <sec id="sec-4-3">
        <title>Mechanism for exclusion of applications when changing channels;</title>
        <p>That is, a change of channel will delete all
application information related to that channel.</p>
        <p>
          All these limitations in the architecture of the set-top box (STB) lead us to use a
hybrid approach detached from the middleware. That means that
the components of the KTV (and consequently the MKTV) with the
highest consumption of resources (such as processing power and
memory) will make use of the Ginga middleware return channel
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The return channel is the implementation of the HTTP protocol
in the DTV environment. That means that some components will
run on the web and communicate via the Internet with
DTV components.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2 Architecture</title>
      <p>
        Mining Knowledge TV (MKTV) is the component of the
Knowledge TV architecture responsible for the discovery and
treatment of useful knowledge from DTV data, user
behaviour and other sources such as the Web. These data are
initially stored in a local relational database, and the extraction,
transformation and load (ETL) process is then applied to them
gradually. After the ETL process, the data proceed to the
next module, the Data Warehouse (DW) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a technique
that is commonly used in conjunction with Data Mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
DW will be organized in departmental Data Marts, in accordance
with the domains and tasks to be mined (e.g. personalization,
marketing, business), concentrating on historical and
integrated data.
      </p>
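      <p>The ETL step described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for both the local relational database and a Personalization Data Mart. The table names (raw_events, fact_viewing), the column names, the sample rows and the 60-second noise threshold are all hypothetical and are not part of the KTV specification.</p>
      <preformat>
```python
import sqlite3

# Hypothetical raw viewing events as they might arrive from the set-top box:
# (user id, channel, ISO start time, seconds watched). Illustrative values only.
RAW_EVENTS = [
    ("u1", "news", "2011-05-01T20:00:00", 1800),
    ("u1", "news", "2011-05-02T20:05:00", 1500),
    ("u1", "sports", "2011-05-02T21:00:00", 20),  # channel zapping, dropped below
    ("u2", "sports", "2011-05-01T16:00:00", 5400),
]

MIN_SECONDS = 60  # assumed threshold for discarding zapping noise


def run_etl(conn, events, min_seconds=MIN_SECONDS):
    """Stage raw events, then extract, transform and load them into a fact table."""
    cur = conn.cursor()
    # Staging table: plays the role of the local relational database.
    cur.execute(
        "CREATE TABLE raw_events "
        "(user_id TEXT, channel TEXT, start_time TEXT, seconds INTEGER)"
    )
    cur.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", events)
    # Hypothetical fact table of the data mart: one row per (user, channel)
    # holding the total watching time and the number of sessions.
    cur.execute(
        "CREATE TABLE fact_viewing "
        "(user_id TEXT, channel TEXT, total_seconds INTEGER, sessions INTEGER)"
    )
    cur.execute(
        """
        INSERT INTO fact_viewing
        SELECT user_id, channel, SUM(seconds), COUNT(*)
        FROM raw_events
        WHERE seconds >= ?
        GROUP BY user_id, channel
        """,
        (min_seconds,),
    )
    conn.commit()
    return cur.execute(
        "SELECT user_id, channel, total_seconds, sessions "
        "FROM fact_viewing ORDER BY user_id, channel"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run_etl(connection, RAW_EVENTS):
        print(row)
```
      </preformat>
      <p>In this sketch the transformation is only a filter followed by an aggregation. A real Data Mart would probably add time and program dimensions derived from the SI metadata.</p>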
      <p>The historical data will be organised in the DW. Next, the Data
Mining module applies data mining algorithms to search for and
discover useful patterns and information not previously known in the
existing DW. The knowledge extracted by the MKTV will be
encapsulated in semantic files with more expressive power (OWL
files). Ontology specifications in OWL will be the standard for
communication between the modules of the KTV. Figure 1
illustrates the KTV conceptual architecture and the MKTV
module.</p>
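      <p>As an illustration of how mined knowledge might be encapsulated in an OWL file, the following sketch serialises viewer groups as OWL individuals in RDF/XML using only the Python standard library. The namespace http://example.org/ktv#, the class ViewerGroup and the property prefersGenre are hypothetical names chosen for this example. They do not come from the KTV ontology.</p>
      <preformat>
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
KTV = "http://example.org/ktv#"  # hypothetical namespace, not the project's own

ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)
ET.register_namespace("ktv", KTV)


def groups_to_owl(groups):
    """Serialise mined viewer groups as OWL individuals in RDF/XML.

    `groups` maps a group name to the list of genres its members prefer.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    # Declare the (assumed) class and object property used by the individuals.
    ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": KTV + "ViewerGroup"})
    ET.SubElement(
        root, f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}about": KTV + "prefersGenre"}
    )
    for name, genres in sorted(groups.items()):
        individual = ET.SubElement(
            root, f"{{{OWL}}}NamedIndividual", {f"{{{RDF}}}about": KTV + name}
        )
        ET.SubElement(
            individual, f"{{{RDF}}}type", {f"{{{RDF}}}resource": KTV + "ViewerGroup"}
        )
        for genre in genres:
            ET.SubElement(
                individual,
                f"{{{KTV}}}prefersGenre",
                {f"{{{RDF}}}resource": KTV + genre},
            )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(groups_to_owl({"Group0": ["News", "Sports"]}))
```
      </preformat>
      <p>An ontology library would normally replace the hand-built elements. The standard library is used here only to keep the sketch self-contained.</p>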
      <p>One application scenario is the recommendation and
personalization of content. To deal with such problems, specific
modules defined in the conceptual architecture will be
instantiated and executed. First, the system stores the data that
come from the STB in a database. Then, the information about
the programs each user has watched will be extracted to the
Personalization Data Mart in the DW. After this process, clustering
algorithms will be used to find groups of users with similar preferences.
The discovered knowledge will feed and enrich the ontology
specified in the semantic modelling layer, and the discovered
pattern will be returned to the user in the form of a recommendation, for
example, the next available programs similar to the ones the user
usually watches. Depending on the data mining goal, other tasks and
algorithms can be applied to discover the desired knowledge.</p>
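      <p>The recommendation scenario above can be sketched as follows. The text does not name a specific clustering algorithm, so plain k-means is used here only as one common choice. The viewing profiles, the genre list and the function names are illustrative assumptions, and the initial centroids are passed in explicitly so that the result is reproducible.</p>
      <preformat>
```python
import math

GENRES = ("news", "sports", "drama")  # assumed genre dimensions

# Hypothetical viewing profiles: share of watching time per genre,
# in the order given by GENRES. Illustrative values, not measured data.
PROFILES = {
    "u1": (0.8, 0.1, 0.1),
    "u2": (0.7, 0.2, 0.1),
    "u3": (0.1, 0.8, 0.1),
    "u4": (0.2, 0.7, 0.1),
}


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignment = {
            key: min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))
            for key, vec in points.items()
        }
        # Update step: each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [points[k] for k, c in assignment.items() if c == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


def recommend(user, assignment, centroids, upcoming):
    """Return upcoming programs whose genre dominates the user's cluster."""
    centroid = centroids[assignment[user]]
    favourite = GENRES[centroid.index(max(centroid))]
    return [title for title, genre in upcoming if genre == favourite]


if __name__ == "__main__":
    groups, centres = kmeans(PROFILES, [PROFILES["u1"], PROFILES["u3"]])
    schedule = [("Evening News", "news"), ("Cup Final", "sports")]
    print(groups)
    print(recommend("u4", groups, centres, schedule))
```
      </preformat>
      <p>Recommending by the dominant genre of a cluster is a deliberately simple rule. In the proposed architecture the discovered groups would first enrich the ontology, and the recommendation would be derived from it.</p>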
    </sec>
    <sec id="sec-6">
      <title>4. CONCLUSIONS AND DIRECTIONS</title>
      <p>This paper describes our initial work on Mining Knowledge
TV (MKTV), which is part of the KTV project. The major aim of
this project is to provide semantic knowledge to be used by other
DTV applications. At the moment, the MKTV is at the development
stage, and we have carried out a survey of the state of the art in
data mining for DTV. In addition, we have identified the
main data mining methods and algorithms that are currently used
in DTV, together with a list of tools that are compatible with this
new computational and interactive platform.</p>
      <p>The novelty of this proposal is apparent when we
consider how few DTV works focus on the joint use of
knowledge representation and data mining techniques to
generate a better-quality set of data.</p>
      <p>The next MKTV activities are to simulate DTV data traffic and
to integrate content from the data mining process and the semantic
modelling sub-layer. As future work, during its validation stage,
MKTV will collaborate with the JCollab Project [9], whose aim
is to develop a platform for creating journalistic content via a social
network. Another potential line of future work is to investigate
the integration of the MKTV solution with other Digital TV systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Lekakos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chorianopoulos</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Doukidis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Interactive Digital Television: Technologies and Applications</article-title>
          .
          <source>IGI Publishing. USA.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Lemos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Elias</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Introdução à Televisão Digital Interativa: Arquitetura, Protocolos, Padrões e Práticas</article-title>
          . In:
          <source>JAI - Jornada de Atualização em Informática</source>
          . Salvador, Bahia, Brazil.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kamber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Data Mining Concepts and Techniques</article-title>
          .
          <source>2nd Edition</source>
          , Elsevier, UK.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conconi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietze</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaptein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nixon</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nufer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmisano</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vignaroli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yankova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>NoTube - Making TV a Medium for Personalized Interaction</article-title>
          ,
          <source>EuroITV</source>
          <year>2009</year>
          , Leuven, Belgium.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietze</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Benn</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Semantic TV resources brokering towards future television</article-title>
          .
          <source>In 1st NoTube workshop on Future Television, in EuroITV</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <collab>World Wide Web Consortium</collab>
          .
          <year>2009</year>
          .
          <article-title>W3C Semantic Web Activity</article-title>
          . (http://www.w3.org/2001/sw/)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Lino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Araújo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Siebra</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Aspectos Semânticos e Convergência Digital (Web e TV Digital)</article-title>
          .
          <source>Proceedings of 2a. Conferência Web W3C Brasil (W3C Web.br</source>
          <year>2010</year>
          ), Belo Horizonte, Brasil.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Souza Filho</surname>
            ,
            <given-names>G. L. d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leite</surname>
            ,
            <given-names>L. E. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>C. E. C. F.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Ginga-J: The Procedural Middleware for the Brazilian Digital TV System</article-title>
          .
          <source>Journal of the Brazilian Computer Society</source>
          , Vol.
          <volume>13</volume>
          , No.
          <issue>4</issue>
          , p.
          <fpage>47</fpage>
          -
          <lpage>56</lpage>
          . ISSN: 0104-6500. Porto Alegre, RS.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mangueira</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medeiros</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lemos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>JCollab: Uma Ferramenta para Produção e Distribuição de Telejornais no Contexto da Web 2.0</article-title>
          . In
          <source>XXXVI Conferência Latino-Americana de Informatica - CLEI</source>
          . Asunción, Paraguay.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>