Mining Knowledge TV: A Proposal for Data Integration in the Knowledge TV Environment

José Carlos Almeida Patrício Junior
Universidade Federal da Paraíba
João Pessoa, PB, Brazil
jcapjunior@gmail.com

Natasha Correia Queiroz Lino
Universidade Federal da Paraíba
João Pessoa, PB, Brazil
natasha@di.ufpb.br

ABSTRACT
This paper presents Mining Knowledge TV, a data mining module that is part of the Knowledge TV (KTV) Project. KTV proposes the specification of a semantic layer embedded in a Digital TV (DTV) environment, improving the way that content is accessed by other applications.

Categories and Subject Descriptors
H.3.4 [Systems and Software]: Information networks

General Terms
Algorithms, Design, Languages, Standardization.

Keywords
Data Mining, Digital TV, Digital TV personalisation

1. INTRODUCTION
Interactive Digital TV [1,2,8] is a new stage of TV technology, which intends to support the convergence of digital technologies through a systematic change from analog to digital equipment and infrastructure. This change affects the whole production chain, mainly the consumption of final content.

In this scenario, this paper presents the specification of Mining Knowledge TV (MKTV), which focuses on the integration of data mining [3] technology with semantic aspects, most of them derived from AI Knowledge Representation and Semantic Web [4,5,6] research. MKTV is being developed in the context of the Brazilian System of Digital TV (SBTVD) and is part of the project goal of giving interactive DTV (TVDI) a semantic layer. Among other aspects, it aims to provide a rich knowledge base of data descriptions, resources, services, applications and the relations among such elements.

2. MINING KNOWLEDGE TV - MKTV
The main aim of MKTV is the implementation of a KDD environment that focuses on data mining and semantic information on the Knowledge TV platform [7]. This solution will provide previously unknown data to DTV applications that use the SBTVD Ginga middleware [8], so that they can use it to address issues such as information overload, personalization, directed merchandizing and so on.

The mining process will be carried out on data from many sources, mainly those that come from the Service Information (SI) metadata table, which uses the MPEG2 standard in the Ginga DTV environment. This standard is used to represent information about TV programs, services and multimedia interaction. Examples of such information are channels, program schedule and program classification. User behaviour is also an important information source because it indicates, for example, the channels usually watched, with start time and total watching period. The useful content obtained by means of data mining will be semantically enriched through the use of ontologies and then provided as a service to developers of NCL or Java applications. This is possible because Ginga supports the development of applications in both languages on its architecture. More information about the Ginga architecture can be found in [8].

The data mining process acts on all these sources and generates new information that is semantically enriched by means of a domain ontology. This semantic process enables a better analysis and makes the meaning of the knowledge discovered by data mining more explicit. These semantics are provided as a service, which NCL or Java developers can use to implement more powerful and sophisticated applications.

3. ARCHITECTURE DESCRIPTION
3.1 Investigation of solution for data mining
Brazilian DTV is an environment of technological convergence that is new and extremely susceptible to change. It is not yet completely standardized and it is constantly being updated. These aspects impose restrictions that we must consider during the architectural modelling. The restrictions we identified are:

• The small processing capacity of the set-top box (STB);
• Reduced and unstable space for persistence of information;
• The mechanism for exclusion of applications when changing channels, i.e., a change of channel deletes all application information related to that channel.

All these limitations in the architecture of the STB lead us to use a hybrid approach detached from the middleware. That means that the components of the KTV (and consequently the MKTV) with the highest consumption of resources (such as processing power and memory) will make use of the Ginga middleware return channel [8]. The return channel is the implementation of the HTTP protocol on the DTV environment. Consequently, some components will run on the Web and communicate via the Internet with the DTV components.

3.2 Architecture
The Mining Knowledge TV (MKTV) is the component of the Knowledge TV architecture that accounts for the discovery and treatment of useful knowledge from the DTV data, user behaviour and other sources such as the Web. These data are initially stored in a local relational database, from which the process of extraction, transformation and load (ETL) of information will gradually start. After the ETL process, the data proceed to the next module, the Data Warehouse (DW) [3], a technique that is commonly used in conjunction with Data Mining [3]. The DW will be organized in departmental Data Marts, in accordance with the domains and tasks to be mined (e.g. personalization, marketing, business), concentrating on historical and integrated data.

The historical data will be organised in the DW. Next, the Data Mining module applies data mining algorithms, searching for and discovering useful patterns and previously unknown information in the existing DW. The knowledge extracted through the MKTV will be encapsulated in semantic files with more expressive power (OWL files). Ontology specifications in OWL will be the standard for communication between the modules of the KTV. Figure 1 illustrates the KTV conceptual architecture and the MKTV module.

Figure 1. KTV Architecture and MKTV module

One application scenario is the problem of recommendation and personalization of content. To deal with such problems, specific modules, specified in the conceptual architecture, will be instantiated and executed. First, the system stores the data that comes from the STB in a database. Then, the information related to the programs watched by the user is extracted to the Personalization Data Mart in the DW. After this process, clustering algorithms are used to find groups of users with similar preferences. The knowledge discovered feeds and enriches the ontology specified in the semantic modelling layer, and the discovered pattern is returned to the user in the form of a recommendation, for example the next available programs similar to the ones the user usually watches. Depending on the data mining goal, other tasks and algorithms can be applied to discover the desired knowledge.

4. CONCLUSIONS AND DIRECTIONS
This paper describes our initial work on the Mining Knowledge TV (MKTV), which is part of the KTV project. The major aim of this project is to provide semantic knowledge to be used by other DTV applications. At the moment, the MKTV is in a development stage, and we have carried out a survey of the state of the art in data mining for DTV. In addition, we have identified the main data mining methods and algorithms that are currently used in DTV, together with a list of tools that are compatible with this new computational and interactive platform.

The novelty of this proposal is supported by the small number of DTV works that focus on the joint use of knowledge representation and data mining techniques to generate a better quality set of data.

The next MKTV activities intend to simulate DTV data traffic and integrate content from the data mining process and the semantic modelling sub-layer. As future work, during its validation stage, MKTV will collaborate with the JCollab Project [9], whose aim is to develop a platform to create journalistic content via a social network. Another potential future work is the investigation of the integration of the MKTV solution with other Digital TV systems.

5. REFERENCES
[1] Lekakos, G., Chorianopoulos, K., and Doukidis, G. 2007. Interactive Digital Television: Technologies and Applications. IGI Publishing, USA.
[2] Lemos, G., Fernandes, J., and Elias, G. 2004. Introdução à Televisão Digital Interativa: Arquitetura, Protocolos, Padrões e Práticas. In: JAI Jornada de Atualização em Informática. Salvador, Bahia, Brazil.
[3] Han, J., and Kamber, M. 2006. Data Mining: Concepts and Techniques. 2nd edition, Elsevier, UK.
[4] Aroyo, L., Conconi, A., Dietze, S., Kaptein, A., Nixon, L., Nufer, C., Palmisano, D., Vignaroli, L., and Yankova, M. 2009. NoTube - Making TV a Medium for Personalized Interaction. EuroITV 2009, Leuven, Belgium.
[5] Yu, H., Dietze, S., and Benn, N. 2010. Semantic TV resources brokering towards future television. In 1st NoTube workshop on Future Television, in EuroITV 2010.
[6] World Wide Web Consortium. 2009. W3C Semantic Web Activity. (http://www.w3.org/2001/sw/)
[7] Lino, N., Araújo, J., Lemos, G., and Siebra, C. 2010. Aspectos Semânticos e Convergência Digital (Web e TV Digital). Proceedings of 2a. Conferência Web W3C Brasil (W3C Web.br 2010), Belo Horizonte, Brazil.
[8] Souza Filho, G. L. d., Leite, L. E. C., and Batista, C. E. C. F. 2007. Ginga-J: The Procedural Middleware for the Brazilian Digital TV System. In: Journal of the Brazilian Computer Society. No. 4, Vol. 13, p. 47-56. ISSN: 0104-6500. Porto Alegre, RS.
[9] Mangueira, J., Oliveira, F., Alves, K., Medeiros, A., and Lemos, G. 2010. JCollab: Uma Ferramenta para Produção e Distribuição de Telejornais no Contexto da Web 2.0. In XXXVI Conferência Latino-Americana de Informática (CLEI). Asunción, Paraguay.