Overview of the PENG project

Gabriella Pasi and Gloria Bordogna

Abstract— In this paper a synthetic overview of the main aims, characteristics and innovations of the PENG project is presented. PENG (“PErsonalised News content programminG”) is a Specific Targeted Research Project (IST-004597) funded within the Sixth Framework Programme of the European Research Area.

Index Terms— Information Filtering, Distributed Information Retrieval, Personalized User Profiles

This work was supported by the E.C. through the Specific Targeted Research Project named PENG (Personalized NEws content programminG). The project (IST-004597) was funded within the Sixth Framework Programme, Priority 2, Information Society Technology, Thematic Priority: Cross Media Content for Leisure and Entertainment.
Gabriella Pasi is with the Università degli Studi di Milano Bicocca (DISCO), Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy (corresponding author; phone: +390264487847; e-mail: pasi@disco.unimib.it).
Gloria Bordogna is with the National Council of Research (IDPA-CNR), Via Pasubio 5, 24044 Dalmine, Italy (e-mail: gloria.bordogna@idpa.cnr.it).

I. INTRODUCTION

THE main objective of the PENG project was to define an innovative technological solution for personalised multimedia news access, composition and presentation, with an emphasis on personalised filtering, retrieval and composition of multimedia news.

Information (text, images, and videos) is gathered from different sources (including the Web) using a combination of push and pull technologies and is presented to the user in a personalised way. This is performed by pushing personalised news towards the user, and by allowing her or him to expand a selected topic by searching for additional information or by editing the final news through a personalised presentation approach. This involves the integration of personalised filtering and distributed information retrieval, producing information that constitutes a first but very important aid to the writing activity of a journalist (the initial target user for PENG). The target users of PENG are classified according to a bi-dimensional schema defined in terms of their level of interest in the news and their topical interests. Possible target users include information-intensive workers, students of communication faculties, journalists with specific topical interests, etc.

An important characteristic ensured by the system is flexibility in modelling the user's topical interests and context. This means modelling the capability to tolerate the vagueness and uncertainty of the user-system interaction, and to adapt by learning users' changing preferences over time. In particular, the PENG prototype system supports both the filtering of news based on a new multi-criteria decision approach, and the cataloguing of news into overlapping topics. These characteristics are not met by the systems currently used by journalists to carry out their habitual tasks, and replacing those systems with the PENG system will greatly innovate the current way of making news.

The PENG project was mainly addressed to news professionals, such as journalists and editors, with a view to extending the use of the defined system to more general users in the future. In this context, the term news refers to any kind of news, including information regarding leisure and entertainment.

The PENG prototype is conceived as a personal assistant, supporting journalists in all stages of the news lifecycle. Indeed, the proposed system aims to collect news from both news-feeds and specialised archives in a personalised way, so as to provide media professionals with a fully customisable environment.

II. PENG SYSTEM’S ARCHITECTURE

The main functionalities of the PENG system are separated into three main phases: a push phase, a pull phase and a presentation phase.

In the push phase a filtering system was developed, by means of which a first selection of news is made from newswires and other news archives. This filtering is based on a dynamic user profile that includes the user's personal trust in the information sources.

In the pull phase a user query and the user profile are used to retrieve further and more specific information, both from the same sources used in the push phase and from additional sources automatically selected in relation to the content of the query and the user profile. A distributed information retrieval approach is used, where the query can be automatically generated from user feedback on the information presented by the push phase.

The presentation phase uses multi-document and multimedia visualisation to present the results of the push and pull phases to the user. This takes into account the trust a user places in the information sources. The results visualised by the system can be personalised not only to the user's information need but also to visualisation preferences and to the subjective interpretation of the user's trust in sources of information.

Figure 1, below, presents a high level view of the PENG architecture, with the modules corresponding to the three main phases highlighted in gray.

[Figure 1 diagram labels: PENG system; Information Presentation; Database and communication layer; User profile database; Information Filtering; Information Retrieval; 3rd party information sources]

Figure 1: a sketch of the PENG system architecture

The user accesses the system locally through an interface provided by the presentation module. Each of the three main modules communicates via an intermediary layer that also manages access to the common databases required by the system, the most important of which is the user profile database (the other databases are not shown for clarity). This database and communication layer is composed of a user profile manager (which manages the user profile database) and a common database manager (which manages the other common databases and coordinates the communication between the modules).

In the PENG system, a user profile contains the data relating to a single user: personal information (such as name, email, etc.), information preferences (what information is relevant to the user, and from where), presentation preferences (how this information is to be displayed) and interaction history (the history of the user's interaction with the PENG system). Since a user may be interested in numerous different subjects, the information and presentation preferences are split into a set of different user interests. Each interest is personal to the user to which it belongs, and plays an important part both in the filtering module (which is intended not only to filter documents to users, but also to adjust the user interest) and in the information retrieval module (by providing a context in which a search can take place).

Importantly, the profile stores the degree to which a user trusts different information sources ('trust scores'), information hypothesised as being important in news gathering and filtering. Trust scores are conceived as indications of the potential reliability of the information sources to a specific user (or category of users) with respect to a given topical area.

PENG has the potential to greatly contribute to the continuing development of filtering and retrieval systems, for the benefit of journalists and ultimately of all users of news services. Professionals, such as journalists or editors, can tune the contribution of the distinct sources to their information gathering, filtering and editing tasks. This is achieved by specifying queries expressing constraints on the multimedia and time-dependent content of the news so as to focus on a particular event. This enables the tuning of a personalised gathering and presenting of news that expresses an individual's view and opinion on an event, a condition for journalism that has become predominant in recent years and a very important condition for a personalised presentation of news to the general user. In fact, while this can greatly reduce the time needed for a journalist to consult the distinct sources and to report on a given topic of interest, it also enables the presentation of news tailored to a specific user's interests.

The automatic classification of the news into thematic clusters represented by sets of keywords can be coupled with the subsequent use of PENG to yield personalised presentations of up-to-date topics. This can help in drafting a personalised multimedia newspaper and can thus be a powerful tool for the editorial staff of a journal.
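As a rough illustration of the user profile and the trust-aware push filtering described in Section II, the following Python sketch models a profile split into interests, each carrying its own trust scores per source. The class and function names, the keyword-overlap match, the weighted-mean aggregation and the threshold are all assumptions made for illustration only; the paper does not specify the PENG data model or its multi-criteria decision approach at this level of detail.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Interest:
    """One topical interest of a user (hypothetical structure)."""
    name: str
    keywords: Set[str]
    # Trust score per source for this topic, assumed to lie in [0, 1].
    trust: Dict[str, float] = field(default_factory=dict)
    # Presentation preferences for this interest (free-form, assumed).
    presentation: Dict[str, str] = field(default_factory=dict)


@dataclass
class UserProfile:
    """Personal data, a set of interests and the interaction history."""
    personal_info: Dict[str, str]
    interests: List[Interest] = field(default_factory=list)
    interaction_history: List[str] = field(default_factory=list)


@dataclass
class NewsItem:
    source: str
    text: str


def topical_match(item: NewsItem, interest: Interest) -> float:
    """Fraction of the interest's keywords that occur in the item text."""
    if not interest.keywords:
        return 0.0
    words = set(item.text.lower().split())
    return len(interest.keywords & words) / len(interest.keywords)


def score(item: NewsItem, interest: Interest,
          default_trust: float = 0.5, trust_weight: float = 0.5) -> float:
    """Assumed aggregation: weighted mean of topical match and source trust."""
    trust = interest.trust.get(item.source, default_trust)
    return (1 - trust_weight) * topical_match(item, interest) + trust_weight * trust


def push_filter(items: List[NewsItem], profile: UserProfile,
                threshold: float = 0.6) -> List[Tuple[NewsItem, str, float]]:
    """Return (item, interest name, score) for each pair at or above threshold."""
    selected = []
    for item in items:
        for interest in profile.interests:
            s = score(item, interest)
            if s >= threshold:
                selected.append((item, interest.name, s))
    return selected


# Usage with made-up sources and news items.
elections = Interest("elections", {"election", "vote"},
                     trust={"wire_a": 0.9, "blog_b": 0.2})
profile = UserProfile({"name": "A. Journalist"}, [elections])
items = [NewsItem("wire_a", "Election results as the vote count ends"),
         NewsItem("blog_b", "Election rumours spread")]
for item, topic, s in push_filter(items, profile):
    print(topic, item.source, round(s, 2))
```

Because every item is scored against every interest, a single item can be selected under several interests at once, which loosely mirrors the idea of overlapping topics. Keeping trust inside each interest reflects the statement that trust scores are relative to a given topical area.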