=Paper= {{Paper |id=Vol-233/paper-30 |storemode=property |title=Overview of the PENG project |pdfUrl=https://ceur-ws.org/Vol-233/p61.pdf |volume=Vol-233 |dblpUrl=https://dblp.org/rec/conf/samt/PasiB06 }} ==Overview of the PENG project== https://ceur-ws.org/Vol-233/p61.pdf
                                Overview of the PENG project

                                                     Gabriella Pasi and Gloria Bordogna


   This work was supported by the E.C. through the Specific Targeted Research Project named PENG (PErsonalised News content programminG). This project (IST-004597) was funded within the Sixth Framework Programme, Priority 2, Information Society Technology, Thematic Priority: Cross Media Content for Leisure and Entertainment.
   Gabriella Pasi is with the Università degli Studi di Milano Bicocca (DISCO), Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy (corresponding author; phone: +390264487847; e-mail: pasi@disco.unimib.it).
   Gloria Bordogna is with the National Council of Research (IDPA-CNR), Via Pasubio 5, 24044 Dalmine, Italy (e-mail: gloria.bordogna@idpa.cnr.it).

   Abstract— This paper presents a concise overview of the main aims, characteristics and innovations of the PENG project. PENG, “PErsonalised News content programminG”, is a Specific Targeted Research Project (IST-004597) funded within the Sixth Framework Programme of the European Research Area.

   Index Terms— Information Filtering, Distributed Information Retrieval, Personalized User Profiles

I. INTRODUCTION

   The main objective of the PENG project was to define an innovative technological solution for personalised multimedia news access, composition and presentation, with an emphasis on personalised filtering, retrieval and composition of multimedia news. Indeed, the proposed system aims to collect news from both news-feeds and specialised archives in a personalised way, so as to provide media professionals with a fully customisable environment.
   This is performed by pushing personalised news towards the user, and by allowing her or him to expand a selected topic by searching for additional information or editing the final news through a personalized presentation approach. This involves the integration of personalised filtering and distributed information retrieval, producing information that constitutes a first but very important aid to the writing activity of a journalist (the initial target user for PENG). The target users of PENG are classified according to a bi-dimensional schema defined in terms of their level of interest in the news and their topical interests. Possible user targets include information-intensive workers, students of communication faculties, journalists with specific topical interests, etc.
   An important characteristic ensured by the system is flexibility in modelling the user's topical interests and context. This means that the system is designed to tolerate the vagueness and uncertainty of the user-system interaction, and to adapt by learning users' changing preferences over time. In particular, the PENG prototype system supports both the filtering of news, based on a new multi-criteria decision approach, and the cataloguing of news into overlapping topics. These characteristics are not met by the current systems commonly used by journalists to carry out their habitual tasks, and replacing them with the PENG system would significantly innovate the current way of producing news.
   The PENG project was mainly aimed at news professionals, such as journalists and editors, with a view to extending the use of the defined system to more general users in the future. In this context, the term news refers to any kind of news, including information regarding leisure and entertainment.
   The PENG prototype is conceived as a personal assistant, supporting journalists in all stages of the news lifecycle. Information (text, images, and videos) is gathered from different sources (including the Web) using a combination of push and pull technologies and is presented to the user in a personalised way.

II. PENG SYSTEM’S ARCHITECTURE

   The main functionalities of the PENG system are separated into three main phases: a push phase, a pull phase and a presentation phase.
   In the push phase, a filtering system makes a first selection of news from newswires and other news archives. This filtering is based on a dynamic user profile, which includes the user's personal trust in the information sources.
   In the pull phase, a user query and the user profile are used to retrieve further and more specific information, both from the same sources used in the push phase and from additional sources automatically selected in relation to the content of the query and the user profile. A distributed information retrieval approach is used, where the query can be automatically generated from user feedback on the information presented by the push phase.
   The presentation phase uses multi-document and multimedia visualisation to present the results from the push and pull phases to the user. This takes into account the trust a user places in the information sources. The results visualised by the system can be personalised not only to the user's information need but also to visualisation preferences and to the subjective interpretation of the user's trust in sources of information.
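   As an illustration only, the push phase described above (a first selection of news driven by a user profile that includes trust in the sources) can be sketched as follows. The text does not specify PENG's multi-criteria decision approach, so the Jaccard keyword overlap, the weighted-mean aggregation, the threshold and every name used here (NewsItem, UserInterest, UserProfile, score, push_filter, trust_weight) are assumptions made for this sketch rather than the project's actual method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class NewsItem:
    source: str         # name of the originating source (hypothetical)
    topic: str          # topical area, used to look up the trust score
    keywords: Set[str]  # terms describing the item


@dataclass
class UserInterest:
    topic: str
    keywords: Set[str]


@dataclass
class UserProfile:
    interests: List[UserInterest]
    # trust[(source, topic)] in [0, 1]; unknown pairs use default_trust
    trust: Dict[Tuple[str, str], float] = field(default_factory=dict)
    default_trust: float = 0.5


def topical_match(item: NewsItem, interest: UserInterest) -> float:
    """Jaccard overlap of item and interest keywords (an assumed measure)."""
    union = item.keywords | interest.keywords
    if not union:
        return 0.0
    return len(item.keywords & interest.keywords) / len(union)


def score(item: NewsItem, profile: UserProfile, trust_weight: float = 0.3) -> float:
    """Weighted mean of the best topical match and the source trust (assumed)."""
    match = max((topical_match(item, i) for i in profile.interests), default=0.0)
    trust = profile.trust.get((item.source, item.topic), profile.default_trust)
    return (1.0 - trust_weight) * match + trust_weight * trust


def push_filter(items: List[NewsItem], profile: UserProfile,
                threshold: float = 0.5) -> List[NewsItem]:
    """Keep items scoring at or above the threshold, highest score first."""
    scored = [(score(item, profile), item) for item in items]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in kept]
```

   With this aggregation, two items that match an interest equally well are ranked by the trust the user places in their sources for that topical area, while an item that matches no interest is filtered out unless the threshold is lowered.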
   Figure 1, below, presents a high level view of the PENG architecture, with the modules corresponding to the three main phases highlighted in gray.

   [Figure 1: a sketch of the PENG system architecture. Labelled components: PENG system; Information Presentation; Information Filtering; Information Retrieval; Database and communication layer; User profile database; 3rd party Information sources.]

   The user accesses the system locally through an interface provided by the presentation module. The three main modules communicate via an intermediary layer that also manages access to the common databases required by the system, the most important of which is the user profile database (the other databases are not shown for clarity). This database and communication layer is composed of a user profile manager (which manages the user profile database) and a common database manager (which manages other common databases and coordinates the communication between the modules). In the PENG system, a user profile contains the data relating to a single user: personal information (such as name, email, etc.), information preferences (what information is relevant to the user, and from where), presentation preferences (how this information is to be displayed) and interaction history (the history of the user's interaction with the PENG system). Since a user may be interested in numerous different subjects, the information and presentation preferences are split into a set of different user interests. Each interest is personal to the user to whom it belongs, and plays an important part both in the filtering module (which is intended not only to filter documents for users, but also to adjust the user interests) and in the information retrieval module (where it provides a context in which a search can take place).
   Importantly, the profile stores the degree to which a user trusts different information sources ('trust scores'), information hypothesised as being important in news gathering and filtering. Trust scores are conceived as indications of the potential reliability of the information sources to a specific user (or category of users) with respect to a given topical area.
   PENG has the potential to greatly contribute to the continuing development of filtering and retrieval systems, for the benefit of journalists, and ultimately of all users of news services. Professionals, such as journalists or editors, can tune the contribution of the distinct sources to their information gathering, filtering and editing tasks. This is achieved by specifying queries expressing constraints on the multimedia and time-dependent content of the news so as to focus on a particular event. This enables the tuning of a personalised gathering and presentation of news that expresses an individual's view and opinion on an event, a condition for journalism that has become predominant in recent years and a very important condition for a personalised presentation of news to the general user. In fact, while this can greatly reduce the time needed for a journalist to consult the distinct sources and to report on a given topic of interest, it also enables the presentation of news tailored to a specific user's interests.
   The automatic classification of the news into thematic clusters represented by sets of keywords can be coupled with the subsequent use of PENG to yield personalised presentations of up-to-date topics. This can help in drafting a personalised multimedia newspaper and can thus be a powerful tool for the editorial staff of a journal.

REFERENCES

[1] M. Agosti, F. Crestani and G. Pasi, eds., “Lectures on Information Retrieval”, Springer-Verlag, 2001.
[2] G. Bordogna, G. Pasi, R. Yager, “Soft Approaches to Information Retrieval on the WEB”, Int. Journal of Approximate Reasoning, 34, 105-120, (2003).
[3] G. Bordogna, G. Pasi, “Personalised indexing and retrieval of heterogeneous structured documents”, Information Retrieval Journal, in press (2004).
[4] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, M. Sartin, “Combining content-based and collaborative filters in an online newspaper”, ACM SIGIR Workshop on Recommender Systems, Aug. 19, Berkeley, (1999).
[5] F. Crestani and G. Pasi, eds., “Soft Computing in Information Retrieval: Techniques and Applications”, Physica-Verlag (Springer-Verlag), Heidelberg, Germany, 2000.
[6] F. Kilander, “A brief comparison of News filtering Software”, http://www.glue.umd.edu/enee/medlab/filter/filter.html.
[7] D. Moraru, L. Besacier, P. Mulhem and G. Quénot, “CLIPS-IMAG at TREC-11: Experiments in Video Retrieval”, 11th Text Retrieval Conference, Gaithersburg, MD, USA, 19-22 November, 2002.
[8] P. Mulhem, J. Gensel and H. Martin, “Adaptive Video Summarization”, in Handbook on Video Databases, CRC Press, to appear, 2003.
[9] G. Pasi, “Modelling users’ preferences in systems for information access”, International Journal of Intelligent Systems, 18, 793-808, (2003).
[10] G. Amato and U. Straccia, “User Profile Modeling and Applications to Digital Libraries”, 3rd European Conference on Digital Libraries, ECDL99, Paris, France, September 22-24, LNCS 1696, (1999).