Semantic Content Processing in Web Portals

Felicitas Löffler∗, Bahar Sateli†, Birgitta König-Ries∗, René Witte†
∗ Institute for Computer Science, Friedrich-Schiller University of Jena, Germany
† Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada


Abstract—Web portals provide a standardized way of integrating multiple information sources and applications in a single web interface. However, they currently do not provide semantic support for users who need to navigate the often overwhelming amount of content. We demonstrate our open source portal architecture “hanüwa” that integrates text mining web services, based on the Semantic Assistants framework, with the Liferay portal server.

I. INTRODUCTION

Web portals are a specific kind of web-based system that integrates diverse information sources and applications. Deployed for a concrete scenario in an organization, they typically address the information needs of a wide range of users and their tasks through both internal and external services.

While a web portal provides convenient access to information, there is no standardized way to further process the available content in order to support users in their tasks. There is also a lack of appropriate technologies for document filtering within a web portal. We envision a new generation of web portals that can provide context-sensitive support through semantic analysis services, in particular based on natural language processing (NLP). These services are deployed on shared or private servers and can be dynamically requested by users who ask for help with a specific task, e.g., finding entities in a document, summarizing a text, answering a question, or linking content to external sources. As such, they perform the role of AI “assistants” that support their users. Furthermore, we imagine enhancing web portals with a personalization component to adapt the content to the user’s needs. Sorting documents or highlighting terms according to a specific user’s interests would be a great advantage for the user and a step towards counteracting information overload.

In previous work, Bakalov et al. [1] demonstrated the feasibility and usability of a portal integration with natural language processing services. However, this implementation was tied to a specific, commercial portal engine (IBM WebSphere¹). The work presented here is a complete re-design and re-implementation of the NLP-portal integration, taking into account future extensions and based exclusively on open source software. Similar to the solution presented in [1], we rely on the Semantic Assistants framework [2] for brokering text mining pipelines as web services, but our new architecture is based on the Liferay² open source portal server.

Our new portlets can be deployed in any existing Liferay-based portal to offer natural language processing services to its users. Here, we demonstrate the core functionality with named entity recognition in a given article, but the framework is not limited to a single domain: A clear separation of concerns allows a language engineer to make new NLP services available without requiring knowledge of portal technology, and a web engineer can easily design a new web portal that incorporates language technology.

¹ IBM WebSphere, http://www.ibm.com/software/websphere
² Liferay, http://www.liferay.com/

II. ARCHITECTURE

Our novel Semantic Assistants-portal integration architecture, illustrated in Fig. 1, is designed to allow various portlets to benefit from NLP techniques on their content. The core idea is to enable generic portlets to communicate with the Semantic Assistants portlet, which is specifically designed to connect to the back-end Semantic Assistants server and to provide portal users with the ability to query and invoke NLP pipelines.

[Figure labels: User; (Embedded) Browser; Web Server; Portal; Database; Portlet Controller; Content Portlet; Other Portlet; Semantic Assistants Portlet; Client-Side Abstraction Layer; Semantic Assistants Server; NLP Service Connector; Service Invocation; Service Information; Language Service Descriptions; Semantic Assistants]
Fig. 1. The Semantic Assistants-Portal Integration architecture

³ World Wide Web Consortium (W3C), http://www.w3.org

In this architecture, all available portlets in a page can communicate with the Semantic Assistants portlet by sending content for analysis and receiving the results. To commence an analysis session, users interact with the portal via their web browser, for example, on their desktop computer or from a mobile device. Through this integration, users can select an NLP service to execute on a portlet’s content from a dynamically generated list of available assistants in the Semantic Assistants server repository. Where applicable, users can also customize the services’ behaviour by setting runtime parameters. An execution request is then sent from the Semantic Assistants portlet to the Semantic Assistants server in the form of a W3C³ standard web service call that triggers the execution of the designated NLP pipeline on the provided content. The results of each
successful service execution are first received by the Semantic Assistants portlet and then passed on to the portlet that requested the service execution. The NLP pipelines are described in the OWL⁴ language and the Semantic Assistants server uses SPARQL⁵ for a dynamic discovery of available services upon each user request. Hence, adding or removing NLP services to the integration requires no modification to the code base of the portal.

The basis of the personalization component will be an ontology-based user profile, where all user interests are recorded automatically while the user browses the portal and reads documents. A user interface, embedded into a portlet, allows a user to control interests, add new terms, and delete or change concepts. The user can also enable or disable the personalization mode. When personalization is desired, the documents are re-sorted and the relevant terms of the user profile are highlighted within the text. In contrast to [1], the personalization feature will be available to various portlets in the form of services, rather than as a concrete implementation on a per-portlet basis.

⁴ Web Ontology Language, http://www.w3.org/2004/OWL/
⁵ SPARQL Query Language, http://www.w3.org/TR/rdf-sparql-query/

III. APPLICATION

The integration of NLP assistants within a portal context allows for a multitude of applications. Fig. 2 shows an example scenario in which a portal user needs assistance in analyzing the textual content available in the content portlet (left). Such assistance can be offered to the user through the NLP services listed in the Semantic Assistants portlet (right). This portlet allows the user to connect to different Semantic Assistants servers and review the list of their available pipelines in order to find a suitable assistant for the task at hand. In our example, the list of assistants contains a “Person and Location Extractor” service that extracts entities of person and location types from a given text. The user then sends the text in the content portlet to the Semantic Assistants portlet for analysis and requests the service execution by clicking on the “Run Assistant” button. This interaction asks the designated Semantic Assistants server to execute the ANNIE pipeline, provided by GATE.⁶ Subsequently, the results are returned to the content portlet in the form of annotations, which are listed in a tabular format and highlighted in the text based on their offsets. The processing time for different scenarios depends on both the length of the input text and the actual NLP pipeline. Naturally, sophisticated NLP pipelines with deep syntactic or semantic analysis require more time to process.

[Figure labels: Content Portlet; Semantic Assistants Portlet; NLP Service Results]
Fig. 2. Semantic Assistants-Portal Integration User Interface in Liferay

Currently, we are working on a personalization scenario aimed at tackling the user’s information overload by filtering the portal’s content according to a user’s interests. The idea is to embed this capability directly within portlets, allowing users to switch between various personalization modes.

⁶ General Architecture for Text Engineering (GATE), http://gate.ac.uk/

IV. CONCLUSIONS

In this paper, we described our open source integration of natural language processing capabilities within a portal environment. We also intend to integrate a personalization feature into portals to adapt their content according to a user’s needs. Furthermore, we want to provide a user interface that gives users control over their recorded interests. The NLP-portal integration will be available as part of the Semantic Assistants distribution hosted on SourceForge.⁷

⁷ Semantic Assistants, http://sourceforge.net/projects/semantic-assist/

REFERENCES

[1] F. Bakalov, B. Sateli, R. Witte, M.-J. Meurs, and B. König-Ries, “Natural Language Processing for Semantic Assistance in Web Portals,” in IEEE International Conference on Semantic Computing (ICSC 2012). Palermo, Italy: IEEE, September 2012.
[2] R. Witte and T. Gitzinger, “Semantic Assistants – User-Centric Natural Language Processing Services for Desktop Clients,” in 3rd Asian Semantic Web Conference (ASWC 2008), ser. LNCS, vol. 5367. Bangkok, Thailand, Feb. 2–5, 2009: Springer, 2008, pp. 360–374. [Online]. Available: http://rene-witte.net/semantic-assistants-aswc08
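
As a supplementary illustration of the result presentation described in Section III, where annotations are listed in a table and highlighted in the text based on their offsets, the following is a minimal Python sketch. It is not the Semantic Assistants or Liferay API: the `Annotation` record, the `highlight` and `as_table` functions, the use of an HTML `<mark>` element, and the policy of skipping overlapping spans are all assumptions made for this example.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Annotation:
    """Hypothetical result record: an entity type plus character offsets."""
    type: str
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text


def highlight(text: str, annotations: list) -> str:
    """Wrap each annotated span in a <mark> element and escape all text.

    Assumption: spans that overlap an earlier span, are empty, or fall
    outside the text are skipped rather than reported as errors.
    """
    parts = []
    cursor = 0  # position in `text` up to which output has been produced
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < cursor or ann.end > len(text) or ann.start >= ann.end:
            continue
        parts.append(escape(text[cursor:ann.start]))
        covered = escape(text[ann.start:ann.end])
        parts.append(f'<mark class="{escape(ann.type)}">{covered}</mark>')
        cursor = ann.end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


def as_table(text: str, annotations: list) -> list:
    """Return rows of (type, start, end, covered text), ordered by start offset."""
    ordered = sorted(annotations, key=lambda a: a.start)
    return [(a.type, a.start, a.end, text[a.start:a.end]) for a in ordered]


if __name__ == "__main__":
    sample = "René Witte works in Montréal."
    found = [Annotation("Location", 20, 28), Annotation("Person", 0, 10)]
    print(highlight(sample, found))
    for row in as_table(sample, found):
        print(row)
```

Sorting by start offset and tracking a cursor keeps the original offsets valid while markup is inserted, which is why the sketch builds a new string instead of editing the text in place.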