Semantic Content Processing in Web Portals

Felicitas Löffler∗, Bahar Sateli†, Birgitta König-Ries∗, René Witte†
∗ Institute for Computer Science, Friedrich-Schiller University of Jena, Germany
† Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada

Abstract—Web portals provide a standardized way of integrating multiple information sources and applications in a single web interface. However, they currently do not provide semantic support for users that need to navigate the often overwhelming amount of content. We demonstrate our open source portal architecture “hanüwa” that integrates text mining web services, based on the Semantic Assistants framework, with the Liferay portal server.

I. INTRODUCTION

Web portals are a specific kind of web-based system that integrates diverse information sources and applications. Deployed for a concrete scenario in an organization, they typically address the information needs of a wide range of users and their tasks through both internal and external services.

While a web portal provides convenient access to information, there is no standardized way to further process the available content in order to support users in their tasks. There is also a lack of appropriate technologies for document filtering within a web portal. We envision a new generation of web portals that can provide context-sensitive support through semantic analysis services, in particular based on natural language processing (NLP). These services are deployed on shared or private servers and can be dynamically requested by users that ask for help in a specific task: e.g., finding entities in a document, summarizing a text, answering a question, or linking content to external sources. As such, they perform the role of AI “assistants” that support their users. Furthermore, we imagine enhancing web portals with a personalization component to adapt the content to the user’s needs. Sorting documents or highlighting terms according to a specific user’s interests would be a great advantage for the user and a step towards countering information overload.

In previous work, Bakalov et al. [1] demonstrated the feasibility and usability of a portal integration with natural language processing services. However, this implementation was tied to a specific, commercial portal engine (IBM WebSphere¹). The work presented here is a complete re-design and re-implementation of the NLP-portal integration, taking into account future extensions and based exclusively on open source software. Similar to the solution presented in [1], we rely on the Semantic Assistants framework [2] for brokering text mining pipelines as web services, but our new architecture is based on the Liferay² open source portal server.

Our new portlets can be deployed in any existing Liferay-based portal to offer natural language processing services to its users. Here, we demonstrate the core functionality with named entity recognition in a given article, but the framework is not limited to a single domain: A clear separation of concerns allows a language engineer to make new NLP services available without requiring knowledge of portal technology, and a web engineer can easily design a new web portal that incorporates language technology.

II. ARCHITECTURE

Our novel Semantic Assistants-portal integration architecture, illustrated in Fig. 1, is designed to allow various portlets to benefit from NLP techniques on their content. The core idea is to enable generic portlets to communicate with the Semantic Assistants portlet, which is specifically designed to connect to the back-end Semantic Assistants server and to let portal users query and invoke NLP pipelines.

[Figure 1: architecture diagram. Labeled components: Portal User; (Embedded) Browser; Web Server; Portlet Controller; Database; Content Portlet; Other Portlet; Semantic Assistants Portlet; Client-Side Abstraction Layer; Semantic Assistants Server; NLP Service Connector; Web Server; Language Service Descriptions; Service Information; Service Invocation; Semantic Assistants.]
Fig. 1. The Semantic Assistants-Portal Integration architecture

In this architecture, all available portlets in a page can communicate with the Semantic Assistants portlet by sending content for analysis and receiving the results. To commence an analysis session, users interact with the portal via their web browser, for example, on their desktop computer or from a mobile device. Through this integration, users can select an NLP service to execute on a portlet’s content from a dynamically generated list of available assistants in the Semantic Assistants server repository. Where applicable, users can also customize the services’ behaviour by setting runtime parameters. An execution request is then sent to the Semantic Assistants server from the Semantic Assistants portlet in the form of a W3C³ standard web service call that triggers the execution of the designated NLP pipeline on the provided content. The results of each successful service execution are first received by the Semantic Assistants portlet and then passed on to the portlet that requested the service execution. The NLP pipelines are described in the OWL⁴ language and the Semantic Assistants server uses SPARQL⁵ for a dynamic discovery of available services upon each user request. Hence, adding NLP services to the integration, or removing them from it, requires no modification to the code base of the portal.

The basis of the personalization component will be an ontology-based user profile, where all user interests are recorded automatically while the user browses through the portal and reads documents. A user interface, embedded into a portlet, allows a user to control interests, add new terms, and delete or change concepts. The user can also enable or disable the personalization mode. When personalization is desired, the documents are re-sorted and the relevant terms of the user profile are highlighted within the text. In contrast to [1], the personalization feature will be available to various portlets in the form of services, rather than as a concrete implementation on a per-portlet basis.

[Figure 2: screenshot. Labeled regions: Semantic Assistants Portlet; Content Portlet; NLP Service Results.]
Fig. 2. Semantic Assistants-Portal Integration User Interface in Liferay

III. APPLICATION

The integration of NLP assistants within a portal context allows for a multitude of applications. Fig. 2 shows an example scenario in which a portal user needs assistance in analyzing the textual content available in the content portlet (left). Such assistance can be offered to the user through the NLP services listed in the Semantic Assistants portlet (right). This portlet allows the user to connect to different Semantic Assistants servers and review the list of their available pipelines in order to find a suitable assistant for the task at hand. In our example, the list of assistants contains a “Person and Location Extractor” service that extracts entities of person and location types from a given text. The user then sends the text in the content portlet to the Semantic Assistants portlet for analysis and requests the service execution by clicking on the “Run Assistant” button. This interaction asks the designated Semantic Assistants server to execute the ANNIE pipeline, provided by GATE.⁶ Subsequently, the results are returned to the content portlet in the form of annotations in a tabular format and highlighted in the text based on their offsets. The processing time for different scenarios depends on both the length of the input text and the actual NLP pipeline. Naturally, sophisticated NLP pipelines with deep syntactic or semantic analysis require more time to process. Currently, we are working on a personalization scenario aimed at tackling the user’s information overload by filtering the portal’s content according to the user’s interests. The idea is to embed such a capability directly within portlets, allowing users to switch between various personalization modes.

IV. CONCLUSIONS

In this paper, we described our open source integration of natural language processing capabilities within a portal environment. We also intend to integrate a personalization feature into portals to adapt their content according to a user’s needs. Furthermore, we want to provide a user interface that gives users control over their recorded interests. The NLP-portal integration will be available as part of the Semantic Assistants distribution hosted on SourceForge.⁷

REFERENCES

[1] F. Bakalov, B. Sateli, R. Witte, M.-J. Meurs, and B. König-Ries, “Natural Language Processing for Semantic Assistance in Web Portals,” in IEEE International Conference on Semantic Computing (ICSC 2012). Palermo, Italy: IEEE, September 2012.
[2] R. Witte and T. Gitzinger, “Semantic Assistants – User-Centric Natural Language Processing Services for Desktop Clients,” in 3rd Asian Semantic Web Conference (ASWC 2008), ser. LNCS, vol. 5367. Bangkok, Thailand: Springer, Feb. 2–5, 2009 2008, pp. 360–374. [Online]. Available: http://rene-witte.net/semantic-assistants-aswc08

¹ IBM WebSphere, http://www.ibm.com/software/websphere
² Liferay, http://www.liferay.com/
³ World Wide Web Consortium (W3C), http://www.w3.org
⁴ Web Ontology Language, http://www.w3.org/2004/OWL/
⁵ SPARQL Query Language, http://www.w3.org/TR/rdf-sparql-query/
⁶ General Architecture for Text Engineering (GATE), http://gate.ac.uk/
⁷ Semantic Assistants, http://sourceforge.net/projects/semantic-assist/
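
Illustrative sketch: The application section states that results are returned as annotations and highlighted in the text based on their offsets. The following minimal Python sketch shows one way such offset-based highlighting could work. It is not the authors' implementation: the `Annotation` record, its field names, the `highlight` function, and the choice of HTML `<mark>` elements are all assumptions made for illustration and are not taken from the Semantic Assistants or Liferay APIs.

```python
from dataclasses import dataclass
from html import escape
from typing import Iterable


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record: entity type plus character offsets."""
    type: str
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text


def highlight(text: str, annotations: Iterable[Annotation]) -> str:
    """Wrap each annotated span in a <mark> element and escape all text.

    Annotations that overlap an earlier span, are empty, or extend past
    the end of the text are skipped (an assumption of this sketch).
    """
    parts = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: (a.start, a.end)):
        if ann.start < cursor or ann.end > len(text) or ann.start >= ann.end:
            continue
        # Emit the plain text between the previous span and this one.
        parts.append(escape(text[cursor:ann.start]))
        span = escape(text[ann.start:ann.end])
        parts.append(f'<mark title="{escape(ann.type)}">{span}</mark>')
        cursor = ann.end
    # Emit whatever follows the last highlighted span.
    parts.append(escape(text[cursor:]))
    return "".join(parts)


if __name__ == "__main__":
    sample = "Alice Smith lives in Jena."
    found = [Annotation("Person", 0, 11), Annotation("Location", 21, 25)]
    print(highlight(sample, found))
```

Sorting the annotations by offset and building the output in a single left-to-right pass keeps the original offsets valid throughout, which would not hold if markup were inserted into the string in place.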