<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Web Information Retrieval System Architecture Based on Semantic MyPortal</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haibo Yu</string-name>
          <email>yu@al.is.kyushu-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tsunenori Mine</string-name>
          <email>mine@al.is.kyushu-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Makoto Amamiya</string-name>
          <email>amamiya@al.is.kyushu-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Intelligent Systems</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Graduate School</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>of Information Science and Electrical Engineering, Kyushu University, 6-1 Kasuga-koen</institution>
          ,
          <addr-line>Kasuga, Fukuoka 816-8580</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we mainly focus on a communication mechanism which enables efficient information publishing and sharing among semantic desktops. We propose MyPortal as a “one stop” for all the information relevant to the user and further propose the conceptual architecture of a P2P community Web information retrieval system based on MyPortal. This architecture enables not only precise location of MyPortal instances and their Web resources but also the automatic or semiautomatic integration of hybrid semantic information delivered through Web content and Web services, and it also ensures that the semantics will not be lost during any part of the lifecycle of the information retrieval process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The current Web is designed for human consumption and relies on keywords for
information indexing and searching. This not only produces an enormous number
of irrelevant search results but also makes Web content unsuitable for machine processing. In
addition, the user’s desktop information and the published Web information are
managed separately, which leads to redundant information and makes it
difficult to manage the relationships among items of information and to
apply user personalization.</p>
      <p>
        Several research projects, such as Haystack [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Gnowsis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], use semantic Web technology to manage the user’s personal
desktop information. However, they lack the functionality to search,
access, aggregate, and process Web information on the fly when necessary,
and they lack a unified interface for managing both the personal desktop information
and the relevant Web information. In addition, a sound architecture and
efficient mechanisms for connecting semantic information nodes, and for discovering and sharing information
among them, are still needed.
      </p>
      <p>In this paper, we focus on how to connect these information
nodes in a robust and efficient way, how to discover and share information
among them, and what functionalities need to be provided in
order to achieve these goals.</p>
      <p>Our semantic Web information retrieval system architecture is based
on the following five main ideas.</p>
      <p>First, “combining Web portal technology with semantic desktop technology
to provide a ‘one stop’ for the user to all his or her relevant information.” A
semantic desktop provides a good solution for managing the user’s personal information but
lacks the functionality to search, collect, and aggregate information from the Web
for the user on the fly. Web portals, on the other hand, provide a good solution
for collecting relevant information for the user, but they lack options for
personalization and suffer from the problems of a centralized architecture. We make use of
the basic mechanisms for semantic personal information management in current
semantic desktops and enhance their Web information publishing and sharing
functionalities to construct a semantic MyPortal.</p>
      <p>Second, “using a peer-to-peer computing architecture to connect MyPortals,
with emphasis on an efficient method for reducing communication load.”
Decentralized P2P systems are robust, scalable, and cheap to maintain, but they tend
to transfer large amounts of information among many peers. Hence, an
efficient mechanism for reducing communication load with minimal loss of precision
and recall is very important in a P2P information retrieval system. We propose
using our Agent-Community-based Peer-to-Peer information retrieval method, called
ACP2P, to connect MyPortals and manage the communication among them.</p>
      <p>Third, “ensuring that the semantics are not lost during any part of the
lifecycle of information retrieval.” To enable consumers to reuse semantic
data, we design the interfaces and protocols involved in the whole lifecycle
of information retrieval tasks with semantic technology.</p>
      <p>Fourth, “all participants contribute to the semantic description consistently.”
Efficient searching for high-quality results depends on pertinent matching
between well-defined resources and user queries, where the matching reflects user
preferences. We use a Web site capability description (WSCD) to describe the
capabilities of MyPortal and to submit user queries in a consistent way.</p>
      <p>Fifth, “integrating Web information delivered through Web contents and
Web services.” Conventional Web contents and Web services have been managed
separately because they target different consumers. We will support the integrated
management of semantic Web contents and Web services at different levels in
MyPortal.</p>
    </sec>
    <sec id="sec-2">
      <title>MyPortal</title>
      <p>MyPortal is a “one stop” that links the user to all the information s/he needs. It
resides on the user’s own desktop, which also acts as a Web server, and is designed to
manage the user’s personal information with semantic Web technology in a flexible,
personalized way. It provides both semantic browser and semantic search engine
functionalities, and these functions manage not only local desktop
information but also the information of remote semantic MyPortals. Its information can be
published through Web contents and Web services and shared with others who have
proper authority.</p>
      <sec id="sec-2-1">
        <title>Core Component</title>
        <p>The structure of MyPortal is shown in Fig 1. It consists of the following four
components: the core component, which provides basic support for semantic Web
technologies and knowledge management; the user interface component, which provides a unified
interface for creating, browsing, querying, and managing the relevant
information; the desktop information management component, which manages conventional
personal information such as documents, e-mail, and contact information; and the
communication component, which acts as the user’s delegate for communication with
other MyPortals.</p>
        <p>
          See [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for further details on MyPortal.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conceptual Architecture of Web Information Retrieval System Based on MyPortal</title>
      <p>Our conceptual architecture for a community semantic Web information retrieval
system is illustrated in Fig 2.</p>
      <p>The architecture consists of three main components: a “consumer,” which
searches for Web resources; a “provider,” which holds certain resources; and
a “mediator,” which enables communication between the consumer and the
provider. In our architecture, the providers and consumers are all MyPortals.
Each provider describes its capabilities in what we call a WSCD (Web site
capability description), and each consumer submits relevant queries based on
user requirements when a Web search is necessary. The mediator comprises the
agents assigned to the consumer and providers, which use an Agent-Community-based
P2P information retrieval method to carry out the search and access tasks.</p>
      <sec id="sec-4-1">
        <title>Connecting MyPortals with the ACP2P Method</title>
        <p>
          The communication between the consumer and the providers is based on an
Agent-Community-based Peer-to-Peer information retrieval method called the ACP2P method [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which uses agent communities to manage and look up information related to a
user query.
        </p>
        <p>[Fig 1: Structure of MyPortal. Labels recovered from the figure: Communication Component, Information Retrieval Agent (IRA), History Management Agent (HMA), Transformation, User Interface Agent, User Interface, Desktop Information Management, Desktop Adaptors, Desktop Information.]</p>
        <p>[Fig 2: Conceptual architecture of the Web information retrieval system. Labels recovered from the figure: MyPortal 2 through MyPortal n, each with (GID, WCD, WSD) and an IR Agent; Mediator; UI Agent; HM Agent; MyPortal; Consumer.]</p>
        <sec id="sec-4-1-4">
          <p>In order to retrieve information relevant to a user query, an agent uses
two histories: a query/retrieved document history (Q/RDH for short) and a
query/sender agent history (Q/SAH for short). Making use of the Q/SAH is
expected to have a collaborative filtering effect, which gradually creates virtual
agent communities, where agents with the same interests stay together.</p>
          <p>The ACP2P method employs three types of agents: user interface (UI) agent,
information retrieval (IR) agent and history management (HM) agent. A set of
three agents (UI agent, IR agent, HM agent) is assigned to each user. Although
a UI agent and an HM agent communicate only with the IR agent of their user,
an IR agent communicates with other users’ IR agents to search for information
relevant to its user’s query. A pair of Q/RDH and Q/SAH histories and retrieved
content files are managed by the HM agent.</p>
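          <p>The role of the two histories can be illustrated with a minimal Python sketch. It is not the ACP2P implementation: the word-overlap similarity, the assumption that each Q/RDH entry also records the agent that returned the document, and all names (HistoryManager, select_targets, the agent identifiers) are hypothetical and chosen only for illustration. The sketch ranks candidate IR agents for a new query by how closely the queries remembered in the Q/RDH and Q/SAH match it.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two queries (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


@dataclass
class HistoryManager:
    """Hypothetical stand-in for an HM agent holding one user's two histories."""
    q_rdh: list = field(default_factory=list)  # (query, document, source agent)
    q_sah: list = field(default_factory=list)  # (query, sender agent)

    def record_retrieved(self, query, document, source_agent):
        self.q_rdh.append((query, document, source_agent))

    def record_sender(self, query, sender_agent):
        self.q_sah.append((query, sender_agent))


def select_targets(history, query, k=2):
    """Return up to k agents whose remembered queries best match `query`."""
    scores = {}
    for past_query, _doc, agent in history.q_rdh:
        scores[agent] = max(scores.get(agent, 0.0), overlap(query, past_query))
    for past_query, agent in history.q_sah:
        scores[agent] = max(scores.get(agent, 0.0), overlap(query, past_query))
    # Drop agents with no overlap; break ties by agent name for determinism.
    ranked = sorted((a for a, s in scores.items() if s > 0),
                    key=lambda a: (-scores[a], a))
    return ranked[:k]


hm = HistoryManager()
hm.record_retrieved("semantic desktop search", "doc-17", "ira-bob")
hm.record_sender("semantic web portal", "ira-carol")
hm.record_sender("cooking recipes", "ira-dave")
targets = select_targets(hm, "semantic portal search")
print(targets)
```
]]></preformat>
          <p>In this sketch, agents that previously sent or answered similar queries are preferred as targets, which is one simple way to picture how agents with shared interests could end up grouped together over time.</p>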
          <p>
            The ACP2P method is implemented with the multi-agent system Kodama (Kyushu
university Open &amp; Distributed Autonomous Multi-Agent) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. Kodama comprises
hierarchically structured agent communities based on a portal-agent model. A
portal agent is the representative of all member agents in a community and allows
the community to be treated as a single ordinary agent from outside the community.
          </p>
          <p>We are currently planning to use the SPARQL RDF query language and the SPARQL
protocol as the semantic communication interfaces between providers and
consumers.</p>
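          <p>As a rough illustration of what such an interface might look like, the following Python sketch encodes a SPARQL query as a "query via GET" request of the SPARQL protocol. The endpoint URL, the use of the Dublin Core title property, and the helper name build_request_url are assumptions made for this example; the paper does not specify a vocabulary or endpoint layout.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical provider endpoint and vocabulary; neither comes from the paper.
ENDPOINT = "http://provider.example.org/myportal/sparql"
QUERY = """\
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc ?title
WHERE {
  ?doc dc:title ?title .
  FILTER regex(?title, "semantic desktop", "i")
}
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query string as a SPARQL protocol 'query via GET' request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# A provider would decode the same parameter to recover the query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```
]]></preformat>
          <p>Because both the query and its results stay in RDF-oriented formats, a consumer MyPortal could pass the results on to its own knowledge base without first flattening them to keywords.</p>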
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Web site capability description (WSCD)</title>
        <p>Resource location is based on matching between user requirements and Web site
capabilities; hence, a capability description of MyPortal is necessary. We describe
the capabilities of MyPortal in layers.</p>
        <p>First, we semantically describe the general capabilities of the Web site;
we call this the “general information description (GID).” The GID gives an explicit
overview of the Web site’s capabilities, such as its category and topic, and can be
used as the initial filter for judging congruence with user preferences. Second, we
give the Web content capability description (WCD), which is the metadata of the Web
contents and is composed of the knowledge bases of all domains involved. Third, we
give the Web service capability description (WSD), which is further expressed
in two layers: a “semantic Web service description (SWSD)” and a “concrete
Web service description (CWSD).” This hierarchical capability description
mechanism enables both semantic and non-semantic description and
matchmaking of Web service capabilities at different levels.</p>
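        <p>The layering can be pictured with a small Python sketch. The field layout below (a category and a set of topics in the GID, dictionaries for the WCD, SWSD, and CWSD) and the function gid_filter are assumptions made for illustration only; the actual description format is the one defined in the authors' WSCD work. The sketch shows the GID used as a first, coarse filter before any finer matching on content or service descriptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class GID:
    """General information description: a coarse overview of a Web site."""
    category: str
    topics: set


@dataclass
class WSD:
    """Web service capability description with its two layers."""
    swsd: dict = field(default_factory=dict)  # semantic Web service description
    cwsd: dict = field(default_factory=dict)  # concrete Web service description


@dataclass
class WSCD:
    """Web site capability description of one MyPortal (layout is assumed)."""
    site: str
    gid: GID
    wcd: dict = field(default_factory=dict)  # Web content metadata by domain
    wsd: WSD = field(default_factory=WSD)


def gid_filter(descriptions, category, topics):
    """Initial filter: keep sites in the category that share at least one topic."""
    wanted = set(topics)
    return [d.site for d in descriptions
            if d.gid.category == category and d.gid.topics & wanted]


portals = [
    WSCD("portal-a", GID("research", {"semantic web", "p2p"})),
    WSCD("portal-b", GID("research", {"databases"})),
    WSCD("portal-c", GID("shopping", {"semantic web"})),
]
matches = gid_filter(portals, "research", ["semantic web"])
print(matches)
```
]]></preformat>
        <p>Only the sites that pass this first filter would need to be examined against the more detailed WCD and WSD layers, which is one way the layered description could help limit matching work.</p>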
        <p>
          For the details of our Web site capability description mechanism, one can
refer to document [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented our main ideas for constructing a P2P community
semantic Web information retrieval system based on MyPortal, focusing mainly on
how to connect MyPortals to enable automatic and efficient information
sharing and on what functionalities are necessary when constructing a MyPortal. In
the future, we will implement a prototype of MyPortal and of a P2P community Web
information retrieval system based on it, and evaluate the effectiveness
of our approach. We will also carry out experiments on using the ACP2P method for semantic Web
data retrieval in a dynamic multi-community environment.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Quan</surname>
          </string-name>
          .
          <article-title>Haystack: A Platform for Creating, Organizing and Visualizing Information Using RDF</article-title>
          .
          <source>In Proceedings of the International Workshop on the Semantic Web (at WWW2002)</source>
          ,
          <year>2002</year>
          . http://semanticweb2002.aifb.unikarlsruhe.de/proceedings/Research/huynh.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>T.</given-names>
            <surname>Mine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Matsuno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kogo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Amamiya</surname>
          </string-name>
          .
          <article-title>Design and implementation of agent community based peer-to-peer information retrieval method</article-title>
          .
          <source>In Proc. of Eighth Int. Workshop CIA-2004 on Cooperative Information Agents (CIA 2004), LNAI 3191</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>46</lpage>
          , 9
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>L.</given-names>
            <surname>Sauermann</surname>
          </string-name>
          .
          <article-title>The Gnowsis Semantic Desktop for Information Integration</article-title>
          .
          <source>In IOA Workshop of the VM2005 Conference</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Amamiya</surname>
          </string-name>
          .
          <article-title>Towards a Semantic MyPortal</article-title>
          .
          <source>In The 3rd International Semantic Web Conference (ISWC 2004) Poster Abstracts</source>
          , pages
          <fpage>95</fpage>
          -
          <lpage>96</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Amamiya</surname>
          </string-name>
          .
          <article-title>Towards Automatic Discovery of Web Portals -Semantic Description of Web Portal Capabilities-</article-title>
          .
          <source>In Semantic Web Services and Web Process Composition: First International Workshop, SWSWPC 2004, LNCS 3387/2005</source>
          , pages
          <fpage>124</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amamiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Takahashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Amamiya</surname>
          </string-name>
          .
          <article-title>The Design and Implementation of KODAMA System</article-title>
          .
          <source>IEICE Transactions on Information and Systems</source>
          , E85-D(4):
          <fpage>637</fpage>
          -
          <lpage>646</lpage>
          , April,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>