=Paper= {{Paper |id=None |storemode=property |title=Social Sifter: An Agent-Based Recommender System to Mine the Social Web |pdfUrl=https://ceur-ws.org/Vol-966/STIDS2012_P02_NachawatiEtAl_SocialSifter.pdf |volume=Vol-966 |dblpUrl=https://dblp.org/rec/conf/stids/NachawatiRYKB12 }} ==Social Sifter: An Agent-Based Recommender System to Mine the Social Web== https://ceur-ws.org/Vol-966/STIDS2012_P02_NachawatiEtAl_SocialSifter.pdf
Social Sifter: An Agent-Based Recommender System
                to Mine the Social Web
         M. Omar Nachawati, Rasheed Rabbi, Genong (Eugene) Yu, Larry Kerschberg and Alexander Brodsky

                                                  Dept. of Computer Science
                                                   George Mason University
                                                       Fairfax, VA, USA
                                       {mnachawa, rrabbi, gyu, kersch, brodsky} at gmu.edu

Abstract: With the recent growth of the Social Web, an emerging challenge is how to integrate
information from heterogeneous Social Web sites to improve semantic access to information and
knowledge across the entire World Wide Web (the Web). The lack of interoperability across Social
Web sites makes even the simplest inferences based on data from different sites challenging. Even
if such data were interoperable across multiple Social Web sites, the ability of a collective
intelligence system [1] to make meaningful inferences depends both on its ability to marshal such
semantic data and on its ability to accurately understand and precisely respond to queries from
its users. This paper presents the architecture for Social Sifter, an agent-based collective
intelligence system for assimilating information and knowledge across the Social Web. A health
recommender system prototype was developed using the Social Sifter architecture; it recommends
treatments, prevention advice, therapies for ailments, and doctors and hospitals based on shared
experiences available on the Social Web.

    Keywords: social semantic search; collective knowledge systems; recommender systems; OWL; RDF;
SPARQL

                     I.      INTRODUCTION

   Since its inception, the World Wide Web has overwhelmed users with its vast quantity of
information. The advent of the Social Web, also termed Web 2.0, has placed an additional burden on
Web search engines. The established algorithms that Web search engines employ are effective in
surfacing the most popular results through hyperlink analysis, as demonstrated by the Hubs and
Authorities algorithm [2] and the PageRank algorithm [3]. However, popular results are not
necessarily relevant, and these algorithms have fallen short of solving the problem of information
overload [1, 2, 3] on the World Wide Web.

   Research into natural language understanding [4] attempts to close that gap. However, the
quality of machine-generated semantics still pales in comparison to that of humans. This became a
core challenge for the Semantic Web, or Web 3.0, where information is made available in structured,
machine-friendly formats, allowing machines not only to sort and filter such data, but also to
combine data from multiple Web sites in a meaningful way and to make inferences upon that data.
While semantic query languages, such as SPARQL, can provide a database-like interface to the World
Wide Web, they are only as good as the quantity and quality of information that is made available
in structured, machine-readable formats, such as RDF and OWL.

   Conventionally, finding answers to questions and learning from the knowledge mine that exists on
the Social Web has primarily been a manual process. It requires a lot of intelligence to sift
through the mountains of Social Web pages using only a keyword-based Web search engine, which is
akin to a primitive pitchfork in Semantic Web terms. More recently, however, Social Web sites have
begun to embrace Semantic Web technologies such as RDF and OWL, and have been offering much more
machine-friendly data, such as geo-tagged images on Flickr, Friend Of A Friend (FOAF) exports in
Facebook, and hCalendar [6] tagged events on Blogger. Such developments have sparked the evolution
of the Social Web into a collective knowledge system [1], where the contributions of the user
community are aggregated and marshaled with knowledge from other heterogeneous sources (e.g., web
pages, news and encyclopedia articles, and academic journals) in a synergy dubbed the Social
Semantic Web.

   While the Semantic Web focuses on data to enable interoperability among heterogeneous
semi-structured web pages, the focus of the Social Semantic Web vision is to create a system of
collective intelligence by improving the way people share and explore their own and others'
knowledge and experience [1]. Work on the Social Sifter promotes that grand vision and expands on
the research done on the patented Knowledge Sifter architecture [7, 8, 9], as well as the Personal
Health Explorer [10], undertaken at George Mason University. As a proof of concept, we have
designed a social health knowledge and recommender system based on the Social Sifter platform that
utilizes the Social Semantic Web to provide precise search results and recommendations.

   The rest of this paper is organized as follows: Section II discusses related work, Section III
describes the Social Sifter architecture, and Section IV describes the prototype health recommender
system. Section V highlights the experimental findings, and Section VI concludes and identifies
possible future work on the Social Sifter platform.

                     II.     RELATED WORK

A. Knowledge Sifter and Personal Health Explorer
   Semantic systems belong to a class of systems that make use of ontologies, context awareness and
other semantic methods to make informed recommendations. Such research in semantic search at
George Mason University began with WebSifter [8, 9,
10], an agent-based multi-criteria ranking system that selects semantically meaningful Web pages
from multiple search engines such as Google and Yahoo. The work further led to a patent [8].
Knowledge Sifter (KS) [8] is motivated by WebSifter [7,8], but is augmented with the advanced use
of semantic web ontologies, authoritative sources, and a service-oriented plug-and-play
architecture. Knowledge Sifter is a scalable agent-based web services framework that aims to
support i) ontology-guided semantic searches, ii) refinement of searches based on relevance
feedback, and iii) access to heterogeneous data sources via agent-based knowledge services.
Personal Health Explorer (PHE) is an enhancement of KS that performs semantic search in the
biomedical domain. PHE additionally maintains a personal health graph, which is identified,
categorized, and reconstituted by providing links that let the user rate individual results, return
to previous queries, and update information through a semantically supported path.
   KS and PHE are able to obtain more relevant search results than classic search engines; however,
their results remain general and leave room for personalization. Both KS and PHE make multifaceted
efforts towards realizing the Semantic Web vision, primarily focusing on formal ontological
sources. PHE provides facilities to include a user’s Personal Health Record (PHR), which entails
additional permission and access control that may be constrained by HIPAA regulations. Notably,
neither of these systems used the data available on the Social Web, namely Wikipedia, YouTube,
Flickr, Facebook, LinkedIn, etc. This is where Social Sifter makes its contribution.

B. BLISS and Cobot
   Other attempts to utilize Web 2.0 technology to enhance the quality and relevance of health
recommendation systems include bookmarking, crowd sourcing, crowd tagging and harvesting user
recommendations. The Biological Literature Social Ranking System (BLISS) is one such prototype
system. It allows users to bookmark and promote their recommendations to communities of special
interest, facilitates annotation and ranking by the community, and presents the results so that
other users get recommendations based on community ranking [6]. The bookmarking approach is useful
in establishing the authoritativeness of information over the long term because it uses social
voting or ranking [5].

   The Cobot system uses social conversation and social tagging (preference) to enhance health
recommendations. Three techniques are noteworthy: (1) user-initiative dialogue in capturing the
user’s intent, (2) social tagging in establishing the authoritativeness of social information, and
(3) case-based semantic reasoning in utilizing social knowledge for recommendation [5].

C. Semantic Analytics on Social Networks
   A multi-step engineering process for utilizing social knowledge is described in [11]. These
steps are common across initiatives that transform Social Web information into semantic knowledge.

   Social Sifter adheres to the underlying framework of Knowledge Sifter [9], the knowledge
manipulation mechanism of PHE [10], and the engineering process for semantic association of [11] to
leverage an integrated semantic search engine and recommender system.

                     III.     THE SOCIAL SIFTER ARCHITECTURE

   Social Sifter, an enhancement of the existing Knowledge Sifter (KS), is a collection of
cooperating agents that are exposed through web services within a Service-Oriented Architecture
(SOA) framework.

              Figure 1. Social Sifter Architecture – Tiers and Components

   Depending on their functionality, agents are allocated to three architecture layers: i) the
User Layer, ii) the Knowledge Management Layer, and iii) the Data Layer. The User Layer consists of
the User and Preferences agents, and manages all user interaction and data preferences. The
Knowledge Management Layer handles the support for semantic search, access to data sources, and the
ranking of search results using the Ontology, Social Web Crawling, Ranking, Query Formulation, and
Web Services agents. The Data Layer consists of the data repositories that provide authoritative
information and documents. The hierarchy of the architecture layers is already defined in KS; three
additional agents were added, and the underlying algorithm was altered to support the Social Sifter
execution flow.

   The Social Web agent collaborates with the following two agents to process Social Web
information.
   The Open SW agent performs open searches within blogs, related support groups, etc.
   The User Specific SW agent identifies the user's social identities across the Web and conducts
Collaborative Filtering by processing the social tags, user participation and responses available
on the Social Web.

                     IV.      HEALTH RECOMMENDER SYSTEM

   As a proof of concept, we are building a health recommender system using our Social Sifter
architecture that provides health recommendations for any type of sickness, disease or disorder.
The present system does not do any natural language processing on user queries, and is therefore
limited as
to what it can accept as a valid query. Currently, the system accepts a comma-delimited list of
words that relate to a specific ailment and returns a list of relevant descriptions of the ailment,
therapy options, doctors, and treatment centers as collected from the Social Semantic Web by our
Knowledge Management Layer. We intend for future versions of the health recommender system to allow
for unrestricted language queries by performing natural language processing to transform the
unstructured query input into a more structured format acceptable to the Social Sifter
architecture.

   [Figure 2 flow: Parsing Key Words → Query Enrichment with Semantics Key Words (Ontology, RDF,
   Social Media) → Decomposing into Multiple Sub Queries → Perform Search with Existing Search
   Engine → Analyze, Rank and Organize Search Result → Display]

              Figure 2. Social Sifter work flow diagram

A. Scenario for Pancreatic Cancer
   Consider the case when a user is exploring recommendations for pancreatic cancer. According to
the NIH, treatment options include surgery and biliary stents. The NIH also lists links to support
groups, among which CancerCare.org features a social question-answer forum that is categorized by
topic. Our inference agent for health recommendations takes advantage of this domain knowledge in
attempting to provide better quality recommendations than what would be available from a general
Web search engine. Let us walk through the steps of the health recommender system for this
particular query.

Query Submission: The user logs into the health recommender system website and enters the query
terms “pancreatic cancer.”

Query String Preparation:

 i)   The User Agent parses the query string to identify key words.
 ii)  The Preference Agent collects context information, including the user’s IP address, a query
      session identifier, and the best geographic location estimate available for that user. It
      tries to create a User Profile by indexing friendship and affiliation information to generate
      the user’s Social Graph.
 iii) The User Agent passes the SPARQL query and the collected User Profile information to the
      Query Formulation Agent.

Query Refinement: The Query Formulation Agent then attempts to enrich the original SPARQL query by:

 i)   Semantic Query Decomposition: It generates multiple sub-queries that generalize and
      specialize the term pancreatic cancer based on health-domain ontologies from the National
      Center for Biomedical Ontology (NCBO) BioPortal and on MEDLINE (Medical Literature Analysis
      and Retrieval System Online), a bibliographic database of life sciences and biomedical
      information.
 ii)  Marshalling: Selected data is marshaled with the amassed folksonomy from the Social Web
      Agent. The inference engine also generates queries based on the results of any cluster
      analysis of data crawled from the Social Web, which may pick up, for instance, other ailments
      that people have discussed together with pancreatic cancer.
 iii) Ranking: The end result of this meta-search is a weighted tree of sub-queries, where weights
      are assigned based, among other features, on the static nature of the sub-query generated
      (heuristically) as well as the importance of the source (back-reference analysis).

Post Query Processing: Once all sub-queries have been defined, the Web Service Agent passes them to
the Data Layer, which runs the queries and itself ranks each result based on many factors,
including relevance (ontological), importance (back-reference based) and belief (Bayesian-based
inference from the Social Semantic Web).

Result Scrutinizing: The results are then returned to the Integration Agent, which combines the
different classes of results (as determined by the classifier) according to a total ordering
derived from the aggregated ontology and back-reference analysis. The agent also performs a
clustering analysis on the result set to further group the results, and performs statistical
calculations on the groups of results before passing them to the User Layer.

Result Displaying: The User Layer then displays the grouped and ranked results according to the
preferences selected by the user.

B. Query Life Cycle for Pancreatic Cancer in Social Sifter
   The life cycle of a query in Social Sifter, e.g., searching for “pancreatic cancer”, is as
follows: (1) a user allows access to his profile, (2) Sifter culls information from his social
networks, (3) Sifter initiates targeted information harvesting, (4) Sifter conducts semantic
inference and reasoning, and (5) Sifter presents socially and semantically ranked results to the
user.

C. Social Sifter Prototype
   The Social Sifter prototype has been implemented to use information retrievable from Facebook
using the Graph API in
gathering information about users. In Facebook, each user can have feeds, likes, activities,
interests, music, books, videos, events, groups, check-ins, games, and personal information, such
as hometown and related locations. These provide a very rich base for understanding the intention
of a user who is searching on the Web.
   Social Sifter combines semantic reasoning and social ranking to better understand the user’s
intention and to present results to users, based on the initial search keywords or phrases
provided. The algorithm for the currently implemented search is as follows.

(1) Login: The user logs into Facebook using OAuth authentication. The program obtains the
    authorized token and uses it to access the user’s information with the user’s concurrence.

(2) Information retrieval: The system retrieves the information about the user (Feeds, Likes,
    Activities, Interests, Music, Books, Photos, Videos, etc.) and uses it to support the targeted
    harvesting of information and to formulate the social ranking of results in categories.

(3) Social ranking: A simple algorithm is used to calculate the social weights of the harvested
    information in each category. The algorithm counts the occurrences of keywords or phrases in
    each category.

(4) Social context: The user’s background information is used to refine or filter the search
    results. One specific example is location information. The home location of the person is
    generally used to limit the places to be searched and returned.

(5) Semantic result presentation: The results are presented to users in groups: people, groups,
    events, places, pages, or posts. The current implementation is limited to the categories or
    semantics of Facebook. The actions in Facebook link objects and people. They are the basis for
    our search engine in weighing the harvesting strategies. They are also important in ranking the
    results and the categories when presenting the search results to users. The current
    implementation uses the same social ranking strategy described in (3).

D. Proactive Social Search
   The existing Facebook semantics do not capture the semantics of health queries. For health
problems, users may be interested in finding the cure for certain diseases, which is not captured
by the current set of actions available in Facebook. Customized actions can be implemented using
the Facebook Open Graph, but this is beyond the scope of this paper.

                     V.      EXPERIMENTAL FINDINGS

   The Social Sifter prototype has been implemented. The Facebook Graph API was used as the basis
for harvesting social network information about the user. Social information was used in two
aspects: understanding the user’s intention (context) and ranking results (social semantic
ranking). Both aspects improved the search results. For example, a search using the phrase
“pancreatic cancer” can be compared across three different engines: Google, Facebook, and Social
Sifter. Social Sifter provided integrated results and used social ranking to rearrange the
categories depending on the user's profile information. Location is determined from the current
living location provided by the user. More testing is being carried out to determine metrics to
assess the quality of social semantic search recommendations.

                     VI.     CONCLUSIONS

   Social semantic search is an integration of social networks and semantic search. Semantic search
provides rich means of enhancing search, especially in understanding the user’s intent and in
semantic reasoning. Social search involves people and links to their social graphs. In this paper,
a prototype social semantic search engine, Social Sifter, has been presented. The lessons learned
from the implementation showed two areas for improving search accuracy: social contextual
information (user intent understanding) and social semantic ranking (results relevance).

   The currently implemented prototype system is limited in its use of semantic reasoning. The
crawling of data should be expanded to other social media and social networks. Integration of these
results into a standard semantic data store is necessary to realize the power of semantic
reasoning. Directions for further study are: (1) to integrate mature ontologies, (2) to define
customized actions to demonstrate the approach in the health domain, and (3) to use the reasoning
power of semantics.

                     REFERENCES

[1] T. Gruber, “Collective knowledge systems: Where the Social Web meets the Semantic Web,” Web
Semantics: Science, Services and Agents on the World Wide Web, vol. 6, no. 1, pp. 4–13, Feb. 2008.
[2] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM, vol. 46, no. 5,
pp. 604–632, Sep. 1999.
[3] L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank Citation Ranking: Bringing Order to
the Web. 1999.
[4] A. Ntoulas, G. Chao, and J. Cho, “The infocious web search engine: improving web searching
through linguistic analysis,” in Special interest tracks and posters of the 14th international
conference on World Wide Web, New York, NY, USA, 2005, pp. 840–849.
[5] J. M. Gomez, G. Alor-Hernandez, R. Posada-Gomez, M. A. Abud-Figueroa, and A. Garcia-Crespo,
“SITIO: A Social Semantic Recommendation Platform,” in 17th International Conference on
Electronics, Communications and Computers, 2007. CONIELECOMP ’07, 2007, p. 29.
[6] “hCalendar 1.0 · Microformats Wiki.” [Online]. Available:
http://microformats.org/wiki/hcalendar. [Accessed: 15-Apr-2012].
[7] L. Kerschberg, W. Kim, and A. Scime, “WebSifter II: A Personalizable Meta-Search Agent Based on
Weighted Semantic Taxonomy Tree,” in International Conference on Internet Computing, Las Vegas, NV,
2001.
[8] L. Kerschberg, W. Kim, and A. Scime, “Personalizable semantic taxonomy-based search agent,”
U.S. Patent 7117207, Oct. 2006.
[9] L. Kerschberg, H. Jeong, Y. Song, and W. Kim, “A Case-Based Framework for Collaborative
Semantic Search in Knowledge Sifter,” Case-Based Reasoning Research and Development, vol.
4626/2007, pp. 16–30, 2007.
[10] T. G. Morrell and L. Kerschberg, “Personal Health Explorer: A Semantic Health Recommendation
System,” Workshop on Data-Driven Decision Support and Guidance System (DGSS), 28th IEEE
International Conference on Data Engineering, Arlington, VA, April 1, 2012.
[11] B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. P. Sheth, I. B. Arpinar,
A. Joshi, and T. Finin, “Semantic Analytics on Social Networks: Experiences in Addressing the
Problem of Conflict of Interest Detection,” WWW 2006, May 23–26, 2006, Edinburgh, Scotland. ACM
1-59593-323-9/06/0005.
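
Illustrative sketch for the social ranking step (3) of the prototype algorithm in Section IV.C. The
paper states only that the algorithm counts occurrences of keywords or phrases in each category;
the data layout below (a dictionary mapping a category name to a list of harvested text snippets),
the function names, and the tie-breaking rule are assumptions made for illustration and are not the
authors' implementation.

```python
def social_weights(harvested, keywords):
    """Count keyword occurrences per category (case-insensitive).

    harvested: dict mapping a category name (e.g. "groups") to a list of
               text snippets harvested for that category (assumed layout).
    keywords:  list of search keywords or phrases.
    """
    lowered = [k.lower() for k in keywords]
    # Start every category at zero so empty categories are still reported.
    weights = {category: 0 for category in harvested}
    for category, snippets in harvested.items():
        for text in snippets:
            text = text.lower()
            # str.count counts non-overlapping occurrences of each phrase.
            weights[category] += sum(text.count(k) for k in lowered)
    return weights


def rank_categories(harvested, keywords):
    """Order categories by descending social weight (ties broken by name)."""
    weights = social_weights(harvested, keywords)
    return sorted(weights, key=lambda c: (-weights[c], c))


if __name__ == "__main__":
    # Hypothetical harvested snippets, for demonstration only.
    harvested = {
        "groups": ["Pancreatic cancer support group", "Hiking club"],
        "pages": ["Pancreatic Cancer Action",
                  "pancreatic cancer research news: pancreatic cancer trials"],
        "events": ["Charity run"],
    }
    print(rank_categories(harvested, ["pancreatic cancer"]))
```

Under these assumptions, categories that mention the query phrase more often are presented first,
which matches the category rearrangement described in Section V.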


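
Illustrative sketch for the Post Query Processing step in Section IV.A, where each result is ranked
by relevance (ontological), importance (back-reference based) and belief (inferred from the Social
Semantic Web). The paper does not say how these factors are combined; the weighted linear
combination, the weight values, the [0, 1] score range, and all names below are hypothetical
choices made only to show one way such a ranking could be expressed.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    relevance: float   # ontological relevance, assumed to lie in [0, 1]
    importance: float  # back-reference-based importance, assumed in [0, 1]
    belief: float      # belief from the Social Semantic Web, assumed in [0, 1]


# Hypothetical weights; the paper does not specify any values.
WEIGHTS = {"relevance": 0.5, "importance": 0.3, "belief": 0.2}


def score(result, weights=WEIGHTS):
    """Combine the three factors into one score (assumed linear form)."""
    return (weights["relevance"] * result.relevance
            + weights["importance"] * result.importance
            + weights["belief"] * result.belief)


def rank(results, weights=WEIGHTS):
    """Return results ordered from highest to lowest combined score."""
    return sorted(results, key=lambda r: score(r, weights), reverse=True)


if __name__ == "__main__":
    # Made-up scores, for demonstration only.
    results = [
        Result("forum thread", 0.2, 0.9, 0.1),
        Result("treatment overview", 0.9, 0.2, 0.5),
        Result("support group", 0.5, 0.5, 0.5),
    ]
    for r in rank(results):
        print(f"{score(r):.2f}  {r.title}")
```

Keeping the weights in one table makes it easy to experiment with how strongly social belief
should influence the final ordering relative to the ontological and link-based factors.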