An Approach on Improving Search Engines through Social
               Content Recommendation

Una aproximación a la mejora de los Motores de Búsqueda a través de la
                Recomendación Social de Contenido
                           Luis Alberto Pérez García
      Universidad Europea de Madrid / Universidad Rey Juan Carlos de Madrid
                       perezgarcia.luisalberto@gmail.com


  Resumen: El crecimiento de Internet ha provocado que la búsqueda de información haya
  pasado a tener uno de los papeles más relevantes de la industria y a ser uno de los temas de
  mayor actualidad en los ambientes de investigación. La red de redes es el mayor contenedor de
  información de la historia y su facilidad para generar información conlleva nuevos retos a la
  hora de recuperar dicha información y discernir aquella que tiene mayor relevancia que el resto.

  Paralelamente al crecimiento de la información en cantidad, también ha cambiado la forma en
  que podemos acceder a dicha información. Uno de los cambios que más movimiento de
  información ha provocado ha sido la aparición de las redes sociales. Hemos podido ver como
  las redes sociales pueden llegar a provocar más tráfico de información que los propios
  buscadores. Indudablemente, podemos sacar algunas conclusiones que nos permitan dar un
  enfoque ligeramente distinto al problema de la recuperación de información: el público general
  confía más en el contenido que le llega a través de contactos conocidos.

  En éste documento exploraremos un posible cambio en los motores de búsqueda clásicos para
  hacerlos más sociales.
  Palabras clave: búsqueda, motores de búsqueda, búsqueda social, redes sociales,
  recomendación de contenido, recuperación de información, sistemas de recomendación de
  contenido

  Abstract: The growth of the Internet has made information search one of the most relevant
  activities in the industry and one of the most active topics in research. The Internet is the
  largest information container in history, and the ease with which it generates new information
  brings new challenges in retrieving information and discerning which items are more relevant
  than the rest.

  Alongside the growth in the quantity of information, the way information is provided has also
  changed. One of the changes that has generated the most information traffic is the emergence
  of social networks. We have seen that social networks can generate more traffic than search
  engines themselves. From this we can draw conclusions that allow a new approach to the
  information retrieval problem: the general public places more trust in content that arrives
  through known contacts.

  In this document we will explore a possible change in classic search engines to bring them
  closer to the social side and acquire those social advantages.
  Keywords: search, search engines, social search, social networks, content recommendation,
  information retrieval, recsys
1   Introduction

1.1 Search so far

1.1.1   Classic Search

In classic information retrieval (IR) systems, the main goal is to return
high-relevance and high-quality information.
    Precision is an important term in IR systems. It is the percentage of
retrieved documents that are relevant to the query.
    An IR system needs to cover several processes to be functional, some of
which are quite common: gathering existing information, converting it into a
structured model, weighting the documents according to their relevance, and
retrieving documents in response to a query.
    Although all these processes are necessary, the one that weights documents
is nowadays the most important and the one that makes the difference between
search engines.

1.1.2   Measuring the Relevance

The way relevance is measured has always been related to the topic of the
document. In classic search, retrieving the document that matched the query
most accurately was good enough.
    With the growth of information, we can find many long-standing documents
that are closely tied to each topic. We needed a method for discerning which of
those documents were the best result.
    This was the time when Google came onto the scene. It provided a
revolutionary algorithm to measure relevance: PageRank.
    PageRank itself scores a document by the links that other documents point
at it; the ranking built around it is treated here as also reflecting which
documents users select most often when searching on the same topic. It is
self-adapting, and the best results are shown in the best positions so that
users can find them easily.
    This algorithm brought a revolution to information retrieval and has been
the best solution so far.

1.1.3   Problems

Focusing on this ranking, we can see that it works in only one direction:
clicked results get favorable treatment and are promoted to higher positions,
while results that are not clicked remain in lower positions on the result
page. If this behavior continues for a long time, it is easy to see that the
best old documents will keep being treated as the best ones even if a better
new document appears.
    It also brings a new problem to the table: is every clicked result a good
one? Obviously not. The title of a document may be good enough to earn a click,
but on reading it we notice it is not worth the position it holds. A
recommendation system that allows documents to be evaluated after seeing them
is a priority.
    Another problem is that measuring the success of a search by the relevance
of the results to the query can be a mistake. Whom do we really want to
satisfy: the query, or the user who enters it? Obviously we want users to find
what they are looking for, even if the query is not the best approach. Google
has made some efforts on this, using query suggestion, but it is not good
enough because this approach is globally oriented. A social approach could
result in a better solution here.

1.2 Impact of social networks

1.2.1   New Kings of Information

The way information reaches users is changing. Some time ago there were two
main ways to get to information: direct visits to known sites and results from
search engines.
    Lately we have witnessed a change here. Information is reaching users from
social networks. This is due to people finding more relevant information from
their contacts than from search engines. This year, Facebook overtook Google in
traffic, and Twitter is driving more traffic each day.
    If this trend keeps going, we could be talking about a change of reign in
information retrieval quite soon.

1.2.2   Personalized Model

The great advantage of social networks is that you only see the information you
want to see, and you can filter it by selecting which contacts are related to
you.
    Social networks can be seen as a new way of retrieving information where
users decide which sources provide their information.
    Facebook may be the best example of a classic social network. It allows you
to keep track of your contacts and see information they are producing or
information they find important. That is the key point: it keeps in mind that
you are possibly interested in the same information as your contacts.
    Twitter is, so far, the best mixture of search engine and social network.
It is oriented to information and its main point is spreading news the fastest.
It keeps the advantage of being social, as the information a user gets from
Twitter is provided by that user's social contacts, and it uses a still quite
poor topic filtering system to help discover the best information providers.

1.2.3   Advantages

There are two huge components in getting information: the information you
retrieve when searching for it and the information that reaches you although
you are not looking for it.
    In a classic environment, the information that reaches you directly is
usually treated as unwanted information because it rarely turns out to be
relevant to you.
    The social component makes a difference here. Most of the information we
consume, without considering its relevance, comes directly to us. A social
approach can make that information relevant, and that is an advantage search
engines cannot let pass.

2    State of the Art

2.1 Social Search

2.1.1   Existing Approaches

Social Search is quite an unexplored field. Some approaches have been made,
such as the Village Paradigm, and some implementations of these approaches are
working fine, such as Aardvark.
    The village paradigm is based on finding the right person to answer a
query, rather than the document that contains the information that answers the
query.
    Using the village paradigm resembles a recommendation system more than a
search engine. In fact, what the village paradigm does is recommend the person
who can best answer your query.
    An important point here is that you need people willing to answer
questions. Judging by Aardvark's results, people do like to answer questions,
but basing your social search engine on users can fail if the number of active
users falls.
    It should be noted that the village paradigm works very well on queries
that deal with opinion, advice, experience or recommendations. For a
conventional query that is factual or navigational, and based on keywords, the
library paradigm, which is the one used in classic search engines, performs
better than the village one.
    Working with the village paradigm requires a statistical model for routing
questions to potential answerers and also a method for indexing people. The two
are related: users are indexed according to the topics where their knowledge is
relevant, and queries are routed to people with relevant knowledge of the query
topic.
    An extra problem to deal with when working on social search is
understanding natural language. Queries in social search engines based on the
village paradigm tend to be questions in natural language, usually related to
locations, times and more specific issues. The query must be analyzed and the
relevant information extracted in order to route it according to its topic.
This is not as hard as understanding natural language per se, but understanding
the topic of a query is one of the important steps.

3    System Description

3.1 Concept and Goals

3.1.1   Main Concept

The main goal of this document is to describe an approach to a mixture model of
classic and social search. Both paradigms, the library and the village, have
been taken into account.
    The idea is to make classic search engines more powerful by adding
recommendations based on social relationships. Just as Aardvark recommends
answerers according to the query entered, this approach will deal with
recommending the best piece of information according to a classic search and
direct recommendations from users.
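    The mixture of a classic ranking with direct user recommendations can be
illustrated with a minimal sketch. It is not the algorithm of this paper, which
is left unspecified; the names Result, blended_score and rerank, the
reciprocal-rank score, the smoothed vote share and the social_weight parameter
are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    engine_rank: int          # 1-based position returned by the classic engine
    recommendations: int = 0  # users who recommended the result after reading it
    penalizations: int = 0    # users who penalized the result after reading it


def blended_score(result: Result, social_weight: float = 0.5) -> float:
    """Mix a rank-based score with a vote-based social score (both in (0, 1])."""
    # Assumption: the classic engine only exposes positions, so use 1/rank.
    classic = 1.0 / result.engine_rank
    votes = result.recommendations + result.penalizations
    # Smoothed share of positive votes; exactly 0.5 when nobody has voted yet.
    social = (result.recommendations + 1) / (votes + 2)
    return (1.0 - social_weight) * classic + social_weight * social


def rerank(results: list[Result], social_weight: float = 0.5) -> list[Result]:
    """Return the results ordered by blended score, best first."""
    return sorted(results, key=lambda r: blended_score(r, social_weight), reverse=True)
```

    With social_weight set to 0 the order of the classic engine is kept
unchanged; larger values let recommendations and penalizations move results up
or down.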
    Some of the problems search engines are suffering from were mentioned
before, and they can be mitigated using social recommendations.

3.1.2   Goals to Achieve

The main goal is to re-weight the results from a search engine according to
user recommendations and obtain a more relevant ordering of results.
    Classic search engines re-weight their own results based on the ones that
get clicks from users. This means the re-weighting is done without taking into
account whether the results satisfy the user or not.
    The approach proposed in this paper is similar, but the re-weighting will
take into account the user's satisfaction with the selected result, since the
recommendation or penalization is made after the user has seen the information.
    Notice that this also helps mitigate the SEO problem that affects most
classic search engines.

3.2 Architecture

3.2.1   Information Source

This initial approach will be run as a small experiment, so initially there
will be no self-indexed information.
    The information source of our social search system will be a classic search
engine, specifically the Google search engine.
    This means our social search system will not be able to work standalone,
but it will allow us to reduce the initial size of our database, as we will
only store information related to recommendations made by users.

3.2.2   Database

Information on recommendations gathered by the system will be stored in a
database, as well as information related to users and their social
relationships.
    Common databases are not the best option for social based systems. Working
with relationships between users makes the relational database model a poor
fit, as it performs poorly and lacks scalability for this kind of data. The
proper solution will be to use a graph database; graph databases have been
created with social needs in mind and are being adopted by the most important
projects related to social technologies.

3.2.3   Ranking Algorithm

None of the existing algorithms, neither classic ones such as PageRank nor the
Aardvark one, fits this model accurately.
    The algorithm needed will deal with two important factors: the result
selected from the search engine and the recommendation given by the user.
    An improvement of this algorithm will also include a user indexer to relate
users and topics, as the Aardvark system does, so that the system can give more
weight to recommendations made by users with proven knowledge of the query
topic.
    This will also allow the "following/follower" model used by Twitter to be
applied to search environments. Users could be notified when a relevant user
recommends a result on a topic related to previous queries.

3.3 Usability

3.3.1   Simple Query Interface

One of the problems recommendation systems have to deal with is the user
interface. Convincing the user to evaluate whether something is worth
recommending cannot be done through a heavy process.
    The user interface for entering a query should be minimalistic and clean.
The result list has to be simple but contain enough information to allow users
to select the result that best fits their query.

3.3.2   Recommendation

The interface for recommending a result should be as unintrusive as possible.
    The idea is to show a small bar in the browser window that shows the
content of the selected result. That bar will enable the user to recommend the
result and even to rate it.

4    Future Work and Conclusions

This is only a minimal specification of a different approach to social search,
so many future works can arise from here.
    As a first option, future work will be to replace the source search engine
with a metasearch engine, which will retrieve unique results from different
search engines.
    A suggestion system for powerful users will also be a great addition to the
system.
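    The metasearch option mentioned in Section 4 depends on keeping only
unique results when several engines return the same page. A minimal sketch of
that merging step follows; the function names, the round-robin interleaving
and the crude URL normalization are assumptions made for illustration, not
part of the specification.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Crude URL key: lowercase host, no scheme, no fragment, no trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{path}{query}"


def merge_unique(result_lists: list[list[str]]) -> list[str]:
    """Interleave ranked URL lists round-robin, keeping the first copy of each page."""
    seen: set[str] = set()
    merged: list[str] = []
    longest = max((len(lst) for lst in result_lists), default=0)
    for position in range(longest):
        for lst in result_lists:
            if position < len(lst):
                key = normalize(lst[position])
                if key not in seen:
                    seen.add(key)
                    merged.append(lst[position])
    return merged
```

    Each inner list stands for the ranked URLs returned by one engine. A real
metasearch engine would need a more careful notion of "same page" than this
string key.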
5   References
The Anatomy of a Large-Scale Social Search Engine (Damon Horowitz and Sepandar D. Kamvar)
Information Retrieval – Introduction and Survey (Norbert Fuhr)
A Survey on Web Information Retrieval Technologies (Lan Huang)
Information Retrieval: A Survey (Ed Greengrass)