An Approach on Improving Search Engines through Social Content Recommendation

Luis Alberto Pérez García
Universidad Europea de Madrid / Universidad Rey Juan Carlos de Madrid
perezgarcia.luisalberto@gmail.com

Abstract: The growth of the Internet has made information search one of the most relevant activities in the industry and one of the most active topics in research.
The Internet is the largest information container in history, and the ease with which it generates new information brings new challenges when retrieving information and discerning which items are more relevant than the rest. Alongside the growth in the quantity of information, the way information is delivered has also changed. One of the changes that has generated the most information traffic is the emergence of social networks. We have seen that social networks can generate more traffic than search engines themselves. From this we can draw conclusions that allow a new approach to the information retrieval problem: the general public places more trust in information coming from known contacts. In this document we explore a possible change to classic search engines that brings them closer to the social side and lets them acquire those social advantages.

Keywords: search, search engines, social search, social networks, content recommendation, information retrieval, recsys

1 Introduction

1.1 Search so far

1.1.1 Classic Search

In classic information retrieval (IR) systems, the main goal is to return highly relevant, high-quality information.

Precision is an important term in IR systems. It is the fraction of retrieved documents that are relevant to the query.

An IR system needs several processes covered in order to be functional. Some of them are quite common: gathering existing information, converting that information into a structured model, weighting documents according to their relevance, and retrieving documents according to a query.

Although all of these processes are necessary, the one that weights documents is nowadays the most important, and it is the one that makes the difference between search engines.

1.1.2 Measuring the Relevance

The way relevance is measured has always been related to the topic of the document. In classic search, retrieving the document that most accurately matched the query was good enough.

With the growth of information, we can now find many documents that fit each topic closely. A method was needed for discerning which of those documents is the best result.

This was when Google came onto the scene, providing a revolutionary algorithm for measuring relevance: PageRank.

PageRank scores a document by the links pointing to it, and Google's ranking also takes into account which documents are most often selected by users searching on the same topic. The ranking adapts itself over time, and the best results are shown in the top positions so that users can find them easily.

This algorithm brought a revolution to information retrieval and has been the best solution so far.

1.1.3 Problems

Focusing on this selection-based feedback, we can see that it works in only one direction: clicked results get favorable treatment and are promoted to higher positions, while results that are not clicked remain in lower positions on the result page. If this behavior continues for a long time, it is easy to see that the best old documents will keep being treated as the best ones even if a better new document appears.

It also raises a new question: is every clicked result a good one? Obviously not. The title of a document may be good enough to earn a click, but on reading the document we may find that it does not deserve its position. A recommendation system that allows users to evaluate documents after seeing them is therefore a priority.

Another problem is that measuring the success of a search by the relevance of the results to the query can be a mistake. Who do we really want to satisfy, the query that was entered or the user who entered it? Obviously we want users to find what they are looking for, even if the query is not the best way to express it. Google has made some efforts in this direction with query suggestion, but that is not good enough, because the approach is globally oriented. A social approach could lead to a better solution here.

1.2 Impact of social networks

1.2.1 New Kings of Information

The way information reaches users is changing. Some time ago there were two main ways to get to information: direct visits to known sites and results from search engines.

Lately we have witnessed a change. Information is reaching users through social networks, because people find more relevant information through their contacts than through search engines. This year, Facebook overtook Google in traffic, and Twitter is driving more traffic each day.

If this trend continues, we could be talking about a change of reign in information retrieval quite soon.

1.2.2 Personalized Model

The great advantage of social networks is that you only see the information you want to see, and you can filter it by selecting which contacts are related to you.

Social networks can be seen as a new way of retrieving information in which users decide which sources provide their information.

Facebook may be the best example of a classic social network. It allows you to keep track of your contacts and to see the information they produce or find important. That is the key point: it assumes that you are probably interested in the same information as your contacts.

Twitter is, so far, the best mixture of search engine and social network. It is oriented toward information, and its main strength is spreading news faster than anything else. It keeps the advantage of being social, since the information a user gets from Twitter is provided by that user's social contacts, and it uses a topic filtering system, still quite poor, to help discover the best information providers.

1.2.3 Advantages

There are two major components in getting information: the information you retrieve when searching for it, and the information that reaches you although you are not looking for it.

In a classic environment, the information that reaches you directly is usually treated as unwanted, because it rarely turns out to be relevant to you.

The social component makes a difference here. Most of the information we consume, regardless of its relevance, comes directly to us. A social approach can make that information relevant, and that is an advantage search engines cannot afford to pass up.

2 State of the Art

2.1 Social Search

2.1.1 Existing Approaches

Social search is still a largely unexplored field. Some approaches have been proposed, such as the village paradigm, and some implementations of these approaches, such as Aardvark, are working well.

The village paradigm is based on finding the right person to answer a query, rather than the document that contains the information that answers the query.

A system built on the village paradigm resembles a recommendation system more than a search engine. In fact, what the village paradigm does is recommend the person who can best answer your query.

An important point here is that you need people who are willing to answer questions. Judging by Aardvark's results, people do like to answer questions, but basing a social search engine on its users can lead to failure if the number of active users drops.

It should be noted that the village paradigm works very well for queries that deal with opinion, advice, experience or recommendations. For a conventional keyword-based query that is factual or navigational, the library paradigm, which is the one used in classic search engines, performs better than the village paradigm.

Working with the village paradigm requires a statistical model for routing questions to potential answerers, as well as a method for indexing people. The two are related: users are indexed according to the topics on which their knowledge is relevant, and queries are routed to people with relevant knowledge of the query topic.

An additional problem in social search is understanding natural language. Queries in social search engines based on the village paradigm tend to be questions in natural language, usually related to locations, times and other specific issues. The query has to be analyzed, and the relevant information extracted, so that it can be routed according to its topic. This is not as hard as understanding natural language per se, but identifying the topic of a query is one of the important steps.

3 System Description

3.1 Concept and Goals

3.1.1 Main Concept

The main goal of this document is to describe an approach to a mixed model of classic and social search. Both paradigms, library and village, have been taken into account.

The idea is to make classic search engines more powerful by adding recommendations based on social relationships. Just as Aardvark recommends answerers according to the query entered, this approach recommends the best piece of information according to a classic search and to direct recommendations from users.

Some of the problems that search engines suffer from were mentioned above, and they can be mitigated with social recommendations.

3.1.2 Goals to Achieve

The main goal is to re-rank the results from a search engine according to user recommendations and so obtain a more relevant ordering of the results.

Classic search engines re-rank their own results based on which ones get clicks from users. As a consequence, the re-ranking is done without taking into account whether the results satisfy the user.

The approach proposed in this paper is similar, but the re-ranking takes into account the user's satisfaction with the selected result, because the recommendation or penalization is made after the user has seen the information.

Note that this also helps to mitigate the SEO problem that affects most classic search engines.

3.2 Architecture

3.2.1 Information Source

This initial approach is intended as a small experiment, so at first there will be no self-indexed information.

The information source of our social search system will be a classic search engine, specifically the Google search engine.

This means that our social search system will not be able to work standalone, but it allows us to reduce the initial size of our database, since we will only store information related to the recommendations made by users.

3.2.2 Database

The information on recommendations gathered by the system will be stored in a database, together with information about users and their social relationships.

Common databases are not the best option for socially based systems. Working with relationships between users makes the relational database model a poor fit, because it performs badly on this kind of data and lacks scalability. The proper solution is to use a graph database. Graph databases have been created with social needs in mind and are being adopted by the most important projects related to social technologies.

3.2.3 Ranking Algorithm

None of the existing algorithms, neither classic ones such as PageRank nor the Aardvark one, fits this model accurately.

The algorithm needed will deal with two important factors: the result selected from the search engine and the recommendation given by the user.

An improved version of this algorithm will also include a user indexer that relates users to topics, as the Aardvark system does, so that the system can give more weight to recommendations made by users with proven knowledge of the query topic.

This will also make it possible to apply the "following/follower" model used by Twitter to search environments. Users could be notified when a relevant user recommends a result on a topic related to their previous queries.

3.3 Usability

3.3.1 Simple Query Interface

One of the problems that recommendation systems have to deal with is the user interface. Convincing users to evaluate whether something is worth recommending cannot be done through a heavy process.

The user interface for entering a query should be minimalistic and clean. The result list has to be simple, but it must contain enough information to allow users to select the result that best fits their query.

3.3.2 Recommendation

The interface for recommending a result should be as unobtrusive as possible.

The idea is to show a small bar in the browser window that displays the content of the selected result. That bar will enable the user to recommend the result and even to rate it.

4 Future Work and Conclusions

This is only a minimal specification of a different approach to social search, so many lines of future work can grow out of it.

As a first option, future work will replace the source search engine with a metasearch engine, which will retrieve unique results from several search engines.

A suggestion system for influential users would also be a great addition to the system.

5 References

Horowitz, D. and Kamvar, S. D. The Anatomy of a Large-Scale Social Search Engine.
Fuhr, N. Information Retrieval: Introduction and Survey.
Huang, L. A Survey on Web Information Retrieval Technologies.
Greengrass. Information Retrieval: A Survey.
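
As an illustration of the ranking idea in Section 3.2.3, the following Python sketch combines the two factors named there: the position a result had in the source search engine and the recommendations or penalizations given by users, optionally weighted by each user's knowledge of the query topic. The paper does not specify a formula, so everything here is an assumption made for illustration: the names Feedback and rerank, the 1/(position + 1) base score, the +1/-1 vote encoding and the social_weight parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One user's judgment of a result (hypothetical structure)."""
    user: str
    url: str
    vote: int  # +1 = recommendation, -1 = penalization


def rerank(results, feedback, expertise=None, social_weight=1.0):
    """Reorder search results using user feedback.

    results:       URLs in the order returned by the source search engine.
    feedback:      iterable of Feedback items for the query topic.
    expertise:     optional dict mapping user -> weight for this topic;
                   users not listed get a weight of 1.0 (assumed default).
    social_weight: how strongly feedback counts against the original order.
    """
    expertise = expertise or {}

    # Assumed base score: decays with the original position (1, 1/2, 1/3, ...).
    base = {url: 1.0 / (pos + 1) for pos, url in enumerate(results)}

    # Sum the votes for each result, weighted by the voter's expertise.
    social = {url: 0.0 for url in results}
    for fb in feedback:
        if fb.url in social:
            social[fb.url] += fb.vote * expertise.get(fb.user, 1.0)

    score = {url: base[url] + social_weight * social[url] for url in results}

    # sorted() is stable, so results with equal scores keep the engine's order.
    return sorted(results, key=lambda url: score[url], reverse=True)


if __name__ == "__main__":
    engine_order = ["a", "b", "c"]
    votes = [Feedback("u1", "c", +1), Feedback("u2", "a", -1)]
    print(rerank(engine_order, votes))  # prints ['c', 'b', 'a']
```

Keeping the engine's position as the base score means that results without any feedback stay in their original order, which matches the idea of using a classic search engine as the information source and storing only recommendation data.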