Automated Recommendation Rule Acquisition for Two-Way Interaction-based Social Network Web Sites

Y. S. Kim (yskim@cse.unsw.edu.au), A. Mahidadia, P. Compton (compton@cse.unsw.edu.au), A. Krzywicki, W. Wobcke (wobcke@cse.unsw.edu.au), M. Bain, X. Cai (xcai@cse.unsw.edu.au)

School of Computer Science and Engineering, The University of New South Wales, Sydney NSW 2052, Australia

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining

Keywords: recommendation systems

A problem with social network web sites for activities such as dating or finding new friends is that often there is little positive response from those contacted. In this research we investigated historical data from a large commercial social network site to establish which subgroups of people were most likely to respond to a particular individual. Our two-way interaction model developed a table for each attribute to determine which pair of values for sender and recipient gave the best response rate. From all the attributes the user profile of a likely responder was created, but then less significant attributes were removed. With this simple technique we were able to demonstrate that where users had contacted people the system would have recommended, the success rate was 29.4% compared to a baseline success rate of 16.6%. This represents a very considerable increase in the likelihood of getting a favourable response. We are now planning a study that provides prospective recommendations to actual users, based on our model.

INTRODUCTION

With the ever-increasing use of Web 2.0 social networking web sites, recommender systems can be used to suggest the best matching participants. In this case, it is necessary to consider a two-way interaction model, where a user, called the sender, sends a message to another user, called the recipient, and the recipient replies positively or negatively to the sender. Within this model, the recommendation method suggests a group of candidate recipients who are more likely to reply positively to the sender.

Recommendation methods for two-way interaction differ from those for the one-way interaction model, because the recipients in a two-way interaction can choose their response, whereas the items in a one-way interaction passively receive the user's actions. Though many recommendation methods have been researched and commercialized based on the one-way interaction model, including those of Amazon [1], Google [2], and Netflix [3], it is not clear whether they can also be successfully applied to recommendation problems in two-way interaction.

In our research, three different rule-based recommendation methods, which employed different assumptions on the preferences of the sender and the recipient, were compared to a collaborative filtering method, a typical one-way recommendation method.

Recommendation Rule Learning Method

For a given user, our method learns recommendation rules using profiles and the history of interactions between senders and recipients. In summary, the method creates an interaction look-up table for each attribute based on past interaction data. For each attribute value of a given user, the method finds a value for the same attribute (called the best matching attribute value) of a subgroup of recipients based on one of three criteria: sending activity (SA), receiving activity (RA) and success rate (SR). Sending activity is the number of contacts sent by the sender group to the recipient group; it suggests the senders' interest in the recipients. Receiving activity is the number of contacts sent from the recipient group to the sender group; it suggests the recipient group's interest in the senders. Success rate is the ratio of the number of positive responses to the number of interactions from senders to recipients; it reflects both the senders' interest in the recipients and vice versa.

Once the best matching attribute values for all attributes of a given user are selected, it is necessary to find a subgroup of recipients who satisfy all these attribute values. Given that the number of attributes is large, it is possible that no recipient satisfies all of them. Therefore, it is necessary to select the more significant attribute values from the best matching attribute values. For this purpose, we used the weighted lift, which represents the normalized 'interest of the sender in the recipients' who have a specific attribute value. The weighted lift is calculated as follows. For a given attribute value of a sender ($av_s$), let its best matching attribute value be $av_r$. The interest of the sender subgroup with attribute value $av_s$ in the recipients with $av_r$ is:

$$I_{av_s \rightarrow av_r} = \frac{s_{av_s} \rightarrow r_{av_r}}{s_{av_s} \rightarrow R} \qquad (1)$$

where $s_{av_s} \rightarrow r_{av_r}$ and $s_{av_s} \rightarrow R$ represent the number of interactions sent from the sender subgroup defined by $av_s$ to the recipient subgroup defined by $av_r$ and to all recipients $R$, respectively. As each attribute has a different number of attribute values, the 'interest of the sender in the recipients' ($I_{av_s \rightarrow av_r}$) is normalized as follows:

After calculating the weighted lift (ω) of all best matching attribute values, the method adds them to the condition of a recommendation rule one at a time, from highest to lowest weighted lift. This process is repeated until there are no more attribute value pairs to add or there is no training data for the current rule. Finally, the method chooses the rule that shows the highest success rate and exceeds a threshold for statistical significance.
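The steps in this section can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names (`lookup_table`, `best_match`, `interest`, `grow_rule`), the toy profiles and the data layout are all assumptions made for illustration. The weighted lift normalization is not computed here, so `grow_rule` simply assumes its conditions arrive already sorted from high to low weighted lift, and the hypothetical `min_support` count stands in for the statistical significance threshold, whose exact form is not stated.

```python
from collections import defaultdict

def lookup_table(interactions, profiles, attribute):
    """Map (sender value, recipient value) -> [contacts, positive replies]."""
    table = defaultdict(lambda: [0, 0])
    for sender, recipient, positive in interactions:
        cell = table[(profiles[sender][attribute], profiles[recipient][attribute])]
        cell[0] += 1
        cell[1] += int(positive)
    return table

def best_match(table, sender_value, criterion):
    """Pick the recipient value that maximises SA, RA or SR for sender_value."""
    scores = {}
    for (from_value, to_value), (contacts, positives) in table.items():
        if criterion == "SA" and from_value == sender_value:
            scores[to_value] = contacts              # contacts sent to that group
        elif criterion == "SR" and from_value == sender_value:
            scores[to_value] = positives / contacts  # positive replies per contact
        elif criterion == "RA" and to_value == sender_value:
            scores[from_value] = contacts            # contacts received from that group
    return max(scores, key=scores.get) if scores else None

def interest(table, sender_value, recipient_value):
    """Equation (1): share of the sender subgroup's contacts sent to one recipient subgroup."""
    to_subgroup = table.get((sender_value, recipient_value), [0, 0])[0]
    to_all = sum(c for (vs, _), (c, _p) in table.items() if vs == sender_value)
    return to_subgroup / to_all if to_all else 0.0

def grow_rule(ranked_conditions, interactions, profiles, min_support=2):
    """Add (attribute, value) conditions in order and keep the best-scoring prefix.

    ranked_conditions is assumed to be sorted by weighted lift, high to low.
    min_support is a hypothetical stand-in for the significance threshold.
    """
    best, conditions = None, []
    for attribute, value in ranked_conditions:
        conditions.append((attribute, value))
        matched = [positive for _s, recipient, positive in interactions
                   if all(profiles[recipient][a] == v for a, v in conditions)]
        if not matched:  # no training data left for the current rule
            break
        rate = sum(matched) / len(matched)
        if len(matched) >= min_support and (best is None or rate > best[1]):
            best = (list(conditions), rate)
    return best

# Toy data (entirely made up): (sender, recipient, positive reply?)
profiles = {
    "a": {"age": "20s", "smoker": "no"},
    "b": {"age": "20s", "smoker": "yes"},
    "c": {"age": "30s", "smoker": "no"},
    "d": {"age": "30s", "smoker": "yes"},
}
interactions = [
    ("a", "c", True), ("a", "d", False), ("b", "c", True),
    ("a", "b", False), ("c", "a", True), ("d", "b", False),
]

age = lookup_table(interactions, profiles, "age")
matches = {c: best_match(age, "20s", c) for c in ("SA", "RA", "SR")}
share = interest(age, "20s", "30s")
rule = grow_rule([("age", "30s"), ("smoker", "no")], interactions, profiles)
```

In this sketch `interactions` is taken to be already restricted to senders similar to the user for whom the rule is built; that restriction is also an assumption rather than something the text specifies.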

EXPERIMENTAL RESULTS

Data Sets

The social network site we used provided two types of data: user profiles and user interactions. In total, 32 attributes were used for our recommendation methods. User interaction logs contain the contact history between users, identifying the types of messages sent and received. Reply messages were classified as positive or negative, and each interaction was classified accordingly as a positive or negative interaction. A failure to reply was taken as a negative interaction. The data sets are summarised in Table 1. Train I was collected for our rule learning method. Train II was collected for the CF-based method over one month (March 2009). Preliminary data analysis using the CF method over different time periods showed that a training period of one month was appropriate. Test data for evaluation were collected from the first week of April, immediately following the CF training period, to give the CF method the best chance of performing well. The collaborative filtering (CF) method is based on [1].

Results

Rule acquisition results with different best matching attribute value selection criteria are summarized in Table 2. The RA method produced the largest number of rules (8,739), followed by the SA method (6,534) and the SR method (146). Note that these methods do not produce rules in the conventional sense, as a rule is constructed for each user for whom a recommendation is made. Usage indicates the number of senders covered by each rule, on average. The more rules a method produces, the fewer users each rule covers. Of more interest is the number of conditions in a rule. On average, the SA method and the RA method used more condition elements (8.62 and 7.90 per rule, respectively) than the SR method (2.71 per rule).

Obviously the SR method created more general rules, while the SA and RA methods created more specific rules.

Recommendation performance of each method was measured by coverage and success rate. By coverage we mean the fraction of users for which the recommender is able to make a recommendation. The SR method has the highest coverage because it has more general rules. The difference between SA and RA is interesting. The SA method has a smaller number of more specialized rules, giving it the lowest coverage, slightly lower even than the CF method. The SA, RA and SR methods all try to identify the characteristics of a recipient who is likely to respond to a particular type of sender. The problem with the SA method is that it does not take into account the recipients' interests at all, so that we end up with highly specialized rules about sender preferences. Since these highly specialized rules are constructed from features considered independently, there is a greater chance that the test data may not contain recipients who match these rules.
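The two measures can be made concrete with a short sketch. It assumes, for illustration only, that each method's output is available as a mapping from sender to a set of recommended recipients, and that success rate is computed over the test interactions in which a sender contacted a recipient the method would have recommended; the function name `evaluate` and the toy data are hypothetical.

```python
def evaluate(recommendations, test_interactions):
    """Return (coverage, success rate) for one recommendation method.

    recommendations: sender -> set of recommended recipients (may be empty).
    test_interactions: iterable of (sender, recipient, positive reply?).
    """
    senders = {s for s, _r, _p in test_interactions}
    # A sender is covered if the method recommends at least one recipient.
    covered = {s for s in senders if recommendations.get(s)}
    # Keep only contacts that the method would have recommended.
    hits = [p for s, r, p in test_interactions
            if r in recommendations.get(s, set())]
    coverage = len(covered) / len(senders) if senders else 0.0
    success_rate = sum(hits) / len(hits) if hits else 0.0
    return coverage, success_rate

# Toy data (made up): three senders, one of whom gets no recommendation.
test_interactions = [
    ("a", "c", True), ("a", "d", False), ("b", "c", False), ("e", "d", True),
]
recommendations = {"a": {"c"}, "b": {"c", "d"}, "e": set()}
coverage, success_rate = evaluate(recommendations, test_interactions)
```

Here two of the three senders receive a recommendation, and one of the two recommended contacts drew a positive reply.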

The success rates of the SA method and the CF method did not differ significantly, and both were only slightly better than the test period success rate. The CF method had similar limitations to the SA method, as it only considered sender preferences. The success rates of the SR method and the RA method are higher than those of SA and CF because they take into account the recipient's interest in the sender.

CONCLUSIONS

Because we are dealing with the intangibles of human preferences in seeking interactions with others, the highest success rate obtained in our experiment (29.4% for SR) is still low. However, this is a considerable improvement over the baseline success rate of 16.6%, which comes from senders' unguided choices about whom they would like to communicate with and who is likely to respond positively. The improved success rate of 29.4% comes from the senders who happened to choose recipients we would have recommended. This means there is enormous potential for providing actual recommendations to current users that could significantly increase the chance of a favourable response. We plan to conduct a study that provides actual recommendations to some of the current users using our model.

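ACKNOWLEDGMENTS

This research has been supported by the Smart Services Cooperative Research Centre.

Table 1. Training and Test Data Sets

| Data Set | Total Interactions | Positive Interactions | Positive (%) | Negative Interactions |
|----------|--------------------|-----------------------|--------------|-----------------------|
| Train I  | 3,888,034          | 689,419               | 17.7         | 3,198,615             |
| Train II | 1,357,432          | 236,521               | 17.4         | 1,120,911             |
| Test     | 284,702            | 47,468                | 16.7         | 237,234               |

Table 2. Experimental Results

| Method | Rules | Rule Usage | Avg. Conditions | Coverage (%) | Success Rate (%) |
|--------|-------|------------|-----------------|--------------|------------------|
| SA     | 6,534 | 3.1        | 8.62            | 67.1         | 17.9             |
| SR     | 146   | 201.1      | 2.71            | 96.6         | 29.4             |
| RA     | 8,739 | 2.9        | 7.90            | 82.4         | 21.1             |
| CF     |       |            |                 | 74.0         | 17.3             |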

REFERENCES

[1] G. Linden, B. Smith, and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1), 2003.

[2] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report, Stanford InfoLab, 1999.

[3] Y. Koren. Collaborative Filtering with Temporal Dynamics. In 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France. ACM, 2009.