<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Location Based Services Recommendation with Budget Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hao Wei</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bei Shi</string-name>
          <email>bshi@se.cuhk.edu.hk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junwen Chen</string-name>
          <email>rolanchen@tencent.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Internal Search Platform Department, Tencent Technology Co. Ltd</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Chinese University of Hong Kong</institution>
          ,
<addr-line>Shatin, N.T., Hong Kong</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>48</fpage>
      <lpage>56</lpage>
      <abstract>
<p>The paper reports the approach of our team "711" to recommend brick-and-mortar stores with budget constraints. Through extensive studies, we find that the history data is very effective in this location based service recommendation, and we exploit the history data in our recommendation from both the users' aspect and the merchants' aspect. Moreover, we extract predictive features from the train set and implement some existing methods, e.g., Xgboost. Our final ensemble method obtains an F1 score of 0.452 and reaches the third place in the SocInf'16 competition.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>consider the limit of budgets. Otherwise, exceeding the budgets will decrease the final performance.</p>
<p>In the competition, given the user behavior records of the online website and onsite merchants, we need to recommend the potential merchants that users will visit in the future. A test set of users at specific locations is also provided. For each merchant, the information about branches and budgets should also be considered practically. To prevent overfitting, this competition contains two stages. In the first stage, some merchants are removed from the evaluation. Specifically, the involved data are shown as follows.</p>
<p>- Online user behavior before Dec. 2015. (ijcai2016 taobao)
- Users' shopping records at brick-and-mortar stores before Dec. 2015. (ijcai2016 Koubei train)
- Merchant information. (ijcai2016 merchant info)
- Prediction result. (ijcai2016 Koubei test)</p>
<p>To tackle the task, we exploit historical data for each user and each merchant. Moreover, we extract predictive features from the train set and implement Xgboost and Factorization Machine for the task. Considering the budgets, we propose a method to prune our outputs into results that satisfy the constraints. By ensembling the results of different methods, we obtained an F1 score of 0.452 and reached the third place.</p>
    </sec>
    <sec id="sec-2">
      <title>Exploiting Historical Data</title>
<p>We analyze the given train data and split the Koubei train data into two parts. The first part is the data with dates before November 23, which we treat as the new train data, while the second part is the data with dates after November 23, which we treat as the new test data in our analysis. By extensive studies on the generated dataset, we find that the history data is very effective in this location based service recommendation. That is, given the location, users tend to visit the same merchants they have visited before. From the angle of merchants, the popular merchants that attracted many users in the history will continue to be visited by many customers in the near future. Based on the above observations, we exploit the history data from both the users' aspect and the merchants' aspect.</p>
      <sec id="sec-2-1">
        <title>Users' history</title>
<p>We find that some queries (user, location) in the test file have occurred in the train data. Based on the observation that people tend to visit the same merchants they visited before, a simple strategy is to answer these queries with the merchants the users have visited in those locations. However, the frequency with which the merchant has been visited by the user affects the recommendation quality. The higher the frequency is, the higher the accuracy of the recommendation we can get.</p>
<p>For example, if the user visited a merchant more than 6 times in the history, we get over a 90% success rate when recommending the merchant to the user in the future. On the other hand, every merchant has a budget constraint which may not be able to meet the need of all queries (user, location) that have visited the merchant before. To utilize the frequency information, we represent the history data as a set of tuples (user, location, merchant, frequency), where frequency counts the number of times the tuple (user, location, merchant) occurs in the history. Then we sort the tuples (user, location, merchant, frequency) in order of decreasing frequency and recommend merchants to the queries in this order until the budget runs out. Since we find that every user visits only about 1.3 merchants in a place according to the train data, we limit the maximum number of merchants to be recommended to 4.</p>
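<p>The greedy strategy above can be sketched as follows (a minimal illustration; the data structures and function names are ours, not the competition code):</p>

```python
from collections import Counter, defaultdict

def recommend_by_history(history, budgets, max_per_query=4):
    """Greedy history-based recommendation: serve the most frequent
    (user, location, merchant) tuples first, until budgets run out.

    history: iterable of (user, location, merchant) visit records.
    budgets: dict merchant -> remaining budget.
    Returns dict (user, location) -> list of recommended merchants.
    """
    freq = Counter(history)  # (user, location, merchant) -> frequency
    budgets = dict(budgets)  # work on a copy
    result = defaultdict(list)
    for (user, loc, merchant), _ in freq.most_common():
        if budgets.get(merchant, 0) <= 0:
            continue  # this merchant's budget is exhausted
        if len(result[(user, loc)]) >= max_per_query:
            continue  # at most 4 merchants per query
        result[(user, loc)].append(merchant)
        budgets[merchant] -= 1
    return dict(result)
```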
      </sec>
      <sec id="sec-2-2">
        <title>Merchants' history</title>
<p>After analyzing the train data, we find that different merchants have different popularity. For example, we find a merchant named '820' that takes nearly a quarter of the data records in the train data. In the following, we assume every merchant opens in only one location. For the case where the same merchant opens in different locations, we treat them as different merchants, because even the same merchant may have different popularity in different locations. Suppose that 90% of users coming to location X will visit a merchant Y. In this case, we can achieve 90% precision for queries occurring in location X by recommending merchant Y to users in location X no matter who they are. The percentage of pairs (user, location) visiting a merchant is called the merchant's popularity. To calculate the popularity, we first count the number of users coming to a location X as N_X and then count the number of users visiting the merchant of that location as N_Y. So the popularity of merchant Y is calculated as P_Y = N_Y / N_X, where P_Y &#8804; 1. We cannot get the popularity of every merchant in September directly but estimate it by every merchant's historical popularity. Note that the popularity of a merchant may vary with time. The popularity should be obtained based on the history data close to September. After many tests, we choose to use the data with the date after November 23 as the source for computing popularity.</p>
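<p>The popularity computation P_Y = N_Y / N_X can be sketched as follows (the record layout and names are our assumptions):</p>

```python
from collections import defaultdict

def merchant_popularity(records):
    """Estimate per-location merchant popularity P_Y = N_Y / N_X.

    records: iterable of (user, location, merchant) visits.
    Returns dict (location, merchant) -> popularity in (0, 1].
    """
    visitors = defaultdict(set)           # location -> users seen there
    merchant_visitors = defaultdict(set)  # (location, merchant) -> users
    for user, loc, merchant in records:
        visitors[loc].add(user)
        merchant_visitors[(loc, merchant)].add(user)
    return {
        (loc, m): len(users) / len(visitors[loc])
        for (loc, m), users in merchant_visitors.items()
    }
```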
<p>To improve the quality of recommendation, two key issues should be emphasized. The first one is how to make good use of the merchant budget. It is obvious that the higher popularity a merchant has, the higher precision we can get. So we sort the pairs (merchant, popularity) in order of decreasing popularity and do the recommendation in that order. To get a higher F1 score, after recommending merchant Y to a query, we deduct the expected value of precision P_Y, rather than 1, from the budget of merchant Y. Therefore, although a merchant Y has only budget B_Y, we may recommend it to B_Y / P_Y users, which is larger than or equal to B_Y since P_Y &#8804; 1, because we keep the same recommendation precision but get a higher recall for this merchant: we get more correct answers, on average B_Y, in this case. The second issue is what popularity threshold we should set to obtain a higher F1 score. We denote N_s as the number of submitted merchants, N_r as the number of correct answers in the submission, and N_t as the total number of correct answers in the whole test file. So the precision and recall are P = N_r / N_s and R = N_r / N_t. When answering a query with a merchant of popularity p, we investigate under what popularity p the F1 score can remain the same value. We formulate it as the mathematical problem below,</p>
<p>F1 = 2 &#183; (N_r/N_s)(N_r/N_t) / (N_r/N_s + N_r/N_t) = 2N_r / (N_s + N_t). (2)</p>
        <p>Answering one extra query with a merchant of popularity p changes N_s to N_s + 1 and, in expectation, N_r to N_r + p, so the F1 score in Eq. 2 stays unchanged exactly when p = N_r / (N_s + N_t) = F1/2.</p>
<p>So we set the popularity threshold as F1/2 at first and recommend merchants with popularity higher than the threshold. Since the estimated popularity may differ from the real values, we decided to raise the threshold a little after several tests and set it to 0.25.</p>
        <p>Recommendation by merchants' popularity takes no account of each user's own features. In fact, we can utilize users' Taobao browsing records to do some filtering during the recommendation. For example, users browsing online sellers of electronic products should rarely visit on-site merchants of women's clothes. Therefore, we build a connection table from online sellers to on-site merchants. If there exist records showing that users browsing online seller A also visit the on-site merchant B, we put (A, B) into the connection table. Based on the connection table, for users having Taobao records of browsing online seller A, we will only recommend on-site merchants that appear in the connection table together with A. In this way, we filter out the recommendations with low precision.</p>
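<p>A minimal sketch of the connection-table filtering (the pair representation and function names are illustrative assumptions):</p>

```python
def build_connection_table(joint_records):
    """Build the seller -> merchant connection table: a pair (A, B) is
    kept when some user who browsed online seller A also visited
    on-site merchant B.

    joint_records: iterable of (online_seller, onsite_merchant) pairs.
    """
    return {(seller, merchant) for seller, merchant in joint_records}

def passes_filter(table, user_sellers, merchant):
    """Keep a recommendation only if some seller the user browsed
    co-occurs with the merchant in the connection table."""
    return any((s, merchant) in table for s in user_sellers)
```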
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Modeling Techniques</title>
<p>In the competition, we have tried several methods for recommendation, including Xgboost, Logistic Regression and Matrix Factorization. Moreover, we generate various features for each user, location and item. In this section, we introduce these methods and features in detail.</p>
      <sec id="sec-3-1">
        <title>Data Set Generation</title>
<p>We formalize the original recommendation problem as binary classification. For example, in the original visiting history, user u has visited merchant m in location l. For this training record, we can generate a 3-tuple, i.e., (u, l, m). For each 3-tuple in the original visiting history, we can generate a series of train records. First, we produce positive instances by attaching a true label to the original 3-tuple. Second, we construct the negative instances by considering all the other possible merchants in the location. For each merchant m' in l except m, we generate a negative instance (u, l, m') whose label is false. Therefore, we generate a training data set for the binary classifier. Similarly, we can also generate a test data set according to the given users and locations in the original test file. The recommended merchants are associated with an output label of true. To balance the counts of positive and negative records, we sample the negative records from the set of m' instead of using all of them.</p>
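<p>The instance construction with negative sampling can be sketched as follows (the sampling ratio and all names are our assumptions; the paper does not specify them):</p>

```python
import random

def generate_instances(visits, merchants_by_location, neg_per_pos=1, seed=0):
    """Turn visit 3-tuples into labeled classification instances.

    visits: iterable of (user, location, merchant) positive records.
    merchants_by_location: dict location -> list of merchants there.
    neg_per_pos: sampled negatives per positive (balances the classes).
    """
    rng = random.Random(seed)
    instances = []
    for user, loc, merchant in visits:
        instances.append(((user, loc, merchant), True))
        # Negative candidates: other merchants in the same location.
        negatives = [m for m in merchants_by_location[loc] if m != merchant]
        for m in rng.sample(negatives, min(neg_per_pos, len(negatives))):
            instances.append(((user, loc, m), False))
    return instances
```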
<p>To avoid overfitting and to test our methods efficiently, we also construct an off-line data set denoted by D. The train set D_s is generated from the original records of history before 20151123. The corresponding test set D_t is the data after 20151123. We test our algorithms on D to ensure improvements on the on-line judge.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Feature Engineering</title>
<p>The generated features play an important role in modeling. The essential task in the recommendation problem is to design effective features. From the train set, we observe that some users usually visit merchants from their history records, some users like to spend more time on popular merchants in that location, and some branches of popular merchants become popular soon after opening. Based on the above observations, for each 3-tuple (u, l, m), we design a set of features to depict these observations, shown as follows.</p>
<p>- history_exists_flag: the binary flag indicating whether there exist other history records of (u, l, m) before the timestamp t.
- history_count: the normalized count of history records (u, l, m) before the timestamp t.
- user_id: the one-hot encoding of the user id.
- merchant_id: the one-hot encoding of the merchant id.
- merchant_location_count: the normalized count of (l, m) in the training set.
- merchant_count: the normalized count of m in the training set.
- open_interval: the time interval between t_{m,l} and 20150701, where we select the first record of (l, m) in timestamp order and t_{m,l} denotes the timestamp of that first record, which means the opening date.
- time_interval: the time interval between t and t_{m,l}, where t is the timestamp of the record. This feature captures the trend of m in l.
- taobao_records_count: the count of online Taobao records for user u.
- taobao_category_id: the one-hot encoding of the category ids of the Taobao records that user u clicked or bought.
- taobao_seller_id: the one-hot encoding of the seller ids of the Taobao records that user u clicked or bought.
- taobao_buy_ratio: the ratio between the counts of records user u bought and clicked.
- taobao_buy_count: the count of records that user u bought in the on-line</p>
<p>
          Taobao.
- taobao_click_count: the count of records that user u clicked in the on-line Taobao.
- user_latent_vector: the latent vector for user u. We generate the latent vectors with the SVD factorization [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] of the history records, as provided by the scikit-learn package.
- merchant_latent_vector: the latent vector for merchant m. Similarly, we generate the latent vector of the merchant m using SVD factorization.
        </p>
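<p>As an illustration, the two history features can be computed as follows (a sketch; normalizing by the maximum count is our assumption, since the paper does not specify the normalization):</p>

```python
from collections import Counter

def history_features(train_records, u, l, m, t):
    """Compute history_exists_flag and history_count for one
    candidate (u, l, m) at time t.

    train_records: list of (user, location, merchant, timestamp).
    """
    counts = Counter(
        (usr, loc, mer) for usr, loc, mer, ts in train_records if ts < t
    )
    c = counts[(u, l, m)]
    max_c = max(counts.values(), default=1)
    return {
        "history_exists_flag": 1 if c > 0 else 0,
        "history_count": c / max_c,  # normalized count
    }
```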
      </sec>
      <sec id="sec-3-3">
        <title>Xgboost</title>
<p>
          Xgboost [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is a large-scale machine learning method which can be used to build scalable learning systems. Xgboost has been used by a series of Kaggle winning solutions as well as KDD Cup winners, which proves the efficacy of this method. It is fast and optimized for out-of-core computation. XGBoost is short for Extreme Gradient Boosting, where the term Gradient Boosting was proposed in the paper [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The method of boosted trees has been around for quite a while. It is trained with decision trees of a fixed size as base learners, which is robust to outliers. As a tree-based algorithm, gradient boosted decision trees can also handle non-linear feature interactions.
        </p>
<p>In our competition, we use the implementation of Xgboost provided on GitHub (https://github.com/dmlc/xgboost). We have tried two different objective functions: one is the logistic classification loss and the other is the rank loss. We mainly tune the parameters including the maximum depth of a tree, the step size shrinkage used in updates to prevent overfitting, and the minimum sum of instance weight (hessian) needed in a child. Finally, we set the maximum depth to 10, the step size to 0.3 and the minimum sum of instance weight to 2.0 in our competition.</p>
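<p>The reported settings map onto the standard xgboost parameter names as follows (a sketch; the data-loading part is omitted and the number of boosting rounds is our placeholder):</p>

```python
# The tuned settings from the text, in xgboost's "params" dict format.
params = {
    "objective": "binary:logistic",  # logistic classification loss
                                     # ("rank:pairwise" for the rank loss)
    "max_depth": 10,                 # maximum depth of a tree
    "eta": 0.3,                      # step size shrinkage
    "min_child_weight": 2.0,         # min sum of instance weight (hessian)
}

# Typical usage (requires the xgboost package):
#   import xgboost as xgb
#   dtrain = xgb.DMatrix(X_train, label=y_train)
#   model = xgb.train(params, dtrain, num_boost_round=100)
```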
      </sec>
      <sec id="sec-3-4">
        <title>Factorization Machine</title>
<p>
          The technique of factorization machines (FM) has been widely used in recommendation problems [
          <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
          ]. The FM method can combine collaborative filtering with other large categorical features, and handles sparse data especially well. The time complexity of FM is O(n), which makes FM applicable to large-scale problems. For user u and the branch of m in location l, we define the factorization machine as:
        </p>
        <p>r_{u,m} = &#956; + b_u + b_m + q_m^T p_u, (3)</p>
        <p>where &#956; is the global bias, and b_u and b_m are the biases of the user and the merchant respectively. p_u is the corresponding latent vector for user u. Similarly, q_m is the latent vector of the branch of m in location l. To learn the model parameters, we can use algorithms such as Stochastic Gradient Descent (SGD) and Markov Chain Monte Carlo (MCMC). In the competition, we exploited SGD to infer the parameters.</p>
        <p>For each candidate 3-tuple (u, l, m) in the test set, our binary classification model outputs the probability of the instance being positive, i.e., the probability that user u visits the merchant m in location l. Similarly to the threshold in Eq. 2, we set the threshold to one half of the F1 measure.</p>
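<p>Eq. (3) can be written directly as a scoring function (a minimal sketch; squashing the score into a probability with a sigmoid is our assumption):</p>

```python
import math

def fm_score(mu, b_u, b_m, p_u, q_m):
    """Eq. (3): global bias + user bias + merchant bias + inner
    product of the user and merchant latent vectors, mapped to a
    visit probability via a sigmoid."""
    score = mu + b_u + b_m + sum(p * q for p, q in zip(p_u, q_m))
    return 1.0 / (1.0 + math.exp(-score))
```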
<p>Given the output probabilities of the instances related to m, we should also consider the budget B_m of merchant m. Assume that we have N_m instances of m to recommend. The precision with respect to m is min(B_m, N_m') / N_m, where N_m' is the count of right instances for the recommendation. Obviously, we should recommend at least B_m instances in total. Moreover, we should consider the instances with high probability of being positive first. Therefore, considering merchant m, the algorithm of post-processing is shown as follows.</p>
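<p>The algorithm listing itself is missing from this copy; the following is a sketch consistent with the prose description (highest-probability instances first, one budget unit per recommendation; all names are ours, not the authors' exact listing):</p>

```python
def post_process(scored, budgets):
    """Budget-aware selection: walk the instances in decreasing
    probability order and keep each one while its merchant still
    has budget left.

    scored: list of ((user, location, merchant), probability).
    budgets: dict merchant -> budget B_m.
    Returns the selected (user, location, merchant) tuples.
    """
    selected = []
    taken = {m: 0 for m in budgets}
    for (u, l, m), prob in sorted(scored, key=lambda x: -x[1]):
        if taken.get(m, 0) < budgets.get(m, 0):
            selected.append((u, l, m))
            taken[m] = taken.get(m, 0) + 1
    return selected
```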
      </sec>
      <sec id="sec-3-5">
        <title>Ensemble</title>
<p>Our method based on history data has a higher precision than our learning methods. Therefore, we ensemble our methods with the following steps.
1. Using the method based on history data in Section 2, we get the result R1, which has a higher precision, together with the slack of the budget for each merchant.
2. We use our learning methods in Section 3 to recommend the merchants under the constraints of the budget slacks, obtaining the result R2.
3. After concatenating R1 and R2, we output the final result.</p>
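<p>The three steps above can be sketched as follows (a minimal illustration under assumed data structures; names are ours):</p>

```python
def ensemble(history_recs, learned_recs, budgets):
    """Two-stage ensemble: take the high-precision history result
    first, then fill the remaining budget slack with the learned
    result, and concatenate."""
    slack = dict(budgets)
    final = []
    for (u, l, m) in history_recs:           # step 1: R1
        if slack.get(m, 0) > 0:
            final.append((u, l, m))
            slack[m] -= 1
    for (u, l, m) in learned_recs:           # step 2: R2 under slack
        if slack.get(m, 0) > 0 and (u, l, m) not in final:
            final.append((u, l, m))
            slack[m] -= 1
    return final                             # step 3: concatenation
```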
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
<p>Our experimental dataset consists of three parts.
- ijcai2016 Koubei train: The train dataset has more than 1 million records from about 230 thousand users. Most of the data were collected in the period from October to November. On average, every user visits 1.05 locations and 1.25 merchants per location, which shows that users tend to visit the same merchants in the future.
- ijcai2016 merchant info: This file contains the budget constraint for every merchant. There are about 6,000 merchants and every merchant has a budget of at least 100. We find that the budget resource is very limited for popular merchants while the budget of unpopular merchants is sufficient for recommendation. So the long tail effect heavily influences the final results.
- ijcai2016 taobao: This file stores over 40 million users' browsing records on Taobao. We mainly use it to predict the interests of users and to perform filtering for the final recommendation.</p>
<p>The evaluation metric used in this competition is the F1 score with budget constraints. Here we list some of our experiment results step by step.</p>
      <p>In this paper, we study the location based services recommendation problem, which comes from real applications. We design a history based approach and modeling by Xgboost and factorization machines for this task, and get a final F1 score of 0.45. We note that some issues still need to be further addressed. One is the long tail merchants, which actually have a high weight under the setting that every merchant has a budget constraint. The other is to build a strong connection between a user's online behavior on Taobao and his on-site behavior on Koubei. Although we design a filtering method based on it, we believe further work can be done to use the connection for recommendation directly.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
<p>We thank the organizer of this competition, Tianchi, for providing an interesting and challenging topic from real applications. We also thank the other participants for their helpful discussions in the forum; we learned a lot from them.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Xgboost: A scalable tree boosting system</article-title>
          .
          <source>arXiv preprint arXiv:1603.02754</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
<article-title>Svdfeature: a toolkit for feature-based collaborative filtering</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>13</volume>
          (
          <issue>1</issue>
          ),
          <volume>3619</volume>
&#8211;
          <fpage>3622</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Greedy function approximation: a gradient boosting machine</article-title>
          . Annals of statistics pp.
          <volume>1189</volume>
&#8211;
          <issue>1232</issue>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Furnas</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deerwester</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harshman</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Streeter</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lochbaum</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          :
          <article-title>Information retrieval using a singular value decomposition model of latent semantic structure</article-title>
          .
          <source>In: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <volume>465</volume>
&#8211;
          <fpage>480</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rendle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Factorization machines with libfm</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 3</source>
          (
          <issue>3</issue>
          ),
          <volume>57</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>