=Paper=
{{Paper
|id=Vol-1809/article-07
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1809/article-07.pdf
|volume=Vol-1809
}}
==None==
<pdf width="1500px">https://ceur-ws.org/Vol-1809/article-07.pdf</pdf>
<pre>
        Learning to Identify Complementary Products from
                              DBpedia

                Victor Anthony Arrascue Ayala Trong-Nghia Cheng                            Anas Alzoghbi
                                           Georg Lausen

                                           Department of Computer Science
                                                 University of Freiburg
                                   Georges-Köhler-Allee 051, 79110 Freiburg, Germany
                                  arrascue|chengt|alzoghba|lausen@informatik.uni-freiburg.de


ABSTRACT                                                          stance the RS can expose the user interested in a product
Identifying the complementary relationship between prod-          to other products whose use is related to their desired prod-
ucts, like a cartridge to a printer, is a very useful technique   uct. Those products are referred to as complementary prod-
to provide recommendations. These are typically purchased         ucts [2]. For instance, a user who placed a guitar in the
together or within a short time frame and thus online re-         shopping cart might also be interested in purchasing a gui-
tailers benefit from it. Existing approaches rely heavily on      tar tuner, a case, or a book about learning how to play it. It
transactions and therefore they suffer from: (1) the cold         is important to notice that complementary products might
start problem for new products; (2) the inability to pro-         belong to a different category than that of the associated
duce good results for infrequently bought products; (3) the       product, e.g. Instrument Accessories, Bags & Cases and
inability to explain why two products are complementary.          Books in the example above. This means that there are also
  We propose a framework that aims at alleviating these           some relationships at the level of categories too. Some of
problems by exploiting a knowledge graph (DBpedia) in             them are evident, like Instruments and Instrument Acces-
addition to products’ available information such as titles,       sories, while some of them are not immediately obvious like
descriptions and categories rather than transactions. Our         Instruments and Books.
problem is modeled as a classification task on a set of prod-        The majority of the approaches to find complementary
uct pairs. Our starting point is the semantic paths in the        products make use of transactional data, e.g. by extract-
knowledge graph linking between product attributes, from          ing association rules or by applying other data mining tech-
which we model product features. Then, having a labeled           niques [7, 22, 27, 31, 32]. This means that new catalog
set of product pairs we learn a model; and finally, we use this   products have no chance to appear as complementary to any
model to predict complementary products for an unseen set         other product. In addition to this, these approaches might
of products. Our experiments on a real world dataset from         also produce noisy recommendations for products which are
Amazon show high performance of our framework in pre-             infrequently purchased and are not able to explain the rea-
dicting whether one product is complementary to another           son for the recommendation [18].
one.                                                                 Therefore, a model is required from which this relationship
                                                                  between a complementary product and its associated prod-
                                                                  uct can be learned. We believe that a graph representation is
Keywords                                                          the most suitable for this. Products, their attributes and the
Recommender Systems, Complementary Products, DBpe-                categories to which they belong, as well as the interconnec-
dia, Knowledge-graph, Supervised Learning, Cross-domain.          tions between them can be represented as nodes and paths
                                                                  in the graph. Having such a model would not only allow
                                                                  us to find links between a complementary product and its
1.   INTRODUCTION                                                 associated product without any need for transactions, but
  Online retailers like Amazon or eBay sell millions of prod-     also to provide explanations of why this relationship exists.
ucts from several categories and continuously expand their           However, learning to discriminate between different kinds
catalogs. Being aware that browsing a large catalog can           of relationships from those interconnections within a com-
be overwhelming for a user and can therefore have a neg-          plex semantic graph model is still an open problem. See
ative impact on revenue [3, 24], it is not a surprise that        for instance the feature vector illustrated in Figure 1, which
many of these retailers are equipped with Recommender Sys-        represents a pair of products (pi , pj ). Each of the features
tems (RS) [28]. While recommending products based on the          should indeed reflect a property which characterizes both
user’s taste is certainly an important task any RS has to         products as a whole, or a certain interaction between those.
carry out, special attention has to be paid at the moment         A class label, complementary or non-complementary is as-
in which a purchase is about to be made. In this circum-          sociated with each sample. While trying to fit a classifica-
                                                                  tion model which learns how to discriminate the relation-
                                                                  ship from a set of samples like those might seem the obvious
Copyrights held by author/owner(s).                               way of solving the problem, the problem remains: what are
WWW2017 Workshop: Linked Data on the Web (LDOW2017)               those features that reflect some property of pairs of prod-
ucts? What do they reflect? Is this property related to the       arranged: acoustic guitars are a certain kind of guitar and
fact that one product might be complementary to the other         these are a certain kind of musical instrument.
one? This work aims at addressing these questions.                   Since RDF is used as an underlying data model, we need
                                                                  to introduce some notation. Let I, B and L be the set of all
                                                                  IRIs (Internationalized Resource Identifiers), the set of all
                                                                  blank nodes and the set of all literals, respectively. Then a
                                                                  triple (rs , p, ro ) ∈ (I ∪ B) × I × (I ∪ B ∪ L) is called
                                                                  an RDF triple. An RDF graph G is a set of RDF triples.
                                                                  Moreover, let V be a set of variables. A tuple of the set
                  Figure 1: Feature vector                        (I ∪ L ∪ V ) × (I ∪ V ) × (I ∪ L ∪ V ), e.g. (?x, p, ro ), is
                                                                  called a triple pattern, in which variable names are prefixed
                                                                  with the question mark (“?”) symbol. A triple pattern can
Contributions. The contributions of this paper can be             be evaluated according to the SPARQL semantics to find all
summarized as follows.                                            those mappings between variables and resources from the
                                                                  graph which would make of that tuple a valid triple. For a
     1. We propose CompLoD, a novel and extensible frame-         more comprehensive description of RDF and SPARQL we
        work that predicts the complementary relationship be-     refer the reader to [13, 25].
        tween two products by using only textual information         A path in graph G between a source resource rs and a
        such as titles, descriptions and categories.              target resource ro is denoted by ρ(rs , ro )(p1 ,p2 ,...,pt ) , where
                                                                  (p1 , p2 , ..., pt ) is the sequence of predicates that connects rs
     2. The approach does not require transactions as other
                                                                  to ro 2 . A path can be recursively defined as follow:
        approaches and therefore, it is immune to the cold-
        start problem for new items.                                   • ρ(rs , ro )(p) is a path of length 1 if there exists a triple
                                                                         (rs , p, ro ) ∈ G which connects rs to ro .
     3. Our scheme bridges the gap of how to encode infor-
        mation from a knowledge graph into a relevant feature
                                                                       • ρ(rs , ro )(p1 ,p2 ,...,pt ) is a path of length t if there exists
        set to fit a classification model and predict the com-
                                                                         a resource rl such that ρ(rs , rl )(p1 ,p2 ,...,pt−1 ) is a path
        plementary relationship.
                                                                         of length t − 1 and ρ(rl , ro )(pt ) is a path of length 1.
     4. Rather than building a knowledge graph from scratch,
        we believe that Semantic Web standards have enough        In our RDF graph each of the above-mentioned elements,
        to offer in this regard. Therefore, we use DBpedia1 , a   products, attributes and categories, corresponds to an RDF
        well-known Linked Open Data (LoD) dataset, as the         resource. Let P ⊆ I = {rp1 , rp2 , ..., rpn } be a subset of all
        core of the knowledge graph. This dataset is the struc-   IRIs representing a set of products, A ⊆ I = {ra1 , ra2 , ..., ram }
        tured counterpart of Wikipedia and it is modeled as an    is a set of products’ attributes and C ⊆ I = {rc1 , rc2 , ..., rcl }
        RDF graph. We thereby demonstrate the usefulness of       a set of products’ categories. From now on products, at-
        Semantic Web data.                                        tributes and categories will denote the RDF resource which
                                                                  identify these elements. Moreover, we will use rp , ra and
   We evaluate the accuracy of our approach by conducting         rc to denote an arbitrary product, attribute and category,
experiments using Amazon data [14, 15] as our ground truth        respectively. We assume that each product is connected to
and find that it produces competitive results in comparison       its attributes and to the categories to which it belongs in
to those showed in [14] when taking into account that no          G. Moreover, there exist arbitrary connections in G among
transaction data is used. Our paper is structured as fol-         attributes or categories, i.e. there are no restrictions on how
lows: We aim at providing some notation and definitions           resources are connected.
used within this paper in section 2. In section 3 the work-
flow of CompLoD is explained in detail. Section 4 shows           Problem statement. Given two products (rpi , rpj ) in G,
the experiments that have been carried out to validate our        we would like to predict whether rpj is complementary to
system. In section 5 we present the related work. Finally,        rpi using G. Let comp(rpi , rpj ) be a function that returns 1
we present our conclusions in section 6.                          when this relationship holds, 0 otherwise. It is directional,
                                                                  i.e. comp(rpi , rpj ) = 1 doesn’t imply that comp(rpj , rpi )
                                                                  also returns 13 .
2.     NOTATION
   This section provides some fundamental definitions which
allow us to explain the system in detail. Our approach re-        3.     OUR APPROACH
quires modeling four elements within an RDF graph: prod-            Figure 2 illustrates a high-level view of our framework
ucts, attributes and the categories to which products be-         CompLoD. The system needs to process only textual infor-
long in addition to the relationships between all of these        mation of products, such as titles, descriptions and cate-
elements. Attributes are characteristics of a product which       gories. Example 1 illustrates the input expected for a prod-
are extracted from a product’s specification. Categories are      uct.
instead given and are classified in a taxonomy. For instance,
bronze strings is an attribute of a guitar. Moreover a gui-
                                                                  2
tar might belong to categories such as Musical instruments,         For simplicity, we assume (p1 , p2 , ..., pt ) identifies a unique
Guitars and Acoustic Guitars. Categories are hierarchically       sequence of predicates which connect rs to ro .
                                                                  3
                                                                    For instance, a guitar tuner is complementary to a guitar,
1
    http://wiki.DBpedia.org/                                      but the opposite does not necessarily hold true.
                                                 Figure 2: Workflow of CompLoD


  Title: Logitech M510 Wireless Mouse, Large Mouse, Com-            also has regularities with predictive power [20]. To make use
  puter Wireless Mouse.                                             of this, we need to integrate the products’ meta-data into
  Category: Electronics; Computers & Accessories; Computer          it by extracting structured information from the text. We
  Accessories & Peripherals; Keyboards, Mice & Accessories;         therefore use AlchemyAPI, a NER tool. This identifies those
  Mice                                                              tokens which correspond to published LoD IRIs. Whereas
  Description:                                                      the tool supports the mapping to resources from several LoD
  Contoured shape with soft rubber grips for all-day comfort        datasets4 , we only store the information relative to DBpedia
  * Back/forward buttons and side-to-side scrolling plus zoom       and leave the possibility of exploiting multiple datasets to
  let you do more, faster Requires Logitech SetPoint software       future work. For instance, from the previous text some of
  * 2-year battery life eliminates the need to replace batteries.   the identified resources are5 :
  Battery life may vary based on computing conditions
                                                                      Categories
  * Comes with a tiny Logitech Unifying receiver that stays in
                                                                        Electronics: dbpedia:Electronics
  your computer - plug it in, forget it
                                                                         Computer & Accessories: dbpedia:Computer, ...
  * 3-year limited hardware warranty
                                                                      Attributes
  * Compatibility: Works with - Windows XP, Windows Vista,
                                                                         Logitech: dbpedia:Logitech, yago:Logitech, ...
  Windows 7, Mac OS X 10.4 or later, USB Port
                                                                         Operating system: dbpedia:Operating system, ...
  * Mouse Dimensions (height x width x depth): 2.56 in (65
                                                                         Microsoft Windows: dbpedia:Microsoft Windows
  mm) x 4.72 in (120 mm) x 1.61 in (41 mm)
                                                                         Mac OS X: dbpedia:Mac OS X, ...
                                                                         Microsoft: dbpedia:Microsoft, yago:Microsoft, ...
                                                                         IMac: dbpedia:IMac, yago:IMac, ...
 Example 1: Title, categories and description of a                       Windows Vista: dbpedia:Windows Vista, ...
                product (mouse)
As previously mentioned, attributes are extracted from the
description of a product. Moreover, a category is a struc-              Example 2: Resources identified from the
tured piece of information organized as a taxonomy. The               meta-data of the product showed in example 1
categories go from generic (Electronics) to more specific
(Mice). However, CompLoD doesn’t make use of the taxon-             It can be observed that some of the identified resources cor-
omy and sees all categories as independent.                         respond to categories, such as dbpedia:Electronics, while oth-
   Together all of this information is then processed in a          ers are attributes, such as dbpedia:Logitech (brand). To keep
pipeline-like fashion. First the product’s meta-data is trans-      the approach fully automated, we treat categories as text as
formed into a structured graph in which product attributes          well. The quality of this task highly depends on the effi-
and categories are matched against DBpedia resources (A).           cacy of the NER tool, each of which has its own limitations.
The resulting knowledge graph is a subgraph of DBpedia,             While most of the important resources are identified, some
extended with the products represented as synthetic nodes.          of them, like dbpedia:Computer mouse (category), which are
Being aware that not all attributes have the same role in           important to characterize the product, are not necessarily
characterizing a product and that not all the interconnec-          recognized. For more details about the expected perfor-
tions are able to provide the same quality of information,          mance of these tools we refer the reader to [8].
we decided to attach scores to the nodes and paths in the              Given the products and the identified resources, the RDF
graph using consolidated metrics (B). The results are used in       graph is built as follows:
the next stage to build the feature set for pairs of products.        1. First we build a basic RDF graph Gp . We create one
Finally, we train a classification model (C) and when this is            resource rp for each product and connect it to its at-
fitted, the framework is able to answer the question whether             tributes IRIs. Let Arp be the bag6 of attributes iden-
one product is complementary to another one. More details                tified in the input text of the product represented by
about the individual tasks are provided in the following sec-       4
                                                                      Currently supported to the date of publication are DBpe-
tions.                                                              dia, Freebase, US Census, GeoNames, UMBEL, OpenCyc,
                                                                    YAGO, MusicBrainz, CIA Factbook and CrunchBase.
3.1   Building a knowledge graph                                    5
                                                                      We use prefixes to shorten the namespaces:
  Our framework requires a knowledge graph as the starting          dbpedia: http://dbpedia.org/resource/
point and we opted for DBpedia. In addition to the deter-           yago: http://yago-knowledge.org/resource/
                                                                    6
ministic rules which are used to build the graph, DBpedia             The tokens can appear multiple times in the text. We have
      rp . We create a triple (rp , sp, ra ) linking the product   graph to terms and documents. This requires the considera-
      rp with each ra ∈ Arp using a synthetic predicate sp.        tion of a product rp as a bag of attributes {ra1 , ra2 , ..., rak }.
      If an attribute ra appears more than once in Arp , we        A category rc can be also thought of as a bag of attributes,
      create as many triples as the number of times ra ap-         namely those used to describe products of that category.
      pears in Arp , where each triple has a different synthetic   We will explain these metrics in more detail in the following
      predicate.                                                   sections.
  2. Let Crp be the set of categories to which the product         3.2.1     (Product) Attribute Frequency - Inverse Prod-
     belongs. We also create triples (rp , sp, rc ) that link                uct Frequency (PAF-IPF)
     products with their categories.                                  The representation of a product as a bag of attributes
  3. We enrich Gp by extracting a subgraph from DBpedia.           allows us to define a weighting scheme that reflects how well
     This subgraph contains all DBpedia resources identi-          an attribute ra is able to represent or describe a product rp .
     fied by the NER tool and all the resources reachable          We instantiate TF-IDF using a product as a document and
     within 2 hops from them. We ignore the direction of           its attributes as the terms within the document.
     the edges to do so. The reason why we limit the length                  PAF-IPF (ra ,rp ) = P AF(ra ,rp ) × IP F(ra )
     to 2 is to reduce the computational cost of computing
                                                                                                            |P |
     metrics on top of the graph (more details in the next                              IP F(ra ) = log
     section). We merge Gp with the extracted subgraph                                                    P F(ra )
     to form the Extended graph EGp . Note that attributes         Intuitively, P AF(ra ,rp ) counts the number of times a product
     and categories in EGp are interconnected.                     rp is described by an attribute ra , i.e. it counts the number
                                                                   of triples that link the product to an attribute. In a similar
  EGp is then the input of the next task, in which metrics
                                                                   way P F(ra ) counts the number of products that have ra as
are computed to measure the importance of an attribute or
                                                                   an attribute. |P | is the cardinality of the set P , i.e. the
a category for a product as well as the importance of the
                                                                   number of products available in G. The goal of IP F(ra ) is
semantic paths interlinking them.
                                                                   to reduce the score by a factor which is proportional to the
3.2    Computation of metrics                                      number of times the attribute is used to describe products.
                                                                   If an attribute is common to all products then it cannot
  In our knowledge graph every product is connected to its
                                                                   strongly characterize the product.
attributes and categories. However, those attributes might
have a distinct relevance in describing a product or even in       3.2.2     (Category) Attribute Frequency - Inverse Cat-
describing a category (seen as a set of products). Therefore,                egory Frequency (CAF-ICF)
we require some metrics that reflect the following intuition:
                                                                     In the same way we define the following:
   • How well can an attribute characterize a product, e.g.                  CAF-ICF (ra ,rc ) = CAF(ra ,rc ) × ICF(ra )
     is equalizer more representative than rechargeable to
                                                                                                           |C|
     describe a speaker ? Section 3.2.1.                                                ICF(ra ) = log
                                                                                                          CF(ra )
   • How well can an attribute characterize a category, e.g.
                                                                   which reflects to which extent an attribute characterizes a
     is 4K more representative than wireless for the cate-
                                                                   category. CAF(ra ,rc ) counts the number of times that an
     gory Television & Video? Section 3.2.2.
                                                                   attribute resource ra is used to describe a product rp that
   • How relevant is the information carried by a path con-        belongs to a category rc . CF(ra ) counts the number of cat-
     necting two resources? Let’s assume, for instance, that       egories in which the attribute ra is used to describe at least
     two product resources, a television (tv) and a speaker        one product which belongs to that category. |C| is the car-
     (s), are connected through several paths of different         dinality of the set C. The role of ICF(ra ) is the same as
     lengths in EGp .                                              that of IP F(ra ) .
     Is the path tv −connectivity          connectivity
                      −−−−−−−→ Bluetooth ←−−−−−−−− s that
                                                                   3.2.3     Path Informativeness
     goes through Bluetooth more important than another
     one that goes through Stereo? Which of these paths              In addition to the two metrics introduced above, we re-
     is the most informative? Does the length also play a          quire another one which reflects the degree of informative-
     role? Suppose that the connection is now                      ness a path carries with it. We use the definition of path
         connectivity          type         connectivity           informativeness presented in [26], which we tailored for our
     tv −−−−−−−−→ Bluetooth −−−→ W ireless ←−−−−−−−− s. Does
                                                                   needs. This concept is based on a similar instantiation of
     the informativeness decay with the length of the path?
                                                                   TF-IDF. RDF resources r are used as documents. However,
     Section 3.2.3.
                                                                   two kinds of documents are considered: those in which r
In order to provide answers to those questions and fulfill the     appears as a subject of a triple and those in which it has
requirements we employ Term frequency-inverse document             the role of an object. This allows us to define the following
frequency (TF-IDF)[19], a weighting scheme widely used in          metric:
Information Retrieval and text mining to determine how im-         Predicate frequency - Inverse Triple Frequency PF-ITF :
portant a term is to a document within a corpus. The design                    I-PF-ITF (p,r) = I-PF (p,r) × IT F(p)
of such metrics is achieved through different instantiations
                                                                               O-PF-ITF (p,r) = O-PF (p,r) × IT F(p)
of TF-IDF, i.e. we map different components of the RDF
                                                                                                            |T |
to preserve the information of how often the attribute occurs                            IT F(p) = log
in order to compute metrics in the next stage.                                                            |T (p)|
I-PF-ITF (p,r) is the Input PF-ITF of a term p within the               path as a feature either, for the same reason explained above
document resource r and O-PF-ITF (p,r) is the Output PF-                (high-dimensionality).
ITF of a term p within the document resource r. I-PF (p,r)                 We propose therefore a strategy to combine these metrics,
counts the number of resources that can be mapped to the                by following a very natural intuition which turned out to
variable ?x of the triple pattern (?x, p, r). This is also the          produce useful results. This allows one to design a low-
case for O-PF (p, r) in which the possible mappings for ?y              dimensional feature vector which can be used to model pairs
in (r, p, ?y) are counted. |T | is the overall number of triples        of products.
whereas |T (p)| is the number of triples in which p appears.
The role of IT F(p) is the same as for the previous instanti-           3.3    Features engineering
ations.                                                                    To the best of our knowledge there is no work which
  This metric allows us to define the informativeness metric.           addresses the issue of which features for pairs of products
For a path ρ(ri , rj )(p) of length 1 the informativeness I is          (rpi , rpj ) can be used to predict the complementary relation-
defined as:                                                             ship. We aim at filling this gap. Our rationale is based on
                          O-PF-ITF (p,ri ) + I-PF-ITF (p,rj )           the observation that for products belonging to different do-
       Iρ(ri ,rj )(p) =                                                 mains there might exist correspondences and dependencies
                                           2                            between their attributes. This has been validated in market-
  For paths of length t the informativeness is defined as the           ing, behavioral, and data mining research studies [4]. Our
sum of the informativeness of its components divided by the             hypothesis is that (1) the correspondences and dependencies
length of the path.                                                     tell us also something about the complementary relation-
                                                                        ship, and (2) this correspondence can be measured by the
                          ρ(ri , rj )(p1,...,pt−1 ,pt )                 informativeness carried in the interconnection between at-
        Iρ(ri ,rj )(p1,...,p               = (Iρ(ri ,rk )(p1) + ... +   tributes, weighted by those attributes’ relative importance.
                               t−1 ,pt )
                                                                        We therefore explain in this section how to leverage the in-
                     Iρ(rq ,rs )(p            + Iρ(rs ,rj )(pt ) )/t    terconnections of the knowledge graph and to combine the
                                      t−1 )
                                                                        metric scores in a meaningful way.
   Given the previous definitions, it is possible to define Imax ,         Let Arpi and Arpj be the bag of attributes of products
i.e. the maximum informativeness carried by a path connect-
                                                                        rpi and rpj . To simplify the explanation of the formulas to
ing two resources ri and rj .
                                                                        calculate the features we will from now on assume that rai
       Imax (ri , rj ) = max{I(ri , rj )ρ1 , ..., I(ri , rj )ρk },      and raj are attributes of rpi and rpj , respectively. More-
                                                                        over, we also assume that rci and rcj are the most specific
where ρi is an arbitrary path. Each pair of paths ρi and ρj             categories of rpi and rpj . For the sake of readability we in-
is different, i.e. there exists at least one predicate in both          troduce two weighting functions ωP and ωC shown in figure
paths which differs. Similarly, it is possible to define Iavg ,         3, which use PAF-IPF and CAF-ICF, respectively. In this
the average informativeness of all paths between ri and rj .            way, we can weigh a path between two attributes of rai and
                                                                        raj according to their characterization relevance. Next, we
       Iavg (ri , rj ) = (I(ri , rj )ρ1 + ... + I(ri , rj )ρk )/k       will present the design choices we made.
To summarize. The metrics allow us to determine the
importance of attributes in describing a product or a cate-             Maximal Informativeness over the most relevant prod-
gory, or to measure the amount of informativeness carried by            uct attributes (MIMPA): this feature combines PAF-IPF
the semantic paths connecting two resources. These are de-              and the concept of informativeness as follows. Given a prod-
signed considering different granularities for the definition of        uct rpi , we first search for the attribute rai that has the
document. In the case of PAF-IPF a document is a product                maximum PAF-IPF(rai ,rpi ) among other attributes. We do
                                                                                                M         M
and the terms are the attributes that describe the product.             the same for rpj . Let rai p and rajp be these attributes for rpi
In the second case, CAF-ICF, a document is a category and               and rpj , respectively. These are the most relevant product
the terms are the attributes used to describe products of               attributes. The feature is then designed as follows:
that category. In the case of informativeness documents are                                                       
                                                                                                           M    M           M     M
subject or object resources whereas predicates are the terms.              M IM P A(rpi ,rpj ) = Imax rai p , rajp · ωP (rai p , rajp )
CompLoD computes the relevance metrics in EGp by iterat-
                                                                                                                  M     M
ing over the set of identified attributes and then it proceeds          Recall that when computing Imax (rai p , rajp ) all possible
by searching paths linking between them.                                                       M        M
                                                                        paths which connect rai p to rajp are considered and from
All these metrics have been validated by other works and                these, it returns the max. informativeness value.
have been used to solve a variety of problems. However,
when one focuses again on the problem illustrated in fig-               Maximal Informativeness over the most relevant cat-
ure 1, i.e. the problem of designing a feature set that ap-             egory attributes (MIMCA): we focus again on the at-
plies to pairs of products, one can notice that these metrics           tribute bags of both products. However, rather than search-
cannot be used directly as features. For instance, PAF-IPF              ing for those which characterize the products the best, we
and CAF-ICF are metrics which apply to a single product.                take a look to the most specific category of each product and
While nothing prevents one from using them as features for              leverage CAF-ICF to find the attribute which characterizes
pairs, it is not possible to have each possible attribute as a          this category the best. Let raMi c and raMjc be these attributes
feature. This would lead to an extremely high-dimensional               for rpi and rpj , respectively. Then:
feature vector. And whereas the path informativeness be-
tween two products’ attributes could be thought of as a fea-
                                                                                                                   
                                                                           M IM CA(rpi , rpj ) = Imax raMi c , raMjc · ωC (raMi c , raMjc )
ture for pairs of products, one cannot have each possible
                        s                                        2
                                                                                                  s                                        2
                                            2                                                                      2 
                             PAF-IPF(ra ,rp ) + PAF-IPF(ra ,rp )                                       CAF-ICF(ra ,rc ) + CAF-ICF(ra ,rc )
                                       i   i              j   j                                                  i   i              j   j
    ωP (rai , raj ) =                          2
                                                                    ,         ωC (rai , raj ) =                          2


                                                 Figure 3: Weighting functions ωP and ωC


where rci and rcj are the categories of products rpi and rpj ,                Average of average informativeness over all attributes
respectively.                                                                 (AAIO): this feature is computed in the same way as AMIO,
                                                                              except that Iavg is used instead of Imax . In this way we con-
Average Informativeness over the most relevant prod-                          sider the complete flow of informativeness between each pair
uct attributes (AIMPA): this feature works in a similar                       of attributes.
way as MIMPA. When the most relevant attributes of the
           M         M
products, rai p and rajp , are identified, all paths connecting               Maximal relevance of common product attributes
these resources are considered. Therefore, the average infor-                 (MRCPA): when considering
                                                                                                    T        only the common attributes
mativeness Iavg is considered and not only the score of the                   of both products Arpi Arpj , we compute the PAF-IPF of
most informative one.                                                         the elements of the intersection and return the maximum
                                                                              value.
                                          
                                  M     M          M      M
   AIM P A(rpi , rpj ) = Iavg rai p , rajp · ωP (rai p , rajp )
                                                                                        M RCP A(rpi , rpj ) = max { ωP (rai , raj ) :
Average Informativeness over the most relevant cat-
egory attributes (AIMCA): in the same way as MIMCA,                                               rai = raj ∈ (Arpi ∩ Arpj )}
we consider here the attributes which characterize the cat-
                                                                              Maximal relevance of common category attributes
egory the most, but we use Iavg instead of Imax , i.e. we
                                                                              (MRCCA): this is the symmetrical counterpart of MR-
consider the average informativeness of all paths connecting
                                                                              CPA in which one considers the contribution of common
them.
                                                                            attributes in characterizing the product categories.
   AIM CA(rpi , rpj ) = Iavg raMi c , raMjc · ωC (raMi c , raMjc )
                                                                                        M RCCA(rpi , rpj ) = max { ωC (rai , raj ) :
Average of maximal informativeness over all attributes                                            rai = raj ∈ (Arpi ∩ Arpj )}
(AMIO): rather than considering the most relevant at-
tribute for each product, we focus here on the informative-                   Main product relevance (MPR): this metric measures
ness flowing from one product (from all of its attributes) to                 the relevance of the main product (first element of the pair
the other one. To do so we consider all pairs of attributes rai               at the input sample) on the basis of its attributes. Let
and raj , which belong to rpi and rpj , respectively, and com-                |(ra , ?p, ?o)| be the number of triples having ra as subject
pute the maximum informativeness carried by the links of                      and |(?s, ?p, ra )| be the number of triples having ra as object.
each product pair. As for previous features we also combine                   Then:
the relevance of individual attributes with the informative-                                           X
ness of the link.                                                                    M P R(rpi ) =          (|(ra , ?p, ?o)| + |(?s, ?p, ra )|)
Let P(rpi ,rpj ) = {(rai1 , raj1 ), (rai1 , raj2 ), ..., (raik , rajl )} be                            ra ∈Arp
                                                                                                                 i

a set of attributes tuples, where the first element is an at-
                                                                              Related product relevance (RPR): this metric is the
tribute of rpi and the second of rpj . Then:
                                                                              same as the previous one, but applied to the related product
                                                1                             (second element of the pair at the input sample).
                AM IOP (rpi , rpj ) =                      ·
                                          | P(rpi ,rpj ) |                    Interconnection between products (IP): this feature
                                                                            aims at reflecting the degree of connectivity between the at-
                X                                                             tributes of two products. Let ϕ be the number of attributes
                                 Imax (rai , raj ) · ωP (rai , raj )         pairs (rai , raj ) ∈ P(rpi ,rpj ) in which there exist at least one
                                                                   
    
        (rai ,raj )∈P(rp ,rp )
                        i   j
                                                                              path connecting them. Then:
                                                                                                                         ϕ
                                                1                                                 IP (rpi , rpj ) =
                AM IOC (rpi , rpj ) =                      ·                                                        |P(rpi ,rpj ) |
                                          | P(rpi ,rpj ) |
                                                                            Product similarity (PS ): to measure the degree of simi-
                 X                                                            larity, we use the Jaccard similarity to the set of attributes
                                 Imax (rai , raj ) · ωC (rai , raj 
                                                                  
                                                                             of both products:
        (rai ,raj )∈P(rp ,rp )                                                                                       T
                        i   j                                                                                   |Arpi Arpj |
                                                                                              P S(rpi , rpj ) =      S
                     AM IOP (rpi , rpj ) + AM IOC (rpi , rpj )                                                  |Arpi Arpj |
 AM IO(rpi , rpj ) =
                                         2                                    Other metrics: We designed and tried several other fea-
AMIO combines both, the characterization power of the at-                     tures whose contribution was small or minimal.
tributes in the role of product attributes and category at-                     In this section we introduced the features which played an
tributes.                                                                     important role in predicting whether one product is comple-
                                                                              mentary to another one. All the 12 features are the result
of a feature selection task performed to boost the accuracy      Computing metrics. Once the knowledge graph is built
prediction of the classifiers. These are shown on top of the     we compute the metrics explained in section 3.2. The PAF-
next page together with their relative importance7 .             ICF and CAF-ICF relevance scores and the informativeness
                                                                 of the paths are efficiently stored in the system using in-
3.4   Classification model                                       verted indexes for easy retrieval.
   With enough input samples and a well-defined feature set      Classification. With the metrics computed, we compute
the last missing step consists of fitting a classifier. A wide   the features as explained in section 3.3 for those pairs of
range of classification techniques exist for supervised learn-   products which appear in the ground truth. We store the
ing. We compared the performance of different kinds of clas-     feature vectors and attach the class label to it, complemen-
sifiers, ensemble and non, and random forest was the fittest     tary and non-complementary. The overall number of in-
model. Moreover, we validated our model and carried out          put samples is ca. 3.7 million, of which 1.4 million rep-
hyper-parameter optimization. More details can be found          resents complementary pairs and 2.3 million pairs the non-
in the next section.                                             complementary counterpart. These input samples were then
                                                                 split into training and test sets. We kept the distribution
4.    EXPERIMENTS                                                at 50% of positive and 50% negative samples. We started
                                                                 with a relatively small number of samples for the training
   We conducted experiments to assess the performance of
                                                                 set (10%) and we gradually increased it until we reached
CompLoD. Some of these experiments helped us to choose
                                                                 90%. In this way we were able to analyze the learning rate
the most suitable classifier and to optimize it. But the cen-
                                                                 of the classifiers. We tried the performance of several of
tral question, whether the designed feature set was really
                                                                 them, such as Decision Trees, Random Forest, AdaBoost,
significant, could be also answered. Each of the single tasks
                                                                 Naive Bayes (Gaussian), Linear Discriminant Analysis and
of CompLoD were run on a single machine with a processor
                                                                 Quadratic Discriminant Analysis. We used the following
Intel(R) Xeon(R) CPU E3-1231@3.80GHz with 4 cores and
                                                                 rates computed from the confusion matrix (TP =true posi-
32 GB of RAM.
                                                                 tive, FP =false positive, TN =true negative, FN =false neg-
Dataset. We used a dataset from Amazon.com8 . This con-
                                                                 ative):
tains products’ meta-data for each category. Our experi-
ments focus on the category Electronics, which includes a                          TP                             TN
wide range of subcategories, such as Internal Hard Drives,        Precision(+) =         ;       Precision(-) =
                                                                                 TP + FP                      TN + FN
eBook Readers or Camera Batteries, etc. The overall num-
                                                                                   TP                         TN
ber of subcategories considered is 817 whereas the number of         Recall(+) =         ;    Recall(-) =
products is ca. 0.5 million. In addition to the meta-data for                    TP + FN                   TN + FP
each product the dataset also contains four lists of related                                 TP + TN
                                                                           Accuracy =
products: also bought, also viewed, bought together, buy after                        TP + TN + FP + FN
viewing. As in [14] we assume that also bought and bought                                  Precision(+) · Recall(+)
                                                                       F1-Measure(+) = 2 ·
together products are complementary. Products from the                                     Precision(+) · Recall(+)
remaining lists are substitutable products9 . We use them
as the negative class (non-complementary). We manually           The label (+) in the metric focuses on the quality attribute
verified that this assumption is not always correct: for some    for predicting the positive class whereas (-) focuses on the
products some related products appeared in both groups.          negative one. Accuracy is then defined as the mean of the
Assuming that a product cannot be both, substitutable and        positive and negative precision score. The performance of
complementary to another one, we removed those ambiguous         the different classifiers was similar. To avoid overfitting
related products. As a result of this process, 11% of comple-    and make sure that the model generalizes to an indepen-
mentary products and 8% of non-complementary products            dent dataset, we validate our model with K-Fold and Monte
were removed from the dataset.                                   Carlo cross-validation (CV). While k-Fold CV makes sure
Building the graph. One of the initial stages of the frame-      that each of the folds is used at least once as a test set,
work consists of identifying from the text those tokens which    Monte Carlo CV randomly shuffles the whole dataset, thus
correspond to resources published in the LoD cloud. Ap-          generating independent partitions for each run. The final
prox. 3500 resources were recognized by the NER tool,            results are shown in Figure 4(A), which illustrates that the
which is a relatively low number. However, for 96% of the        Random Forest classifier improves its accuracy with more
products at least one resource could be identified. The un-      input samples from which it can learn, being able to obtain
matched 4% was removed from the dataset. By querying             up to ca. 78.5% accuracy. In the same figure (B) shows that
DBpedia’s SPARQL remote endpoint we extract the triples          the recall for the negative class overtakes that of the posi-
in which those resources appear as subject or object of a        tive class. It is important to mention that a TN refers to a
triple. We repeat this process iteratively until we obtain       non-complementary ground truth pair which is predicted to
all resources reachable within 2 hops from the identified re-    be non-complementary. As the very last task we performed
sources. The resulting graph has approx. 1.4 million re-         hyper-parameter optimization by exploiting two techniques,
sources.                                                         namely Random Search (run with 30 iterations) and Grid
7                                                                Search (run with the top-3 parameter set).
  The importance is computed as the normalized total re-         Results and reflections. One of the unique advantages
duction of the criterion represented by that feature (Gini
importance of random forests).                                   of our framework is that new products have a chance to be
8
  The dataset is part of the project “Stanford Network Anal-     classified as complementary to other products even in the
ysis Project (SNAP)” and made available for research.            absence of previous purchase history of the prod-
9                                                                uct. The results are certainly not as good as for the case
  We are not addressing in this work the task of predicting
the substitutable relationship.                                  for which transaction data or post-transaction feedback is
     MIMPA        MIMCA           AIMPA          AIMCA        AMIO          AAIO       MRCPA               MRCCA       MPR            RPR          IP          PS          class
      6.87         10.32           8.27           11.48        5.73          9.01       6.89                10.25      9.75           9.17        5.04        7.17        1 or 0

                    Feature set for pairs of product resources (rpi , rpj ) and their feature importance.

                                                             80

                                                                                                                  75
       75                                                    75


                                                             70
                                                                                                                  70
       70
                                   positive precision
                                                             65                            positive recall                                  positive F1-measure
                                   negative precision
                                                                                           negative recall                                  negative F1-measure
                                   accuracy precision
       65                                                                                                         65
                                                             60
             10    20   30   40   50   60   70   80     90        10   20   30   40   50    60   70   80     90        10   20   30    40    50    60    70    80    90
                        (A) Training set in %                               (B) Training set in %                                (C) Training set in %


     Figure 4: Performance results of CompLoD in terms of (A) accuracy, (B) recall, and (C) F1-measure


available, but this work should open a new door for fur-
ther contributions. For instance, the work of McAuley et
al. in [14], which uses post-transaction feedback to learn a
model, obtained an accuracy of 88% on the same dataset
and category using reviews and additional meta-data which
we didn’t consider in our work, e.g. prices10 . The perfor-
mance of CompLoD is then surprisingly good at taking into
account that only minimal information about the product is
used, therefore making it well-suited to deal with the cold-
start problem.
   CompLoD also enables a different perspective. Whenever
the relationship between two products is identified as com-
plementary and an explanation is required, it is possible to
go back to the graph components and the values of the com-
puted graph metrics from the values of the single features
and to provide the user with a reasonable explanation. For                                 Figure 5: Two products and their most relevant at-
instance, one could use the most relevant attributes or cat-                               tributes
egory attributes of both products, e.g. those that maximize
PAF-IPF or CAF-ICF, respectively, and the path connecting
those attributes which maximizes the informativeness. Fig-                                 metrics can be then computed and stored at once. In this
ure 5 illustrates this. All these components are a valuable                                way when a new product arrives in the system, all of its at-
source of explanation. However, assessing which of these                                   tributes will already be present in the graph, together with
graph components are the most useful would require an on-                                  the informativeness scores of the semantic paths connecting
line study with real users.                                                                them to other attributes.
Computational cost and scalability. PAF-IPF and CAF-
ICF have a cost of O(|P |+|A|) and O(|C|+|A|), respectively.
The cost of path informativeness is higher: O(|I| + |P red|),
                                                                                           5.    RELATED WORK
where P red is the set of all vocabulary predicates. While                                    Approaches to find complementary products have a wide
it is obvious that extracting information from a large graph                               range of applications, from product placement to products
and computing metrics over it is certainly expensive, most                                 bundling to website structure optimization, etc. In partic-
of the tasks done in the single stages of the framework can                                ular, on-line retailers make use of complementary products
be pre-computed off-line. For instance, one could try to keep                              to provide recommendations [30, 35], which help the users
as many attributes in the graph as possible or even try to                                 to discover related products while improving their volume of
anticipate attributes of soon-to-be-released products. The                                 sales. Most of these techniques typically consist of mining
                                                                                           transactional purchase data, event logs, or users’ implicit
10
                                                                                           feedback. However, for new products for which such infor-
 The results are not comparable only in terms of the learn-                                mation is not available or scarce, most of these approaches
ing source but also on the fact that they only considered in                               are not applicable or fail altogether.
their experiments products with 20 reviews or more, which
means that only 20% of the pairs of products contained in                                     As in [14], we aim at learning the semantics of this rela-
the original ground truth remain after filtering those prod-                               tionship, thus making it possible to disregard transactions.
ucts out.                                                                                  Therefore, we build a knowledge graph in which products,
attributes, and categories are represented as nodes inter-        DBpedia demonstrates the usefulness and value of Semantic
connected by semantic paths. Knowledge acquisition is in          Web standards. The prediction task in the last stage is im-
general a very costly task. Therefore, we extract this infor-     plemented with a Random Forest Classifier which predicts,
mation from DBpedia, whose usage as a knowledge base has          for a pair of products, whether one product is complemen-
been validated in other fields, such as that of Cross-domain      tary to the other. However, such a model requires a feature
Recommender Systems [6, 9, 23, 29].                               set which represents properties of pairs of products. To the
   Being able to learn from the semantics encoded in the          best of our knowledge, this is the first time where the prob-
graph and the patterns therein has some commonalities with        lem of extracting this information from a semantic graph to
Link Discovery or Link prediction. These approaches see           predict the complementary relationship is addressed. Our
a knowledge graph as a statistical model. The existence           experiments show that the classifier is able to learn the re-
of triples is represented in these models as binary random        lationship from our designed feature set, thus validating it.
variables which are correlated with each other. Different as-        Although the predictive power of our system is surpris-
sumptions for the correlation lead to different models, e.g.      ingly good, given that this is achieved only by using prod-
based on latent features [21], graph features or those in         ucts’ meta-data, it is important to mention that the quality
which variables have local interactions [20]. Our work is         of CompLoD depends on the quality of the models and tools
closer to observable graph feature models, in the sense that      used in the single stages. This dependency consequently
some of the problems addressed there, e.g. designing a fea-       leaves room for improvement. Therefore, we would like to
ture set from a graph to fit a learning model, are similar. For   try different NER tools and LoD datasets and investigate the
instance, [1] focuses on predicting future friendship links in    qualitative impact. We could also exploit a more domain-
Social Networks using topological features. Aiming at the         specific LoD dataset. For this reason, the framework is ex-
same goal, [5] presents and uses a combination of a large         tensible, i.e. all components can be extended or replaced.
number of features. These techniques are not necessarily             In the future we would like to conduct further experiments
limited to Social Networks, but they are used in fields like      considering other categories, to strengthen the significance
Biomedicine [10] or Genomics [34].                                of our approach. Predicting whether a product is comple-
   However, there is a central difference between these ap-       mentary to another one might still produce too many recom-
proaches and ours. First, we deal here with a knowledge           mendation candidates for a single user. Therefore, we would
graph in which the structured information is integrated into      like to extend our model to be able to further refine the list
a complex ontology. This means two nodes can be inter-            of complementary products taking into account the user’s
connected in several ways which are not necessarily known         taste. For this task, a ranking strategy might be more suit-
in advance. This differs from the kinds of graphs typically       able than a classification one. We would also like to include
treated in the link prediction problem, e.g. social, concept,     features that reflect the hierarchical structure of categories
or interaction networks, whose edges intend rather to repre-      and in general to continue searching for new features that
sent a small number of relationships and are therefore less       improve the accuracy of the system.
flexible in the representation. Secondly, many of the fea-
tures proposed are based on local similarity indices, such as
Common Neighbors, Adamic-Adar Index, Preferential At-
                                                                  7.   REFERENCES
tachment Index, Jaccard Index, etc. and might therefore be         [1] C. Ahmed, A. ElKorany, and R. Bahgat. A supervised
too localized to capture the patterns which determine the              learning approach to link prediction in twitter. Social
complementary relationship.                                            Network Analysis and Mining, 6(1):24, 2016.
   A related problem can also be found in the field of online      [2] A. Aribarg and N. Z. Foutz. Category-based screening
search behavior [16]. In this area some efforts have been              in choice of complementary products. Journal of
done to predict whether two queries posed by a user within             Marketing Research, 46(4):518–530, 2009.
a single or multiple sessions aim at performing the same           [3] A. Azaria, A. Hassidim, S. Kraus, A. Eshkol,
task. To do so, some approaches try to model hierarchies               O. Weintraub, and I. Netanely. Movie recommender
of tasks and subtasks. For example, booking a hotel and                system for profit maximization. In 7th ACM Conf. on
registering for a conference venue are subtasks of planning a          Recommender Systems, RecSys ’13, Hong Kong,
conference attendance. These methods are typically based               China, pages 121–128, 2013.
on classification, as in our case, or clustering [11, 12, 17,      [4] I. Cantador, I. Fernández-Tobı́as, S. Berkovsky, and
33]. Since a purchase of one product and a complementary               P. Cremonesi. Cross-domain recommender systems. In
one can be thought of as the problem of matching tasks, we             Recommender Systems Handbook, pages 919–959.
consider this line of research related work.                           2015.
                                                                   [5] W. Cukierski, B. Hamner, and B. Yang. Graph-based
6.   CONCLUSION AND FUTURE WORK                                        features for supervised link prediction. In Neural
   We presented CompLoD, a framework that leverages dif-               Networks (IJCNN), Int. Joint Conf. on, pages
ferent models and techniques to identify complementary prod-           1237–1244, July 2011.
ucts while only using a product’s available meta-data, such        [6] I. Fernández-Tobı́as, I. Cantador, M. Kaminskas, and
as titles, descriptions and categories. By extracting struc-           F. Ricci. A generic semantic-based framework for
tured information from the text, we manage to build a knowl-           cross-domain recommendation. In Proc. of the 2Nd
edge graph where products’ attributes are interconnected by            Int. Workshop on Information Heterogeneity and
semantic paths. Being able to identify complementary prod-             Fusion in Recommender Systems, HetRec ’11, pages
ucts from this graph rather than using transactions, as most           25–32, 2011.
of the current techniques do, solves the cold-start problem        [7] F. Gedikli and D. Jannach. Neighborhood-restricted
for new items. Moreover, the fact that this graph is based on          mining and weighted application of association rules
     for recommenders. In Proc. of the 11th Int. Conf. on    [22] D. Nikovski and V. Kulev. Induction of compact
     Web Information Systems Engineering, WISE’10,                decision trees for personalized recommendation. In
     pages 157–165, 2010.                                         Proc. of the 2006 ACM Symposium on Applied
 [8] T. Heuss, B. Humm, C. Henninger, and T. Rippl. A             Computing, SAC ’06, pages 575–581, 2006.
     comparison of ner tools w.r.t. a domain-specific        [23] V. C. Ostuni, T. Di Noia, E. Di Sciascio, S. Oramas,
     vocabulary. In Proc. of the 10th Int. Conf. on               and X. Serra. A semantic hybrid approach for sound
     Semantic Systems, SEM ’14, pages 100–107, 2014.              recommendation. In Proc. of the 24th Int. Conf. on
 [9] M. Kaminskas, I. Fernández-Tobı́as, I. Cantador, and        World Wide Web, WWW ’15 Companion, pages
     F. Ricci. Ontology-Based Identification of Music for         85–86, 2015.
     Places, pages 436–447. 2013.                            [24] B. Pathak, R. Garfinkel, R. D. Gopal, R. Venkatesan,
[10] J. Katukuri, Y. Xiey, V. V. Raghavan, and A. Gupta.          and F. Yin. Empirical analysis of the impact of
     Supervised link discovery on large-scale biomedical          recommender systems on sales. Journal of
     concept networks. In Proc. of the 2011 IEEE Int.             Management Information Systems, 27(2):159–188,
     Conf. on Bioinformatics and Biomedicine, BIBM ’11,           2010.
     pages 562–568, 2011.                                    [25] J. Pérez, M. Arenas, and C. Gutierrez. Semantics and
[11] L. Li, H. Deng, A. Dong, Y. Chang, and H. Zha.               complexity of SPARQL. ACM Trans. Database Syst.,
     Identifying and labeling search tasks via query-based        34(3), 2009.
     hawkes processes. In Proc. of the 20th ACM SIGKDD       [26] G. Pirrò. Reword: Semantic relatedness in the web of
     Int. Conf. on Knowledge Discovery and Data Mining,           data. In Proc. of the 26th AAAI Conf. on Artificial
     KDD ’14, pages 731–740, 2014.                                Intelligence, 2012, Toronto, Ontario, Canada., 2012.
[12] Z. Liao, Y. Song, Y. Huang, L. w. He, and Q. He.        [27] T. Raeder and N. V. Chawla. Market basket analysis
     Task trail: An effective segmentation of user search         with networks. Social Network Analysis and Mining,
     behavior. IEEE Transactions on Knowledge and Data            1(2):97–113, 2011.
     Engineering, 26(12):3090–3102, Dec 2014.                [28] F. Ricci, L. Rokach, and B. Shapira, editors.
[13] F. Manola, E. Miller, and B. McBride. RDF Primer.            Recommender Systems Handbook. 2015.
     W3C Recom., 2004.                                       [29] P. Ristoski, M. Schuhmacher, and H. Paulheim. Using
[14] J. J. McAuley, R. Pandey, and J. Leskovec. Inferring         graph metrics for linked open data enabled
     networks of substitutable and complementary                  recommender systems. In E-Commerce and Web
     products. In Proc. of the 21th ACM SIGKDD Int.               Technologies - 16th Int. Conf. on Electronic
     Conf. on Knowledge Discovery and Data Mining,                Commerce and Web Technologies, EC-Web 2015,
     Sydney, NSW, Australia, 2015, pages 785–794, 2015.           Valencia, Spain, 2015, Revised Selected Papers, pages
[15] J. J. McAuley, C. Targett, Q. Shi, and A. van den            30–41, 2015.
     Hengel. Image-based recommendations on styles and       [30] A. Sharma, J. M. Hofman, and D. J. Watts.
     substitutes. In Proc. of the 38th Int. ACM SIGIR             Estimating the causal impact of recommendation
     Conf. on Research and Development in Information             systems from observational data. In Proc. of the 16th
     Retrieval, Santiago, Chile, 2015, pages 43–52, 2015.         ACM Conf. on Economics and Computation, EC ’15,
[16] R. Mehrotra, P. Bhattacharya, and E. Yilmaz.                 pages 453–470, 2015.
     Uncovering task based behavioral heterogeneities in     [31] G. Suchacka and G. Chodak. Using association rules
     online search behavior. In Proc. of the 39th Int. ACM        to assess purchase probability in online stores.
     SIGIR Conf. on Research and Development in                   Information Systems and e-Business Management,
     Information Retrieval, SIGIR ’16, pages 1049–1052,           pages 1–30, 2016.
     2016.                                                   [32] I. M. A. O. Swesi, A. A. Bakar, and A. S. A. Kadir.
[17] R. Mehrotra and E. Yilmaz. Towards hierarchies of            Mining positive and negative association rules from
     search tasks & subtasks. In Proc. of the 24th Int.           interesting frequent and infrequent itemsets. In Fuzzy
     Conf. on World Wide Web, WWW ’15 Companion,                  Systems and Knowledge Discovery (FSKD), 2012 9th
     pages 73–74, 2015.                                           Int. Conf. on, pages 650–655, May 2012.
[18] R. Meymandpour and J. G. Davis. Recommendations         [33] H. Wang, Y. Song, M.-W. Chang, X. He, R. W.
     using linked data. In Proc. of the 5th Ph.D. Workshop        White, and W. Chu. Learning to extract cross-session
     on Information and Knowledge Management, PIKM                search tasks. In Proc. of the 22Nd Int. Conf. on World
     2012, Maui, HI, USA, 2012, pages 75–82, 2012.                Wide Web, WWW ’13, pages 1353–1364, 2013.
[19] I. C. Mogotsi. Christopher d. manning, prabhakar        [34] Z. You, S. Zhang, and L. Li. Integration of genomic
     raghavan, and hinrich schütze: Introduction to              and proteomic data to predict synthetic genetic
     information retrieval - cambridge university press,          interactions using semi-supervised learning. In Proc. of
     cambridge, england, 2008, 482 pp, ISBN:                      the 5th Int. Conf. on Emerging Intelligent Computing
     978-0-521-86571-5. Inf. Retr., 13(2):192–195, 2010.          Technology and Applications, ICIC ’09, pages 635–644,
[20] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich.          2009.
     A review of relational machine learning for knowledge   [35] J. Zheng, X. Wu, J. Niu, and A. Bolivar. Substitutes
     graphs. Proc. of the IEEE, 104(1):11–33, Jan 2016.           or complements: Another step forward in
[21] M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing          recommendations. In Proc. of the 10th ACM Conf. on
     yago: Scalable machine learning for linked data. In          Electronic Commerce, EC ’09, pages 139–146, 2009.
     Proc. of the 21st Int. Conf. on World Wide Web,
     WWW ’12, pages 271–280, 2012.

</pre>