<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hung</forename><forename type="middle">Nghiep</forename><surname>Tran</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">SOKENDAI</orgName>
								<orgName type="institution" key="instit2">The Graduate University for Advanced Studies</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Atsuhiro</forename><surname>Takasu</surname></persName>
							<email>takasu@nii.ac.jp</email>
							<affiliation key="aff1">
								<orgName type="institution">National Institute of Informatics Tokyo</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Analyzing Knowledge Graph Embedding Methods from a Multi-Embedding Interaction Perspective</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E96732C82AD0EDAEDF9B2147026E79FC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T17:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph</term>
					<term>Knowledge Graph Completion</term>
					<term>Knowledge Graph Embedding</term>
					<term>Multi-Embedding</term>
					<term>Representation Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowledge graph is a popular format for representing knowledge, with many applications to semantic search engines, questionanswering systems, and recommender systems. Real-world knowledge graphs are usually incomplete, so knowledge graph embedding methods, such as Canonical decomposition/Parallel factorization (CP), DistMult, and ComplEx, have been proposed to address this issue. These methods represent entities and relations as embedding vectors in semantic space and predict the links between them. The embedding vectors themselves contain rich semantic information and can be used in other applications such as data analysis. However, mechanisms in these models and the embedding vectors themselves vary greatly, making it difficult to understand and compare them. Given this lack of understanding, we risk using them ineffectively or incorrectly, particularly for complicated models, such as CP, with two role-based embedding vectors, or the state-of-the-art ComplEx model, with complex-valued embedding vectors. In this paper, we propose a multi-embedding interaction mechanism as a new approach to uniting and generalizing these models. We derive them theoretically via this mechanism and provide empirical analyses and comparisons between them. We also propose a new multiembedding model based on quaternion algebra and show that it achieves promising results using popular benchmarks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Knowledge graphs provide a unified format for representing knowledge about relationships between entities. A knowledge graph is a collection of triples, with each triple (h, t, r ) denoting the fact that relation r exists between head entity h and tail entity t. Many large real-world knowledge graphs have been built, including WordNet <ref type="bibr" target="#b21">[22]</ref> representing English lexical knowledge, and Freebase <ref type="bibr" target="#b2">[3]</ref> and Wikidata <ref type="bibr" target="#b28">[29]</ref> representing general knowledge. Moreover, knowledge graph can be used as a universal format for data from applied domains. For example, a knowledge graph for recommender systems would have triples such as (UserA, Item1, review) and (UserB, Item2, like).</p><p>Knowledge graphs are the cornerstones of modern semantic web technology. They have been used by large companies such as Google to provide semantic meanings into many traditional applications, such as semantic search engines, semantic browsing, and question answering <ref type="bibr" target="#b1">[2]</ref>. One important application of knowledge graphs is recommender systems, where they are used to unite multiple sources of data and incorporate external knowledge <ref type="bibr">[5] [36]</ref>. Recently, specific methods such as knowledge graph embedding have been used to predict user interactions and provide recommendations directly <ref type="bibr" target="#b9">[10]</ref>.</p><p>Real-world knowledge graphs are usually incomplete. For example, Freebase and Wikidata are very large but they do not contain all knowledge. This is especially true for the knowledge graphs used in recommender systems. During system operation, users review new items or like new items, generating new triples for the knowledge graph, which is therefore inherently incomplete. Knowledge graph completion, or link prediction, is the task that aims to predict new triples.</p><p>This task can be undertaken by using knowledge graph embedding methods, which represent entities and relations as embedding vectors in semantic space, then model the interactions between these embedding vectors to compute matching scores that predict the validity of each triple. Knowledge graph embedding methods are not only used for knowledge graph completion, but the learned embedding vectors of entities and relations are also very useful. They contain rich semantic information similar to word embeddings <ref type="bibr" target="#b20">[21]</ref> [20] <ref type="bibr" target="#b13">[14]</ref>, enabling them to be used in visualization or browsing for data analysis. They can also be used as extracted or pretrained feature vectors in other learning models for tasks such as classification, clustering, and ranking.</p><p>Among the many proposed knowledge graph embedding methods, the most efficient and effective involve trilinear-productbased models, such as Canonical decomposition/Parallel factorization (CP) <ref type="bibr" target="#b12">[13]</ref>  <ref type="bibr" target="#b16">[17]</ref>, DistMult <ref type="bibr" target="#b34">[35]</ref>, or the state-of-the-art Com-plEx model <ref type="bibr" target="#b27">[28]</ref>. 
These models solve a tensor decomposition problem, with the matching score of each triple modeled as the result of a trilinear product, i.e., a multilinear map with three variables corresponding to the embedding vectors h, t, and r of head entity h, tail entity t, and relation r, respectively. The trilinear-product-based score function for the three embedding vectors is denoted as ⟨h, t, r⟩ and will be defined mathematically in Section 2.</p><p>However, the implementations of embedding vectors in the various models are very diverse. DistMult <ref type="bibr" target="#b34">[35]</ref> uses one real-valued embedding vector for each entity or relation. The original CP <ref type="bibr" target="#b12">[13]</ref> uses one real-valued embedding vector for each relation, but two real-valued embedding vectors for each entity, used when the entity appears as head and as tail, respectively. ComplEx <ref type="bibr" target="#b27">[28]</ref> uses one complex-valued embedding vector for each entity or relation. Moreover, a recent heuristic for CP <ref type="bibr" target="#b16">[17]</ref>, here denoted as CP h , was proposed to augment the training data, helping CP achieve results competitive with the state-of-the-art model ComplEx. This heuristic introduces an additional embedding vector for each relation, but the underlying mechanism is different from that in ComplEx. All of these complications make it difficult to understand and compare the various models, and to know how to use and extend them. If we were to use the embedding vectors for data analysis or as pretrained feature vectors, a good understanding would affect how we use the complex-valued embedding vectors from ComplEx or the different embedding vectors for head and tail roles from CP.</p><p>In this paper, we propose a multi-embedding interaction mechanism as a new approach to uniting and generalizing the above models. In the proposed mechanism, each entity e is represented by multiple embedding vectors {e^(1), e^(2), . . .} and each relation r is represented by multiple embedding vectors {r^(1), r^(2), . . .}. In a triple (h, t, r), all embedding vectors of h, t, and r interact with each other by trilinear products to produce multiple interaction scores. These scores are then combined in a weighted sum using a weight vector ω to produce the final matching score for the triple. We show that the above models are special cases of this mechanism. It therefore unifies those models and lets us compare them directly. The mechanism also enables us to develop new models by extending to additional embedding vectors.</p><p>In this paper, our contributions include the following.</p><p>• We introduce a multi-embedding interaction mechanism as a new approach to unifying and generalizing a class of state-of-the-art knowledge graph embedding models. • We derive each of the above models theoretically via this mechanism. We then empirically analyze and compare these models with each other and with their variants. • We propose a new multi-embedding model that extends to four embedding vectors based on quaternion algebra, an extension of complex algebra. We show that this model achieves promising results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>Knowledge graph embedding methods for link prediction are actively being researched <ref type="bibr" target="#b29">[30]</ref>. Here, we only review the works that are directly related to this paper, namely models that use only triples, not external data such as text <ref type="bibr" target="#b31">[32]</ref> or graph structure such as relation paths <ref type="bibr" target="#b17">[18]</ref>. Models using only triples are relatively simple and they are also the current state of the art.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">General architecture</head><p>Knowledge graph embedding models take a triple of the form (h, t, r ) as input and output the validity of that triple. A general model can be viewed as a three-component architecture:</p><p>(1) Embedding lookup: linear mapping from one-hot vectors to embedding vectors. A one-hot vector is a sparse discrete vector representing a discrete input, e.g., the first entity could be represented as [1, 0, . . . , 0] ⊤ . A triple could be represented as a tuple of three one-hot vectors representing h, t, and r , respectively. An embedding vector is a dense continuous vector of much lower dimensionality than a one-hot vector thus lead to efficient distributed representations <ref type="bibr">[11] [12]</ref>. (2) Interaction mechanism: modeling the interaction between embedding vectors to compute the matching score of a triple. This is the main component of a model. (3) Prediction: using the matching score to predict the validity of each triple. A higher score means that the triple is more likely to be valid.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Categorization</head><p>Based on the modeling of the second component, a knowledge graph embedding model falls into one of three categories, namely translation-based, neural-network-based, or trilinear-product-based, as described below.</p><p>2.2.1 Translation-based: These models translate the head entity embedding by summing with the relation embedding vector, then measuring the distance between the translated images of head entity and the tail entity embedding, usually by L 1 or L 2 distance:</p><p>S(h, t, r</p><formula xml:id="formula_0">) = − ||h + r − t || p = − D d |h d + r d − t d | p 1/p ,<label>(1)</label></formula><p>where • h, t, r are embedding vectors of h, t, and r , respectively,</p><p>• p is 1 or 2 for L 1 or L 2 distance, respectively,</p><p>• D is the embedding size and d is each dimension.</p><p>TransE <ref type="bibr" target="#b3">[4]</ref> was the first model of this type, with score function basically the same as the above equation. There have been many extensions such as TransR <ref type="bibr" target="#b18">[19]</ref>, TransH <ref type="bibr" target="#b32">[33]</ref>, and TransA <ref type="bibr" target="#b33">[34]</ref>. Most extensions are done by linear transformation of the entities into a relation-specific space before translation <ref type="bibr" target="#b18">[19]</ref>.</p><p>These models are simple and efficient. However, their modeling capacity is generally weak because of over-strong assumptions about translation using relation embedding. Therefore, they are unable to model some forms of data <ref type="bibr" target="#b30">[31]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Neural-network-based:</head><p>These models use a nonlinear neural network to compute the matching score for a triple:</p><formula xml:id="formula_1">S(h, t, r ) =N N (h, t, r ),<label>(2)</label></formula><p>where • h, t, r are the embedding vectors of h, t, and r , respectively,</p><p>• N N is the neural network used to compute the score.</p><p>One of the simplest neural-network-based model is ER-MLP <ref type="bibr" target="#b6">[7]</ref>, which concatenates the input embedding vectors and uses a multi-layer perceptron neural network to compute the matching score. NTN <ref type="bibr" target="#b25">[26]</ref> is an earlier model that employs nonlinear activation functions to generalize the linear model RESCAL <ref type="bibr" target="#b23">[24]</ref>. Recent models such as ConvE <ref type="bibr" target="#b5">[6]</ref> use convolution networks instead of fully-connected networks.</p><p>These models are complicated because of their use of neural networks as a black-box universal approximator, which usually make them difficult to understand and expensive to use.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Trilinear-product-based:</head><p>These models compute their scores by using trilinear product between head, tail, and relation embeddings, with relation embedding playing the role of matching weights on the dimensions of head and tail embeddings:</p><formula xml:id="formula_2">S(h, t, r ) =⟨h, t, r ⟩ =h ⊤ diaд(r )t = D d =1 (h ⊙ t ⊙ r ) d = D d =1 (h d t d r d ) ,<label>(3)</label></formula><p>where • h, t, r are embedding vectors of h, t, and r , respectively, • diaд(r ) is the diagonal matrix of r , • ⊙ denotes the element-wise Hadamard product, • D is the embedding size and d is the dimension for which h d , t d , and r d are the entries.</p><p>In this paper, we focus on this category, particularly on Dist-Mult, ComplEx, CP, and CP h with augmented data. These models are simple, efficient, and can scale linearly with respect to embedding size in both time and space. They are also very effective, as has been shown by the state-of-the-art results for ComplEx and CP h using popular benchmarks <ref type="bibr">[28] [17]</ref>.</p><p>DistMult <ref type="bibr" target="#b34">[35]</ref> embeds each entity and relation as a single realvalued vector. DistMult is the simplest model in this category. Its score function is symmetric, with the same scores for triples (h, t, r ) and (t, h, r ). Therefore, it cannot model asymmetric data for which only one direction is valid, e.g., asymmetric triples such as (Paper1, Paper2, cite). Its score function is:</p><formula xml:id="formula_3">S(h, t, r ) =⟨h, t, r ⟩,<label>(4)</label></formula><p>where h, t, r ∈ R k . ComplEx <ref type="bibr" target="#b27">[28]</ref> is an extension of DistMult that uses complexvalued embedding vectors that contain complex numbers. Each complex number c with two components, real a and imaginary b, can be denoted as c = a + bi. The complex conjugate c of c is c = a − bi. The complex conjugate vector t of t is form from the complex conjugate of the individual entries. Complex algebra requires using the complex conjugate vector of tail embedding in the inner product and trilinear product <ref type="bibr" target="#b0">[1]</ref>. Thus, these products can be antisymmetric, which enables ComplEx to model asymmetric data <ref type="bibr" target="#b27">[28]</ref>  <ref type="bibr" target="#b26">[27]</ref>. Its score function is:</p><formula xml:id="formula_4">S(h, t, r ) =Re(⟨h, t, r ⟩),<label>(5)</label></formula><p>where h, t, r ∈ C k and Re(c) means taking the real component of the complex number c. CP <ref type="bibr" target="#b12">[13]</ref> is similar to DistMult but embeds entities as head and as tail differently. Each entity e has two embedding vectors e and e (2) depending on its role in a triple as head or as tail, respectively. Using different role-based embedding vectors leads to an asymmetric score function, enabling CP to also model asymmetric data. However, experiments have shown that CP's performance is very poor on unseen test data <ref type="bibr" target="#b16">[17]</ref>. Its score function is:</p><formula xml:id="formula_5">S(h, t, r ) =⟨h, t (2) , r ⟩,<label>(6)</label></formula><p>where h, t (2) , r ∈ R k . CP h <ref type="bibr" target="#b16">[17]</ref> is a direct extension of CP. Its heuristic augments the training data by making an inverse triple (t, h, r (a) ) for each existing triple (h, t, r ), where r (a) is the augmented relation corresponding to r . With this heuristic, CP h significantly improves CP, achieving results competitive with ComplEx. 
Its score function is: S(h, t, r) = ⟨h^(1), t^(2), r⟩ and ⟨t^(1), h^(2), r^(a)⟩, (7) where h^(1), h^(2), t^(1), t^(2), r, r^(a) ∈ R^D.</p><p>In the next section, we present a new approach to analyzing these trilinear-product-based models.</p></div>
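<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a summary of this category, the following sketch collects the four trilinear-product-based score functions above in NumPy; passing the role-based and augmented embedding vectors explicitly as arguments is an illustrative choice, not the original implementation.</p><code type="python">
import numpy as np

def trilinear(h, t, r):
    # Eq. (3): sum over dimensions of the element-wise product.
    return np.sum(h * t * r)

def distmult(h, t, r):                # h, t, r: real vectors, Eq. (4)
    return trilinear(h, t, r)

def complex_score(h, t, r):           # h, t, r: complex vectors, Eq. (5)
    # ComplEx uses the complex conjugate of the tail embedding.
    return np.real(np.sum(h * np.conj(t) * r))

def cp(h1, t2, r):                    # role-based entity vectors, Eq. (6)
    return trilinear(h1, t2, r)

def cp_h(h1, h2, t1, t2, r, r_a):     # CP with the augmented relation, Eq. (7)
    # Scores of the original triple and its inverse triple.
    return trilinear(h1, t2, r) + trilinear(t1, h2, r_a)
</code></div>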
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">MULTI-EMBEDDING INTERACTION</head><p>In this section, we first formally present the multi-embedding interaction mechanism. We then derive each of the above trilinearproduct-based models using this mechanism, by changing the embedding vectors and setting appropriate weight vectors. Next, we specify our attempt at learning weight vectors automatically. We also propose a four-embedding interaction model based on quaternion algebra.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Multi-embedding interaction mechanism</head><p>We globally model each entity e as the multiple embedding vectors {e (1) , e (2) , . . . , e (n) } and each relation r as the multiple embedding vectors {r (1) , r (2) , . . . , r (n) }. The triple (h, t, r ) is therefore modeled by multiple embeddings as h (i) , t (j) , r (k ) , i, j, k ∈ {1, ..., n}.</p><p>In each triple, the embedding vectors for head, tail, and relation interact with each and every other embedding vector to produce multiple interaction scores. Each interaction is modeled by the trilinear product of corresponding embedding vectors. The interaction scores are then weighted summed by a weight vector:</p><formula xml:id="formula_6">S(h, t, r ; Θ, ω) = i,j,k ∈ {1,...,n } ω (i,j,k ) ⟨h (i) , t (j) , r (k) ⟩,<label>(8)</label></formula><p>where</p><p>• Θ is the parameter denoting embedding vectors h (i) , t (j) , r (k ) ,</p><p>• ω is the parameter denoting the weight vector used to combine the interaction scores, with ω (i,j,k ) being an element of ω.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Deriving trilinear-product-based models</head><p>The existing trilinear-product-based models can be derived from the proposed general multi-embedding interaction score function in Eq. ( <ref type="formula" target="#formula_6">8</ref>) by setting the weight vector ω as shown in Table <ref type="table" target="#tab_0">1</ref>.</p><p>For DistMult, we can see the equivalence directly. For Com-plEx, we need to expand its score function following complex algebra <ref type="bibr" target="#b0">[1]</ref>: S(h, t, r ) =Re(⟨h, t, r ⟩) =⟨Re(h), Re(t), Re(r )⟩ + ⟨Re(h), Im(t), Im(r )⟩ − ⟨Im(h), Re(t), Im(r )⟩ + ⟨Im(h), Im(t), Re(r )⟩, <ref type="bibr" target="#b8">(9)</ref> where</p><formula xml:id="formula_7">• h, t, r ∈ C k ,</formula><p>• Re(c) and Im(c) mean taking the real and imaginary components of the complex vector c, respectively.</p><p>Changing Re(h) to h (1) , Im(h) to h (2) , Re(t) to t (1) , Im(t) to t (2) , Re(r ) to r (1) , and Im(r ) to r (2) , we can rewrite the score function of ComplEx as: S(h, t, r ) =Re(⟨h, t, r ⟩) =⟨h (1) , t (1) , r (1) ⟩ + ⟨h (1) , t (2) , r (2) ⟩ − ⟨h (2) , t (1) , r (2) ⟩ + ⟨h (2) , t (2) , r (1) ⟩, <ref type="bibr" target="#b9">(10)</ref> which is equivalent to the weighted sum using the weight vectors in Table <ref type="table" target="#tab_0">1</ref>. Note that by the symmetry between h and t, we can also obtain the equivalent weight vector ComplEx equiv. 1. By symmetry between embedding vectors of the same entity or relation, we can also obtain the equivalent weight vectors ComplEx equiv. 2 and ComplEx equiv. 3.</p><p>For CP, note that the two role-based embedding vectors for each entity can be mapped to two-embedding vectors in our model and the relation embedding vector can be mapped to r (1) . For CP h , further note that its data augmentation is equivalent to adding the score of the original triple and the inverse triple  (1) , t (1) , r (1) ⟩ 1 1 1 0 0 0 0 0 ⟨h (1) , t (1) , r (2) ⟩ 0 0 0 1 1 0 0 0 ⟨h (1) , t (2) , r (1) ⟩ 0 0 0 -1 1 1 1 0 ⟨h (1) , t (2) , r (2) ⟩ 0 1 -1 0 0 0 0 1 ⟨h (2) , t (1) , r (1) ⟩ 0 0 0 1 -1 0 0 1 ⟨h (2) , t (1) , r (2) ⟩ 0 -1 1 0 0 0 1 0 ⟨h (2) , t (2) , r (1) ⟩ 0 1 1 0 0 0 0 0 ⟨h (2) , t (2) , r (2) ⟩ 0 0 0 1 1 0 0 0 when training using stochastic gradient descent (SGD):</p><p>S(h, t, r ) =⟨h, t (2) , r ⟩ + ⟨t, h (2) , r (a) ⟩.</p><p>We can then map r (a) to r (2) to obtain the equivalence given in Table <ref type="table" target="#tab_0">1</ref>. By symmetry between h and t, we can also obtain the equivalent weight vector CP h equiv. 1.</p><p>From this perspective, all four models DistMult, ComplEx, CP, and CP h can be seen as special cases of the general multiembedding interaction mechanism. This provides an intuitive perspective on using the embedding vectors in complicated models. For the ComplEx model, instead of using a complex-valued embedding vector, we can treat it as two real-valued embedding vectors. These vectors can then be used directly in common learning algorithms that take as input real-valued vectors rather than complex-valued vectors. We also see that multiple embedding vectors are a natural extension of single embedding vectors. Given this insight, multiple embedding vectors can be concatenated to form a longer vector for use in visualization and data analysis, for example.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Automatically learning weight vectors</head><p>As we have noted, the weight vector ω plays an important role in the model, because it determines how the interaction mechanism is implemented and therefore how the specific model can be derived. An interesting question is how to learn ω automatically. One approach is to let the model learn ω together with the embeddings in an end-to-end fashion. For a more detailed examination of this idea, we will test different restrictions on the range of ω by applying tanh(ω), sigmoid(ω), and softmax(ω).</p><p>Note also that the weight vectors for related models are usually sparse. We therefore enforce a sparsity constraint on ω by an additional Dirichlet negative log-likelihood regularization loss:</p><formula xml:id="formula_9">L dir = −λ dir i,j,k ∈ {1,...,n } (α − 1) log |ω (i,j,k ) | ||ω|| 1 , (<label>12</label></formula><formula xml:id="formula_10">)</formula><p>where α is a hyperparameter controlling sparseness (a small α will make the weight vector sparser) and λ dir is the regularization strength.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Quaternion-based four-embedding interaction model</head><p>Another question is whether using more embedding vectors in the multi-embedding interaction mechanism is helpful. Motivated by the derivation of ComplEx from a two-embedding interaction model, we develop a four-embedding interaction model by using quaternion algebra to determine the weight vector and the interaction mechanism.</p><p>Quaternion numbers are extension of complex numbers to four components <ref type="bibr" target="#b14">[15]</ref>  <ref type="bibr" target="#b7">[8]</ref>. Each quaternion number q, with one real component a and three imaginary components b, c, d, could be written as q = a + bi + cj + dk where i, j, k are fundamental quaternion units, similar to the imaginary number i in complex algebra. As for complex conjugates, we also have a quaternion conjugate q = a − bi − cj − dk.</p><p>An intuitive view of quaternion algebra is that each quaternion number represents a 4-dimensional vector (or 3-dimensional vector when the real component a = 0) and quaternion multiplication is rotation of this vector in 4-(or 3-)dimensional space. Compared to complex algebra, each complex number represents a 2-dimensional vector and complex multiplication is rotation of this vector in 2-dimensional plane <ref type="bibr" target="#b0">[1]</ref>.</p><p>Several works have shown the benefit of using complex, quaternion, or other hyper-complex numbers in the hidden layers of deep neural networks <ref type="bibr" target="#b8">[9]</ref> [23] <ref type="bibr" target="#b24">[25]</ref>. To the best of our knowledge, this paper is the first to motivate and use quaternion numbers for the embedding vectors of knowledge graph embedding.</p><p>Quaternion multiplication is noncommutative, thus there are multiple ways to multiply three quaternion numbers in the trilinear product. Here, we choose to write the score function of the quaternion-based four-embedding interaction model as:</p><formula xml:id="formula_11">S(h, t, r ) =Re(⟨h, t, r ⟩),<label>(13)</label></formula><p>where h, t, r ∈ H k . By expanding this formula using quaternion algebra <ref type="bibr" target="#b14">[15]</ref> and mapping the four components of a quaternion number to four embeddings in the multi-embedding interaction model, respectively, we can write the score function in the notation of the multi-embedding interaction model as:</p><formula xml:id="formula_12">S(h, t, r ) =Re(⟨h, t, r ⟩)</formula><p>=⟨h (1) , t (1) , r (1) ⟩ + ⟨h (2) , t (2) , r (1) ⟩ + ⟨h (3) , t (3) , r (1) ⟩ + ⟨h (4) , t (4) , r (1) ⟩ + ⟨h (1) , t (2) , r (2) ⟩ − ⟨h (2) , t (1) , r (2) ⟩ + ⟨h (3) , t (4) , r (2) ⟩ − ⟨h (4) , t (3) , r (2) ⟩ + ⟨h (1) , t (3) , r (3) ⟩ − ⟨h (2) , t (4) , r (3) ⟩ − ⟨h (3) , t (1) , r (3) ⟩ + ⟨h (4) , t (2) , r (3) ⟩ + ⟨h (1) , t (4) , r (4) ⟩ + ⟨h (2) , t (3) , r (4) ⟩ − ⟨h (3) , t (2) , r (4) ⟩ − ⟨h (4) , t (1) , r (4) ⟩, <ref type="bibr" target="#b13">(14)</ref> where h, t, r ∈ H k .</p><p>The learning problem in knowledge graph embedding methods can be modeled as the binary classification of valid and invalid triples. Because knowledge graphs do not contain invalid triples, we generate them by negative sampling <ref type="bibr" target="#b19">[20]</ref>. 
For each valid triple (h, t, r), we replace h or t with other random entities to obtain the invalid triples (h′, t, r) and (h, t′, r) <ref type="bibr" target="#b3">[4]</ref>.</p><p>We can then learn the model parameters by minimizing the negative log-likelihood loss on the training data, with the predicted probability modeled by the logistic sigmoid function σ(•) applied to the matching score. This loss is the cross-entropy:</p><formula xml:id="formula_13">L(D, D′; Θ, ω) = − \sum_{(h,t,r) ∈ D} \log σ(S(h, t, r; Θ, ω)) − \sum_{(h′,t′,r) ∈ D′} \log(1 − σ(S(h′, t′, r; Θ, ω))),<label>(15)</label></formula><p>where D is the true data (p = 1), D′ is the negative sampled data (p = 0), and p is the empirical probability.</p><p>Defining the class label Y_{(h,t,r)} = 2p_{(h,t,r)} − 1, i.e., the labels of positive triples are 1 and those of negative triples are −1, the above loss can be written more concisely. Including the L_2 regularization of the embedding vectors, this loss can be written as:</p><formula xml:id="formula_15">L(D, D′; Θ, ω) = \sum_{(h,t,r) ∈ D∪D′} \log(1 + e^{−Y_{(h,t,r)} S(h,t,r; Θ, ω)}) + \frac{λ}{nD} ||Θ||_2^2,<label>(16)</label></formula><p>where D is the true data, D′ is the negative sampled data, Θ denotes the embedding vectors corresponding to the current triples, n is the number of embedding vectors per entity or relation, D is the embedding size, and λ is the regularization strength.</p></div>
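<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following sketch implements the 16-term expansion in Eq. (14) and the softplus form of the loss in Eq. (16); the (4, D) array layout and the λ/(nD) scaling of the regularizer follow our reading of the text and should be treated as assumptions:</p><code type="python">
import numpy as np

def quat_score(H, T, R):
    # Eq. (14): H, T, R are (4, D) arrays holding the four real component
    # vectors of the quaternion embeddings of h, t, and r (index 0 = real).
    tri = lambda i, j, k: np.sum(H[i] * T[j] * R[k])
    return (tri(0, 0, 0) + tri(1, 1, 0) + tri(2, 2, 0) + tri(3, 3, 0)
            + tri(0, 1, 1) - tri(1, 0, 1) + tri(2, 3, 1) - tri(3, 2, 1)
            + tri(0, 2, 2) - tri(1, 3, 2) - tri(2, 0, 2) + tri(3, 1, 2)
            + tri(0, 3, 3) + tri(1, 2, 3) - tri(2, 1, 3) - tri(3, 0, 3))

def logistic_loss(scores, labels, theta_sq_norm, lam, n, D):
    # Eq. (16): softplus form of the cross-entropy with labels Y in {+1, -1},
    # plus L2 regularization of the embeddings in the current batch.
    return (np.sum(np.log1p(np.exp(-labels * scores)))
            + lam / (n * D) * theta_sq_norm)
</code></div>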
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">EXPERIMENTAL SETTINGS 5.1 Datasets</head><p>For our empirical analysis, we used the WN18 dataset, the most popular of the benchmark datasets built on WordNet <ref type="bibr" target="#b21">[22]</ref> by <ref type="bibr">Bordes et al. [4]</ref>. This dataset has 40,943 entities, 18 relations, 141,442 training triples, 5,000 validation triples, 5,000 test triples.</p><p>In our preliminary experiments, the relative performance on all datasets was quite consistent, therefore choosing the WN18 dataset is appropriate for our analysis. We will consider the use of other datasets in in future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Evaluation protocols</head><p>Knowledge graph embedding methods are usually evaluated on link prediction task <ref type="bibr" target="#b3">[4]</ref>. In this task, for each true triple (h, t, r ) in the test set, we replace h and t by every other entity to generate corrupted triples (h ′ , t, r ) and (h, t ′ , r ), respectively <ref type="bibr" target="#b3">[4]</ref>. The goal of the model now is to rank the true triple (h, t, r ) before the corrupted triples based on the predicted score S.</p><p>For each true triple in the test set, we compute its rank, then we can compute popular evaluation metrics including MRR (mean reciprocal rank) and Hit@k for k ∈ {1, 3, 10} (how many true triples are correctly ranked in the top k) <ref type="bibr" target="#b27">[28]</ref>.</p><p>To avoid false negative error, i.e., corrupted triples are accidentally valid triples, we follow the protocols used in other works for filtered metrics <ref type="bibr" target="#b3">[4]</ref>. In this protocol, all valid triples in the training, validation, and test sets are removed from the corrupted triples set before computing the rank of the true triple.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Training</head><p>We trained the models using SGD with learning rates auto-tuned by Adam <ref type="bibr" target="#b15">[16]</ref>, that makes the choice of initial learning rate more robust. For all models, we found good hyperparameters with grid search on learning rates ∈ {10 −3 , 10 −4 }, embedding regularization strengths ∈ {10 −2 , 3 × 10 −3 , 10 −3 , 3 × 10 −4 , 10 −4 , 0.0}, and batch sizes ∈ {2 12 , 2 14 }. For a fair comparison, we fixed the embedding sizes so that numbers of parameters for all models are comparable. In particular, we use embedding sizes of 400 for one-embedding models such as DistMult, 200 for twoembedding models such as ComplEx, CP, and CP h , and 100 for four-embedding models. We also fixed the number of negative samples at 1 because, although using more negative samples is beneficial for all models, it is also more expensive and not necessary for this comparative analysis.</p><p>We constrained entity embedding vectors to have unit L 2 -norm after each training iteration. All training runs were stopped early by checking the filtered MRR on the validation set after every 50 epochs, with 100 epochs patient.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">RESULTS AND DISCUSSION</head><p>In this section, we present experimental results and analyses for the models described in Section 3. We report results for derived weight vectors and their variants, auto-learned weight vectors, and the quaternion-based four-embedding interaction model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Derived weight vectors and variants</head><p>6.1.1 Comparison of derived weight vectors . We evaluated the multi-embedding interaction model with the score function in Eq. ( <ref type="formula" target="#formula_6">8</ref>), using the derived weight vectors in Table <ref type="table" target="#tab_0">1</ref>. The results are shown in Table <ref type="table">2</ref>. They are consistent with the results reported in other works <ref type="bibr" target="#b27">[28]</ref>. Note that ComplEx and CP h achieved good results, whereas DistMult performed less well. CP performed very poorly in comparison to the other models, even though it is a classical model for the tensor decomposition task <ref type="bibr" target="#b12">[13]</ref>.</p><p>For a more detailed comparison, we report the performance on training data. Note that ComplEx and CP h can accurately predict the training data, whereas DistMult did not. This is evidence that ComplEx and CP h are fully expressive while DistMult cannot model asymmetric data effectively.</p><p>The most surprising result was that CP can also accurately predict the training data at a comparable level to ComplEx and CP h , despite its very poor result on the test data. This suggests that the problem with CP is not its modeling capacity, but in its generalization performance to new test data. In other words, CP is severely overfitting to the training data. However, standard regularization techniques such as L 2 regularization did not appear to help. CP h can be seen as a regularization technique that does help CP generalize well to unseen data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.2">Comparison with other variants of weight vectors.</head><p>In Table <ref type="table">2</ref>, we show the results for two bad examples and two good examples of weight vector variants. Note that bad example 1 performed similarly to CP and bad example 2 performed similarly to DistMult. Good example 1 was similar to CP h and good example 2 was similar to ComplEx.</p><p>This shows that the problem of bad weight vectors is not unique to some specific models. Moreover, it shows that there Table <ref type="table">2</ref>: Results for the derived weight vectors on WN18.</p><p>Weight setting MRR Hit@1 Hit@3 Hit@10 DistMult (1, 0, 0, 0, 0, 0, 0, 0) 0.796 0.674 0.915 0.945 ComplEx (1, 0, 0, 1, 0, −1, 1, 0) 0.937 0.928 0.946 0.951 CP (0, 0, 1, 0, 0, 0, 0, 0) 0.086 0.059 0.093 0.139 CP h (0, 0, 1, 0, 0, 1, 0, 0) We note that the good weight vectors exhibit the following properties.</p><p>• Completeness: all embedding vectors in a triple should be involved in the weighted-sum matching score. • Stability: all embedding vectors for the same entity or relation should contribute equally to the weighted-sum matching score. • Distinguishability: the weighted-sum matching scores for different triples should be distinguishable. For example, the score ⟨h (1) , t (2) , r (1) ⟩ + ⟨h (2) , t (1) , r (2) ⟩ is indistinguishable because switching h and t forms a symmetric group. As an example, consider the ComplEx model, where the multiplication of two complex numbers written in polar coordinate format, c 1 = |c 1 |e −iθ 1 and c 2 = |c 2 |e −iθ 2 , can be written as</p><formula xml:id="formula_16">c 1 c 2 = |c 1 ||c 2 |e −i(θ 1 +θ 2 ) [1]</formula><p>. This is a rotation in the complex plane, which intuitively satisfies the above properties.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Automatically learned weight vectors</head><p>We let the models learn ω together with the embeddings in an end-to-end fashion, aiming to learn good weight vectors automatically. The results are shown in Table <ref type="table" target="#tab_1">3</ref>.</p><p>We first set uniform weight vector as a baseline. The results were similar to those for DistMult because the weighted-sum matching score is also symmetric. However, other automatically learned weight vectors also performed similarly to Dist-Mult. Different restrictions by applying tanh(ω), sigmoid(ω), and softmax(ω) did not help. We noticed that the learned weight vectors were almost uniform, making them indistinguishable, suggesting that the use of sparse weight vectors might help.</p><p>We enforced a sparsity constraint by an additional Dirichlet negative log-likelihood regularization loss on ω, with α tuned to 1  16 and λ dir tuned to 10 −2 . However, the results did not improve. Tracking of weight vectors value showed that the sparsity constraint seemed to amplify the initial differences between the weight values instead of learning useful sparseness. This suggests that the gradient information is too symmetric that the model cannot break the symmetry of ω and escape the local optima.</p><p>In general, these experiments show that learning good weight vectors automatically is a particularly difficult task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Quaternion-based four-embedding interaction model</head><p>In Table <ref type="table" target="#tab_2">4</ref>, we present the evaluation results for the proposed quaternion-based four-embedding interaction model. The results were generally positive, with most metrics higher than those in Table <ref type="table">2</ref> for state-of-the-art models such as ComplEx and CP h . Especially, H@10 performance was much better than other models.</p><p>Note that this model needs more extensive evaluation. One potential problem is its being prone to overfitting, as seen in the on train results, with H@10 at absolute 1.000. This might mean that better regularization methods may be needed. However, the general results suggest that extending to more embedding vectors for multi-embedding interaction models is a promising approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSION</head><p>This paper proposes a multi-embedding interaction mechanism as a new approach to analyzing state-of-the-art knowledge graph embedding models such as DistMult, ComplEx, CP, and CP h . We show that these models can be unified and generalized under the new approach to provide an intuitive perspective on using the models and their embedding vectors effectively. We analyzed and compared the models and their variants empirically to better understand their properties, such as the severe overfitting problem of the CP model. In addition, we propose and have evaluated a new multi-embedding interaction model based on quaternion algebra, which showed some promising results.</p><p>There are several promising future directions. One direction is to find new methods of modeling the interaction mechanism between multi-embedding vectors and the effective extension to additional embedding vectors. Another direction is to evaluate multi-embedding models such as the proposed quaternion-based four-embedding interaction model more extensively.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Weight vectors for special cases.</figDesc><table><row><cell>Weighted terms DistMult ComplEx</cell><cell>ComplEx equiv. 1</cell><cell>ComplEx equiv. 2</cell><cell>ComplEx equiv. 3</cell><cell>CP CP h</cell><cell>CP h equiv.</cell></row><row><cell>⟨h</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3 :</head><label>3</label><figDesc>Results for the auto-learned weight vectors on WN18.</figDesc><table><row><cell></cell><cell>0.937</cell><cell>0.929</cell><cell>0.944</cell><cell>0.949</cell></row><row><cell>DistMult on train</cell><cell>0.917</cell><cell>0.848</cell><cell>0.985</cell><cell>0.997</cell></row><row><cell>ComplEx on train</cell><cell>0.996</cell><cell>0.994</cell><cell>0.998</cell><cell>0.999</cell></row><row><cell>CP on train</cell><cell>0.994</cell><cell>0.994</cell><cell>0.996</cell><cell>0.999</cell></row><row><cell>CP h on train</cell><cell>0.995</cell><cell>0.994</cell><cell>0.998</cell><cell>0.999</cell></row><row><cell>Bad example 1 (0, 0, 20, 0, 0, 1, 0, 0)</cell><cell>0.107</cell><cell>0.079</cell><cell>0.116</cell><cell>0.159</cell></row><row><cell>Bad example 2 (0, 0, 1, 1, 1, 1, 0, 0)</cell><cell>0.794</cell><cell>0.666</cell><cell>0.917</cell><cell>0.947</cell></row><row><cell>Good example 1 (0, 0, 20, 1, 1, 20, 0, 0)</cell><cell>0.938</cell><cell>0.934</cell><cell>0.942</cell><cell>0.946</cell></row><row><cell cols="2">Good example 2 (1, 1, −1, 1, 1, −1, 1, 1) 0.938</cell><cell>0.930</cell><cell>0.944</cell><cell>0.950</cell></row><row><cell>Weight setting</cell><cell cols="4">MRR Hit@1 Hit@3 Hit@10</cell></row><row><cell>Uniform weight (1, 1, 1, 1, 1, 1, 1, 1)</cell><cell>0.787</cell><cell>0.658</cell><cell>0.915</cell><cell>0.944</cell></row><row><cell>Auto weight no restriction</cell><cell>0.774</cell><cell>0.636</cell><cell>0.911</cell><cell>0.944</cell></row><row><cell>Auto weight ∈ (−1, 1) by tanh</cell><cell>0.765</cell><cell>0.625</cell><cell>0.908</cell><cell>0.943</cell></row><row><cell>Auto weight ∈ (0, 1) by sigmoid</cell><cell>0.789</cell><cell>0.661</cell><cell>0.915</cell><cell>0.946</cell></row><row><cell>Auto weight ∈ (0, 1) by softmax</cell><cell>0.802</cell><cell>0.685</cell><cell>0.915</cell><cell>0.944</cell></row><row><cell>Auto weight no restriction, sparse</cell><cell>0.792</cell><cell>0.685</cell><cell>0.892</cell><cell>0.935</cell></row><row><cell>Auto weight ∈ (−1, 1) by tanh, sparse</cell><cell>0.763</cell><cell>0.613</cell><cell>0.910</cell><cell>0.943</cell></row><row><cell cols="2">Auto weight ∈ (0, 1) by sigmoid, sparse 0.793</cell><cell>0.667</cell><cell>0.915</cell><cell>0.945</cell></row><row><cell cols="2">Auto weight ∈ (0, 1) by softmax, sparse 0.803</cell><cell>0.688</cell><cell>0.915</cell><cell>0.944</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4 :</head><label>4</label><figDesc>Results for the quaternion-based four-embedding interaction model on WN18.</figDesc><table><row><cell>Weight setting</cell><cell cols="4">MRR Hit@1 Hit@3 Hit@10</cell></row><row><cell>Quaternion-based four-embedding</cell><cell>0.941</cell><cell>0.931</cell><cell>0.950</cell><cell>0.956</cell></row><row><cell cols="2">Quaternion-based four-embedding on train 0.997</cell><cell>0.995</cell><cell>0.999</cell><cell>1.000</cell></row><row><cell>are other good weight vectors, besides those for ComplEx and</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>CP</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note>h , that can achieve very good results.</note></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>This work was supported by a JSPS Grant-in-Aid for Scientific Research (B) (15H02789).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Complex Analysis: An Introduction to the Theory of Analytic Functions of One Complex Variable</title>
		<author>
			<persName><forename type="first">Lars</forename><forename type="middle">V</forename><surname>Ahlfors</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1953">1953. 1953</date>
			<biblScope unit="page">177</biblScope>
			<pubPlace>New York, London</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">Amit</forename><surname>Singhal</surname></persName>
		</author>
		<ptr target="https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html" />
		<title level="m">Official Google Blog: Introducing the Knowledge Graph: Things, Not Strings</title>
				<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge</title>
		<author>
			<persName><forename type="first">Kurt</forename><surname>Bollacker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Colin</forename><surname>Evans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Praveen</forename><surname>Paritosh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Sturge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jamie</forename><surname>Taylor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD Conference</title>
				<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1247" to="1250" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Translating Embeddings for Modeling Multi-Relational Data</title>
		<author>
			<persName><forename type="first">Antoine</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicolas</forename><surname>Usunier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alberto</forename><surname>Garcia-Duran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oksana</forename><surname>Yakhnenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2787" to="2795" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Social Knowledge-Based Recommender System. Application to the Movies Domain</title>
		<author>
			<persName><forename type="first">Walter</forename><surname>Carrer-Neto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">María</forename><forename type="middle">Luisa</forename><surname>Hernández-Alcaraz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rafael</forename><surname>Valencia-García</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francisco</forename><surname>García-Sánchez</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.eswa.2012.03.025</idno>
		<ptr target="https://doi.org/10.1016/j.eswa.2012.03.025" />
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="10990" to="11000" />
			<date type="published" when="2012-09">2012. Sept. 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Convolutional 2d Knowledge Graph Embeddings</title>
		<author>
			<persName><forename type="first">Tim</forename><surname>Dettmers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pasquale</forename><surname>Minervini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pontus</forename><surname>Stenetorp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Riedel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Thirty-Second AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion</title>
		<author>
			<persName><forename type="first">Xin</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evgeniy</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geremy</forename><surname>Heitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wilko</forename><surname>Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ni</forename><surname>Lao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Murphy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Strohmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaohua</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.1145/2623330.2623623</idno>
		<ptr target="https://doi.org/10.1145/2623330.2623623" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining -KDD &apos;14</title>
				<meeting>the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining -KDD &apos;14<address><addrLine>New York, New York, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="601" to="610" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Rethinking Quaternions</title>
		<author>
			<persName><forename type="first">Ron</forename><surname>Goldman</surname></persName>
		</author>
		<idno type="DOI">10.2200/S00292ED1V01Y201008CGR013</idno>
		<ptr target="https://doi.org/10.2200/S00292ED1V01Y201008CGR013" />
	</analytic>
	<monogr>
		<title level="j">Synthesis Lectures on Computer Graphics and Animation</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="157" />
			<date type="published" when="2010-10">2010. Oct. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">On Complex Valued Convolutional Neural Networks</title>
		<author>
			<persName><forename type="first">Nitzan</forename><surname>Guberman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1602.09046</idno>
		<idno>arXiv:cs.NE/1602.09046</idno>
		<imprint>
			<date type="published" when="2016-02">2016. Feb. 2016</date>
		</imprint>
	</monogr>
	<note>cs.NE</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Translation-Based Recommendation</title>
		<author>
			<persName><forename type="first">Ruining</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wang-Cheng</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julian</forename><surname>Mcauley</surname></persName>
		</author>
		<idno type="DOI">10.1145/3109859.3109882</idno>
		<ptr target="https://doi.org/10.1145/3109859.3109882" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys &apos;17)</title>
				<meeting>the Eleventh ACM Conference on Recommender Systems (RecSys &apos;17)<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="161" to="169" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Learning Distributed Representations of Concepts</title>
		<author>
			<persName><forename type="first">Geoffrey</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eighth Annual Conference of the Cognitive Science Society</title>
				<meeting>the Eighth Annual Conference of the Cognitive Science Society<address><addrLine>Amherst, MA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1986">1986</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Distributed Representations</title>
		<author>
			<persName><forename type="first">G E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J L</forename><surname>Mcclelland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rumelhart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Parallel Distributed Processing</title>
				<meeting><address><addrLine>Pittsburgh, PA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1933">1984. 33</date>
		</imprint>
		<respStmt>
			<orgName>Carnegie-Mellon University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The Expression of a Tensor or a Polyadic as a Sum of Products</title>
		<author>
			<persName><forename type="first">Frank</forename><forename type="middle">L</forename><surname>Hitchcock</surname></persName>
		</author>
		<idno type="DOI">10.1002/sapm192761164</idno>
		<ptr target="https://doi.org/10.1002/sapm192761164" />
	</analytic>
	<monogr>
		<title level="j">Journal of Mathematics and Physics</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="164" to="189" />
			<date type="published" when="1927-04">1927. April 1927</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">GloVe: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Hypercomplex Numbers: An Elementary Introduction to Algebras</title>
		<author>
			<persName><forename type="first">Isai</forename><forename type="middle">Lvovich</forename><surname>Kantor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aleksandr</forename><forename type="middle">Samuilovich</forename><surname>Solodovnikov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Adam: A Method for Stochastic Optimization</title>
		<author>
			<persName><forename type="first">Diederik</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jimmy</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Learning Representations (ICLR)</title>
				<meeting>the 3rd International Conference on Learning Representations (ICLR)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Canonical Tensor Decomposition for Knowledge Base Completion</title>
		<author>
			<persName><forename type="first">Timothée</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicolas</forename><surname>Usunier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guillaume</forename><surname>Obozinski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 35th International Conference on Machine Learning (ICML&apos;18)</title>
				<meeting>the 35th International Conference on Machine Learning (ICML&apos;18)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Modeling Relation Paths for Representation Learning of Knowledge Bases</title>
		<author>
			<persName><forename type="first">Yankai</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhiyuan</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Huanbo</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maosong</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Siwei</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Song</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Learning Entity and Relation Embeddings for Knowledge Graph Completion</title>
		<author>
			<persName><forename type="first">Yankai</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhiyuan</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maosong</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yang</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xuan</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</title>
				<meeting>the Twenty-Ninth AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="2181" to="2187" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICLR&apos;13 Workshop</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Distributed Representations of Words and Phrases and Their Compositionality</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">WordNet: A Lexical Database for English</title>
		<author>
			<persName><forename type="first">George</forename><forename type="middle">A</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="page" from="39" to="41" />
			<date type="published" when="1995">1995. 1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Feed Forward Neural Network with Random Quaternionic Neurons</title>
		<author>
			<persName><forename type="first">Toshifumi</forename><surname>Minemoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Teijiro</forename><surname>Isokawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haruhiko</forename><surname>Nishimura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nobuyuki</forename><surname>Matsui</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.sigpro.2016.11.008</idno>
		<ptr target="https://doi.org/10.1016/j.sigpro.2016.11.008" />
	</analytic>
	<monogr>
		<title level="j">Signal Processing C</title>
		<imprint>
			<biblScope unit="volume">136</biblScope>
			<biblScope unit="page" from="59" to="68" />
			<date type="published" when="2017">2017. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A Three-Way Model for Collective Learning on Multi-Relational Data</title>
		<author>
			<persName><forename type="first">Maximilian</forename><surname>Nickel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Volker</forename><surname>Tresp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hans-Peter</forename><surname>Kriegel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on Machine Learning</title>
				<meeting>the 28th International Conference on Machine Learning</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="809" to="816" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Quaternion Recurrent Neural Networks</title>
		<author>
			<persName><forename type="first">Titouan</forename><surname>Parcollet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mirco</forename><surname>Ravanelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohamed</forename><surname>Morchid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Georges</forename><surname>Linarès</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chiheb</forename><surname>Trabelsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Renato</forename><forename type="middle">De</forename><surname>Mori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Learning Representations (ICLR&apos;19</title>
				<meeting>the International Conference on Learning Representations (ICLR&apos;19</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Reasoning With Neural Tensor Networks for Knowledge Base Completion</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Danqi</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="926" to="934" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Knowledge Graph Completion via Complex Tensor Factorization</title>
		<author>
			<persName><forename type="first">Théo</forename><surname>Trouillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">R</forename><surname>Dance</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Éric</forename><surname>Gaussier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Welbl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guillaume</forename><surname>Bouchard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="4735" to="4772" />
			<date type="published" when="2017">2017. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Complex Embeddings for Simple Link Prediction</title>
		<author>
			<persName><forename type="first">Theo</forename><surname>Trouillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Johannes</forename><surname>Welbl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><surname>Gaussier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guillaume</forename><surname>Bouchard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning (ICML&apos;16)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2071" to="2080" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Wikidata: A Free Collaborative Knowledgebase</title>
		<author>
			<persName><forename type="first">Denny</forename><surname>Vrandečić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Markus</forename><surname>Krötzsch</surname></persName>
		</author>
		<idno type="DOI">10.1145/2629489</idno>
		<ptr target="https://doi.org/10.1145/2629489" />
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="78" to="85" />
			<date type="published" when="2014-09">2014. Sept. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Knowledge Graph Embedding: A Survey of Approaches and Applications</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
		<idno type="DOI">10.1109/TKDE.2017.2754499</idno>
		<ptr target="https://doi.org/10.1109/TKDE.2017.2754499" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="2724" to="2743" />
			<date type="published" when="2017-12">2017. Dec. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">On Multi-Relational Link Prediction with Bilinear Models</title>
		<author>
			<persName><forename type="first">Yanjie</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rainer</forename><surname>Gemulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hui</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Thirty-Second AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Knowledge Graph and Text Jointly Embedding</title>
		<author>
			<persName><forename type="first">Zhen</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianwen</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianlin</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zheng</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1591" to="1601" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Knowledge Graph Embedding by Translating on Hyperplanes</title>
		<author>
			<persName><forename type="first">Zhen</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianwen</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianlin</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zheng</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1112" to="1119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">TransA: An Adaptive Approach for Knowledge Graph Embedding</title>
		<author>
			<persName><forename type="first">Han</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Minlie</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yu</forename><surname>Hao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaoyan</forename><surname>Zhu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1509.05490</idno>
	</analytic>
	<monogr>
		<title level="m">AAAI Conference on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Embedding Entities and Relations for Learning and Inference in Knowledge Bases</title>
		<author>
			<persName><forename type="first">Bishan</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wen-Tau</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaodong</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianfeng</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<title level="m" type="main">Collaborative Knowledge Base Embedding for Recommender Systems</title>
		<author>
			<persName><forename type="first">Fuzheng</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicholas</forename><surname>Jing Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Defu</forename><surname>Lian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xing</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei-Ying</forename><surname>Ma</surname></persName>
		</author>
		<idno type="DOI">10.1145/2939672.2939673</idno>
		<ptr target="https://doi.org/10.1145/2939672.2939673" />
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>ACM Press</publisher>
			<biblScope unit="page" from="353" to="362" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
