<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Binarization of Knowledge Graph Embeddings for Node Classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vitor</forename><surname>Faria De Souza</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Data and Web Science Group</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Heiko</forename><surname>Paulheim</surname></persName>
							<email>heiko.paulheim@uni-mannheim.de</email>
							<affiliation key="aff0">
								<orgName type="department">Data and Web Science Group</orgName>
								<orgName type="institution">University of Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Binarization of Knowledge Graph Embeddings for Node Classification</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">EC02BDF0EE985E90161F8636C8E1762C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph Embedding</term>
					<term>Node Classification</term>
					<term>Binarization</term>
					<term>Compression</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowledge Graph Embeddings (KGEs) are dense representations of entities and relations of a knowledge graph (KG) in a continuous vector space. In this paper, we present a preliminary analysis showing that KGEs can be substantially compressed by using only binary instead of continuous features, replacing costly floating point storage with more space-efficient bitwise storage, while largely retaining their representational power.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Knowledge Graph Embeddings (KGEs) are dense representations of entities and relations in knowledge graphs, where each entity and relation is represented by a vector in a continuous vector space. They are used for various knowledge graph tasks, such as link prediction and entity classification, for supporting knowledge-graph-related tasks such as entity linking and disambiguation, but also as knowledge representations in other downstream tasks, such as recommender systems <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. While the scalability of KGE methods has been identified as a challenge <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, most papers looking into scalability take the angle of scaling up the training process, i.e., being able to learn a representation for a (large) knowledge graph with reasonable time and memory usage. What has been overlooked so far is the size of the resulting embedding, which is also an important consideration for downstream tasks. Using KGEs in downstream applications requires loading and processing them within the application, which may be hindered by the sheer size of the embedding, especially when computing resources are limited.</p><p>To illustrate the issue of KGE size, we use the well-known knowledge graph DBpedia <ref type="bibr" target="#b4">[5]</ref>, whose latest release contains 7.62 million entities. 
1 Both the most common implementation of word2vec <ref type="bibr" target="#b5">[6]</ref> in the gensim library <ref type="bibr" target="#b6">[7]</ref>, which is used by RDF2vec <ref type="bibr" target="#b7">[8]</ref>, and the PyTorch library <ref type="bibr" target="#b8">[9]</ref>, which underlies embedding frameworks like DGL-KE <ref type="bibr" target="#b3">[4]</ref> and PyKEEN <ref type="bibr" target="#b9">[10]</ref>, use 32-bit floating point values for representing embeddings by default. 2,3 Embedding DBpedia with 200 dimensions would thus yield a KGE model of roughly 6 GB. <ref type="foot" target="#foot_2">4</ref> In this paper, we propose to use binary instead of continuous embeddings, i.e., embeddings that only use the values 0 and 1 in each dimension. By doing so, the size of embeddings can be reduced by a factor of 32 (using only 1 bit per entity and dimension instead of 32 bits). In the example above, we could thus store a 200-dimensional binary embedding in less than 200 MB.</p><p>The proposed approach is not limited to a single KGE method. Instead, we propose to binarize an already trained KGE. In experiments with the DLCC node classification benchmark, eleven different embedding methods, and different binarization methods, we show that using binary embeddings incurs only a small loss in accuracy while allowing a drastic reduction in storage size. Especially for embeddings that have already been computed for popular large-scale knowledge graphs <ref type="bibr" target="#b10">[11]</ref>, binarization can reduce the data volume and downstream processing load.</p></div>
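The storage figures above follow from simple arithmetic; as an illustrative check, using the entity and dimension counts from the text:

```python
# Back-of-the-envelope check of the storage figures from the text:
# 7.62 million DBpedia entities, 200 embedding dimensions.
entities = 7_620_000
dims = 200

float32_bytes = entities * dims * 4   # 32-bit floats, 4 bytes each
binary_bytes = entities * dims / 8    # 1 bit per entity and dimension

print(f"{float32_bytes / 1e9:.2f} GB")          # 6.10 GB
print(f"{binary_bytes / 1e6:.2f} MB")           # 190.50 MB
print(f"{float32_bytes / binary_bytes:.0f}x")   # 32x
```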
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Approach</head><p>In this paper, we utilize two approaches for binarizing knowledge graph embeddings: a simple dimension-wise baseline, and an autoencoder-based approach adapted from <ref type="bibr" target="#b11">[12]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Baseline: Dimension-wise Binarization</head><p>Given that 𝑣𝑒𝑐(𝑒) is the embedding vector of an entity 𝑒, we first compute the per-dimension average vector 𝑣𝑒𝑐 as</p><formula xml:id="formula_0">𝑣𝑒𝑐 𝑖 = 𝑎𝑣𝑔 all entities 𝑒 𝑣𝑒𝑐(𝑒) 𝑖 ,<label>(1)</label></formula><p>where 𝑣𝑒𝑐(𝑒) 𝑖 is the i-th element of 𝑣𝑒𝑐(𝑒). With that average vector, we can obtain a binary vector 𝑏𝑣𝑒𝑐(𝑒) from 𝑣𝑒𝑐(𝑒) as</p><formula xml:id="formula_1">𝑏𝑣𝑒𝑐(𝑒) 𝑖 = { 0 𝑣𝑒𝑐(𝑒) 𝑖 &lt; 𝑣𝑒𝑐 𝑖 1 𝑣𝑒𝑐(𝑒) 𝑖 ≥ 𝑣𝑒𝑐 𝑖<label>(2)</label></formula><p>In other words: every floating point value lower than the average for its dimension is represented by a 0, all others are represented by a 1.</p></div>
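The dimension-wise baseline of Equations (1) and (2) can be sketched in a few lines of NumPy; this is an illustrative re-implementation, not the code used in the experiments:

```python
import numpy as np

def binarize_dimensionwise(embeddings: np.ndarray) -> np.ndarray:
    """Threshold every value against its dimension's mean over all entities."""
    mean = embeddings.mean(axis=0)                # average vector (Eq. 1)
    return (embeddings >= mean).astype(np.uint8)  # 0/1 per dimension (Eq. 2)

# toy example: 3 entities with 2-dimensional embeddings
E = np.array([[0.2, -1.0],
              [0.4,  3.0],
              [0.9,  1.0]])
# dimension means are [0.5, 1.0]
print(binarize_dimensionwise(E))  # [[0 0] [0 1] [1 1]]
```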
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Autoencoder-based Binarization</head><p>Autoencoders are, in their simplest form, three-layer neural networks trained for dimensionality reduction <ref type="bibr" target="#b12">[13]</ref>. Between an input and an output layer, they have a (smaller) code layer. They are trained by presenting instances of a dataset both as input and as output (i.e., input and output being identical), thereby learning to minimize the reconstruction error. The authors of <ref type="bibr" target="#b11">[12]</ref> propose to use an encoder activation function 𝜎 𝐸 ∶ ℝ → {0, 1} which only outputs the values 0 and 1. Specifically, they use the Heaviside step function ℎ, defined as</p><formula xml:id="formula_2">ℎ(𝑥) = { 1 𝑥 ≥ 0 0 𝑥 &lt; 0<label>(3)</label></formula><p>With that, the output of the encoder 𝐸(𝑥) is a binary vector. To further optimize computations, they propose to align the dimensionality of the code layer with CPU register widths, such as 64, 128, and 256 bits. In this paper, we use their implementation <ref type="foot" target="#foot_3">5</ref> for binarizing knowledge graph embeddings.</p></div>
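For illustration, the forward pass of such a binary encoder can be sketched as follows; the projection matrix here is random, whereas <ref type="bibr" target="#b11">[12]</ref> learn it by minimizing the reconstruction error (the training procedure, which has to work around the zero gradient of ℎ, is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def heaviside(x: np.ndarray) -> np.ndarray:
    """h(x) = 1 if x >= 0 else 0 (Eq. 3)."""
    return (x >= 0).astype(np.uint8)

def encode(vec: np.ndarray, W: np.ndarray) -> np.ndarray:
    """E(x) = h(Wx): maps a float vector to a binary code."""
    return heaviside(W @ vec)

# map a 200-dimensional float embedding to a 128-bit code,
# matching a CPU register width as suggested in the text
W = rng.standard_normal((128, 200))  # untrained, illustrative only
code = encode(rng.standard_normal(200), W)
print(code.shape)  # (128,)
```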
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Evaluation</head><p>In order to evaluate the quality of binarized embeddings, we use the DBpedia portion of DLCC <ref type="bibr" target="#b13">[14]</ref>. For our experiments, we use the DBpedia embeddings also used in <ref type="bibr" target="#b0">[1]</ref>, which are available online<ref type="foot" target="#foot_4">6</ref>. As embedding methods, we use four flavors of RDF2vec <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref> and two flavors (L1 and L2 norm) of TransE <ref type="bibr" target="#b16">[17]</ref>, as well as ComplEx <ref type="bibr" target="#b17">[18]</ref>, DistMult <ref type="bibr" target="#b18">[19]</ref>, RESCAL <ref type="bibr" target="#b19">[20]</ref>, RotatE <ref type="bibr" target="#b20">[21]</ref>, and TransR <ref type="bibr" target="#b21">[22]</ref>. Table <ref type="table" target="#tab_0">1</ref> shows the average file sizes achieved with the different binarization methods. The standard storage method uses an uncompressed TXT variant (i.e., storing the value "0" or "1" in one byte), which can yield a compression factor of up to 14. The .VEC format, which is supported by the autoencoder implementation used in this paper <ref type="bibr" target="#b11">[12]</ref>, stores eight binary embedding values in one byte (i.e., multiple embedding dimensions are stored in one long integer). With this method, a compression factor of almost 50 can be achieved, so that an embedding originally stored in 1.6 GB now occupies only 33 MB.</p><p>DLCC has three problem sizes for each task; here, we only present the results for the largest problem size (5,000 examples per task). The full results are available online<ref type="foot" target="#foot_5">7</ref>; the results for the smaller groups (500 and 50 examples, resp.) are comparable to those on the largest task group.</p><p>Table <ref type="table" target="#tab_1">2</ref> shows the results on the DLCC entity classification dataset. The average losses in accuracy are not very large, with the baseline performing better than the smaller autoencoders. This makes us assume that the original embeddings already capture the information required for the tasks at hand at a very coarse level, i.e., it is sufficient to know whether a vector value for an entity at a particular dimension is high or low, but it is not necessary to know how high or low it actually is. Moreover, we observe the largest losses for ComplEx and DistMult, which are both tensor factorization methods, whereas RDF2vec SG has the smallest loss compared to the uncompressed variant. Therefore, we conclude that binarization is not equally effective for each model, but that there are differences by model family.</p><p>Furthermore, we can observe that the binarization has a smaller impact than the choice of the embedding variant: even the binarized versions of RDF2vec SG OA yield superior results to the original, i.e., non-binarized, embeddings of most other embedding methods.</p></div>
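The byte-packing behind the compact .VEC figures (eight binary dimensions stored in one byte) can be reproduced with NumPy; a small illustrative sketch:

```python
import numpy as np

# 1,000 entities with 128 binary dimensions each
rng = np.random.default_rng(1)
bits = (rng.random((1000, 128)) < 0.5).astype(np.uint8)

packed = np.packbits(bits, axis=1)  # eight dimensions per byte
print(bits.nbytes, packed.nbytes)   # 128000 16000

# factor 32 compared to storing the same matrix as 32-bit floats
print(1000 * 128 * 4 / packed.nbytes)  # 32.0

# packing is lossless as long as the dimension is a multiple of 8
restored = np.unpackbits(packed, axis=1)
assert (restored == bits).all()
```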
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion and Outlook</head><p>In this paper, we have presented two strategies for reducing the size of a KGE model: a simple baseline using dimension-wise binarization, and an approach based on neural autoencoders. A promising preliminary evaluation on the DLCC entity classification benchmark shows that the size of embeddings can be drastically reduced at a comparatively low loss in downstream classification performance.</p><p>While the focus of this paper has been on optimizing storage, we have not yet looked into possible gains in computation time for downstream processing, i.e., at inference time. For example, <ref type="bibr" target="#b11">[12]</ref> have proposed processing binary vectors with logical bitwise operators, which could also yield significant performance gains over floating point operations. This is a promising but so far underexplored area. Moreover, a comparison of binarizing classically trained embeddings with directly trained binary embeddings would be intriguing, especially for cases where no pre-trained embeddings exist.</p><p>So far, we have only looked into tasks involving single entities, as defined in the gEval and DLCC benchmarks. The transfer to another popular usage of knowledge graph embeddings, i.e., link prediction and triple scoring, has not yet been investigated. Except for the B-CP approach <ref type="bibr" target="#b22">[23]</ref>, there are, to the best of our knowledge, no methods for exploiting binary knowledge graph embeddings for link prediction. 
In particular, since B-CP learns binary embeddings directly, there are no studies on binarizing existing embeddings for link prediction.</p><p>In summary, the results have shown that binary knowledge graph embeddings can be an appealing alternative to the widely used floating point representations, allowing a drastic reduction of storage space while, at the same time, leading to only a marginal loss in downstream performance.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Average file sizes and compression factors relative to original-200 of the 33 .VEC files and 55 .TXT vector files, after entity filtering.</figDesc><table><row><cell>Embedding</cell><cell cols="2">Compact .VEC files</cell><cell cols="2">Full vector .TXT files</cell></row><row><cell></cell><cell>Average file size (MB)</cell><cell>Average compression factor</cell><cell>Average file size (MB)</cell><cell>Average compression factor</cell></row><row><cell>original-200</cell><cell>-</cell><cell>-</cell><cell>1607.5</cell><cell>1</cell></row><row><cell>avgbin-200</cell><cell>-</cell><cell>-</cell><cell>174.5</cell><cell>9</cell></row><row><cell>auto-128</cell><cell>32.9</cell><cell>49</cell><cell>112.6</cell><cell>14</cell></row><row><cell>auto-256</cell><cell>47.8</cell><cell>34</cell><cell>207.4</cell><cell>8</cell></row><row><cell>auto-512</cell><cell>78.3</cell><cell>21</cell><cell>396.6</cell><cell>4</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results on the DLCC gold standard (5,000 examples only, macro average accuracy and relative difference to results with the original embedding)</figDesc><table><row><cell></cell><cell>Original</cell><cell>Baseline</cell><cell>Δ</cell><cell>Auto-128</cell><cell>Δ</cell><cell>Auto-256</cell><cell>Δ</cell><cell>Auto-512</cell><cell>Δ</cell><cell>Avg. Δ</cell></row><row><cell>RDF2vec CBOW</cell><cell>0.745</cell><cell>0.729</cell><cell>-0.021</cell><cell>0.689</cell><cell>-0.075</cell><cell>0.702</cell><cell>-0.057</cell><cell>0.707</cell><cell>-0.050</cell><cell>-0.051</cell></row><row><cell>RDF2vec CBOW OA</cell><cell>0.841</cell><cell>0.831</cell><cell>-0.013</cell><cell>0.805</cell><cell>-0.043</cell><cell>0.807</cell><cell>-0.041</cell><cell>0.818</cell><cell>-0.028</cell><cell>-0.031</cell></row><row><cell>RDF2vec SG</cell><cell>0.857</cell><cell>0.831</cell><cell>-0.031</cell><cell>0.818</cell><cell>-0.046</cell><cell>0.835</cell><cell>-0.026</cell><cell>0.844</cell><cell>-0.015</cell><cell>-0.030</cell></row><row><cell>RDF2vec SG OA</cell><cell>0.881</cell><cell>0.847</cell><cell>-0.038</cell><cell>0.829</cell><cell>-0.059</cell><cell>0.841</cell><cell>-0.046</cell><cell>0.858</cell><cell>-0.027</cell><cell>-0.042</cell></row><row><cell>ComplEx</cell><cell>0.833</cell><cell>0.752</cell><cell>-0.097</cell><cell>0.757</cell><cell>-0.091</cell><cell>0.777</cell><cell>-0.067</cell><cell>0.781</cell><cell>-0.063</cell><cell>-0.080</cell></row><row><cell>DistMult</cell><cell>0.825</cell><cell>0.728</cell><cell>-0.118</cell><cell>0.739</cell><cell>-0.105</cell><cell>0.750</cell><cell>-0.091</cell><cell>0.762</cell><cell>-0.077</cell><cell>-0.098</cell></row><row><cell>RESCAL</cell><cell>0.869</cell><cell>0.830</cell><cell>-0.046</cell><cell>0.836</cell><cell>-0.038</cell><cell>0.833</cell><cell>-0.041</cell><cell>0.838</cell><cell>-0.035</cell><cell>-0.040</cell></row><row><cell>RotatE</cell><cell>0.744</cell><cell>0.701</cell><cell>-0.058</cell><cell>0.692</cell><cell>-0.070</cell><cell>0.689</cell><cell>-0.075</cell><cell>0.706</cell><cell>-0.052</cell><cell>-0.064</cell></row><row><cell>TransE L1</cell><cell>0.824</cell><cell>0.785</cell><cell>-0.047</cell><cell>0.766</cell><cell>-0.070</cell><cell>0.773</cell><cell>-0.061</cell><cell>0.786</cell><cell>-0.046</cell><cell>-0.056</cell></row><row><cell>TransE L2</cell><cell>0.889</cell><cell>0.861</cell><cell>-0.032</cell><cell>0.861</cell><cell>-0.032</cell><cell>0.853</cell><cell>-0.040</cell><cell>0.868</cell><cell>-0.024</cell><cell>-0.032</cell></row><row><cell>TransR</cell><cell>0.825</cell><cell>0.779</cell><cell>-0.055</cell><cell>0.775</cell><cell>-0.061</cell><cell>0.775</cell><cell>-0.060</cell><cell>0.789</cell><cell>-0.043</cell><cell>-0.055</cell></row><row><cell>Avg. Δ</cell><cell></cell><cell></cell><cell>-0.051</cell><cell></cell><cell>-0.063</cell><cell></cell><cell>-0.055</cell><cell></cell><cell>-0.042</cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://groups.google.com/g/gensim/c/JSSenT7Hhlc/m/FTEynSwmAQAJ</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://pytorch.org/docs/stable/notes/numerical_accuracy.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">200 dimensions × 7,620,000 entities × 32 Bit</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://github.com/tca19/near-lossless-binarization</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://data.dws.informatik.uni-mannheim.de/kgvec2go/dbpedia/2021-09/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://github.com/vitor-faria/kgembeddings-binarization</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding for data mining vs. knowledge graph embedding for link prediction -two sides of the same coin?</title>
		<author>
			<persName><forename type="first">J</forename><surname>Portisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Heist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="DOI">10.3233/SW-212892</idno>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="1" to="24" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding: A survey of approaches and applications</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE transactions on knowledge and data engineering</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="2724" to="2743" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Knowledge graph embeddings: open challenges and opportunities</title>
		<author>
			<persName><forename type="first">R</forename><surname>Biswas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-A</forename><surname>Kaffee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dumbrava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Jendal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lissandrini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">L</forename><surname>Mencía</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sack</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions on Graph Data and Knowledge</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4" to="5" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Dgl-ke: Training knowledge graph embeddings at scale</title>
		<author>
			<persName><forename type="first">D</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Karypis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval</title>
				<meeting>the 43rd international ACM SIGIR conference on research and development in information retrieval</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="739" to="748" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">DBpedia -A large-scale, multilingual knowledge base extracted from Wikipedia</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jentzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kontokostas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Kleef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<idno type="DOI">10.3233/SW-140134</idno>
		<ptr target="https://www.medra.org/servlet/aliasResolver?alias=iospress&amp;doi=10.3233/SW-140134" />
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="167" to="195" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1301.3781</idno>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<ptr target="http://arxiv.org/abs/1301.3781" />
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Řehůřek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
		<title level="m">Gensim-statistical semantics in python</title>
				<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note>Retrieved from genism</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Embedding Knowledge Graphs with RDF2vec</title>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Portisch</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>Springer Nature</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Pytorch: An imperative style, high-performance deep learning library</title>
		<author>
			<persName><forename type="first">A</forename><surname>Paszke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Massa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lerer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bradbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Killeen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Gimelshein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Antiga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Pykeen 1.0: a python library for training and evaluating knowledge graph embeddings</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Berrendorf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Hoyt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vermue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sharifzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tresp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="1" to="6" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Portisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hladik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2003.05809</idno>
		<idno type="arXiv">arXiv:2003.05809</idno>
		<ptr target="http://arxiv.org/abs/2003.05809" />
		<title level="m">KGvec2go -Knowledge Graph Embeddings as a Service</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Near-lossless Binarization of Word Embeddings</title>
		<author>
			<persName><forename type="first">J</forename><surname>Tissier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gravier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Habrard</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v33i01.33017104</idno>
		<idno type="arXiv">arXiv:1803.09065</idno>
		<ptr target="http://arxiv.org/abs/1803.09065" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="7104" to="7111" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Auto-encoder based dimensionality reduction</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">184</biblScope>
			<biblScope unit="page" from="232" to="242" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">The DLCC Node Classification Benchmark for Analyzing Knowledge Graph Embeddings</title>
		<author>
			<persName><forename type="first">J</forename><surname>Portisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2207.06014</idno>
		<ptr target="http://arxiv.org/abs/2207.06014" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Putting RDF2vec in Order</title>
		<author>
			<persName><forename type="first">J</forename><surname>Portisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2108.05280</idno>
		<idno type="arXiv">arXiv:2108.05280</idno>
		<ptr target="http://arxiv.org/abs/2108.05280" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">RDF2Vec: RDF graph embeddings and their applications</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rosati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Di Noia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">De</forename><surname>Leone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<idno type="DOI">10.3233/SW-180317</idno>
		<ptr target="https://www.medra.org/servlet/aliasResolver?alias=iospress&amp;doi=10.3233/SW-180317" />
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="721" to="752" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Translating Embeddings for Modeling Multi-relational Data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Usunier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garcia-Duran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Yakhnenko</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Complex embeddings for simple link prediction</title>
		<author>
			<persName><forename type="first">T</forename><surname>Trouillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Welbl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">É</forename><surname>Gaussier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Bouchard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2071" to="2080" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Embedding Entities and Relations for Learning and Inference in Knowledge Bases</title>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1412.6575v4" />
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">A Three-Way Model for Collective Learning on Multi-Relational Data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nickel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tresp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kröger</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="809" to="816" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-H</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.1902.10197</idno>
		<idno type="arXiv">arXiv:1902.10197</idno>
		<ptr target="http://arxiv.org/abs/1902.10197" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Learning Entity and Relation Embeddings for Knowledge Graph Completion</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v29i1.9491</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of AAAI</title>
				<meeting>AAAI</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="2181" to="2187" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Binarized embeddings for fast, space-efficient knowledge graph completion</title>
		<author>
			<persName><forename type="first">K</forename><surname>Hayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kishimoto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shimbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="141" to="153" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
