IoT Data Storage: Relational & Non-Relational Database Management Systems Performance Comparison Gizem Kiraz Cengiz Toğay Computer Engineering Computer Engineering Uludag University Uludag University Gorukle, Bursa Gorukle, Bursa 501631002@ogr.uludag.edu.tr ctogay@ uludag.edu.tr ABSTRACT Big data is a term that describes the large volume of data Internet of Things (IoT) becomes recently a popular –both structured and unstructured. The DBMSs basically research topic and market reality. According to several can be separated into relational and non-relational DBMS. research companies, in 2025, up to 75 billion devices are The relational DBMS stores the data rows and columns in estimated to connect internet and generate an enormous tables with a high data consistency. The most commonly number of data. This increases in data cause several used open source relational DBMS is MySQL. Non- difficulties such as the storage cost and processing of such relational databases (NOSQL) have arisen as an large data. In this paper, we have been studied on alternative to relational databases. The aim of the NOSQL performance comparison of relational (MySQL) and non- is often not to give guaranty the Atomicity, Consistency, relational (MongoDB) database management systems for Isolation, and Durability (ACID). The NOSQL does not storing and processing of this large IoT data. Both types depend on constant table definitions and rigid schemas. of database management systems have been tested. Columns or records can be added to the collection at any According to comparison of experimental results, the non- time without exclusive process. Therefore; the number of relational database management systems, which we records in the columns does not have to equal with each studied and searched, have provided better performance other. Data sets in IoT environment can change after setup for storing and processing of large data. of the system, so this environment requires a flexible data storage system. There are four different storage formats in Keywords NOSQL namely key-value, columns, document-based Internet of Things; MySQL; MongoDB; RDBMS; and graff-based. NRDBMS. It has been investigated a document-based storage format INTRODUCTION in NOSQL. In such a system, a record is called document The Internet of Things (IoT) is a self-configuring and and these documents are usually stored in JSON format adaptive system that consist of sensor networks and smart [3]. There are various implementation of the NOSQL objects whose aim is to interconnect all devices/sensors in DBMS such as MongoDB [4] [5], CouchDB [6], HBase daily life [1]. According to the projections of many [7], Cassandra [8], Amazon SimpleDB [9], and Redis organizations and companies, up to 75 billion devices will [10]. Since MongoDB is open source and commonly used, interconnect using the internet and several challenges and it has been chosen MongoDB in this study. There are no issues that need to be addressed will raise. Therefore, IoT database schemas or tables in MongoDB. MongoDB uses becomes one of the most popular research topic recent “collection” instead of a table, and “document” instead of years. Moreover, IoT is closely related to big data and rows to store data. Furthermore, MongoDB uses two cloud technology. Big data is produced by the different different operations instead of the join operation. These types of the applications such as industrial processes, are nesting documents inside each other and to store a medical devices, embedded control systems, gateways, reference to the other document rather than nesting entire and GPS sensors etc. This means that an amount of data document. worldwide increases day by day. In 2016, more than 5.5 There are many studies about comparing the performance million connected devices are inserted every day, and it is of databases [11] [12] [13] [14]. These studies vary expected that number of devices more than 20.8 billion depending on the data size, the variety of data, the worldwide by 2020 [2]. Sensors are also produced data differences in databases used, implementation languages, and they are important for big data growth. The sensor and subjects of the projects. In the study [15], MongoDB, data is the most popular data type between IoT MySQL, CouchDB, and Redis are compared. It is applications. declared that MongoDB is performing better among the comparative database management systems in terms of the "bulk insert" writing performance. However, MySQL and MongoDB have similar performance results for reading operations. Performance parameters between these DBMSs can be negligible (typically less than 1 second) [15]. However, our test results show that MongoDB has better performance than MySQL in terms of reading and writing as represented in “Results of Experiments” section. MongoDB is utilized for to store GPS sensor data and to communicate with the analysis tools such as Apache Mahout [11]. ACID operations on Figure 2.Test Environment MongoDB and MySQL DBMSs are also applied to compare them [12] [13]. According to the results, the use writing transactions. In our platform, our target of the MongoDB has been encouraged for large data throughput is forty messages/milliseconds in average for applications, especially for applications of big data [12]. writing. Our target throughput and test results compared In [14], MongoDB, Raven, CouchDB, Cassandra, with “Results of Experiments” section. HyperTable, CouchBase, and SQL DBMSs are compared The rest of the article is organized as follows. “Test in terms of the read, write, delete, and instantiate Environment & Methodology” section consists of the operations. According to results of this study[14], all keys information about the environment of the experiments and are needed to fetch, MongoDB has better performance methodology. “Results of Experiments” section presents than the others. the results and graphics from the experiments. The aim of the experiments is a comparison of the “Conclusions” section summarizes and concludes the relational and non-relational DBMSs for utilization an IoT experiments and gives a recommendation for future of this platform. The system includes IoT devices which publish study. a tremendous number of sensor data where servers store and process them. Performance of reading and writing TEST ENVIRONMENT & METHODOLOGY tests has been done in both MYSQL and MongoDB in this In this paper, research is conducted on the relational and study. The test results which are calculated after the non-relational databases. Server and clients of chosen application runs at least three times are compared to find DBMSs to be used in our experiments are set up out where we can store data of IoT considering lowest cost separately in virtual machines with Ubuntu 16.04 Server in terms of throughput. version. Virtual machines are hosted on a physical machine (i7 6700HQ, 16 GB DDR3 RAM, and SSD disc). IoT platform is defined which collects, and processes The virtual machines (4 CPU cores, 4 GB RAM) are sensor data as seen in Figure 1. The sensor data is executed on the physical servers as depicted in Figure 2. produced by devices/clients and collected by Data A dedicated network is a setup among the servers. Storage Server (DSS) in server side. The data is stored in Therefore, it is guaranteed that another network traffıc is a DBMS through insert and update operations. Data not disrupted the tests. Both DBMSs are installed on the Query Server (DQS) in the platform provides an interface computer with SSD for the fastest possible read and write for processing and reporting by Client Application. The speed and they are executed separately during the test client application sends the data request to the DQS and scenarios. Multithreaded Java applications for reading and they are formed as a query for DBMS. The result of the writing operations on DBMSs are implemented. query is delivered to the client application. In our case, enormous data should be stored in the DBMS. Therefore, Experiments’ constraints are the number of machines, writing operations are more important than the reading number of threads, number of messages, and the size of operations. The platform has active-active architecture. the string. These constraints are applied for both DBMSs. Therefore, more than one DSSs can handle the data The application is executed on a virtual machine; therefore, the applications are limited in terms of the CPU core and memory. In this study, since writing operations are more important than reading operations, our tests concentrate on the writing operations. In our tests, data examples are selected as similar to real-time sensor applications. In this study, two columns are defined including variable- length string type, an integer type. A primary key column is automatically defined in MySQL, but it needs to be defined in MongoDB. The execution time of the tests is calculated in milliseconds. The number of messages is Figure 1.The IoT Platform Architectural Structure measured dividing the total number of messages into the experiment’s execution time. MYSQL MySQL; forty threads for one computer and twenty 20,00 threads for two computers and also for MongoDB; eighty NUMBER OF MESSAGES/TIME 15,00 threads for one computer, forty threads for two computers, and twenty threads for three computers. It can be seen that 10,00 MongoDB supports more than sixty threads for best throughput. 5,00 MYSQL NUMBER OF MESSAGES / TIME 20,00 0,00 500000 1000000 1500000 2000000 4000000 NUMBER OF MESSAGES 1-10 1-20 1-40 1-80 10,00 2-10 2-20 2-40 2-80 0,00 Figure 3.Result Graph of Write Operations on MySQL 500000 1000000 1500000 2000000 4000000 NUMBER OF MESSAGES 2-10-100 2-20-100 2-40-100 RESULTS OF EXPERIMENTS 2-10-1000 2-20-1000 2-40-1000 Insert tests are executed with a multithreaded Java 2-10-2000 2-20-2000 2-40-2000 application. The application sends insert SQL request which contains a string (100 characters) and an integer Figure 5.Results of String Length Experiments on MySQL value to DBMSs. The application is executed in It is also tested the data with different variable string computers based on the parameters (number of computer length as depicted in Figure 5 and Figure 6. Since best and threads) as depicted in Figure 3 and Figure 4. Best results are obtained from the two computers, only two throughput is 18.21 messages/millisecond in average for computers cases are tested in these tests. As it can be seen MySQL such that is succeeded with two computers each that 18.21, 9.33, and 6.13 messages/millisecond in has twenty threads as presented in Figure 3. As it can be average are obtained for MySQL with 100, 1000, and seen that forty threads for one and two computers 2000 string length respectively as depicted in Figure 5. utilization have close results. Best throughput will go up Similarly, previous results, twenty threads utilization for when the number of threads increases. However, when the MySQL has best throughput results. For MongoDB, forty number of threads reaches eighty, throughput value begins threads utilization has best throughput results; 70.95, to decrease. Therefore, it has been decided that MySQL 49.04, and 39.85 messages/millisecond in average are can manage forty threads for best results. obtained with 100, 1000, and string (2000 characters) Similarly, best throughput is 70.95 messages/millisecond respectively as depicted in Figure 6. MongoDB DBMS in average for MongoDB such that is succeeded with two has about four times better results than MYSQL. As it can computers each has forty threads as presented in Figure 4. be seen that length of the message is one of the most To eliminate the effects of the thread switching, the third important parameters. Such as when the length is doubled, computer is also used for MongoDB. Throughput for throughput is decreased about twenty percent. utilization of the one, two and three computers is 61.22, Select tests are also executed with the same multithreaded 70.95, and 61.65 messages/millisecond, respectively. As Java application. The application retrieves data from two it can be seen that two computers’ utilization has the best DBMSs. The application is executed on a computer and performance. Another result can be obtained from the two computers with a different number of the threads as Figure. 3 and Figure. 4 is a correlation between a number depicted in Figure 7 and Figure 8. MySQL results are not of threads and computers. Best results are obtained for 80,00 MONGODB 80,00 MONGODB NUMBER OF MESSAGES / TIME NUMBER OF MESSAGES/TIME 60,00 60,00 40,00 40,00 20,00 20,00 0,00 500.000 1.000.0001.500.0002.000.0004.000.000 0,00 NUMBER OF MESSAGES 500000 1000000 1500000 2000000 4000000 1-10 1-20 1-40 1-80 NUMBER OF MESSAGES 2-10-100 2-20-100 2-40-100 2-80-100 2-10 2-20 2-40 2-80 3-10 3-20 3-40 3-80 2-10-1000 2-20-1000 2-40-1000 2-80-1000 2-10-2000 2-20-2000 2-40-2000 2-80-2000 Figure 6.Results of String Length Experiments on Figure 4.Result Graph of Write Operations on MongoDB MongoDB The DQS in the IoT platform applies reading operations 80,00 MYSQL on the DBMS. Test results show that MongoDB has better NUMBER OF MESSAGES/TIME throughput than the MySQL. MongoDB has 55.07 60,00 messages/millisecond in average and MySQL has 46.66 40,00 messages/millisecond in average for two computers. Therefore, MongoDB is selected as DBMS for writing 20,00 and reading operations in the IoT Platform. 0,00 ACKNOWLEDGEMENT 1 0 0 . 0 0 0 2 5 0 . 0 0 0 5 0 0 . 0 0 0 1 . 0 0 0 . 0 0 02 . 0 0 0 . 0 0 0 These results preliminary study of the project proposal NUMBER OF MESSAGES 1-10 1-15 1-20 1-40 1-80 applied to TUBITAK 1505 University Industry 2-10 2-15 2-20 2-40 Collaboration Grant Program and the study is supported by EMKO Electronic A.Ş located in Bursa, Turkey. Figure 7.Result Graph of Read Operations on MySQL REFERENCES stable as MongoDB. Best throughput is 60.09 [1] “IoT.” [Online]. Available: messages/millisecond and 44.34 in average for MySQL https://connectedtechnbiz.wordpress.com/tag/internet-of- things/. with one computer has eighty threads. In MongoDB, the best result is 68.68 messages/millisecond and 58.61 [2] “Gartner Says 6.4 Billion Connected ‘Things’ Will Be in Use messages/millisecond in average is succeeded with three in 2016, Up 30 Percent From 2015,” 2015. [Online]. Available: http://www.gartner.com/newsroom/id/3165317. computers each has eighty threads as depicted in Figure 8. Furthermore, it has been concluded that MongoDB can [3] “JSON.” [Online]. Available: http://json.org/. manage eighty threads for reading operations. [4] “MongoDB Documentation,” https://docs.mongodb.com/manual/. . CONCLUSIONS [5] K. Chodorow, Mongo DB: The Definitive Guide. 2013. Our IoT platform requires that forty messages/milliseconds in average should be written to the [6] “CouchDB.” [Online]. Available: http://couchdb.apache.org/. chosen DBMS. Otherwise, the number of messages [7] “HBASE.” [Online]. Available: https://hbase.apache.org/. waiting in the queue for writing will increase and it can [8] “Cassandra.” [Online]. Available: cause memory problems. In the IoT platform, active- http://cassandra.apache.org/. active architecture is applied. Therefore, more than one [9] “Amazon SimpleDB.” [Online]. Available: computer can write to DBMS at the same time. Test https://aws.amazon.com/simpledb/. results show that MongoDB has better performance than [10] “Redis.” [Online]. Available: https://redis.io/. the MYSQL in terms of both writing and reading operations. In the IoT Platform, the message payload is [11] G. Aydin, I. R. Hallac, and B. Karakus, “Architecture and varying between 100 bytes and 200 bytes. For these types implementation of a scalable sensor data storage and analysis system using cloud computing and big data technologies,” J. of messages, MongoDB has 70.95 messages/milliseconds Sensors, vol. 2015, 2015. in average and MySQL has 18.21 messages/milliseconds [12] S. Chickerur, A. Goudar, and A. Kinnerkar, “Comparison of in average for writing. As it can be seen that only Relational Database with Document-Oriented Database MongoDB satisfy the target expectations 40 (MongoDB) for Big Data Applications,” Proc. - 8th Int. Conf. messages/milliseconds in average. MongoDB also can Adv. Softw. Eng. Its Appl. ASEA 2015, pp. 41–47, 2016. satisfy the requirement with a single machine which has [13] Z. Parker, S. Poe, and S. V. Vrbsky, “Comparing NoSQL 61.22 messages/milliseconds throughput for writing. MongoDB to an SQL DB,” Proc. 51st ACM Southeast Conf. - ACMSE ’13, p. 1, 2013. 80,00 MONGODB [14] Y. Li and S. Manoharan, “A performance comparison of SQL NUMBER OF MESSAGES / TIME and NoSQL databases,” IEEE Pacific RIM Conf. Commun. Comput. Signal Process. - Proc., no. August 2013, pp. 15–19, 60,00 2013. [15] P. T. A. Mai, J. K. Nurminen, and M. Di Francesco, “Cloud 40,00 databases for internet-of-things data,” Proc. - 2014 IEEE Int. Conf. Internet Things, iThings 2014, 2014 IEEE Int. Conf. Green Comput. Commun. GreenCom 2014 2014 IEEE Int. 20,00 Conf. Cyber-Physical-Social Comput. CPS 20, no. iThings, 100.000 250.000 500.000 1.000.000 2.000.000 pp. 117–124, 2014. NUMBER OF MESSAGES 1-10 1-15 1-20 1-40 1-80 2-10 2-15 2-20 2-40 2-80 3-10 3-15 3-20 3-40 3-80 Figure 8.Result Graph of Read Operations on MongoDB BIOGRAPHY(S) Gizem Kiraz Cengiz Toğay Gizem Kiraz has completed her Cengiz Togay, Ph.D. is an assistant undergraduate (2016) in Computer professor at Uludag University's Engineering Department from the Computer Engineering Department. Pamukkale University (PAÜ). Her He obtained his undergraduate (1999) postgraduate has started (2017) in and MS (2001) in computer Computer Engineering Department engineering from the Canakkale from the Uludag University. Onsekiz Mart University and his PhD Currently, she is studying on the (2008) in computer engineering Internet of Things. department from Middle East Technical University. He has national and international patent applications, papers, articles and projects about Software Engineering, Secure Communications, Smart Cards, and Internet of Things(IoT).