=Paper= {{Paper |id=Vol-2492/paper2 |storemode=property |title=Performance of Raspberry Pi microclusters for Edge Machine Learning in Tourism |pdfUrl=https://ceur-ws.org/Vol-2492/paper2.pdf |volume=Vol-2492 |authors=Andreas Komninos,Ioulia Simou,Nikolaos Gkorgkolis,John Garofalakis |dblpUrl=https://dblp.org/rec/conf/ami/KomninosSGG19 }} ==Performance of Raspberry Pi microclusters for Edge Machine Learning in Tourism== https://ceur-ws.org/Vol-2492/paper2.pdf
     Performance of Raspberry Pi microclusters for Edge
               Machine Learning in Tourism

      Andreas Komninos                Ioulia Simou            Nikolaos Gkorgkolis             John Garofalakis
                           [akomninos, simo, gkorgkolis, garofala]@ceid.upatras.gr
               Computer Technology Institute and Press “Diophantus”, Rio, Patras, 26504, Greece




                                                       Abstract
                      While a range of computing equipment has been developed or proposed
                      for use to solve machine learning problems in edge computing, one of
                      the least-explored options is the use of clusters of low-resource devices,
                      such as the Raspberry Pi. Although such hardware configurations have
                      been discussed in the past, their performance for ML tasks remains
                      unexplored. In this paper, we discuss the performance of a Raspberry
                      Pi micro-cluster, configured with industry-standard platforms, using
                      Hadoop for distributed file storage and Spark for machine learning.
                      Using the latest Raspberry Pi 4 model (quad core 1.5GHz, 4GB RAM),
                      we find encouraging results for use of such micro-clusters both for local
                      training of ML models and execution of ML-based predictions. Our
                      aim is to use such computing resources in a distributed architecture to
                      serve tourism applications through the analysis of big data.




1    Introduction
The rise of machine learning (ML) applications has led to a sharp increase in research and industrial interest in
the topic. Paired with the increase of Internet of Things (IoT) deployments, ML is used to deliver batch (off-
line) and real-time (streaming) processing of big data, to serve a variety of purposes, including real-time data
analytics, recommender systems, forecasting via regressors and classification of numerical, textual and image data
[LOD]. Typical deployments involve a distributed architecture, where remote nodes submit data to a central
analysis and storage system, which in turn either stores the data for future processing, or responds by returning
ML results to the contributing nodes. These central storage and processing systems are sometimes physically
co-located, but often are distributed themselves across various datacenters, in a cloud computing configuration,
particularly in large scale systems. As data is gathered centrally, remote clients benefit from ML results based
on the contribution of all nodes in a system, but suffer from issues such as network latency (which is important
for time-critical applications) and reliability (since the central repository becomes a single point of failure in
the system). More recently, the concept of edge computing has sought to address some of these problems, by
further distributing the storage and processing capabilities of the system, to nodes closer to the end-user [APZ].
These nodes are not as resource-rich as cloud computing datacenters, but are generally more capable than typical
IoT devices. By relocating the storage and ML model execution closer to end users, the system becomes more
responsive and is more resilient, though one associated drawback is that edge nodes cannot store as much data
and thus cannot derive highly accurate models, compared to cloud-computing setups.
Copyright © by the paper’s authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: E. Calvanese Strinati, D. Charitos, I. Chatzigiannakis, P. Ciampolini, F. Cuomo, P. Di Lorenzo, D. Gavalas, S. Hanke, A.
Komninos, G. Mylonas (eds.): Proceedings of the Poster and Workshop Sessions of AmI-2019, the 2019 European Conference on
Ambient Intelligence, Rome, Italy, 04-Nov-2019, published at http://ceur-ws.org




   Interest in edge deployments is strong, since the drawbacks can be mitigated using alternative deployment
approaches. For example, edge nodes can forward data to a cloud server so that complex and powerful ML
models can be built. These models can be saved and distributed back to the edge nodes for use. Additionally,
edge nodes can pre-process data locally before forwarding them to cloud servers, helping with the distribution
of data cleansing and transformation workloads. Edge computing architectures also lend themselves particularly
well to certain types of application, where the users might be more interested in locally pertinent data. Edge
computing hardware form factors vary, ranging from server-class hardware to simple desktop-class computers;
even IoT-class devices (e.g. Arduino, Raspberry Pi) can perform edge computing roles. Recently, dedicated
edge nodes for ML have been developed (e.g. Google Coral, Nvidia Jetson, Intel Neural Compute), and hardware
accelerator add-ons for existing platforms are also on the market (e.g. Raspberry Pi AI hat, Intel 101).
   One interesting configuration that has not gained much attention is the ability of IoT devices running Linux
operating systems to work together in a cluster computing configuration. This ability leverages many of the
known advantages of cloud computing (e.g. using a distributed file system such as HDFS, running big data
analytics engines such as Spark), providing a scalable solution for powerful and resilient local edge components,
while keeping deployment costs low. In this paper, we explore the performance of small Raspberry Pi (RPi)
clusters in the role of an IoT ML edge server, using the RPi-4 model, which is the latest release in this product
line. Although Pi clusters have been reported in previous literature, the RPi-4 model is newly released (Q2 2019)
and its significant hardware improvements make it a more realistic option for this role than before.

2   Related Work
                                         Table 1: Raspberry Pi models


                                            Model 1B    Model 2B     Model 3B     Model 4B
                      Cores                 1           4            4            4
                      CPU Clock (GHz)       0.7         0.9          1.2-1.4      1.5
                      RAM (GB)              0.5         1            1            1-4
                      Network (Mbps)        100         100          100          1000


   The performance of Raspberry Pi clusters has been investigated in past literature, mostly using Model
1B and 2B devices (see Table 1). The first known paper to report findings on such a deployment is [TWJ+ ], with
a configuration of 56 Model-B RPis. No specific performance evaluations were reported, but the advantages of
low cost and the ability to use such clusters for research and educational purposes were highlighted in this paper.
An even larger Model-B cluster (300 nodes) is reported in [AHP+ ], although again no performance evaluation is
discussed. A smaller Model-B cluster (64 nodes) is discussed in [CCB+ ], demonstrating that network bandwidth
and limited memory are barriers for such clusters. The performance advantages in computing power depend on
the size of the computational task: smaller problems do not benefit from additional computing resources
(nodes) due to communication overheads, while memory size limits the size of the computational task that can
be performed. Similar results, showing computational performance falling below the theoretical linear
increase as nodes are added, are obtained by [CPW] using a 24-unit Model 2B cluster, and also in a 12-node
Model 2B cluster [HT] and an 8-node Model 2B cluster [MEM+ ]. In [KS] the performance using external SSD storage
(attached via USB) is evaluated, demonstrating that big data applications on such clusters (20 x Model 2B) are
bound by CPU limitations. In [QJK+ ], a 20-node RPi Model 2B cluster is investigated for real-time image
analysis. Its performance was lower than that of virtual machines running on traditional PCs; however, the small
form factor, paired with the relatively good performance, makes such clusters ideal for mobile application scenarios.
   With regard to RPi cluster performance in the execution of ML algorithms, [SGS+ ] describe the performance
of an 8-node Model 2B cluster. The researchers concluded that RPi clusters offer good tradeoffs between energy
consumption and execution time, though better support for parallel execution is needed to improve performance.
Such optimisations are demonstrated in [CBLM], where a 28-node Model 2B cluster outperforms a traditional
multicore server in terms of processing power and power consumption, using only 12 of its nodes.
   RPi clusters have been proposed for use in educational settings, to teach distributed computing [DZ, PD, Tot],
as servers for container-based applications [PHM+ ] and as research instruments, e.g. to analyse data from social
networks [dABV] or in security research [DAP+ ].




    Overall, while the research community has shown significant interest in RPi clusters, there is
presently no work demonstrating their performance in ML roles except in [SGS+ ]. Hence, our goal for this paper
is to investigate the performance of RPi clusters in an edge ML role, using the latest RPi Model 4B, which
overcomes some of the previous network and memory constraints.

3     RPi cluster configuration
Our cluster consists of 6 RPi4 Model B devices (Fig. 1), with 4GB RAM available on each node. Additionally,
each device was equipped with a 64GB MicroSD card with a V30 write speed rating (30MB/s). The devices
were connected to a Gigabit Ethernet switch to take full advantage of the speed improvements in the network
card. The cluster was configured with a Hadoop distributed file system (3.2.0) and Apache Spark (2.4.3). As
such, we are able to leverage Spark’s MLlib algorithms for distributed machine learning.




           Figure 1: The RPi micro-cluster on 5mm grid paper (A4) demonstrating physical dimensions


3.1     Hadoop & Spark execution environment configuration
For the distributed file storage, we set the number of file replicas to 4. Since Hadoop is installed in the cluster,
we use the YARN resource allocator for the execution of Spark jobs. Each YARN container was configured
with 1920MB of RAM (1.5GB + 384MB overhead), leaving 1GB available for YARN master execution and 1GB
of RAM for the operating system (Raspbian 10 - Buster). Although the cluster can be configured to run with
multiple Spark executors on each device, we opted for a “fat executor” strategy, meaning one executor per device.
Additionally, as a baseline configuration scenario (S-base), we opted for reserving one processor core on each
device for use by the operating system, thus resulting in 3 cores per executor. Jobs were submitted from within
the cluster. Therefore, one device always played the role of the client and one device played the role of the
application manager (YARN), so that up to 4 devices were available as executors to run the Spark jobs.
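
The container sizing described above can be checked with a little arithmetic. The following is a minimal sketch, assuming Spark's documented default rule for executor memory overhead on YARN (10% of executor memory, with a 384MB floor). The function and constant names are illustrative and are not taken from the authors' configuration files.

```python
# Illustrative arithmetic only: names are hypothetical, figures come from Section 3.1.
NODE_RAM_MB = 4096          # RPi 4 Model B with 4GB RAM
EXECUTOR_MEMORY_MB = 1536   # 1.5GB requested for each Spark executor


def container_size_mb(executor_memory_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Return the YARN container size: executor memory plus memory overhead.

    Assumes Spark's default overhead rule: max(384MB, 10% of executor memory).
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead_mb


container_mb = container_size_mb(EXECUTOR_MEMORY_MB)
remaining_mb = NODE_RAM_MB - container_mb  # left for the YARN master and the OS

print(f"container: {container_mb} MB, remaining on node: {remaining_mb} MB")
```

Under these assumptions the 384MB floor applies, because 10% of 1.5GB is smaller than 384MB, and roughly 2GB remain on each node for the YARN master and the operating system.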

3.2     Datasets and ML algorithms
We used two datasets for our experiments. First, we performed experiments using a large dataset, in this case
the used car classifieds dataset¹. Secondly, since we aim to apply the RPi cluster in a tourism recommender
system for Greece, as part of an ongoing project, we used the global-scale check-ins dataset [YZQ] and the Greek
weather dataset². From the former, we selected all check-ins made in the Attica region of Greece, and we fused
the resulting data with the historical weather information from the latter dataset. Both resulting datasets (used
cars and weather-enriched check-ins) were uploaded to the Hadoop distributed file system in the cluster.
   The used cars dataset was used to perform simple linear regression, using car model year and odometer reading,
to predict its sale price. The check-ins dataset was used with the decision tree classification algorithm. In this
case, by providing the geographical coordinates, month, day, hour and daily mean, high and low temperatures,
the target was to determine the type of venue a user might check into (a multi-class classification task). This
can be used to recommend types of venue that a user might like to visit depending on their current context.
    1 https://www.kaggle.com/austinreese/craigslist-carstrucks-data
    2 https://www.kaggle.com/spirospolitis/greek-weather-data
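
As an illustration of how check-ins might be fused with daily weather into the feature vector described above, the following is a minimal plain-Python sketch. All records, field names and values are made up for the example and do not reflect the real dataset schemas or the authors' code.

```python
from datetime import datetime

# Hypothetical records: field names and values are placeholders, not real data.
checkins = [
    {"lat": 37.9755, "lon": 23.7348, "time": "2012-07-14T19:30:00", "venue_type": "Restaurant"},
    {"lat": 37.9715, "lon": 23.7267, "time": "2012-07-15T10:05:00", "venue_type": "Museum"},
    {"lat": 37.9420, "lon": 23.6460, "time": "2012-08-01T08:45:00", "venue_type": "Cafe"},
]
daily_weather = {
    "2012-07-14": {"t_mean": 29.5, "t_high": 34.0, "t_low": 25.0},
    "2012-07-15": {"t_mean": 30.0, "t_high": 35.5, "t_low": 24.5},
}


def fuse(checkins, daily_weather):
    """Inner-join check-ins with daily weather on the calendar date."""
    samples = []
    for checkin in checkins:
        timestamp = datetime.fromisoformat(checkin["time"])
        weather = daily_weather.get(timestamp.date().isoformat())
        if weather is None:
            continue  # no weather record for that day: drop the check-in
        features = [checkin["lat"], checkin["lon"],
                    timestamp.month, timestamp.day, timestamp.hour,
                    weather["t_mean"], weather["t_high"], weather["t_low"]]
        samples.append((features, checkin["venue_type"]))
    return samples


samples = fuse(checkins, daily_weather)
for features, label in samples:
    print(label, features)
```

Each resulting sample pairs an eight-element feature vector (coordinates, month, day, hour, and mean, high and low temperature) with the venue type as the classification label.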




4      Experiment 1 - programming language performance
Apache Spark provides programming interfaces for the Python language, popular amongst ML developers, and
also uses Scala natively. Since Python is a dynamically typed and interpreted language, whereas Scala is statically
typed and compiled, Scala should provide a performance advantage for applications in our cluster. However, this
depends on the type and volume of data used. In this first experiment, we compare the performance of a simple
application written in both languages. The application includes the following tasks in sequence:

    1. Load a dataset from HDFS into a Spark DataFrame
    2. Select feature and label columns
    3. Filter out samples with NULL values
    4. Assemble feature columns into a single feature vector and append to dataset
    5. Split the dataset into training and test sets (0.7, 0.3)
    6. Train a machine learning model
    7. Perform predictions on the test dataset
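
The seven steps above can be sketched end to end as follows. This is a plain-Python analogue on a small synthetic dataset, written so that it runs without a cluster; it is not the Spark/MLlib program used in the experiments. The column names, the synthetic price formula and the helper functions are all assumptions made for illustration.

```python
import random
import time


def solve(a, b):
    """Solve a small dense system a.x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(samples):
    """Ordinary least squares via the normal equations; returns [intercept, w1, w2, ...]."""
    xs = [[1.0] + list(features) for features, _ in samples]
    ys = [label for _, label in samples]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, features):
    """Apply the fitted linear model to one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


rng = random.Random(42)
timings = {}

# 1. "Load" a dataset (a synthetic stand-in for reading from HDFS).
start = time.perf_counter()
raw = []
for _ in range(200):
    age, odo = rng.randint(0, 20), rng.uniform(0.0, 200.0)
    raw.append({"age": age, "odometer": odo, "colour": "n/a",
                "price": 20000.0 - 500.0 * age - 40.0 * odo})
raw[3]["price"] = None      # inject missing values to exercise step 3
raw[7]["odometer"] = None
timings["load"] = time.perf_counter() - start

# 2. Select feature and label columns.
selected = [(r["age"], r["odometer"], r["price"]) for r in raw]
# 3. Filter out samples with NULL (None) values.
clean = [row for row in selected if None not in row]
# 4. Assemble the feature columns into a single feature vector.
assembled = [((age, odo), price) for age, odo, price in clean]
# 5. Split into training and test sets (0.7, 0.3).
rng.shuffle(assembled)
cut = int(len(assembled) * 0.7)
train, test = assembled[:cut], assembled[cut:]

# 6. Train the model, timing the step.
start = time.perf_counter()
weights = fit_linear(train)
timings["train"] = time.perf_counter() - start

# 7. Predict on the test set, timing the step.
start = time.perf_counter()
predictions = [predict(weights, features) for features, _ in test]
timings["predict"] = time.perf_counter() - start

max_error = max(abs(p - label) for p, (_, label) in zip(predictions, test))
print(len(train), len(test), timings)
```

The sketch cuts the shuffled data at exactly 70%, which is a simplification; a distributed random split would typically produce only approximately those proportions. The three recorded timings correspond to the loading, training and prediction measurements reported in this experiment.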

   For this first experiment, we used the cars dataset to train a linear regression model. As a result of the data
cleansing process, the final dataset contained 9,585,316 samples. We measured the time taken to complete the
data loading, model training and prediction tasks, as shown in Fig. 2. From these results we can see that
the Scala program executes more slowly when the number of executors is small. However, it achieves parity with,
or even outperforms, Python in execution time as the number of executors increases. Based on these results, we
chose to proceed with the rest of the experimentation using Scala as the programming language.




                       (a) Model training                                  (b) Predictions on test set

                                     Figure 2: Programming language performance


5      Experiment 2 - Performance in ML tasks
Next, we implemented two ML algorithms to assess the cluster’s performance. In this case, we were interested
in determining how the number of executors and the number of cores per executor affect performance in
the implementation of different ML algorithms. As such, we retain the baseline configuration scenario S-base
(fixed n_cores/executor = 3, variable n_executors ∈ [1, 4]) and add a further scenario (fixed n_executors = 4, variable
n_cores/executor ∈ [1, 4]). This scenario is termed S-core henceforth. Note that, contrary to standard practice, in
S-core we allocate up to 4 cores (the maximum available), to investigate full resource utilisation in contention with
the OS requirements. The sequence of tasks was identical to the previous experiment, except for
the type of model to be trained. Additionally, we implemented three extra steps:
    8. Write the trained model to distributed storage
    9. Load the pre-trained model from distributed storage
   10. Perform a single prediction given a random feature vector from the test set




   These additional tasks emulate the concept of batch training at regular intervals on the edge node, and using
the pre-trained models for application purposes. For the second experiment, we used both datasets. We note
that as a result of the data pre-processing task, the final check-ins dataset contains 87,908 samples.
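
The two scenarios can be enumerated programmatically, which also makes the total core count per run explicit. The sketch below is illustrative: the `Scenario` class is hypothetical, and the `spark-submit` resource flags shown (`--master`, `--num-executors`, `--executor-cores`, `--executor-memory`) are an assumption about how such runs could be launched on YARN, not a record of the authors' actual commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    executors: int
    cores_per_executor: int

    @property
    def total_cores(self):
        """Number of cores available to Spark tasks across the cluster."""
        return self.executors * self.cores_per_executor

    def submit_flags(self, executor_memory="1536m"):
        """Resource flags for spark-submit (memory value assumed from Section 3.1)."""
        return ["--master", "yarn",
                "--num-executors", str(self.executors),
                "--executor-cores", str(self.cores_per_executor),
                "--executor-memory", executor_memory]


# S-base: 3 cores per executor, 1 to 4 executors.
s_base = [Scenario("S-base", n, 3) for n in range(1, 5)]
# S-core: 4 executors, 1 to 4 cores per executor.
s_core = [Scenario("S-core", 4, c) for c in range(1, 5)]

for scenario in s_base + s_core:
    print(scenario.name, scenario.total_cores, " ".join(scenario.submit_flags()))
```

S-base therefore ranges from 3 to 12 cores in total and S-core from 4 to 16, so the two scenarios coincide only at 12 cores (4 executors with 3 cores each).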

5.1   Application startup overhead
First, we measured the overhead time taken to obtain the necessary SparkContext environment (i.e. assigning a
YARN application master and attaching executor nodes to the process). From Fig. 3 we note that the required
overhead fluctuates but remains roughly constant in all cases (S-base cars: µ = 29.322s, σ = 2.419s, check-ins:
µ = 27.089s, σ = 1.278s; S-core cars: µ = 30.271s, σ = 1.791s, check-ins: µ = 28.013s, σ = 0.466s). This
overhead is incurred only once, at application startup, and not for every request when the application
is written as a server waiting to receive ML result requests.




                 (a) Using 3 cores / executor                                    (b) Using 4 executors

                                           Figure 3: Spark context overhead


5.2   Data retrieval from HDFS
Another metric is the time taken to load the dataset from HDFS storage, as a Spark DataFrame. As seen from
Fig. 4, the dataset loading time is almost constant, demonstrating that any overhead comes from the HDFS
access process and not Spark itself (S-base cars: µ = 20.259s, σ = 0.243s, check-ins: µ = 20.374s, σ = 0.263s;
S-core cars: µ = 20.077s, σ = 0.182s, check-ins: µ = 20.372s, σ = 0.071s). Notably, both datasets fit comfortably
within the memory allocated to each executor container.




                 (a) Using 3 cores / executor                                    (b) Using 4 executors

                                                Figure 4: Dataset loading time


5.3   Data transformation
After data is loaded into Spark, the first operation on the data is transformation, including cleansing null
samples, re-casting data columns to appropriate data types, assembling the feature vector column and encoding
the prediction label (for multi-class classification). As seen in Fig. 5, data transformation is more intensive for the
check-ins dataset, as a result of the encoding of the prediction label (> 300 labels). Interestingly, while in all other
cases the transformation times remain constant, for the S-base configuration we note that the transformation
time increases with more than 2 executors. This is the result of the distribution of the mapping operation of
the dataset across multiple nodes and the overhead caused by the communication requirements in reducing and
aggregating results across more nodes (S-base cars: µ = 0.828s, σ = 0.251s, check-ins: µ = 16.518s, σ = 5.1923s;
S-core cars: µ = 0.804s, σ = 0.259s, check-ins: µ = 21.736s, σ = 0.544s).
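
To illustrate why encoding the prediction label requires work over the whole dataset, the following is a minimal sketch of frequency-ordered label indexing, in the spirit of the default ordering of MLlib's StringIndexer (most frequent label first). The paper does not state which encoder was used, so this is an assumption, and the labels below are made up.

```python
from collections import Counter


def fit_label_index(labels):
    """Map each distinct label to an integer, most frequent label first.

    Ties are broken alphabetically so that the mapping is deterministic.
    """
    counts = Counter(labels)  # requires a full pass over every sample
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


venue_types = ["Cafe", "Bar", "Cafe", "Museum", "Bar", "Cafe"]  # made-up labels
index = fit_label_index(venue_types)
encoded = [index[v] for v in venue_types]
print(index, encoded)
```

Building the mapping needs a count of every label across all samples before any sample can be encoded. In a distributed setting this implies an aggregation across nodes, which is consistent with the communication overhead noted above.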




                (a) Using 3 cores / executor                                   (b) Using 4 executors

                                         Figure 5: Data transformation time


5.4   Model training
Next, we investigate the time required to train the models in each scenario. Notably, in all cases, increasing the
number of executors or cores per executor yields a performance advantage, even if small. This effect is significantly
more pronounced for the cars dataset, which is much larger in size (S-base cars: µ = 371.218s, σ = 197.369s,
check-ins: µ = 60.329s, σ = 9.636s; S-core cars: µ = 108.332s, σ = 108.332s, check-ins: µ = 56.893s, σ =
11.684s). A further observation is that allowing access to the 4th core (S-core), which is typically reserved for the
operating system, does not particularly improve performance. Notably, the average time to train the models is
not prohibitive, even in the least favourable conditions, and does not exceed a few minutes of execution time.
This demonstrates that periodic training of the edge-based models is feasible and can be performed comfortably
in times of low resource demand, even when using large datasets.




                (a) Using 3 cores / executor                                   (b) Using 4 executors

                                               Figure 6: Model training time


5.5   Performing predictions
In terms of time required to evaluate test sets, again we note that the larger dataset (cars) benefits from additional
executors and cores per executor (Fig. 7). As before, allowing access to the additional core in the
S-core scenario does not improve performance. Finally, it is noteworthy that for the smaller check-ins dataset,
the execution time for the prediction set is sub-second (S-base cars: µ = 256.117s, σ = 157.592s, check-ins:
µ = 0.159s, σ = 0.114s; S-core cars: µ = 173.103s, σ = 68.294s, check-ins: µ = 0.176s, σ = 0.156s). These results
demonstrate that the cluster is able to resolve predictions on even very large sets, in under 2 minutes.
   Related to these results, we report that the prediction of a single feature vector is almost instantaneous across
all conditions, often requiring sub-millisecond execution time (S-base cars: µ = 0.004s, σ = 0.001s, check-ins:
µ = 0s, σ = 0.001s; S-core cars: µ = 0.003s, σ = 0.001s, check-ins: µ = 0s, σ = 0s). Additionally, the time
required to store and load the trained models is very short, as can be seen in Fig. 8 (Saving: S-base cars:
µ = 5.152s, σ = 0.538s, check-ins: µ = 6.323s, σ = 0.145s; S-core cars: µ = 4.951s, σ = 0.489s, check-ins:
µ = 5.924s, σ = 0.666s; Loading: S-base cars: µ = 3.928s, σ = 0.767s, check-ins: µ = 4.394s, σ = 0.373s; S-core
cars: µ = 3.912s, σ = 0.731s, check-ins: µ = 4.348s, σ = 0.248s).

                   (a) Using 3 cores / executor                                   (b) Using 4 executors

                                             Figure 7: Predictions on test set time
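
The save, load and single-prediction steps (8 to 10) can be sketched and timed as below. This is a local, plain-Python analogue that persists a linear model as JSON in a temporary directory; it does not use HDFS or MLlib's model persistence, and the coefficients are made up for illustration.

```python
import json
import os
import tempfile
import time


def save_model(model, path):
    """Write the model parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_model(path):
    """Read model parameters back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def predict(model, features):
    """Evaluate the linear model on a single feature vector."""
    return model["intercept"] + sum(w * x for w, x in zip(model["weights"], features))


model = {"intercept": 20000.0, "weights": [-500.0, -40.0]}  # made-up coefficients

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "model.json")
    save_model(model, path)                      # step 8: write the trained model

    start = time.perf_counter()
    restored = load_model(path)                  # step 9: load the pre-trained model
    load_seconds = time.perf_counter() - start

start = time.perf_counter()
price = predict(restored, [5, 100.0])            # step 10: one prediction
predict_seconds = time.perf_counter() - start

print(price, load_seconds, predict_seconds)
```

Separating the load time from the per-request prediction time mirrors the measurements above: loading is paid once when a serving application starts, while the single-vector prediction cost is paid on every request.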




                   (a) Using 3 cores / executor                                   (b) Using 4 executors

                                             Figure 8: Save and load model time

5.6    CPU loads
Finally, we used the SparkLint³ package to gather statistics from job execution history about CPU utilisation.
From these results we note that as the number of executors increases (S-base) the level of CPU utilisation across
the entire job decreases, meaning that an increased number of executors affords the cluster more capacity to run
additional parallel tasks, as can be expected (Fig. 9). However, with the maximum number of executors (S-core),
additional core allocation does not affect CPU utilisation when more than 2 cores are allocated, showing that the
additional resources are indeed utilised to decrease overall execution time. Further, plotting the two extremes of
this scenario (1 core/executor and 4 cores/executor) we see that the resource utilisation varies significantly. In
the former case, most work is carried out using 1 or 2 cores in the cluster (i.e. 1 - 2 executors), while in the latter
case, the majority of task execution is spread across all 16 available cores, leading to the reduced execution
time (Fig. 10). In this figure, the grey area is idle time (data has been transferred to the driver node), yellow
is node-local (data and code reside on the same node), orange is rack-local (processing where data needs to be
fetched from another node) and finally green represents local (in-memory) execution time. As part of the job
analysis, the main aim is to minimise the grey area, which represents time during which the cluster resources are
not being utilised. As can be seen, the greater number of cores per executor achieves this goal.

    3 https://github.com/groupon/sparklint

                (a) Using 3 cores / executor                                  (b) Using 4 executors

                                               Figure 9: CPU utilisation

                (a) Using 1 core / executor                                (b) Using 4 cores / executor

                                       Figure 10: CPU utilisation distribution

6     Training vs. Accuracy tradeoffs
In the preceding analyses, the ML model parameters used were the default values set by Spark’s MLlib. Specifically,
for the decision tree (check-ins dataset) case, the generated model is quite simple (max depth: 5, min
information gain: 0, min instances per node: 1, information gain measure: Gini index). To assess the cluster’s
performance we considered a scenario where types of venue to check in to could be recommended for a large number
of cases, roughly corresponding to the number of venues in a typical city center. We took 5% of the dataset
for this purpose (4395 cases) and trained the decision tree on the remaining 95% of the data, using another ML
environment (RapidMiner Studio) for convenience, and found (using random parameter search) that a very good
performance of 89.15% accuracy can be achieved (max depth: 30, min information gain: 0.01, min instances per
node: 2, information gain measure: entropy). Running this experiment on the RPi cluster, however, yielded a
surprise. While in the RapidMiner environment training took a few seconds, on the RPi cluster the
process failed due to insufficient memory on the Java heap, after several minutes of processing. As a reference,
YARN containers on the cluster consume up to 2.5G RAM, including 2G for Spark executors and the related
overhead (384M). To lighten the load, we experimented to find a smaller tree complexity (depth) and training
set size that would yield comparable performance using RapidMiner. We found that a max depth of 15 and a
sample size of 40% of the original (33405 samples) yielded a good compromise (see Fig. 11). Thus, to assess
cluster performance in a more realistic scenario, we ran the experiment again for different sizes of the training
dataset between 10% and 40%, predicting on the same number of cases, to assess training time and performance
tradeoffs. As shown in Fig. 12, an increase of the training set size leads to the expected increases in training time,
but not necessarily in accuracy. For reference, a fair performance of 72.33% accuracy is attainable with 5 min 48 s of
training time using 20% of the original training set (16702 samples).
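
The two information gain measures mentioned in this section (Gini index and entropy) and the role of a minimum information gain threshold can be illustrated with a short worked example. This is a generic sketch of the standard definitions, not the MLlib or RapidMiner implementation, and the class counts are made up.

```python
import math


def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = [6, 6]               # made-up class counts at a tree node
split = [[5, 1], [1, 5]]      # made-up class counts after a candidate split

gain_gini = information_gain(parent, split, gini)
gain_entropy = information_gain(parent, split, entropy)
MIN_INFO_GAIN = 0.01          # threshold value taken from the tuned model above

print(gain_gini, gain_entropy, gain_gini >= MIN_INFO_GAIN)
```

A split is only worth keeping when its gain reaches the minimum information gain threshold, so raising the threshold from 0 to 0.01 prunes marginal splits. Conversely, a deeper tree has more candidate nodes to evaluate, which is one reason deeper trees are more expensive to train.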

7   Discussion
In the preceding sections, we have demonstrated the ability to run a small RPi cluster as an edge computing
resource, using industry standards such as the Hadoop distributed file system, and Apache Spark for machine
learning and data analytics. To the best of our knowledge, this is the first work to present an analytical evaluation
of the RPi Model 4B in a cluster configuration for ML tasks. We have demonstrated that the performance of this
cluster is sufficient for the purposes of both training and executing ML models in an edge computing context.
One of the most encouraging results from our analysis is the short time required to load pre-trained ML models
and to execute predictions. However, significant additional work remains.




                        Figure 11: Decision tree model complexity vs. accuracy tradeoffs




                                 Figure 12: Training time vs. accuracy tradeoff

   Firstly, we have run two popular, albeit “lightweight”, ML algorithms (linear regression and decision trees).
The performance of the cluster should be evaluated using more complex models supported by Spark’s MLlib
(e.g. gradient boosted trees). Additionally, third party ML libraries such as DeepLearning4J and TensorFlow
should be tested for performance, since they support various implementations of artificial neural networks which
are better suited for heavier tasks (e.g. image classification and NLP tasks). Another aspect to examine is the
size of dataset that can be handled by the cluster. Even though we have tested with a relatively large and a
smaller dataset, more analysis is required to identify the performance tradeoffs between dataset size and speed
of model training. Additionally, we need to test the cluster’s capacity to serve under various request loads.
We have noted that single feature vectors can be regressed or classified with sub-millisecond timing, but a real
on-line application processing multiple simultaneous user/device requests or handling streaming data (e.g. from
IoT devices or social network feeds) will place strain on the cluster’s ability to respond in real-time. As a final
note, we highlight that our cluster is quite a small setup. This is intentional, since applications
requiring edge computing infrastructures may have strict form factor and physical size limitations. However, it
would be interesting to see how performance scales with additional nodes in the cluster.

Acknowledgements
Mr. Antonis Frengkou and Spyros Drimalas helped with the resources necessary for this experiment. Research
in this paper was funded by the Hellenic Government NSRF 2014-2020 (Filoxeno 2.0 project, T1EDK-00966).

References
[AHP+ ] P. Abrahamsson, S. Helmer, N. Phaphoom, L. Nicolodi, N. Preda, L. Miori, M. Angriman, J. Rikkilä,
        X. Wang, K. Hamily, and S. Bugoloni. Affordable and Energy-Efficient Cloud Computing Clusters:
        The Bolzano Raspberry Pi Cloud Cluster Experiment. In 2013 IEEE 5th International Conference on
        Cloud Computing Technology and Science, volume 2, pages 170–175.

[APZ]     Yuan Ai, Mugen Peng, and Kecheng Zhang. Edge computing technologies for Internet of Things: A
          primer. 4(2):77–86.




[CBLM] K. Candelario, C. Booth, A. S. Leger, and S. J. Matthews. Investigating a Raspberry Pi cluster
       for detecting anomalies in the smart grid. In 2017 IEEE MIT Undergraduate Research Technology
       Conference (URTC), pages 1–4.
[CCB+ ] Simon J. Cox, James T. Cox, Richard P. Boardman, Steven J. Johnston, Mark Scott, and Neil S.
        O’Brien. Iridis-pi: A low-cost, compact demonstration cluster. 17(2):349–358.
[CPW]     Michael F. Cloutier, Chad Paradis, and Vincent M. Weaver. A Raspberry Pi Cluster Instrumented for
          Fine-Grained Power Measurement. 5(4):61.
[dABV] Mariano d’Amore, Rodolfo Baggio, and Enrico Valdani. A Practical Approach to Big Data in Tourism:
       A Low Cost Raspberry Pi Cluster. In Iis Tussyadiah and Alessandro Inversini, editors, Information
       and Communication Technologies in Tourism 2015, pages 169–181. Springer International Publishing.
[DAP+ ] S. Djanali, F. Arunanto, B. A. Pratomo, H. Studiawan, and S. G. Nugraha. SQL injection detection
        and prevention system with raspberry Pi honeypot cluster for trapping attacker. In 2014 International
        Symposium on Technology Management and Emerging Technologies, pages 163–166.
[DZ]      Kevin Doucet and Jian Zhang. Learning Cluster Computing by Creating a Raspberry Pi Cluster. In
          Proceedings of the SouthEast Conference, ACM SE ’17, pages 191–194. ACM.
[HT]      Wajdi Hajji and Fung Po Tso. Understanding the Performance of Low Power Raspberry Pi Cloud for
          Big Data. 5(2):29.
[KS]      C. Kaewkasi and W. Srisuruk. A study of big data processing constraints on a low-power Hadoop
          cluster. In 2014 International Computer Science and Engineering Conference (ICSEC), pages 267–
          272.
[LOD]     H. Li, K. Ota, and M. Dong. Learning IoT in Edge: Deep Learning for the Internet of Things with
          Edge Computing. 32(1):96–101.
[MEM+ ] A. Mappuji, N. Effendy, M. Mustaghfirin, F. Sondok, R. P. Yuniar, and S. P. Pangesti. Study of
        Raspberry Pi 2 quad-core Cortex-A7 CPU cluster as a mini supercomputer. In 2016 8th International
        Conference on Information Technology and Electrical Engineering (ICITEE), pages 1–4.
[PD]      A. M. Pfalzgraf and J. A. Driscoll. A low-cost computer cluster for high-performance computing
          education. In IEEE International Conference on Electro/Information Technology, pages 362–366.
[PHM+ ] C. Pahl, S. Helmer, L. Miori, J. Sanin, and B. Lee. A Container-Based Edge Cloud PaaS Architecture
        Based on Raspberry Pi Clusters. In 2016 IEEE 4th International Conference on Future Internet of
        Things and Cloud Workshops (FiCloudW), pages 117–124.
[QJK+ ] Basit Qureshi, Yasir Javed, Anis Koubâa, Mohamed-Foued Sriti, and Maram Alajlan. Performance of
        a Low Cost Hadoop Cluster for Image Analysis in Cloud Robotics Environment. 82:90–98.
[SGS+ ]   João Saffran, Gabriel Garcia, Matheus A. Souza, Pedro H. Penna, Márcio Castro, Luís F. W. Góes, and
          Henrique C. Freitas. A Low-Cost Energy-Efficient Raspberry Pi Cluster for Data Mining Algorithms. In
          Frédéric Desprez, Pierre-François Dutot, Christos Kaklamanis, Loris Marchal, Korbinian Molitorisz,
          Laura Ricci, Vittorio Scarano, Miguel A. Vega-Rodríguez, Ana Lucia Varbanescu, Sascha Hunold,
          Stephen L. Scott, Stefan Lankes, and Josef Weidendorfer, editors, Euro-Par 2016: Parallel Processing
          Workshops, Lecture Notes in Computer Science, pages 788–799. Springer International Publishing.
[Tot]     D. Toth. A Portable Cluster for Each Student. In 2014 IEEE International Parallel Distributed
          Processing Symposium Workshops, pages 1130–1134.
[TWJ+ ] F. P. Tso, D. R. White, S. Jouet, J. Singer, and D. P. Pezaros. The Glasgow Raspberry Pi Cloud:
        A Scale Model for Cloud Computing Infrastructures. In 2013 IEEE 33rd International Conference on
        Distributed Computing Systems Workshops, pages 108–112.
[YZQ]     Dingqi Yang, Daqing Zhang, and Bingqing Qu. Participatory Cultural Mapping Based on Collective
          Behavior Data in Location-Based Social Networks. 7(3):30:1–30:23.



