Lightweight IO Virtualization On MPU Enabled Microcontrollers

Francesco Paci, University of Bologna, Bologna, Italy, f.paci@unibo.it
Davide Brunelli, University of Trento, Trento, Italy, davide.brunelli@unitn.it
Luca Benini, University of Bologna, Bologna, Italy, and ETHZ, Zürich, Switzerland,
   luca.benini@unibo.it, luca.benini@iis.ee.ethz.ch

ABSTRACT
In the era of the Internet of Things (IoT), millions of devices and embedded platforms based
on low-cost, resource-limited microcontroller units (MCUs) will be used in continuous
operation. Even if over-the-air firmware update is today a common feature, many applications
might require that the device does not reboot, or that hardware resources can be shared. In
such a context, stopping, updating and rebooting the platform is impractical, and dynamic
loading of new user code is required. This in turn requires mechanisms to protect the MCU
hardware resources and the continuously executing system tasks from uncontrolled perturbation
caused by new user code being dynamically loaded. In this paper, we present a framework which
provides a lightweight virtualization of the IO and platform peripherals and permits the
dynamic loading of new user code. The aim of this work is to support critical isolation
features typical of virtualization-ready CPUs on low-cost low-power microcontrollers with no
MMU (Memory Management Unit), IOMMU or dedicated instruction extensions. Our approach only
leverages the Memory Protection Unit (MPU), which is generally available in ARM Cortex-M3 and
Cortex-M4 microcontrollers. Experimental evaluations demonstrate not only the feasibility, but
also a satisfactory level of performance of the proposed framework in terms of memory
requirements and overhead.

Keywords
Virtualization, MPU, Microcontrollers, Dynamic Linking

EWiLi’16, October 6th, 2016, Pittsburgh, USA.
Copyright retained by the authors.

1.    INTRODUCTION
   Many IoT applications envision the deployment of large numbers of microcontroller-based
smart sensor nodes in hard-to-reach locations [1, 2]. This not only means that they are
supposed to operate unattended, without direct maintenance, and likely with the same battery
for many years, but also that the software can be updated (if necessary) only remotely. In
many scenarios it is expected that bug fixes, functional improvements and reconfiguration will
be necessary over time. Clearly, the old-fashioned approach to reprogramming embedded systems,
based on stopping the device, updating the firmware and restarting, becomes unfeasible when
millions of low cost devices are spread all over and are expected to be updated with new
functionality many times over their life span.
   In addition, IoT devices are expected to provide more and more services on the same
hardware. The possibility to have multiple “application tasks” running on the same hardware,
possibly coming from different developers, introduces the challenge of protecting the
resources from misuse and guaranteeing adequate computing bandwidth to all the tasks, or
preventing over-allocation of resources that would lead to collective starvation.
   In such a scenario, well-known virtualization technologies already used in computing
servers, gateways and other high-end computing systems become fundamental also in low-end and
ultra-low cost programmable end-nodes for IoT. First, the virtualization of the hardware
resources becomes necessary to securely execute multi-function software and different
applications with well-controlled interference. Then, the capability to remotely download new
parts of code, to dynamically link the binary and to execute it at run time within the main
application avoids on-site maintenance or periodic down-time and reboot.
   These two requirements highlight the importance of IO virtualization and dynamic linking on
low-cost, low-power microcontrollers. However, while this technology is well known and
available in operating systems for high-end embedded systems (e.g. Linux on ARM Cortex-A
microprocessors), providing mechanisms for dynamic linking in low-resource microcontroller
based embedded platforms, such as the ARM Cortex-M class, is still a challenge, and only few
and limited solutions have been proposed so far.
   The dynamic linking proposed in this work executes on the FreeRTOS [3] operating system and
it is based on the framework presented in [4], which addressed the capability to download new
functions remotely. The main contributions of this paper are:
   • a Lightweight Virtualization layer which separates the user space from the kernel space,
     so that all the physical peripherals are virtualized. Such a virtualization protects the
     system against tampering and is ready to be extended to handle possible conflicts on the
     hardware assignments;
   • our solution is integrated with FreeRTOS and exploits the standard communication API
     provided by the operating system. Thus, it can be easily ported to other
     microcontrollers;
   • we provide the capability to dynamically link new user code, managing its life cycle as
     well as its orderly shutdown in case of attempted violations of protected memory regions.
   The paper is organized as follows. Section 2 gives an overview of works related to our
contribution, Section 3 describes in depth the framework architecture and provides all
technical details of this solution, Section 4 details our performance and memory footprint,
while Section 5 concludes the paper.

2.   RELATED WORKS
   Virtualization support for embedded systems based on high-end CPUs, such as the ARM
Cortex-A series, has been extensively explored in the academic literature and has reached
industrial maturity [5]. This class of devices exploits the hardware acceleration extensions
to provide hardware abstraction and protection to critical resources. Recent Cortex-A CPUs
feature native virtualization support like MMU and IOMMU address translation, interrupt
virtualization, TrustZone [6, 7], etc. Cortex-M MCUs do not come with any of those hardware
extensions. Furthermore, available memory and computational resources are much more limited.
Our work and the related works surveyed below deal with the Cortex-M3 and Cortex-M4 class of
devices, where virtualization is not a mature technology and several compromises with respect
to full hardware-supported virtualization have to be made.
   Abstract Virtual Machines and Interpreters
   One of the most common approaches for virtualization on MCUs is based on interpreter-based
virtual machines, which were originally conceived with the main purpose of creating high-level
easy-to-use languages and run-times at a higher abstraction level than the traditional C
language. Python [8, 9], Java [10, 11], JavaScript [12] and Lua [13] are all lightweight
multi-paradigm scripting languages employed in Virtual Machines for embedded systems. Their
main benefit is the cross-platform support. They are interpreted by a native virtual machine
loaded on the microcontroller, thus they introduce high overhead in terms of latency of access
to the resources in comparison to virtualization layers written in native code, but they are
designed for easy software application development and to meet the increasing demand for fast
run time customization, without the need for complex or dedicated compiling toolchains. Such a
kind of virtualization is usually focused on improving portability, extensibility, ease-of-use
in development and protection, but lacks performance, multiple user level accesses and
low-level hardware control. Only the exposed high level resources can be leveraged by the
user.
   Bogliolo et al. [14] presented Virtual Sense, a sensor node which executes a
Java-compatible virtual machine called Darjeeling VM [11] on top of Contiki OS [15]. This work
is close to ours in the emphasis on supporting resource allocation and protection for multiple
independent user tasks on the MCU. However this solution, besides the overhead introduced by
the interpreter, is oriented to sharing only the network stack between Darjeeling VM tasks,
while our work is general to all peripherals.
   Just In Time/Ahead of Time Compilation
   A well-explored approach to reduce the run-time overhead of VM interpreters is Just in Time
or Ahead of Time Compilation. MicroPython [8] developers, for example, introduced in their
platform the concept of decorator to emit ARM native opcode and to use native C types, but not
all native C types are supported and the implementation of this optimization is platform
dependent. A solution can be to extend with C wrapped functions called from Python, but there
are drawbacks: marshaling and unmarshaling of data is very expensive in terms of computational
resources, and with this solution the programmer loses the low level abstraction. In
comparison, using our solution, the developer implements C functions which will be executed in
user level tasks. In general these approaches require a higher memory footprint to host the
just-in-time or ahead-of-time compile process and do not achieve the performance of native
code execution. Furthermore, they are difficult to use in contexts where real-time constraints
cannot tolerate the jitter introduced by on-line compilation.
   Native Implementations
   Native virtualization is the closest to hardware and extremely desirable for resource and
performance-limited devices. This technique usually relies on the use of the MPU, which is the
only hardware unit available for security in low-end systems. Bhatti et al. [3] presented a
complete operating system designed for WSN (Wireless Sensor Network) and optimized for
simultaneous execution of threads which can be loaded dynamically. Their work relies on Mantis
OS, a custom operating system. They target Atmel and their solution is highly customized, thus
it is not general, while our work relies on FreeRTOS and is therefore highly extensible and
portable to other platforms. Moreover they do not explicitly address security and protection.
   To the best of our knowledge there is only one very recent work that addresses the problem
in a broad and general sense, similarly to our solution. Andersen et al. [16] presented an
embedded platform that relies on TinyOS. They use a mixed paradigm that provides a Lua VM
while allowing the computationally intensive part of the code to be written in native C. To
address security they use a task receiving event based system calls, to separate kernel from
user space tasks. Our work differs from the latter by supporting both system calls and event
based peripheral virtualization. Moreover Andersen et al. do not provide any information on
the performance of the event based system call paradigm.

3.    SOFTWARE ARCHITECTURE
   In this section we present all the software layers in our runtime system, focusing on
software protection. Figure 1 shows the layer stacking from three viewpoints, first from a
hardware point of view, then from address space access, divided into IO and Flash/RAM. We
divided core hardware from peripherals in two different stacks to underline that the OS can
expose system calls to access the core hardware resources, while the Virtual IO Layer is
designed to access the peripherals. The last stack shows that the access to memories is direct
for privileged tasks, while the access from user mode tasks is strictly regulated by the MPU.
Two different kinds of tasks are defined: privileged tasks and user mode tasks, which will be
discussed in the next section.
   Another important layer depicted in Figure 1 is FreeRTOS [17], a well known Real Time
Operating System for a broad range of Embedded Systems from 8 to 32 bit, including low power
and ultra-low power MCUs. We implemented our framework on an STM32F4 based platform, and even
if some details in the following description are related to this specific microcontroller, our
framework can be easily extended to be platform independent.
   In Sections 3.1 and 3.2 we focus on the first and third stack, namely on exploiting the MPU
and providing Safety Extensions, while in Section 3.4 we discuss the second stack.

3.1   Real Time OS
   The main reason for using FreeRTOS is its versatility: many MCUs are supported and the code
is maintained and upgraded often by Real Time Engineers Ltd. Moreover it is modular and there
are some extensions available (e.g. the MPU extension), which can be added to the core
release. Its open source nature makes it possible to extend it. It also has a small memory
footprint and its sources consist of a small number of files. The scheduler supports real-time
operation,
both time-triggered by a configurable system tick and with support for priorities with
preemption.

                                      Figure 1: Hardware, IO and Memories layers.

3.2     FreeRTOS Additions
   To strengthen the security of the system, the FreeRTOS MPU module has been integrated to
enable the usage of the Memory Protection Unit implemented on the microcontroller and to
activate the two levels of privileges for task execution. However, the original module is an
experimental release, because of some limitations that we addressed in our work:
   1. It does not have a proper way to access system resources. It provides only one system
      call. This system call raises the privileges of the caller from user mode to privileged,
      executes the call and then sets the privileges back to user space. This behavior offers
      sufficient protection in an environment where a single developer wants to keep
      separation between tasks, i.e. the case where a single company develops all the
      firmware. However, if we want to give a third-party user the capability to develop their
      own code, the existence of this backdoor is really dangerous for protection.
   2. The exploitation of the MPU is static. The protection sections of the MPU are not
      reconfigurable at run-time by privileged tasks.
   3. The task termination is not correctly handled. When a user mode task raises an MPU trap
      the exception ends the system execution. Hence it would be extremely easy to create
      denial of service attacks.
   In the next sub-sections we describe our proposed solutions to these limitations.

3.2.1    MPU Extension
   As already stated, this module makes it possible to grant different access privileges on a
task-by-task basis. For each task the MPU settings are stored in the task descriptor, called
Task Control Block (TCB) in FreeRTOS. When a task is created, it can be started with one out
of two levels of privileges:
   1. Privileged Tasks (similar to Linux kernel mode execution). The task executes with
      permission granted to access all system resources, memories and peripherals.
   2. User mode Tasks (similar to Linux user mode, also called unprivileged tasks). The task
      is executed in a more restrictive environment and has access only to a limited subset of
      memory and IO addresses.
   The STM32 Cortex-M4 has eight configurable MPU regions. When activated, the protection
policy is white-list based: to access a specific position in the address space, the task
should have a grant on one MPU region. The privileges on an MPU region can be: NONE, READONLY
and READWRITE. In FreeRTOS these MPU regions are configured as follows:

Region 0 FLASH protection
        Protects the whole FLASH, providing read-only privileges to both privileged and user
        mode tasks.
Region 1 OS FLASH protection
        Protects the OS code in FLASH from accesses by user mode tasks.
Region 2 OS RAM access
        Provides permission to privileged tasks to access the OS structures stored in RAM.
Region 3 Peripheral access
        Used to enable or disable the access to peripherals.
Region 4 Task Stack access
        Used to give each task access to its own stack.
Region 5-7 Not used
        These three regions are not used by the FreeRTOS MPU module, thus they are open to
        developer purposes.

   In Table 1, we show a list of MPU configurations used in our solution. As the reader can
notice, there is no access to peripherals granted to user mode tasks. This access can be only
allowed through the IO Virtualization Architecture.
   One of the main constraints of the FreeRTOS MPU module is that it allows the last regions
(from 5 to 7) to be configured at compile time only. Thus, we implemented a specific software
module to reconfigure these regions at run-time for each task. This is done for the following
reasons:
   1. Access to the Virtual IO Layer (explained in detail in Subsection 3.4) can be restricted
      by an MPU Region and must be requested by a task. This makes the Virtual IO Layer aware
      of the number of tasks that are using it.
   2. Moreover access to heap or other memory regions can be granted at run-time. This is open
      to several future applications.

3.2.2      Safety Extensions
   As previously stated, the single system call paradigm is not safe. The raise privilege
system call has been removed and replaced by more specific system calls for required cases.
For example, to grant access to FreeRTOS Queues and Direct Task Notification, the following
system calls are added:
   • MPU_xTaskGenericNotify: Direct task notification Notify function
   • MPU_QueueReceive: Receive a message on a queue
   • MPU_xGetCurrentTaskHandle: Get the current task handle
   • IO_Layer_REGISTER: Registration to Virtual IO Layer
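
The replacement of a single raise-privilege call with a fixed set of specific system calls can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the call numbers, the `syscall_dispatch` function, the error codes and the `g_privileged` flag (which stands in for the CPU privilege level) are all hypothetical names introduced for illustration, and the handlers are stubs rather than FreeRTOS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical call numbers, one per service exposed to user mode tasks. */
typedef enum {
    SYS_TASK_NOTIFY = 0,
    SYS_QUEUE_RECEIVE,
    SYS_GET_CURRENT_TASK,
    SYS_IO_LAYER_REGISTER,
    SYS_COUNT
} syscall_id_t;

#define SYS_ERR_BAD_ID   (-1)  /* unknown call number */
#define SYS_ERR_NOT_PRIV (-2)  /* handler reached without privileges */

typedef int32_t (*syscall_fn_t)(uint32_t arg);

/* Models the CPU privilege level: 1 only while a handler is running. */
static int g_privileged = 0;

/* Stub handlers (assumptions): each one only works when privileged. */
static int32_t sys_task_notify(uint32_t arg)
{
    return g_privileged ? (int32_t)arg : SYS_ERR_NOT_PRIV;
}
static int32_t sys_queue_receive(uint32_t arg)
{
    (void)arg;
    return g_privileged ? 0 : SYS_ERR_NOT_PRIV;
}
static int32_t sys_get_current_task(uint32_t arg)
{
    (void)arg;
    return g_privileged ? 0 : SYS_ERR_NOT_PRIV;
}
static int32_t sys_io_layer_register(uint32_t arg)
{
    (void)arg;
    return g_privileged ? 0 : SYS_ERR_NOT_PRIV;
}

/* The table is const: user code cannot add or replace entries. */
static const syscall_fn_t syscall_table[SYS_COUNT] = {
    sys_task_notify,
    sys_queue_receive,
    sys_get_current_task,
    sys_io_layer_register,
};

/* Entry point reached from user mode. Privileges are raised only around
 * a handler chosen by the system, never around caller-supplied code. */
int32_t syscall_dispatch(uint32_t id, uint32_t arg)
{
    if (id >= SYS_COUNT) {
        return SYS_ERR_BAD_ID;
    }
    g_privileged = 1;
    int32_t result = syscall_table[id](arg);
    g_privileged = 0;
    return result;
}
```

The point of the design is that the caller selects a service by number but never supplies the code that runs with raised privileges, so an unknown number is rejected before any privilege change takes place.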
Table 1: Default MPU region setting in FreeRTOS

 Privileged Perm.   User Mode Perm.   Region Desc.
 READ ONLY          READ ONLY         All Flash Protection
 READ ONLY          NONE              OS Code Segment in FLASH
 READ WRITE         NONE              OS RAM Protection
 READ WRITE         NONE              Peripherals
 READ WRITE         READ WRITE        Task Stack
 NOT USED           NOT USED          User configurable
 NOT USED           NOT USED          User configurable
 NOT USED           NOT USED          User configurable
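
Table 1 can be read as a small region table against which every access is checked. The following host-side model is a sketch, not the FreeRTOS MPU port: the names `mpu_region_t`, `mpu_access_ok` and `mpu_grant_user_region` are hypothetical, the Flash, SRAM and peripheral base addresses follow the usual STM32F4 memory map, and the OS code size, OS RAM size and task stack placement are made-up values. It assumes the ARMv7-M rule that, when regions overlap, the highest-numbered matching region decides, which is what lets the OS code region override the whole-Flash region for user mode. Privileged accesses that match no region are simply denied here, a simplification of the real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { PERM_NONE = 0, PERM_RO = 1, PERM_RW = 2 } perm_t;

typedef struct {
    bool     enabled;
    uint32_t base;   /* region start address */
    uint32_t size;   /* region size in bytes */
    perm_t   priv;   /* permission for privileged tasks */
    perm_t   user;   /* permission for user mode tasks */
} mpu_region_t;

#define MPU_REGIONS       8
#define FIRST_FREE_REGION 5

/* Rows of Table 1. Sizes of regions 1, 2 and 4 are illustrative only. */
static mpu_region_t regions[MPU_REGIONS] = {
    { true,  0x08000000u, 0x00080000u, PERM_RO, PERM_RO   }, /* 0: all Flash   */
    { true,  0x08000000u, 0x00008000u, PERM_RO, PERM_NONE }, /* 1: OS code     */
    { true,  0x20000000u, 0x00004000u, PERM_RW, PERM_NONE }, /* 2: OS RAM      */
    { true,  0x40000000u, 0x20000000u, PERM_RW, PERM_NONE }, /* 3: peripherals */
    { true,  0x20004000u, 0x00000400u, PERM_RW, PERM_RW   }, /* 4: task stack  */
    { false, 0u, 0u, PERM_NONE, PERM_NONE },                 /* 5: free        */
    { false, 0u, 0u, PERM_NONE, PERM_NONE },                 /* 6: free        */
    { false, 0u, 0u, PERM_NONE, PERM_NONE },                 /* 7: free        */
};

/* White-list check: the highest-numbered enabled region containing the
 * address decides; an address covered by no region is rejected. */
bool mpu_access_ok(uint32_t addr, bool privileged, bool write)
{
    for (int i = MPU_REGIONS - 1; i >= 0; --i) {
        const mpu_region_t *r = &regions[i];
        if (!r->enabled) {
            continue;
        }
        /* Unsigned subtraction wraps when addr < base, so this one
         * comparison tests base <= addr < base + size. */
        if ((uint32_t)(addr - r->base) < r->size) {
            perm_t p = privileged ? r->priv : r->user;
            return write ? (p == PERM_RW) : (p != PERM_NONE);
        }
    }
    return false;
}

/* Run-time grant on one of the free regions (5 to 7). The size must be a
 * power of two of at least 32 bytes and the base must be size-aligned. */
bool mpu_grant_user_region(int idx, uint32_t base, uint32_t size, perm_t user)
{
    if (idx < FIRST_FREE_REGION || idx >= MPU_REGIONS) {
        return false;
    }
    if (size < 32u || (size & (size - 1u)) != 0u || (base & (size - 1u)) != 0u) {
        return false;
    }
    regions[idx] = (mpu_region_t){ true, base, size, PERM_RW, user };
    return true;
}
```

With this table a user mode read of application Flash is accepted, while a user mode read of the OS code or of any peripheral address is rejected; a region granted at run-time with `mpu_grant_user_region` then opens exactly one additional window for the task.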

3.2.3    Graceful Task Termination - Killer Task
   FreeRTOS does not terminate a task that raises an MPU trap. Thus, when an unprivileged task
tries to access a memory address without permission, a trap is generated by the MPU and the OS
ends its execution in an endless loop. This is not acceptable if we want to keep all other
tasks and the OS in execution. The desired behavior is that the task causing the trap is
aborted while the system continues its execution. Thus a memory trap handler and a specific
task, called Killer Task, have been created to manage the termination of the task that raised
the trap. The Killer Task is a privileged task created at boot time, and it is in sleep state
when the MCU is in normal usage. When a trap occurs the task is activated. The Killer Task
gets the task handle of the task that generated the trap and removes it from the scheduler
execution queue. Then it resumes the scheduler execution and goes back to sleep, waiting for
the next trap.

3.3     Software Protection
   From a software protection perspective, the MPU enables the OS to keep control of the user
mode tasks. Thus, with the MPU, no user mode task can tamper with the whole system. On the
other hand, if we want to enable a third party software developer to access only a small
subset of peripherals, a fine grain control on the address space must be implemented. Usually
in an MCU all peripheral addresses are grouped from a starting to an ending address. However,
if we want to provide fine grain access to a subset of them, three free MPU regions are really
limiting. Moreover there are two other limitations: one is that the minimum area for an MPU
region is usually 32 bytes (e.g. on STM32F4), which is usually larger than the register pool
of a peripheral. The other is that the register set of several peripherals consists of both
control registers and reading/writing ports, at subsequent memory positions. Thus it is not
possible to grant access to a read-only register while denying the permission to a contiguous
configuration register.

3.4     IO Virtualization Architecture
   The Virtual IO Layer architecture consists of two main parts: (1) a task called Virtual IO
Task that invokes the callbacks to access IO and peripherals through the hardware abstraction
layer (HAL); (2) a library named Virtual IO Library that contains the front-end calls
forwarded transparently to the Virtual IO Task and the back-end calls invoked by the Virtual
IO Task to access the HAL Library. The Virtual IO Task is a FreeRTOS task that handles all the
IO calls from the user mode tasks to the peripherals. As shown in Figure 2, this task acts as
a task-in-the-middle that receives all calls from user mode tasks that attempt to access the
peripherals, checks the permissions and forwards the requests through the HAL library.

                         Figure 2: IO Virtualization High Level Architecture

3.4.1    Virtual IO Library
   The library consists of two subsets: a front-end functions subset and the corresponding
back-end functions subset.
   When a user mode task wants to access peripherals, it needs to subscribe to the Virtual IO
Layer, using one special front-end function. Registration is required for two purposes:
   1. The user mode task must have read only access to the Virtual IO Task handle. This is
      needed to use the OS event notifications to notify the Virtual IO Task. Therefore, one
      of the MPU regions of the task must be configured at run-time for read-only access to
      the Virtual IO Task handle.
   2. User mode tasks are not authorized to use interrupt handlers, because interrupt handler
      code is executed in privileged mode. We used a queue system to communicate from
      interrupt handlers to user mode tasks. Hence the registration routine creates a new
      queue and saves the queue handle in a structure. This will be used afterwards if the
      task requests access to a peripheral in interrupt mode.
   The registration is done through a system call that was previously mentioned in subsection
3.2.2, hidden by a front-end call. The system call is needed to configure the MPU region
described in the first purpose. The registration procedure works as follows: (1) the user mode
task invokes the IO_Layer_init() routine, which through (2) the IO_Layer_REGISTER system call
(3) sets an MPU region of the caller task to access the Virtual IO Task descriptor in
read-only mode. This is needed to send Notifications. Then the framework creates and
initializes a System Queue (4) for using the DMA (the procedure is described in the Back End
Subset subsection). Before returning, if the procedure was successful, the task is added to
the list of Virtual IO subscribed tasks.
   Front End Subset
   The Front End subset is intended to be called from the user mode tasks. These calls have
the same signature as the original HAL library calls, except for the function name, which is
extended with a prefix to make the programmer aware that
they are using the Virtual IO Layer and, obviously, to avoid a     voked by the caller. The pointer to HAE Structure is cast
name space conflict. Thus, for each HAL library function that      to a generic structure common for all HAE Structures (we
we want to expose to the third party developer, a function must    always know that the first two fields are fixed: the user-
be written. Each function declares a structure that contains:      mode task's task handle and the pointer to the call-back
   1. The user-mode task's task handle.                            function), then the ACL permission check occurs. If the
   2. A pointer to the corresponding back-end function to be       check passes, the back-end function is invoked.
       called by the Virtual IO Task
   3. A pointer for each original HAL Library function ar-         3.5   Dynamic Linking
       gument.                                                        The dynamic linking permits a task to be added to the
   4. If the original HAL function returns a non-void value,       run time tasks without rebooting the system. We imple-
       a field to store it.                                        mented dynamic linking to demonstrate the usage of the
   We refer to this structure as the HAL Library Argument          whole system. Therefore, we implemented a privileged task
Embedding Structure (HAE Structure). The HAE structure is          in charge of dynamically linking other user mode tasks.
instantiated in the function, on the stack, and all of its         Tasks are cross-compiled, unresolved dependencies on system
fields are assigned their values. A notification is then sent      library calls are linked at run time, and the task is added
to the Virtual IO Task with a pointer to this structure.           to the scheduler execution queue. The library in charge of
Finally, if the function is non-void, the HAL Library return       dynamically linking user mode tasks is derived from the work
value is returned. A recap of this embedding is shown in the       of [4]. In Flash memory we reserved a section to store the
top right corner of Figure 2.                                      binaries of these new tasks, to be linked and then added to
                                                                   the FreeRTOS scheduler ready task list.
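
The front-end embedding described above, together with the matching back-end call and the generic cast performed by the Virtual IO Task, can be sketched on a host machine as follows. Everything named here is an assumption made for illustration: `mock_hal_gpio_read` stands in for a HAL routine, `VIO_gpio_read` and `vio_gpio_read_backend` are hypothetical front-end and back-end names, the ACL is reduced to a single allowed task, and the task notification is modelled as a direct function call instead of a FreeRTOS Direct Task Notification. The caller's handle is also passed explicitly, whereas a real front-end would obtain it through a system call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void *task_handle_t;              /* stand-in for an RTOS task handle */
typedef void (*backend_fn_t)(void *hae);  /* every back-end takes a void pointer */

/* Generic view shared by all HAE structures: the first two fields are fixed. */
typedef struct {
    task_handle_t caller;
    backend_fn_t  backend;
} hae_header_t;

/* Mock HAL routine (assumption): returns the state of one pin. */
static uint32_t mock_port_state = 0x0004u;   /* only pin 2 is high */
static int mock_hal_gpio_read(uint32_t port, uint16_t pin)
{
    (void)port;
    return (int)((mock_port_state >> pin) & 1u);
}

/* HAE structure for the mock routine: header, one pointer per argument,
 * and a field for the non-void return value. */
typedef struct {
    hae_header_t hdr;    /* must stay first so the generic cast is valid */
    uint32_t    *port;
    uint16_t    *pin;
    int          ret;
} hae_gpio_read_t;

/* Back-end: casts the void pointer back and calls the original routine. */
static void vio_gpio_read_backend(void *p)
{
    hae_gpio_read_t *hae = (hae_gpio_read_t *)p;
    hae->ret = mock_hal_gpio_read(*hae->port, *hae->pin);
}

/* Minimal ACL (assumption): one task is allowed to use the layer. */
static task_handle_t acl_allowed_task = NULL;

/* Body of the Virtual IO Task for one request: generic cast, ACL check,
 * then the back-end chosen by the front-end is invoked. */
static bool virtual_io_task_handle(void *msg)
{
    hae_header_t *hdr = (hae_header_t *)msg;
    if (hdr->caller == NULL || hdr->caller != acl_allowed_task) {
        return false;
    }
    hdr->backend(msg);
    return true;
}

/* Front-end: same arguments as the mock routine plus the caller handle.
 * Returns -1 when the permission check rejects the request. */
int VIO_gpio_read(task_handle_t self, uint32_t port, uint16_t pin)
{
    hae_gpio_read_t hae = { { self, vio_gpio_read_backend }, &port, &pin, -1 };
    (void)virtual_io_task_handle(&hae);  /* real system: notify, then block */
    return hae.ret;
}
```

Because every HAE structure starts with the same two fields, the Virtual IO Task needs no knowledge of the individual HAL routines: it checks the caller and jumps to the back-end, which is the only place where the argument layout is known.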
   Back End Subset
   The back end (the call-back functions) is the part of the       4.    EXPERIMENTAL RESULTS
library meant to be called by the Virtual IO Task. For each           In this section we present results in terms of performance
front-end function, there is one corresponding back-end            and memory footprint. All tests were conducted on an
function that takes as input a single argument, a void pointer.    STM32F411RE NUCLEO-64 Board [18]. This platform by
Its body contains a declaration of the HAE structure written       ST Microelectronics embeds an ARM 32-bit Cortex-M4 CPU
for the corresponding front-end function. The void pointer is      running at up to 100 MHz with FPU and MPU. It features
cast to this structure, and its arguments are then used to         512 KB of Flash memory and 128 KB of RAM. In our software
call the original HAL function. When the HAL Library call          setup we use the new driver for accessing hardware
returns, the return value is written into the structure,           peripherals provided by ST, called Hardware Abstraction
which still resides on the user-mode stack; control then           Layer Driver (HAL Driver) [19].
returns to the Virtual IO Task. The Virtual IO Task then              We identified two main use cases, i.e. ways to access
suspends its execution, waiting for the next call.                 peripherals in a microcontroller unit, that must be considered
   This architecture has two advantages: (1) ease of use,          separately:
since the programmer does not need to learn a new interface            1. Atomic Action:
to use the HAL; (2) all front-end and back-end calls have                 This is the case in which we call a HAL Driver routine
the same format, so they can be written by a programmer                   each time we access a peripheral. In other words, we
or generated by an automatic tool.                                        just want to access an IO address once, or we may
   To handle asynchronous DMA calls and to be notified                    access it in a loop, but the call does not involve a
when a DMA transfer completes, we use the Queue returned                  peripheral transfer after it. Examples of this behavior
when the user mode task subscribes to the Virtual IO Layer.               are configuring or reading a GPIO pin, or writing
For security, it is important that all the interrupt service              something on the UART.
routines (ISRs) are implemented by the system. Moreover,               2. Continuous Action (or Tunneling Action):
each service routine contains a Queue Send operation that                 In this second case we consider all the peripheral
notifies the task using the DMA that the routine has been                 usages that involve the use of DMA. An example is
called. A reference table is used to notify the correct                   setting up the Analog to Digital converter and reading
user mode queue. This reference table is set by the                       it at regular intervals through the DMA.
back-end when the user mode task invokes one of the DMA
HAL Library functions.                                             4.1   Virtual IO Layer Timing
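As an illustration of the reference table used for DMA notifications, the following host-side C sketch may help. It is not the implementation described in this paper: the ring buffer standing in for a FreeRTOS queue, the number of DMA channels, and the names vio_backend_bind_dma and vio_dma_isr are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FreeRTOS queue: a tiny ring buffer so that
 * the sketch runs on a host. On the target this would be a real queue. */
#define QUEUE_DEPTH 4u
typedef struct {
    uint32_t items[QUEUE_DEPTH];
    unsigned head;
    unsigned count;
} user_queue_t;

int queue_send(user_queue_t *q, uint32_t event)
{
    if (q->count == QUEUE_DEPTH) return 0;           /* queue full */
    q->items[(q->head + q->count) % QUEUE_DEPTH] = event;
    q->count++;
    return 1;
}

int queue_receive(user_queue_t *q, uint32_t *event)
{
    if (q->count == 0) return 0;                     /* nothing pending */
    *event = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return 1;
}

/* Reference table: one slot per DMA channel (the count is an assumption). */
#define NUM_DMA_CHANNELS 8u
static user_queue_t *dma_ref_table[NUM_DMA_CHANNELS];

/* Back-end side: record which user queue owns a channel. This would run
 * when a user mode task invokes one of the DMA driver functions. */
int vio_backend_bind_dma(unsigned channel, user_queue_t *q)
{
    if (channel >= NUM_DMA_CHANNELS || q == NULL) return 0;
    dma_ref_table[channel] = q;
    return 1;
}

/* Body of a system-owned ISR: look up the owner queue and notify it.
 * Returns 0 if no task is bound to the channel or the queue is full. */
int vio_dma_isr(unsigned channel, uint32_t event)
{
    if (channel >= NUM_DMA_CHANNELS || dma_ref_table[channel] == NULL) return 0;
    return queue_send(dma_ref_table[channel], event);
}
```

On the target, queue_send would presumably be replaced by an ISR-safe call such as FreeRTOS xQueueSendFromISR(), and the user mode task would block on its own queue until the event arrives.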
                                                                     The time needed to access a peripheral through the Virtual
3.4.2    Virtual IO Task                                           IO Layer is reported in Table 2. The first row gives the
   The Virtual IO Task is a privileged task that handles           cycles needed to get the task handle through a system call.
the communication from user mode tasks to peripherals. It          The second row, MPU_xTaskGenericNotify(), is the direct task
starts when the Virtual IO Layer is initialized, typically at      notification system call. The third row reports the cycles
system boot time. The communication is handled via Direct          required to notify the Virtual IO Task. The last row gives
Task Notification. Once started, this task remains in the          the number of cycles needed to return control to the user
suspended state, waiting for a call from one of the                mode task after the HAL Driver call. The cycles were
registered user mode tasks through the front-end.                  measured with the DWT_CYCCNT hardware cycle count register
   The priority of this task is higher than that of all user       of the Cortex-M4 MCU.
mode tasks. Thus, when the notification is sent from the             It is worth mentioning that, with this paradigm, continuous
front-end, the user mode task waits until the Virtual IO           mode operations pay the overhead just once, when the
Task ends its execution. Therefore, even though task               peripheral or IO is set up. Thus, while the DMA is working,
notifications are asynchronous, the call to the HAL Library        the only overhead is the queue used to synchronize the ISR
is blocking, because in FreeRTOS scheduler preemption is           with the user mode task.
priority based.
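The call path through the front-end, the Virtual IO Task and the back-end can be pictured with the following host-side C sketch. It is only a model under stated assumptions: the descriptor layout, the function names and the placeholder driver routine are hypothetical, and the direct task notification is replaced by a plain function call, which mimics the blocking behavior obtained when the notified task has the higher priority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call descriptor: it lives on the caller's (user mode) stack. */
typedef struct {
    uint32_t func_id;   /* which back-end function to run */
    uint32_t args[4];   /* marshalled arguments */
    int32_t  ret;       /* return value, written by the back-end */
    int      done;      /* set once the request has been served */
} vio_call_t;

/* Placeholder for a driver routine (not a real HAL function). */
static int32_t fake_hal_gpio_write(uint32_t pin, uint32_t value)
{
    return (pin < 16u && value <= 1u) ? 0 : -1;
}

/* Back-end: unpack the descriptor and call the underlying routine. */
static void backend_gpio_write(vio_call_t *c)
{
    c->ret = fake_hal_gpio_write(c->args[0], c->args[1]);
}

typedef void (*backend_fn)(vio_call_t *);
static const backend_fn backend_table[] = { backend_gpio_write };
#define NUM_BACKENDS (sizeof backend_table / sizeof backend_table[0])

/* One iteration of the privileged task body. On the target it would run
 * after the task has been woken by a direct task notification. */
void vio_task_serve(vio_call_t *c)
{
    if (c->func_id < NUM_BACKENDS) {
        backend_table[c->func_id](c);
    } else {
        c->ret = -1;    /* unknown function identifier */
    }
    c->done = 1;
}

/* Front-end: same signature as the wrapped routine. The direct call below
 * stands in for "notify the higher-priority task and get preempted". */
int32_t vio_gpio_write(uint32_t pin, uint32_t value)
{
    vio_call_t call = { 0u, { pin, value, 0u, 0u }, 0, 0 };
    vio_task_serve(&call);
    return call.ret;
}
```

Because the front-end keeps the signature of the routine it wraps, user code calls vio_gpio_write(5, 1) exactly as it would call the underlying driver function.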
   The body of this task, besides the Task Notify Wait, con-         The cycle overhead of checking whether the function the
sists of an Access Control List (ACL), shown in Figure 2,          user mode task wants to use is permitted by the ACL grows
that checks that the callee HAL Library function can be in-        linearly with the number of checks performed. In Figure 3
         Virtualization Step           VIO (Cycles)               the typical application scenarios. Future work will extend
         getTaskHandle                     97                     dynamic linking toward multiple upload channels and will
         MPU_xTaskGenericNotify            47                     implement different peripheral permission policies for
         xTaskNotify + CS                 490                     different user mode tasks.
         Notify wait + CS back            293
         TOTAL                            926
                                                                  6.   ACKNOWLEDGMENTS
Table 2: Timing overhead of accessing the IO using                  This work was partially supported by EU Project Eu-
the Virtual IO Layer in Cycles                                    roCPS H2020-ICT-2014 under Grant 644090 and in collab-
                                                                  oration with STMicroelectronics.

                                                                  7.   REFERENCES
                                                                   [1] Lu Tan et al. Future internet: The internet of things.
                                                                       In 2010 3rd International Conference on Advanced
                                                                       Computer Theory and Engineering (ICACTE),
                                                                       volume 5, pages V5–376–V5–380, Aug 2010.
                                                                   [2] Ala Al-Fuqaha et al. Internet of things: A survey on
                                                                       enabling technologies, protocols, and applications.
                                                                       IEEE Communications Surveys & Tutorials,
                                                                       17(4):2347–2376, Fourthquarter 2015.
                                                                   [3] Shah Bhatti et al. Mantis os: An embedded
                                                                       multithreaded operating system for wireless micro
                                                                       sensor platforms. Mob. Netw. Appl., 10(4):563–579,
     Figure 3: Overhead of the control in the ACL.                     August 2005.
                                                                   [4] Simon Holmbacka et al. Lightweight framework for
we report the overhead. As expected, the number of cycles              runtime updating of c-based software in embedded
is proportional to the number of function addresses to                 systems. In Presented as part of the 5th Workshop on
verify.                                                                Hot Topics in Software Upgrades, Berkeley, CA, 2013.
                                                                       USENIX.
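The linear growth of the ACL check can be illustrated with a minimal C sketch. It assumes, for illustration only, that the ACL is a flat array of permitted function addresses scanned in order; the type, the function names and the placeholder driver entry points are hypothetical and do not come from the actual Virtual IO Layer sources.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*hal_fn)(void);

/* Placeholder driver entry points (hypothetical names). They return
 * different values so that each one is a distinct function. */
int hal_gpio_write(void) { return 1; }
int hal_uart_send(void)  { return 2; }
int hal_adc_start(void)  { return 3; }

/* Linear scan over the permitted addresses. Returns 1 if fn is listed.
 * If checks is not NULL, it receives the number of comparisons made,
 * which is the quantity that grows with the length of the list. */
int acl_is_allowed(const hal_fn *acl, size_t n, hal_fn fn, size_t *checks)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (acl[i] == fn) {
            if (checks) *checks = i + 1;
            return 1;
        }
    }
    if (checks) *checks = n;   /* a rejected call pays for the whole list */
    return 0;
}
```

With this layout a rejected call costs one comparison per ACL entry, which is consistent with an overhead proportional to the number of function addresses to verify.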
4.2     Virtual IO Layer Memory Footprint                          [5] ARM Virtualization Extension.
   The memory footprint overhead is reported in Table 3.               https://www.arm.com/.
We show the code size of the library and of the Virtual IO         [6] ARM Security Technology - Building a Secure System
Task separately, with the compiler invoked with the flag               using TrustZone Technology. Whitepaper, April 2009.
for performance (-O3) or space (-Os) optimization. The size        [7] T. Alves and D. Felton. Trustzone: Integrated
of the Virtual IO Library is measured with an average size             hardware and software security-enabling trusted
of 50 functions (front end + back end). As the results                 computing in embedded systems. White paper, ARM,
show, the memory footprint is minimal, even though it                  July 2004.
scales with the number of driver functions that we want to         [8] Micropython website. http://micropython.org/.
provide to the user mode tasks.                                    [9] PyMite. https://wiki.python.org/moin/PyMite.
        Optimization     VIO Task       VIO Library               [10] Oracle Java ME Embedded. http://www.oracle.com/.
        -O3                592 B          2876 B                  [11] Niels Brouwers et al. Darjeeling, a feature-rich vm
        -Os                464 B          2314 B                       for the resource poor. In Proceedings of the 7th ACM
                                                                       Conference on Embedded Networked Sensor Systems,
        Table 3: Virtualization Layer code size                        SenSys ’09, pages 169–182, New York, NY, USA,
  As a concluding note, it is important to stress that the             2009. ACM.
runtime of tasks, when they are not interacting with the          [12] Espruino Javascript Interpreter.
IOs, is exactly the same as that of native FreeRTOS tasks,             http://www.espruino.com/.
with no performance overhead for memory protection, as the        [13] Embedded power driven by Lua.
MPU is completely transparent from the performance                     http://www.eluaproject.net/.
viewpoint. This is very similar to what happens in virtual        [14] Alessandro Bogliolo et al. Virtualsense: A java-based
machine execution for high-end cores, and in sharp contrast            open platform for ultra-low-power wireless sensor
with interpreted virtual machines or even JIT-based systems.           nodes. International Journal of Distributed Sensor
                                                                       Networks, 2012, 2012.
5.    CONCLUSIONS                                                 [15] Contiki: The Open Source OS for the Internet of
  In this paper we have presented a virtualization layer for           Things. http://www.contiki-os.org/.
low-cost microcontrollers which creates a separation between      [16] Michael P. Andersen et al. System design for a
kernel mode and user mode and protects the hardware                    synergistic, low power mote/ble embedded platform.
resources from misuse when concurrent tasks or functions               In Proceedings of the 15th International Conference on
are written by different developers. Moreover, we have                 Information Processing in Sensor Networks, IPSN ’16,
demonstrated the effectiveness of a mechanism capable of               pages 17:1–17:12, Piscataway, NJ, USA, 2016. IEEE
executing new run-time code without the need for a system              Press.
reboot. We have focused on a small framework size and on          [17] FreeRTOS website. http://www.freertos.org/.
low overhead, because the framework targets low-cost              [18] ST Microelectronics Nucleo Boards.
microcontrollers with limited computing capabilities, such             http://www.st.com/.
as the ones designed for IoT and WSN. Experimental results        [19] ST Microelectronics Hardware Abstraction Layer
demonstrate that the overhead is limited and the time                  Driver. http://www.st.com/.
delay is negligible considering