<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Static Analysis for GDPR Compliance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pietro Ferrara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fausto Spoto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Julia SRL</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Verona</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Information systems might access, manage and record sensitive data about citizens. In addition, the pervasiveness of these systems is increasing dramatically thanks to the mobile and IoT revolutions. However, several unintended data breaches are reported every week, and this might compromise the privacy, safety, and security of citizens. For all these reasons, the European Parliament approved in April 2016 the EU General Data Protection Regulation (GDPR). The main goal of this regulation is to protect the privacy of citizens with regard to the processing of their personal data. It enforces a Privacy by Design and by Default approach, where personal data is processed only when needed by the functionalities of the information system. On the other hand, static analysis aims at proving at compile time various properties of information systems. This discipline has been widely applied during the last decades to identify potential software leaks of sensitive data. In this scenario, this paper discusses what role static analysis could play from a GDPR perspective. In particular, we introduce GDPR and static analysis, and we then propose how existing taint analyses and backward slicing algorithms might be combined to produce reports useful for GDPR compliance. We identify four main actors in the GDPR compliance process (namely, data protection officers, chief information security officers, project managers, and developers), and we propose a specific level of reporting for each of them.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        On April 27th, 2016 the European Parliament adopted the "General Data Protection
Regulation" (GDPR). This regulation will become enforceable on May 25th, 2018, and its goal is
to lay down "rules relating to the protection of natural persons with regard to the processing
of personal data and rules relating to the free movement of personal data" [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. During the last
decades, several regulations were approved and enforced by various European countries. The
EU GDPR is the first effort to unify these different legislations. The pervasiveness of
information systems that might collect sensitive data of citizens was definitely one of the crucial
factors that pushed the European institutions to promote such regulation. Nowadays, almost
all citizens have mobile devices that might be exploited to track them, and they use various
IT services that contain a wide range of sensitive data (about their salary, their health, etc.).
In addition, the IoT revolution is coming: "Gartner, Inc. forecasts that 8.4 billion connected
things will be in use worldwide in 2017, up 31 percent from 2016, and will reach 20.4 billion
by 2020" [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. All these devices might record sensitive data about different environments and
people. In addition, many new breaches of sensitive data are reported every week.
      </p>
      <p>
        Roughly speaking, GDPR allows an information system to access and manage the sensitive
data that is needed to perform the functionalities of the system, following the Privacy by Design
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] principles. Sensitive data might be "any information relating to an individual, whether it
relates to his or her private, professional or public life. It can be anything from a name, a home
address, a photo, an email address, bank details, posts on social networking websites, medical
information, or a computer's IP address". While it is still under discussion how such regulation
will be checked and enforced, tools that help to check how sensitive data is processed by a
software system could become a main asset for GDPR compliance.
        Fig. 1. Running example.
        <preformat>
 1  public void onStart() {
 2    send("start", "now");
 3    TelephonyManager telephonyManager = ...;
 4    String imei = telephonyManager.getDeviceId();
 5    send("id", imei);
 6  }
 7
 8  public void onLocationChanged(Location location) {
 9    send("latitude", location.getLatitude());
10    send("longitude", location.getLongitude());
11  }
12
13  private void send(String key, String value) {
14    new URL("http://&lt;site&gt;?"+key+"="+value).openConnection().connect();
15  }
        </preformat>
      </p>
      <p>During the last decades, static program analysis has been widely applied to various
properties. From a GDPR perspective, privacy analyses aimed at detecting possible leakages of
sensitive data are particularly promising, since they can detect potential leaks at compile
time, before the system is deployed. Therefore, they might help to prevent unintended data
breaches caused by software vulnerabilities and leaks.</p>
      <sec id="sec-1-1">
        <title>Contribution</title>
        <p>In this scenario, the main contribution of this paper is to discuss how static analysis might
be applied for GDPR compliance. In particular, we discuss (i) how such tools can be applied
as a privacy enhancing technology (PET) in the Privacy by Design (PbD) approach, (ii) what
different types of static analyses for privacy exist, and (iii) what information might be tracked
by these tools and how this might be reported to the various actors (namely, data protection
officers, chief information security officers, project managers, and developers) in the GDPR
compliance process.</p>
        <p>The paper is structured as follows. The rest of this section introduces a minimal running
example. Sec. 2 presents privacy enhancing technologies, privacy by design and the GDPR.
Sec. 3 discusses static analysis flavors and their use for privacy analysis. Sec. 4 advocates and
formalizes the role of static analysis for GDPR compliance. Sec. 5 concludes.</p>
        <sec id="sec-1-1-1">
          <title>Running Example</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>The EU General Data Protection Regulation (GDPR)</title>
      <p>This section provides background about privacy regulations and technologies and the GDPR.</p>
      <sec id="sec-2-1">
        <title>Privacy Enhancing Technologies (PET)</title>
        <p>
          The concept of Privacy Enhancing Technologies (PET) has been around for decades [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
ENISA [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] defines it as "a manner of accomplishing a task especially using technical processes,
methods, or knowledge, to support privacy or data protection features, where the latter require
safeguards concerning specific types of data since data processing may severely threaten
informational privacy" [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The identification of such types of data is a complicated problem in itself,
but it is orthogonal to the scope of this article, which assumes it as given. Historically, the
main principles of PETs have been data minimization and identity protection by anonymization.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Privacy by Design (PbD)</title>
        <p>
          While PETs intervene at the end of the chain of data manipulation (e.g., anonymization before
communicating or storing the data), "the Privacy by Design approach is characterized by
proactive rather than reactive measures. It anticipates and prevents privacy invasive events before
they happen. PbD does not wait for privacy risks to materialize, nor does it offer remedies for
resolving privacy infractions once they have occurred - it aims to prevent them from occurring.
In short, Privacy by Design comes before-the-fact, not after" [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. PbD embeds privacy into
the entire engineering process, from its early design phase to the operation of the production
system; that is, it embeds privacy into the design of the system (Principle 3 of [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]). In
addition, Principle 1 of PbD lists, among the privacy practices, "established methods to recognize
poor privacy designs, anticipate poor privacy practices and outcomes, and correct any negative
impacts, well before they occur in proactive, systematic, and innovative ways".
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>The EU General Data Protection Regulation (GDPR)</title>
        <p>
          The GDPR will become enforceable on May 25th, 2018. It enforces PbD as a legal obligation
for entities that control and/or process personal data. Its Article 25 (data protection by design
and by default) states: "taking into account the state of the art, the cost of implementation and
the nature, scope, context and purposes of processing as well as the risks of varying likelihood
and severity for rights and freedoms of natural persons posed by the processing, the controller
shall, both at the time of the determination of the means for processing and at the time of the
processing itself, implement appropriate technical and organizational measures, such as
pseudonymization, which are designed to implement data-protection principles, such as
data minimization, in an effective manner and to integrate the necessary safeguards into the
processing in order to meet the requirements of this Regulation and protect the rights of data
subjects". Article 25 further states that "the controller shall implement appropriate technical
and organizational measures for ensuring that, by default, only personal data which are
necessary for each specific purpose of the processing are processed. That obligation
applies to the amount of personal data collected, the extent of their processing, the period of their
storage and their accessibility. In particular, such measures shall ensure that by default personal
data are not made accessible without the individual's intervention to an indefinite number of
natural persons". PETs and PbD will become mainstream, to fulfill the legal obligations imposed
by GDPR (in particular, Principle 1 of PbD). Institutions such as ENISA already provide
community standards on the evaluation of PETs [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Given the maturity of static analysis of
privacy properties, we believe that it will play a relevant role in this context.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Static Analysis for Privacy</title>
      <p>
        Certifications and guidelines often impose, discuss or suggest the use of static analysis [
        <xref ref-type="bibr" rid="ref14 ref15 ref24">14, 15,
24</xref>
        ]. However, concrete static analyses can be evaluated very differently along a few axes. Moreover,
they usually deal with the reliability and security of software and not with its privacy issues.
      </p>
      <sec id="sec-3-1">
        <title>Static Program Analysis</title>
        <p>Wikipedia defines static program analysis as the analysis of computer software that is performed without actually
executing programs. This includes, for instance, manual software audits, but this article restricts
its scope to automatic static analysis of software, to focus on a technical discussion.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Alarms and Soundness</title>
        <p>
          A static analyzer issues alarms at program points where an actual bug might occur at runtime,
such as a NullPointerException, or the code might be inefficient, for instance because of an
unused variable, or a security vulnerability might be exploited, such as an SQL injection, or
private data might be leaked, for instance by sending the user identifier to a website, as in
Fig. 1. The outcomes of an analysis can be classified as (i) true alarms (such as signaling an information leak at line 14 in
Fig. 1); (ii) false alarms, that are not actually problems (such as reporting a warning at line
2); (iii) true negatives (no alarms at non-problematic program points, such as not reporting
a warning at line 2); or (iv) false negatives (missed true problems, such as not reporting an
information leak at line 12). Fig. 2 depicts these concepts: the left side contains the problematic
program points; the right side the non-problematic ones; the upper part is what is reported
by the analysis, while the lower part is what is not reported. Ideally, a static analyzer should
feature true alarms and true negatives only. This, however, is not computable in general [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>A warning should embed contextual information about its meaning. For instance, a warning
might explain where a null value was assigned to a variable that gets dereferenced. A privacy
violation warning might report the source program point where sensitive data was disclosed
and the data flow from the source to the sink. This lets the developer determine if the warning
is a real issue: if the source is not deemed relevant or if the flow is sanitized in the middle, the
developer might decide to ignore the warning. Privacy warnings might also report the API call
that discloses the information (such as URLConnection.connect()) or might specify the exact
kind of sensitive data that is disclosed (such as the longitude and the latitude of the user). For
instance, a static analyzer might be able to see that the method calls at lines 5, 9, and 10 in
Fig. 1 lead to a leak of confidential information, while that at line 2 does not. In addition, it
might report that line 5 discloses the IMEI of the telephone, line 9 the location projected on
the latitude, and line 10 the longitude.</p>
        <p>A sound static analyzer considers all possible executions of a program and consequently does
not miss any true alarm, while possibly still issuing some false alarms. If it does not issue any alarm
about a property, then the program always satisfies that property. That is, sound analyzers
have no false negatives. In Fig. 1, it would report that there is an information leak at line 14
when send is called from lines 5, 9 and 10; it might also report a false information leak alarm
when send is called from line 2, if it does not track context-sensitive information.</p>
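        <p>The four outcomes and the "no false negatives" reading of soundness can be illustrated with a minimal Java sketch. The class AlarmSketch and its methods are hypothetical names introduced only for this illustration; they do not belong to any analyzer discussed in this paper, and the sets of program points are assumed to be given.</p>
        <preformat><![CDATA[
```java
import java.util.Set;

class AlarmSketch {
    enum Outcome { TRUE_ALARM, FALSE_ALARM, TRUE_NEGATIVE, FALSE_NEGATIVE }

    /** Classifies one program point, given the ground truth and the analyzer's verdict. */
    static Outcome classify(boolean problematic, boolean reported) {
        if (reported) {
            return problematic ? Outcome.TRUE_ALARM : Outcome.FALSE_ALARM;
        }
        return problematic ? Outcome.FALSE_NEGATIVE : Outcome.TRUE_NEGATIVE;
    }

    /** True when no problematic point is missed, i.e. there are no false negatives. */
    static boolean missesNothing(Set<Integer> problematic, Set<Integer> reported) {
        return reported.containsAll(problematic);
    }

    public static void main(String[] args) {
        // Assumed ground truth for this sketch: only line 14 is problematic.
        Set<Integer> problematic = Set.of(14);
        Set<Integer> reported = Set.of(2, 14); // one true alarm and one false alarm
        for (int line : new int[] {2, 14}) {
            System.out.println(line + ": "
                + classify(problematic.contains(line), reported.contains(line)));
        }
        System.out.println("misses nothing: " + missesNothing(problematic, reported));
    }
}
```
]]></preformat>
        <p>In this sketch, a report that contains a false alarm still misses nothing, which mirrors the trade-off described above: soundness rules out false negatives, not false alarms.</p>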
        <p>
          Full soundness is still an open problem for real programming languages and platforms such as Java and
Android, which feature complex execution behaviors. It is often compromised because of the
application lifecycle (components of mobile apps use complicated event-based execution models,
triggered by external actors); reflection (applications can use data as code and consequently
perform arbitrary dynamic executions); multithreading (arbitrary interleavings are too many for
the analysis); native code (machine code compiled from any other programming language
can be linked, which the analyzer does not understand); dynamic class-loading (code is loaded
at runtime, unknown at analysis time). For instance, an analyzer unsound w.r.t. the Android
application lifecycle could miss that the operating system calls onLocationChanged in Fig. 1;
the analyzer could consequently miss the leak of the user location. To the best of our knowledge,
current analyzers are all unsound w.r.t. native code (which is platform-dependent), while some
support some of the other four features, to different degrees [
          <xref ref-type="bibr" rid="ref18 ref2 ref3 ref9">2, 9, 18, 3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Current Privacy Analyses</title>
        <p>
          Privacy leak detection in software has been widely studied. It reduces to the detection of
information flows from a source of confidential information (e.g., getDeviceIdentifier) to a sink (e.g.,
operations on Internet connections). From this perspective, it is similar to detecting injections
and XSS vulnerabilities, but sources and sinks are not fixed but rather user-provided. Early
information flow analyses [
          <xref ref-type="bibr" rid="ref19 ref21">19, 21</xref>
          ] considered implicit flows as well (indirect information flows
through program control structures) but have been proved to be too conservative [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Hence
other approaches have been proposed, such as quantitative information flow [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and taint
analysis [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. Several scientific works [
          <xref ref-type="bibr" rid="ref12 ref2 ref26">2, 26, 12</xref>
          ] extend these analyses to mobile (Android) software.
Information disclosure through side channels has been studied as well [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] (e.g., running time
of an algorithm processing confidential data). This necessarily non-exhaustive list shows that
relevant theoretical results exist, whose application has however been limited up to now.
        </p>
        <p>These analyses are the most appealing for GDPR, since they automatically identify leaks
and scale (in terms of efficiency and precision) up to the size of industrial applications. However,
they provide limited feedback: since a source-sensitive flow analysis would be too expensive,
they only track and discover that some sensitive data could be leaked, without further detail;
moreover, they only report the program point where data is leaked, not the complete flow.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Static Analysis for GDPR Compliance</title>
      <p>
        A few years ago, ENISA already underlined [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that it is also possible to rely on formal
(mathematical) methods to prove that, based on appropriate assumptions on the PETs involved, a
given architecture meets the privacy requirements of the system. To comply with accountability
of practice, data controllers must be able to demonstrate that their actual data handling complies
with their obligations. However, the type of control envisaged there was limited to log auditing. We
envision instead applying static analysis as a PET, to check program behaviors w.r.t. sensitive
information at the different stages of the software lifecycle.
      </p>
      <p>One can identify four main actors in this lifecycle, who apply static analysis for GDPR
compliance. The Data Protection Officer (DPO) uses a static analyzer to inspect whether the whole
system respects the privacy constraints identified during design. The DPO needs a very high-level
view of what sensitive data is leaked (such as the user identifier or location) and to whom. At
this level, no technical detail such as precise program points and chains of called methods is
needed. The Chief Information Security Officer (CISO) needs a similar view of the system,
but with detail about which software components leak which data, in order to identify the
project manager who is responsible for leaks unexpected at design time. The Project Manager
(PM) needs precise detail for each component about the program points that generated the
leak (exact sources, sinks and flow of sensitive data inside the program). This lets the Software
Developer (DEV) inspect the code where an alarm is issued. The developer is interested in
the exact flow of sensitive data.</p>
      <p>Higher levels require information that can be abstracted from lower levels. In particular,
the precise flows of sensitive data at DEV level, belonging to the same subcomponent, can
be abstracted (removing the exact program points accessing and leaking sensitive data) and
aggregated into a unique report at PM level. As a further approximation, many program
points can be abstracted into few sources of sensitive data (since the same information can
be accessed at different program points and in different ways), leading to a higher-level and more
readable representation at CISO level. Finally, information can be aggregated to represent
privacy leaks of the whole system at DPO level.</p>
      <p>Therefore, we start by considering how the exact flow of sensitive data can be reconstructed
from the results of a taint analysis and then present how these flows can be abstracted into the
information needed at the different levels.</p>
      <sec id="sec-4-1">
        <title>Reconstructing Sensitive Data Flow from Taint Analysis Results</title>
        <p>Let us start by considering how one can obtain the information necessary at DEV level, by
using static analysis. As discussed in Sec. 3, taint analysis can be instantiated to report the
program points that could leak sensitive information. To achieve this, taint analysis typically
tracks, for each program statement, the local variables and heap locations that might be tainted,
that is, could contain sensitive information. It is too expensive to track precisely the source of
sensitive data during the analysis, but one can reconstruct full data flows backwards, starting
from the sinks.</p>
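        <p>To make the notion of per-statement taint state concrete, the following Java sketch propagates taint through a straight-line method body. It is a strong simplification and not the analysis used in this paper: the mini-IR (class Stmt), the class TaintSketch and the encoding of onStart are hypothetical, heap locations and control flow are ignored, and each call to send is treated directly as a sink, whereas an interprocedural analysis would report the warning at line 14 of Fig. 1.</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TaintSketch {
    /** One straight-line statement "target = op(uses)" of a hypothetical mini-IR. */
    static final class Stmt {
        final int line;
        final String target;      // assigned local variable, or null if none
        final List<String> uses;  // local variables read by the statement
        final boolean source;     // does the statement read sensitive data?
        final boolean sink;       // does the statement send data out?

        Stmt(int line, String target, List<String> uses, boolean source, boolean sink) {
            this.line = line;
            this.target = target;
            this.uses = uses;
            this.source = source;
            this.sink = sink;
        }
    }

    /** Returns the lines of the sinks that may receive tainted data. */
    static List<Integer> warnings(List<Stmt> body) {
        Set<String> tainted = new HashSet<>(); // analysis state: tainted locals
        List<Integer> result = new ArrayList<>();
        for (Stmt s : body) {
            boolean readsTaint = !Collections.disjoint(s.uses, tainted);
            if (s.sink && readsTaint) {
                result.add(s.line);
            }
            if (s.target != null) {
                if (s.source || readsTaint) {
                    tainted.add(s.target);
                } else {
                    tainted.remove(s.target); // overwritten with untainted data
                }
            }
        }
        return result;
    }

    /** Body of onStart from Fig. 1, with each call to send treated as a sink. */
    static List<Stmt> onStart() {
        return List.of(
            new Stmt(2, null, List.of(), false, true),
            new Stmt(3, "telephonyManager", List.of(), false, false),
            new Stmt(4, "imei", List.of("telephonyManager"), true, false),
            new Stmt(5, null, List.of("imei"), false, true));
    }

    public static void main(String[] args) {
        System.out.println(warnings(onStart()));
    }
}
```
]]></preformat>
        <p>Under these assumptions, only the call at line 5 is flagged, since the call at line 2 passes no tainted variable. The sketch records that a sink receives tainted data, but not which source it came from, which is why a separate backward step is needed to recover the flow.</p>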
        <p>
          Namely, for each statement flagged by a warning, one can apply backward slicing [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] that, relying on taint
information, tracks how the tainted data flows inside the program. At the end of this step, we
obtain many flows of sensitive information, since the data could flow from different sources (and
through different paths) to the sink.
        </p>
        <sec id="sec-4-1-1">
          <title>Formalization</title>
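          <p>Let St and W be the sets of all program statements and warnings, respectively. Each warning
refers to exactly one program statement (formally, <inline-formula><tex-math>\mathit{getStatement} : W \to \mathit{St}</tex-math></inline-formula>). We assume that
each program <inline-formula><tex-math>p \in P</tex-math></inline-formula> (where P is the set of all programs) is composed of a set of methods
(formally, <inline-formula><tex-math>P = \wp(\mathit{Me})</tex-math></inline-formula> where Me is the set of all methods), and each method is represented
as a control flow graph (CFG) whose nodes are statements (<inline-formula><tex-math>\mathit{Me} = \mathit{CFG}(\mathit{St})</tex-math></inline-formula>, where <inline-formula><tex-math>\mathit{CFG}(\mathit{St})</tex-math></inline-formula>
represents a control flow graph of statements). On this CFG, taint analysis, given a program,
infers a set of warnings, and an entry and exit state for each program statement. Formally,
<inline-formula><tex-math>\mathit{taint} : P \to (\wp(W) \times \Phi)</tex-math></inline-formula>, where <inline-formula><tex-math>\Sigma</tex-math></inline-formula> is the set of states of the taint analysis reporting the local
variables that are tainted, and <inline-formula><tex-math>\Phi = \mathit{St} \to (\Sigma \times \Sigma)</tex-math></inline-formula> represents the results (that is, entry and exit
states) obtained by the taint analysis on each statement of the program.</p>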
          <p>
            We can then reconstruct the flow of sensitive data by computing a backward slice [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]
(relying also on the information inferred by the taint analysis), starting from the results of the
taint analysis and the warning containing the statement and local variable that are leaked to
the sink (formally, <inline-formula><tex-math>\mathit{backwardSlice} : (W \times \Phi) \to \wp(S)</tex-math></inline-formula>, where <inline-formula><tex-math>\Phi</tex-math></inline-formula> denotes the results of the taint analysis, LV is the set of local variables, and S
is the set of slices over a program).
          </p>
          <p>Running Example: Consider the running example we introduced in Section 1.1. We assume that
the location (received by the listener onLocationChanged at line 8) and the user IMEI (read
by calling TelephonyManager.getDeviceId at line 4) are considered as sources of sensitive
data, while sending information (through a call to URL.openConnection().connect() at line
14) is a sink. On this program, the taint analysis produces one warning at line 14 reporting
that some sensitive data might be leaked. We then apply backward slicing to this program
point and we obtain three distinct slices: (i) a slice from line 4 to 5 and then 14 (formalized
by <inline-formula><tex-math>s_1 = 4 \to 5 \to 14</tex-math></inline-formula>) representing the leakage of the identifier, (ii) <inline-formula><tex-math>s_2 = 9 \to 14</tex-math></inline-formula> (leakage
of latitude), and (iii) <inline-formula><tex-math>s_3 = 10 \to 14</tex-math></inline-formula> (leakage of longitude). Note that the backward slicing
is in a position to discard the flow coming from the call to send at line 2 (that is, the slice
<inline-formula><tex-math>2 \to 14</tex-math></inline-formula>) by checking that this method call does not pass tainted (i.e., sensitive) data through
the parameters.</p>
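          <p>The slicing step of the running example can be sketched in Java as a backward walk over data dependences that follows only tainted statements and stops at sources. This is an illustrative sketch and not the slicing algorithm of [13]: the class BackwardSliceSketch is hypothetical, the dependence edges, the tainted lines and the sources are hard-coded from Fig. 1 instead of being computed, and the dependence graph is assumed to be acyclic.</p>
          <preformat><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BackwardSliceSketch {
    // Backward dependences of Fig. 1: line -> lines whose values flow into it.
    static final Map<Integer, List<Integer>> FLOWS_FROM = Map.of(
        5, List.of(4),
        14, List.of(2, 5, 9, 10));

    // Lines that may carry tainted data, as a taint analysis would report them.
    static final Set<Integer> TAINTED = Set.of(4, 5, 9, 10, 14);

    // Lines where sensitive data enters the flow, following the slices s1, s2, s3.
    static final Set<Integer> SOURCES = Set.of(4, 9, 10);

    /** Returns every tainted path, in source-to-sink order, that reaches the sink. */
    static List<List<Integer>> backwardSlice(int sink) {
        List<List<Integer>> slices = new ArrayList<>();
        walk(sink, new ArrayDeque<>(), slices);
        return slices;
    }

    private static void walk(int line, Deque<Integer> path, List<List<Integer>> out) {
        if (!TAINTED.contains(line)) {
            return; // untainted flows, such as the call at line 2, are discarded
        }
        path.push(line); // push prepends, so the path reads from source to sink
        if (SOURCES.contains(line)) {
            out.add(new ArrayList<>(path));
        } else {
            for (int pred : FLOWS_FROM.getOrDefault(line, List.of())) {
                walk(pred, path, out);
            }
        }
        path.pop();
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(14));
    }
}
```
]]></preformat>
          <p>Starting from the sink at line 14, the sketch yields the three paths 4, 5, 14; 9, 14; and 10, 14, which correspond to the slices of the running example, while the flow from line 2 is dropped because that line is not tainted.</p>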
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Reporting</title>
        <p>We now discuss four different levels of reporting targeting the four main actors we identified
(namely, data protection officers, chief information security officers, project managers, and
developers). Starting from the information inferred by static analysis through taint analysis and
backward slicing (as formalized in Section 4.1), we present these reports as further abstractions
starting from the one that contains the most detailed information (that is, the report for
developers).</p>
        <p>DEV</p>
        <p>The developer needs to know the full details about flows of sensitive data, to determine whether a flow
is real and which sensitive data is involved. The slices computed by the backward slicing
algorithm contain the information needed at DEV level since, for each flow of sensitive data
to a sink, they report the complete detail of the flow. Hence, one can formalize the DEV report as
follows:</p>
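        <p><disp-formula><tex-math>\mathrm{DEV} : P \to \wp(S)</tex-math></disp-formula>
<disp-formula><tex-math>\mathrm{DEV}(p) = \bigcup_{w \in W_1} \mathit{backwardSlice}(w, \phi) \quad \text{where } \mathit{taint}(p) = (W_1, \phi)</tex-math></disp-formula></p>
        <p>Running Example: The DEV report contains the full details of the three slices introduced in
Section 4.1. In this way, the developer can inspect the complete flow of sensitive data.</p>
        <p>PM</p>
        <p>The project manager is interested in knowing which subcomponents generated the flows of
sensitive data, with complete detail of the sources and sinks at source code level. The goal is to
identify the developers who implemented such pieces of code. One can abstract DEV into PM
by (i) aggregating all leaks whose source is in the same component; and (ii) abstracting the
flows into pairs composed of a source and a sink. Assume that a function <inline-formula><tex-math>\mathsf{component} : \mathit{St} \to C</tex-math></inline-formula>
specifies which component a statement belongs to, and functions <inline-formula><tex-math>\mathsf{source} : S \to \wp(\mathit{Source})</tex-math></inline-formula> and
<inline-formula><tex-math>\mathsf{sink} : S \to \mathit{Sink}</tex-math></inline-formula> return the possible sources and the sink of a given slice (where <inline-formula><tex-math>\mathit{Source} \subseteq \mathit{St}</tex-math></inline-formula> and
<inline-formula><tex-math>\mathit{Sink} \subseteq \mathit{St}</tex-math></inline-formula>), respectively. One can project DEV (that is, a set of slices) into PM as follows:</p>
        <p><disp-formula><tex-math>\mathrm{PM} : \wp(S) \to (C \to \wp(\mathit{Source} \times \mathit{Sink}))</tex-math></disp-formula></p>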
        <p><disp-formula><tex-math>\mathrm{PM}(S_1) = \Big[\, c \mapsto \bigcup_{s \in S_1,\ \mathit{source} \in \mathsf{source}(s)} \{ (\mathit{source}, \mathit{sink}) : \mathit{sink} = \mathsf{sink}(s) \wedge \mathsf{component}(\mathit{source}) = c \} \Big]</tex-math></disp-formula></p>
        <p>Running Example: Let us assume that the running example in Figure 1 is made of two
components: st concerns the start of the application (method onStart), while loc deals with location
updates (method onLocationChanged). Therefore, the PM report will relate (i) component
st to the singleton {(4, 14)} (representing the leakage of the user identifier), and (ii) component loc
to the set {(9, 14), (10, 14)} (representing the leakage of latitude and longitude, respectively).</p>
        <sec id="sec-4-2-1">
          <title>CISO</title>
          <p>High-level managers such as the Chief Information Security Officer would be interested in
knowing which subcomponent of the system accessed and leaked which sensitive data. Hence,
as a further abstraction, one can aggregate sources and sinks by projecting them into their types.
Assume that functions <inline-formula><tex-math>\mathsf{sourceType} : \mathit{Source} \to \mathit{SourceTypes}</tex-math></inline-formula> and <inline-formula><tex-math>\mathsf{sinkType} : \mathit{Sink} \to \mathit{SinkTypes}</tex-math></inline-formula>,
which translate sources and sinks to their types, respectively, are provided. The projection of PM into
CISO is defined as follows:</p>
          <p><disp-formula><tex-math>\mathrm{CISO} : (C \to \wp(\mathit{Source} \times \mathit{Sink})) \to (C \to \wp(\mathit{SourceTypes} \times \mathit{SinkTypes}))</tex-math></disp-formula></p>
          <p><disp-formula><tex-math>\mathrm{CISO}(p) = \Big[\, c \mapsto \bigcup_{(\mathit{source}, \mathit{sink}) \in p(c)} \{ (\mathsf{sourceType}(\mathit{source}), \mathsf{sinkType}(\mathit{sink})) \} : c \in \mathsf{dom}(p) \Big]</tex-math></disp-formula></p>
          <p>Running Example: Let us assume that longitude and latitude are both abstracted to the same
source type called Location (since they both represent very precise geographical information),
while the user identifier is represented by the source type IMEI. Instead, the sinks that transmit
data over the Internet are represented by Internet. Therefore, the CISO report applied to our
running example of Figure 1 will relate component st to {(IMEI, Internet)}, and component
loc to {(Location, Internet)}. This report abstracts away the implementation details about the
method calls accessing and leaking the sensitive data, as well as the flow of such data inside
the program. However, it can be understood by actors in the software lifecycle that are not
involved in the technical implementation details.</p>
          <p>DPO</p>
          <p>The data protection officer needs a high-level and legal view of how the whole software system
deals with sensitive data. Hence, one can project away the information about components from
CISO and produce the DPO information. Formally,</p>
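          <p><disp-formula><tex-math>\mathrm{DPO} : (C \to \wp(\mathit{SourceTypes} \times \mathit{SinkTypes})) \to \wp(\mathit{SourceTypes} \times \mathit{SinkTypes})</tex-math></disp-formula></p>
          <p><disp-formula><tex-math>\mathrm{DPO}(t) = \bigcup_{c \in \mathsf{dom}(t)} \{ (\mathit{sourceType}, \mathit{sinkType}) : (\mathit{sourceType}, \mathit{sinkType}) \in t(c) \}</tex-math></disp-formula></p>
          <p>Running Example: This last level of abstraction merges together the two components tracked
by the CISO into the same set, producing {(IMEI, Internet), (Location, Internet)}. Through this
information one might check if the information inferred by the static analysis matches what
was expected by the system design.</p>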
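          <p>The chain of projections from DEV to PM, CISO and DPO can be sketched in Java on the running example. This is only an illustration under stated assumptions: the class GdprReportSketch is hypothetical, the functions component, sourceType and sinkType are hard-coded for Fig. 1, slices are lists of line numbers, and each slice is assumed to have exactly one source (its first line), whereas the formalization allows a set of sources per slice.</p>
          <preformat><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GdprReportSketch {
    // Hypothetical configuration for Fig. 1: lines 1-6 form component "st", the rest "loc".
    static String component(int line) { return line <= 6 ? "st" : "loc"; }
    static String sourceType(int line) { return line == 4 ? "IMEI" : "Location"; }
    static String sinkType(int line) { return "Internet"; }

    /** PM: groups (source, sink) line pairs by the component of the source. */
    static Map<String, Set<List<Integer>>> pm(List<List<Integer>> slices) {
        Map<String, Set<List<Integer>>> report = new LinkedHashMap<>();
        for (List<Integer> slice : slices) {
            int source = slice.get(0);               // assumed: one source per slice
            int sink = slice.get(slice.size() - 1);
            report.computeIfAbsent(component(source), c -> new LinkedHashSet<>())
                  .add(List.of(source, sink));
        }
        return report;
    }

    /** CISO: replaces program points with the types of sources and sinks. */
    static Map<String, Set<List<String>>> ciso(Map<String, Set<List<Integer>>> pmReport) {
        Map<String, Set<List<String>>> report = new LinkedHashMap<>();
        pmReport.forEach((c, pairs) -> {
            Set<List<String>> types = new LinkedHashSet<>();
            for (List<Integer> pair : pairs) {
                types.add(List.of(sourceType(pair.get(0)), sinkType(pair.get(1))));
            }
            report.put(c, types);
        });
        return report;
    }

    /** DPO: forgets the components and merges all typed pairs into one set. */
    static Set<List<String>> dpo(Map<String, Set<List<String>>> cisoReport) {
        Set<List<String>> report = new LinkedHashSet<>();
        cisoReport.values().forEach(report::addAll);
        return report;
    }

    public static void main(String[] args) {
        // DEV report of the running example: the three slices s1, s2, s3.
        List<List<Integer>> dev = List.of(List.of(4, 5, 14), List.of(9, 14), List.of(10, 14));
        System.out.println("PM:   " + pm(dev));
        System.out.println("CISO: " + ciso(pm(dev)));
        System.out.println("DPO:  " + dpo(ciso(pm(dev))));
    }
}
```
]]></preformat>
          <p>On the three slices of the running example, the sketch relates st to the pair (4, 14) and loc to the pairs (9, 14) and (10, 14) at PM level, and ends with the pairs (IMEI, Internet) and (Location, Internet) at DPO level. Each level is computed only from the one below it, which reflects the idea that higher-level reports are abstractions of lower-level ones.</p>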
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>Summary</title>
        <p>Let us summarize the main components needed to develop the four GDPR reports (DEV, PM,
CISO and DPO). First of all, one needs a set of sources Source and sinks Sink. For injection
analysis these sets are usually fixed (for instance, methods reading servlet inputs are sources and SQL
methods are sinks for SQL injection), whereas for GDPR we expect that these sets will be specific to
the software system under analysis, since different software systems rely on APIs providing sensitive
data in various ways. In addition, one needs the abstraction of software sources and sinks (such
as getDeviceId and connect in Fig. 1) to types of sensitive data (device
identifier) and data leaks (leakage to &lt;site&gt;). These components are formalized by functions
sourceType and sinkType. In any case, the GDPR requires identifying how sensitive data is
accessed (that is, the set of sources together with their abstraction) and leaked (sinks). Finally,
one needs to know how the software system is split into components (function component),
information that is commonly available for all structured software systems.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>This paper discussed how static analysis can be applied for GDPR compliance. In particular,
it introduced some aspects of the EU General Data Protection Regulation, and existing
applications of static analysis to privacy properties. It then identified four main actors (developers,
project managers, chief information security officers, and data protection officers) that could be
involved at various stages of GDPR compliance, and formalized how combining existing static
analyses and information can lead to useful reports for GDPR compliance. The final vision is
that static analyzers can be applied to help GDPR compliance by re-using existing static
analyses (in particular, taint analysis and backward slicing) and information (about sources and
sinks, and software components).</p>
      <p>
        These ideas are currently being implemented into the Julia static analyzer [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], which already
contains an industrial implementation of taint analysis [
        <xref ref-type="bibr" rid="ref4 ref8">8, 4</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>EU Regulation 2016/679 of the European Parliament and of the Council</article-title>
          ,
          <year>April 2016</year>
          . http://ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arzt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rasthofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fritz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bodden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bartel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Le Traon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Octeau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>McDaniel</surname>
          </string-name>
          .
          <article-title>FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps</article-title>
          .
          <source>In Proceedings of PLDI '14. ACM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bodden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sewe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sinschek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Oueslati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mezini</surname>
          </string-name>
          .
          <article-title>Taming Reflection: Aiding Static Analysis in the Presence of Reflection and Custom Class Loaders</article-title>
          .
          <source>In Proceedings of ICSE '11. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Burato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Spoto</surname>
          </string-name>
          .
          <article-title>Security Analysis of the OWASP Benchmark with Julia</article-title>
          .
          <source>In Proceedings of ITASEC '17</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cavoukian</surname>
          </string-name>
          .
          <article-title>Privacy by Design - The 7 Foundational Principles</article-title>
          ,
          <year>January 2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>ENISA</surname>
          </string-name>
          .
          <article-title>Privacy and Data Protection by Design</article-title>
          ,
          <year>December 2014</year>
          . https://www.enisa.europa.eu/publications/privacy-and-data-protection-by-design.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>ENISA</surname>
          </string-name>
          .
          <article-title>Readiness Analysis for the Adoption and Evolution of Privacy Enhancing Technologies</article-title>
          ,
          <year>March 2016</year>
          . https://www.enisa.europa.eu/publications/pets.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ernst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lovato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Macedonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Spiridon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Spoto</surname>
          </string-name>
          .
          <article-title>Boolean Formulas for the Static Identification of Injection Attacks in Java</article-title>
          .
          <source>In Proceedings of LPAR '15, Lecture Notes in Computer Science</source>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferrara</surname>
          </string-name>
          .
          <article-title>Static analysis via abstract interpretation of multithreaded programs</article-title>
          .
          <source>PhD thesis</source>
          , École Polytechnique of Paris (France) and University "Ca' Foscari" of Venice (Italy), May
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fischer-Hübner</surname>
          </string-name>
          .
          <article-title>IT-security and Privacy: Design and Use of Privacy-enhancing Security Mechanisms</article-title>
          . Springer-Verlag, Berlin, Heidelberg,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Gartner</surname>
          </string-name>
          .
          <article-title>Gartner says 8.4 billion connected "Things" will be in use in 2017, up 31 percent from 2016</article-title>
          ,
          <year>February 2017</year>
          . http://www.gartner.com/newsroom/id/3598917.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gibler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Crussell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Erickson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>AndroidLeaks: Automatically Detecting Potential Privacy Leaks in Android Applications on a Large Scale</article-title>
          .
          <source>In Proceedings of TRUST '12. Springer-Verlag</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Horwitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Reps</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Binkley</surname>
          </string-name>
          .
          <article-title>Interprocedural slicing using dependence graphs</article-title>
          .
          <source>ACM TOPLAS</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>26</fpage>
          –
          <lpage>60</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] International Organization for Standardization.
          <source>Space systems – Dynamic and static analysis – Exchange of mathematical models</source>
          ,
          <year>2005</year>
          . ISO 14954:2005 standard.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] International Organization for Standardization.
          <source>Road vehicles – Functional safety</source>
          ,
          <year>2011</year>
          . ISO 26262 standard.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hicks</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaeger</surname>
          </string-name>
          .
          <article-title>Implicit Flows: Can't Live with 'Em, Can't Live without 'Em</article-title>
          .
          <source>In Proceedings of ICISS '08</source>
          . Springer-Verlag,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Köpf</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Basin</surname>
          </string-name>
          .
          <article-title>An Information-theoretic Model for Adaptive Side-channel Attacks</article-title>
          .
          <source>In Proceedings of CCS '07. ACM</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miné</surname>
          </string-name>
          .
          <article-title>Static Analysis of Run-Time Errors in Embedded Critical Parallel C Programs</article-title>
          .
          <source>In Proceedings of ESOP '11</source>
          . Springer-Verlag,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Myers</surname>
          </string-name>
          .
          <article-title>JFlow: Practical Mostly-static Information Flow Control</article-title>
          .
          <source>In Proceedings of POPL '99. ACM</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Rice</surname>
          </string-name>
          .
          <article-title>Classes of recursively enumerable sets and their decision problems</article-title>
          .
          <source>Trans. Amer. Math. Soc.</source>
          ,
          <volume>74</volume>
          :
          <fpage>358</fpage>
          –
          <lpage>366</lpage>
          ,
          <year>1953</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sabelfeld</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Myers</surname>
          </string-name>
          .
          <article-title>Language-based Information-flow Security</article-title>
          .
          <source>IEEE J. Sel. A. Commun.</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          –
          <lpage>19</lpage>
          , Sept.
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>On the Foundations of Quantitative Information Flow</article-title>
          .
          <source>In Proceedings of FOSSACS '09</source>
          . Springer-Verlag,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>F.</given-names>
            <surname>Spoto</surname>
          </string-name>
          .
          <article-title>The Julia Static Analyzer for Java</article-title>
          .
          <source>In Proceedings of SAS '16, Lecture Notes in Computer Science</source>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Tipton</surname>
          </string-name>
          .
          <article-title>Official (ISC)2 Guide to the CISSP CBK</article-title>
          .
          <source>Auerbach Publications, 2nd edition</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>O.</given-names>
            <surname>Tripp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pistoia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Weisman</surname>
          </string-name>
          .
          <article-title>TAJ: Effective Taint Analysis of Web Applications</article-title>
          .
          <source>In Proceedings of PLDI '09. ACM</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>LeakMiner: Detect Information Leakage on Android with Static Taint Analysis</article-title>
          .
          <source>In Proceedings of WCSE '12. IEEE Computer Society</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>