=Paper=
{{Paper
|id=Vol-3776/paper10
|storemode=property
|title=Health data leaks to third parties in web-based health services
|pdfUrl=https://ceur-ws.org/Vol-3776/shortpaper10.pdf
|volume=Vol-3776
|authors=Sampsa Rauti,Robin Carlsson,Samuli Laato,Timi Heino,Panu Puhtila,Ville Leppänen
|dblpUrl=https://dblp.org/rec/conf/tktp/RautiCLHPL24
}}
==Health data leaks to third parties in web-based health services==
Sampsa Rauti¹,*, Robin Carlsson¹, Samuli Laato², Timi Heino¹, Panu Puhtila¹ and Ville Leppänen¹

¹ University of Turku, Vesilinnantie 5, 20500 Turku, Finland

² Tampere University, Kalevantie 4, 33100 Tampere, Finland
Abstract

Today, users may share sensitive health data on web-based health services. We rely on these services to keep our data safe and secure, but this is not always the case. Therefore, this study investigates the privacy of a snapshot of 10 Finnish web-based health services and analyzes their health data leaks. We show that all analyzed services leaked at least some kind of personal data to third parties, ranging from the topics of visited pages to details of appointment bookings. While the situation has improved since we notified the health service providers about this issue, the study serves as a reminder of the ongoing challenges in protecting user privacy in online health services and highlights the pressing need to address these issues.
Keywords
Medical websites, data leaks, data concerning health, web privacy, third-party services
TKTP 2024: The Annual Symposium of Computer Science, June 10-11, 2024, Vaasa, Finland

Corresponding author: Sampsa Rauti (*).

Email: sjprau@utu.fi (S. Rauti); crcarl@utu.fi (R. Carlsson); samuli.laato@tuni.fi (S. Laato); tdhein@utu.fi (T. Heino); papuht@utu.fi (P. Puhtila); ville.leppanen@utu.fi (V. Leppänen)

ORCID: 0000-0002-1891-2353 (S. Rauti); 0009-0003-7255-0239 (R. Carlsson); 0000-0003-4285-0073 (S. Laato); 0009-0008-4798-5261 (T. Heino); 0009-0004-6418-1063 (P. Puhtila); 0000-0001-5296-677X (V. Leppänen)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Web-based health services have become a vital part of essential electronic services [1]. Booking appointments, viewing personal health information and test results, and searching for health-related information can be conveniently carried out online. Many web-based healthcare services, such as medical centers' websites, process sensitive personal information concerning health. Due to the sensitivity of this data, it is critical to ensure that it remains confidential and does not leak to third parties [2].

However, previous research has demonstrated that numerous third-party services and components, such as web analytics, are commonly used across websites and services, regardless of their sensitivity requirements [3, 4]. Such services make it more convenient to monitor business goals and improve user experience, but at the same time they create a risk that sensitive information leaks through them. This typically happens without users' knowledge, and also unbeknownst to website developers and maintainers.

This study conducts an in-depth examination of the privacy of 10 web-based health services. We present an overview of health data leaks, an issue that likely affects an even larger group of web-based health services. Our study focuses specifically on the privacy and confidentiality of Finnish web-based health services. Hence, we address the following research question: Do web-based healthcare services leak sensitive data related to an individual user's health status? This paper provides an analysis and discussion of the privacy threats associated with integrating third-party services into web-based health services.

The rest of the paper is organized as follows. Section 2 reviews related work on the privacy of medical websites. Section 3 outlines the study setting and method, describing how the studied websites were selected and how the network traffic analysis was performed. Section 4 presents the results of our network traffic analysis and explores the data leaks found. Section 5 discusses our key findings and their implications. Section 6 concludes the paper.

2. Related work

In recent years, a number of papers pertinent to our research have been published. Huo et al. [5] analyzed 459 health-related web portals and found that Google Analytics was used in 14% of them. Sensitive health data leaks were present on 9 websites, and details such as prescribed medicines and laboratory results were transferred to third parties. Libert [6] investigated the problem of health data contained in URLs leaking to third parties. Zheutlin et al. [7] studied user data tracking through third-party cookies on US-based government, non-profit, and commercial health-related websites, but did not go into detail about what personal data is sent to third parties.

Friedman et al. [8] discussed the risks of third-party tracking technologies on hospital websites, highlighting potential legal liabilities. Yu et al. [9] conducted a large-scale automated survey of hospital websites around the world, revealing that 53.5% of them employed tracking tools that collected user data. Friedman et al. [10] examined the prevalence of third-party tracking tools on abortion clinic websites and concluded that nearly all of them (99.1%) used some form of tracking tool that leaked user data to third parties. Surani et al. [11] found clear deficiencies in the privacy policies of web-based health services.

Huesch [12] points out that searching for and accessing free health-related information online raises privacy concerns, including the potential for information on a user's health to be used for profiling and targeted advertising. Wesselkamp et al. [13] studied 385 medical websites in the EU. They found that 62% used tracking tools before obtaining user consent for data collection and that 15% tracked the user even after consent was rejected. Kes et al. [14] argue that collecting users' health data on websites, despite privacy concerns, can lead to an improved user experience akin to a personalized customer relationship. Still, the actual benefits are debatable, and transferring health data to third parties to improve targeted advertising is highly problematic in light of the GDPR.

Compared to many earlier studies, the current study conducts a more in-depth examination of the types of personal data that web-based health services leak to third parties in different scenarios. We show that the presence of third-party analytics in web-based health services remains a significant problem, despite having been addressed in research well over ten years ago [15].

3. Study Setting and Method

We selected 10 Finnish web-based health services for closer inspection in this study. We chose the websites of several important healthcare providers in Finland, such as medical centers, therapy houses, and laboratories. We searched for healthcare providers using the Google search engine with the keywords "lääkärikeskus" (medical center), "terapia" (therapy), and "laboratorio" (laboratory). Instead of analyzing a large number of health services, our study examines the network traffic of these services more thoroughly, covering various usage scenarios in which sensitive health data processed by the web services can leak to third parties. We examined the data leaks in the chosen services twice: first in December 2022, and again in February 2024 after the service providers had been informed of the issue.

It is important to note that we aim to address privacy challenges at a general level and to avoid casting the affected health service providers in a negative light. To adhere to ethical research practices, the chosen web services are not referred to by their actual names but are denoted by the abbreviations WS1–WS10.

In our test sequence, the browser cache was first cleared and cookies were deleted, and then the front page of the health service under examination was opened. On the front page, all cookies and data collection were accepted. While the health service was in use, all network traffic was recorded using the Google Chrome browser developer tools (DevTools). The network traffic recordings were saved as HAR (HTTP Archive) files for more detailed analysis. We manually examined the log files, searched through the HTTP request payloads, and meticulously documented all instances of personal data. Here we considered two distinct categories of personal data:

• Identifying data, capable of uniquely identifying the website user, such as IP addresses, User-Agent strings, and device-specific identifiers. Identification may also be possible through a combination of technical details, including operating system or browser details, window size, etc.

• Sensitive contextual data, for example a URL containing a sensitive search term used on a medical website, or details of a booked appointment. Although this kind of sensitive contextual data is often contained in URLs sent to a third party, it may also appear elsewhere in the HTTP request payload.

What makes data leaks dangerous is the combination of these two categories: identifying a user by, e.g., their IP address and then combining this with sensitive contextual data, such as details of a doctor's appointment. This enables third parties to infer, for example, a user's potential medical conditions. It is also worth noting that while identifying personal data such as an IP address cannot always be immediately linked to a person's identity (real name), large technology companies such as Google and Meta often have the capability to fully identify the user, as users may use the same device to log in to other services run by these companies.

Four common usage scenarios in which health data could leak to third parties were recorded while using the health services. The chosen scenarios were key functionalities of the web-based health services that involved processing sensitive personal data, and the scenarios varied depending on the tested service. Network traffic was recorded when 1) booking an appointment, 2) viewing personal information, 3) using the search function, and 4) accessing information pages.

For the appointment booking scenario, network traffic was recorded from clicking the appointment link on the front page up to the final stage of making the appointment. In other words, the test was concluded before the appointment was finally confirmed. In this scenario, an appointment was scheduled with a specific specialist (such as a doctor or therapist). If the tested health service offered the option, we also conducted a separate test for booking an appointment for a specific procedure or service (e.g., a COVID-19 test or influenza vaccination).

The second scenario, viewing personal information, refers to the section of the web service behind authentication. In this section, users can usually review their own prescriptions, test results, vaccinations, or previous appointments. In this scenario, we investigated whether data leaks occur when the user views different types of personal information. For example, information about laboratory results and previous appointments could potentially be disclosed to third parties.

We also examined possible leaks when using the search functions of the studied web services. The leakage of search terms to third parties can be particularly dangerous, because users may enter highly sensitive terms, such as the name of a specific disease or symptom. If user-defined search terms are transmitted to third parties, these external actors could build a detailed profile of the user's assumed health status and medical history.

The fourth usage scenario was related to information pages within the web services, which often contain information about specific diseases. It can be problematic if information about the pages a user browses is sent to third parties, as users can be profiled based on this information. Such profiling can be especially effective over a longer period of time.

4. Results

Figure 1 displays the information leaked to third parties on the studied websites (December 2022). Each cell in Figure 1 indicates a leak of a specific type of information in a specific health service, and the number in the cell indicates how many third parties the information was leaked to. For example, information about initiating an appointment booking was leaked to 5 different third parties in WS1.

A common data leak pertained to the use of the appointment booking function. Even though the appointment booking process was not completed in this study, information about initiating this process indicates the user's intention to make a booking. In all services except one (WS7), information about initiating the appointment booking process leaked to at least one third party. In three services, details about entering specific stages of the appointment booking process (e.g., selecting a time for the appointment or entering personal information) also leaked. Leaking any information about the appointment booking process is a problem because it strongly indicates a relationship between the patient and the healthcare provider. This kind of relationship must be kept confidential according to the Finnish Deputy
Ombudsman¹. Seven of the studied web services leaked additional information about appointments to third parties. These included the selected clinic location (3 web services), appointment date (3), appointment time (1), the name of the specialist (e.g., doctor) (3), the specialist's field of expertise (2), and whether the appointment was made as a private or occupational health customer (2). The selected service (e.g., influenza vaccination, COVID-19 test, or STD test) also leaked on three of the studied websites. In one case (WS10), the region (e.g., Central Finland) leaked instead of the exact clinic location.

¹ https://yle.fi/a/3-11213545

Figure 1: Data leaked in the web-based health services in December 2022.

The information transmitted to third parties about the initiation of an appointment booking is problematic by itself, because it implies a relationship between a patient and a healthcare provider. Details about the reserved health service or the doctor's name reveal the nature of this relationship even more precisely. It is also important to understand that a third party can often track a specific individual's online activities over a long period of time. As information about multiple appointments accumulates, a clear picture of the patient's treatment and health status begins to emerge.

Figure 1 also shows how users' searches were tracked. Notably, in all seven cases where a health service website had a search function, potentially sensitive search terms were transferred to at least one third party, and in the worst cases (WS4 and WS8), to as many as four separate analytics services.

In all 10 examined health services, the URLs of information pages opened by the user were delivered to at least one third party. In the case of one service (WS2), the URL was sent to six third parties. Of course, viewing an information page about a specific illness does not necessarily imply that the visitor has that illness or even suspects it. However, exposing sensitive browsed pages to multiple third-party analytics services is undesirable.

Lastly, in our experiments we found no data leaks when viewing personal information such as laboratory results after logging in to the studied services. This suggests that these more sensitive sections of the health services have been implemented with a privacy-by-design approach in mind.

To sum up, the findings in Figure 1 are concerning: in each examined health service, information leaked to third parties from the appointment booking page, the search function, or, in most cases, both. These pieces of information, possibly combined with the pages the user browsed, can give a third party an accurate picture of the user's current health in just one visit.

Figure 2 shows the most common third parties (those with two or more instances) present in the studied health services in December 2022. Google Analytics and Meta Pixel were the most common: Google appeared in every single service and Meta in 8 services out of 10. The average number of third parties per health service was 5.2, which we consider a large number for websites processing such sensitive data. WS1 had a staggering 9 third parties, with WS2 and WS6 close behind at 8 each.

After discovering the data leaks in December 2022, we informed the studied healthcare providers about the issue. Figure 3 shows the updated status of data leaks in February 2024. The number of data leaks has decreased: the sum of all data leaks in Figure 1 is 116, whereas the corresponding sum in Figure 3 is 70. However, this number is still very disappointing. Figure 3 clearly shows that revealing the initiation of the appointment booking process and leaking viewed pages and search terms to third parties are still significant issues in the majority of the studied health services, although the number of leaks has gone down. It is also surprising that highly sensitive information, such as the selected health service or the name of the specialist the patient is going to see, is still being leaked. Only a single service, WS5, has completely removed third-party web analytics and eliminated data leaks.

5. Discussion

While the sensitivity of the data leaked by the studied services ranged from visited information pages (less sensitive) to details of booked appointments (highly sensitive), this data is still often directly related to the visitor's health status [6]. Also, even though the dataset we collected for the current study is not large, the finding that all of the analyzed web services leaked personal data to third parties cannot simply be dismissed. Although the situation has improved with time, web-based health services in Finland
still appear to have many privacy challenges. Regrettably, it is highly likely that these issues extend well beyond the scope of the websites we examined.

Figure 2: The most common third-party services present in the web-based health services in December 2022. Each third party has only been counted once for each web service.

Figure 3: Data leaked in the web-based health services in February 2024.

Compared to many other studies (e.g., [5]), we found a high number of data leaks and observed that these leaks were widespread among the services we studied. One likely reason for this is a difference in data collection methods. While many previous studies use automatic collection methods, we analyzed the network traffic and data leaks manually. Also, other studies may not consider all the data items our study does. Our goal was to consider all contextual data items that may relate to the user's health status. Some previous studies may only include the most sensitive data leaks, such as laboratory results and medications, and may exclude information related to appointment booking, for example. Therefore, our set of studied data items and usage scenarios was more extensive than in most studies, which affects the number of data leaks found.

The use of third-party analytics is very difficult to justify on web-based health services. While we strongly believe that the studied web-based services have not leaked sensitive personal data intentionally, and while the third parties may not abuse it, the fact that this data is sent to third parties remains a concern. There are multiple precautionary measures web developers and website maintainers should adopt to prevent such leaks.

A convincing argument can be made that third-party web analytics do not belong on websites processing sensitive health data. A straightforward alternative would be to eliminate third-party analytics entirely. In cases where web analytics are necessary, locally hosted services like Matomo [16, 17] should be used. With such self-hosted analytics, the health service provider has full control over the collected data, and there is no need to transfer it to a third party.

If third-party services really are necessary, the chosen services should be thoroughly assessed and their use carefully justified. Of course, there are some well-justified use cases for trusted third-party services, such as chat services or appointment booking systems that are vital for the functionality of the web-based health service. Third-party analytics, on the other hand, cannot be deemed essential for the functionality of web-based health services to the same extent.

During the software testing phase, a careful assessment of data leakage to third parties should be conducted, similar to the approach taken in the current study. In this examination of outgoing network traffic, special attention should be paid to pages that handle sensitive data, such as appointment booking pages. Analyzing network traffic gives developers an accurate understanding of the data third parties collect. This analysis also helps website administrators decide which third-party services should be excluded from the service altogether. It is worth noting that developers may unknowingly incorporate third-party analytics into websites, as off-the-shelf platforms commonly offer easy integration options or include them by default. This is why network traffic analysis is essential.

A good understanding of the application area, such as the healthcare sector, is of great significance. The development team should aim to gain knowledge of the privacy regulations governing this particular industry. Effective communication with stakeholders is important in order to understand the requirements for protecting sensitive health data. For essential online services such as medical center websites, the implemented service should also undergo an external privacy audit.

6. Conclusion

Our alarming discoveries should urge software developers and data protection officers overseeing web-based healthcare services to carefully assess the third-party services they use and to adopt a privacy-by-design approach. Developers and administrators of web services have to acknowledge their responsibility for protecting sensitive customer data and following fair data processing practices. The nature of the processed personal data and the third parties involved have to be transparently communicated to users. When it comes to web-based medical services, it is unreasonable to rely on external services that may collect sensitive data. Failing to address serious data leaks, such as the ones presented in this study, increases the vulnerability of specific user groups online, especially in terms of privacy. Users of web-based health services should be able to see these websites as trustworthy and confidential equivalents to traditional onsite healthcare.

Acknowledgments

This research has been funded by the Academy of Finland project 327397, IDA – Intimacy in Data-Driven Culture.

References

[1] P. Wang, Z. Ding, C. Jiang, M. Zhou, Design and implementation of a web-service-based public-oriented personalized health care platform, IEEE Transactions on Systems, Man, and Cybernetics: Systems 43 (2013) 941–957.

[2] S. Saha, C. Chowdhury, S. Neogy, A novel two phase data sensitivity based access control framework for healthcare data, Multimedia Tools and Applications 83 (2024) 8867–8892.

[3] R. Carlsson, S. Rauti, S. Laato, T. Heino, V. Leppänen, Privacy in popular children's mobile applications: A network traffic analysis, in: 2023 46th MIPRO ICT and Electronics Convention (MIPRO), IEEE, 2023, pp. 1213–1218.

[4] S. Rauti, R. Carlsson, S. Mickelsson, T. Mäkilä, T. Heino, E. Pirjatanniemi, V. Leppänen, Analyzing third-party data leaks on online pharmacy websites, Health and Technology (2024) 1–18.

[5] M. Huo, M. Bland, K. Levchenko, All eyes on me: Inside third party trackers' exfiltration of PHI from healthcare providers' online systems, in: Proceedings of the 21st Workshop on Privacy in the Electronic Society, WPES '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 197–211.

[6] T. Libert, Privacy implications of health information seeking on the web, Communications of the ACM 58 (2015) 68–77.

[7] A. R. Zheutlin, J. D. Niforatos, J. B. Sussman, Data-tracking on government, non-profit, and commercial health-related websites, Journal of General Internal Medicine (2021) 1–3.

[8] A. B. Friedman, R. M. Merchant, A. Maley, K. Farhat, K. Smith, J. Felkins, R. E. Gonzales, L. Bauer, M. S. McCoy, Widespread third-party tracking on hospital websites poses privacy risks for patients and legal liability for hospitals, Health Affairs 42 (2023) 508–515.

[9] X. Yu, N. Samarasinghe, M. Mannan, A. Youssef, Got sick and tracked: Privacy analysis of hospital websites, in: 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), IEEE, 2022, pp. 278–286.

[10] A. B. Friedman, L. Bauer, R. Gonzales, M. S. McCoy, Prevalence of third-party tracking on abortion clinic web pages, JAMA Internal Medicine 182 (2022) 1221–1222.

[11] A. Surani, A. Bawaked, M. Wheeler, B. Kelsey, N. Roberts, D. Vincent, S. Das, Security and privacy of digital mental health: An analysis of web services and mobile apps, in: Conference on Data and Applications Security and Privacy, 2023.

[12] M. D. Huesch, Privacy threats when seeking online health information, JAMA Internal Medicine 173 (2013) 1838–1840.

[13] V. Wesselkamp, I. Fouad, C. Santos, Y. Boussad, N. Bielova, A. Legout, In-depth technical and legal analysis of tracking on health related websites with ernie extension, in: Proceedings of the 20th Workshop on Workshop on Privacy in the Electronic Society, WPES '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 151–166.

[14] I. Kes, D. Heinrich, D. M. Woisetschläger, Behavioral targeting in health care marketing: Uncovering the sunny side of tracking consumers online, in: Let's Get Engaged! Crossing the Threshold of Marketing's Engagement Era: Proceedings of the 2014 Academy of Marketing Science (AMS) Annual Conference, Springer, 2016, pp. 297–297.

[15] K. Masters, The gathering of user data by national medical association websites, The Internet Journal of Medical Informatics 6 (2012).

[16] J. Gamalielsson, B. Lundell, S. Butler, C. Brax, T. Persson, A. Mattsson, T. Gustavsson, J. Feist, E. Lönroth, Towards open government through open source software for web analytics: The case of Matomo, JeDEM-eJournal of eDemocracy and Open Government 13 (2021) 133–153.

[17] D. Quintel, R. Wilson, Analytics and privacy, Information Technology and Libraries 39 (2020).