=Paper= {{Paper |id=Vol-3776/paper10 |storemode=property |title=Health data leaks to third parties in web-based health services |pdfUrl=https://ceur-ws.org/Vol-3776/shortpaper10.pdf |volume=Vol-3776 |authors=Sampsa Rauti,Robin Carlsson,Samuli Laato,Timi Heino,Panu Puhtila,Ville Leppänen |dblpUrl=https://dblp.org/rec/conf/tktp/RautiCLHPL24 }} ==Health data leaks to third parties in web-based health services== https://ceur-ws.org/Vol-3776/shortpaper10.pdf
Health data leaks to third parties in web-based health services

Sampsa Rauti¹,*, Robin Carlsson¹, Samuli Laato², Timi Heino¹, Panu Puhtila¹ and Ville Leppänen¹

¹ University of Turku, Vesilinnantie 5, 20500 Turku, Finland
² Tampere University, Kalevantie 4, 33100 Tampere, Finland


Abstract
Today, users may share sensitive health data on web-based health services. We rely on these services to keep our data safe and secure, but this is not always the case. This study therefore investigates the privacy of a snapshot of 10 Finnish web-based health services and analyzes their health data leaks. We show that all analyzed services leaked at least some kind of personal data to third parties, ranging from the topics of visited pages to details of appointment bookings. While the situation has improved since we notified the health service providers about this issue, the study serves as a reminder of the ongoing challenges in protecting user privacy in online health services and highlights the pressing need to address these issues.

Keywords
Medical websites, data leaks, data concerning health, web privacy, third-party services



TKTP 2024: The Annual Symposium of Computer Science, June 10-11, 2024, Vaasa, Finland
* Corresponding author.
Email: sjprau@utu.fi (S. Rauti); crcarl@utu.fi (R. Carlsson); samuli.laato@tuni.fi (S. Laato); tdhein@utu.fi (T. Heino); papuht@utu.fi (P. Puhtila); ville.leppanen@utu.fi (V. Leppänen)
ORCID: 0000-0002-1891-2353 (S. Rauti); 0009-0003-7255-0239 (R. Carlsson); 0000-0003-4285-0073 (S. Laato); 0009-0008-4798-5261 (T. Heino); 0009-0004-6418-1063 (P. Puhtila); 0000-0001-5296-677X (V. Leppänen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Web-based health services have become a vital part of essential electronic services [1]. Booking appointments, viewing personal health information and test results, and searching for health-related information can all be conveniently carried out online. Many web-based healthcare services, such as medical centers' websites, process sensitive personal information concerning health. Due to the sensitivity of this data, it is critical to ensure that it remains confidential and does not leak to third parties [2].

However, previous research has demonstrated that numerous third-party services and components, such as web analytics, are commonly used across websites and services regardless of their sensitivity requirements [3, 4]. Such services make it more convenient to monitor business goals and improve the user experience, but they also create a risk that sensitive information leaks through them. This typically happens without the users' knowledge and also unbeknownst to the website developers and maintainers.

This study conducts an in-depth examination of the privacy of 10 web-based health services. We present an overview of health data leaks, an issue that likely affects an even larger group of web-based health services. Our study focuses specifically on the privacy and confidentiality of Finnish web-based health services. Hence, we address the following research question: Do web-based healthcare services leak sensitive data related to an individual user's health status? This paper provides an analysis and discussion of the privacy threats associated with integrating third-party services into web-based health services.

The rest of the paper is organized as follows. Section 2 reviews related work on the privacy of medical websites. Section 3 outlines the study setting and method, describing how the studied websites were selected and how the network traffic analysis was performed. Section 4 presents the results of our network traffic analysis and explores the data leaks found. Section 5 discusses our key findings and their implications. Section 6 concludes the paper.

2. Related work

In recent years, a number of papers pertinent to our research have been published. Huo et al. [5] analyzed 459 health-related web portals and found that Google Analytics was used in 14% of them. Sensitive health data leaks were present on 9 websites, and details such as prescribed medicines and laboratory results were transferred to third parties. Libert [6] investigates how health data contained in URLs leaks to third parties. Zheutlin et al. [7] studied user data tracking through third-party cookies on US-based government, non-profit, and commercial health-related websites, but did not go into detail about what personal data is sent to third parties.

Friedman et al. [8] discussed the risks of third-party tracking technologies on hospital websites, highlighting potential legal liabilities. Yu et al. [9] conducted a large-scale automated survey of hospital websites around the world, revealing that 53.5% of them employed tracking tools that collected user data. Friedman et al. [10] examined the prevalence of third-party tracking tools on abortion clinic websites and found that nearly all of them (99.1%) used some form of tracking tool that leaked user data to third parties. Surani et al. [11] found clear deficiencies in the privacy policies of web-based health services.

Huesch [12] points out that searching for and accessing free health-related information online raises privacy concerns, as information on a user's health may be used for profiling and targeted advertising. Wesselkamp et al. [13] studied 385 medical websites in the EU. They found that 62% used tracking tools before the user had consented to data collection and 15% tracked the user even after consent was rejected. Kes et al. [14] argue that collecting users' health data on websites, despite privacy concerns, can lead to an improved user experience akin to a personalized customer relationship. Still, the actual benefits are debatable, and transferring health data to third parties to improve targeted advertising is highly problematic in light of the GDPR.

Compared to many earlier studies, the current study conducts a more in-depth examination of the types of personal data that web-based health services leak to third parties in different scenarios. We show that the presence of third-party analytics in web-based health services remains a significant problem despite having been addressed in research well over ten years ago [15].

3. Study Setting and Method

We selected 10 Finnish web-based health services for closer inspection in this study. We chose the websites of several important healthcare providers in Finland, such as medical centers, therapy houses, and laboratories. We searched for healthcare providers using the Google search engine with the keywords "lääkärikeskus" (medical center), "terapia" (therapy), and "laboratorio" (laboratory). Instead of analyzing a large number of health services, our study examines the network traffic of these services more thoroughly, covering various usage scenarios in which sensitive health data processed by the web services can leak to third parties. We examined the data leaks in the chosen services twice: first in December 2022, and then again in February 2024 after the service providers had been informed of the issue.

It is important to note that we aim to address privacy challenges at a general level and to avoid casting the affected health service providers in a negative light. To adhere to ethical research practices, the chosen web services are not referred to by their actual names but are denoted by the abbreviations WS1–WS10.

In our test sequence, the browser cache was first cleared and cookies were deleted, after which the front page of the health service under examination was opened. On the front page, all cookies and data collection were accepted. While the health service was being used, all network traffic was recorded using the Google Chrome browser developer tools (DevTools). The network traffic recordings were saved as HAR (HTTP Archive) files for more detailed analysis. We manually examined the log files, searched through the HTTP request payloads, and meticulously documented all instances of personal data. We considered two distinct categories of personal data:

• Identifying data, capable of uniquely identifying the website user, such as IP addresses, User-Agent strings, and device-specific identifiers. Identification may also happen through a combination of technical details, including operating system or browser details, window size, etc.
• Sensitive contextual data, for example a URL containing a sensitive search term used on a medical website, or details of a booked appointment. Although this kind of sensitive contextual data is often contained in URLs sent to a third party, it may also appear elsewhere in the HTTP request payload.

What makes data leaks dangerous is the combination of these two categories: identifying a user by, e.g., their IP address and then combining this with sensitive contextual data such as the details of a doctor's appointment. This enables third parties to infer a user's potential medical conditions, for example. It is also worth noting that while identifying personal data such as an IP address cannot always be immediately linked to a person's identity (real name), large technology companies such as Google and Meta often have the capability to fully identify the user, as users may use the same device to log in to other services run by these companies.

Four common usage scenarios in which health data can leak to third parties were recorded while using the health services. The chosen scenarios were key functionalities of the web-based health services that involved processing of sensitive personal data, and they varied depending on the tested service. Network traffic was recorded when 1) booking an appointment, 2) viewing personal information, 3) using the search function, and 4) accessing information pages.

For the appointment booking scenario, network traffic was recorded from clicking the appointment link on the front page to the final stage of making the appointment. In other words, the test was concluded before the final confirmation of the appointment. In this scenario, an appointment was scheduled with a specific specialist (such as a doctor or therapist). We also conducted a separate test for booking an appointment for a specific procedure or service (e.g., a COVID-19 test or influenza vaccination) if such an option was available in the tested health service.

The second scenario, viewing personal information, refers to the part of the web service that requires authentication. In this part of the service, users can usually review their own prescriptions, test results, vaccinations, or previous appointments. In this scenario, we investigated whether data leaks occur when the user displays different types of personal information. For example, information about laboratory results and previous appointments could potentially be disclosed to third parties.

We also examined possible leaks when using the search functions of the studied web services. The leakage of search terms to third parties can be particularly dangerous, because users may enter highly sensitive terms, such as the name of a specific disease or symptom. If user-defined search terms are transmitted to third parties, these external actors can possibly build a detailed profile of the user's assumed health status and medical history.

The fourth usage scenario was related to information pages within the web services, which often contain information about specific diseases. It can be problematic if information about the pages a user browses is sent to third parties, as users can be profiled based on it. Such profiling can be especially effective over a longer period of time.

4. Results

Figure 1 displays the information leaked to third parties on the studied websites (December 2022). Each cell in Figure 1 indicates a leak of a specific information type in a specific health service. The numbers indicate how many third parties the information was leaked to. For example, information about initiating an appointment booking was leaked to 5 different third parties in WS1.

Figure 1: Data leaked in the web-based health services in December 2022.

A common data leak pertained to the use of the appointment booking function. Even though the appointment booking process was not completed in this study, information about initiating this process indicates the user's intention to make a booking. In all services except one (WS7), information about initiating the appointment booking process leaked to at least one third party. In three services, details about entering specific stages of the appointment booking process (e.g., selecting a time for the appointment or entering personal information) also leaked. Leaking any information about the appointment booking process is a problem, because it strongly indicates a relationship between the patient and the health provider. This kind of relationship must be kept confidential according to the Finnish Deputy



Ombudsman¹.

Seven of the studied web services leaked additional information about appointments to third parties. This included the selected clinic location (3 web services), appointment date (3), appointment time (1), the name of the specialist (e.g., doctor) (3), the specialist's field of expertise (2), and whether the appointment was made as a private or occupational health customer (2). The selected service (e.g., influenza vaccination, COVID-19 test, or STD test) also leaked on three of the studied websites. In one case (WS10), the specific region (e.g., Central Finland) leaked instead of the exact clinic location.

The information transmitted to third parties about the initiation of an appointment booking is problematic in itself, because it implies a relationship between a patient and a healthcare provider. Details about the reserved health service or the doctor's name reveal the nature of this relationship even more precisely. It is also important to understand that a third party can often track a specific individual's online activities over a long period of time. As multiple appointments accumulate, a clear picture of the patient's treatment and health status begins to emerge.

Figure 1 also shows how users' searches were tracked. Notably, in all seven cases where a health service website had a search function, potentially sensitive search terms were transferred to at least one third party, and in the worst cases (WS4 and WS8) to as many as four separate analytics services.

In all 10 examined health services, the URLs of information pages opened by the user were delivered to at least one third party. In the case of one service (WS2), the URL was sent to six third parties. Of course, viewing an information page about a specific illness does not necessarily imply that the visitor has that illness or even suspects it. However, exposing sensitive browsed pages to multiple third-party analytics services is undesirable.

Lastly, we found no data leaks when viewing personal information such as laboratory results after logging in to the studied services. It seems these more sensitive sections of the health services have been implemented with a privacy-by-design approach in mind.

To sum up, the findings of Figure 1 are concerning: in each examined health service, information leaked to third parties from either the appointment booking page or the search function, and in most cases from both. These pieces of information, possibly combined with the pages the user browsed, can give a third party an accurate picture of the user's current health in just one visit.

Figure 2 shows the most common third parties (two instances or more) present in the studied health services in December 2022. Google Analytics and Meta Pixel were the most common ones, with Google appearing in every single service and Meta in 8 services out of 10. The average number of third parties per health service was 5.2, which we consider a large number for websites processing such sensitive data. WS1 had a staggering 9 third parties, with WS2 and WS6 following close behind with 8 third parties each.

Figure 2: The most common third-party services present in the web-based health services in December 2022. Each third party is counted only once per web service.

After discovering the data leaks in December 2022, we informed the studied healthcare providers about the issue. Figure 3 shows the updated status of data leaks in February 2024. The number of data leaks has decreased: the sum of all data leaks in Figure 1 is 116, while the corresponding sum in Figure 3 is 70. However, this number is still very disappointing. Figure 3 clearly shows that revealing the initiation of the appointment booking process and leaking viewed pages and search terms to third parties are still significant issues in the majority of the studied health services, although the number of leaks has gone down. It is also surprising that highly sensitive information, such as the selected health service or the name of the specialist the patient is going to see, is still being leaked. Only a single service, WS5, has completely removed third-party web analytics and eliminated data leaks.

Figure 3: Data leaked in the web-based health services in February 2024.

¹ https://yle.fi/a/3-11213545

5. Discussion

While the sensitivity of the data leaked by the studied services ranged from visited information pages (less sensitive) to details of booked appointments (highly sensitive), this data is still often directly related to the visitor's health status [6]. Also, even though the dataset we collected for the current study is small, the finding that all of the analyzed web services leaked personal data to third parties cannot simply be dismissed. Although the situation has improved over time, web-based health services in Finland



still appear to have many privacy challenges. Regrettably,              A convincing argument can be made that third-party
it is highly likely that these issues extend well beyond the         web analytics do not belong to websites processing sensi-
scope of the websites we examined.                                   tive health data. A straightforward alternative would be
    Compared to many other studies (e.g. [5]), we found a            eliminating third-party analytics entirely. In the cases web
high number of data leaks and observed these data leaks              analytics are necessary, locally hosted services like Matomo
were widespread among the services we studied. One rea-              [16, 17] should be used. With the use of such self-hosted
son for this is likely to be different data collection methods.      analytics, the health service provider now has full control
While many previous studies use automatic collection meth-           over the collected data and there is no need to transfer it to
ods, we analyzed the network traffic and data leaks manually.        a third party.
Also, the other studies may not consider all the same data              If third-party services really are necessary, chosen ser-
items our study does. Our goal was to consider all contex-           vices should be thoroughly assessed and their use should be
tual data items that may relate to the user’s health status.         carefully justified. Of course, there are some well-justified
Some previous studies may only include the most sensitive            use cases for trusted third-party services such as chat ser-
data leaks like leaking laboratory results and medications           vices or appointment booking systems that are vital for the
and possibly exclude information related to appointment bookings, for example. Therefore, our set of studied data items and use scenarios was more extensive than in most studies, which affects the number of data leaks found.

    The use of third-party analytics is very difficult to justify on web-based health services. While we strongly believe that the studied web-based services have not leaked sensitive personal data intentionally, and while the third parties may not abuse it, the fact that this data is sent to third parties remains a concern. There are multiple precautionary measures that web developers and website maintainers should adopt to prevent such leaks.

functionality of the web-based health service. On the other hand, third-party analytics cannot be deemed essential for the functionality of web-based health services to the same extent.

    During the software testing phase, a careful assessment of data leaks to third parties should be conducted, similar to the approach taken in the current study. In this examination of outgoing network traffic, special attention should be paid to pages that handle sensitive data, such as appointment booking pages. Analyzing network traffic gives developers an accurate understanding of the data that third parties collect. This analysis also helps website administrators decide which third-party services should be excluded from the service altogether. It is worth noting that developers may unknowingly incorporate third-party analytics into websites, as off-the-shelf platforms commonly offer easy integration options or include them by default. This is why network traffic analysis is essential.

    A good understanding of the application area, such as the healthcare sector, is highly important. The development team should become familiar with the privacy regulations governing this particular industry. Effective communication with stakeholders is important in order to understand the requirements for protecting sensitive health data. For essential online services such as medical center websites, the implemented service should also undergo an external privacy audit.
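The outgoing-traffic assessment recommended for the testing phase can be partly automated. The following Python sketch is a minimal illustration under stated assumptions, not the tooling used in this study: it assumes the traffic of a test session (for example, an appointment booking) has been exported as a HAR (HTTP Archive) file, which browser developer tools can typically produce, and it lists requests sent to third-party hosts together with any sensitive keywords found in their URLs or request bodies. The first-party domain `example-clinic.fi`, the keyword list, and the helper names are hypothetical placeholders.

```python
import json
from urllib.parse import urlsplit

# Hypothetical placeholders: set these for the service under test.
FIRST_PARTY_DOMAINS = ("example-clinic.fi",)
SENSITIVE_KEYWORDS = ("appointment", "booking", "diagnosis", "symptom")


def load_har(path):
    """Read a HAR file exported from the browser's developer tools."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_third_party(host, first_party=FIRST_PARTY_DOMAINS):
    """Return True unless host is a first-party domain or one of its subdomains."""
    host = host.lower()
    return not any(host == d or host.endswith("." + d) for d in first_party)


def find_third_party_requests(har, keywords=SENSITIVE_KEYWORDS):
    """Yield (host, matched_keywords, url) for each third-party request in a HAR log."""
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url", "")
        host = urlsplit(url).hostname or ""
        if not host or not is_third_party(host):
            continue  # skip first-party traffic and malformed entries
        # Look for keywords in the URL (path and query string) and any request body.
        body = request.get("postData", {}).get("text", "")
        haystack = f"{url} {body}".lower()
        matched = [k for k in keywords if k in haystack]
        yield host, matched, url


def summarize(har):
    """Map each third-party host to the set of sensitive keywords seen in its requests."""
    summary = {}
    for host, matched, _url in find_third_party_requests(har):
        summary.setdefault(host, set()).update(matched)
    return summary
```

For example, `summarize(load_har("session.har"))` returns a mapping from each contacted third-party host to the keywords matched in its requests (the file name is hypothetical). Plain substring matching only catches keywords that appear literally in the URL or request body; data sent in headers or cookies, or in hashed or otherwise transformed form, would need additional checks, so an empty result does not prove that no personal data left the page.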
6. Conclusion

Our alarming findings should prompt software developers and data protection officers overseeing web-based healthcare services to carefully assess the third-party services they use and to adopt a privacy-by-design approach. Developers and administrators of web services have to acknowledge their responsibility for protecting sensitive customer data and following fair data processing practices. The nature of the processed personal data and the third parties involved have to be communicated transparently to users. When it comes to web-based medical services, it is unreasonable to rely on external services that may collect sensitive data. Failing to address serious data leaks, such as the ones presented in this study, makes specific user groups more vulnerable online, especially in terms of privacy. Users of web-based health services should be able to see these websites as trustworthy and confidential equivalents of traditional onsite healthcare.

Acknowledgments

This research has been funded by the Academy of Finland project 327397, IDA – Intimacy in Data-Driven Culture.