Health data leaks to third parties in web-based health services

Sampsa Rauti1,*, Robin Carlsson1, Samuli Laato2, Timi Heino1, Panu Puhtila1 and Ville Leppänen1

1 University of Turku, Vesilinnantie 5, 20500 Turku, Finland
2 Tampere University, Kalevantie 4, 33100 Tampere, Finland

Abstract
Today, users may share sensitive health data on web-based health services. We rely on these services to keep our data safe and secured, but this is not always the case. Therefore, this study investigates the privacy of a snapshot of 10 Finnish web-based health services, providing an analysis of health data leaks. We show that all analyzed services leaked at least some kind of personal data to third parties – from topics of visited pages to details on appointment bookings. While the situation has improved after we notified the health service providers about this issue, the study serves as a reminder of the ongoing challenges in protecting user privacy in online health services and highlights the pressing need to address these issues.

Keywords
Medical websites, data leaks, data concerning health, web privacy, third-party services

TKTP 2024: The Annual Symposium of Computer Science, June 10-11, 2024, Vaasa, Finland
* Corresponding author.
Email: sjprau@utu.fi (S. Rauti); crcarl@utu.fi (R. Carlsson); samuli.laato@tuni.fi (S. Laato); tdhein@utu.fi (T. Heino); papuht@utu.fi (P. Puhtila); ville.leppanen@utu.fi (V. Leppänen)
ORCID: 0000-0002-1891-2353 (S. Rauti); 0009-0003-7255-0239 (R. Carlsson); 0000-0003-4285-0073 (S. Laato); 0009-0008-4798-5261 (T. Heino); 0009-0004-6418-1063 (P. Puhtila); 0000-0001-5296-677X (V. Leppänen)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Web-based health services have become a vital part of essential electronic services [1]. Booking appointments, viewing personal health information and test results, and searching for health-related information can be conveniently carried out online. Many web-based healthcare services, such as medical centers' websites, process sensitive personal information concerning health. Due to the sensitivity of this data, it is critical to ensure it remains confidential and does not leak to third parties [2].

However, previous research has demonstrated that across websites and services, regardless of sensitivity requirements, numerous third-party services and components, such as web analytics, are often used [3, 4]. Using such services makes monitoring business goals and improving user experience more convenient, but at the same time, there is a risk that sensitive information is leaked through these third-party services. This typically happens without users' knowledge, and also unbeknownst to website developers and maintainers.

This study conducts an in-depth examination of the privacy of 10 web-based health services. We present an overview of health data leaks, an issue that an even larger group of web-based health services is likely to have. Our study specifically focuses on the privacy and confidentiality of Finnish web-based health services. Hence, in this study we address the following research question: Do web-based healthcare services leak sensitive data related to an individual user's health status? This paper serves as an analysis and discussion on the privacy threats associated with integrating third-party services in web-based health services.

The rest of the paper is organized as follows. Section 2 reviews related work on the privacy of medical websites. Section 3 outlines the study setting and the method, describing how the studied websites were selected and how the network traffic analysis was performed. Section 4 discusses the results of our network traffic analysis and explores the found data leaks. Section 5 presents a discussion on our key findings and their implications. Section 6 concludes the paper.

2. Related work

In recent years, a number of papers pertinent to our research have been published. Huo et al. [5] analyzed 459 health-related web portals and found that Google Analytics was used in 14% of them. Sensitive health data leaks were present on 9 websites, and details on e.g. prescribed medicines and laboratory results were transferred to third parties. Libert [6] investigated the problem of leaking health data contained in URL addresses to third parties. Zheutlin et al. [7] studied user data tracking through third-party cookies on USA-based government, non-profit, and commercial health-related websites, but did not go into detail about what personal data is sent to third parties.

Friedman et al. [8] discussed the risks of third-party tracking technologies in hospital websites, highlighting potential legal liabilities. Yu et al. [9] conducted a large-scale automated survey on hospital websites around the world, revealing that 53.5% of them employed tracking tools that collected user data. Friedman et al. [10] examined the prevalence of third-party tracking tools in abortion clinic websites and concluded that the majority (99.1%) used some form of tracking tool leaking user data to third parties. Surani et al. [11] found clear deficiencies in privacy policies of web-based health services.

Huesch [12] notes that searching and accessing free health-related information online raises concerns about privacy and the potential for information on a user's health to be used for profiling and targeted advertising. Wesselkamp et al. [13] studied 385 medical websites in the EU area. They found that 62% used tracking tools before user consent for data collection and 15% tracked the user even after consent rejection. Kes et al. [14] argue that collecting users' health data on websites, despite privacy concerns, can lead to an improved user experience akin to a personalized customer relationship. Still, the actual benefits are debatable, and transferring health data to third parties to improve targeted advertising is very problematic in the light of the GDPR.

Compared to many earlier studies, the current study conducts a more in-depth examination of types of personal data that web-based health services leak to third parties in different scenarios. We show that the issue of third-party analytics being present in web-based health services remains a significant problem despite having been addressed in research well over ten years ago [15].

3. Study Setting and Method

We selected 10 Finnish web-based health services for closer inspection in this study. We chose the websites of several important healthcare providers in Finland, such as medical centers, therapy houses, and laboratories. We searched for healthcare providers using the Google search engine, with keywords "lääkärikeskus" (medical center), "terapia" (therapy) and "laboratorio" (laboratory). Instead of analyzing a large number of health services, our study examines the network traffic of these services more thoroughly. It includes various usage scenarios where sensitive health data processed by the web services can leak to third parties. We examined the data leaks in the chosen services two times, first in December 2022 and then again in February 2024 after the service providers had been informed of the issue.

It is important to note that we aim to address privacy challenges at a general level and avoid singling out the affected health service providers in a negative light. To adhere to ethical research practices, the chosen web services are not referred to by their actual names but are denoted by abbreviations WS1–WS10.

In our test sequence, the browser cache was first cleared, cookies were deleted, and then the front page of the health service under examination was opened. On the front page, all cookies and data collection were accepted. When using the health service, all network traffic was recorded using Google Chrome browser developer tools (DevTools). The network traffic recordings were saved as HAR files (HTTP Archive) for more detailed analysis. We manually examined the log files, searching through the HTTP request payloads, and documented all instances of personal data meticulously. Here we considered two distinct categories of personal data:

• Identifying data, capable of uniquely identifying the website user, such as IP addresses, User-Agent strings, and device-specific identifiers. Identification may also happen with a combination of technical details, including operating system or browser details, window size, etc.
• Sensitive contextual data, for example a URL address containing a sensitive search term used on a medical website, or details on a booked appointment. Although this kind of sensitive contextual data is often contained in URL addresses sent to a third party, it may also be elsewhere in the HTTP request payload.

What makes data leaks dangerous is the combination of these two categories: identifying a user by e.g. their IP address and then combining this with sensitive contextual data such as details on a doctor's appointment. This enables third parties to infer the user's potential medical conditions, for example. It is also worth noting that while identifying personal data such as an IP address cannot always be immediately linked to a person's identity (real name), large technology companies such as Google and Meta often have the capability to fully identify the user, as users may use the same device to log in to the other services run by these companies.

Four common usage scenarios where the leakage of health data to third parties is possible were recorded while using the health services. The chosen scenarios were key functionalities of the web-based health services that involved processing of sensitive personal data, and the scenarios varied based on the tested service. Network traffic was recorded when 1) booking an appointment, 2) viewing personal information, 3) using the search function, and 4) accessing information pages.

For the appointment booking scenario, network traffic was recorded from clicking the appointment link on the front page to the final stage of making the appointment. In other words, the test was concluded before the final confirmation of the appointment. In the appointment scenario, an appointment was scheduled with a specific specialist (such as a doctor or therapist). We also conducted a separate test for booking an appointment for a specific procedure or service (e.g. a COVID-19 test or influenza vaccination) if such an option was available in the tested health service.

The second scenario, viewing personal information, refers to the section behind the authentication of the web service. In this section of the web service, users can usually review their own prescriptions, test results, vaccinations, or previous appointments. In this scenario, we investigated whether data leaks occur when the user displays different types of personal information. For example, information about laboratory results and previous appointments could potentially be disclosed to third parties.

We also examined the possible leaks when using the search functions of the studied web services. The leakage of search terms to third parties can be particularly dangerous, because users may input highly sensitive terms, such as the name of a specific disease or symptom. If user-defined search terms are transmitted to third parties, these external actors can possibly build a detailed profile of the user's assumed health status and medical history.

The fourth usage scenario was related to information pages within web services, often containing information about specific diseases. It can be problematic if information about the pages a user browses is sent to third parties, as users can be profiled based on this. This can be especially effective over a longer time period.

4. Results

Figure 1 displays information leaked to third parties on the studied websites (December 2022). Each cell in Figure 1 indicates a leak of a specific information type in a specific health service. The numbers indicate how many third parties the information was leaked to. For example, information about initiating an appointment booking was leaked to 5 different third parties in WS1.

Figure 1: Data leaked in the web-based health services in December 2022.

A common data leak pertained to the use of the appointment booking function. Even though the appointment booking process was not completed in this study, the information about initiating this process indicates the user's intention to make a booking. In all services except for one (WS7), information about initiating the appointment booking process leaked to at least one third party. In three services, details about entering specific stages of the appointment booking process (e.g., selecting a time for the appointment, entering personal information) also leaked. Leaking any information about the appointment booking process is a problem because it strongly indicates a relationship between the patient and health provider. This kind of relationship must be kept confidential according to the Finnish Deputy Ombudsman1.

1 https://yle.fi/a/3-11213545

Seven of the studied web services leaked additional information about appointments to third parties. These included the selected clinic location (3 web services), appointment date (3), appointment time (1), the name of the specialist (e.g., doctor) (3), the specialist's field of expertise (2), and whether the appointment was made as a private or occupational health customer (2). The selected service (e.g., influenza vaccination, COVID-19 test, or STD test) also leaked on three of the studied websites. In one case (WS10), the specific region (e.g., Central Finland) leaked instead of the exact clinic location.

The information transmitted to the third party about the initiation of the appointment is problematic by itself, because it implies a relationship between a patient and a healthcare provider. Details about the reserved health service or the doctor's name reveal the nature of this relationship even more precisely. It is also important to understand that a third party can often track a specific individual's online activities over a long period of time. When multiple appointments accumulate, a clear picture of the patient's treatment measures and health status begins to emerge.

Figure 1 also shows how users' searches were tracked. Notably, in all seven cases where a health service website had a search function, potentially sensitive search terms were transferred to at least one third party, and in the worst cases (WS4 and WS8), even up to four separate analytics services.

In all 10 examined health services, the URL addresses of information pages opened by the user were delivered to at least one third party. In the case of one service (WS2), the URL was sent to six third parties. Of course, viewing an information page about a specific illness does not necessarily imply that the visitor has that illness or even a suspicion of it. However, the exposure of sensitive browsed pages to multiple third-party analytics services is not favorable.

Lastly, in our experiments we found no data leaks when viewing personal information such as laboratory results after logging in to the studied services. It seems these more sensitive sections of the health services have been implemented with the privacy-by-design approach in mind.

To sum up, the findings of Figure 1 are concerning: for each examined health service, information leaked to third parties either from the appointment booking page or search function, in most cases, both. These pieces of information, possibly combined with the pages the user browsed, can, in just one visit, give a third party an accurate picture of the user's current health.

Figure 2 shows the most common third parties (two instances or more) present in the studied health services in December 2022. Google Analytics and Meta Pixel were the most common ones, Google appearing in every single service and Meta in 8 services out of 10. The average number of third parties per health service was 5.2, which we consider a large number in websites processing such sensitive data. WS1 had a staggering 9 third parties, WS2 and WS6 following close behind with 8 third parties.

Figure 2: The most common third-party services present in the web-based health services in December 2022. Each third party has only been counted once for each web service.

After discovering the data leaks in December 2022, the studied healthcare providers were informed about the issue. Figure 3 shows the updated status of data leaks in February 2024. The number of data leaks has decreased. For example, calculating the sum of all data leaks in Figure 1 yields 116, while this sum is 70 in Figure 3. However, this number is still very disappointing. Figure 3 shows clearly that revealing the initiation of the appointment booking process, and leaking viewed pages and search terms to third parties are still a significant issue in the majority of the studied health services, although the number of leaks has gone down. It is also surprising that highly sensitive information such as the selected health service or the name of the specialist the patient is going to see is still being leaked. Only a single service, WS5, has completely removed third-party web analytics and eliminated data leaks.

Figure 3: Data leaked in the web-based health services in February 2024.

5. Discussion

While the sensitivity of the data leaked by studied services ranged from visited information pages (not so sensitive) to details on booked appointments (highly sensitive), this data is still often directly related to the visitor's health status [6]. Also, even though the dataset we collected for the current study is not large in quantity, the finding that all of the analyzed web services leaked personal data to third parties cannot be simply dismissed. Although the situation has improved with time, web-based health services in Finland still appear to have many privacy challenges. Regrettably, it is highly likely that these issues extend well beyond the scope of the websites we examined.

Compared to many other studies (e.g. [5]), we found a high number of data leaks and observed these data leaks were widespread among the services we studied. One reason for this is likely to be different data collection methods. While many previous studies use automatic collection methods, we analyzed the network traffic and data leaks manually. Also, the other studies may not consider all the same data items our study does. Our goal was to consider all contextual data items that may relate to the user's health status. Some previous studies may only include the most sensitive data leaks like leaking laboratory results and medications and possibly exclude appointment booking related information, for example. Therefore, our set of studied data items and included use scenarios was more extensive than in most studies, which affects the numbers of found data leaks.

The use of third-party analytics is very difficult to justify on web-based health services. While we strongly believe the studied web-based services have not leaked sensitive personal data intentionally and while the third parties may not abuse it, the fact that this data is sent to third parties remains a concern. There are multiple precautionary measures web developers and website maintainers should adopt to prevent such leaks.

A convincing argument can be made that third-party web analytics do not belong on websites processing sensitive health data. A straightforward alternative would be eliminating third-party analytics entirely. In cases where web analytics are necessary, locally hosted services like Matomo [16, 17] should be used. With the use of such self-hosted analytics, the health service provider has full control over the collected data and there is no need to transfer it to a third party.

If third-party services really are necessary, chosen services should be thoroughly assessed and their use should be carefully justified. Of course, there are some well-justified use cases for trusted third-party services such as chat services or appointment booking systems that are vital for the functionality of the web-based health service. On the other hand, third-party analytics cannot be deemed essential for the functionality of web-based health services to the same extent.

During the software testing phase, a careful assessment of data leakages to third parties should be conducted, similar to the approach taken in the current study. In this examination of outgoing network traffic, special attention should be paid to pages that handle sensitive data, such as appointment booking pages. Analyzing network traffic gives developers an accurate understanding of the data third parties collect. This analysis also helps website administrators in deciding which third-party services should be excluded from the service altogether. It is worth noting developers may unknowingly incorporate third-party analytics into websites, as off-the-shelf platforms commonly offer easy integration options or include them by default. This is why a network traffic analysis is essential.

A good understanding of the application area, such as the healthcare sector, holds great significance. The development team should aim to gain knowledge about the privacy regulations governing this particular industry. Effective communication with stakeholders is important in order to understand the requirements for protecting sensitive health data. When talking about essential online services such as medical center websites, the implemented service should also undergo an external privacy audit.

6. Conclusion

Our alarming discoveries should urge software developers and data protection officers overseeing web-based healthcare services to carefully assess the used third-party services and adopt a privacy-by-design approach. Developers and administrators of web services have to acknowledge their responsibility in protecting sensitive customer data and following fair data processing practices. The nature of processed personal data and the involved third parties have to be transparently communicated to users. When it comes to web-based medical services, it is unreasonable to rely on external services that may collect sensitive data. Failing to address serious data leaks, such as the ones presented in this study, increases the vulnerability of specific user groups online, especially in terms of privacy. Users of web-based health services should be able to see these websites as trustworthy and confidential equivalents to traditional onsite healthcare.

Acknowledgments

This research has been funded by Academy of Finland project 327397, IDA – Intimacy in Data-Driven Culture.

References

[1] P. Wang, Z. Ding, C. Jiang, M. Zhou, Design and implementation of a web-service-based public-oriented personalized health care platform, IEEE Transactions on Systems, Man, and Cybernetics: Systems 43 (2013) 941–957.
[2] S. Saha, C. Chowdhury, S. Neogy, A novel two phase data sensitivity based access control framework for healthcare data, Multimedia Tools and Applications 83 (2024) 8867–8892.
[3] R. Carlsson, S. Rauti, S. Laato, T. Heino, V. Leppänen, Privacy in popular children's mobile applications: A network traffic analysis, in: 2023 46th MIPRO ICT and Electronics Convention (MIPRO), IEEE, 2023, pp. 1213–1218.
[4] S. Rauti, R. Carlsson, S. Mickelsson, T. Mäkilä, T. Heino, E. Pirjatanniemi, V. Leppänen, Analyzing third-party data leaks on online pharmacy websites, Health and Technology (2024) 1–18.
[5] M. Huo, M. Bland, K. Levchenko, All eyes on me: Inside third party trackers' exfiltration of PHI from healthcare providers' online systems, in: Proceedings of the 21st Workshop on Privacy in the Electronic Society, WPES '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 197–211.
[6] T. Libert, Privacy implications of health information seeking on the web, Communications of the ACM 58 (2015) 68–77.
[7] A. R. Zheutlin, J. D. Niforatos, J. B. Sussman, Data-tracking on government, non-profit, and commercial health-related websites, Journal of General Internal Medicine (2021) 1–3.
[8] A. B. Friedman, R. M. Merchant, A. Maley, K. Farhat, K. Smith, J. Felkins, R. E. Gonzales, L. Bauer, M. S. McCoy, Widespread third-party tracking on hospital websites poses privacy risks for patients and legal liability for hospitals, Health Affairs 42 (2023) 508–515.
[9] X. Yu, N. Samarasinghe, M. Mannan, A. Youssef, Got sick and tracked: Privacy analysis of hospital websites, in: 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), IEEE, 2022, pp. 278–286.
[10] A. B. Friedman, L. Bauer, R. Gonzales, M. S. McCoy, Prevalence of third-party tracking on abortion clinic web pages, JAMA Internal Medicine 182 (2022) 1221–1222.
[11] A. Surani, A. Bawaked, M. Wheeler, B. Kelsey, N. Roberts, D. Vincent, S. Das, Security and privacy of digital mental health: An analysis of web services and mobile apps, in: Conference on Data and Applications Security and Privacy, 2023.
[12] M. D. Huesch, Privacy threats when seeking online health information, JAMA Internal Medicine 173 (2013) 1838–1840.
[13] V. Wesselkamp, I. Fouad, C. Santos, Y. Boussad, N. Bielova, A. Legout, In-depth technical and legal analysis of tracking on health related websites with ERNIE extension, in: Proceedings of the 20th Workshop on Privacy in the Electronic Society, WPES '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 151–166.
[14] I. Kes, D. Heinrich, D. M. Woisetschlager, Behavioral targeting in health care marketing: Uncovering the sunny side of tracking consumers online, in: Let's Get Engaged! Crossing the Threshold of Marketing's Engagement Era: Proceedings of the 2014 Academy of Marketing Science (AMS) Annual Conference, Springer, 2016, pp. 297–297.
[15] K. Masters, The gathering of user data by national medical association websites, The Internet Journal of Medical Informatics 6 (2012).
[16] J. Gamalielsson, B. Lundell, S. Butler, C. Brax, T. Persson, A. Mattsson, T. Gustavsson, J. Feist, E. Lönroth, Towards open government through open source software for web analytics: The case of Matomo, JeDEM – eJournal of eDemocracy and Open Government 13 (2021) 133–153.
[17] D. Quintel, R. Wilson, Analytics and privacy, Information Technology and Libraries 39 (2020).
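
Supplementary sketch. The study inspected the recorded HAR files manually. As a possible first pass before such a manual review, the search for sensitive contextual data in requests sent to third parties could be partly automated. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the function names (load_har, find_leaks), the example domains, and the simple "same domain or subdomain" rule for deciding what counts as first-party are all hypothetical choices made for illustration. It relies only on the standard HAR layout (log.entries[].request.url and request.postData.text).

```python
import json
from urllib.parse import unquote_plus, urlsplit


def load_har(path: str) -> dict:
    """Read a HAR file (JSON) exported from the browser's developer tools."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def is_third_party(request_host: str, first_party_domain: str) -> bool:
    """Assumption: a host is first-party if it is the site's domain or a subdomain of it."""
    return not (
        request_host == first_party_domain
        or request_host.endswith("." + first_party_domain)
    )


def find_leaks(har: dict, first_party_domain: str, sensitive_terms: list) -> list:
    """Return (third-party host, matched term) pairs found in request URLs or bodies."""
    leaks = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url", "")
        host = (urlsplit(url).hostname or "").lower()
        if not host or not is_third_party(host, first_party_domain):
            continue
        # The request body is optional in HAR; treat a missing body as empty.
        body = (request.get("postData") or {}).get("text") or ""
        # Decode percent-encoding so that terms embedded in forwarded URLs match.
        haystack = (unquote_plus(url) + " " + unquote_plus(body)).lower()
        for term in sensitive_terms:
            if term.lower() in haystack:
                leaks.append((host, term))
    return leaks
```

A call such as find_leaks(load_har("session.har"), "clinic.example", ["migraine"]) would list the third-party hosts that received the search term, where the file name and domain are placeholders. Plain substring matching misses data that is hashed or otherwise encoded, so a scan like this can only narrow down the manual inspection, not replace it.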