
Automatic medical record synthesis using generative AI models: a review⋆

Carmen Sogan

Ida Tognisse

Pélagie Houngue

Hénoc Soude

Abel Yeni M'PO

Jules Degila

Institute of Mathematics and Physical Sciences (IMSP), Dangbo, Republic of Benin

Synthesizing Electronic Medical Records (EMRs) is a major challenge for modern healthcare systems. Although EMRs centralize information that is essential for patient care, their increasing complexity poses challenges for healthcare professionals, particularly in terms of time and accuracy of analysis. Generative artificial intelligence (AI) models, such as those based on Transformer architectures (e.g., GPT), offer innovative solutions for automating EMR synthesis, reducing clinicians' cognitive load and improving decision-making processes. However, challenges remain, including biases in training data, textual hallucinations, and ethical and confidentiality issues. This paper reviews current research on the use of generative AI models for EMR synthesis, assessing their benefits and limitations. It explores solutions to enhance their reliability and acceptability, including standardized methodologies, bias reduction, and better integration of ethical concerns. Finally, it highlights future directions for improving these technologies and their adoption in clinical practice.

Keywords: Electronic Medical Records, generative artificial intelligence, synthesis, Transformer architectures

1. Introduction

Electronic Medical Records (EMRs) centralize essential patient data, including clinical notes, test results, medical history, prescriptions and consultation summaries. Although EMRs have transformed information management in the medical sector, their sheer volume and diversity make them complex to analyze and synthesize. The diversity of formats (free text, medical images, structured data) and the exponential growth of data make their efficient exploitation particularly challenging for healthcare professionals [1, 2], who spend a significant proportion of their time sorting, reading and summarizing this information to obtain a clear, usable overview. This information overload can lead to human error and delays in the decision-making process and, in some cases, compromise the quality of care. In a context where rapid and accurate decisions need to be made, the lack of effective tools to synthesize EMRs exacerbates these challenges [3].

Artificial Intelligence (AI) models, particularly generative models such as those based on Transformer architectures (e.g., GPT), offer innovative perspectives to address these issues. These models can generate synthetic text from unstructured data, making it possible to automatically summarize EMRs, extract key information, and produce summaries tailored to clinicians' needs. Studies have shown that these technologies can lighten the cognitive load of caregivers, reduce the risk of human error and improve the efficiency of the decision-making process [4, 5]. For example, Myers et al. [2] have shown that generative AI models sometimes outperform humans in terms of speed and consistency when summarizing hospital reports. Moreover, Shing et al. [3] have highlighted the ability of these models to improve access to essential information while reducing the administrative burden on clinicians.

However, despite these advances, major challenges remain. Models can be sensitive to biases inherent in training data, leading to inconsistent or erroneous results in critical clinical settings [6]. Nguyen et al. [7] have warned of the potential impact of these biases on automated medical decisions, increasing the risks to patients.

In addition, textual hallucinations, where the model generates incorrect or unsubstantiated information, remain a major concern [10]. Simmons et al. [8] have proposed mechanisms such as validation filters to alleviate these problems. At the same time, ethical issues, data confidentiality and acceptability to healthcare professionals are holding back the adoption of these technologies in clinical practice. The work of Goodman et al. [1] has highlighted the importance of robust regulation and ethical standards to ensure the safe and fair use of AI models in the medical field. Faced with these challenges, a thorough assessment of existing research is crucial to better understand the potential and limitations of AI models applied to EMR synthesis.

This review aims to answer the following questions:
• What approaches based on AI models have been explored for Electronic Medical Record (EMR) synthesis?
• What are the main benefits and challenges associated with their use?
• How do these models influence clinical practice and decision-making in healthcare?
• What are the future directions for improving their reliability, safety and acceptability?

By exploring these dimensions, this analysis intends to provide enlightening perspectives on the future of AI in healthcare systems while highlighting the steps needed to overcome current obstacles. In this article, we begin with a literature review, before presenting the methodology used and the results of the bibliometric study. We then provide a summary table of the ten most-cited articles identified, before concluding.

2. Literature review

The use of artificial intelligence (AI) models in medical record synthesis represents a significant advance in the field of digital health. Several researchers have conducted surveys to gain a deeper understanding of AI's contribution to medical record synthesis. H. Xie et al. [9] conducted a bibliometric analysis of the Australian literature (1991-2022) on research trends and patterns in electronic medical records. Overall, they reveal the impact of EMRs on the digital transformation of the healthcare system in the face of demographic and pandemic challenges. The study serves both as an academic mapping and as a decision-support tool for public policy. Similarly, Ananda Haris et al. [10] conducted a bibliometric analysis of research on the acceptance and adoption of electronic medical and health records (EMR/EHR). Their study reveals gaps in research, notably on cybersecurity and user satisfaction, while highlighting the potential of these systems to optimize care on a global scale. Jeena Joseph Jr et al. [11] set out to map the landscape of electronic medical records and health information exchange using bibliometric analysis and visualization. Their findings reveal major challenges such as interoperability, data privacy and the integration of emerging technologies like AI and blockchain, while also highlighting research gaps and innovation opportunities to improve the efficiency and quality of care.

Elsewhere, Yaojue Xie et al. [12] explored the evolution of artificial intelligence in healthcare through a 30-year bibliometric study. Their analysis reveals a gradual increase in publications since the 1990s, with a marked acceleration after 2015, linked to the emergence of deep learning techniques. The study also identifies the clinical areas most influenced by AI, such as assisted diagnosis and medical imaging analysis. Saadat M. Alhashmi et al. [13] carried out a bibliometric analysis of the application of artificial intelligence in healthcare. They identify the most prolific countries, institutions, and authors in this field, and trace the evolution of keywords and themes over time. Their study also maps international collaborations and suggests a growing need for standardization and ethics in medical AI research. Feng Chen et al. [14] carried out a systematic review of biases in AI models based on electronic health records (EHRs). The authors highlight the need for standardized norms and real-life testing to ensure fair applications of AI in the medical field.

Abdul Khalique Shaikh et al. [15] carried out a bibliometric analysis of the adoption of AI applications in the e-health sector. They analyze research trends over a 25-year period (1996-2021) using the Scopus database, highlighting the most influential authors, institutions, countries, journals and keywords. Evrim Özmen et al. [16] carried out a retrospective bibliometric analysis to assess the role of machine learning algorithms in the diagnosis of sepsis. Silvana Secinaro et al. [17] carried out a structured literature review on the role of AI in the healthcare sector. Their findings highlight the potential of AI to improve diagnosis, personalized treatment and patient data management. Elham Asgari et al. [18] examined the impact of electronic health records (EHRs) on clinicians' cognitive load and burnout.

While much of this work focuses on the evolution, adoption, biases, cybersecurity and overall impact of AI and EMRs in healthcare, the present review focuses on the ability of AI models to synthesize and generate medical information. It provides a targeted overview of algorithmic approaches, corpora used, evaluation metrics, and concrete applications in clinical settings, while highlighting methodological limitations, clinical validation needs, and avenues for improvement, thus making an original, operationalization-oriented contribution to the existing literature.

3. Methodology

This applied research also included a bibliometric study. The data for this research were collected from the Scopus database and comprise 691 documents published between 2020 and 2025. Among the different types of documents available in Scopus, several relevant categories were considered for this study, including Articles, Journals, Proceeding Papers, Review Articles, as well as other types specific to this database. These documents were selected based on their relevance to the research topic. We decided to consider only studies carried out and published between 2020 and 2025, as it is during this period that Transformer-based and generative models appear and gain importance in the medical field. We also note an explosion in publications following the arrival of models such as GPT-3 (2020), Tom Brown et al. [19], and especially ChatGPT/GPT-4 (2022-2023), OpenAI GPT-4 Technical Report (2023) [20], Kung et al. (2023) [21]. The choice of a bibliometric approach is explained by the desire to obtain a quantitative overview of scientific production on the use of generative AI models for the synthesis of electronic medical records (EMRs) between 2020 and 2025. A systematic review, by contrast, would have enabled an in-depth analysis of the content of the selected studies. The specifics of the data collection phase are summarized in Table 1.

Table 1: Data collection parameters

Attribute | Value
Search string | (Synthesis OR generation) AND ("Medical Record" OR "medical file" OR "electronic health records" OR "health records") AND ("artificial intelligence" OR "deep learning" OR "machine Learning" OR "automatique Learning")
Database | Scopus
Year | 2020-2025
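As an illustration of how the search string listed above is structured, the following minimal Python sketch assembles it from three term groups: synonyms inside a group are joined with OR, and the groups are joined with AND. The helper name `build_query` and the variable `TERM_GROUPS` are hypothetical; the sketch only reproduces the boolean expression and does not add Scopus field codes or call any Scopus API.

```python
def build_query(term_groups):
    """Join the terms of each group with OR and join the groups with AND."""
    clauses = []
    for group in term_groups:
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{t}"' if " " in t else t for t in group]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


# Term groups copied from the search string (spelling kept as in the query).
TERM_GROUPS = [
    ["Synthesis", "generation"],
    ["Medical Record", "medical file", "electronic health records", "health records"],
    ["artificial intelligence", "deep learning", "machine Learning", "automatique Learning"],
]

query = build_query(TERM_GROUPS)
print(query)
```

Keeping the groups as lists makes it easier to see which concept each clause covers (task, record type, technique) and to adjust synonyms when rerunning the search on another database.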

Two main tools were used to analyze the data collected: VOSviewer (version 1.6.18), for visualizing co-occurrence and collaboration networks, and the Bibliometrix package in RStudio, enabling detailed bibliometric analyses. A comparative systematic review was then carried out on the ten most-cited articles identified by the bibliometric study, ranked by citation count in descending order.
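The citation-based selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the function name `top_cited` and the sample records are hypothetical and are not taken from the Scopus export used in the study.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    year: int
    citations: int


def top_cited(records, n=10, first_year=2020, last_year=2025):
    """Keep records inside the publication window, then return the n most cited."""
    in_window = [r for r in records if first_year <= r.year <= last_year]
    return sorted(in_window, key=lambda r: r.citations, reverse=True)[:n]


# Hypothetical sample; a real run would load the exported bibliographic records.
records = [
    Record("Paper A", 2021, 120),
    Record("Paper B", 2019, 300),  # outside the 2020-2025 window, so excluded
    Record("Paper C", 2023, 45),
    Record("Paper D", 2020, 80),
]

best = top_cited(records, n=2)
print([r.title for r in best])
```

Ranking by raw citation count favors older papers within the window, which is worth keeping in mind when interpreting the selected set.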

4. Bibliometric study results

4.1. Annual scientific production

4.2. Most relevant sources

Although some sources have a limited number of documents, their specialization can make them crucial for niche topics. Overall, this distribution highlights the most influential publications for research in fields related to informatics and health.

4.3. Scientific output by country

Figure 2 illustrates the scientific output by country. The graph was generated using the Bibliometrix package in R. The USA dominates scientific output in AI applied to medical records, with a strong acceleration after 2022. China shows rapid growth since 2021, indicating increased investment. India, the UK and Australia show more modest but growing contributions, especially after 2020. These dynamics reflect a global rise in interest in healthcare AI, with development concentrated in certain leading countries.

4.4. Most popular countries

4.5. Co-citation network

This co-citation map illustrates the publications most frequently referenced together within the analyzed corpus. It reveals several thematic clusters: a central group focused on the application of Artificial Intelligence in healthcare (notably around Esteban C. and Choi E.), a green cluster centered on foundational work related to generative adversarial networks (GANs), and another cluster linked to large language models, including the GPT-4 technical report. Additionally, the map highlights publications addressing ethical considerations and fairness in AI. The size of each name indicates its influence in the field, while the proximity between names reflects strong bibliographic connections, helping to uncover the main theoretical underpinnings of the research landscape.

4.6. Keyword co-occurrence network

The keyword co-occurrence network reveals a threefold thematic structure within AI-driven health research. This structure encompasses a technological dimension (e.g., electronic health records, machine learning), a clinical dimension (e.g., diagnosis, medical data), and a humanistic-conceptual dimension (e.g., terms like "human" and "article"), all intricately connected. Artificial intelligence and the human factor emerge as central elements, symbolizing the convergence of algorithmic innovation, real-world clinical implementation, and ethical considerations. Notably, the frequent appearance of the term "article" underscores the enduring role of scholarly output in shaping this interdisciplinary dialogue, where technical, clinical, and ethical domains intersect.

4.7. Trend analysis results

The analysis of thematic trends illustrates the evolution of major scientific terms over the study period. It reveals an evolution of research between 2020 and 2024 centered on three major axes: advanced AI technologies (adversarial networks, transfer learning, natural language processing, data mining), medical applications (electronic health records, standardized medical nomenclature, genotyping), and technical infrastructures (IT security, analysis software, learning systems). Recurring terms such as "artificial intelligence", "human" and "article" underline the interdependence between algorithmic innovation, clinical needs and scientific production, reflecting an increasingly integrated research environment where technology serves as a bridge between medical theory and practice.

4.8. Analysis of word frequency over time

Analysis of the cumulative frequency of words over time highlights the evolution of dominant themes in scientific publications. The following key concepts stand out: article, artificial intelligence, deep learning, electronic medical record, female, human, human being, machine learning, male and natural language processing. These terms reflect a strong focus on AI technologies applied to healthcare, with particular attention paid to electronic medical data, gender differences (female/male), and human aspects, while integrating advanced techniques such as deep learning and NLP, illustrating the interdisciplinarity of contemporary digital health research.

4.9. Local impact of authors using the H index

The table on authors' local impact, measured by the H-index, highlights a hierarchy among scientific contributors in the field of automatic medical record synthesis with generative AI. Three authors have an H-index of 5, five authors have an H-index of 4 and two authors have an H-index of 3. These results show that a handful of authors dominate this emerging field, reflecting a concentration of research efforts.

4.10. The best papers

The ten selected articles collectively explore the diverse and evolving applications of artificial intelligence, particularly deep learning models, in the healthcare sector, with a focus on electronic medical records (EMRs), real-world data, mental health, and ethical considerations. Chang Su et al. review the use of deep learning for mental health outcome prediction, highlighting its superior performance compared to traditional techniques, though challenges persist in validating results due to anonymized social network data. Szu-Wei Cheng et al. examine the current and potential uses of ChatGPT in psychiatry, recognizing its value in administrative and communicative tasks but noting its limited clinical reliability and ethical concerns. Fang Liu et al. offer a comprehensive overview of real-world data (RWD) sources and their analytical frameworks, underlining the need for rigorous preprocessing to ensure reliability in clinical trials. Yinan Huang et al. focus on machine learning algorithms predicting hospital readmissions, identifying neural networks and decision trees as the most effective, albeit with issues related to validation consistency and data standardization. Junyu Luo et al. propose HiTANet, a temporal attention network that captures non-stationary disease progression, outperforming traditional static models, although its sensitivity to EHR data quality remains a limitation. Feng Xie et al. introduce AutoScore, a machine-learning-based tool that generates interpretable clinical scoring systems, achieving strong predictive performance with fewer variables and improved usability. Isotta Landi et al. develop a deep unsupervised learning framework (ConvAE) for large-scale EHR-based patient stratification, effectively identifying meaningful subgroups in diseases like Parkinson's and diabetes, though semantic depth and generalizability require further work. Gaurav Dhiman et al. present a hybrid CNN model for tumor identification in medical imaging, significantly improving information extraction, but with concerns over the quality and semantic coherence of pseudo-data. Ping Wang et al. build TREQS, a model translating natural language questions into SQL queries to simplify EMR access for healthcare professionals, demonstrating high accuracy and robustness to typos and abbreviations, yet struggling with complex queries. Finally, Karan Bhanot et al. address fairness in synthetic health data by proposing equity metrics and applying them across datasets like MIMIC-III, revealing systematic under-representation of subgroups such as elderly or minority populations. Together, these studies emphasize the transformative potential of generative and predictive AI models in digital health, while underlining the importance of model interpretability, data diversity, ethical safeguards, and validation in real-world clinical environments.
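To make concrete what a keyword co-occurrence network such as the one discussed in Section 4.6 counts, the following minimal Python sketch tallies how often two keywords appear on the same document. It is an illustration only: the function name `cooccurrence_counts` and the sample keyword lists are hypothetical, and tools such as VOSviewer apply additional normalization and layout steps that are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for every keyword pair, the number of documents listing both."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so each pair counts once per document.
        unique = sorted({k.strip().lower() for k in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per document.
docs = [
    ["Artificial Intelligence", "Human", "Electronic Health Records"],
    ["artificial intelligence", "human", "machine learning"],
    ["machine learning", "electronic health records"],
]

counts = cooccurrence_counts(docs)
print(counts.most_common(3))
```

Each pair becomes an edge in the network, weighted by its count; frequently paired terms such as "artificial intelligence" and "human" therefore end up close together on the map.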

Chang Su et al.
Context: The background to this research is based on the growing recognition that mental illnesses such as depression are common and affect the physical health of individuals. The study explores how artificial intelligence, particularly deep learning, can assist mental health professionals in their clinical decisions.
Research questions: How are deep learning algorithms being applied to improve the diagnosis and treatment of mental health conditions? What are the different categories of data used in these studies, and how do these data influence the results?
Methodology: Systematic literature review, guided by PRISMA guidelines.

Junyu Luo et al. (HiTANet)
Context: Current approaches assume stationary disease progression, which is not the case in reality. This work criticizes this assumption and proposes a model that incorporates time-aware mechanisms to reflect the decision-making process.
Research questions: How can we model disease progression in a non-stationary, dynamic way? What are the key time steps for patient-specific disease prediction?
Methodology: Proposed a risk prediction model capable of better capturing temporal information in electronic health records (EHRs) through a hierarchical, time-aware attention network, called HiTANet.
Results: Proposal of an innovative approach to deal with the limitations of previous models in taking account of temporal information.
Limitations: The method may be sensitive to the quality of the EHR data, and results may vary depending on the characteristics of the datasets.
Future work: Explore other attention mechanisms, integrate additional data from different sources, and extend the model to other pathologies.

Future research could focus on improving the model's robustness in the face of incomplete or unbalanced data, as well as on methods for integrating external medical knowledge.

Isotta Landi et al. [25] (ConvAE)
Context: EHRs, although heterogeneous, contain valuable information on patients' health trajectories. However, their use for patient stratification remains limited by their complexity, volume and variable quality. Many diseases, such as type 2 diabetes, Parkinson's disease or Alzheimer's, present a high degree of clinical heterogeneity. This complexity makes them difficult to model using conventional approaches.
Research questions: How can disease subtypes be extracted automatically and efficiently from large, heterogeneous EHR databases? Can we design an unsupervised method capable of producing clinically meaningful groupings? Finally, do the representations generated by ConvAE enable better performance in terms of patient clustering, compared with existing methods?
Methodology: Propose a general, scalable framework based on unsupervised deep learning to exploit electronic health records (EHRs) on a large scale, without the need for manual feature engineering or supervision. Implementation and simulation.
Results: ConvAE significantly outperforms existing methods such as Deep Patient and other approaches based on linear representations. The model identified several clinically relevant subgroups for complex diseases such as type 2 diabetes, Parkinson's and Alzheimer's.

Yinan Huang et al.
Context: [...] Hospital Readmission Reduction Program (HRRP) in the United States.
Research questions: Which machine learning (ML) algorithms [...] hospital readmissions in the USA? How does [...] performance, in particular the AUC (area under the curve), vary according to the different methods?
Methodology: PubMed, MEDLINE and EMBASE databases from 2015 to 2019, focusing on ML models, model performance and the types of algorithms employed [...].

Gaurav Dhiman et al. [28]
Context: The context of this research is based on natural language processing (NLP) and the extraction of medical information from electronic medical records. The authors emphasize the importance of tumor-related events, such as primary site, size and metastasis, and highlight the limitations of existing methods, such as lack of generalizability and reliance on pre-trained models.
Research questions: How can we improve the joint extraction of attributes of tumor-related medical events, such as primary site and tumor size? How can pseudo-data be generated to overcome the lack of annotated data and improve the model's transfer learning capability? How does the proposed model compare with existing methods on the CCKS2019 and CCKS2020 datasets?
Methodology: Propose a hybrid model based on machine learning for tumor identification in medical image processing. Implementation, simulation, case study.
Limitations: Limitations include the strong randomization of the pseudo-data generation algorithm, which can produce data that does not conform to [...].
Future work: The authors plan to improve the pseudo-data generation algorithm based on semantic similarity [...]. Future research could explore the integration of pre-trained language models with less dependence on external resources [...].

Ping Wang et al. (TREQS)
Results: The results show that the TREQS model outperforms existing methods in terms of accuracy, particularly for the generation of condition values. It is also robust to questions containing abbreviations or [...].
Limitations: Limitations include dependence on the quality of the training data and the difficulty of generalizing the model to complex or ambiguous questions.
Future work: Future work could include extending the model to other medical databases, improving the handling of complex questions and [...].
Research direction: The research direction aims to improve interaction between healthcare professionals and medical information systems, by developing more [...].

Szu-Wei Cheng et al.
Context: The theoretical framework is based on the evolution of NLP models towards more contextual and generative approaches with GPT, which offers unique potential in natural language interpretation.
Research questions: What are the current uses of ChatGPT in psychiatry? What are its limitations? How can it be ethically and effectively integrated into mental health care in the future?
Methodology: This is a critical and forward-looking review of current GPT usages, enriched with concrete examples, use cases and comparisons with previous approaches.

Karan Bhanot et al.
Context: The background to the article is based on the challenges of [...].
Methodology: The authors propose two main metrics: "log disparity" [...], using different datasets (MIMIC-III, ATUS and ASD). Statistical tests and visualizations are used to identify biases.
Results: The results reveal significant biases [...].
Limitations: The authors point out that existing synthetic [...].
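As a side note on the author ranking in Section 4.9, the H-index of an author is the largest number h such that h of the author's papers each have at least h citations. The sketch below is a minimal Python illustration; the function name `h_index` and the citation counts are hypothetical, and it is assumed here that a "local" H-index simply applies the same definition to the papers and citations found inside the analyzed collection.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's papers in the collection.
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

Because the index is bounded by the number of papers an author has in the collection, values of 3 to 5 are unsurprising for a corpus restricted to a few recent years.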

5. Approach: Fair NLP-based synthetic EMR generation pipeline in Benin

In this article, we propose a comprehensive methodological architecture aimed at generating, testing, and validating synthetic Electronic Medical Records (EMRs) from real data held on medical platforms. The objective is to:
• preserve patient confidentiality,
• ensure the fairness of generative models,
• facilitate rigorous clinical validation.

5.1. Description of the pipeline

Collection and Preprocessing of Electronic Medical Records (EMRs)

The data are collected from platforms such as e-Alafia or GoMedical (web or mobile solutions used in Benin). Two key steps are involved:
• Standard preprocessing: data cleaning and structuring.
• Anonymization using NLP techniques: identifying and masking sensitive information such as names, addresses, and dates, without distorting the clinical content.

Development of the Generative Model (NLP)

We use a medical text generation model (e.g., GPT, T5, Med-PaLM) with the objective of generating realistic, non-identifiable synthetic EMRs. These synthetic records can then be used to train or test other clinical models without violating ethical or legal restrictions on patient data.
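The anonymization step described above can be illustrated with a deliberately simple Python sketch. It is a stand-in under stated assumptions, not the pipeline's actual method: it uses regular expressions and a caller-supplied list of names instead of an NLP entity recognizer, the function name `mask_record` and the placeholder tokens are hypothetical, and the sample note is invented. A production system would need a validated de-identification tool.

```python
import re

# Matches ISO dates (2024-03-19) and day/month/year dates (12/03/2024 or 12-03-24).
DATE_PATTERN = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"
)


def mask_record(text, known_names):
    """Replace dates and known patient names with placeholder tokens."""
    masked = DATE_PATTERN.sub("[DATE]", text)
    # Mask longer names first so a full name is not split by a shorter match.
    for name in sorted(known_names, key=len, reverse=True):
        masked = re.sub(re.escape(name), "[NAME]", masked, flags=re.IGNORECASE)
    return masked


# Invented example note; the name would typically come from structured fields.
note = "Patient Afi Dossou seen on 12/03/2024 for fever; follow-up on 2024-03-19."
print(mask_record(note, ["Afi Dossou"]))
```

The clinical content ("fever", the follow-up) is left untouched, which is the property the pipeline requires of this step; only identifying spans are replaced.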

Equity-Based Evaluation

A dedicated testing phase evaluates the equity of the generative model:
• Does it produce balanced records across gender, age, geographic region, or socioeconomic status?
• Are there signs of bias, omission, or overrepresentation?

We propose the use of equity indicators such as entity distribution, case diversity, and demographic balance to assess the fairness of the generated content.
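One possible way to quantify the demographic balance indicator mentioned above is to compare each group's share in the synthetic records with its share in the real records. The Python sketch below computes the natural logarithm of that ratio per group: values near 0 indicate balance, negative values under-representation and positive values over-representation. This is an illustrative assumption, not the indicator fixed by the pipeline and not necessarily the metric used in the fairness literature cited earlier; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def group_shares(records, attribute):
    """Return the proportion of records falling in each value of `attribute`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def log_share_ratio(real, synthetic, attribute):
    """Log of (synthetic share / real share) for every group present in the real data."""
    real_shares = group_shares(real, attribute)
    synth_shares = group_shares(synthetic, attribute)
    ratios = {}
    for group, real_share in real_shares.items():
        synth_share = synth_shares.get(group, 0.0)
        # A group missing from the synthetic data is reported as -infinity.
        ratios[group] = math.log(synth_share / real_share) if synth_share > 0 else float("-inf")
    return ratios


# Hypothetical data: balanced real records, skewed synthetic records.
real = [{"sex": "F"}] * 50 + [{"sex": "M"}] * 50
synthetic = [{"sex": "F"}] * 25 + [{"sex": "M"}] * 75

ratios = log_share_ratio(real, synthetic, "sex")
print(ratios)
```

The same function can be applied to age bands, regions or socioeconomic categories by changing the `attribute` argument, giving one comparable number per group for the equity report.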

Clinical Validation

The final step involves human expert validation. The synthetic records are submitted to medical professionals for clinical validation, ensuring consistency, plausibility, and potential utility for training, testing, or research.

Original Contribution

Unlike traditional approaches, this contribution structures the entire pipeline, from EMR collection to clinical validation, while placing equity as a central evaluation criterion. The proposed framework can be applied in contexts where real patient data are too sensitive to use directly, but where synthetic alternatives can support safe and effective model development.

5.2. Discussion and limitations of the performed bibliometric study

Analysis of the results of this study highlights the advances and challenges involved in using generative artificial intelligence models to synthesize Electronic Medical Records (EMRs). Recent publications show a marked growth in research on this subject, particularly between 2020 and 2024. The approaches being explored are mainly based on Transformer-type architectures, deep learning, unsupervised learning, and GPT and its variants. These models are distinguished by their ability to process large quantities of unstructured data, such as free text in medical records, and to generate coherent summaries tailored to clinical needs.

Key benefits include a significant reduction in the cognitive load on healthcare professionals, and faster decision-making thanks to clear, targeted summaries.

The use of generative AI models is transforming clinical practice by enabling rapid access to essential information, thereby reducing delays in decision-making. This improves the quality of care while minimizing human error. However, clinical adoption remains hampered by concerns about the transparency of models and their ability to adapt to specific cases. Models have yet to prove their reliability in complex and diverse environments, such as intensive care or rare diagnoses.

To improve the reliability, safety and acceptability of generative AI models, there are several avenues to explore:

Standardization of evaluations: clear evaluation frameworks need to be developed to compare the relevance and accuracy of models.

Bias reduction and model robustness: The integration of more diversified databases and the use of federated learning techniques can help limit bias.

Data confidentiality: Advanced encryption solutions and anonymization techniques must be implemented to protect sensitive medical data.

Clinical adoption: Collaboration between researchers, healthcare professionals and regulators is essential to develop user-centered tools that meet ethical and regulatory requirements.

These directions offer promising prospects for integrating generative AI models into healthcare systems in an efficient and ethical way.

Despite the contributions of this study, certain limitations must be acknowledged. Firstly, the literature search was limited to the Scopus database, which may restrict coverage of relevant literature available in other databases such as PubMed, IEEE Xplore or Web of Science. The absence of qualitative evaluation or clinical case studies also restricts appreciation of the real impact of generative AI models in practical medical contexts. Finally, publication bias and methodological variations between the studies analyzed may influence the overall synthesis, justifying caution in generalizing the results obtained. This bibliometric analysis could be supplemented in future work by a systematic review focusing on the best-performing models or concrete clinical use cases.

Conclusion

The application of generative artificial intelligence models to the synthesis of electronic medical records (EMRs) represents a major advance in healthcare information management. These technologies offer opportunities to lighten the cognitive load on healthcare professionals, speed up clinical decision-making and improve the quality of care. However, significant challenges remain, notably data bias, textual hallucinations, and issues of confidentiality and clinical acceptability. The results of this research highlight the progress made in integrating generative models such as GPT, but also underline the need to standardize assessment methods and enhance data security. Clinical adoption of these technologies will depend on resolving these challenges through multidisciplinary collaboration between researchers, clinicians and regulators. Future work should focus on developing more robust and transparent models, improving ethical practices, and training healthcare professionals in the use of these tools. If these obstacles are overcome, generative AI models could sustainably transform the digital health landscape and become valuable allies in EMR management.

Declaration on Generative AI

In preparing this work, the authors used X-GPT-4 for grammar and spelling checking. After using these tools/services, the authors reviewed and corrected the content as needed and take full responsibility for the content of the publication.

[1] Goodman, Yi, D. Morgan, AI-generated clinical summaries require more than accuracy, JAMA Network (2024). doi:10.1001/jama.2024.0555.

[2] Myers et al., AI can outperform humans in writing medical summaries, Stanford HAI (2024).

[3] Shing et al., AI can outperform humans in writing medical summaries, arXiv (2023).

[4] Tan et al., Transformer-based approaches for medical document summarization, J. Biomed. Informatics (2023).

[5] Vishal et al., Challenges in medical data for generative AI models, J. Med. Informatics (2023).

[6] Lee et al., Ontology-driven medical record summarization, Health Tech (2023).

[7] Nguyen et al., Biases in medical data and their impact on AI model generalization, Health Data Science (2023).

[8] Simmons et al., Reducing hallucinations in medical text summarization with validation filters (2023).

[9] Xie, Cebulla, Bastani, Balasubramanian, Trends and patterns in electronic health record research (1991-2022): A bibliometric analysis of Australian literature, Int J Environ Res Public Health, vol. 21, no. 3 (2022). doi:10.3390/ijerph21030361.

[10] Haris, Aini, Bibliometric analysis of electronic medical records (EMR) acceptance and adoption: Trends, insights, and future directions, Eman Research (2024). doi:10.25163/angiotherapy.859700.

[11] J. J. Jr., A. S. Jose, G. G. Ettaniyil, J. S, J. Jose, Mapping the landscape of electronic health records and health information exchange through bibliometric analysis and visualization, Cureus, Apr. 2024 (2024). doi:10.7759/cureus.59128.

[12] Xie, Zhai, G. Lu, Evolution of artificial intelligence in healthcare: a 30-year bibliometric study, Front Med (Lausanne), vol. 11 (2024). doi:10.3389/fmed.2024.1505692.

[13] Secinaro, Calandra, Secinaro, V. Muthurangu, P. Biancone, Artificial intelligence applications in healthcare: A bibliometric and topic model-based analysis (2024).

[14] Chen, Wang, Hong, Jiang, Zhou, Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models, May 01, 2024, Oxford University Press (2024). doi:10.1093/jamia/ocae060.

[15] A. K. Shaikh, S. M. Alhashmi, N. Khalique, A. M. Khedr, K. Raahemifar, S. Bukhari, Bibliometric analysis on the adoption of artificial intelligence applications in the e-health sector, Jan. 01, 2023, SAGE Publications Inc (2023). doi:10.1177/20552076221149296.

[16] Özmen, Emir, The role of machine learning algorithms in sepsis diagnosis: A retrospective overview using bibliometric analysis, OSMANGAZİ JOURNAL OF MEDICINE, vol. 46, no. 6, Sep. 2024 (2024). doi:10.20515/otd.1532158.

[17] Secinaro, Calandra, Secinaro, V. Muthurangu, P. Biancone, The role of artificial intelligence in healthcare: a structured literature review, BMC Med Inform Decis Mak, vol. 21, no. 1, Dec. 2021 (2021). doi:10.1186/s12911-021-01488-9.

[18] Asgari, Kaur, G. Nuredini, Balloch, A. M. Taylor, N. Sebire, Robinson, Peters, Sridharan, Pimenta, Impact of electronic health record use on cognitive load and burnout among clinicians: Narrative review, 2024, JMIR Publications Inc (2024). doi:10.1186/s12911-021-01488-9.

[19] T. B. Brown, B. Mann, N. Ryder, Language models are few-shot learners (2020). doi:10.48550/arXiv.2005.14165.

[20] OpenAI, J. Achiam, GPT-4 technical report (2023). doi:10.48550/arXiv.2303.08774.

[21] T. H. Kung, Cheatham, Medenilla, Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models (2023). doi:10.1371/journal.pdig.0000198.

[22] Su, Xu, Pathak, Wang, Deep learning in mental health outcome research: a scoping review, Springer Nature (2020). doi:10.1038/s41398-020-0780-3.

[23] Liu, Demosthenes, Real-world data: a brief review of the methods, applications, challenges and opportunities, BioMed Central Ltd. (2022). doi:10.1186/s12874-022-01768-6.

[24] Luo, Ye, Xiao, Ma, HiTANet: Hierarchical time-aware attention networks for risk prediction on electronic health records, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery (2020) 647-656. doi:10.1145/3394486.3403107.

[25] Landi, B. S. Glicksberg, H.-C. Lee, Cherng, G. Landi, Danieletto, J. T. Dudley, Furlanello, Miotto, Deep representation learning of electronic health records to unlock patient stratification at scale, NPJ Digit Med (2020). doi:10.1038/s41746-020-0301-z.

[26] Xie, Chakraborty, M. E. H. Ong, B. A. Goldstein, N. Liu, AutoScore: A machine learning-based automatic clinical score generator and its application to mortality prediction using electronic health records, JMIR Med Inform (2020). doi:10.2196/21798.

[27] Huang, Talwar, Chatterjee, R. R. Aparasu, Application of machine learning in predicting hospital readmissions: a scoping review of the literature, BMC Med Res Methodol (2021). doi:10.1186/s12874-021-01284-z.

[28] Dhiman, Juneja, Viriyasitavat, Mohafez, Hadizadeh, M. A. Islam, I. E. Bayoumy, Gulati, A novel machine-learning-based hybrid CNN model for tumor identification in medical image processing, Sustainability (Switzerland) (2022). doi:10.3390/su14031447.

[29] Wang, Shi, C. K. Reddy, Text-to-SQL generation for question answering on electronic medical records, The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020, Association for Computing Machinery (2020) 350-361. doi:10.1145/3366423.3380120.

[30] S.-W. Cheng, C.-W. Chang, W.-J. Chang, H.-W. Wang, C.-S. Liang, Kishimoto, J. P.-C. Chang, J. S. Kuo, K.-P. Su, The now and future of ChatGPT and GPT in psychiatry, John Wiley and Sons Inc (2023). doi:10.1111/pcn.13588.

[31] Bhanot, Qi, J. S. Erickson, I. Guyon, K. P. Bennett, The problem of fairness in synthetic healthcare data, Entropy (2021). doi:10.3390/e23091165.