<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring the Role of Knowledge Graph-Based RAG in Japanese Medical Question Answering with Small-Scale LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yingjian Chen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Feiyang Li</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingyu Song</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tianxiao Li</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zixin Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiujie Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Issey Sukeda</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Li</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dokkyo Medical University</institution>
          ,
          <addr-line>Tochigi</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Graduate School of Frontier Sciences, The University of Tokyo</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NEC Laboratories America</institution>
          ,
          <addr-line>Palo Alto</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>The University of Tokyo</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large language models (LLMs) perform well in medical QA, but their effectiveness in Japanese contexts is limited due to privacy constraints that prevent the use of commercial models like GPT-4 in clinical settings. As a result, recent efforts focus on instruction-tuning open-source LLMs, though the potential of combining them with retrieval-augmented generation (RAG) remains underexplored. To bridge this gap, we are the first to explore a knowledge graph-based (KG) RAG framework for Japanese medical QA with small-scale open-source LLMs. Experimental results show that KG-based RAG has only a limited impact on Japanese medical QA using small-scale open-source LLMs. Further case studies reveal that the effectiveness of RAG is sensitive to the quality and relevance of the externally retrieved content. These findings offer valuable insights into the challenges and potential of applying RAG in Japanese medical QA, while also serving as a reference for other low-resource languages.</p>
      </abstract>
      <kwd-group>
        <kwd>Japanese Medical Question Answering</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
        <kwd>Small-Scale LLMs</kwd>
        <kwd>Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) have achieved remarkable performance in medical question answering
(QA), even demonstrating the ability to pass medical licensing exams (e.g., the United States Medical
Licensing Examination, USMLE) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which highlights their potential to understand complex medical
knowledge. In particular, recent research [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ] has explored the use of retrieval-augmented
generation (RAG) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to incorporate external medical knowledge into LLMs, effectively mitigating
“hallucination” issues [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
        ] and further enhancing their applicability in medical QA tasks.
      </p>
      <p>
        While these advances are promising, most existing efforts [
        <xref ref-type="bibr" rid="ref10">10, 11</xref>
        ] focus on English-only
scenarios, with limited exploration of multilingual medical QA, particularly in Japanese. Since mainstream
LLMs [12, 13] are predominantly trained on English-centric medical data, the significant imbalance in
resource distribution [14] limits their effectiveness in Japanese medical QA scenarios. While GPT-4
has demonstrated strong performance on the Japanese NMLE [15], strict privacy regulations prevent
its use in clinical settings. As a result, research has shifted toward open-source LLMs, but progress is
hindered by the scarcity of high-quality Japanese medical data, which highlights the broader challenges
faced by low-resource languages.
      </p>
      <p>To address this limitation, we are the first to explore the use of knowledge graph-based
RAG for Japanese medical question answering, specifically targeting small-scale LLMs. Given the
limited accessibility of external Japanese medical resources, we use the easily accessible external medical
knowledge base UMLS [16] to support knowledge retrieval, following the design of MKG-Rank [17].
By applying word-level translation, LLMs can integrate non-Japanese-centric medical knowledge,
mitigating the limitations caused by the relative scarcity of Japanese medical data and its access restrictions.
Empirical results show that KG-based RAG has a limited effect on Japanese medical QA with small
open-source LLMs, and that this effect depends largely on the quality and relevance of the KGs retrieved from external
knowledge bases.</p>
      <p>Figure 1: Knowledge graph-based RAG pipeline. Step 1, Entity Extraction: an LLM extracts entities from the question (胸部の締め付け感, 嘔吐, 喘息, 不安症状). Step 2, Knowledge Graph Retrieval: the translated entities (chest tightness, vomiting, asthma, anxiety symptoms) are used to query UMLS. Step 3, Ranking: the retrieved triples (Triple 1 to Triple n) are re-ranked by question similarity (Top 1: 0.84, Top 2: 0.76, ..., Top k: 0.64). Step 4, Declarative Conversion: the selected triples are converted into sentences (胸部絞扼感は喘息の症状である; 嘔吐は消化管の非特異的症状である; 喘息は好酸球性肺炎の一部である) and passed to the LLM together with the question. Example question (Japanese): 今夜2回嘔吐し、胸部に非常に強い締め付け感がありますが、鼻や喉は通っています。禁煙を助けるためにパッチを使い始めたばかりで、私は喘息持ちですが、現在は酸素が体内に留まっていないような感覚があります。脈拍はあり、また不安症状も経験しています。このパッチが私の喘息や呼吸に影響しているのでしょうか？</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <sec id="sec-2-1">
        <title>2.1. Base Models</title>
        <p>Our work focuses on evaluating the performance of small-scale LLMs in Japanese medical QA.
Specifically, we select: (1) a model with fewer than 5B parameters,
Borea-Phi-3.5-Mini-Instruct-Common [18]; (2) models with 5B–10B parameters, including LLaMA-3-ELYZA-JP-8B [19],
LLM-jp-3-7.2B-Instruct3 [20], Mistral-7B-Instruct-v0.3 [21], Qwen2.5-7B-Instruct [22], and
LLaMA-3.1-8B-Instruct [23]; and (3) 10B–32B models, including Qwen2.5-14B-Instruct [24], Phi-4-14B [25], and
Gemma-3-12B-it [26]. In addition, we evaluate the commercial LLM GPT-4o-mini [27] as a proprietary
baseline.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge Graph-based RAG</title>
        <p>Our pipeline, as illustrated in Fig. 1, consists of four main steps: (1) Given a medical question, we
first use an LLM to extract relevant medical entities ℰ; (2) The extracted entities ℰ are translated into
English via word-level translation and used to query the external medical knowledge base UMLS to obtain
relevant KGs; (3) A ranking mechanism is applied to select the most relevant triples based on
their semantic relevance to the question; (4) The selected triples are then converted into declarative
sentences by an LLM and, together with the original question, are provided as input to the LLM for answer
generation.</p>
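        <p>The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names rank_triples and kg_rag_answer, the callables passed in for entity extraction, word-level translation, UMLS lookup, declarative conversion, and answer generation, and the token-overlap scorer used for ranking are all hypothetical stand-ins. The scoring function in particular is a toy; a real system would likely use an embedding-based similarity.</p>
        <preformat><![CDATA[
```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens.

    Assumption: a toy stand-in for a real semantic relevance scorer.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_triples(
    question: str,
    triples: List[Triple],
    k: int,
    score: Callable[[str, str], float] = token_overlap,
) -> List[Triple]:
    """Step 3: return the k triples scored as most relevant to the question."""
    ranked = sorted(triples, key=lambda t: score(question, " ".join(t)), reverse=True)
    return list(ranked[:k])


def kg_rag_answer(
    question: str,
    extract_entities: Callable[[str], List[str]],
    translate: Callable[[str], str],
    query_kg: Callable[[str], List[Triple]],
    to_sentence: Callable[[Triple], str],
    generate: Callable[[str, List[str]], object],
    k: int = 3,
    score: Callable[[str, str], float] = token_overlap,
):
    """Run the four pipeline steps with caller-supplied (hypothetical) components."""
    entities = extract_entities(question)                  # Step 1: entity extraction
    triples: List[Triple] = []
    for entity in entities:                                # Step 2: translate, then query the KG
        triples.extend(query_kg(translate(entity)))
    selected = rank_triples(question, triples, k, score)   # Step 3: ranking
    facts = [to_sentence(t) for t in selected]             # Step 4: declarative conversion
    return generate(question, facts)                       # answer generation
```
]]></preformat>
        <p>Passing each component as a callable keeps the sketch independent of any particular LLM client or UMLS access method, so each step can be replaced or stubbed out separately when inspecting intermediate outputs.</p>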
        <p>Prompts. We provide the prompts used in our knowledge graph-based RAG mechanism, including
Medical Entity Extraction, Declarative Conversion and Answer Generation.</p>
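        <p>As an illustration of how the Medical Entity Extraction prompt might be used, the sketch below fills the question into the prompt template and parses the JSON reply. The template text follows the prompt given in this paper; the helper names build_entity_prompt and parse_entities, and the tolerant parsing strategy (taking the outermost braces and returning an empty list on malformed output), are assumptions made for this example.</p>
        <preformat><![CDATA[
```python
import json
from typing import List

# Template text taken from the paper's Medical Entity Extraction prompt.
ENTITY_PROMPT = (
    "text: {question}\n"
    "Please extract at most 4 terms related to medical that you think are the most "
    "important from the provided text.\n"
    "Returns the result in the following json form. "
    "All the results are merged into one json.\n"
    "-- Examples of results:\n"
    '{"medical terminologies" : ["term1", "term2", ...]}\n'
    "result:"
)


def build_entity_prompt(question: str) -> str:
    """Insert the question; str.replace avoids clashing with the literal JSON braces."""
    return ENTITY_PROMPT.replace("{question}", question)


def parse_entities(reply: str, limit: int = 4) -> List[str]:
    """Extract the term list from an LLM reply (hypothetical helper).

    Returns an empty list if no well-formed JSON object is found.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return []
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    terms = data.get("medical terminologies", [])
    if not isinstance(terms, list):
        return []
    cleaned = [t.strip() for t in terms if isinstance(t, str) and t.strip()]
    return cleaned[:limit]  # enforce the "at most 4 terms" limit from the prompt
```
]]></preformat>
        <p>Small models do not always follow the requested output format, so returning an empty list on malformed replies lets the caller fall back to answering without retrieved knowledge.</p>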
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment</title>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>We selected three long-form medical question answering datasets: ExpertQA-Bio, ExpertQA-Med, and
LiveQA. ExpertQA is a high-quality QA dataset verified by domain experts, from which we used 96
biological questions (ExpertQA-Bio) and 504 medical questions (ExpertQA-Med) [28]. LiveQA consists of
consumer health questions submitted to the National Library of Medicine, comprising 627 training QA
pairs and 104 test pairs [29]. All datasets were originally in English and were translated into Japanese
using a multi-stage agentic machine translation framework [30]. We provide detailed statistics of the
evaluation datasets, including the number of samples and the average word count of questions and
answers, as shown in Table 2.</p>
        <p>Prompt: Medical Entity Extraction
text: {question}
Please extract at most 4 terms related to medical that you think are the most important from the provided text.
Returns the result in the following json form. All the results are merged into one json.
-- Examples of results:
{"medical terminologies" : ["term1", "term2", ...]}
result:</p>
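        <table-wrap>
          <label>Table 2</label>
          <caption>
            <p>Statistics of the evaluation datasets: number of samples and average word count of questions and answers.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Dataset</th>
                <th>Size</th>
                <th>Question Length</th>
                <th>Answer Length</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>ExpertQA-Bio</td>
                <td>96</td>
                <td>56.7</td>
                <td>410.7</td>
              </tr>
              <tr>
                <td>ExpertQA-Med</td>
                <td>504</td>
                <td>56.0</td>
                <td>378.1</td>
              </tr>
              <tr>
                <td>LiveQA</td>
                <td>627</td>
                <td>118.9</td>
                <td>438.3</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>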
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Metrics</title>
        <p>We adopt ROUGE-L [31] and BERTScore [32] to evaluate the quality of answers generated by LLMs
for Japanese medical QA tasks. These metrics provide a comprehensive assessment of both lexical and
semantic accuracy.</p>
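        <p>ROUGE-L scores a generated answer by the longest common subsequence (LCS) it shares with the reference. The sketch below computes an LCS-based F1 over token sequences. The function names are illustrative, and two choices here are assumptions rather than details taken from this paper: the balanced F1 (instead of a weighted F-measure) and the suggestion of character-level tokens for Japanese text. The evaluation reported here may rely on a different tokenizer or library.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence, reference: Sequence) -> float:
    """LCS-based precision/recall combined into a balanced F1 (assumed weighting)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Japanese has no whitespace word boundaries, so this example uses characters as tokens.
score = rouge_l_f1(list("喘息の症状"), list("喘息の症状である"))
```
]]></preformat>
        <p>Because the LCS only rewards tokens that appear in the same order, long free-form answers that paraphrase the reference tend to receive low ROUGE-L values, which is one reason to pair it with an embedding-based metric such as BERTScore.</p>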
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results and Analysis</title>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Case Study</title>
        <p>To further investigate the reasons behind the relatively limited impact of RAG, we analyze the
intermediate reasoning process, as shown in Table 4. Specifically, in Case 1 (top), the retrieved medical
knowledge includes information that is relevant and helpful for answering the question (e.g., “ワルファリンは生理的に凝固因子濃度を低下させる効果がある”). However, we observe that
most LLMs already possess this knowledge internally, which explains why RAG offers only marginal
improvements in such cases. On the other hand, for Case 2 (bottom), the retrieved medical knowledge
does not help answer the question (e.g., “バイ菌感染は、肝膿瘍に分類されます”). This is primarily
because the specialized biomedical concepts in UMLS may not align well with broader, more general
medical questions. Consequently, the retrieved information provides limited assistance and may even
introduce noise that negatively impacts the LLM’s reasoning.</p>
        <table-wrap>
          <table>
            <thead>
              <tr>
                <th>ROUGE-L</th>
                <th>BERTScore</th>
                <th>ROUGE-L</th>
                <th>BERTScore</th>
                <th>ROUGE-L</th>
                <th>BERTScore</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>4.33</td>
                <td>61.20</td>
                <td>4.92</td>
                <td>51.54</td>
                <td>4.05</td>
                <td>59.98</td>
              </tr>
              <tr>
                <td>4.77 (+0.44%)</td>
                <td>61.32 (+0.12%)</td>
                <td>4.82 (-0.10%)</td>
                <td>61.39 (-0.15%)</td>
                <td>3.97 (-0.08%)</td>
                <td>59.71 (-0.28%)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Table 4</label>
          <table>
            <tbody>
              <tr>
                <th>Case 1</th>
                <td></td>
              </tr>
              <tr>
                <td>Question</td>
                <td>ワルファリン（ワーファリン）を服用している人は避けるべき野菜は何ですか？ (Which vegetables should people taking warfarin (Coumadin) avoid?)</td>
              </tr>
              <tr>
                <td>Retrieved Medical Knowledge</td>
                <td>1. ワルファリンは生理的に凝固因子濃度を低下させる効果がある。(Warfarin physiologically reduces the concentration of coagulation factors.) 2. ワルファリンはクマリン系の抗凝固薬である。(Warfarin is a coumarin-based anticoagulant.) …</td>
              </tr>
              <tr>
                <td>Ground Truth</td>
                <td>ワーファリンを服用している人は、ビタミンKを多く含む野菜の摂取を避けるか制限する必要があります。なぜなら、それが薬の血液凝固抑制効果（血液をサラサラにする効果）を妨げる可能性があるからです。… (People taking warfarin should avoid or limit the intake of vegetables that are high in vitamin K, as it may interfere with the medication’s blood clotting inhibition effect (the effect of making the blood thinner).)</td>
              </tr>
              <tr>
                <td>Generated Answer</td>
                <td>ワルファリンを服用している人が避けるべき野菜は、特にビタミンKを多く含むものです。ビタミンKは血液凝固に関与しており、ワルファリンの効果を減少させる可能性があります。… (People taking warfarin should avoid vegetables that are particularly high in vitamin K. Vitamin K is involved in blood clotting and may reduce the effectiveness of warfarin.)</td>
              </tr>
              <tr>
                <th>Case 2</th>
                <td></td>
              </tr>
              <tr>
                <td>Question</td>
                <td>抗生薬は通常、細菌感染または真菌感染のある患者に投与されます。ウイルスが原因であることが多い疾患については、ではどうでしょうか？ (Antibiotics are typically administered to patients with bacterial or fungal infections. But what about diseases that are often caused by viruses?)</td>
              </tr>
              <tr>
                <td>Retrieved Medical Knowledge</td>
                <td>1. バイ菌感染は、他のバイ菌性疾患に分類されます。(Bacterial infections are classified under other bacterial diseases.) 2. バイ菌感染は、肝膿瘍に分類されます。(Bacterial infections are classified as liver abscesses.) …</td>
              </tr>
              <tr>
                <td>Ground Truth</td>
                <td>抗生物質は、感冒、インフルエンザ、様々な咳などのウイルス感染症に対しては効果がありません。…これは、ウイルスが細胞ではなく、細菌細胞と比較して細胞壁を持たないためです。… (Antibiotics are ineffective against viral infections such as the common cold, influenza, and various types of cough. … This is because viruses do not have a cell wall, unlike bacterial cells. …)</td>
              </tr>
              <tr>
                <td>Generated Answer</td>
                <td>抗生物質は主に細菌感染に対して効果がありますが、ウイルス感染に対しては効果がありません。ウイルスが原因の疾患、例えばインフルエンザや風邪などには、抗生物質は無効です。… (Antibiotics are mainly effective against bacterial infections but are ineffective against viral infections. For illnesses caused by viruses, such as influenza or the common cold, antibiotics do not work.)</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This work presents the first exploration of a knowledge graph-based RAG framework for Japanese
medical QA using small-scale open-source LLMs. Empirical findings show that its overall impact is
limited, primarily constrained by the quality and relevance of the retrieved content. These insights
highlight the challenges and potential of applying RAG to Japanese and other low-resource language
medical QA tasks.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This work is supported by JST ACT-X (Grant JPMJAX24CU) and JSPS KAKENHI (Grant 24K20832).
This work used supercomputers provided by the Research Institute for Information Technology,
Kyushu University, through the HPCI System Research Project (Project ID: hp250092). This work is
also supported by NVIDIA Academic Grant Program, Google Cloud (Gemma 3 Academic Program), and
Henan Province International Science and Technology Cooperation Program (Grant 252102521047).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>In this work, the first author, Yingjian Chen, used ChatGPT and Grammarly for grammar
and spelling checks and sentence polishing. After using these tools, Yingjian Chen reviewed and edited the
content as needed and takes full responsibility for the publication’s content.</p>
      <p>[11] H. Yang, H. Chen, H. Guo, Y. Chen, C.-S. Lin, S. Hu, J. Hu, X. Wu, X. Wang, Llm-medqa: Enhancing medical question answering through case studies in large language models, arXiv preprint arXiv:2501.05464 (2024).</p>
      <p>[12] A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, et al., Gpt-4o system card, arXiv preprint arXiv:2410.21276 (2024).</p>
      <p>[13] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., Llama: Open and efficient foundation language models, arXiv preprint arXiv:2302.13971 (2023).</p>
      <p>[14] C. Chataigner, A. Taïk, G. Farnadi, Multilingual hallucination gaps in large language models, arXiv preprint arXiv:2410.18270 (2024).</p>
      <p>[15] J. Kasai, Y. Kasai, K. Sakaguchi, Y. Yamada, D. Radev, Evaluating gpt-4 and chatgpt on japanese medical licensing examinations, arXiv preprint arXiv:2303.18027 (2023).</p>
      <p>[16] O. Bodenreider, The unified medical language system (umls): integrating biomedical terminology, Nucleic acids research 32 (2004) D267–D270.</p>
      <p>[17] F. Li, Y. Chen, H. Liu, R. Yang, H. Yuan, Y. Jiang, T. Li, E. M. Taylor, H. Rouhizadeh, Y. Iwasawa, et al., Mkg-rank: Enhancing large language models with knowledge graph for multilingual medical question answering, arXiv preprint arXiv:2503.16131 (2025).</p>
      <p>[18] AXCXEPT, Borea-phi-3.5-mini-instruct-common, https://huggingface.co/AXCXEPT/Borea-Phi3.5-mini-Instruct-Common, 2024. Retrieved April 08, 2025.</p>
      <p>[19] ELYZA, Llama-3-elyza-jp-8b, https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B, 2024. Retrieved April 08, 2025.</p>
      <p>[20] LLM-jp, llm-jp-3-7.2b-instruct3, https://huggingface.co/llm-jp/llm-jp-3-7.2b-instruct3, 2024. Retrieved April 08, 2025.</p>
      <p>[21] M. AI, Mistral-7b-instruct-v0.3, https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3, 2024. Retrieved April 08, 2025.</p>
      <p>[22] Qwen, Qwen2.5-7b-instruct, https://huggingface.co/Qwen/Qwen2.5-7B-Instruct, 2024. Retrieved April 08, 2025.</p>
      <p>[23] M. Llama, Llama-3.1-8b-instruct, https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct, 2024. Retrieved April 08, 2025.</p>
      <p>[24] Qwen, Qwen2.5-14b-instruct, https://huggingface.co/Qwen/Qwen2.5-14B-Instruct, 2024. Retrieved April 08, 2025.</p>
      <p>[25] Microsoft, phi-4, https://huggingface.co/microsoft/phi-4, 2024. Retrieved April 08, 2025.</p>
      <p>[26] Google, gemma-3-12b-it, https://huggingface.co/google/gemma-3-12b-it, 2025. Retrieved April 08, 2025.</p>
      <p>[27] OpenAI, Gpt-4o-mini, https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/, 2024. Retrieved April 08, 2025.</p>
      <p>[28] C. Malaviya, S. Lee, S. Chen, E. Sieber, M. Yatskar, D. Roth, Expertqa: Expert-curated questions and attributed answers, arXiv preprint arXiv:2309.07852 (2023).</p>
      <p>[29] A. B. Abacha, E. Agichtein, Y. Pinter, D. Demner-Fushman, Overview of the medical question answering task at trec 2017 liveqa., in: TREC, 2017, pp. 1–12.</p>
      <p>[30] W. Xuan, R. Yang, H. Qi, Q. Zeng, Y. Xiao, Y. Xing, J. Wang, H. Li, X. Li, K. Yu, et al., Mmluprox: A multilingual benchmark for advanced large language model evaluation, arXiv preprint arXiv:2503.10497 (2025).</p>
      <p>[31] C.-Y. Lin, Rouge: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74–81.</p>
      <p>[32] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, Bertscore: Evaluating text generation with bert, arXiv preprint arXiv:1904.09675 (2019).</p>
      <p>Fig. 2, answer (Japanese): あなたの症状について心配されるのは理解できます。胸部の強い締め付け感や嘔吐は、喘息の症状や他の健康問題の兆候である可能性があります。特に、喘息を持っている場合、ストレスや不安が発作を引き起こすことがあります。また、禁煙のためのパッチが体に与える影響も考慮する必要があります。
ニコチンパッチは、禁煙を助けるために使用されることが多いですが、喘息の症状を悪化させる可能性があるため、注意が必要です。特に、胸部の締め付け感や呼吸困難を感じている場合は、パッチの使用を中止し、医師に相談することをお勧めします。
また、嘔吐や不安症状も、身体のストレス反応の一部である可能性があります。これらの症状が続く場合や悪化する場合は、早急に医療機関を受診することが重要です。あなたの健康が最優先ですので、専門家の意見を仰ぐことを強くお勧めします。</p>
      <p>Fig. 2, retrieved medical knowledge (Japanese): 胸部絞扼感は喘息の症状である; 胸部絞扼感はアナフィラキシー反応の一部である; 嘔吐は消化管の非特異的症状である; 喘息は好酸球性肺炎の一部である</p>
      <p>Fig. 2, question (English translation): I vomited twice tonight and have a very strong tightness in
my chest, although my nose and throat are clear. I just
started using a patch to help me quit smoking. I have asthma,
but right now it feels like oxygen isn’t staying in my body. I
have a pulse, and I’m also experiencing anxiety symptoms.
Could this patch be affecting my asthma or breathing?</p>
      <p>Fig. 2, answer (English translation): It is understandable that you are concerned about your
symptoms. Severe chest tightness and vomiting could be
signs of asthma or other health issues. Especially for
individuals with asthma, stress and anxiety can sometimes
trigger symptoms. It is also important to consider the
potential effects of the nicotine patch on your body.
Nicotine patches are commonly used to support smoking
cessation, but they may worsen asthma symptoms in some
cases, so caution is necessary. If you are experiencing
chest tightness or difficulty breathing, it is advisable to
stop using the patch and consult a doctor.
Additionally, vomiting and anxiety symptoms may be part
of your body’s stress response. If these symptoms
persist or worsen, it is crucial to seek medical attention
promptly. Your health is the top priority, and we strongly
recommend consulting a medical professional.</p>
      <p>Fig. 2, retrieved medical knowledge (English translation): Chest tightness is a symptom of asthma; Chest tightness can be part of an anaphylactic reaction;
Vomiting is a nonspecific symptom of the gastrointestinal tract;
Asthma is a component of eosinophilic pneumonia</p>
    </sec>
    <sec id="sec-7">
      <title>A. English Translation of the Content in Fig. 1</title>
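      <p>For clearer presentation, we provide the English translation and the full content of the question and
answer in Fig. 1, as illustrated in Fig. 2.</p>
      <p>Prompt: Declarative Conversion
あなたは医学分野の知能助手です。
すべての背景知識をそれぞれ日本語の平叙文に変換する。医学に関係ないと思うものは何でも削除できます。
- Background Knowledge: {triple}
Converted Background Knowledge:</p>
      <p>English translation of the prompt: You are an intelligent assistant in the medical field. Convert each piece of background knowledge into a Japanese declarative sentence. You may delete anything you consider unrelated to medicine.</p>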
    </sec>
    <sec id="sec-8">
      <title>B. Case Study Details</title>
      <p>For clarity and ease of understanding, we provide the full content and English translations of the cases
from Table 4 here, as shown in Fig. 3 and Fig. 4.</p>
      <p>Prompt: Answer Generation with Medical Knowledge
あなたは医学分野の知能助手です。質問をよく分析し、提供された背景知識とあなた自身の知識に基づいて以下の質問に答えてください。できるだけ 512 の token 内で完全に回答します。
日本語で質問に答える。
- 問題: {question}
- 背景知識: {background_knowledge}
- 答える:</p>
      <p>English translation of the prompt: You are an intelligent assistant in the medical field. Analyze the question carefully and answer the following question based on the provided background knowledge and your own knowledge. Answer completely within 512 tokens where possible. Answer the question in Japanese. - Question: {question} - Background knowledge: {background_knowledge} - Answer:</p>
      <p>Case 1, Question: ワルファリン（ワーファリン）を服用している人は避けるべき野菜は何ですか？
Which vegetables should people taking warfarin (Coumadin) avoid?</p>
      <p>Case 1, Retrieved Medical Knowledge:
1. ワルファリンは生理的に凝固因子濃度を低下させる効果がある。
Warfarin physiologically reduces the concentration of coagulation factors.
2. ワルファリンは生理的に凝固因子活性を低下させる効果がある。
Warfarin has the effect of lowering the activity of coagulation factors physiologically.
3. ワルファリンはクマリン系の抗凝固薬である。
Warfarin is a coumarin-based anticoagulant.
4. ワルファリンはWARFARIN SODIUM 3 mg経口錠剤の有効成分である。
Warfarin is the active ingredient in WARFARIN SODIUM 3 mg oral tablets.</p>
      <p>Case 1, Ground Truth: ワーファリンを服用している人は、ビタミンKを多く含む野菜の摂取を避けるか制限する必要があります。なぜなら、それが薬の血液凝固抑制効果（血液をサラサラにする効果）を妨げる可能性があるからです。避けるか適度に摂取すべき野菜には、ケール、ほうれん草、コラードグリーン（ケールに似た葉野菜）、フダンソウ（スイスチャード）、カラシナ（からし菜）、ビートの葉、かぶの葉、ブロッコリーなどの葉物野菜が含まれます。さらに、パセリやコリアンダー（パクチー）などのハーブもビタミンKを多く含むため、ワーファリン服用中は適度に摂取すべきです。
People taking warfarin should avoid or limit the intake of vegetables rich in vitamin K, as they may interfere with the blood-thinning effect of the medication. Vegetables that should be avoided or consumed in moderation include kale, spinach, collard greens (leafy greens similar to kale), Swiss chard, mustard greens, beet greens, turnip greens, and broccoli. Additionally, herbs such as parsley and coriander (cilantro), which are also high in vitamin K, should be consumed in moderation while taking warfarin.</p>
      <p>Case 1, Generated Answer: ワルファリンを服用している人が避けるべき野菜は、特にビタミンKを多く含むものです。ビタミンKは血液凝固に関与しており、ワルファリンの効果を減少させる可能性があります。具体的には、以下のような野菜が挙げられます。- ケール - ほうれん草 - ブロッコリー - キャベツ - パセリ
これらの野菜はビタミンKの含有量が高いため、ワルファリンの効果を妨げる可能性があるため、摂取を控えるか、医師と相談の上で適切な量を守ることが重要です。
People taking warfarin should avoid vegetables that are particularly high in vitamin K, as vitamin K is involved in blood clotting and may reduce the effectiveness of warfarin. Specifically, the following vegetables should be considered: - Kale - Spinach - Broccoli - Cabbage - Parsley
These vegetables contain high levels of vitamin K and may interfere with the effect of warfarin. It is important to either avoid them or consult with a doctor to ensure an appropriate amount is consumed.</p>
      <p>Case 2, Question: 抗生薬は通常、細菌感染または真菌感染のある患者に投与されます。ウイルスが原因であることが多い疾患については、ではどうでしょうか？
Antibiotics are typically administered to patients with bacterial or fungal infections. But what about diseases that are often caused by viruses?</p>
      <p>Case 2, Retrieved Medical Knowledge:
1. バイ菌感染は、ニトロフラン抗菌薬によって予防されることがあります。
Bacterial infections may be prevented by nitrofuran antibiotics.
2. バイ菌感染は、他のバイ菌性疾患に分類されます。
Bacterial infections are classified under other bacterial diseases.
3. バイ菌感染は、肝膿瘍に分類されます。
Bacterial infections are classified as liver abscesses.
4. バイ菌感染は、他のバイ菌感染に分類されます。
Bacterial infections are classified as other bacterial infections.
5. バイ菌感染は、ニトロフラン抗菌薬によって治療されることがあります。
Bacterial infections may be treated with nitrofuran antibiotics.</p>
      <p>Case 2, Ground Truth: 抗生物質は、感冒、インフルエンザ、様々な咳などのウイルス感染症に対しては効果がありません。ウイルス感染症の患者に対して、医師は症状管理のための一般用医薬品を含む治療を推奨することがあり、抗生物質を使用する場合は必要な場合にのみ使用することが常に推奨されます。場合によっては、HIVに対する抗ウイルス療法、インフルエンザ、COVID-19、サル痘(mpox)、またはインフルエンザに対するオセルタミビルリン酸塩（タミフル）など、特定のウイルスにのみ効果のある特定の薬剤が処方されることがあります。抗生物質は細菌感染症に特化して処方されるものであり、ウイルス感染症には効果がないことに注意することが重要です。これは、ウイルスが細胞ではなく、細菌細胞と比較して細胞壁を持たないためです。したがって、抗生物質はウイルス感染症に対して効果を示しません。
Antibiotics are not effective against viral infections such as the common cold, influenza, and various types of coughs. For patients with viral infections, doctors may recommend treatments that include over-the-counter medications for symptom management, and antibiotics should only be used when necessary. In some cases, specific antiviral medications may be prescribed for certain viruses, such as antiretroviral therapy for HIV, oseltamivir phosphate (Tamiflu) for influenza, or treatments for COVID-19 or monkeypox (mpox). It is important to note that antibiotics are prescribed specifically for bacterial infections and are not effective for viral infections. This is because viruses do not have a cell wall, unlike bacterial cells. Therefore, antibiotics are ineffective against viral infections.</p>
      <p>Case 2, Generated Answer: 抗生物質は主に細菌感染に対して効果がありますが、ウイルス感染に対しては効果がありません。ウイルスが原因の疾患、例えばインフルエンザや風邪などには、抗生物質は無効です。ウイルス感染に対しては、抗ウイルス薬が使用されることがあります。したがって、ウイルスが原因であることが多い疾患に対しては、抗生物質を投与することは適切ではありません。ウイルス感染に対しては、症状を緩和するための対症療法や、必要に応じて抗ウイルス薬が用いられます。
Antibiotics are primarily effective against bacterial infections but are not effective against viral infections. For diseases caused by viruses, such as influenza and the common cold, antibiotics are ineffective. Antiviral medications may be used for viral infections. Therefore, it is not appropriate to administer antibiotics for diseases that are often caused by viruses. For viral infections, symptomatic treatments are used to relieve symptoms, and antiviral medications may be used when necessary.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Carignan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <article-title>Capabilities of gpt-4 on medical challenge problems</article-title>
          ,
          <source>arXiv preprint arXiv:2303.13375</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Mkrag: Medical knowledge retrieval augmented generation for medical question answering</article-title>
          ,
          <source>arXiv preprint arXiv:2309.16035</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Benchmarking retrieval-augmented generation for medicine</article-title>
          , in: L.-W. Ku, A. Martins, V. Srikumar (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>6233</fpage>
          -
          <lpage>6251</lpage>
          . URL: https://aclanthology.org/2024.findings-acl.372/. doi:10.18653/v1/2024.findings-acl.372.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Marrese-Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. H.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Caverlee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          , et al.,
          <article-title>KG-Rank: Enhancing large language models for medical QA with knowledge graphs and ranking techniques</article-title>
          ,
          <source>arXiv preprint arXiv:2403.05881</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Edge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Trinh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Truitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metropolitansky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Ness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <article-title>From local to global: A graph RAG approach to query-focused summarization</article-title>
          ,
          <source>arXiv preprint arXiv:2404.16130</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>McKenna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
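          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Steedman</surname>
          </string-name>
          ,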
          <article-title>Sources of hallucination by large language models on inference tasks</article-title>
          ,
          <source>arXiv preprint arXiv:2305.14552</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ning</surname>
          </string-name>
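          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Keppo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,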
          <string-name>
            <given-names>C.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Bitterman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. L.</given-names>
            <surname>Ong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S. W.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for generative artificial intelligence in health care</article-title>
          ,
          <source>npj Health Systems</source>
          <volume>2</volume>
          (
          <year>2025</year>
          )
          <elocation-id>2</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Press</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>How language model hallucinations can snowball</article-title>
          ,
          <source>arXiv preprint arXiv:2305.13534</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Caverlee</surname>
          </string-name>
          , et al.,
          <article-title>GraphCheck: Breaking long-term text barriers with extracted knowledge graph-powered fact-checking</article-title>
          ,
          <source>arXiv preprint arXiv:2502.16514</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Lie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. X. Y.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Abdullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S. W.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>26</volume>
          (
          <year>2024</year>
          )
          <elocation-id>e59439</elocation-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>