<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Compliance: A Multi-Agent Schema-Light Knowledge Graph for Regulatory Compliance QA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hemant Sunil Jomraj</string-name>
          <email>hjomraj@mastercontrol.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bhavik Agarwal</string-name>
          <email>bagarwal@mastercontrol.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viktoria Rojkova</string-name>
          <email>vrojkova@mastercontrol.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MasterControl AI Research, MasterControl</institution>
          ,
          <addr-line>6350 South 3000 East, Salt Lake City, Utah</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Regulatory QA demands precise, verifiable answers grounded in domain text. We present a multi-agent framework that fuses a schema-light (ontology-minimal) knowledge graph of subject-predicate-object (SPO) triplets with retrieval-augmented generation (RAG). Agents continuously extract, normalize, and deduplicate triplets from regulatory documents; each triplet is embedded and stored, together with its linked source segments and metadata, in a unified vector index. At query time, triplet-level retrieval aligns user intent with concise “who-did-what-to-whom” facts and returns both the triplets and their provenance text to an LLM for answer synthesis. On complex regulatory queries, the system improves traceability and supports subgraph visualization, while achieving higher strict-threshold section overlap and better graph connectivity than text-only baselines.</p>
      </abstract>
      <kwd-group>
        <kwd>Regulatory Compliance</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>Ontology Free</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Regulated domains (e.g., health and life sciences) demand high precision, verifiability, and domain
grounding in QA [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. General-purpose LLMs, including recent model families [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ], are fluent
but prone to hallucination [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], which is a particular concern where compliance evidence and provenance are required
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We propose a practical system that combines (i) schema-light triplet extraction and KG maintenance
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], (ii) a unified vector store holding triplets and their source text, and (iii) a multi-agent QA pipeline that
retrieves at the triplet level and returns answers with verifiable evidence. Our contributions build on
knowledge graph methods [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12, 13, 14, 15</xref>
        ] and regulatory KG/RAG applications [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19 ref20 ref21">16, 17, 18, 19, 20, 21</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>2.1. Units, extraction, and provenance</title>
        <p>
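          The regulatory text <inline-formula><tex-math>C</tex-math></inline-formula> is segmented into atomic sections, <inline-formula><tex-math>\Omega : C \to X = \{x_1, \dots, x_n\}</tex-math></inline-formula>; an extraction pipeline then
produces SPO triplets <inline-formula><tex-math>\Phi(\Omega(C)) = \{t_j = (s_j, p_j, o_j)\}</tex-math></inline-formula>. Provenance is captured by <inline-formula><tex-math>\Lambda : T \to 2^{X}</tex-math></inline-formula>, where <inline-formula><tex-math>T</tex-math></inline-formula> denotes the set of extracted triplets, mapping
each <inline-formula><tex-math>t_j</tex-math></inline-formula> to one or more source sections for auditability. Open IE and related practices inform the extraction
side [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
          ], with open-world learning and schema emergence supported by previous work [24, 25].
Canonicalization and entity linking address vocabulary fragmentation [26, 27], while ontology-driven
precedents [28] and community KGs [29, 30] motivate minimal, reusable meta-relations.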
        </p>
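        <p>As a minimal sketch of the data model described here, triplets can be held as immutable records and the provenance map as a dictionary from each triplet to the set of section identifiers it was extracted from. The class name, the helper, the section identifiers, and the example facts are all hypothetical and are not taken from the system itself.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One subject-predicate-object fact t_j = (s_j, p_j, o_j)."""
    subject: str
    predicate: str
    obj: str


def add_triplet(provenance: dict, triplet: Triplet, section_id: str) -> None:
    """Record that `triplet` was extracted from `section_id` (the provenance map)."""
    provenance.setdefault(triplet, set()).add(section_id)


# Hypothetical section ids and facts, for illustration only.
provenance: dict = {}
t = Triplet("manufacturer", "must validate", "cleaning procedure")
add_triplet(provenance, t, "sec-A")
add_triplet(provenance, t, "sec-B")  # the same fact restated in another section
add_triplet(provenance, Triplet("quality unit", "approves", "batch record"), "sec-B")

print(sorted(provenance[t]))
```

        <p>Because the record is frozen and hashable, a fact restated in several sections collapses to a single key whose value lists every supporting section, which is the property an audit trail needs.</p>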
      </sec>
      <sec id="sec-1-2">
        <title>2.2. Embedding and unified index</title>
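        <p>Each triplet <inline-formula><tex-math>t_j</tex-math></inline-formula> is rendered as a text string <inline-formula><tex-math>\mathrm{text}(t_j)</tex-math></inline-formula> and embedded via a transformer encoder into <inline-formula><tex-math>e_j \in \mathbb{R}^{d}</tex-math></inline-formula>; we store
<inline-formula><tex-math>(t_j, e_j, \Lambda(t_j))</tex-math></inline-formula> in a vector index. Queries <inline-formula><tex-math>q</tex-math></inline-formula> are embedded as <inline-formula><tex-math>e_q</tex-math></inline-formula>. The dense retrieval choices are inspired by DPR and modern similarity
search [31].</p>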
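        <p>The indexing step can be illustrated with a small, dependency-free sketch. The hashed bag-of-words function below is only a stand-in for the transformer encoder, chosen so the example runs anywhere; the embedding width, the record layout, and all names are assumptions made for illustration.</p>

```python
import hashlib
import math

DIM = 64  # assumed embedding width; the paper does not state d


def render(triplet: tuple) -> str:
    """Turn (s, p, o) into the text string that gets embedded."""
    return " ".join(triplet)


def embed(text: str, dim: int = DIM) -> list:
    """Toy stand-in for a transformer encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def index_record(triplet: tuple, sections: set) -> dict:
    """One index entry: the triplet, its embedding, and its provenance."""
    return {
        "triplet": triplet,
        "embedding": embed(render(triplet)),
        "sections": sorted(sections),
    }


# Hypothetical fact and section ids.
record = index_record(
    ("manufacturer", "must validate", "cleaning procedure"), {"sec-A", "sec-B"}
)
query_vec = embed("cleaning procedure validation")
```

        <p>Storing the provenance alongside the vector means a single nearest-neighbour lookup returns both the fact and the pointers to its source text.</p>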
      </sec>
      <sec id="sec-1-3">
        <title>2.3. Triplet-first retrieval with text evidence</title>
        <p>
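          We compute <inline-formula><tex-math>T_q = \operatorname{TopK}_{t_j \in T}\, \mathrm{sim}(e_q, e_j)</tex-math></inline-formula> and recover evidence <inline-formula><tex-math>X_q = \bigcup_{t_j \in T_q} \Lambda(t_j)</tex-math></inline-formula>, then pass <inline-formula><tex-math>(q, T_q, X_q)</tex-math></inline-formula> to an
LLM to generate the answer <inline-formula><tex-math>a</tex-math></inline-formula>. This implements RAG [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] with structured facts to reduce hallucinations
[32], an approach that has shown utility in healthcare and pharmaceutical QA [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ].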
        </p>
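        <p>The retrieval step can be sketched as a cosine-similarity ranking followed by a union over provenance sets. The three-dimensional vectors, section identifiers, section texts, and prompt wording below are hypothetical placeholders; a deployed system would use encoder embeddings and an approximate nearest-neighbour index instead of a full sort.</p>

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def retrieve(query_vec: list, index: list, k: int):
    """Return the k most similar records and the union of their source sections."""
    ranked = sorted(index, key=lambda rec: cosine(query_vec, rec["embedding"]), reverse=True)
    top = ranked[:k]
    evidence = set()
    for rec in top:
        evidence.update(rec["sections"])
    return top, evidence


def build_prompt(question: str, top: list, section_text: dict) -> str:
    """Assemble question, retrieved facts, and provenance text for the generator."""
    facts = "\n".join("- " + " ".join(rec["triplet"]) for rec in top)
    cited = sorted({sid for rec in top for sid in rec["sections"]})
    passages = "\n".join(f"[{sid}] {section_text[sid]}" for sid in cited)
    return f"Question: {question}\nFacts:\n{facts}\nEvidence:\n{passages}"


# Hand-made vectors and section texts, for illustration only.
index = [
    {"triplet": ("manufacturer", "must validate", "cleaning procedure"),
     "embedding": [1.0, 0.0, 0.0], "sections": ["sec-A", "sec-B"]},
    {"triplet": ("quality unit", "approves", "batch record"),
     "embedding": [0.0, 1.0, 0.0], "sections": ["sec-B"]},
    {"triplet": ("firm", "retains", "training records"),
     "embedding": [0.0, 0.0, 1.0], "sections": ["sec-C"]},
]
section_text = {"sec-A": "Text of section A.", "sec-B": "Text of section B."}

top, evidence = retrieve([0.9, 0.4, 0.0], index, k=2)
prompt = build_prompt("Who validates cleaning?", top, section_text)
```

        <p>Only sections reachable through the retrieved triplets enter the prompt, so every passage the generator sees can be traced back to a specific fact.</p>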
      </sec>
      <sec id="sec-1-4">
        <title>2.4. Design notes</title>
        <p>We supplement the answers with an interactive subgraph of the retrieved triplets (Figure 1) to expose
how the evidence pieces connect. This improves user trust and supports auditability. Three properties are central: the completeness
and consistency of <inline-formula><tex-math>T</tex-math></inline-formula>, retrieval sufficiency, and auditable provenance. The schema-light choice
accelerates ingestion while relying on canonicalization to temper emergent vocabularies [26, 27].</p>
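        <p>One way to picture the subgraph view is as an undirected adjacency structure over the entities in the retrieved triplets. The sketch below builds that structure and computes an average degree and a breadth-first shortest path length; treating edges as undirected and the example facts themselves are assumptions for illustration.</p>

```python
from collections import deque


def build_adjacency(triplets: list) -> dict:
    """Undirected entity graph: each (s, p, o) links s and o."""
    adj = {}
    for s, _p, o in triplets:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def average_degree(adj: dict) -> float:
    """Mean number of neighbours per entity (0.0 for an empty graph)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj) if adj else 0.0


def shortest_path_length(adj: dict, source: str, target: str):
    """Breadth-first search; returns None when target is unreachable."""
    if source not in adj or target not in adj:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical retrieved triplets.
triplets = [
    ("manufacturer", "must validate", "cleaning procedure"),
    ("cleaning procedure", "documented in", "batch record"),
    ("quality unit", "approves", "batch record"),
]
adj = build_adjacency(triplets)
print(average_degree(adj))
print(shortest_path_length(adj, "manufacturer", "quality unit"))
```

        <p>The same adjacency structure can feed a visual renderer, so the graph a user inspects and the graph used for connectivity statistics stay identical.</p>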
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Multi-Agent Architecture</title>
      <p>
        We deploy specialized agents for ingestion, extraction, normalization/cleaning, indexing, retrieval,
story building, and generation (Figure 2). This follows established multi-agent design principles for modularity
and scalability [33, 34, 35, 36] and is in line with recent regulatory KG/RAG systems [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16, 17, 18, 19</xref>
        ].
      </p>
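      <p>The agent decomposition can be sketched as a chain of functions that each read and extend a shared state. Only three of the stages are shown, and each is a deliberately naive stand-in: the pattern-based extractor, the normalization rule, and the state keys are hypothetical and do not describe the deployed agents.</p>

```python
from typing import Callable

Agent = Callable[[dict], dict]


def ingest(state: dict) -> dict:
    """Split the raw document into sections on blank lines."""
    state["sections"] = [s.strip() for s in state["document"].split("\n\n") if s.strip()]
    return state


def extract(state: dict) -> dict:
    """Toy extractor: 'X must Y.' becomes (X, 'must', Y) tagged with its section index."""
    found = []
    for idx, text in enumerate(state["sections"]):
        if " must " in text:
            subject, rest = text.split(" must ", 1)
            found.append((subject, "must", rest.rstrip("."), idx))
    state["raw_triplets"] = found
    return state


def normalize(state: dict) -> dict:
    """Lower-case, drop a leading article, and merge duplicates while keeping provenance."""
    facts = {}
    for subject, predicate, obj, idx in state["raw_triplets"]:
        subject = subject.lower()
        if subject.startswith("the "):
            subject = subject[4:]
        facts.setdefault((subject, predicate, obj.lower()), set()).add(idx)
    state["facts"] = facts
    return state


def run_pipeline(state: dict, agents: list) -> dict:
    """Run agents in order and keep a trace of which stages executed."""
    for agent in agents:
        state = agent(state)
        state.setdefault("trace", []).append(agent.__name__)
    return state


doc = (
    "The manufacturer must validate cleaning procedures.\n\n"
    "The Manufacturer must validate cleaning procedures.\n\n"
    "Records are retained for five years."
)
result = run_pipeline({"document": doc}, [ingest, extract, normalize])
print(result["facts"])
```

      <p>Keeping each stage behind the same call signature is what allows an individual agent to be replaced or rerun without touching the others.</p>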
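      <p>[Table: results without triplets vs. with triplets; rows: Section Overlap @ 0.50, Section Overlap @ 0.60, Section Overlap @ 0.75, Answer Accuracy (avg), Avg. Degree, Avg. Shortest Path.]</p>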
    </sec>
    <sec id="sec-3">
      <title>4. Evaluation</title>
      <sec id="sec-3-1">
        <title>4.1. Protocol</title>
        <p>
          We sample target sections <inline-formula><tex-math>X' \subset X</tex-math></inline-formula>, build a ground-truth story per section by concatenating related
mentions, generate Q/A pairs with an LLM, and compare our system’s retrieval and answers against these
references. This mirrors open-domain QA/RAG setups [
          <xref ref-type="bibr" rid="ref11">11, 31</xref>
          ] while focusing on regulatory corpora
[
          <xref ref-type="bibr" rid="ref16 ref19">16, 19</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Metrics</title>
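        <p>Section-level overlap. For a target section <inline-formula><tex-math>x_i</tex-math></inline-formula>, let <inline-formula><tex-math>G_i = \{x_i\} \cup \mathcal{N}(x_i)</tex-math></inline-formula> be the reference set (the section together with its related sections) and <inline-formula><tex-math>R_{i,k}</tex-math></inline-formula> the retrieved set. We report
<disp-formula><tex-math>\mathrm{Overlap}(R_{i,k}, G_i) = \frac{|R_{i,k} \cap G_i|}{|R_{i,k}|},</tex-math></disp-formula>
optionally with a similarity threshold <inline-formula><tex-math>\tau</tex-math></inline-formula> for near-matches.</p>
        <p>
          Factual correctness. A secondary judge (LLM or expert) marks whether the generated answer <inline-formula><tex-math>a^{\star}</tex-math></inline-formula> is consistent with the ground-truth
story; structured facts are expected to reduce hallucination [
          <xref ref-type="bibr" rid="ref7">7, 32</xref>
          ].
        </p>
        <p>Navigation. For sections <inline-formula><tex-math>x_i</tex-math></inline-formula> and <inline-formula><tex-math>x_\ell \in \mathcal{N}(x_i)</tex-math></inline-formula>, let <inline-formula><tex-math>T(x)</tex-math></inline-formula> be the triplets extracted from section <inline-formula><tex-math>x</tex-math></inline-formula>. We compute
<disp-formula><tex-math>\mathrm{Nav}(X') = \frac{1}{|X'|} \sum_{i=1}^{|X'|} \frac{\sum_{x_\ell \in \mathcal{N}(x_i)} |T(x_i) \cap T(x_\ell)|}{\sum_{x_\ell \in \mathcal{N}(x_i)} |T(x_i) \cup T(x_\ell)|}</tex-math></disp-formula>
and graph connectivity (average degree and average shortest path length).</p>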
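        <p>A minimal sketch of the two set-based metrics follows. It assumes that section overlap is normalised by the size of the retrieved set and that the navigation score is a per-section ratio of summed intersections to summed unions, averaged over the sampled sections; both readings, the exact-match comparison (no similarity threshold), and all identifiers are assumptions for illustration.</p>

```python
def section_overlap(retrieved: set, reference: set) -> float:
    """Share of retrieved sections that also appear in the reference set."""
    if not retrieved:
        return 0.0
    return len(retrieved.intersection(reference)) / len(retrieved)


def navigation_score(targets: list, neighbours: dict, triplets_of: dict) -> float:
    """Average, over target sections, of triplet overlap with their related sections."""
    scores = []
    for x in targets:
        inter = 0
        union = 0
        for y in neighbours.get(x, []):
            inter += len(triplets_of[x].intersection(triplets_of[y]))
            union += len(triplets_of[x].union(triplets_of[y]))
        scores.append(inter / union if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sections, related-section lists, and triplet ids.
triplets_of = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5}}
neighbours = {"A": ["B", "C"], "B": ["A"]}

print(section_overlap({"A", "B", "D"}, {"A", "B", "C"}))
print(navigation_score(["A", "B"], neighbours, triplets_of))
```

        <p>In this toy case, section A shares two triplets with B and none with C, giving 2/8, while B shares two of four with A, giving 2/4; the navigation score is their mean.</p>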
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Discussion and Limitations</title>
      <p>Schema-light design. Fast ingestion and adaptability come at the cost of vocabulary fragmentation;
selective schema emergence combined with canonicalization mitigates this [26, 27].</p>
      <p>Extraction quality. Regulatory jargon and cross-references may produce missing or noisy triplets;
iterative curation and weak supervision help. Temporal and conditional logic may need rules beyond SPO.</p>
      <p>
        Efficiency. Large, changing corpora benefit from incremental updates and efficient vector/graph
indexing. The approach complements domain-specific RAG work [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ] and regulatory
deployments of KG/RAG [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16, 17, 18, 19</xref>
        ].
      </p>
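      <p>One simple way to realise incremental updates is to fingerprint each section and re-run extraction only where the fingerprint changed. The sketch below uses SHA-256 content hashes for this; the function names, section identifiers, and texts are hypothetical, and the text does not state which change-detection mechanism is actually used.</p>

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash of one section."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_sections(old_hashes: dict, new_sections: dict):
    """Compare stored hashes with the current corpus.

    Returns the refreshed hash table plus sorted lists of added, changed,
    and removed section ids.
    """
    new_hashes = {sid: fingerprint(text) for sid, text in new_sections.items()}
    added = sorted(set(new_hashes) - set(old_hashes))
    removed = sorted(set(old_hashes) - set(new_hashes))
    changed = sorted(
        sid
        for sid in set(new_hashes).intersection(old_hashes)
        if new_hashes[sid] != old_hashes[sid]
    )
    return new_hashes, added, changed, removed


# Hypothetical corpus snapshot before and after a regulation update.
old_hashes = {
    "sec-A": fingerprint("Validate cleaning."),
    "sec-B": fingerprint("Retain records."),
}
new_sections = {
    "sec-A": "Validate cleaning.",
    "sec-B": "Retain records for five years.",
    "sec-C": "Train staff.",
}
new_hashes, added, changed, removed = diff_sections(old_hashes, new_sections)
print(added, changed, removed)
```

      <p>Only the added and changed sections need to pass through extraction and embedding again, while triplets whose provenance points solely at removed sections become candidates for deletion.</p>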
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>
        A schema-light KG with triplet-first retrieval and textual evidence has been deployed at
scale. It supports compliance professionals with precise and auditable QA in regulatory
domains, addressing known LLM risks [
        <xref ref-type="bibr" rid="ref10 ref7 ref8 ref9">7, 8, 9, 10</xref>
        ], and its user base is expanding across regulatory
teams
(https://www.prnewswire.com/news-releases/mastercontrol-launches-ai-powered-regulatorychat-to-simplify-compliance-navigation-for-life-sciences-manufacturers-302533086.html). Future work
includes deeper temporal/conditional reasoning and tighter human-in-the-loop curation.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Overleaf AI Assist, a paid service offering context-aware edits trained for academic
writing, to improve grammar, spelling, word choice, and sentence structure. After using this tool, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
      <ref-list>
        <ref id="ref24">
          <mixed-citation>[24] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., T. M. Mitchell, Toward an architecture for never-ending language learning, in: Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), 2010, pp. 1306–1313.</mixed-citation>
        </ref>
        <ref id="ref25">
          <mixed-citation>[25] S. Riedel, L. Yao, A. McCallum, Relation extraction with matrix factorization and universal schemas, in: Proceedings of NAACL-HLT, 2013, pp. 74–84.</mixed-citation>
        </ref>
        <ref id="ref26">
          <mixed-citation>[26] L. Galárraga, C. Teflioudi, K. Hose, F. M. Suchanek, Canonicalizing open knowledge bases, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM), 2014, pp. 1679–1688.</mixed-citation>
        </ref>
        <ref id="ref27">
          <mixed-citation>[27] W. Shen, J. Wang, P. Luo, M. Wang, A survey on entity linking: Methods, techniques, and applications, IEEE Transactions on Knowledge and Data Engineering 27 (2014) 443–460.</mixed-citation>
        </ref>
        <ref id="ref28">
          <mixed-citation>[28] F. Probst, S. Eck, W. Kuhn, Scalable semantics: A case study on ontology-driven geographic information integration, International Journal of Geographical Information Science 20 (2006) 563–583.</mixed-citation>
        </ref>
        <ref id="ref29">
          <mixed-citation>[29] J. Lehmann, et al., DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web (2015).</mixed-citation>
        </ref>
        <ref id="ref30">
          <mixed-citation>[30] F. M. Suchanek, G. Kasneci, G. Weikum, YAGO: A core of semantic knowledge unifying Wikipedia and WordNet, in: Proceedings of the 16th International Conference on World Wide Web (WWW), 2007, pp. 697–706.</mixed-citation>
        </ref>
        <ref id="ref31">
          <mixed-citation>[31] V. Karpukhin, et al., Dense passage retrieval for open-domain question answering, 2020. ArXiv preprint.</mixed-citation>
        </ref>
        <ref id="ref32">
          <mixed-citation>[32] J. Li, et al., Enhancing LLM factual accuracy with RAG to counter hallucinations: A case study on domain-specific queries in private knowledge-bases, 2024. ArXiv preprint.</mixed-citation>
        </ref>
        <ref id="ref33">
          <mixed-citation>[33] Y. Shoham, K. Leyton-Brown, Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, Cambridge University Press, 2008. URL: https://www.eecs.harvard.edu/cs286r/courses/fall08/files/SLB.pdf.</mixed-citation>
        </ref>
        <ref id="ref34">
          <mixed-citation>[34] M. Wooldridge, An Introduction to MultiAgent Systems, 2 ed., Wiley, 2009.</mixed-citation>
        </ref>
        <ref id="ref35">
          <mixed-citation>[35] G. Weiss, Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, MIT Press, 2000. URL: https://ieeexplore.ieee.org/book/6267355.</mixed-citation>
        </ref>
        <ref id="ref36">
          <mixed-citation>[36] A. Zygmunt, et al., Agent-based environment for knowledge integration, 2013. ArXiv preprint.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>U.S. Food and Drug Administration</collab>
          , FDA guidance documents,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Cordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Dudley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Washington</surname>
          </string-name>
          , Regulatory Compliance Burden,
          <source>Technical Report, GW Regulatory Studies Center</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ceross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>More than red tape: Exploring complexity in medical device regulatory affairs</article-title>
          ,
          <source>BMJ Innovations</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhong</surname>
          </string-name>
          , et al.,
          <source>Evaluation of OpenAI o1: Opportunities and challenges of AGI</source>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          , et al.,
          <source>Qwen2.5 technical report</source>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abdin</surname>
          </string-name>
          , et al.,
          <source>Phi-4 technical report</source>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          , et al.,
          <source>Survey of hallucination in natural language generation</source>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ling</surname>
          </string-name>
          , et al.,
          <article-title>Domain specialization as the key to make large language models disruptive: A comprehensive survey</article-title>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Large language models in medical and healthcare fields: Applications, advances, and challenges</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Hakim</surname>
          </string-name>
          , et al.,
          <article-title>The need for guardrails with large language models in medical safety-critical settings: An artificial intelligence application in the pharmacovigilance ecosystem</article-title>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <year>2021</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <year>2021</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , et al.,
          <source>Knowledge Graphs</source>
          , volume
          <volume>12</volume>
          of Synthesis Lectures on Data, Semantics, and Knowledge, Morgan &amp; Claypool,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          , et al.,
          <article-title>A review of relational machine learning for knowledge graphs</article-title>
          ,
          <year>2015</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>A review: Knowledge reasoning over knowledge graph</article-title>
          ,
          <source>Expert Systems with Applications</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ershov</surname>
          </string-name>
          ,
          <article-title>A case study for compliance as code with graphs and language models</article-title>
          ,
          <year>2023</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chattoraj</surname>
          </string-name>
          , et al., Semantically Rich Approach to Automating Regulations of Medical Devices,
          <source>Technical Report</source>
          , University of Maryland, Baltimore County,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          , et al.,
          <article-title>Integrating knowledge graph and large language model for safety management regulatory texts</article-title>
          ,
          <source>in: Lecture Notes in Computer Science</source>
          , volume
          <volume>14250</volume>
          ,
          <year>2025</year>
          , pp.
          <fpage>976</fpage>
          -
          <lpage>988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hillebrand</surname>
          </string-name>
          , et al.,
          <article-title>Advancing risk and quality assurance: A RAG chatbot for improved regulatory compliance</article-title>
          ,
          <year>2024</year>
          . Available on IEEE Xplore.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <article-title>From RAG to QA-RAG: Integrating generative AI for pharmaceutical regulatory compliance process</article-title>
          ,
          <year>2024</year>
          . ArXiv preprint.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yang</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for generative artificial intelligence in health care</article-title>
          ,
          <source>npj Digital Medicine</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          , Mausam,
          <article-title>Open information extraction: The second generation</article-title>
          ,
          <source>in: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI)</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <article-title>Open information extraction for the web</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>