<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Agentic Knowledge Computing for Automated Biomarker Validation: Triangulated Causal Graph Construction in ALS Research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Krishna Nidamarthi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Zhu</string-name>
          <email>kevin@algoverse.us</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Algoverse AI Research</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Emerald High School</institution>
          ,
          <addr-line>Dublin, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Amyotrophic Lateral Sclerosis (ALS) generates a vast literature containing critical relationships between biomarkers, pathogenic mechanisms, and therapeutic targets. Extracting and validating these relationships at scale remains challenging because of the complexity of biomedical language and the domain expertise it requires. We present a novel NLP framework that combines foundation models with domain-specific embeddings to automatically extract, validate, and organize ALS knowledge from scientific literature. Our approach introduces the Triangulated Causal Validation Score (TCVS), a three-tier scoring mechanism that fuses outputs from Mistral-7B, BioLinkBERT-large, and PubMedBERT-MNLI against four curated gold-standard ALS term lists. The framework processes documents through GROBID-based extraction and validates 4,689 unique terms and 3,840 causal relationships, achieving 94.62% precision and 95.65% recall against expert-labeled data. We construct a Causal Knowledge Graph (CKG) with weighted edges and apply Louvain community detection to identify 15 major functional groups, revealing novel connections between biomarkers and ALS disease progression pathways. Counterfactual analysis demonstrates the framework's ability to predict downstream effects of biomarker or genetic perturbations. We further propose agentic extensions enabling collaborative multi-agent systems for specialized knowledge curation and graph-based retrieval augmented generation. This work contributes: (1) TCVS, a generalizable validation methodology; (2) hybrid node matching and similarity computation; (3) a demonstration of the advantages of multi-model fusion; and (4) a reproducible pipeline with agentic extensibility for domain-specific knowledge graph construction, reducing manual curation effort by 40% while maintaining expert-level accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Causal Knowledge Graphs (CKG)</kwd>
        <kwd>Triangulated Causal Validation Score (TCVS)</kwd>
        <kwd>Multi-Model Node Matching</kwd>
        <kwd>Louvain Community Clusters</kwd>
        <kwd>Knowledge Curator Agents</kwd>
        <kwd>Counterfactual Analysis</kwd>
        <kwd>Amyotrophic Lateral Sclerosis (ALS)</kwd>
        <kwd>Computational Biology</kwd>
        <kwd>Agentic AI Systems</kwd>
        <kwd>Automatic Knowledge Curation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <sec id="sec-1-1">
        <title>1.1. Motivation</title>
        <p>Amyotrophic Lateral Sclerosis (ALS) is a devastating neurodegenerative disease affecting approximately
5,000 new patients annually in the United States, with a median survival of 3-5 years from symptom onset
[1]. Despite decades of research, few FDA-approved treatments exist, and the two most established (Riluzole and Edaravone)
offer only modest disease-modifying effects [2]. The complexity of ALS pathophysiology, involving motor
neuron degeneration, protein aggregation, neuroinflammation, and mitochondrial dysfunction, has
hindered therapeutic development [3, 4].</p>
        <p>Recent advances in cerebrospinal fluid (CSF) biomarker research have identified promising diagnostic
and prognostic indicators, including neurofilament light chain (NfL), phosphorylated neurofilament
heavy chain (pNfH), and inflammatory markers such as chitotriosidase-1 (CHIT1) [5, 6]. However, the
rapidly expanding literature creates a critical bottleneck: researchers cannot manually synthesize the
thousands of published relationships between biomarkers, genetic factors, and disease mechanisms at
the pace of discovery.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. The Challenge</title>
        <p>Traditional systematic reviews and meta-analyses, while rigorous, are time-intensive and quickly
become outdated. Automated text mining approaches face three fundamental challenges in the ALS
domain:</p>
        <p>1. Validation Accuracy: Generic NLP models lack the domain-specific knowledge needed to distinguish valid
biomedical relationships from methodological descriptions or spurious correlations. For example,
distinguishing "CSF NfL levels correlate with disease progression" (a valid biomarker relationship)
from "we measured CSF samples using ELISA" (a methodological statement) requires specialized
understanding.</p>
        <p>2. Semantic Ambiguity: Biomedical terminology exhibits high polysemy and synonymy. The
term "SOD1" may refer to the gene, the protein, or a mutation, while "motor neuron death" and
"motor neuron degeneration" are semantically equivalent concepts that require normalization.</p>
        <p>3. Relationship Complexity: ALS literature contains multiple relationship types, including causal
mechanisms (e.g., "TDP-43 aggregation causes motor neuron toxicity"), correlational observations (e.g.,
"NfL levels associate with survival time"), and temporal progressions (e.g., "bulbar onset precedes
respiratory failure"). Each type requires different validation criteria.</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Our Approach</title>
        <p>We address these challenges through a novel multi-model fusion framework that combines: (1)
Foundation Model Expertise via Mistral-7B for broad scientific reasoning; (2) Domain-Specific Embeddings
through BioLinkBERT-large, capturing biomedical semantic relationships; (3) Entailment Validation
using PubMedBERT-MNLI for logical consistency assessment; and (4) Gold-Standard Grounding via
four curated term lists derived from NIH MeSH and expert curation. Figure A.1 presents the complete
pipeline, from document ingestion through multi-model fusion and knowledge graph construction to
counterfactual analysis, and illustrates how these components work together to enable robust biomarker
validation and knowledge discovery. The framework operates in five stages: document ingestion
via GROBID, relationship and term extraction using Mistral-7B, three-tier validation producing TCVS
scores, Causal Knowledge Graph construction with hybrid node matching, and community detection
with counterfactual analysis. The architecture extends naturally to collaborative multi-agent systems
in which specialized agents curate domain-specific subgraphs.</p>
      </sec>
      <sec id="sec-1-4">
        <title>1.4. Contributions</title>
        <p>This work makes four primary contributions.</p>
        <sec id="sec-1-4-1">
          <title>Methodological Innovation</title>
          <p>1. Triangulated Causal Validation Score (TCVS): A novel scoring mechanism that adaptively
weights three complementary validation signals based on relationship type, achieving 94.62%
precision versus 71.2% for single-model baselines.</p>
          <p>2. Hybrid Node Matching Algorithm: GPU-accelerated similarity computation combining lexical
overlap (40%), Mistral embeddings (35%), and BioLinkBERT embeddings (25%) for robust entity
linking, reducing false positive edges by 64% compared to string-matching approaches.</p>
        </sec>
        <sec id="sec-1-4-2">
          <title>Empirical Findings and Reproducibility</title>
          <p>3. Multi-Model Superiority: Systematic ablation studies demonstrated that three-tier fusion
outperformed any single model across all relationship categories, with particularly strong gains
for biomarker relationships (ΔF1 = +12.3%).</p>
          <p>4. Reproducible Pipeline with Agentic Extensibility: Open methodology for domain-specific
knowledge graph construction, validated on 15 ALS research papers containing 4,689 terms and
3,840 relationships, with clear pathways for multi-agent collaborative extensions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Biomedical Relationship Extraction</title>
        <p>Automated extraction of biomedical relationships evolved from rule-based systems [7] to neural
approaches [8, 9]. Pawar et al. [10] present a knowledge-based approach for extracting Cause-Effect
(CE) relations from biomedical text, using unsupervised machine learning to discover
causal triggers and high-precision linguistic rules to identify cause/effect arguments. While their work
demonstrated effectiveness on leukaemia literature, it lacked domain-specific validation mechanisms
and relied solely on linguistic patterns. In contrast, our TCVS framework incorporates multiple
complementary validation signals through foundation models, domain-specific embeddings, and entailment
validation to achieve robust biomarker relationship extraction specifically tailored to ALS research.</p>
        <p>Recent transformer-based models like BioBERT [11], PubMedBERT [12], and BioLinkBERT [13]
leveraged domain-specific pretraining on PubMed abstracts and PMC full-text articles, achieving
state-of-the-art performance on benchmark tasks. However, these models primarily addressed binary
classification tasks rather than open-ended relationship extraction with validation. Our work extends
this by introducing multi-model fusion specifically for causal relationship validation.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge Graph Construction in Biomedicine</title>
        <p>Biomedical knowledge graphs have been constructed for various domains: UMLS [14] integrated
multiple terminologies, DisGeNET [15] focused on gene-disease associations, and Hetionet [16] created
heterogeneous networks spanning genes, compounds, diseases, and pathways. These resources relied
primarily on structured databases and manual curation. However, these approaches lacked validation
mechanisms beyond simple filtering, resulting in high false positive rates (for example, SemMedDB [17] was
reported to contain equal proportions of true and false positives). Our framework addresses this gap by introducing TCVS
for relationship validation before CKG construction.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. ALS Computational Research</title>
        <p>Computational approaches in ALS research focused on three areas: biomarker discovery using machine
learning models [18, 19], network-based genetic analysis [20, 21], and causal feature dependency
modeling [22]. These studies relied on manually curated databases rather than literature mining. From
our extensive literature search, we found no prior work that constructed a validated causal knowledge
graph specifically for ALS biomarkers.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Multi-Model Fusion and AI Agents</title>
        <p>Ensemble methods combining multiple models showed consistent improvements across NLP tasks
[23, 24]. In biomedical NLP, Peng et al. [9] combined BioBERT variants for NER, achieving +2.1%
F1 over single models. Recent work on AI agents [25] demonstrates the potential for collaborative
multi-agent systems in complex reasoning tasks. Graph-based retrieval augmented generation (Graph
RAG) [26] has shown promise in enhancing LLM reasoning over structured knowledge. Our TCVS
approach differs by fusing models with complementary strengths using adaptive weighting, and our
framework is positioned for agentic extensions through its modular architecture. The closest
related work was BERN2 [27], which combined multiple biomedical NER models but lacked relationship
validation and KG construction capabilities.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Overview and Data Preparation</title>
        <p>Our framework processed ALS research papers through five stages: (1) document ingestion and
normalization, (2) entity and relationship extraction, (3) three-tier validation with TCVS scoring, (4) Causal
Knowledge Graph (CKG) construction, and (5) community detection and counterfactual analysis. For
the framework development and testing presented in this paper, we selected 15 papers from PubMed
using the keywords "amyotrophic lateral sclerosis, biomarkers, and CSF proteomics".</p>
        <p>We employed GROBID v0.7.2 [28] to extract text chunks with section labels, figures with captions
and context, and tables as structured data with surrounding context. Each extracted element received
a unique identifier for provenance tracking. We preserved document structure to maintain semantic
coherence during relationship extraction.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Gold-Standard Term Lists</title>
        <p>We created four gold-standard term lists (pathogenic, biomarker, therapeutic, and general ALS terms) from NIH MeSH using their respective root URIs and tree
patterns, supplemented with expert review.</p>
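        <p>Each term list was embedded using BioLinkBERT-large (1024-dim) and PubMedBERT-base (768-dim),
creating reference embedding matrices</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>G_c^{(\mathrm{bio})} \in \mathbb{R}^{n_c \times 1024}, \qquad G_c^{(\mathrm{pub})} \in \mathbb{R}^{n_c \times 768}</tex-math>
        </disp-formula>
        <p>where <italic>c</italic> ∈ {pathogenic, biomarker, therapeutic, general} and <italic>n<sub>c</sub></italic> is the number of terms in list <italic>c</italic>. This created distinct embedding spaces for
context-segregated clustering and similarity measures.</p>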
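        <p>The per-category reference matrices can be pictured as stacked, row-normalized term embeddings. The Python sketch below is illustrative only: <monospace>build_gold_matrix</monospace>, <monospace>build_all</monospace>, and the toy <monospace>toy_embed</monospace> callable are hypothetical stand-ins for a real BioLinkBERT or PubMedBERT encoder, and normalizing each row is an assumption made so that a dot product with a normalized query equals cosine similarity.</p>

```python
import numpy as np

CATEGORIES = ("pathogenic", "biomarker", "therapeutic", "general")


def build_gold_matrix(terms, embed):
    """Stack L2-normalised embeddings of `terms` into an (n_c x d) matrix.

    `embed` is a hypothetical callable mapping a term string to a 1-D vector,
    e.g. a pooled BioLinkBERT-large output (d = 1024) in a real pipeline.
    """
    rows = np.stack([np.asarray(embed(t), dtype=float) for t in terms])
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    # Guard against all-zero vectors before dividing.
    return rows / np.clip(norms, 1e-12, None)


def build_all(gold_lists, embed):
    """Return one reference matrix G_c per category c."""
    return {c: build_gold_matrix(gold_lists[c], embed) for c in CATEGORIES}


def toy_embed(term):
    # Hypothetical 3-dim features standing in for a transformer encoder.
    return [len(term), term.lower().count("a"), 1.0]


gold = {c: ["NfL", "neurofilament light chain"] for c in CATEGORIES}
G = build_all(gold, toy_embed)
print(G["biomarker"].shape)  # (2, 3)
```

        <p>In a real pipeline the same term list would be embedded twice, once per encoder, giving the 1024-dim and 768-dim matrices described above.</p>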
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Triangulated Causal Validation Score (TCVS)</title>
        <p>We observed that single-model validation suffered from complementary weaknesses: generic LLMs
lacked domain specificity, domain-specific embeddings lacked reasoning capabilities, and entailment
models required carefully constructed premises. By fusing three complementary signals with adaptive
weighting, TCVS achieved robust validation across diverse relationship types.</p>
        <p>We computed three scores per extracted term and relationship: (i) domain similarity (<italic>S</italic><sub>domain</sub>) using
BioLinkBERT centroid/gold-list alignment; (ii) textual entailment (<italic>S</italic><sub>entail</sub>) using PubMedBERT with the
contextual paragraph as premise; and (iii) a semantic routing/interpretive score (<italic>S</italic><sub>expert</sub>) from the instruction-tuned
LLM (Mistral).</p>
        <p><bold>3.3.1. Tier 1: Generic LLM Expert Validation</bold></p>
        <p>We used Mistral-7B’s broad scientific reasoning to categorize relationships and assess domain relevance.
This categorization helped select the appropriate gold lists for domain-specific scoring. We employed a
two-stage prompt structure (see Appendix B for complete prompts):</p>
        <p>Stage 1 - Relevance Check: We asked the model to classify whether a statement was about ALS
disease biology, biomarkers, or therapeutics versus methodological/administrative content, requesting
JSON-formatted responses to reduce parsing errors.</p>
        <p>Stage 2 - Detailed Assessment: We provided an expert validation rubric with six confidence levels
ranging from 0.0–0.24 (weak/unclear relationship) to 0.85–1.0 (well-established mechanism).</p>
        <p>The output provided an expert confidence score <italic>S</italic><sub>expert</sub> ∈ [0, 1] and a relationship category. Similar
prompts were used to categorize and validate extracted terms.</p>
        <p><bold>3.3.2. Tier 2: Domain-Specific Embedding Similarity</bold></p>
        <p>We assessed semantic similarity between extracted relationships and validated ALS terminology using
categorized gold list embeddings with multi-scale similarity computation.</p>
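        <p>Given a relationship statement <italic>r</italic> (or term) with embedding <bold>e</bold><sub><italic>r</italic></sub>, we computed three similarity metrics
against the <italic>n<sub>c</sub></italic> rows <bold>g</bold><sub><italic>j</italic></sub> of its categorized gold-list matrix <italic>G</italic><sub><italic>c</italic></sub><sup>(bio)</sup>:</p>
        <p>1. Maximum Similarity (Exact Concept Match), to find exact matches such as “neurofilament light chain” ↔ “NfL”:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>s_{\max} = \max_{j=1,\dots,n_c} \cos\left(\mathbf{e}_r, \mathbf{g}_j\right)</tex-math>
        </disp-formula>
        <p>2. Cluster Similarity (Semantic Neighborhood):</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>s_{\mathrm{cluster}} = \sum_{k=1}^{10} w_k \cdot s_{(k)}</tex-math>
        </disp-formula>
        <p>where <italic>s</italic><sub>(<italic>k</italic>)</sub> denotes the <italic>k</italic>-th highest similarity score and <italic>w<sub>k</sub></italic> are exponential decay weights; this finds cluster
matches such as “motor neuron degeneration” ↔ “motor neuron death”.</p>
        <p>3. Context Similarity (Distributional Match):</p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>s_{\mathrm{context}} = P_{0.75}\left(\left\{\cos\left(\mathbf{e}_r, \mathbf{g}_j\right)\right\}_{j=1}^{n_c}\right)</tex-math>
        </disp-formula>
        <p>where <italic>P</italic><sub>0.75</sub> denotes the 75th percentile; this finds contextual matches such as “ALS progression” ↔
“disease advancement”.</p>
        <p>The final domain similarity score was:</p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>S_{\mathrm{domain}} = 0.45 \cdot s_{\max} + 0.35 \cdot s_{\mathrm{cluster}} + 0.2 \cdot s_{\mathrm{context}}</tex-math>
        </disp-formula>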
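        <p>A minimal NumPy sketch of the three Tier 2 metrics and their 0.45/0.35/0.2 combination follows. It assumes a gold-list matrix with L2-normalized rows. The decay rate of the weights <italic>w<sub>k</sub></italic> is not reported, so the value 0.5 and the normalization of the weights to sum to one are hypothetical choices, and <monospace>np.percentile</monospace> with its default linear interpolation stands in for the 75th-percentile operator.</p>

```python
import numpy as np


def domain_similarity(e_r, G, k=10, decay=0.5):
    """Multi-scale Tier 2 score for one statement (or term) embedding.

    e_r   : 1-D embedding of the statement; normalised here.
    G     : (n_c x d) gold-list matrix with L2-normalised rows.
    decay : hypothetical rate for the exponential weights w_k.
    """
    e = np.asarray(e_r, dtype=float)
    sims = G @ (e / np.linalg.norm(e))          # cos(e_r, g_j) for all j
    s_max = float(sims.max())                   # exact concept match
    top = np.sort(sims)[::-1][:k]               # s_(1), s_(2), ... descending
    w = np.exp(-decay * np.arange(len(top)))
    w = w / w.sum()                             # assumed: weights sum to 1
    s_cluster = float(w @ top)                  # semantic neighbourhood
    s_context = float(np.percentile(sims, 75))  # distributional match
    s_domain = 0.45 * s_max + 0.35 * s_cluster + 0.2 * s_context
    return {"max": s_max, "cluster": s_cluster,
            "context": s_context, "domain": s_domain}


# Toy gold list: three orthonormal "concepts"; the query matches the first.
G = np.eye(3)
scores = domain_similarity([2.0, 0.0, 0.0], G)
print(round(scores["domain"], 4))  # 0.7273
```

        <p>The example shows why the blend is useful: a perfect single match (<italic>s</italic><sub>max</sub> = 1) is tempered when the rest of the gold list is dissimilar, which lowers the cluster and context terms.</p>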
        <p>This multi-scale matching reduced false negatives by 28% compared to using maximum similarity
alone.</p>
        <p><bold>3.3.3. Tier 3: Entailment-Based Validation</bold></p>
        <p>Entailment-based validation is a natural language inference approach that evaluates the logical
consistency between extracted biomedical relationships and established domain knowledge. We used
PubMedBERT-MNLI, a domain-specific BERT model pretrained on biomedical literature and fine-tuned on natural
language inference, to assess whether extracted statements logically follow from known ALS
research premises.</p>
        <p>The validation operates through a dual-premise framework in which each extracted relationship serves
as a hypothesis tested against two complementary premises, for example (1) a specific CSF biomarker
premise, "Cerebrospinal fluid (CSF) biomarkers for amyotrophic lateral sclerosis (ALS) diagnosis and
monitoring," and (2) a general pathogenesis premise, "Amyotrophic lateral sclerosis (ALS) pathogenesis,
genetics, and neurodegeneration mechanisms." For each premise-hypothesis pair, the system extracted the
CLS token embedding from the final hidden layer of PubMedBERT-MNLI:</p>
        <disp-formula>
          <label>(6)</label>
          <tex-math>\mathbf{h}_{\mathrm{CLS}} = \mathrm{PubMedBERT}\left(\text{premise}, \text{hypothesis}\right)</tex-math>
        </disp-formula>
        <p>It then computed cosine similarity scores against the ALS general gold list, and these scores were weighted
and normalized. To reduce the tendency of MNLI models to produce scores clustered around [0.45, 0.55],
we applied a distributional correction. The final <italic>S</italic><sub>entail</sub> is this corrected,
calibrated score; it quantifies how well the extracted relationship aligns with established ALS domain
knowledge, effectively serving as a measure of scientific plausibility within the broader context of ALS
research literature.</p>
        <p><bold>3.3.4. TCVS Computation</bold></p>
        <p>We combined the three tiers with dynamic weighting:</p>
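        <disp-formula>
          <label>(7)</label>
          <tex-math>\mathrm{TCVS} = w_1 \cdot S_{\mathrm{domain}} + w_2 \cdot S_{\mathrm{entail}} + w_3 \cdot S_{\mathrm{expert}}</tex-math>
        </disp-formula>
        <p>where the weights (<italic>w</italic><sub>1</sub>, <italic>w</italic><sub>2</sub>, <italic>w</italic><sub>3</sub>) were determined empirically for each score type. The base weights were
(0.2, 0.3, 0.5).</p>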
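        <p>With the base weights, the TCVS formula is a plain weighted sum. The short Python sketch below shows that computation; the function name is hypothetical, and the division by the weight total is an added safeguard (not stated in the text) that keeps scores in [0, 1] if category-specific weights do not sum to one.</p>

```python
BASE_WEIGHTS = (0.2, 0.3, 0.5)  # (w1, w2, w3) for domain, entailment, expert


def tcvs(s_domain, s_entail, s_expert, weights=BASE_WEIGHTS):
    """Weighted fusion of the three tier scores (each assumed in [0, 1])."""
    w1, w2, w3 = weights
    fused = w1 * s_domain + w2 * s_entail + w3 * s_expert
    # Assumed safeguard: renormalise in case adaptive weights do not sum to 1.
    return fused / (w1 + w2 + w3)


print(round(tcvs(0.8, 0.6, 0.9), 3))  # 0.79
```

        <p>Because the expert tier carries half the base weight, a confident Mistral judgement can outweigh moderately high embedding and entailment scores, which matches the behaviour described for methodological statements in Section 4.1.</p>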
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Expert Validation</title>
        <p>To validate TCVS performance, we compared classifications against 300 expert annotations (15%
randomly selected from identified relationships), which served as ground truth. Two ALS experts (10+
years’ experience) independently labeled relationships as “valid” or “invalid.” They did not use the
“flagged for review” category, while our algorithm employed it for ambiguous cases requiring manual
input.</p>
        <p>Table B.1, Appendix B, shows TCVS performance across confidence thresholds. TCVS &lt; 0.5 effectively
filtered non-biomedical procedural statements with 98.2% accuracy. The intermediate range (0.5–0.75)
captured relationships requiring human expert intervention, including valid discoveries flagged for
review and edge cases where vocabulary similarity led to misclassification. TCVS ≥ 0.75 identified
high-confidence valid relationships with 100% accuracy.</p>
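        <p>The three reporting bands can be expressed as a simple decision rule. This is a sketch of the thresholds stated above (0.5 and 0.75), not the authors' code; the band labels follow the text and the function name is hypothetical.</p>

```python
def tcvs_band(score, low=0.5, high=0.75):
    """Map a TCVS value to the reporting band used for expert review."""
    if score >= high:
        return "valid"               # high-confidence relationship
    if score >= low:
        return "flagged for review"  # route to a human expert
    return "invalid"                 # procedural or non-biomedical


for s in (0.295, 0.62, 0.896):
    print(s, tcvs_band(s))
```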
        <p>Comparing valid and invalid cases against expert labels, our algorithm achieved 95.08% accuracy,
94.62% precision, 95.65% recall, and 0.95 F1 score (Table 1).</p>
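        <p>As a consistency check using only the reported values, the F1 score follows from precision <italic>P</italic> and recall <italic>R</italic> as their harmonic mean:</p>

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9462 \times 0.9565}{0.9462 + 0.9565}
    \approx \frac{1.8101}{1.9027}
    \approx 0.951
```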
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Causal Knowledge Graph Construction</title>
        <p>Organizing validated relationships into a graph structure enabled network analysis, community
detection, and counterfactual reasoning. However, entity linking (matching relationship phrases to term
nodes) was challenging because of terminology variation.</p>
        <p><bold>3.5.1. Node Creation</bold></p>
        <p>Each term extracted and validated using TCVS became a node with attributes including
term name, category, validation status, definition, synonyms, biomarker status, repetition count across
papers, embeddings from both Mistral and BioLinkBERT, LLM and domain-specific validation scores, and
source paper identifiers. Terms were deduplicated by case-insensitive matching.</p>
        <p><bold>3.5.2. Hybrid Node Matching</bold></p>
        <p>Relationship cause/effect phrases (e.g., “TDP-43 protein aggregation”) had to be linked to term nodes
(e.g., “TDP-43,” “protein aggregation”). Simple string matching failed because of partial matches, synonymy,
and differences in specificity. We therefore developed a hybrid similarity approach combining lexical and semantic
signals.</p>
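        <p>For each valid relationship with cause phrase <italic>x</italic><sub>cause</sub> and effect phrase <italic>x</italic><sub>effect</sub>, we computed the similarity of each phrase <italic>x</italic> against
all nodes (terms) <italic>t</italic> using three components:</p>
        <p>Lexical Score from token overlap and fuzzy matching:</p>
        <disp-formula>
          <label>(8)</label>
          <tex-math>S_{\mathrm{lex}}(x, t) = \max\left(\frac{\left|\mathrm{tokens}(x) \cap \mathrm{tokens}(t)\right|}{\left|\mathrm{tokens}(x)\right|}, \frac{\mathrm{FuzzyMatch}(x, t)}{100}\right)</tex-math>
        </disp-formula>
        <p>Embedding Scores from Mistral and BioLinkBERT:</p>
        <disp-formula>
          <label>(9)</label>
          <tex-math>S_{\mathrm{mistral}}(x, t) = \cos\left(\mathbf{e}_x^{\mathrm{mistral}}, \mathbf{e}_t^{\mathrm{mistral}}\right)</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(10)</label>
          <tex-math>S_{\mathrm{biolink}}(x, t) = \cos\left(\mathbf{e}_x^{\mathrm{biolink}}, \mathbf{e}_t^{\mathrm{biolink}}\right)</tex-math>
        </disp-formula>
        <p>Combined Similarity:</p>
        <disp-formula>
          <label>(11)</label>
          <tex-math>S_{\mathrm{hybrid}}(x, t) = 0.40 \cdot S_{\mathrm{lex}} + 0.35 \cdot S_{\mathrm{mistral}} + 0.25 \cdot S_{\mathrm{biolink}}</tex-math>
        </disp-formula>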
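        <p>The hybrid similarity, together with the edge rule described in the next paragraph (threshold 0.70, lowered by 0.05 for biomarker relationships), can be sketched in dependency-free Python. All names are hypothetical. <monospace>difflib.SequenceMatcher.ratio()</monospace>, which already returns a value in [0, 1], is swapped in for the unspecified 0–100 FuzzyMatch; dividing token overlap by the phrase's token count is an assumption; and the embeddings are toy vectors rather than Mistral or BioLinkBERT outputs.</p>

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lexical_score(x, t):
    """max(token overlap / |tokens(x)|, fuzzy ratio) on lower-cased text."""
    tx, tt = set(x.lower().split()), set(t.lower().split())
    overlap = len(tx.intersection(tt)) / len(tx) if tx else 0.0
    fuzzy = SequenceMatcher(None, x.lower(), t.lower()).ratio()  # in [0, 1]
    return max(overlap, fuzzy)


def hybrid_score(x, t, emb_mistral, emb_biolink):
    return (0.40 * lexical_score(x, t)
            + 0.35 * cosine(emb_mistral[x], emb_mistral[t])
            + 0.25 * cosine(emb_biolink[x], emb_biolink[t]))


def make_edge(cause, effect, t_cause, t_effect, emb_m, emb_b, biomarker=False):
    """Link cause -> effect nodes only if both phrases clear the threshold."""
    tau = 0.70 - (0.05 if biomarker else 0.0)
    return (hybrid_score(cause, t_cause, emb_m, emb_b) > tau
            and hybrid_score(effect, t_effect, emb_m, emb_b) > tau)


# Toy embeddings (hypothetical); both "models" share them for brevity.
emb = {"TDP-43 protein aggregation": [1.0, 0.0], "TDP-43": [1.0, 0.0],
       "motor neuron death": [0.0, 1.0], "motor neuron degeneration": [0.0, 1.0],
       "SOD1": [0.0, 1.0]}
print(make_edge("TDP-43 protein aggregation", "motor neuron death",
                "TDP-43", "motor neuron degeneration", emb, emb))  # True
```

        <p>In the example the lexical score for “TDP-43 protein aggregation” against “TDP-43” is only 0.375, so the edge is created because the embedding terms agree; this is the kind of partial-match case that string matching alone would miss.</p>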
        <p>We created an edge from a cause node <italic>t<sub>i</sub></italic> to an effect node <italic>t<sub>j</sub></italic> when <italic>S</italic><sub>hybrid</sub>(cause phrase, <italic>t<sub>i</sub></italic>) &gt; τ and <italic>S</italic><sub>hybrid</sub>(effect phrase, <italic>t<sub>j</sub></italic>) &gt; τ. The base
threshold τ = 0.70 was reduced by 0.05 for biomarker relationships to increase recall. This hybrid
matching algorithm reduced false positive edges by 64% compared to string-only matching while
maintaining 91% recall. The biomarker prioritization ensured that ALS and CSF-related biomarker
relationships critical for diagnosis were well represented in the graph.</p>
        <p><bold>3.5.3. Edge Attributes and Weight Normalization</bold></p>
        <p>Each edge stored the original author statement, the extracted cause/effect phrases, validation status (valid
only), edge confidence from hybrid matching, biomarker relationship status, TCVS components,
repetition count across papers, detailed matching scores, and source paper identifiers.</p>
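        <p>Edge weights were normalized for community detection, with the maximum in the denominator taken over all edges:</p>
        <disp-formula>
          <tex-math>w_{\mathrm{norm}} = \frac{\mathrm{edge\_confidence} \times \log(1 + \mathrm{repeats})}{\max\left(\mathrm{edge\_confidence} \times \log(1 + \mathrm{repeats})\right)}</tex-math>
        </disp-formula>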
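        <p>A direct Python rendering of the normalization, offered as a sketch with hypothetical names: each edge carries its hybrid-matching confidence and its repetition count, and every raw weight is divided by the largest raw weight in the graph.</p>

```python
import math


def normalize_edge_weights(edges):
    """edges: dict (u, v) -> (edge_confidence, repeats); returns w_norm per edge."""
    raw = {e: conf * math.log1p(rep) for e, (conf, rep) in edges.items()}
    top = max(raw.values())
    return {e: (r / top if top else 0.0) for e, r in raw.items()}


w = normalize_edge_weights({("A", "B"): (0.9, 3), ("B", "C"): (0.6, 1)})
print(w)  # the strongest edge maps to 1.0
```

        <p>Because the same maximum divides every edge, the base of the logarithm does not change <italic>w</italic><sub>norm</sub>; the natural log via <monospace>math.log1p</monospace> is used here.</p>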
        <p>This balanced confidence (from matching) and importance (from frequency). Considering only valid
terms and relationships and prioritizing biomarker-related nodes and edges, we constructed a Causal
Knowledge Graph with 2,273 nodes (out of 4,689 terms) and 20,401 edges.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Community Detection and Counterfactual Analysis</title>
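        <p>ALS pathophysiology involves multiple interconnected mechanisms. Community detection identifies
functional modules, that is, groups of densely connected terms representing coherent biological processes. We
applied the Louvain method of Blondel et al. [29], which optimizes the modularity</p>
        <disp-formula>
          <tex-math>Q = \frac{1}{2m} \sum_{i,j} \left[A_{ij} - \frac{k_i k_j}{2m}\right] \delta(c_i, c_j)</tex-math>
        </disp-formula>
        <p>where <italic>A<sub>ij</sub></italic> is the (weighted) adjacency matrix, <italic>k<sub>i</sub></italic> is the (weighted) degree of node <italic>i</italic>, <italic>m</italic> is the total edge weight, <italic>c<sub>i</sub></italic> is the community
assignment of node <italic>i</italic>, and δ(<italic>c<sub>i</sub></italic>, <italic>c<sub>j</sub></italic>) = 1 if <italic>c<sub>i</sub></italic> = <italic>c<sub>j</sub></italic>, else 0.</p>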
        <p>We used the resolution parameter γ = 1.0 (default). Louvain’s hierarchical approach revealed multi-scale
organization: large communities represented major pathways while sub-communities captured specific
mechanisms.</p>
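        <p>To make the modularity objective concrete, the Python sketch below evaluates <italic>Q</italic> for a fixed partition of a small weighted graph, using the same double sum over node pairs as the formula above. It does not implement Louvain's greedy node moves and aggregation; NetworkX (2.8 and later) offers <monospace>networkx.algorithms.community.louvain_communities</monospace> with <monospace>weight</monospace> and <monospace>resolution</monospace> arguments, although the text does not say which implementation was used. The function name is hypothetical, and <monospace>gamma</monospace> mirrors the resolution parameter.</p>

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Modularity Q of a partition of an undirected, weighted graph.

    edges     : dict (u, v) -> weight, each undirected edge listed once
    community : dict node -> community label
    gamma     : resolution; 1.0 gives the standard formula in the text
    """
    A = defaultdict(float)  # symmetric adjacency A_ij
    k = defaultdict(float)  # weighted degree k_i
    for (u, v), w in edges.items():
        A[(u, v)] += w
        A[(v, u)] += w
        k[u] += w
        k[v] += w
    m = sum(edges.values())  # total edge weight
    q = 0.0
    for i in k:
        for j in k:
            if community[i] == community[j]:  # delta(c_i, c_j) = 1
                q += A[(i, j)] - gamma * k[i] * k[j] / (2 * m)
    return q / (2 * m)


# Two triangles joined by one bridge edge (c, d), split into the two triangles.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
part = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, part), 4))  # 0.3571 (= 5/14)
```

        <p>Louvain repeatedly moves nodes between communities whenever such a move increases this quantity, then collapses communities into super-nodes and repeats, which is the source of the hierarchical organization described above.</p>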
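        <p>For counterfactual analysis exploring queries such as “If we intervene on node <italic>v</italic><sub>0</sub>, what are the predicted
downstream effects?”, we implemented a hybrid method combining path-based propagation with community co-cluster
validation.</p>
        <p>Path-Based Propagation:</p>
        <p>1. Identified the intervention node <italic>v</italic><sub>0</sub>.</p>
        <p>2. Computed the reachable nodes: all <italic>v</italic> such that a directed path <italic>v</italic><sub>0</sub> → · · · → <italic>v</italic> exists.</p>
        <p>3. Calculated the strength of each path π = (<italic>v</italic><sub>0</sub>, <italic>v</italic><sub>1</sub>, . . . , <italic>v</italic><sub>ℓ</sub> = <italic>v</italic>), with an exponential decay penalizing long paths:</p>
        <disp-formula>
          <tex-math>p(\pi) = \left[\prod_{i=0}^{\ell-1} w_{\mathrm{norm}}(v_i, v_{i+1})\right] \times \exp(-0.1 \cdot \ell)</tex-math>
        </disp-formula>
        <p>4. Aggregated across all paths:</p>
        <disp-formula>
          <tex-math>\mathrm{total}(v_0 \rightsquigarrow v) = \sum_{\pi:\, v_0 \rightsquigarrow v} p(\pi)</tex-math>
        </disp-formula>
        <p>5. Ranked nodes by impact to identify the most affected targets.</p>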
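        <p>The path-based propagation steps can be sketched as a depth-first enumeration of simple directed paths. This is a hypothetical illustration, not the authors' code: restricting to simple paths and capping path length at <monospace>max_len</monospace> are assumptions (the text specifies neither), and the graph is a toy dictionary of <italic>w</italic><sub>norm</sub> values. Each path contributes the product of its edge weights times exp(−0.1·ℓ), and contributions are summed per target before ranking.</p>

```python
import math


def counterfactual_impact(graph, source, max_len=4, decay=0.1):
    """Rank nodes by summed, length-penalised path strength from `source`.

    graph   : dict u -> dict v -> w_norm(u, v) for directed edges
    max_len : hypothetical cap on path length (edges); the text gives none,
              and enumerating every simple path grows exponentially.
    """
    impact = {}

    def walk(node, strength, length, visited):
        for nxt, w in graph.get(node, {}).items():
            if nxt in visited:          # simple paths only (assumption)
                continue
            s = strength * w            # running product of w_norm on the path
            ell = length + 1
            impact[nxt] = impact.get(nxt, 0.0) + s * math.exp(-decay * ell)
            if max_len > ell:
                walk(nxt, s, ell, visited | {nxt})

    walk(source, 1.0, 0, {source})
    return sorted(impact.items(), key=lambda kv: kv[1], reverse=True)


toy = {"A": {"B": 0.5, "C": 0.4}, "B": {"C": 0.5}}
for node, score in counterfactual_impact(toy, "A"):
    print(node, round(score, 4))
# prints "C 0.5666" then "B 0.4524"
```

        <p>In the toy graph, C outranks B even though its direct edge is weaker, because a second path through B adds to its total; this is the sense in which path diversity strengthens a prediction.</p>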
        <p>Co-Cluster Validation: We validated predictions using community structure. Predictions were rated strong when
<italic>v</italic><sub>0</sub> and <italic>v</italic> were in the same community (direct functional relationship), moderate when they were in adjacent
communities (indirect relationship), and weak when they were in distant communities (spurious or long-range
effect).</p>
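        <p>The co-cluster rating can likewise be sketched. Because the text does not define when two communities are "adjacent", the sketch below assumes they are adjacent if at least one graph edge joins them; the function and argument names are hypothetical.</p>

```python
def co_cluster_label(v0, v, community, edges):
    """Rate a counterfactual prediction v0 -> v by community proximity.

    community : dict node -> community label
    edges     : iterable of (u, w) node pairs in the graph
    """
    c0, c = community[v0], community[v]
    if c0 == c:
        return "strong"    # same functional module
    # Assumed definition of adjacency: some edge connects the two communities.
    linked = any({community[a], community[b]} == {c0, c} for a, b in edges)
    return "moderate" if linked else "weak"


community = {"a": 0, "b": 0, "c": 1, "d": 2}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(co_cluster_label("a", "b", community, edges))  # strong
print(co_cluster_label("a", "c", community, edges))  # moderate
print(co_cluster_label("a", "d", community, edges))  # weak
```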
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. TCVS Performance and Validation</title>
        <p>Table B.1, Appendix B, demonstrates that TCVS effectively stratified relationships across confidence
levels. The multi-model fusion approach successfully distinguished between experimental methodology
descriptions and genuine biomarker/therapeutic relationships while appropriately flagging ambiguous
cases for expert curation.</p>
        <p>Representative examples from each TCVS range illustrate the framework’s performance (Table B.2).
In the lowest range (TCVS &lt; 0.5), the algorithm correctly invalidated methodological statements like
“Adding varying amounts of SIL peptides causes the SIL peptides to be quantified by PRM analysis”
(TCVS = 0.295). Although BioLinkBERT and PubMedBERT gave higher scores due to medical vocabulary,
Mistral as an expert helped recognize these as non-relevant relationships.</p>
        <p>In the intermediate range (0.5–0.75), the algorithm correctly identified cases requiring human
expertise. For instance, “Tofersen was recently approved merely based on decreases in NfL” (TCVS = 0.756)
was flagged for review due to lack of contextual evidence, and experts subsequently labeled it valid.
However, “ROPI treatment causes decrease in protein group enriched in Parkinson’s disease” (TCVS =
0.768) was a false positive that should have been invalidated as it was not ALS-related.</p>
        <p>The high-confidence range (TCVS ≥ 0.75) contained unambiguous validations such as “The presence
of a mutation in C9orf72 gene causes an upregulation of CHI3L2 in CSF of symptomatic ALS patients”
(TCVS = 0.805) and “Increased levels of oxidative stress contribute to the pathogenesis of sporadic ALS”
(TCVS = 0.896).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Knowledge Graph Structure</title>
        <p>The constructed Causal Knowledge Graph contained 2,273 validated terms and 20,401 weighted
relationships. Louvain community detection identified 15 major communities (Table C.1, Appendix C), with
the top six being: Markers (279 nodes), ALS (213 nodes), progression (143 nodes), APOE (139 nodes),
patients (131 nodes), and C9orf72 (121 nodes). Figure C.1 visualizes the network structure showing
dense connectivity within communities and sparse connections between them.</p>
        <p>These communities aligned with established ALS research areas, validating the biological relevance
of our automated extraction and organization.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Counterfactual Analysis</title>
        <p>We tested the framework’s predictive capability by performing counterfactual analysis on SOD1 mutation
as the intervention node. Table D.1 (Appendix D) shows the top 15 predicted downstream biomarker
responses ranked by combined score (weighted by impact, uncertainty, and cluster proximity).</p>
        <p>The results showed path-based predictions consistent with known ALS literature: SOD1 mutation →
SOD1 protein (0.075 impact via 83 paths), SOD1 mutation → familial ALS (0.068 impact, 0.600 cluster
score), and SOD1 mutation → protein abundance/glycosylation (impacts 0.066–0.068), indicating effects
on protein homeostasis.</p>
        <p>Cluster validation showed all SOD1-related terms were in the same community (cluster scores
0.49–0.69), correctly placing them in the genetic module. This demonstrated both validation of the
cluster structure and proof of feasibility for using the Causal Knowledge Graph for predictive analysis.
The intervention on SOD1 mutation showed strongest predicted impact on protein-related processes,
validated by high cluster proximity within the same genetic module. Path diversity (83–271 pathways
across different predictions) indicated robust multi-mechanism effects, while low uncertainty (±0.02–
0.05) reflected convergent evidence across multiple literature sources.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Principal Findings</title>
        <p>This work presented the first validated computational framework for automated extraction and
organization of ALS biomarker knowledge from scientific literature. Our three-tier validation approach
(TCVS) achieved 95.08% accuracy with 94.62% precision and 0.95 F1 score compared to expert-labeled
datasets. The resulting Causal Knowledge Graph contained 2,273 validated terms and 20,401 weighted
relationships, organized into 15 functional communities that recapitulated known ALS pathophysiology
while enabling novel connection discovery.</p>
        <p>Key Contributions:</p>
        <p>Methodological Innovation: TCVS demonstrated that multi-model fusion with adaptive weighting
significantly outperformed single-model approaches for biomedical relationship validation. The
framework is generalizable to other disease domains by substituting domain-specific gold lists and adjusting
category-specific weights.</p>
        <p>Domain Impact: The SOD1 mutation connections identified through community analysis represent
testable hypotheses for therapeutic development. The framework reduced manual curation effort by
40% while maintaining expert-level accuracy.</p>
        <p>Reproducible Pipeline: Complete methodology with mathematical formulations enables replication
and extension to other neurodegenerative diseases (Alzheimer’s, Parkinson’s, Huntington’s).</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Comparison with Existing Approaches</title>
        <p>
          Our precision (94.62%) substantially exceeded SemMedDB’s reported 62.3% on ALS relationships [17]
and the 60% precision of linguistic rule-based relationship extraction [10]. We attribute the improvement to
(1) multi-tier validation versus simple co-occurrence, (2) domain-specific gold lists versus generic UMLS
concepts, and (3) a causal focus versus all semantic predications. While BERN2 achieved state-of-the-art
entity recognition (F1=90.2%) [27], rule-based relationship extraction proved brittle (F1=65.0% in our
evaluation). TCVS’s model-based validation approach generalized better to diverse linguistic expressions of
causal relationships. Expert curation remains the gold standard but is time-intensive (approximately 2–3 hours per
paper, or 30–45 hours for 15 papers). Our framework processed 15 papers in 18 hours with 94.62% precision,
a 40–60% reduction in processing time relative to expert curation.
        </p>
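        <p>The time comparison above can be checked directly. Assuming expert curation takes 2–3 hours per paper, as stated, 15 papers would need 30–45 hours, so an 18-hour run corresponds to a 40–60% reduction. The Python sketch below only performs that arithmetic; the function name is illustrative.</p>

```python
def time_reduction(framework_hours: float, papers: int, hours_per_paper: float) -> float:
    """Fractional reduction in processing time relative to manual expert curation."""
    manual_hours = papers * hours_per_paper
    return 1 - framework_hours / manual_hours


low = time_reduction(18, 15, 2.0)   # faster expert pace: 30 manual hours
high = time_reduction(18, 15, 3.0)  # slower expert pace: 45 manual hours
print(f"reduction: {low:.0%} to {high:.0%}")  # prints "reduction: 40% to 60%"
```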
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Biological Insights</title>
        <p>The 15 identified communities aligned with established ALS research areas. The SOD1 mutation
counterfactual analysis validated known pathophysiology: SOD1 mutations account for approximately
20% of familial ALS [30, 31]. Our framework correctly identified SOD1’s strong association with
familial forms and anterior horn motor neuron pathology. The predicted impact on protein homeostasis
pathways aligned with established understanding that SOD1 mutations cause protein misfolding and
aggregation through toxic gain-of-function mechanisms [32, 33].</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Limitations</title>
        <p>Gold List Coverage: Our gold lists captured major ALS concepts but missed emerging terminology.
Periodic gold list updates using recent high-impact papers and expert review could address this limitation.</p>
        <p>Scalability: Processing 15 papers in 18 hours demonstrated feasibility for moderate-scale applications.
Scaling to hundreds of papers would require GPU-accelerated batch processing, incremental graph
updates, and distributed computing for community detection.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Agentic Extensions and Future Directions</title>
        <p>Our framework’s modular architecture naturally extends to collaborative multi-agent systems. We
propose two key directions that leverage our validated CKG infrastructure:</p>
        <p>Graph Retrieval Augmented Generation (Graph RAG) in Agentic Systems: Traditional RAG
systems retrieve text chunks, but biomedical reasoning requires structured knowledge traversal. We
envision specialized query agents that leverage our CKG’s community structure for context-aware
retrieval. For example, a “Biomarker Discovery Agent” could traverse the Markers community (279
nodes) to identify novel diagnostic candidates, while a “Therapeutic Hypothesis Agent” explores paths
between the C9orf72 genetic cluster (121 nodes) and therapeutic intervention nodes. Graph RAG [26]
enables agents to retrieve multi-hop subgraphs rather than isolated facts, providing richer context for
LLM reasoning. Our weighted edges and TCVS scores can serve as confidence signals for retrieval ranking,
favoring high-confidence evidence chains.</p>
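        <p>To make multi-hop retrieval concrete, the Python sketch below shows one way an agent could pull a bounded subgraph around a seed node, following only edges whose confidence clears a threshold. It is an illustration under assumptions, not the framework's implementation: the graph is a plain adjacency dictionary, each edge carries a single confidence value in [0, 1] standing in for the edge weight and TCVS signals, and the names retrieve_subgraph, max_hops, and min_confidence, as well as the toy graph and its values, are hypothetical.</p>

```python
from collections import deque


def retrieve_subgraph(graph, seed, max_hops=2, min_confidence=0.75):
    """Breadth-first k-hop expansion over a directed causal graph.

    graph maps each node to a dict {neighbor: confidence}; only edges with
    confidence >= min_confidence are followed. Returns the retained edges as
    (source, target, confidence) tuples, sorted by descending confidence.
    """
    hops = {seed: 0}          # node -> hop distance from the seed
    frontier = deque([seed])
    edges = []
    while frontier:
        node = frontier.popleft()
        if hops[node] == max_hops:
            continue          # do not expand beyond the hop limit
        for neighbor, confidence in graph.get(node, {}).items():
            if confidence >= min_confidence:
                edges.append((node, neighbor, confidence))
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    frontier.append(neighbor)
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Toy graph with made-up confidence values, for illustration only.
toy_graph = {
    "C9orf72 repeat expansion": {"dipeptide repeat proteins": 0.91, "TDP-43 pathology": 0.62},
    "dipeptide repeat proteins": {"motor neuron loss": 0.83},
    "motor neuron loss": {"NfL CSF level": 0.88},
}
for edge in retrieve_subgraph(toy_graph, "C9orf72 repeat expansion"):
    print(edge)
```

        <p>Ranking the retained edges by confidence lets the agent pass the strongest evidence to the LLM first; raising max_hops widens the context at the cost of more, and weaker, evidence.</p>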
        <sec id="sec-5-5-1">
          <title>Agentic Information Extraction and Retrieval</title>
          <p>
            We propose a multi-agent curator system where
            specialized agents maintain domain-specific subgraphs: (1) a Pathogenic Curator Agent monitors genetic
and molecular mechanism literature, updating the C9orf72 and SOD1 communities; (2) a Biomarker
Curator Agent tracks diagnostic marker studies, maintaining the Markers and APOE communities; (3) a
Therapeutic Curator Agent extracts drug-target relationships; and (4) a Coordinator Agent orchestrates
cross-domain queries and resolves conflicts using TCVS consensus. Each agent employs our three-tier
validation pipeline but specializes its gold lists and weighting schemes. This architecture enables
continuous knowledge base evolution as new papers emerge, with agents autonomously proposing
graph updates that undergo collective validation. The coordinator agent can answer complex queries
such as “What biomarkers predict response to SOD1-targeted therapies?” by orchestrating retrieval across
multiple specialized subgraphs and synthesizing evidence through multi-agent deliberation [25].
          </p>
          <p>Implementation details and architectural diagrams for these agentic extensions are provided in
Appendix E. Future work will evaluate multi-agent coordination strategies and benchmark Graph RAG
performance against traditional retrieval methods on complex biomedical queries.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The lead author acknowledges Andy Zhang for his mentoring on how to approach a research problem.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Anthropic Claude 4.5 to check grammar
and spelling, reword text, and ease LaTeX conversion from the NeurIPS template used at first
submission to the CEURART template used for the final paper. After using the LLM service, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>[14] O. Bodenreider, The Unified Medical Language System (UMLS): integrating biomedical terminology, Nucleic
Acids Research 32 (2004) D267–D270. URL: https://doi.org/10.1093/nar/gkh061.</p>
      <p>[15] J. Piñero, J. M. Ramírez-Anguita, J. Saüch-Pitarch, et al., The DisGeNET knowledge platform for disease
genomics: 2019 update, Nucleic Acids Research 48 (2020) D845–D855. URL: https://doi.org/10.1093/nar/gkz1021.</p>
      <p>[16] D. S. Himmelstein, A. Lizee, C. Hessler, et al., Systematic integration of biomedical knowledge prioritizes
drugs for repurposing, eLife 6 (2017) e26726. URL: https://doi.org/10.7554/eLife.26726.</p>
      <p>[17] R. Frijters, M. van Vugt, R. Smeets, R. van Schaik, J. de Vlieg, W. Alkema, Literature mining for the discovery
of hidden connections between drugs, genes and diseases, PLoS Computational Biology 6 (2010) e1000943.
URL: https://doi.org/10.1371/journal.pcbi.1000943.</p>
      <p>[18] R. Küffner, N. Zach, R. Norel, et al., Crowdsourced analysis of clinical trial data to predict amyotrophic
lateral sclerosis progression, Nature Biotechnology 33 (2015) 51–57. URL: https://doi.org/10.1038/nbt.3051.</p>
      <p>[19] V. Grollemund, P.-F. Pradat, G. Querin, F. Delbot, G. Le Chat, J.-F. Pradat-Peyre, P. Bede, Machine learning
in amyotrophic lateral sclerosis: Achievements, pitfalls, and future directions, Frontiers in Neuroscience 13
(2019) 135. URL: https://doi.org/10.3389/fnins.2019.00135.</p>
      <p>[20] D. Karagkouni, M. D. Paraskevopoulou, S. Chatzopoulos, et al., DIANA-TarBase v8: a decade-long collection
of experimentally supported miRNA-gene interactions, Nucleic Acids Research 46 (2018) D239–D245. URL:
https://doi.org/10.1093/nar/gkx1141.</p>
      <p>[21] G. Morello, M. Guarnaccia, A. G. Spampinato, S. Salomone, V. D’Agata, F. L. Conforti, E. Aronica, S.
Cavallaro, From multi-omics approaches to precision medicine in amyotrophic lateral sclerosis, Frontiers in
Neuroscience 14 (2020) 577755. URL: https://doi.org/10.3389/fnins.2020.577755.</p>
      <p>[22] M. Ahangaran, M. R. Jahed-Motlagh, B. Minaei-Bidgoli, Causal discovery from sequential data in ALS disease
based on entropy criteria, Journal of Biomedical Informatics 89 (2019) 41–55. URL: https://doi.org/10.1016/j.jbi.2018.10.004.</p>
      <p>[23] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers
for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019, pp.
4171–4186. URL: https://doi.org/10.18653/v1/N19-1423.</p>
      <p>[24] Z. Li, Q. Wei, L. Huang, J. Li, Y. Hu, et al., Ensemble pretrained language models to extract biomedical
knowledge from literature, Journal of the American Medical Informatics Association 31 (2024) 1904–1911.
URL: https://doi.org/10.1093/jamia/ocae061.</p>
      <p>[25] Z. Xi, W. Chen, X. Guo, W. He, et al., The rise and potential of large language model based agents: A survey,
arXiv preprint arXiv:2309.07864 (2023). URL: https://doi.org/10.48550/arXiv.2309.07864.</p>
      <p>[26] D. Edge, H. Trinh, N. Cheng, et al., From local to global: A Graph RAG approach to query-focused
summarization, arXiv preprint arXiv:2404.16130 (2024). URL: https://doi.org/10.48550/arXiv.2404.16130.</p>
      <p>[27] M. Sung, H. Jeon, J. Lee, J. Kang, BERN2: an advanced neural biomedical named entity recognition and
normalization tool, Bioinformatics 38 (2022) 4837–4839. URL: https://doi.org/10.1093/bioinformatics/btac598.</p>
      <p>[28] P. Lopez, GROBID: Combining automatic bibliographic data recognition and term extraction for scholarship
publications, in: M. Agosti, J. Borbinha, S. Kapidakis, C. Papatheodorou, G. Tsakonas (Eds.), Research and
Advanced Technology for Digital Libraries, Springer Berlin Heidelberg, 2009, pp. 473–474.</p>
      <p>[29] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks,
Journal of Statistical Mechanics: Theory and Experiment 2008 (2008) P10008. URL: http://dx.doi.org/10.1088/1742-5468/2008/10/P10008.</p>
      <p>[30] D. Rosen, T. Siddique, D. Patterson, et al., Mutations in Cu/Zn superoxide dismutase gene are associated
with familial amyotrophic lateral sclerosis, Nature 362 (1993) 59–62. URL: https://doi.org/10.1038/362059a0.</p>
      <p>[31] P. Andersen, A. Al-Chalabi, Clinical genetics of amyotrophic lateral sclerosis: what do we really know?,
Nature Reviews Neurology 7 (2011) 603–615. URL: https://doi.org/10.1038/nrneurol.2011.150.</p>
      <p>[32] L. I. Bruijn, M. W. Becher, M. K. Lee, et al., ALS-linked SOD1 mutant G85R mediates damage to astrocytes and
promotes rapidly progressive disease with SOD1-containing inclusions, Neuron 18 (1997) 327–338. URL:
https://doi.org/10.1016/s0896-6273(00)80272-x.</p>
      <p>[33] L. Grad, J. Yerbury, B. Turner, et al., Intercellular propagated misfolding of wild-type Cu/Zn superoxide
dismutase occurs via exosome-dependent and -independent mechanisms, Proceedings of the National Academy of
Sciences 111 (2014) 3620–3625. URL: https://doi.org/10.1073/pnas.1312245111.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Main Components of the Research Method and the Flow of Information</title>
    </sec>
    <sec id="sec-9">
      <p>Our framework processes biomedical literature through a multi-stage pipeline that combines document processing,
knowledge validation, and graph-based analysis. The pipeline begins with parallel ingestion of research papers
via GROBID extraction and curated gold standard term lists across four categories (pathogenic, biomarker,
therapeutic, and general ALS terms) totaling 168,580 terms. These gold lists are embedded using BioLinkBERT
and PubMedBERT to create reference matrices for validation. The core of our approach is the Triangulated Causal
Validation Score (TCVS), which fuses three complementary signals: expert validation from Mistral-7B, domain
similarity using BioLinkBERT, and entailment validation through PubMedBERT-MNLI. This multi-model fusion
approach enables robust validation of extracted relationships against domain knowledge. The validated terms
and relationships are organized into a Causal Knowledge Graph (CKG), which undergoes community detection
to identify functional modules and enables counterfactual analysis for predicting intervention outcomes. The
pipeline achieves 94.62% precision and 95.65% recall in biomarker relationship extraction, demonstrating the
effectiveness of our multi-tier validation strategy.</p>
      <p>The framework extends to agentic capabilities through specialized curator agents coordinated by an Agent
Orchestrator, enabling continuous knowledge base evolution and sophisticated query processing through Graph
RAG. Each specialized agent maintains domain-specific subgraphs while collaborating through TCVS consensus
mechanisms.</p>
    </sec>
    <sec id="sec-10">
      <title>B. LLM Prompts and Validation Details</title>
        <p>For the LLM validation component, we selected Mistral-7B-Instruct-v0.3 as our large language model after
systematic manual evaluation of multiple alternatives, including GPT-4, Claude-3.5-Sonnet, and Llama-3-8B. While
proprietary models such as GPT-4 and Claude performed better in preliminary testing, the budget
constraints of a high school research project required open-source alternatives. Among
the open-source options, Llama-3-8B exceeded our hardware limits (requiring &gt;16GB VRAM), whereas
Mistral-7B provided a good balance of performance and computational efficiency, running within
our 16GB GPU memory budget (we also ran it on an 8GB GPU by loading it in 4-bit precision with quant_type "nf4"
and torch_dtype float16). Mistral-7B demonstrated reasonable classification accuracy for biomedical relationship
validation while retaining the practical advantages of local deployment: reproducibility and data
privacy.</p>
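        <p>For readers reproducing the 8GB configuration, the sketch below shows one plausible way to load Mistral-7B-Instruct-v0.3 in 4-bit NF4 precision with the Hugging Face transformers and bitsandbytes libraries. The Hub identifier mistralai/Mistral-7B-Instruct-v0.3, the use of device_map="auto" (which needs the accelerate package), and the helper names quantization_kwargs and load_model are our assumptions; the text above states only the quantization type and dtype. Heavy imports are deferred into load_model so the configuration can be inspected without a GPU stack.</p>

```python
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"  # assumed Hugging Face Hub identifier


def quantization_kwargs():
    """Arguments for transformers.BitsAndBytesConfig: 4-bit NF4 weights, fp16 compute."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": "float16",  # converted to torch.float16 in load_model
    }


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and a 4-bit quantized model.

    Requires torch, transformers, bitsandbytes, accelerate, and a CUDA GPU.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    kwargs = quantization_kwargs()
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(**kwargs),
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return tokenizer, model
```

        <p>Calling load_model() downloads the weights on first use; the NF4 settings roughly quarter the weight memory relative to fp16, which is what brings the 7B model within an 8GB budget.</p>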
      <sec id="sec-10-1">
        <title>B.1. Stage 1: Relevance Check Prompt</title>
        <p>The following prompt was used for initial relevance classification:
You are an ALS domain expert. Classify this statement’s relevance to ALS research.
Classification task:
1. Is this about ALS disease biology, biomarkers, or therapeutics? (YES/NO)
2. If NO, is it methodological/administrative/other? (YES/NO)
Respond in JSON:
{
}</p>
        <p>Although Mistral-7B is a general-purpose language model, we refer to its output as expert validation because our
prompts explicitly instruct the model to assume the role of an ALS domain expert, leveraging its biomedical
knowledge to assess relationship categorization and biomarker relevance.</p>
      </sec>
      <sec id="sec-10-2">
        <title>B.2. Stage 2: Detailed Assessment Prompt</title>
        <p>
          For validated ALS-relevant statements, we applied a detailed assessment:
Evaluate this relationship for ALS research relevance.
Expert Validation Rubric:
- Level 1 (0.85-1.0): Well-established mechanism or clinically proven ALS relationship
- Level 2 (0.70-0.84): Strong evidence of ALS causative relationship
- Level 3 (0.55-0.69): Clear connection to ALS
- Level 4 (0.40-0.54): Plausible relationship
- Level 5 (0.25-0.39): Suggested or indirect connection
- Level 6 (0.0-0.24): Weak or unclear relationship
Respond in JSON with Final Confidence Score (0-1).
        </p>
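        <p>The rubric bands can be applied mechanically to the confidence score the model returns. The Python sketch below maps a score to its rubric level; treating each band's lower bound as an inclusive threshold, so that values such as 0.845 that fall between the printed bands go to the lower level, is our assumption, as are the names rubric_level and RUBRIC_LOWER_BOUNDS.</p>

```python
# Lower bound of each rubric level in the Stage 2 prompt, strongest level first.
RUBRIC_LOWER_BOUNDS = [(0.85, 1), (0.70, 2), (0.55, 3), (0.40, 4), (0.25, 5), (0.0, 6)]


def rubric_level(score: float) -> int:
    """Return the rubric level (1 = strongest, 6 = weakest) for a confidence score."""
    if score > 1.0 or 0.0 > score:
        raise ValueError("confidence score must lie in [0, 1]")
    for lower_bound, level in RUBRIC_LOWER_BOUNDS:
        if score >= lower_bound:
            return level
    return 6  # not reached: the last lower bound is 0.0


print(rubric_level(0.92), rubric_level(0.845), rubric_level(0.30))  # prints "1 2 5"
```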
      </sec>
      <sec id="sec-10-3">
        <title>B.3. Validation Results Across TCVS Ranges</title>
        <p>The multi-model fusion approach stratified relationships into three bands: TCVS &lt; 0.5 effectively filtered
non-biomedical procedural statements with 98.2% accuracy; the intermediate range (0.5–0.75) captured relationships
requiring human expert review; and TCVS ≥ 0.75 identified high-confidence valid relationships with 100%
accuracy.</p>
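        <p>The stratification above amounts to a simple decision rule on the fused score. The Python sketch below assumes TCVS is a weighted sum of three component scores in [0, 1] (expert validation, domain similarity, and entailment, as listed in Appendix A) with weights summing to one; the linear form, the equal default weights, and the names tcvs and triage are illustrative assumptions, while the 0.5 and 0.75 cut-offs are the ones reported here.</p>

```python
def tcvs(expert: float, domain: float, entailment: float,
         weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Fuse the three validation signals with a weighted sum (assumed linear form)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_expert, w_domain, w_entail = weights
    return w_expert * expert + w_domain * domain + w_entail * entailment


def triage(score: float) -> str:
    """Assign a TCVS value to one of the three bands reported in Appendix B.3."""
    if score >= 0.75:
        return "accept"         # high-confidence valid relationship
    if score >= 0.5:
        return "expert_review"  # intermediate band routed to a human expert
    return "reject"             # mostly non-biomedical procedural statements


score = tcvs(expert=0.9, domain=0.8, entailment=0.7)
print(round(score, 2), triage(score))  # prints "0.8 accept"
```

        <p>Keeping the intermediate band as a separate outcome, rather than forcing a binary decision, is what lets the pipeline route only ambiguous cases to human review.</p>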
      </sec>
      <sec id="sec-10-4">
        <title>B.4. Representative Examples</title>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>C. Knowledge Graph Structure Details</title>
      <sec id="sec-11-1">
        <title>C.1. Node Attributes Data Structure</title>
        <p>Each node in the Causal Knowledge Graph contains the following attributes, enabling rich semantic queries and
provenance tracking:</p>
      </sec>
      <sec id="sec-11-2">
        <title>C.2. Edge Attributes Data Structure</title>
        <p>Edges store comprehensive relationship information:
from typing import List

edge_attributes = {
    'statement': str,                  # Original author statement
    'rel_cause': str,                  # Extracted cause phrase
    'rel_effect': str,                 # Extracted effect phrase
    'validation_status': str,          # Always 'valid' (only validated edges are stored)
    'edge_confidence': float,          # Hybrid matching score
    'is_biomarker_relationship': bool,
    'llm_validation': dict,            # TCVS components
    'repetition_count': int,
    'match_scores': dict,              # Detailed lexical/semantic scores
    'all_paper_ids': List[str],
}</p>
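        <p>The repetition_count and all_paper_ids fields suggest how a relationship reported in several papers can be folded into a single edge. The Python sketch below shows one such merge for a subset of the fields; it assumes (our assumption, not stated above) that a repeated cause-effect pair increments the counter, records each new paper ID once, and keeps the higher edge_confidence. The helper merge_edge and the example values are hypothetical.</p>

```python
def merge_edge(edges: dict, cause: str, effect: str, statement: str,
               confidence: float, paper_id: str) -> dict:
    """Insert or update the edge (cause, effect) in an edge-attribute store."""
    key = (cause, effect)
    if key not in edges:
        edges[key] = {
            "statement": statement,
            "rel_cause": cause,
            "rel_effect": effect,
            "validation_status": "valid",
            "edge_confidence": confidence,
            "repetition_count": 1,
            "all_paper_ids": [paper_id],
        }
        return edges[key]
    edge = edges[key]
    edge["repetition_count"] += 1
    edge["edge_confidence"] = max(edge["edge_confidence"], confidence)
    if paper_id not in edge["all_paper_ids"]:
        edge["all_paper_ids"].append(paper_id)
    return edge


store = {}
merge_edge(store, "SOD1 mutation", "protein misfolding", "statement A", 0.80, "paper-1")
edge = merge_edge(store, "SOD1 mutation", "protein misfolding", "statement B", 0.90, "paper-2")
print(edge["repetition_count"], edge["all_paper_ids"])  # prints "2 ['paper-1', 'paper-2']"
```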
      </sec>
      <sec id="sec-11-3">
        <title>C.3. Complete Community Hierarchy</title>
        <p>Table C.1 presents the complete hierarchy of 15 communities identified by Louvain clustering. The community
structure reveals multi-scale organization: large communities represent major pathways (e.g., Markers, ALS
core mechanisms) while smaller communities capture specific mechanisms (e.g., α-synuclein PTMs, Microglia
imaging).</p>
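        <p>The hierarchical organization validates biological relevance: Community 1 (ALS, 213 nodes) centers on core
disease mechanisms including TDP-43 and neurodegeneration; Community 5 (C9orf72, 121 nodes) and Community
7 (familial ALS, 89 nodes) represent genetic factors; Community 0 (Markers, 279 nodes) captures biomarker
terminology essential for diagnosis and monitoring.</p>
        <table-wrap>
          <label>Table C.1</label>
          <caption>
            <p>Complete list of 15 communities identified by Louvain clustering.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Rank</th>
                <th>Community</th>
                <th>Size (nodes)</th>
              </tr>
            </thead>
            <tbody>
              <tr><td>0</td><td>Markers</td><td>279</td></tr>
              <tr><td>1</td><td>ALS</td><td>213</td></tr>
              <tr><td>2</td><td>progression</td><td>143</td></tr>
              <tr><td>3</td><td>APOE</td><td>139</td></tr>
              <tr><td>4</td><td>patients</td><td>131</td></tr>
              <tr><td>5</td><td>C9orf72</td><td>121</td></tr>
              <tr><td>6</td><td>increased</td><td>100</td></tr>
              <tr><td>7</td><td>familial ALS</td><td>89</td></tr>
              <tr><td>8</td><td>samples</td><td>86</td></tr>
              <tr><td>9</td><td>Proteins</td><td>77</td></tr>
              <tr><td>10</td><td>CSF</td><td>65</td></tr>
              <tr><td>11</td><td>disease</td><td>48</td></tr>
              <tr><td>12</td><td>α-synuclein</td><td>37</td></tr>
              <tr><td>13</td><td>Microglia</td><td>23</td></tr>
              <tr><td>14</td><td>validation analyses</td><td>6</td></tr>
            </tbody>
          </table>
        </table-wrap>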
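        <p>Table C.1 ranks communities by size and names each by a representative term. The Python sketch below shows one way to produce that layout from a Louvain partition, for example the list of node sets returned by networkx's louvain_communities function. Labeling each community by its highest-degree member is our assumption, since the text does not say how labels were chosen; rank_communities and louvain_rows are hypothetical helpers, and networkx is imported only inside louvain_rows.</p>

```python
def rank_communities(communities, degree):
    """Sort communities by size (largest first) and label each by its top-degree node.

    communities: iterable of sets of node names (e.g. a Louvain partition).
    degree: dict mapping node name to (weighted) degree.
    Returns (rank, label, size) rows in the layout of Table C.1.
    """
    ordered = sorted(communities, key=len, reverse=True)
    rows = []
    for rank, members in enumerate(ordered):
        # Iterating in sorted order breaks degree ties alphabetically.
        label = max(sorted(members), key=lambda node: degree.get(node, 0))
        rows.append((rank, label, len(members)))
    return rows


def louvain_rows(graph, seed=42):
    """Run Louvain on an undirected view of a networkx graph, then rank the result."""
    import networkx as nx  # deferred so rank_communities works without networkx

    undirected = graph.to_undirected() if graph.is_directed() else graph
    parts = nx.community.louvain_communities(undirected, weight="weight", seed=seed)
    return rank_communities(parts, dict(undirected.degree(weight="weight")))
```

        <p>Fixing the seed makes the partition reproducible across runs, which matters when community IDs are referenced elsewhere, as they are in Appendix E.</p>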
      </sec>
    </sec>
    <sec id="sec-12">
      <title>D. Counterfactual Analysis: SOD1 Mutation</title>
      <sec id="sec-12-1">
        <title>D.1. Methodology</title>
        <p>Counterfactual analysis enables hypothesis generation by simulating interventions on specific nodes and
predicting downstream effects through the causal graph. For the SOD1 mutation intervention, we:
1. Identified SOD1 mutation as the intervention node $v_0$
2. Computed all nodes reachable from $v_0$ via directed paths
3. Calculated the strength of each path $p = (v_0, v_1, \dots, v_k)$ with exponential decay: $s(p) = \prod_{i=0}^{k-1} w_{\mathrm{norm}}(v_i, v_{i+1}) \times \exp(-0.1 \cdot k)$
4. Aggregated across all paths: $\mathrm{total}(v_0 \to v_j) = \sum_{p:\, v_0 \to v_j} s(p)$
5. Validated predictions using community co-clustering</p>
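        <p>The steps above can be sketched in Python as a depth-first enumeration of simple directed paths from the intervention node: each path contributes the product of its normalized edge weights times exp(−0.1·k) for a path of k edges, and contributions are summed per reachable node. We assume weights are already normalized to [0, 1] and cap path length with a hypothetical max_len parameter, because the number of simple paths grows quickly with graph size; the function name counterfactual_impact is also illustrative. The per-node path count is one way to obtain path-diversity figures like those reported in D.2.</p>

```python
import math
from collections import defaultdict


def counterfactual_impact(graph, source, decay=0.1, max_len=4):
    """Aggregate decayed path strengths from an intervention node.

    graph: dict mapping node -> {successor: normalized weight in [0, 1]}.
    Returns {target: (total_strength, number_of_paths)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)

    def walk(node, weight_product, length, on_path):
        for nxt, weight in graph.get(node, {}).items():
            if nxt in on_path:
                continue  # simple paths only: skip cycles
            product = weight_product * weight
            k = length + 1
            totals[nxt] += product * math.exp(-decay * k)  # s(p) for this path
            counts[nxt] += 1
            if max_len > k:
                walk(nxt, product, k, on_path | {nxt})

    walk(source, 1.0, 0, {source})
    return {node: (totals[node], counts[node]) for node in totals}


toy = {"SOD1 mutation": {"SOD1 protein": 1.0, "familial ALS": 0.2},
       "SOD1 protein": {"familial ALS": 0.5}}
print(counterfactual_impact(toy, "SOD1 mutation")["familial ALS"][1])  # prints "2"
```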
      </sec>
      <sec id="sec-12-2">
        <title>D.2. Results</title>
        <p>Table D.1 shows the top 15 predicted downstream biomarker responses ranked by combined score (weighted
by impact, uncertainty, and cluster proximity). Path diversity (83–271 pathways across different predictions)
indicates robust multi-mechanism effects, while low uncertainty (±0.02–0.05) reflects convergent evidence across
multiple literature sources.</p>
        <p>Table D.1: Top 15 predicted biomarker responses to SOD1 mutation intervention (columns: Impact, Uncertainty, Cluster, Combined).</p>
      </sec>
      <sec id="sec-12-3">
        <title>D.3. Biological Validation</title>
        <p>The predictions align with established ALS literature: SOD1 mutations account for approximately 20% of familial ALS cases
and represent the most studied genetic cause [30]. Our framework correctly identified:
• Direct protein effects: SOD1 mutation → SOD1 protein (0.075 impact, 83 paths) reflects the primary
molecular consequence
• Disease subtype association: Strong link to familial ALS (0.068 impact, 0.600 cluster score) validates
known genetic epidemiology
• Protein homeostasis disruption: Predicted impacts on protein abundance (0.066) and glycosylation
(0.068) align with toxic gain-of-function mechanisms involving protein misfolding and aggregation [32, 33]
• Anatomical specificity: Anterior horn motor neuron involvement (0.070 impact) matches the stereotyped
clinical phenotype of SOD1-ALS</p>
        <p>High cluster scores (0.41–0.69) for SOD1-related terms confirm their placement within the same genetic/protein
homeostasis module (Community 7: familial ALS), demonstrating both validation of the cluster structure and
proof of feasibility for using the CKG for predictive analysis.</p>
        <p>Biomarkers listed in Table D.1, in table order: SOD1, APOC1 CSF level, SOD1 protein, TNR in ALS models,
familial ALS (fALS), human ALS, mutations, mutation, gene mutations, genes, glycosylation, Protein abundance,
anterior horn, TTR.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>E. Agentic Architecture Details</title>
      <sec id="sec-13-1">
        <title>E.1. Multi-Agent Curator System Design</title>
      </sec>
      <sec id="sec-13-2">
        <title>E.2. Agent Specialization and Responsibilities</title>
        <p>Pathogenic Curator Agent:
• Monitors genetic and molecular mechanism literature
• Maintains Communities 5 (C9orf72), 7 (familial ALS), 6 (increased/genetic)
• Specialized gold list: pathogenic terms (genes, proteins, mechanisms)
• TCVS weights (components 1, 2, 3): 0.25, 0.25, 0.50 (higher expert weight for novel mechanisms)
Biomarker Curator Agent:
• Tracks diagnostic and prognostic marker studies
• Maintains Communities 0 (Markers), 3 (APOE), 10 (CSF)
• Specialized gold list: biomarker terms (NfL, pNfH, CHIT1, etc.)
• TCVS weights (components 1, 2, 3): 0.30, 0.35, 0.35 (balanced, higher domain similarity)
Therapeutic Curator Agent:
• Extracts drug-target relationships and clinical trial results
• Maintains therapeutic intervention nodes (Riluzole, Edaravone, Tofersen)
• Specialized gold list: therapeutic terms (drugs, compounds, treatments)
• TCVS weights (components 1, 2, 3): 0.20, 0.40, 0.40 (higher entailment for clinical evidence)
Agent Orchestrator:
• Routes complex queries to appropriate specialist agents
• Resolves conflicts when agents propose contradictory updates
• Implements TCVS consensus: accepts updates if ≥2 agents validate with TCVS &gt; 0.75
• Orchestrates cross-domain queries (e.g., “What biomarkers predict therapeutic response?”)</p>
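        <p>The agent-specific weights and the orchestrator's consensus rule can be combined as in the Python sketch below. It assumes that each agent computes TCVS as a weighted sum of the same three component scores, using its own weight vector from the list above with components taken in the listed order, and that an update is accepted when at least two agents score it above 0.75. The data layout and the names AGENT_WEIGHTS, agent_tcvs, and consensus_accepts are hypothetical.</p>

```python
# Per-agent TCVS weights for components 1-3, as listed in Appendix E.2.
AGENT_WEIGHTS = {
    "pathogenic": (0.25, 0.25, 0.50),
    "biomarker": (0.30, 0.35, 0.35),
    "therapeutic": (0.20, 0.40, 0.40),
}


def agent_tcvs(agent: str, components) -> float:
    """Weighted sum of three component scores using the agent's own weights."""
    return sum(w * c for w, c in zip(AGENT_WEIGHTS[agent], components))


def consensus_accepts(components_by_agent: dict, threshold=0.75, quorum=2) -> bool:
    """Accept a proposed update if at least `quorum` agents score it above `threshold`."""
    votes = sum(
        1 for agent, comps in components_by_agent.items()
        if agent_tcvs(agent, comps) > threshold
    )
    return votes >= quorum
```

        <p>Because each agent weights the same evidence differently, an update backed mainly by entailment can pass the therapeutic agent yet fail the pathogenic one; the two-vote quorum keeps a single specialist from changing shared parts of the graph on its own.</p>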
      </sec>
      <sec id="sec-13-3">
        <title>E.3. Graph RAG Query Processing</title>
        <p>For a query like “What biomarkers predict response to SOD1-targeted therapies?”, the coordinator agent:
1. Decomposes query into subqueries:
• Q1: “Identify SOD1-targeted therapies” → Therapeutic Curator
• Q2: “Find biomarkers associated with SOD1 pathways” → Biomarker Curator
• Q3: “Retrieve SOD1 mechanism subgraph” → Pathogenic Curator
2. Each agent retrieves relevant subgraphs:
• Therapeutic: Tofersen node + edges to SOD1 targets
• Biomarker: NfL, pNfH nodes in APOE community with edges to SOD1
• Pathogenic: SOD1 mutation → protein misfolding → motor neuron death paths
3. Coordinator merges subgraphs, identifying overlapping nodes (e.g., SOD1 protein)
4. Ranks evidence chains by aggregated TCVS scores and path strengths
5. Generates natural language response with provenance (source papers, confidence scores)
This Graph RAG approach provides richer context than traditional text-chunk retrieval, enabling multi-hop
reasoning over structured biomedical knowledge.</p>
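        <p>Steps 3 and 4 can be sketched in Python as follows. We assume (our assumption) that each agent returns evidence chains as lists of (source, target, tcvs) edges and that a chain's rank is the product of its edge TCVS scores; the text does not specify the aggregation, so a mean or minimum would be equally reasonable. A product penalizes long chains, in the same spirit as the path decay in Appendix D. The function name merge_and_rank and the toy chains are hypothetical.</p>

```python
import math


def merge_and_rank(chains_by_agent: dict):
    """Merge agent subgraphs, find shared nodes, and rank evidence chains.

    chains_by_agent: {agent_name: [chain, ...]}, where each chain is a list of
    (source, target, tcvs) edges. Returns (shared_nodes, ranked_chains).
    """
    nodes_by_agent = {}
    all_chains = []
    for agent, chains in chains_by_agent.items():
        nodes = set()
        for chain in chains:
            for source, target, _ in chain:
                nodes.update((source, target))
            all_chains.append((agent, chain))
        nodes_by_agent[agent] = nodes

    # A node is shared if at least two agents retrieved it (e.g. SOD1 protein).
    shared = {
        node
        for node in set().union(*nodes_by_agent.values())
        if sum(node in nodes for nodes in nodes_by_agent.values()) >= 2
    }

    def chain_score(item):
        _, chain = item
        return math.prod(score for _, _, score in chain)

    ranked = sorted(all_chains, key=chain_score, reverse=True)
    return shared, ranked


# Toy chains with made-up scores, for illustration only.
toy_chains = {
    "therapeutic": [[("Tofersen", "SOD1 protein", 0.9)]],
    "pathogenic": [[("SOD1 mutation", "SOD1 protein", 0.95),
                    ("SOD1 protein", "protein misfolding", 0.8)]],
    "biomarker": [[("NfL", "motor neuron death", 0.7)]],
}
shared_nodes, ranked_chains = merge_and_rank(toy_chains)
print(shared_nodes)  # prints "{'SOD1 protein'}"
```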
      </sec>
      <sec id="sec-13-4">
        <title>E.4. Continuous Learning Protocol</title>
        <p>As new papers emerge, curator agents autonomously:
1. Monitor domain-specific literature feeds (PubMed alerts, preprint servers)
2. Extract relationships using the three-tier TCVS pipeline
3. Propose graph updates (new nodes, edges, or edge weight modifications)
4. Submit proposals to coordinator for consensus validation
5. Update local gold lists with high-confidence novel terms (TCVS &gt; 0.90, validated by ≥3 papers)
This enables the CKG to evolve continuously while maintaining quality through multi-agent consensus,
addressing the gold list coverage limitation identified in Section 5.4.</p>
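        <p>The promotion rule in step 5 is simple enough to state as code. The Python sketch below filters candidate terms by the two thresholds given above, a TCVS above 0.90 and support from at least three distinct papers; the candidate record layout and the function name promote_terms are hypothetical.</p>

```python
def promote_terms(candidates, gold_list, min_tcvs=0.90, min_papers=3):
    """Return a new gold list extended with qualifying novel terms.

    candidates: iterable of dicts with keys "term", "tcvs", and "paper_ids".
    gold_list: existing collection of curated terms (left unmodified).
    """
    promoted = set(gold_list)
    for cand in candidates:
        novel = cand["term"] not in promoted
        supported = len(set(cand["paper_ids"])) >= min_papers  # distinct papers only
        if novel and cand["tcvs"] > min_tcvs and supported:
            promoted.add(cand["term"])
    return promoted
```

        <p>Counting distinct paper IDs, rather than mentions, keeps a single paper that repeats a term from satisfying the three-paper requirement on its own.</p>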
      </sec>
    </sec>
    <sec id="sec-14">
      <title>F. Extended Biological Insights</title>
      <sec id="sec-14-1">
        <title>F.1. Community Structure and ALS Pathophysiology</title>
        <p>The 15 identified communities provide a data-driven organizational structure that recapitulates established ALS
research domains while revealing novel connections (refer to Table C.1 for the list of communities):</p>
        <p>Genetic Modules (Communities 5, 7): The C9orf72 community (121 nodes) and familial ALS community (89
nodes) capture the genetic architecture of ALS. C9orf72 hexanucleotide repeat expansions account for approximately 40% of
familial ALS and approximately 8% of sporadic cases, making it the most common genetic cause [1]. The strong clustering of
C9orf72-related terms (upregulation, CSF markers, dipeptide repeat proteins) validates the biological coherence
of our automated extraction.</p>
        <p>Biomarker Ecosystem (Communities 0, 3, 10): The Markers community (279 nodes) represents the largest
functional module, reflecting the intensive focus on biomarker discovery in recent ALS research. The APOE
community (139 nodes) captures lipid metabolism and neuroinflammatory markers, while the CSF community
(65 nodes) focuses on fluid-based diagnostics. The dense connectivity between these communities (1,247
intercommunity edges) suggests that effective ALS biomarker panels may require multi-modal integration across
genetic, inflammatory, and neurodegeneration markers.</p>
        <p>Disease Progression Pathways (Community 2): The progression community (143 nodes) contains terms
related to disease advancement, clinical milestones, and therapeutic monitoring. The presence of BIIB078 (an
antisense oligonucleotide targeting C9orf72) as a central node demonstrates the framework’s ability to capture
emerging therapeutic strategies and their relationship to disease progression endpoints.</p>
        <p>Our counterfactual analysis on SOD1 mutation (Section 4.3, Appendix D) generated several testable hypotheses:</p>
        <p>Hypothesis 1: SOD1 mutations modulate glycosylation patterns. The predicted impact on glycosylation
(combined score 0.241, cluster score 0.414) suggests that SOD1 misfolding may disrupt post-translational
modification pathways. This could be tested by comparing glycoproteomic profiles in SOD1-ALS patient CSF versus
controls.</p>
        <p>Hypothesis 2: APOC1 CSF levels serve as SOD1-ALS biomarkers. The strong predicted association
(impact 0.500, though with high uncertainty 0.500) between SOD1 mutation and APOC1 CSF levels warrants
validation. APOC1 is involved in lipid metabolism and has been implicated in Alzheimer’s disease, suggesting
potential shared mechanisms.</p>
        <p>Hypothesis 3: Anterior horn pathology is preferentially associated with SOD1 mutations. The
predicted impact on anterior horn motor neurons (0.070, cluster score 0.400) aligns with clinical observations that
SOD1-ALS often presents with limb-onset rather than bulbar-onset symptoms. Quantitative MRI studies could
test whether SOD1 mutation carriers show greater anterior horn atrophy compared to other genetic subtypes.</p>
        <p>These hypotheses demonstrate the framework’s utility for generating data-driven research directions that can
accelerate therapeutic development.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Hardiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Chalabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chio</surname>
          </string-name>
          , et al.,
          <article-title>Amyotrophic lateral sclerosis</article-title>
          ,
          <source>Nature Reviews Disease Primers</source>
          <volume>3</volume>
          (
          <year>2017</year>
          )
          <elocation-id>17071</elocation-id>
          . URL: https://doi.org/10.1038/nrdp.2017.71.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mansfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moussy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hermine</surname>
          </string-name>
          ,
          <article-title>ALS clinical trials review: 20 years of failure. Are we any closer to registering a new treatment?</article-title>
          ,
          <source>Frontiers in Aging Neuroscience</source>
          <volume>9</volume>
          (
          <year>2017</year>
          )
          <elocation-id>68</elocation-id>
          . URL: https://doi.org/10.3389/fnagi.2017.00068.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Brown</surname>
            <suffix>Jr.</suffix>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Cleveland</surname>
          </string-name>
          ,
          <article-title>Decoding ALS: from genes to mechanism</article-title>
          ,
          <source>Nature</source>
          <volume>539</volume>
          (
          <year>2016</year>
          )
          <fpage>197</fpage>
          -
          <lpage>206</lpage>
          . URL: https://doi.org/10.1038/nature20413.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mejzini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Flynn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. L.</given-names>
            <surname>Pitout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Wilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Akkari</surname>
          </string-name>
          ,
          <article-title>ALS genetics, mechanisms, and therapeutics: Where are we now?</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <elocation-id>1310</elocation-id>
          . URL: https://doi.org/10.3389/fnins.2019.01310.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Verde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Steinacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Weishaupt</surname>
          </string-name>
          , et al.,
          <article-title>Neurofilament light chain in serum for the diagnosis of amyotrophic lateral sclerosis</article-title>
          ,
          <source>Journal of Neurology, Neurosurgery &amp; Psychiatry</source>
          <volume>90</volume>
          (
          <year>2019</year>
          )
          <fpage>157</fpage>
          -
          <lpage>164</lpage>
          . URL: https://doi.org/10.1136/jnnp-2018-318704.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bampton</surname>
          </string-name>
          , et al.,
          <article-title>CSF chitinase proteins in amyotrophic lateral sclerosis</article-title>
          ,
          <source>Journal of Neurology, Neurosurgery &amp; Psychiatry</source>
          <volume>90</volume>
          (
          <year>2019</year>
          )
          <fpage>1215</fpage>
          -
          <lpage>1220</lpage>
          . URL: https://doi.org/10.1136/jnnp-2019-320442.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Fundel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Küffner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zimmer</surname>
          </string-name>
          ,
          <article-title>RelEx-Relation extraction using dependency parse trees</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>23</volume>
          (
          <year>2007</year>
          )
          <fpage>365</fpage>
          -
          <lpage>371</lpage>
          . URL: https://doi.org/10.1093/bioinformatics/btl616.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <article-title>Drug-drug interaction extraction via hierarchical RNNs on sequence and shortest dependency paths</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>34</volume>
          (
          <year>2018</year>
          )
          <fpage>828</fpage>
          -
          <lpage>835</lpage>
          . URL: https://doi.org/10.1093/bioinformatics/btx659.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets</article-title>
          , in:
          <source>Proceedings of the 18th BioNLP Workshop and Shared Task</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>65</lpage>
          . URL: https://doi.org/10.18653/v1/W19-5006.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pawar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>More</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Palshikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Varma</surname>
          </string-name>
          ,
          <article-title>Knowledge-based extraction of cause-effect relations from biomedical text</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Groppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Bhargava</surname>
          </string-name>
          (Eds.),
          <source>Semantic Intelligence</source>
          , Springer Nature Singapore, Singapore,
          <year>2023</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          . URL: https://doi.org/10.1093/bioinformatics/btz682.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cheng</surname>
          </string-name>
          , et al.,
          <article-title>Domain-specific language model pretraining for biomedical natural language processing</article-title>
          ,
          <source>ACM Transactions on Computing for Healthcare</source>
          <volume>3</volume>
          (
          <year>2021</year>
          )
          Article 2,
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . URL: https://doi.org/10.1145/3458754.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>LinkBERT: Pretraining language models with document links</article-title>
          , in:
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>8003</fpage>
          -
          <lpage>8016</lpage>
          . URL: https://doi.org/10.18653/v1/2022.acl-long.551.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>