<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Construction of a Patent Term Thesaurus with Fine-Tuned ChatGPT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hidetsugu Nanba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kohei Iwakuma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Satoshi Fukuda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chuo University</institution>
          ,
          <addr-line>1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <fpage>84</fpage>
      <lpage>91</lpage>
      <abstract>
<p>Technical terms in patent documents are expressed with highly variable, domain-specific language, hindering cross-lingual prior-art search and knowledge discovery. Existing automatic thesaurus construction pipelines either rely on handcrafted patterns, which suffer from low recall, or on graph-augmented representation learning, which is accurate but complex and largely monolingual. We present a lightweight three-stage framework that: (1) filters candidate term pairs with off-the-shelf embeddings, (2) assigns fine-grained semantic relations via a ChatGPT-4o model fine-tuned on 36k English patent pairs, and (3) enforces cross-lingual consistency through fixed-expression hypernym seeds automatically aligned between Japanese and English. The final output is written directly into an incrementally updateable multilingual thesaurus graph. On the Google Patent Phrase Similarity Dataset, our fine-tuned LLM attains 0.762 Pearson / 0.738 Spearman, outperforming strong baselines (SBERT, Patent-BERT) and the recent graph-based RA-Sim model by up to 0.14 correlation points.</p>
      </abstract>
      <kwd-group>
        <kwd>Large Language Models</kwd>
        <kwd>Term Relation Extraction</kwd>
        <kwd>Thesaurus Construction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This paper makes the following contributions:
• It presents a workflow that organises the relations predicted by the LLM into a graph whose nodes are terms and edges are relations, and then expands this graph recursively to build an automatically updatable multilingual patent thesaurus.
• It introduces an evaluation procedure that combines pattern-based hypernym candidates extracted independently from Japanese and English patents with their translation alignments, enabling the community to verify whether the LLM’s predictions remain consistent across languages.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Automatic prediction of semantic relations—synonymy, hypernymy, meronymy, and the like—between
technical terms has long underpinned knowledge acquisition and high-recall retrieval. Historical
approaches fall into three broad families: symbolic pattern rules, distributional or embedding methods,
and, most recently, large language models (LLMs). Below we survey their evolution in chronological
order, emphasising patent-specific work and highlighting how our study differs.</p>
      <p>
        Early research relied on explicit lexico-syntactic patterns. Hearst’s seminal paper introduced templates
such as “X is a kind of Y ” to harvest thousands of hypernym–hyponym pairs at negligible cost [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The
same idea was later applied to Japanese patent corpora: Nanba et al. mined the pattern “A nado no B” (B
such as A), aligned the resulting pairs with English equivalents, and built a bilingual thesaurus with 78%
F1 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Building on this, their subsequent study translated scholarly terms into patent terminology by
combining citation analysis with an automatically constructed thesaurus, significantly broadening the
candidate space [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Their scope, however, is restricted to hypernym–hyponym relations only, whereas
the present study predicts a full spectrum of relations—including synonymy, meronymy, and graded
similarity—across languages. Symbolic methods moreover demand handcrafted patterns for every
language and domain; even in English, Roller et al. revisited Hearst rules with modern corpora to
boost accuracy, yet still faced recall limits when wording drifted from canonical templates [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Patents
exacerbate this problem: identical concepts are phrased idiosyncratically (“soccer ball” vs. “spherical
recreational device”), so surface patterns alone capture only a fraction of true relations. A broader survey
of how rule-based and other NLP techniques transfer—or fail to transfer—between patent sub-genres
is given by Andersson et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Complementary work by Judea et al. shows that figure references
themselves can be harvested as symbolic cues, yielding fully unsupervised, high-quality training data
for patent terminology extraction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Distributional approaches learn continuous vectors from large corpora. Word2Vec [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and GloVe [<xref ref-type="bibr" rid="ref10">10</xref>]
established that words with similar contexts occupy nearby positions in an embedding space. Jana et al.
projected a distributional thesaurus into such a space and achieved strong co-hyponym detection by
clustering context-similar terms [<xref ref-type="bibr" rid="ref11">11</xref>]. However, plain similarity cannot distinguish relation type (synonymy
versus hypernymy). Subsequent work trained classifiers or added constraints; Liu et al. prompted
BERT with masked templates (“X is a type of __”) to recover hypernyms more robustly [<xref ref-type="bibr" rid="ref12">12</xref>].
Contextual models improved further with Transformer pre-training: BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and its Siamese variant
Sentence-BERT (SBERT) [<xref ref-type="bibr" rid="ref13">13</xref>] achieved state-of-the-art semantic similarity. Yet domain adaptation
proved essential—Patent-BERT, trained on claim corpora, vastly outperformed general BERT on patent
relation benchmarks.
      </p>
      <p>The advent of LLMs enabled direct reasoning over relations. Models such as ChatGPT-4 store vast
world knowledge and can generate definitions, synonyms, or hypernyms with minimal prompting.
Recent reports show ChatGPT-4 successfully deriving taxonomic links for multilingual cultural terms,
indicating latent cross-lingual competence unavailable to earlier systems. In the patent realm, Peng
and Yang combined a contextual encoder with a citation-derived phrase graph; their self-supervised
method captured global evidence beyond local context and raised similarity correlation by seven points
[<xref ref-type="bibr" rid="ref14">14</xref>]. Such hybrids improve accuracy but demand heavy pipelines (citation crawling, graph learning)
and remain monolingual.</p>
      <p>
        Cross-domain evaluation has been invigorated by resources tailored to patents. The Google Patent
Phrase Similarity Dataset supplies 50k phrase pairs with graded similarity and relation labels [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ];
Kaggle competitions around it confirmed SBERT-style models as strongest baselines and revealed the
benefit of patent-specific pre-training. Yet most entries handled English only and did not automate
thesaurus induction.
      </p>
      <p>Our study departs from prior art in three ways. First, we retain a lightweight embedding filter but
rely on a minimally fine-tuned ChatGPT-4o to infer relations, avoiding bespoke citation graphs or rule
sets. Second, we enforce cross-lingual consistency via pattern-harvested bilingual seed pairs, allowing
the same model to populate a thesaurus in Japanese and English without extra translation resources.
Third, the LLM’s output is written directly into an incrementally expandable graph, turning relation
inference into immediate thesaurus construction rather than a separate post-processing step. In doing
so, we address the lingering gaps of multilingual coverage, domain knowledge acquisition, and pipeline
complexity that earlier approaches left open.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Method</title>
      <p>Our framework builds a multilingual patent thesaurus through two alternative relation–inference
strategies plus a multilingual verification step: (i) an embedding-based similarity inference, (ii) an
LLM-based explicit-label inference, and (iii) pattern-driven multilingual enrichment. Stages
(i) and (ii) pursue the same objective—predicting the semantic relation of a term pair—but differ in
the signal they exploit: dense vectors vs. generative reasoning. Stage (iii) then enforces cross-lingual
consistency and incrementally expands the thesaurus graph.</p>
      <sec id="sec-3-1">
        <title>3.1. Embedding-Based Similarity Inference</title>
        <p>Given a term pair (s, t), we obtain vectors e_s, e_t ∈ ℝ^d from either OpenAI Embeddings (d = 1536) or multilingual-e5-large (d = 1024). Their cosine similarity,</p>
        <p>sim(s, t) = (e_s · e_t) / (‖e_s‖ ‖e_t‖),</p>
        <p>serves as a proxy score for semantic relatedness. Pairs whose score exceeds a threshold θ (0.35 for OpenAI, 0.30 for e5) are tentatively regarded as related (synonym or taxonomic) and forwarded to the multilingual verification in Stage (iii). This embedding view offers a fast, language-agnostic approximation that requires no fine-tuning.</p>
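        <p>A minimal sketch of this stage is given below, assuming the multilingual-e5-large model accessed through the sentence-transformers library; the checkpoint identifier is the public Hugging Face name and is used here as a plausible stand-in, while the thresholds follow the values stated above.</p>
        <preformat>
# Stage (i) sketch: embedding-based similarity filter.
# Assumes the sentence-transformers library; a production e5 setup would also
# add the model's recommended "query: " prefixes before encoding.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")
THRESHOLD = 0.30  # 0.35 for OpenAI embeddings, 0.30 for e5 (Section 3.1)

def is_related(term_a: str, term_b: str) -> bool:
    """True if the cosine similarity of the two term vectors exceeds the threshold."""
    e_a, e_b = model.encode([term_a, term_b])
    cos = float(np.dot(e_a, e_b) / (np.linalg.norm(e_a) * np.linalg.norm(e_b)))
    return cos > THRESHOLD

print(is_related("photocopier", "reading machine"))
        </preformat>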
      </sec>
      <sec id="sec-3-2">
        <title>3.2. LLM-Based Explicit Relation Inference</title>
        <p>Alternatively, the same pair can be passed to ChatGPT-4o mini, fine-tuned on the Google Patent Phrase
Similarity Dataset. The prompt asks:</p>
        <p>Based on ’reading machine’, what is the relationship of ’photocopier’? Please choose the most
appropriate one from the following:
1: ’Not related.’
2: ’Other high level domain match.’
3: ’Holonym (a whole of).’
4: ’Meronym (a part of).’
5: ’Antonym.’
6: ’Structural match.’
7: ’Hypernym (narrow-broad match).’
8: ’Hyponym (broad-narrow match).’
9: ’Highly related.’
10: ’Very highly related.’
The model chooses a single label from Table 1; we map it to a numerical score in
{1.00, 0.75, 0.50, 0.25, 0.00}. Compared with Stage (i), the LLM returns an explicit relation
type (e.g., Hyponym, Meronym) rather than a scalar similarity.</p>
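        <p>A hedged sketch of this stage follows, using the OpenAI chat completions API. The fine-tune identifier is a hypothetical placeholder, the prompt is abridged, and only three entries of the label-to-score mapping are shown; the full mapping is defined by Table 1.</p>
        <preformat>
# Stage (ii) sketch: explicit relation labelling with a fine-tuned chat model.
from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-4o-mini:example::patent-relations"  # hypothetical fine-tune ID

# Illustrative fragment of the label-to-score mapping; the paper's Table 1
# maps all ten labels onto {1.00, 0.75, 0.50, 0.25, 0.00}.
SCORE = {"Not related.": 0.00, "Highly related.": 0.75, "Very highly related.": 1.00}

def relation(anchor: str, target: str) -> str:
    """Ask the fine-tuned model for the relation label of a term pair."""
    prompt = (
        f"Based on '{anchor}', what is the relationship of '{target}'? "
        "Please choose the most appropriate one from the following:\n"
        "1: 'Not related.' ... 10: 'Very highly related.'"  # list abridged
    )
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

print(relation("reading machine", "photocopier"))
        </preformat>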
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Pattern-Driven Multilingual Enrichment</title>
        <p>1. Seed extraction:
• Japanese: phrases matching “A nado no B”
• English: phrases matching “B such as A”
These patterns produce provisional hyponym (A) / hypernym (B) pairs (a minimal extraction sketch appears at the end of this subsection).
2. Translation alignment: English pairs are machine-translated into Japanese using ChatGPT and intersected with the Japanese set; the intersection yields high-confidence bilingual pairs.
3. Cross-lingual verification: each pair is checked by either Stage (i) or (ii); only pairs whose Japanese and English predictions agree are accepted.
4. Thesaurus graph update: accepted pairs become edges, labelled with the relation type, between term nodes. The graph updates automatically as new pairs arrive.</p>
        <p>By offering two complementary inference routes—fast embedding similarity or explicit LLM labelling—and a verification layer that fuses them across languages, our method achieves multilingual coverage with minimal fine-tuning while avoiding complex citation graphs or handcrafted rules. Experimental details follow in Section 4.</p>
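        <p>The seed-extraction step (Step 1) can be sketched as follows. The single-token regular expressions are a deliberate simplification of our pipeline: a real extractor would chunk noun phrases to capture multiword terms.</p>
        <preformat>
# Step 1 sketch: harvesting provisional (hyponym, hypernym) seed pairs.
import re

# Single-token captures only; multiword terms need noun-phrase chunking.
EN_PATTERN = re.compile(r"(\w[\w-]*)\s+such\s+as\s+(\w[\w-]*)")
JA_PATTERN = re.compile(r"([^\s、。]+)などの([^\s、。]+)")  # "A nado no B"

def en_seeds(sentence):
    """Yield (hyponym, hypernym) pairs from 'B such as A'."""
    for m in EN_PATTERN.finditer(sentence):
        hypernym, hyponym = m.group(1), m.group(2)
        yield hyponym, hypernym

print(list(en_seeds("polymers such as polyethylene are widely used")))
# -> [('polyethylene', 'polymers')]
        </preformat>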
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>Datasets. For the English task we adopt the Google Patent Phrase Similarity Dataset, using 36,473 pairs for training and 9,232 for validation and testing.</p>
        <p>
          Alternatives.
• Embedding models: Word2Vec, GloVe, BERT, SBERT, Patent-BERT (baselines reported by [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]), OpenAI Embeddings (text-embedding-3-large), and multilingual-e5-large.
• Graph + encoder: the phrase-graph embeddings released with RA-Sim (a baseline reported by [<xref ref-type="bibr" rid="ref14">14</xref>]).
• LLMs: ChatGPT-4o and ChatGPT-4o mini in their pretrained form, plus versions fine-tuned on the English training split.
        </p>
        <p>Metrics. For English we report Pearson and Spearman correlation between predicted similarity scores and gold scores.</p>
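        <p>As a small usage sketch, both metrics can be computed with SciPy; the score lists below are illustrative placeholders, not actual system output.</p>
        <preformat>
# Evaluation sketch: Pearson and Spearman correlation (Section 4.1).
from scipy.stats import pearsonr, spearmanr

gold = [1.00, 0.50, 0.25, 0.00, 0.75]  # gold similarity scores (placeholder)
pred = [0.75, 0.50, 0.25, 0.25, 1.00]  # predicted scores (placeholder)

print(f"Pearson:  {pearsonr(gold, pred)[0]:.3f}")
print(f"Spearman: {spearmanr(gold, pred)[0]:.3f}")
        </preformat>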
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>The results are shown in Table 2. The fine-tuned ChatGPT-4o attains the strongest correlation (Pearson
0.762), outperforming the graph-augmented RA-Sim by 0.14 Pearson / 0.09 Spearman.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Discussion</title>
        <p>To verify the effectiveness of fine-tuning, we compared similarity scores before and after adaptation. Table 3 shows that scores improved for 42% of pairs with ChatGPT-4o and 52% with ChatGPT-4o mini, while only 10% deteriorated. The overall distribution shifted toward values closer to the gold standard, indicating that fine-tuning successfully supplements the model's domain knowledge and yields more accurate similarity estimates.</p>
        <p>Because the LLM classifies each pair into ten semantic relations, we can compute precision and recall for every class. Tables 5a and 5b list the fine-tuned ChatGPT-4o and ChatGPT-4o mini results, respectively. Both models excel at Not related, Antonym, and the high-similarity classes, while Holonym, Meronym, and Structural match remain challenging, mainly because of their scarcity in the training data. We therefore target these weaker relations when constructing the multilingual thesaurus with the method proposed in Section 3.3.</p>
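        <p>The per-class figures can be reproduced with a standard classification report; the label lists below are illustrative placeholders for the ten-way gold and predicted relations.</p>
        <preformat>
# Per-class precision/recall sketch (Section 4.3), via scikit-learn.
from sklearn.metrics import classification_report

y_true = ["Not related", "Hyponym", "Antonym", "Meronym", "Hyponym"]
y_pred = ["Not related", "Hyponym", "Antonym", "Hyponym", "Hyponym"]
print(classification_report(y_true, y_pred, zero_division=0))
        </preformat>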
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Automatic Construction of a Multilingual Thesaurus Using</title>
    </sec>
    <sec id="sec-6">
      <title>Cross-Lingual Verification</title>
      <p>We automatically construct a multilingual thesaurus from the full text of Japanese and US patents
published between 1993 and 2023. Our main objective is to extract hypernym-hyponym relationships,
but we also extract other relationships in the process. The procedure is described below.
1. Using the expressions “A nado no B” (Japanese) and “B such as A” (English), we extracted 613,251
Japanese and 518,166 English candidate pairs and kept 42,784 bilingual pairs after translation
alignment using ChatGPT.
2. ChatGPT-4o mini (fine-tuned) predicted relations for both languages; only pairs with matching
labels were retained (21,673 pairs).</p>
      <p>In Step 2 we used the fine-tuned ChatGPT-4o mini: its accuracy is comparable to that of ChatGPT-4o, the best-performing model in Table 2, while processing this volume of data with the larger model would be prohibitively expensive.</p>
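      <p>A compact sketch of Step 2 and the subsequent thesaurus-graph update is shown below: bilingual pairs are kept only when the Japanese and English relation predictions agree, and accepted pairs become labelled edges. The predict_relation callable is a stand-in for the fine-tuned ChatGPT-4o mini classifier, and the graph library choice is an assumption.</p>
      <preformat>
# Sketch of Step 2 (cross-lingual agreement) and the thesaurus-graph update.
import networkx as nx

def build_thesaurus(bilingual_pairs, predict_relation):
    """bilingual_pairs: iterable of ((en_hypo, en_hyper), (ja_hypo, ja_hyper))."""
    g = nx.DiGraph()
    for (en_a, en_b), (ja_a, ja_b) in bilingual_pairs:
        label_en = predict_relation(en_b, en_a)  # e.g. "Hyponym"
        label_ja = predict_relation(ja_b, ja_a)
        if label_en == label_ja:                 # keep only agreeing pairs
            g.add_edge(en_a, en_b, relation=label_en, translation=(ja_a, ja_b))
    return g
      </preformat>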
      <p>Tables 6a and 6b show the distribution of labels obtained by classifying the top and bottom candidates
in English and Japanese from Step 1 using ChatGPT-4o mini (fine-tuned). Additionally, Table 6 shows
the distribution of labels for the results where English and Japanese agree.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>We introduced a three-stage pipeline that combines a lightweight embedding filter, a minimally fine-tuned ChatGPT-4o, and pattern-driven cross-lingual verification to build a continuously expandable multilingual patent thesaurus. Experiments on the Google Patent Phrase Similarity Dataset demonstrated that the proposed LLM surpasses both embedding baselines and the recent graph-augmented RA-Sim model (Pearson 0.762 vs. 0.622). On 42,784 automatically aligned Japanese-English hypernym pairs, the pattern + LLM strategy achieved 97% accuracy.</p>
      <p>The framework requires no citation crawling, no external knowledge base, and no language-specific
rules beyond a handful of fixed expressions, yet delivers state-of-the-art accuracy while remaining fully
incremental. These traits make it attractive for industry settings where frequent thesaurus updates and
multilingual coverage are essential.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <title>The author(s) have not employed any Generative AI tools.</title>
        <p>Conference on Computational Linguistics: Technical Papers, Dublin City University and the
Association for Computational Linguistics, Dublin, Ireland, 2014, pp. 290–300.
[10] J. Pennington, R. Socher, C. Manning, Glove: Global vectors for word representation, in:
Proceedings of EMNLP 2014, 2014, pp. 1532–1543.
[11] A. Jana, N. R. Varimalla, P. Goyal, Using distributional thesaurus embedding for co-hyponymy
detection, in: Proceedings of LREC 2020, 2020, pp. 5766–5771.
[12] C. Liu, T. Cohn, L. Frermann, Seeking clozure: Robust hypernym extraction from bert with
anchored prompts, in: Proceedings of *SEM 2023, 2023, pp. 193–206.
[13] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, in:</p>
        <p>Proceedings of EMNLP–IJCNLP 2019, 2019, pp. 3982–3992.
[14] Z. Peng, Y. Yang, Connecting the dots: Inferring patent phrase similarity with retrieved phrase
graphs, in: Proceedings of NAACL 2024, 2024, pp. 1877–1890.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          ,
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          ,
          <source>in: Proceedings of COLING '92</source>
          ,
          <year>1992</year>
          , pp.
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of NAACL 2019</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Aslanyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Wetherbee</surname>
          </string-name>
          ,
          <article-title>Patents phrase to phrase semantic matching dataset</article-title>
          ,
          <source>arXiv preprint arXiv:2208.01171</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nanba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mayumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Takezawa</surname>
          </string-name>
          ,
          <article-title>Automatic construction of a bilingual thesaurus using citation analysis</article-title>
          ,
          <source>in: Proceedings of the PaIR'11 Workshop</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nanba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kamaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Takezawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Okumura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shinmori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tanigawa</surname>
          </string-name>
          ,
          <article-title>Automatic translation of scholarly terms into patent terms</article-title>
          ,
          <source>in: Proceedings of the 2nd International Workshop on Patent Information Retrieval (PaIR '09)</source>
          , Association for Computing Machinery, Hong Kong, China,
          <year>2009</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Roller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <article-title>Hearst patterns revisited: Automatic hypernym detection from large text corpora</article-title>
          ,
          <source>in: Proceedings of ACL 2018</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>358</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Andersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <article-title>The portability of three types of text mining techniques into the patent text genre</article-title>
          , in: Mihai Lupu, Katja Mayer, John Tait, Anthony J.
          <string-name>
            <surname>Trippe</surname>
          </string-name>
          (Eds.),
          <source>Current Challenges in Patent Information Retrieval</source>
          , volume
          <volume>37</volume>
          <source>of The Information Retrieval Series</source>
          , Springer, Berlin / Heidelberg,
          <year>2017</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>280</lpage>
          . doi:10.1007/978-3-662-53817-3_9.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Judea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schütze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brügmann</surname>
          </string-name>
          ,
          <article-title>Unsupervised training set generation for automatic acquisition of technical terminology in patents</article-title>
          ,
          <source>in: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</source>
          , Dublin City University and the Association for Computational Linguistics, Dublin, Ireland,
          <year>2014</year>
          , pp.
          <fpage>290</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global vectors for word representation</article-title>
          ,
          <source>in: Proceedings of EMNLP 2014</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Varimalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <article-title>Using distributional thesaurus embedding for co-hyponymy detection</article-title>
          ,
          <source>in: Proceedings of LREC 2020</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5766</fpage>
          -
          <lpage>5771</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Frermann</surname>
          </string-name>
          ,
          <article-title>Seeking clozure: Robust hypernym extraction from BERT with anchored prompts</article-title>
          ,
          <source>in: Proceedings of *SEM 2023</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>193</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Proceedings of EMNLP-IJCNLP 2019</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Connecting the dots: Inferring patent phrase similarity with retrieved phrase graphs</article-title>
          ,
          <source>in: Proceedings of NAACL 2024</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1890</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>