<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Revealing the Research Deviation of AI Research Between China and the U.S.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Han Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guo Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Nanjing University of Science and Technology</institution>
          ,
          <addr-line>Nanjing, 210994, Jiangsu</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>75</fpage>
      <lpage>78</lpage>
      <abstract>
        <p>China and the U.S. are recognized as leading forces in artificial intelligence (AI) research. Understanding the research differences between these two nations is crucial for grasping the global AI landscape. This paper moves beyond traditional methods that rely on frequency statistics and topic analysis. By analyzing both co-occurrence and vector semantic fields, we delineate the research focuses and content preferences of China and the U.S. for specific domain entities in AI. This framework enables a thorough examination of the distribution of research efforts within each zone, providing valuable insights into the distinctive research profiles and potential collaboration pathways in AI between these two technological giants.</p>
      </abstract>
      <kwd-group>
        <kwd>Research difference</kwd>
        <kwd>Semantic deviation</kwd>
        <kwd>Domain entity</kwd>
        <kwd>Semantic field</kwd>
        <kwd>Artificial Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial Intelligence (AI) has become a critical catalyst for economic and cultural progress [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Within the sphere of AI, China and the United States are acknowledged leaders in the global arena
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, existing comparative studies of AI development do not examine in depth the semantic
differences between Chinese and U.S. research in this domain. This study therefore adopts a
theoretical framework grounded in semantic deviation and semantic fields to compare AI research
in China and the U.S.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Building on this framework, the analysis proceeds in three steps:</p>
      <p>Data processing and construction of word vectors: The data source is 404,168 journal articles in
the field of AI from the Web of Science (WOS) Core Collection, published from January 1996 to May 2023.
After data processing, problem and method entities are identified from titles and abstracts, and each
entity is represented by a 100-dimensional vector using the Word2Vec model.</p>
      <p>Two-dimensional matrix analysis: To quantify each domain entity, we focus on two measures,
research scale and semantic deviation. The research scale of an entity in a given country is
represented by the entity's document frequency in that country's corpus, and the semantic deviation
of the entity between the two corpora is quantified by a weighted vector distance.</p>
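      <p>As a minimal sketch of the two measures above, the following Python fragment computes a document frequency per corpus and a cosine-based distance between two vectors of the same entity. The function names, the toy corpora, the 3-dimensional vectors, the choice of cosine distance, and the <monospace>weight</monospace> parameter are all illustrative assumptions. The weighting scheme itself is not specified here, and the sketch assumes the two vectors are comparable in a shared space.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def document_frequency(corpus, entity):
    """Count the documents (each a set of entities) that mention the entity."""
    return sum(1 for doc in corpus if entity in doc)


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_deviation(vec_a, vec_b, weight=1.0):
    """Weighted vector distance; the weight is a placeholder assumption."""
    return weight * cosine_distance(vec_a, vec_b)


# Toy corpora (hypothetical): each document is the set of entities
# extracted from its title and abstract.
corpus_cn = [{"facial recognition", "deep learning"}, {"facial recognition"}, {"svm"}]
corpus_us = [{"facial recognition", "svm"}, {"deep learning"}]

# Research scale of one entity in each corpus.
scale_cn = document_frequency(corpus_cn, "facial recognition")
scale_us = document_frequency(corpus_us, "facial recognition")

# Toy 3-dimensional vectors standing in for the 100-dimensional embeddings.
deviation = semantic_deviation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```
]]></preformat>
      <p>Plotting each entity by its research scale on one axis and its semantic deviation on the other would give the kind of two-dimensional view described above. Any threshold separating large from small deviation would be a further assumption.</p>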
      <p>Analysis based on semantic field: Semantic field analysis consists of two distinct parts: the
co-occurrence semantic field and the semantic-distance semantic field. For a selected word,
identifying the 10 words with the smallest vector distance to it and the 10 words with the highest
co-occurrence with it in the corpus provides insights into research scale and content preference.</p>
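      <p>The two semantic fields can be sketched as two top-k lookups, one over vector distance and one over co-occurrence counts. This is an illustrative sketch only: the function names, the toy vectors, and the toy corpus are hypothetical, and cosine distance is assumed as the vector distance.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_field(vectors, target, k=10):
    """Return the k entities whose vectors are closest to the target's vector."""
    others = [w for w in vectors if w != target]
    others.sort(key=lambda w: cosine_distance(vectors[target], vectors[w]))
    return others[:k]


def cooccurrence_field(corpus, target, k=10):
    """Return the k entities that most often share a document with the target."""
    counts = Counter()
    for doc in corpus:
        if target in doc:
            counts.update(e for e in doc if e != target)
    return [w for w, _ in counts.most_common(k)]


# Toy 2-dimensional vectors (hypothetical) in place of 100-dimensional embeddings.
toy_vectors = {
    "facial recognition": [1.0, 0.0],
    "face detection": [0.9, 0.1],
    "image classification": [0.5, 0.5],
    "speech synthesis": [0.0, 1.0],
}

# Toy corpus (hypothetical): each document is a set of extracted entities.
toy_corpus = [
    {"facial recognition", "face detection"},
    {"facial recognition", "face detection", "image classification"},
    {"speech synthesis"},
]

nearest = distance_field(toy_vectors, "facial recognition", k=2)
cooccurring = cooccurrence_field(toy_corpus, "facial recognition", k=2)
```
]]></preformat>
      <p>Running both lookups separately on each country's corpus and comparing the resulting lists is one way to read differences in content preference for the same entity.</p>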
    </sec>
    <sec id="sec-3">
      <title>3. Result analysis</title>
      <sec id="sec-3-1">
        <title>3.1. Overall analysis</title>
        <p>The resulting distribution is detailed as follows.</p>
        <p>The data show 4,020 entities with large semantic differences, accounting for 37.55% of the
total; the remaining entities have small semantic differences. More than one-third of the topics
thus differ substantially in content preference, which is a considerable proportion. The subsequent
analysis examines these differences and the reasons behind them at a finer scale.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Macro level analysis - top 100 entities</title>
        <p>To further explore the differences in research focuses between China and the U.S., we construct
two co-occurrence networks of the top 100 high-frequency entities, one for each corpus, followed by
topic clustering and visualization, as shown in Figure 3 and Figure 4. In each figure, the color of a
node indicates the cluster to which the entity belongs, and the shape of a node represents the
type of semantic difference associated with the entity.</p>
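        <p>A co-occurrence network of this kind can be sketched by selecting the highest-frequency entities and counting, for each pair, the documents in which both appear. The sketch below is illustrative: the function names and the toy corpus are hypothetical, and the topic clustering and visualization steps are not covered.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations


def top_entities(corpus, n=100):
    """Return the n entities with the highest document frequency."""
    df = Counter(e for doc in corpus for e in set(doc))
    return [e for e, _ in df.most_common(n)]


def cooccurrence_edges(corpus, nodes):
    """Count, for each pair of nodes, the documents in which both appear."""
    keep = set(nodes)
    edges = Counter()
    for doc in corpus:
        # Sort so that each unordered pair always maps to the same key.
        present = sorted(keep.intersection(doc))
        edges.update(combinations(present, 2))
    return edges


# Toy corpus (hypothetical): each document is a list of extracted entities.
toy_corpus = [
    ["deep learning", "neural network", "svm"],
    ["deep learning", "neural network"],
    ["deep learning", "clustering"],
]

nodes = top_entities(toy_corpus, n=2)
edges = cooccurrence_edges(toy_corpus, nodes)
```
]]></preformat>
        <p>The resulting weighted edge list could then be passed to a community detection or clustering method of choice to obtain the cluster categories.</p>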
        <p>As shown in the figures, the results align closely with the overall findings shown in Figure 2,
indicating that for nearly 80% of popular research topics in AI, China and the United States have
similar levels of semantic deviation (content preference).</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Micro level analysis - case study</title>
        <p>To further explore the details and underlying causes of semantic deviation at a micro level, we
take the term "facial recognition" as an example.</p>
        <p>Table 1 shows that the differences lie in the areas of research emphasis. From the semantic
neighbors, we see that China’s research on facial recognition leans more towards specific
individual traits. In contrast, U.S. research on facial recognition is more focused on feature
detection and image differentiation. Overall, China’s research is conducted at a finer level of
granularity.</p>
        <p>
          These differences are closely related to privacy concerns. In China, historical practices have
fostered greater acceptance of facial recognition technology, leading to detailed research.
Conversely, U.S. citizens prioritize privacy protection [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Consequently, U.S. research focuses
more on technologies that are less connected to personal identity.
        </p>
        <p>This case study demonstrates that the semantic field constructed by semantic neighbors
uncovers more detailed information.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We proposed an innovative approach that integrates a word embedding model with semantic field
analysis to investigate differences in semantics and research applications across various
entities. This method goes beyond traditional co-occurrence-based semantic field studies.
Nonetheless, the study has certain methodological limitations, which are inherent to the
Word2Vec model used in the analysis. Future research should focus on identifying more precise
methods for representing entity semantics.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This study is supported by the Social Science Foundation of Jiangsu Province (No. 24TQB001).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>(by using the activity taxonomy in ceur-ws.org/genai-tax.html):
During the preparation of this work, the author(s) used GPT-4 in order to: Grammar and spelling
check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed
and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Benko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lányi</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          :
          <article-title>History of artificial intelligence</article-title>
          . In
          <source>Encyclopedia of Information Science and Technology, Second Edition</source>
          (pp.
          <fpage>1759</fpage>
          -
          <lpage>1762</lpage>
          ).
          <publisher-name>IGI Global</publisher-name>
          (
          <year>2009</year>
          ). doi:
          <pub-id pub-id-type="doi">10.4018/978-1-60566-026-4</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Saveliev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhurenkov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Artificial intelligence and social responsibility: the case of the artificial intelligence strategies in the United States, Russia, and China</article-title>
          .
          <source>Kybernetes</source>
          ,
          <volume>50</volume>
          (
          <issue>3</issue>
          ),
          <fpage>656</fpage>
          -
          <lpage>675</lpage>
          (
          <year>2021</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1108/k-01-2020-0060</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Facial recognition: Understanding privacy concerns and attitudes across increasingly diverse deployment scenarios</article-title>
          . In
          <source>Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021)</source>
          (pp.
          <fpage>243</fpage>
          -
          <lpage>262</lpage>
          ) (
          <year>2021</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1037/t25710-000</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>