<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Construction and Analysis of Surrounding Travel Demand Graph Based on Dual Contrastive Learning Text Classification and Graph Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guoping Lai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiheng Chi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fan Pan</string-name>
          <email>panfan2022@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhihao Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Engineering University</institution>
          ,
          <addr-line>Zhengzhou 450001</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>203</fpage>
      <lpage>213</lpage>
      <abstract>
        <p>Understanding the current situation of the tourism market has become an urgent need and a new trend in the development of the tourism market. In this paper, we use natural language processing and data mining methods to analyze the development of surrounding travel in Maoming City, Guangdong Province during the COVID-19 epidemic and build a local tourism graph. Based on conventional models, we refine and design the RoBERTa-BiGRU-Attention fusion model, Dual Contrastive Learning text classification, the BERT-BiLSTM-CRF named entity identification technique, an improved Apriori algorithm, and the GNNLP model, and demonstrate the rationality and efficiency of the improved models through comparative tests. After scientific analysis and summary, we provide targeted suggestions that help government departments promote tourism and help tourism enterprises improve product supply, optimize resource allocation, and continuously explore the market during the epidemic period.</p>
      </abstract>
      <kwd-group>
        <kwd>RoBERTa-BiGRU-Attention fusion model</kwd>
        <kwd>Dual Contrastive Learning</kwd>
        <kwd>BERT-BiLSTM-CRF</kwd>
        <kwd>sentiment analysis</kwd>
        <kwd>the improved Wilson interval method</kwd>
        <kwd>improved Apriori</kwd>
        <kwd>GNNLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Under the regular prevention and control of the COVID-19 epidemic in recent years,
there has been a clear shift in the way that tourists in China consume tourism. Tourists are now
more likely to choose short-distance trips, and local surrounding travel has grown rapidly. Under such changes, an accurate and rapid understanding of the preferences and consumer psychology
of tourists has a long-term, positive effect on improving tourism enterprises' product supply,
optimizing resource allocation, and continuously exploring the market.</p>
      <p>With the promotion of "Internet + Tourism" services and the boom of self-media, the main sources of
information for understanding the current situation of the tourism market are Online Travel Agency and User
Generated Content data, and analyzing tourism text with Natural Language Processing (NLP) technology has
gradually become a trend. Tourism enterprises and tourism administrators need NLP technology
to discover relevant tourism elements in tourism texts and tourism product reviews, while
mining the correlations between elements and the high-level concepts they imply, thereby predicting
and mastering consumer psychology to allocate tourism resources better.</p>
      <p>
        Facing this market demand, Zhang et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] proposed a sentiment classification method
fusing Text-Rank and conducted experiments with deep learning models such as RNN, LSTM,
Text-CNN, and BERT; Cui et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] proposed a directed graph neural network (L-CGNN) model
fused with lexical information for named entity identification in the tourism field to extract tourism
entities; Zhang [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] analyzed tourism text by constructing knowledge graphs.
      </p>
      <p>However, current analyses of tourism text mostly address a single task and do not make full
use of the text data for comprehensive analysis. Therefore, establishing a tourism demand analysis system
based on natural language processing technology has become an urgent need and a new trend in the
development of the tourism market.</p>
      <p>This paper examines the following three questions by analyzing the demand for surrounding travel
under the normalized prevention and control of the COVID-19 epidemic:
1. Identify and classify the huge number of travel-related WeChat articles pushed online.
2. Quantitatively analyze the popularity of numerous tourism products and rank them by popularity.</p>
      <p>3. Construct a local tourism graph to mine and analyze implied relationships among tourism products.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data Processing</title>
      <p>We collected 3385 online travel-related articles by web crawler from travel-related texts on Sohu
News, Tencent News, China Travel Network, etc. The tourism product data were obtained
from data files extracted from major tourism websites.</p>
      <p>
        In 2018 the Google team released BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a pre-training model for natural language processing. It is
trained on a large-scale unlabeled corpus to obtain textual representations rich in meaning, which
pioneered the pre-training paradigm. This paper uses an improved fusion model based on BERT for text
classification. However, the input length of BERT is limited to a maximum of 512 characters [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which
must also include the two flag bits [CLS] and [SEP]; moreover, each character may be
split into several parts by the tokenizer, so the actual input sentence length may be less than 512 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Meanwhile, tourism texts are generally quite long. Directly truncating text that exceeds the
maximum length loses effective information, which is detrimental to the classification task,
so we extract a text summary to solve this problem. This paper tried two approaches to extracting
text summaries: the unsupervised Text Rank algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] based on graph ranking and the BiGRU [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
model with bidirectional reading of the text.
      </p>
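      <p>As a minimal sketch of the graph-ranking idea behind Text Rank (word-overlap similarity plus PageRank-style power iteration; the similarity measure and damping factor are illustrative, not the exact configuration used in this paper):</p>

```python
import math

def textrank_summary(sentences, top_k=1, d=0.85, iters=50):
    """Rank sentences by a Text Rank-style score: build a similarity
    graph over sentences, run power iteration, keep the top_k."""
    tokens = [s.lower().split() for s in sentences]
    n = len(sentences)

    def sim(a, b):
        # Word-overlap similarity, normalized by sentence lengths.
        overlap = len(set(a) & set(b))
        if overlap == 0 or min(len(a), len(b)) == 1:
            return 0.0
        return overlap / (math.log(len(a)) + math.log(len(b)))

    w = [[sim(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(w[j][i] / sum(w[j]) * scores[j]
                                    for j in range(n) if sum(w[j]) > 0)
                  for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]
```

      <p>Sentences that share vocabulary with many other sentences accumulate score, so the extracted summary stays within BERT's input limit while keeping central content.</p>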
      <p>Rouge is a set of metrics evolved from the recall rate. Its main idea is to compare
algorithm-generated summaries with manually written standard summaries, evaluating summary quality
by their overlap in N-grams, word sequences, and word pairs. Rouge
comprises four indicators: Rouge-N, Rouge-L, Rouge-W, and Rouge-S. The comparison of the two approaches under the Rouge
metrics is shown in Table 1 below, which reveals that BiGRU produces better generative
summaries; compared with extractive summaries, generative summaries have the advantage of synthesizing
full-text information and incorporating external knowledge.</p>
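      <p>A simplified sketch of the Rouge-N recall computation behind Table 1 (lowercased whitespace tokenization; a full evaluation would also compute Rouge-L and use proper tokenization):</p>

```python
from collections import Counter

def rouge_n(candidate, reference, n=2):
    """Rouge-N recall: overlapping n-grams between a candidate summary
    and a reference summary, divided by the reference n-gram count."""
    def ngrams(text, size):
        words = text.lower().split()
        return Counter(tuple(words[i:i + size])
                       for i in range(len(words) - size + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not ref:
        return 0.0
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / sum(ref.values())
```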
      <p>(Table 1 content: a sample article, "Winter travel know how much" (1190 words), alongside candidate summaries of roughly 300 words each produced by the two methods.)</p>
      <p>The original comment text contains comments with identical content but different IDs; the duplicate comments are sorted by time and filtered so that only the earliest posted comment is kept. As the travel guide text is unstructured, it is not uniform in structure with the hotel, restaurant, and scenic spot comment data. Accurately extracting tourism products from unstructured travelogue text requires named entity identification, and every sentence of a travel guide may contain entities. If a text summarization algorithm were used to compress the travel guides, a large number of valid entities might be lost. Therefore, each travel guide is divided into sentences.</p>
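      <p>The de-duplication step can be sketched as follows (the record fields 'id', 'text', and 'time' are illustrative; any chronologically sortable timestamp works):</p>

```python
def dedup_comments(comments):
    """Keep only the earliest-posted copy of comments whose text is
    identical but whose IDs differ. `comments` is a list of dicts with
    'id', 'text' and 'time' (ISO-style strings sort chronologically)."""
    earliest = {}
    for c in sorted(comments, key=lambda c: c["time"]):
        earliest.setdefault(c["text"], c)  # first seen is the earliest
    return list(earliest.values())
```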
    </sec>
    <sec id="sec-3">
      <title>3. Model building and analysis</title>
    </sec>
    <sec id="sec-4">
      <title>3.1.Tourism text classification</title>
    </sec>
    <sec id="sec-5">
      <title>3.1.1.Text classification based on RoBERTa-BiGRU-Attention fusion model</title>
      <p>
        Machine learning methods for classifying text and extracting information from it are
evolving rapidly. The early RNN can learn the context of the text, but it is prone to
the vanishing gradient problem and is thus unsuited to learning long-distance
dependencies; it was improved into the long short-term memory network (LSTM). Because of
the complex structure and increasingly heavy computation of the
LSTM network, the GRU model was proposed, which is simpler than LSTM and has fewer
parameters. To further reduce training time, improve accuracy, and reduce the loss
rate, researchers proposed the BiGRU-Attention model, which reduces the computational effort
and fully extracts the contextual feature information of the text compared with a single hybrid model
of LSTM or GRU [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, in text classification and information
extraction, the complexity of the original sentence keeps the model from performing as well as it could. If
the original sentence is divided into word vectors that are then merged into sentence vectors,
classification and extraction improve greatly. Therefore, this paper incorporates the
RoBERTa [12] model and designs and applies a RoBERTa-BiGRU-Attention fusion model.
      </p>
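      <p>The attention layer's role in the fusion model, pooling BiGRU hidden states into one sentence vector, can be sketched in plain Python (the query vector stands in for the learned attention parameters; dimensions are illustrative and no deep learning framework is used):</p>

```python
import math

def attention_pool(hidden_states, query):
    """Attention over BiGRU hidden states: score each time step against
    a query vector, softmax the scores, return the weighted sum as the
    sentence representation."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = [dot(h, query) for h in hidden_states]
    m = max(scores)                          # stabilize the softmax
    exp = [math.exp(s - m) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states))
            for i in range(dim)]
```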
      <sec id="sec-5-1">
        <title>Input layer</title>
      </sec>
      <sec id="sec-5-2">
        <title>BiGRU-Attention output layer hidden layer</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.1.2.Text classification based on Dual Contrastive Learning</title>
      <p>
        Because deep learning networks are deep and need large amounts of data, while the data set
used in this paper is limited, the best results may be hard to achieve; this paper therefore introduces dual
contrastive learning. Dual Contrastive Learning (DualCL) is a new learning framework. In unsupervised
tasks, contrastive learning has proved effective at learning representations for downstream tasks and achieves
good results [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The contrastive learning approach can also be applied to supervised learning, but
supervised contrastive learning has lacked principled application and reduces representation
validity compared with traditional supervised representation learning, which requires developing another
classification algorithm to solve the classification task.
      </p>
      <p>(Figure: the DualCL architecture. A shared BERT encoder maps the target, relevant, and irrelevant samples to the feature representation e<sub>CLS</sub> and the classifier representations e<sub>relevant</sub> and e<sub>irrelevant</sub>; representations of the same class attract each other while those of different classes repel.)</p>
      <p>Using the RoBERTa model as the encoder, each token feature of the sequence is obtained; the
label text is spliced to the input text with [SEP], and the original position vectors, text vectors, and
word vectors are fused in the model, where e<sub>relevant</sub> and e<sub>irrelevant</sub> are the classifier representations and e<sub>CLS</sub> is
the feature representation. After DualCL training, the positive samples keep approaching each other while the
negative samples keep moving apart.</p>
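      <p>The attract/repel objective can be illustrated with a generic supervised contrastive (InfoNCE-style) loss, in which same-class embeddings attract and different-class embeddings repel; this is a simplified stand-in, not the exact DualCL loss:</p>

```python
import math

def contrastive_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss: for each anchor, pull same-class
    embeddings together and push other classes away (InfoNCE form)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    n, total, terms = len(embeddings), 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = sum(math.exp(cos(embeddings[i], embeddings[j]) / tau)
                    for j in range(n) if j != i)
        for j in pos:
            total -= math.log(
                math.exp(cos(embeddings[i], embeddings[j]) / tau) / denom)
            terms += 1
    return total / terms
```

      <p>When embeddings of the two classes are well separated, the loss is near zero; mixing the classes drives it up, which is the behavior the training objective exploits.</p>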
    </sec>
    <sec id="sec-7">
      <title>3.2.Tourism Product Heat Ranking</title>
      <p>Since the travel guide text is unstructured, the valid entities must first be extracted from it.
Sentiment analysis is then performed on the sentence in the travel guide where each entity
occurs, and the heat of tourism products is evaluated and ranked for each year based on the analysis results.</p>
    </sec>
    <sec id="sec-8">
      <title>3.2.1.BERT-BiLSTM-CRF named entity identification</title>
      <p>
        This paper takes a deep learning approach. The BERT-BiLSTM-CRF model is an end-to-end
deep learning model developed from the BiLSTM-CRF model that needs no manual feature engineering
and can fulfill the current needs of Chinese address parsing and address element annotation tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
From the bottom up, the model consists of an encoder, a BiLSTM neural network layer, and a
conditional random field (CRF) layer. The encoder is a character-level Chinese BERT model,
which maps the input Chinese characters into a low-dimensional dense real-valued space and
mines the latent semantics embedded in each type of address element; the
BiLSTM layer takes the character vectors produced by the encoder as input and
captures the forward (left-to-right) and backward (right-to-left) bidirectional features of the
sequence; the CRF layer takes the bidirectional features extracted by
the upstream BiLSTM as input and, combined with the BIOES labeling paradigm, generates the label
corresponding to each character, so that the text can be further parsed into its
elements according to the labels.
This paper also uses the adversarial training approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], as shown in Figure 3. During training,
BERT first generates initial vectors from the input text; perturbations are then added to generate
adversarial samples as variants of the original samples, which easily mislead the model. The
initial vectors and the adversarial samples are fed together into the BiLSTM for training, during which
the neural network learns more robust parameters to resist adversarial attacks.
      </p>
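      <p>The CRF layer's decoding step can be sketched as Viterbi search over BIOES tags (the emission and transition scores below are illustrative log-scores, not the trained model's parameters):</p>

```python
def viterbi_decode(emissions, transitions, tags):
    """Viterbi decoding for a linear-chain CRF: pick the tag sequence
    maximizing emission + transition scores. `emissions[t][tag]` and
    `transitions[(prev, tag)]` are log-scores; missing transitions are
    treated as forbidden."""
    # best[t][tag] = (score of best path ending in `tag`, previous tag)
    best = [{tag: (emissions[0].get(tag, -1e9), None) for tag in tags}]
    for t in range(1, len(emissions)):
        row = {}
        for tag in tags:
            prev_tag, score = max(
                ((p, best[t - 1][p][0]
                  + transitions.get((p, tag), -1e9)
                  + emissions[t].get(tag, -1e9)) for p in tags),
                key=lambda x: x[1])
            row[tag] = (score, prev_tag)
        best.append(row)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda g: best[-1][g][0])
    path = [tag]
    for t in range(len(emissions) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return path[::-1]
```

      <p>Forbidding transitions such as B-SCENIC followed by another B tag is what lets the CRF emit well-formed entity spans instead of independent per-character guesses.</p>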
    </sec>
    <sec id="sec-9">
      <title>3.2.2.Multidimensional heat evaluation model based on the improved Wilson interval method</title>
      <p>When processing the evaluation data of the sample, traditional heat analysis algorithms based on
user voting have obvious shortcomings: the Delicious algorithm simply ranks by the number of user
comments per unit of time, ignoring comment sentiment; the Reddit sorting algorithm simply takes the
difference between positive and negative reviews as the degree of affirmation,
regardless of the positive-rating proportion; the traditional Wilson interval sorting algorithm works well on
small samples but does not consider that product heat decays as time goes on. For
this reason, this paper proposes an improved algorithm that incorporates a time factor and uses the lower
bound of the confidence interval in place of the raw favorable rating, by introducing and improving
Wilson confidence interval estimation.</p>
      <p>
        The Wilson score interval correction formula proposed by Wilson [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is:
      </p>
      <p>(p̂ + z²/2n ± z·√(p̂(1 − p̂)/n + z²/4n²)) / (1 + z²/n) (1)</p>
      <p>In the formula, p̂ denotes the proportion of the sample rated as good; n denotes the number of samples;
z denotes the statistic corresponding to a given confidence level and is a constant (for example,
z = 1.96 at the 95% confidence level). The heat score is then calculated from the
lower bound of formula (1).</p>
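      <p>The lower bound of formula (1), used as the pessimistic estimate of the favorable rating, can be computed directly:</p>

```python
import math

def wilson_lower_bound(pos, n, z=1.96):
    """Lower bound of the Wilson score interval: a pessimistic estimate
    of the true positive-review rate. pos = positive reviews, n = total
    reviews, z = statistic for the confidence level (1.96 for 95%)."""
    if n == 0:
        return 0.0
    p = pos / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin) / denom
```

      <p>A product with 9 positive reviews out of 10 ranks below one with 450 out of 500, even though both have a 90% raw rate: the small sample earns a wider interval and hence a lower bound.</p>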
      <p>When n is large enough, formula (2) tends to p̂. Since the score calculated by formula (2) is a
number in (0, 1), products can be ranked by the lower bound of this confidence interval: the
higher the value, the higher the ranking. Also considering users' browsing, commenting, and the time
factor of the information, the calculation formula for the product heat analysis algorithm based
on user comments is defined with a logarithmic time-decay term.</p>
    </sec>
    <sec id="sec-10">
      <title>3.3.Local Tourism Graph Construction</title>
    </sec>
    <sec id="sec-11">
      <title>3.3.1.Association rule mining based on improved Apriori algorithm</title>
      <p>
        The entity set of each travel guide is obtained by named entity identification; there are
redundant identical items between sets, and it is difficult to find the associations between the sets. The
Apriori algorithm can find association items to obtain the relationships between tourism entities. The
improved Apriori algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] needs to traverse the database only once to obtain the association rules
between frequent item sets. The main steps of the improved Apriori algorithm are as follows:
      </p>
      <p>(Flowchart: the improved Apriori process. Scan database D, delete transactions with a single item or outside the interest set, define the minimum support and confidence, generate candidate item sets C1, C2, ... and frequent item sets L1, L2, ... by scanning, self-linking, and pruning, and finally generate strong association rules.)</p>
      <p>Step 1: Delete irrelevant transaction records.</p>
      <p>Let the total number of transaction items be m and the traversal database be D. When  (x=1, 2, ...,
m). count=1, delete  , the number of deleted transaction items is counted as 1, and so on after the
traversal loop to get the new database D'. Let the set of interest be B. If  , (x=1, 2, ..., n), B∉  , , then
delete  , and traverse the loop to get the new data set D″.</p>
      <p>Step 2: Mine the frequent item sets.</p>
      <p>Counts each transaction item to obtain the candidate 1-item set, where the items greater than or equal
to min_sup will form the frequent item set  . Self-connect the generated frequent item set  to
generate the candidate 2-item set, and perform the set intersection operation to obtain the transaction
TID set, where the items greater than or equal to min_sup will form the frequent item set  . Compute
the modulus | | of  and end the operation when | |≤k to obtain the frequent item set L. Otherwise
repeat step B.</p>
      <p>Step 3: Mine association rules.</p>
      <p>Calculate the support and confidence, analyze the association relationships between
variables, summarize the regularities between variables, and generate association rules; the process is
shown in Figure 6.</p>
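      <p>The three steps can be sketched as follows; the TID-set intersection realizes the single database scan described above, though the pruning details of the paper's implementation are simplified here:</p>

```python
from itertools import combinations

def apriori_rules(transactions, min_sup=2, min_conf=0.6):
    """Mine frequent item sets via TID-set intersection (one database
    pass), then emit association rules meeting min_sup and min_conf."""
    # One scan: map each single item to the set of TIDs containing it.
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(frozenset([item]), set()).add(tid)
    frequent = {k: v for k, v in tids.items() if len(v) >= min_sup}
    level, all_freq = frequent, dict(frequent)
    while level:
        size = len(next(iter(level)))
        nxt = {}
        for a, b in combinations(level, 2):
            cand = a | b
            if len(cand) == size + 1:
                support = level[a] & level[b]  # intersect TID sets
                if len(support) >= min_sup:
                    nxt[cand] = support
        level = nxt
        all_freq.update(nxt)
    rules = []
    for itemset, support in all_freq.items():
        for ante in map(frozenset, combinations(itemset, len(itemset) - 1)):
            if ante and ante in all_freq:
                conf = len(support) / len(all_freq[ante])
                if conf >= min_conf:
                    rules.append((set(ante), set(itemset - ante), conf))
    return rules
```

      <p>Because the support of a candidate is the intersection of its parents' TID sets, no second pass over the database is needed.</p>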
    </sec>
    <sec id="sec-12">
      <title>3.3.2.Implicit relationship discovery based on GNNLP model</title>
      <p>Since the improved Apriori algorithm can only identify frequent item sets from known relations and
mine known associated edges, and cannot predict unknown missing edges, the node relations of the
constructed knowledge graph are incomplete. For this reason, this paper
proposes the GNNLP model. After generating the knowledge graph, a neural network function is adopted to
nonlinearly fit the nodes in the graph, and GNN-related algorithms aggregate and update the node information in
the graph, converting the Maoming tourism knowledge graph into a GNN
graph with a neural network.</p>
      <p>
        The aggregation operation collects information from the neighbors of each node by means of an
aggregation function aggregate(x), where x denotes the
information aggregated from all neighboring nodes of the target node [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]:
      </p>
      <p>a<sub>v</sub><sup>(k)</sup> = aggregate({h<sub>u</sub><sup>(k−1)</sup>, u ∈ N(v)})</p>
      <p>a<sub>v</sub><sup>(k)</sup> denotes the kth aggregation result of node v, N(v) denotes the neighbor nodes of node v, and
h<sub>u</sub><sup>(k−1)</sup> denotes the (k−1)th state of neighbor node u. Different aggregation functions suit
different graph structures.</p>
      <p>In the update process, a specific operation is performed between the aggregation result and
the central node's state to give the initial state of the node in the next layer (i.e., the hidden state of the node is updated).
Set the update function combine(y), where y denotes the specific operation between the
aggregation result of the previous step and the target node.</p>
      <p>h<sub>v</sub><sup>(k)</sup> = combine(h<sub>v</sub><sup>(k−1)</sup>, a<sub>v</sub><sup>(k)</sup>)</p>
      <p>h<sub>v</sub><sup>(k)</sup> denotes the kth update result of node v, and h<sub>v</sub><sup>(k−1)</sup> denotes the (k−1)th state of node v. Each repetition
of the above operation adds one layer to the neural network. Aggregation and updating continue
until the number of updates reaches l. The nodes of the GNN graph are then divided into subgraphs
according to the number of paths and the distances between node pairs; the path
similarity and node similarity between node pairs are calculated and fused to
obtain the final link similarity between node pairs; finally, node pairs are ranked by the final link similarity
and graph neural network link prediction is performed to discover the implicit relationships
between nodes. The GNNLP model process is shown in Figure 7.</p>
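      <p>The aggregate/combine iteration can be sketched with mean aggregation and an averaging combine step (both are illustrative choices; the concrete functions are left open in the text above):</p>

```python
def gnn_propagate(adj, features, layers=2):
    """Message-passing sketch: at each layer, aggregate(x) averages the
    neighbors' previous states and combine(y) mixes that result with the
    node's own state, matching the aggregate/update steps above."""
    h = dict(features)                  # h[v] = current state vector
    dim = len(next(iter(features.values())))
    for _ in range(layers):
        nxt = {}
        for v, neighbors in adj.items():
            if neighbors:
                agg = [sum(h[u][i] for u in neighbors) / len(neighbors)
                       for i in range(dim)]
            else:
                agg = [0.0] * dim
            # combine: average of the self state and the aggregated state
            nxt[v] = [(h[v][i] + agg[i]) / 2 for i in range(dim)]
        h = nxt
    return h
```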
      <p>Figure 6 shows the local tourism graph constructed with visualization techniques from the mining results
of the improved Apriori algorithm. On this basis, the GNNLP model constructed in this paper is used to
discover the implicit relationships between nodes; the result is shown in Figure 7, where
the blue bold edges represent the relationships between nodes newly discovered by the
GNNLP model.</p>
    </sec>
    <sec id="sec-13">
      <title>4. Experiment and Analysis</title>
      <p>In order to verify the rationality of the model constructed in this paper, the following validation
experiments are designed.</p>
    </sec>
    <sec id="sec-14">
      <title>4.1.Text classification results and analysis</title>
      <p>On the basis of the introduction and analysis above, this paper divides the training and test sets in
a 4:1 ratio and trains and tests the commonly used text classification models alongside the
RoBERTa-BiGRU-Attention fusion model and RoBERTa-DualCL model used in this paper. The results of each
model are shown in Table 3:</p>
      <p>From the above results, the RoBERTa-DualCL model with dual contrastive learning has the highest
accuracy, correctly classifying 1312 of the 1354 test samples for an accuracy of 96.90%. Using the
Dual Contrastive Learning framework for data enhancement achieves better results on small samples,
so this model is used to classify the texts.</p>
      <p>Using the RoBERTa-DualCL model to classify the tourism texts, the results show 4315 texts in the
tourism-related category and 1971 texts in the tourism-unrelated category.</p>
    </sec>
    <sec id="sec-15">
      <title>4.2.Named entity identification results and analysis</title>
      <p>There is no standard for entities in the tourism field, and most existing named entity identification
tasks in the tourism field only cover attraction identification, which cannot meet the needs of this topic.
This paper carefully analyzes the travel guide data and, following the principle that the entity types
should completely cover the entities in the tourism field without
intersection, defines 6 entity types: SCENIC, HOTEL, DIET, ENTERTAINMENT,
CULTURE and VILLAGE. The model obtained after training and optimization on the constructed
tourism named entity identification dataset recognizes entities well, extracting
2246 entities in total from the travel guides.</p>
      <p>(Table: examples of recognized entities with their types and publish times, e.g. Dragon Head Mountain (SCENIC), White cut chicken (DIET), Fantasy Crystal Church, Rubber tube, and Hot spring area (ENTERTAINMENT), from articles published between 2019 and 2021.)</p>
    </sec>
    <sec id="sec-16">
      <title>4.3.Results and analysis of relation extraction</title>
      <p>Some association relationships mined by the improved Apriori algorithm are shown in Table 6 below (the relation types include DIET—DIET, DIET—HOTEL, and SCENIC—SCENIC pairs).</p>
      <p>Based on the strong association rules mined by the improved Apriori algorithm, the GNNLP model predicted 11 implied high-level association concepts for 2018 and 2019. For example, Fangji island and Seaview Bay Hotel are linked through the upper concept Fangji island tourist area, and Hailing island and dredging powder are linked through the upper concept Hailing island Ten-Mile Silver Beach scenic spot ...... For 2020 and 2021, 7 implied high-level association concepts were predicted: romantic coast and lobster are linked through the upper concept Wyndham Hotel, and Opencast Mine Good Lake Ecopark and Shijue temple are linked through the upper concept Opencast Mine ...... Such implied high-level concepts support inferences; for example, the connection between Fangji island and Seaview Bay Hotel through Fangji island suggests that tourists tend to stay in sea view hotels by the coast when visiting Fangji island, a conclusion that can promote the development of the surrounding hotels and B&amp;Bs.</p>
    </sec>
    <sec id="sec-17">
      <title>5. Concluding remarks</title>
      <p>This paper uses natural language processing and data mining methods to analyze the development of
surrounding travel of the city during the COVID-19 epidemic by building a local tourism graph. Based
on 2 core technologies, Dual Contrastive Learning text classification and the graph neural network, it solves
4 problems: WeChat public article classification, surrounding travel tourism product heat analysis,
local tourism graph construction and analysis, and analysis of the change in tourism product demand before
and after the epidemic. Based on traditional models, it improves and designs the RoBERTa-BiGRU-Attention
fusion model, Dual Contrastive Learning, the BERT-BiLSTM-CRF named entity identification technique,
the improved Apriori algorithm, the GNNLP model, and other models and methods; it demonstrates the rationality
and efficiency of the improved models through comparative tests, essentially overcoming the
shortcomings of the traditional models and achieving satisfactory results.</p>
      <p>The results show that both the methods adopted in this paper and the improved model algorithms
achieve good results. First, they solve the problem of decentralized and fragmented data,
improving the accuracy of text classification; second, they extract the relevant tourism elements from
the text clearly and accurately, enhancing the comprehensiveness and accuracy of the heat analysis; finally,
they fulfill deep mining of the implied high-level concepts, and the weak relationships obtained
from prediction can enhance and complete the original graph, constructing a knowledge graph with
reference significance for the development of local travel during the epidemic.</p>
    </sec>
    <sec id="sec-18">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Zhang</surname>
            <given-names>Ju</given-names>
          </string-name>
          , Feng Ao, Zhang Xuelei et al.
          <article-title>A Sentiment Analysis Method for Travel Text Fused with Text-Rank [J]</article-title>
          .
          <source>Computer Science and Applications</source>
          ,
          <year>2022</year>
          ,
          <fpage>12</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Cui</surname>
            <given-names>Liping</given-names>
          </string-name>
          , Gulila Adonbek,
          <string-name>
            <surname>Wang</surname>
            <given-names>Zhiyue</given-names>
          </string-name>
          .
          <article-title>Named entity identification in tourism field based on directed graph model[J]</article-title>
          .
          <source>Computer Engineering</source>
          ,
          <year>2022</year>
          ,
          <volume>48</volume>
          (
          <issue>2</issue>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Zhang</surname>
            <given-names>Nuo</given-names>
          </string-name>
          .
          <article-title>Research on Knowledge Graph Construction Method for Shanxi Tourism [D]</article-title>
          . Shanxi University.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Cai</surname>
            <given-names>Wenxing</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>Xingdong</given-names>
          </string-name>
          .
          <article-title>Sentiment analysis of scenic spot reviews based on BERT model[J]</article-title>
          .
          <source>Journal of Guizhou University (Natural Science Edition)</source>
          ,
          <year>2021</year>
          ,
          <volume>38</volume>
          (
          <issue>2</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>60</lpage>
          . DOI: 10.15958/j.cnki.gdxbzrb.2021.02.11.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Niu</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            <given-names>R</given-names>
          </string-name>
          .
          <article-title>Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression[J]</article-title>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Zhao</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>LY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wan</surname>
            <given-names>Y</given-names>
          </string-name>
          et al.
          <article-title>Named entity identification of Chinese attractions based on BERT+BiLSTM+CRF[J]</article-title>
          .
          <source>Computer System Applications</source>
          ,
          <year>2020</year>
          ,
          <volume>29</volume>
          (
          <issue>6</issue>
          ):
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Cao</surname>
            <given-names>Liujuan</given-names>
          </string-name>
          , Kuang Huafeng, Liu Hong et al.
          <article-title>Geometric constrained adversarial training with two-label supervision[J]</article-title>
          .
          <source>Journal of Software</source>
          ,
          <year>2022</year>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1218</fpage>
          -
          <lpage>1230</lpage>
          . DOI: 10.13328/j.cnki.jos.006477.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Xu</surname>
            <given-names>Linlong</given-names>
          </string-name>
          , Fu Jiansheng, Jiang Chunheng et al.
          <article-title>A ranking algorithm of product favorability based on Wilson interval</article-title>
          [J].
          <source>Computer Technology and Development</source>
          ,
          <year>2015</year>
          (5):
          <fpage>168</fpage>
          -
          <lpage>171</lpage>
          . DOI: 10.3969/j.issn.1673-629X.2015.05.040.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Liu</surname>
            <given-names>Wenya</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            <given-names>Yongneng</given-names>
          </string-name>
          .
          <article-title>Subway fault association rule mining based on improved Apriori algorithm[J]</article-title>
          .
          <source>Journal of Ordnance Equipment Engineering</source>
          ,
          <year>2021</year>
          ,
          <volume>42</volume>
          (
          <issue>12</issue>
          ):
          <fpage>210</fpage>
          -
          <lpage>215</lpage>
          . DOI: 10.11809/bqzbgcxb2021.12.033.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Wu</surname>
            <given-names>Guodong</given-names>
          </string-name>
          .
          <article-title>Research on personalized item recommendation based on deep learning [D]</article-title>
          . Shanghai: Donghua University,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Li</surname>
            <given-names>Jiahui</given-names>
          </string-name>
          .
          <article-title>Research on multi-domain text classification methods based on RoBERTa and cyclic convolutional multi-task learning [D]</article-title>
          . Harbin Institute of Technology.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>