Research on Network Prediction of Hidden Danger in Logistics Enterprises
Xiying Chen*, Yufeng Zhuang, Lingyi Lu
Beijing University of Posts and Telecommunications, Beijing, China

Abstract
The logistics industry has accumulated a large volume of manually recorded safety inspection data, yet hidden dangers are often overlooked because of inspectors' limited experience and other factors. This paper constructs a knowledge graph that represents the structure of hidden dangers in logistics enterprises as graph data and analyzes the various unsafe hidden factors in these enterprises. Starting from the original unstructured data, it builds a relationship network between enterprises and hidden danger factors and analyzes the relationships among the hidden dangers of logistics enterprises. Time-sequence information is then incorporated: graph data from consecutive time slices are integrated and their time-varying patterns are analyzed, enabling dynamic prediction of the hidden danger network of logistics enterprises. Learning from historical data improves prediction accuracy, so the method can provide more targeted and specific inspection priorities for hidden danger inspections of logistics enterprises and serve as an important auxiliary tool for manual inspection.

Keywords
hidden danger prediction, knowledge graph, time-sequence features

1. Introduction

    With the growth of express delivery in the logistics industry and the increasing size of logistics enterprises, safety risks are also rising, and the safety of production activities in logistics enterprises faces severe challenges. Failure to inspect for and prevent hidden risks in time often leads to accidents and heavy losses. In October 2018, a major accident on the Shenhai Expressway killed 15 people and injured 46. The investigation found that it was caused by the logistics enterprise's failure to fulfill its primary responsibility for production safety, its long-term neglect of affiliated vehicles, and substandard vehicle quality inspections. In November 2019, a fire broke out at Shanghai Xinde Logistics Co., Ltd. when welding ignited combustible materials on the floor; the fire was ultimately attributed to the company's inadequate implementation of its building safety management responsibilities. In July 2021, a fire at a logistics warehouse in Changchun, also caused by improper storage of combustible materials, killed 15 people and injured 25. In recent years the state has attached great importance to investigating and inspecting enterprise safety risks. It has vigorously enforced enterprises' production safety responsibilities and arranged law enforcement inspections, shifting from after-the-fact inspection to advance prevention and investigation so as to reduce accidents at the source.
    Because hidden accident dangers are such an important problem, knowledge graphs have been applied in the field of emergency safety. Aditya Pingle et al. [1] proposed a system that creates semantic triples from cybersecurity texts and extracts possible relationships using deep learning methods, so that security analysts can make decisions about cyber attacks from the resulting knowledge graph. Celebi R et al. [2] used a knowledge graph built from a drug database to predict unknown drug-drug interactions with RDF2Vec. Sutphin C et al. [3] used a BiLSTM-CRF information extraction model to identify adverse reactions and their causes in FDA drug labels. Elluri L et al. [4] trained a custom named entity recognition (NER) model, constructed a cybersecurity knowledge graph (CKG) to infer associations between cybersecurity texts and users, and generated the relevant features of the texts. Wang Yibao and Yang Tinghui [5] used the core urban security documents in the CNKI database from 1993 to 2018 to map the diversified, time-phased and dynamic knowledge of urban security research.

ICCEIC2022@3rd International Conference on Computer Engineering and Intelligent Control
EMAIL: *brianna_ xiying@163.com (Xiying Chen)
© 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

    This survey of domain knowledge graphs and their applications in emergency safety shows that no research has yet built a knowledge graph from the relationships among hidden dangers in logistics enterprises; most applications remain in cybersecurity and traffic management. Moreover, current knowledge graphs are usually built from existing data in triple form, yielding static knowledge graphs that ignore the time-varying patterns of hidden danger data in logistics enterprises and fail to make full use of time information. Therefore, this paper uses existing hidden danger inspection data, together with their time-series characteristics, to construct a hidden danger knowledge graph of logistics enterprises. Using graph data, it analyzes the co-occurrence of hidden dangers and the connections between enterprises and hidden dangers, and extracts information on people, objects, environment and management to produce a more comprehensive and relevant analysis of hidden dangers.

2. Hidden Danger Prediction of Logistics Enterprises Based on Knowledge Graph
2.1. Overview of hidden danger prediction methods for logistics enterprises

    This paper has two main parts. The first builds a knowledge graph of hidden dangers in the logistics field to support a targeted study of hidden dangers in logistics enterprises. The second performs dynamic, time-sequenced prediction of the hidden danger network of logistics enterprises, reducing the risk of accidents and shifting from after-the-fact repair to advance prevention. Predicting the hidden dangers that may occur in logistics enterprises in the next time slice can guide these enterprises in inspecting hidden dangers in production safety. The temporal knowledge graph is constructed top-down: the pattern (schema) layer is designed first, and then the facts of the data layer are represented. The ontology design standardizes the actual data, which keeps the knowledge graph consistent. For prediction, entities and time are embedded with the Diachronic Embedding (DE) [6] model, and the model is trained with the scoring function of the ComplEx [7] model. The design of the knowledge graph of logistics enterprises is shown in Figure 1.

[Figure 1 diagram: the original database contains safety check data (enterprise name, inspection time, hidden danger inspection contents, place) and enterprise basic data (enterprise name, enterprise category, enterprise features, enterprise address); the structured and unstructured data are converted into entities, attributes, relationships and time.]

Figure 1 Design of knowledge graph of logistics enterprise


    Entities, attributes, relations and time are extracted from the data, and quadruples are constructed in the forms "entity-relation-entity-time" and "entity-attribute-attribute value-time". The data are divided into multiple time windows according to their time information, and the triples in each time window form a static knowledge graph $G_\tau$, where $\tau$ denotes the $\tau$-th time window. The temporal knowledge graph is the set of knowledge graphs over all time windows, defined as $\mathcal{G} = \{G_1, G_2, \dots, G_T\}$, where $T$ is the number of time windows. $E_\tau$ is the set of all entities in time window $\tau$, $R$ is the set of all relations, and $Q_\tau$ is the set of all quadruples $(h, r, t, \tau)$. The hidden danger network prediction process for logistics enterprises is shown in Figure 2.
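The division of quadruples into time windows can be sketched in Python as follows. This is a minimal illustration under assumed conventions, not the authors' code: each quadruple is a tuple `(head, relation, tail, tau)` with an integer window index `tau`, and the helper names `split_into_windows` and `window_entities` are hypothetical. The entity names in the example are toy values, not records from the paper's data set.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed representation: (head, relation, tail, tau), tau = integer time-window index.
Quad = Tuple[str, str, str, int]
Triple = Tuple[str, str, str]


def split_into_windows(quads: List[Quad]) -> Dict[int, Set[Triple]]:
    """Group quadruples by window; the triples of window tau form the static KG G_tau."""
    windows: Dict[int, Set[Triple]] = defaultdict(set)
    for h, r, t, tau in quads:
        windows[tau].add((h, r, t))
    # Return the windows in chronological order, i.e. the temporal KG {G_1, ..., G_T}.
    return {tau: windows[tau] for tau in sorted(windows)}


def window_entities(triples: Set[Triple]) -> Set[str]:
    """Entity set E_tau of one window: every head and tail that appears in it."""
    return {h for h, _, _ in triples} | {t for _, _, t in triples}


# Hypothetical toy quadruples for illustration only.
quads = [
    ("logistics", "happen", "fire extinguisher", 17),
    ("fire extinguisher", "appear", "warehouse", 17),
    ("logistics", "happen", "safety signs", 18),
]
tkg = split_into_windows(quads)
print(list(tkg))                         # [17, 18]
print(sorted(window_entities(tkg[17])))  # ['fire extinguisher', 'logistics', 'warehouse']
```

Storing each window as a set of triples makes it cheap to compare consecutive windows, which is what the positive and negative sample construction needs.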
[Figure 2 diagram: training quadruples Q_train (head, relation, tail, time); consecutive graphs G_{t-1} and G_t; deleted facts as negative samples and true facts as positive samples; DE (Diachronic Embedding) knowledge representation learning; ComplEx scoring function; evaluation of model (HITS@n, MRR).]

Figure 2 Process of hidden danger prediction

    This paper uses DE as the base model for link prediction on the temporal knowledge graph and trains it with the ComplEx scoring function. DE generalizes well: any static KG embedding model can be extended to temporal knowledge graph completion (TKGC) with DE, and combining DE with ComplEx gives a fully expressive representation of the temporal knowledge graph. E denotes the set of all entities and R the set of all relations. For entity embedding, DE takes both the entity and the time as input and outputs an entity embedding vector that incorporates time information. Relation embeddings are obtained in the same way.
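$$
\mathbf{m}_e^{\tau}[n] =
\begin{cases}
\mathbf{a}_e[n]\,\sigma\left(\boldsymbol{\omega}_e[n]\,\tau + \mathbf{b}_e[n]\right), & 1 \le n \le \gamma d \\
\mathbf{a}_e[n], & \gamma d < n \le d
\end{cases}
$$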
    $\mathbf{m}_e^{\tau}$ is the output embedding vector of entity $e$ at time $\tau$, and $\mathbf{m}_e^{\tau}[n]$ is its $n$-th element. Here $e \in E$, $\tau$ is the corresponding time, $\mathbf{a}_e \in \mathbb{R}^{d}$ and $\boldsymbol{\omega}_e, \mathbf{b}_e \in \mathbb{R}^{\gamma d}$ are vectors of learnable parameters associated with a particular entity or relation, and $\sigma(\cdot)$ is the activation function. The formula shows that the first $\gamma d$ elements of the vector capture time-sequence features that change over time, while the remaining $(1-\gamma)d$ elements capture static features. The hyperparameter $\gamma$, between 0 and 1, controls the proportion of time-sequence features.
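The forward computation of the diachronic embedding formula can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: the function name `de_entity_embedding` is hypothetical, and the default activation is sine, which is the choice made in the original DE paper [6]; this paper only writes $\sigma$, so any other activation can be passed through `act`. Training the parameters would require an autodiff framework, which is out of scope here.

```python
import math
from typing import Callable, List


def de_entity_embedding(a: List[float], omega: List[float], b: List[float],
                        tau: float, gamma: float,
                        act: Callable[[float], float] = math.sin) -> List[float]:
    """Diachronic embedding m_e^tau of one entity at time tau.

    Elements 1..gamma*d: a[n] * act(omega[n] * tau + b[n])  (time-varying part)
    Elements gamma*d+1..d: a[n]                           (static part)
    """
    d = len(a)
    k = round(gamma * d)  # number of temporal elements, gamma*d
    if len(omega) != k or len(b) != k:
        raise ValueError(f"omega and b must each have gamma*d = {k} elements")
    temporal = [a[n] * act(omega[n] * tau + b[n]) for n in range(k)]
    return temporal + list(a[k:])


# Toy parameters with d = 4 and gamma = 0.5: only the first two elements depend on tau.
a = [1.0, 2.0, 3.0, 4.0]
omega = [1.0, 1.0]
b = [0.0, 0.0]
print(de_entity_embedding(a, omega, b, tau=0.0, gamma=0.5))  # [0.0, 0.0, 3.0, 4.0]
```

Evaluating the same entity at two different times changes only the first $\gamma d$ elements, which is exactly the split between temporal and static features described above.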
    ComplEx embeds the entities and relations of the static knowledge graph in complex space, representing each element as a complex number $a + bi$, which allows it to model asymmetric relations better.
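$$
\begin{aligned}
f(h, r, t) &= \mathrm{Re}\left(\langle \mathbf{m}_h, \mathbf{m}_r, \overline{\mathbf{m}}_t \rangle\right) \\
&= \langle \mathrm{Re}(\mathbf{m}_h), \mathrm{Re}(\mathbf{m}_r), \mathrm{Re}(\mathbf{m}_t) \rangle \\
&\quad + \langle \mathrm{Re}(\mathbf{m}_h), \mathrm{Im}(\mathbf{m}_r), \mathrm{Im}(\mathbf{m}_t) \rangle \\
&\quad + \langle \mathrm{Im}(\mathbf{m}_h), \mathrm{Re}(\mathbf{m}_r), \mathrm{Im}(\mathbf{m}_t) \rangle \\
&\quad - \langle \mathrm{Im}(\mathbf{m}_h), \mathrm{Im}(\mathbf{m}_r), \mathrm{Re}(\mathbf{m}_t) \rangle
\end{aligned}
$$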
    $\mathbf{m}_h, \mathbf{m}_t \in \mathbb{C}^d$ are the complex vectors of the head and tail entities, $\mathbf{m}_r \in \mathbb{C}^d$ is the relation vector, $\overline{\cdot}$ denotes the complex conjugate, $\mathrm{Re}(\cdot)$ takes the real part, $\mathrm{Im}(\cdot)$ takes the imaginary part, and $\langle \cdot, \cdot, \cdot \rangle$ is the sum of the element-wise products of the three vectors. The function is antisymmetric when $\mathbf{m}_r$ is purely imaginary and symmetric when $\mathbf{m}_r$ is purely real.
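The ComplEx score and its four-term real expansion can be checked numerically with Python's built-in complex numbers. This is a small sketch for illustration (the function names are hypothetical, and the vectors are arbitrary toy values); it computes the score both ways and shows the symmetry property for a purely real relation and the antisymmetry property for a purely imaginary one.

```python
from typing import Sequence


def complex_score(h: Sequence[complex], r: Sequence[complex], t: Sequence[complex]) -> float:
    """ComplEx score Re(<m_h, m_r, conj(m_t)>) = Re(sum_n h[n] * r[n] * conj(t[n]))."""
    return sum(hn * rn * tn.conjugate() for hn, rn, tn in zip(h, r, t)).real


def complex_score_expanded(h: Sequence[complex], r: Sequence[complex],
                           t: Sequence[complex]) -> float:
    """The same score written as the four real trilinear products of the expansion."""
    def tri(x, y, z):
        return sum(xi * yi * zi for xi, yi, zi in zip(x, y, z))

    def re(v):
        return [c.real for c in v]

    def im(v):
        return [c.imag for c in v]

    return (tri(re(h), re(r), re(t)) + tri(re(h), im(r), im(t))
            + tri(im(h), re(r), im(t)) - tri(im(h), im(r), re(t)))


h, t = [1 + 1j, 0.5 - 2j], [2 - 1j, 1 + 3j]
r_real, r_imag = [0.7 + 0j, -1.2 + 0j], [0.7j, -1.2j]
# Symmetric for a purely real relation: swapping head and tail keeps the score.
print(complex_score(h, r_real, t), complex_score(t, r_real, h))
# Antisymmetric for a purely imaginary relation: swapping head and tail flips the sign.
print(complex_score(h, r_imag, t), complex_score(t, r_imag, h))
```

The antisymmetric case is what lets ComplEx distinguish a directed relation from its reverse, which a purely real bilinear score cannot do.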
    During training, deleted facts are also used to construct negative samples, which are combined with the positive samples. This improves the prediction performance and stability of the model and, in particular, its accuracy on facts that were true in the past but are false now (i.e., deleted facts).
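$$
Q_{\mathrm{train}} = \{(h, r, t, \tau) \mid (h, r, t, \tau) \in Q \wedge (h, r, t, \tau) \notin Q_{\mathrm{test}}\}
$$
$$
Q_{\mathrm{del}} = \{(h, r, t, \tau) \mid (h, r, t, \tau) \notin Q \wedge \exists\, (h, r, t, \tau') \in Q,\ \tau' < \tau\}
$$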
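The deleted-fact negatives described above (facts that held earlier but no longer hold) can be computed as a set difference between consecutive windows. A minimal sketch, assuming each window is stored as a set of `(head, relation, tail)` triples keyed by an integer window index; the function names are hypothetical. As a simplification it compares only with the immediately preceding window, the $G_{t-1}$ and $G_t$ pair depicted in Figure 2, rather than with every earlier window.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, int]


def positive_facts(windows: Dict[int, Set[Triple]], tau: int) -> Set[Quad]:
    """True facts of window tau, stamped with tau (positive samples)."""
    return {(h, r, t, tau) for (h, r, t) in windows.get(tau, set())}


def deleted_facts(windows: Dict[int, Set[Triple]], tau: int) -> Set[Quad]:
    """Facts true in window tau-1 but absent from window tau (negative samples).

    Simplification: only the immediately preceding window is compared.
    """
    previous = windows.get(tau - 1, set())
    current = windows.get(tau, set())
    return {(h, r, t, tau) for (h, r, t) in previous - current}


# Toy windows using the entity-number / relation-letter convention of Figure 3.
windows = {
    1: {("1", "A", "2"), ("2", "B", "3")},
    2: {("2", "B", "3"), ("3", "C", "4")},
}
print(sorted(positive_facts(windows, 2)))  # [('2', 'B', '3', 2), ('3', 'C', '4', 2)]
print(sorted(deleted_facts(windows, 2)))   # [('1', 'A', '2', 2)]
```

Because deleted facts are stamped with the current window, the model learns that a fact which was true at $\tau-1$ should score low at $\tau$.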


2.2. Evaluation indicators of predictive effectiveness

   For evaluation, the same metrics as for static knowledge graphs are used. For each quadruple in the test set, two sets of candidate quadruples are constructed by replacing the head entity or the tail entity, and the candidates are ranked by their scores.
   1. MRR (Mean Reciprocal Rank) is the mean of the reciprocal ranks of the correct entities over all test quadruples. The larger the MRR, the higher the correct predictions are ranked and the better the model predicts.
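$$
\mathrm{MRR} = \frac{1}{|S|}\sum_{i=1}^{|S|}\frac{1}{\mathrm{rank}_i}
= \frac{1}{|S|}\left(\frac{1}{\mathrm{rank}_1} + \frac{1}{\mathrm{rank}_2} + \cdots + \frac{1}{\mathrm{rank}_{|S|}}\right)
$$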
   2. HITS@n is the proportion of test quadruples whose correct entity is ranked in the top n (rank ≤ n) in link prediction.
$$
\mathrm{HITS@}n = \frac{1}{|S|}\sum_{i=1}^{|S|}\mathbb{I}\left(\mathrm{rank}_i \le n\right)
$$
    $|S|$ is the number of test quadruples, $\mathrm{rank}_i$ is the link prediction rank of the $i$-th quadruple, and $\mathbb{I}(\cdot)$ is the indicator function, which equals 1 if the condition holds and 0 otherwise. Typically, n is 1, 3 or 10. The larger HITS@n is, the more often the correct entity ranks in the top n and the better the link prediction.
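The two metrics can be computed directly from the ranks, as in the sketch below. It is illustrative only: the function names are hypothetical, ties are resolved in favour of the true candidate (only strictly higher scores push it down), and candidates are ranked in the raw setting without filtering other known true facts, since the paper does not state which convention it uses. If both head and tail replacement are evaluated, their ranks would simply be pooled into one list.

```python
from typing import List, Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """1-based rank of the true candidate among candidates sorted by descending score.

    Ties are resolved in favour of the true candidate (only strictly higher scores count).
    """
    return 1 + sum(1 for s in scores if s > scores[true_index])


def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank: (1/|S|) * sum_i 1/rank_i."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks: List[int], n: int) -> float:
    """HITS@n: fraction of test facts whose true entity has rank <= n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


ranks = [1, 2, 4]
print(round(mrr(ranks), 4), hits_at(ranks, 1), hits_at(ranks, 3))
```

For the toy ranks 1, 2 and 4, MRR is (1 + 1/2 + 1/4) / 3, while HITS@3 counts only the first two facts.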

3. Construction and Prediction of Knowledge Graph
3.1. Basic information of data

    The core data of this paper are the safety risk inspection records of a city over one year, comprising 512,840 records with 169 fields, such as the enterprise, time, place, and specific content of the hidden danger inspection. After deleting fields with more than 80% missing values, the remaining 68 fields are divided into 19 enterprise-related indicators, 27 hidden-danger-related indicators and 22 indicators of other categories.
    The ten most frequent feature words in the hidden dangers of logistics enterprises are security, fire extinguisher, logo, clutter, production, warehouse, use, channel, time and line, whereas the top ten for all enterprises in the city are security, fire extinguisher, logo, clutter, distribution box, exit, cover, record, training and use. These statistics show that logistics enterprises share some problems with enterprises in general but also differ substantially from them. If the analysis were based on data from all enterprises, logistics-specific considerations such as operational safety, workplace and problem urgency would inevitably be overlooked, reducing the reliability and practical value of the results.

3.2. Hidden danger network construction and sample setting

   Using fuzzy matching and enterprise classification, 14,831 records of logistics enterprises were selected, covering 4,042 logistics enterprises. Following the top-level design of the knowledge graph, 11 fields are extracted and six quadruple patterns are designed, as shown in Table 1.

Table 1 Basic quadruple form and quantity
  head                       relation         tail                      t
  enterprise features        happen           hidden danger category    t
  enterprise category        happen           hidden danger category    t
  industry classification    happen           hidden danger category    t
  hidden danger category     appear           address                   t
  hidden danger category     appear           place                     t
  hidden danger category     co-occurrence    hidden danger category    t
    Using the quadruple patterns of the schema layer, a knowledge graph can be constructed for each time window. As shown in Figure 3, the changes of the corresponding facts over time in the knowledge graph are used to construct the positive and negative samples.
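One way to turn a single inspection record into quadruples following the six patterns of Table 1 is sketched below. The field names (`enterprise_features`, `hidden_danger_categories`, and so on) and the toy record are hypothetical, and treating two hidden danger categories as co-occurring when they are recorded in the same inspection record is an assumption of this sketch; the paper does not spell out its co-occurrence rule.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, int]

# Attribute fields whose value "happens" with a hidden danger category (Table 1, rows 1-3).
HAPPEN_FIELDS = ("enterprise_features", "enterprise_category", "industry_classification")
# Fields where a hidden danger category "appears" (Table 1, rows 4-5).
APPEAR_FIELDS = ("address", "place")


def record_to_quads(record: Dict[str, object], tau: int) -> List[Quad]:
    """Map one inspection record in time window tau to quadruples of the six patterns."""
    dangers = sorted(set(record["hidden_danger_categories"]))
    quads: List[Quad] = []
    for danger in dangers:
        for field in HAPPEN_FIELDS:
            quads.append((record[field], "happen", danger, tau))
        for field in APPEAR_FIELDS:
            quads.append((danger, "appear", record[field], tau))
    # Assumption: categories found in the same record co-occur (one quadruple per pair).
    for d1, d2 in combinations(dangers, 2):
        quads.append((d1, "co-occurrence", d2, tau))
    return quads


record = {  # hypothetical record
    "enterprise_features": "cold-chain storage",
    "enterprise_category": "logistics",
    "industry_classification": "warehousing",
    "address": "District X",
    "place": "warehouse",
    "hidden_danger_categories": ["safety signs and identifiers",
                                 "safety equipment and facilities"],
}
quads = record_to_quads(record, tau=17)
print(len(quads))  # 11: 2 categories x 5 attribute patterns + 1 co-occurrence pair
```

Sorting the categories makes each co-occurrence pair appear in one canonical order, so the same pair from different records maps to the same fact.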
[Figure 3 diagram: hidden danger networks of numbered entities (1 to 12) connected by lettered relations (A to D) in consecutive time windows, with the sample data marked.]

Figure 3 Hidden danger network and positive and negative sample settings

   In Figure 3, numbers denote entities and letters denote relations. Green marks facts deleted in the current time window relative to the previous one, i.e., negative samples; red marks the true facts in the current time window, i.e., positive samples.

3.3. Prediction process and result analysis

    In this paper, DE is used to represent the temporal knowledge graph: time information is embedded into the entity and relation representations, and the model is trained with the ComplEx scoring function. 70% of the data are used as the training set and 30% as the validation set. The learning rate is set to $10^{-3}$ and the embedding size to 128. When training with positive and negative samples, each training batch contains fewer than 1024 samples. Based on performance comparisons on the validation set, the negative sampling ratio is set to 100: each fact to be predicted is paired with 100 negative samples, 50 obtained by replacing the head entity and 50 by replacing the tail entity. Experiments showed that this ratio gives an appropriate trade-off between task performance and training time.
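The 50 head plus 50 tail corruption scheme can be sketched as follows. The split itself comes from the text above; excluding corruptions that coincide with a known true fact, sampling with replacement, and the names `corrupt` and `facts_per_batch` are assumptions of this sketch. The helper also shows the batch arithmetic: each fact contributes 1 positive and 100 negatives, so at most 10 facts (1,010 samples) fit in a batch of fewer than 1024 samples.

```python
import random
from typing import List, Optional, Set, Tuple

Quad = Tuple[str, str, str, int]


def corrupt(fact: Quad, entities: List[str], known: Set[Quad],
            n_head: int = 50, n_tail: int = 50,
            rng: Optional[random.Random] = None) -> List[Quad]:
    """Negative samples for one fact: n_head head replacements + n_tail tail replacements.

    Candidates that are known true facts are excluded; sampling is with replacement.
    """
    if rng is None:
        rng = random.Random()
    h, r, t, tau = fact
    head_pool = [e for e in entities if (e, r, t, tau) not in known]
    tail_pool = [e for e in entities if (h, r, e, tau) not in known]
    if not head_pool or not tail_pool:
        raise ValueError("no valid corruption available for this fact")
    heads = [(e, r, t, tau) for e in rng.choices(head_pool, k=n_head)]
    tails = [(h, r, e, tau) for e in rng.choices(tail_pool, k=n_tail)]
    return heads + tails


def facts_per_batch(max_batch: int = 1024, negatives_per_fact: int = 100) -> int:
    """Largest number of facts whose samples (1 positive + negatives each) stay below max_batch."""
    return (max_batch - 1) // (1 + negatives_per_fact)


entities = [str(i) for i in range(1, 13)]   # toy entity ids, as in Figure 3
known = {("1", "A", "2", 5), ("3", "A", "2", 5)}
negs = corrupt(("1", "A", "2", 5), entities, known, rng=random.Random(0))
print(len(negs), facts_per_batch())  # 100 10
```

Passing a seeded `random.Random` makes the sampled negatives reproducible across runs.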
    When the training loss stopped decreasing, the mean rank of the true entity was between 2 and 3. The evaluation results on the test set are shown in Table 2.

Table 2 Evaluation indicators of model predictions
    HITS@10    HITS@3    HITS@1    MRR
    0.87       0.59      0.33      0.51

   The model reaches a HITS@10 of 0.87, meaning that the correct entity is ranked in the top 10 for most correct facts, and a HITS@3 of 0.59, meaning that it is ranked in the top 3 for more than half of them. The predicted rankings therefore agree well with the facts.

   The importance of entities in different time windows is analyzed with centrality measures from complex network theory. Taking degree centrality as an example, the five most important hidden danger categories in three consecutive time windows are shown in Table 3.
Table 3 Entity importance under different time windows
  t=17                                        t=18                                        t=19
  safety equipment and facilities             emergency rescue plan and implementation    safety in production
  operating environment                       auxiliary system equipment and facilities   other security management
  auxiliary system equipment and facilities   material                                    safety equipment and facilities
  safety signs and identifiers                operation behavior of the practitioner      education and training of practitioners
  other equipment and facilities              safety equipment and facilities             operating environment

   The importance of hidden danger entities across time windows shows that the key hidden dangers differ between windows and that the attention paid to the same type of hidden danger changes over time. For example, safety equipment and facilities is among the top five hidden danger categories in all three time windows, which shows that it is a widespread problem, but its importance varies between windows (listed first at t=17, fifth at t=18 and third at t=19). The operating environment category is in the top five at t=17, drops out at t=18 after improvement, and returns to the top five at t=19.
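A per-window degree-centrality ranking of the kind behind Table 3 can be sketched as follows. The paper does not state these details, so this is an illustration rather than a reproduction of Table 3: each window is treated as an undirected simple graph (relation labels, edge direction and self-loops ignored), degree is normalized by N - 1 as NetworkX's `degree_centrality` does, and the hypothetical `candidates` argument restricts the ranking to hidden danger category entities.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def degree_centrality(triples: Iterable[Triple]) -> Dict[str, float]:
    """Degree centrality deg(v) / (N - 1) on the undirected simple graph of one window.

    Relation labels and duplicate edges are ignored; self-loops are dropped.
    """
    edges = {frozenset((h, t)) for h, _, t in triples if h != t}
    degree = Counter(v for edge in edges for v in edge)
    n = len(degree)
    return {v: (d / (n - 1) if n > 1 else 0.0) for v, d in degree.items()}


def top_k(centrality: Dict[str, float], candidates: Set[str], k: int = 5) -> List[str]:
    """The k most central candidate entities (ties broken alphabetically)."""
    ranked = sorted((v for v in centrality if v in candidates),
                    key=lambda v: (-centrality[v], v))
    return ranked[:k]


window = [("1", "A", "2"), ("1", "B", "3"), ("2", "C", "3"), ("1", "D", "4")]
c = degree_centrality(window)
print(top_k(c, candidates={"1", "2", "3", "4"}, k=2))  # ['1', '2']
```

Running `top_k` once per window and comparing the lists is enough to spot categories that persist, such as safety equipment and facilities, or that drop out and return.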

4. Summary

    The experimental analysis shows that a time-series knowledge graph can be used to predict hidden dangers in logistics enterprises. A time-series knowledge graph combines diverse information and learns from it as graph-structured data, and the relational structure among the data improves the ability to predict hidden dangers. The method in this paper provides data support for daily inspection and hidden danger prevention in logistics enterprises, makes hidden danger investigation more comprehensive, and can help reduce the probability that hidden dangers occur.

5. References

[1] Pingle A, Piplai A, Mittal S, et al. RelExt: Relation extraction using deep learning approaches
    for cybersecurity knowledge graph improvement[M]. Vancouver, BC, Canada: Association for
    Computing Machinery, Inc, 2019: 879-886.
[2] Celebi R, Uyar H, Yasar E, et al. Evaluation of knowledge graph embedding approaches for drug-
    drug interaction prediction in realistic settings[J]. BMC Bioinformatics, 2019, 20(1).
[3] Sutphin C, Lee K, Yepes A J, et al. Adverse drug event detection using reason assignments in FDA
    drug labels[J]. Journal of Biomedical Informatics, 2020, 110: 103552.
[4] Elluri L, Nagar A, Joshi K P. An integrated knowledge graph to automate GDPR and PCI DSS
    compliance[M]// Abe N, Liu H, Pu C, et al. IEEE International Conference on Big Data.
    2018: 1266-1271.
[5] Wang Yibao, Yang Tinghui. Visual analysis of knowledge graph of urban security research[J].
    Urban Development Studies, 2019, 26(03): 116-124.
[6] Goel R, Kazemi S M, Brubaker M, Poupart P. Diachronic embedding for temporal knowledge graph
    completion[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(04).
[7] Trouillon T, Welbl J, Riedel S, et al. Complex embeddings for simple link prediction[J].
    JMLR.org, 2016.

