Research on Network Prediction of Hidden Danger in Logistics Enterprises

Xiying Chen*, Yufeng Zhuang, Lingyi Lu

Beijing University of Posts and Telecommunications, Beijing, China

Abstract

The logistics industry has accumulated a large amount of manually recorded safety inspection data, yet hidden dangers are still overlooked, for example because inspectors lack experience. This paper constructs a knowledge graph that represents the structure of hidden dangers in logistics enterprises as graph data. Starting from the original unstructured data, it analyses the unsafe factors specific to logistics enterprises, builds a relationship network between enterprises and hidden danger factors, and analyses the relationships among the hidden dangers themselves. Time sequence information is then integrated: the graph data of consecutive time slices are combined and their time-varying patterns are analysed to achieve dynamic prediction of the hidden danger network of logistics enterprises. Learning from historical data improves prediction accuracy, which gives hidden danger inspections in logistics enterprises a more targeted and specific focus and makes the method a useful auxiliary tool for manual inspection.

Keywords

hidden danger prediction, knowledge graph, feature of time sequence

1. Introduction

With the development of express delivery in the logistics industry and the growing size of logistics enterprises, safety risks are also increasing, and the safety of production activities in logistics enterprises faces severe challenges. Failure to inspect and prevent hidden risks in time often causes accidents and huge losses. In October 2018, 15 people died and 46 people were injured in a major accident on the Shenhai Expressway.
According to the investigation, the accident was caused by the logistics enterprise's failure to fulfil its primary responsibility for production safety, its long-term neglect of affiliated vehicles, and substandard vehicle quality inspection. In November 2019, Shanghai Xinde Logistics Co., Ltd. caught fire when welding operations ignited combustible materials on the ground. The underlying cause was the company's inadequate implementation of its building safety management responsibilities. In July 2021, a logistics warehouse in Changchun also caught fire due to improper storage of combustible materials, resulting in 15 deaths and 25 injuries. In recent years, the state has attached great importance to investigating and inspecting enterprise safety risks. It has vigorously enforced enterprises' responsibility for production safety, arranged law enforcement inspections, and shifted the emphasis from post-accident inspection to prevention and investigation in advance, so as to reduce accidents at the source.

ICCEIC2022@3rd International Conference on Computer Engineering and Intelligent Control
EMAIL: *brianna_xiying@163.com (Xiying Chen)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

To address the problem of accident hidden dangers, knowledge graphs have been applied in the field of emergency safety. Aditya Pingle et al. [1] proposed a system that creates semantic triples from network security texts and extracts possible relationships using deep learning methods, so that security analysts can make decisions about cyber attacks from the knowledge graph. Celebi R et al. [2] used a knowledge graph built from a drug database to predict unknown drug interactions based on RDF2Vec. Based on a BiLSTM-CRF information extraction model, Sutphin C et al. [3] identified adverse reactions and their causes from FDA drug labels. Elluri L et al. [4] trained a custom named entity recognition (NER) model and constructed a network security knowledge graph (CKG) to infer the subjective association between network security texts and users and to generate the relevant features of the texts. Wang Yibao et al. [5] drew a diversified, time-sliced, and dynamic scientific knowledge map of urban security research using the core urban security documents in the CNKI database from 1993 to 2018.

The current state of research on domain knowledge graphs and their application in emergency security shows that no work has yet constructed a knowledge graph from the relationships between hidden dangers in logistics enterprises; most applications are still in network security and traffic management. In addition, knowledge graphs are usually built from existing data in triple form, which yields a static knowledge graph. Such a graph ignores the time-varying behaviour of hidden danger data in logistics enterprises and fails to make full use of time information.

Therefore, based on existing hidden danger inspection data and combined with time series characteristics, this paper constructs a hidden danger knowledge graph of logistics enterprises. In the form of graph data, the co-occurrence relationships among hidden dangers and the connections between enterprises and hidden dangers are analysed, and information about people, objects, environment, and management is extracted to form a more comprehensive and relevant hidden danger analysis.

2. Hidden Danger Prediction of Logistics Enterprise Based on Knowledge Graph

2.1. Overview of hidden danger prediction methods for logistics enterprises

This paper consists of two main parts. The first builds a knowledge graph of hidden dangers in the logistics field, enabling a targeted study of hidden dangers in logistics enterprises. The second carries out dynamic, time-sequenced prediction of the hidden danger network of logistics enterprises, which reduces the risk of accidents and shifts the emphasis from after-the-fact remediation to prevention in advance. Predicting the hidden dangers that may occur in logistics enterprises in the next time slice guides these enterprises in inspecting hidden dangers in production safety.

The temporal knowledge graph is constructed in a top-down way. The pattern layer is designed first, and then the fact representation of the data layer is realized. Ontology design is used to standardize the actual data, which ensures the rationality of the knowledge graph. For prediction, entities and time are embedded with the Diachronic Embedding (DE) [6] model, and the model is trained with the scoring function of the ComplEx [7] model. The design of the knowledge graph of logistics enterprises is shown in Figure 1.

[Figure 1 Design of knowledge graph of logistics enterprise. The original database consists of safety check data and enterprise basic data; structured and unstructured data yield entities, attributes, relationships, and time.]

Entities, attributes, relations, and time are extracted from the data, and quadruples are constructed in the forms "entity-relationship-entity-time" and "entity-attribute-attribute value-time". According to the time information, the data are divided into multiple time windows; the corresponding triples in each time window then form a static knowledge graph $G_\tau$ (where $\tau$ denotes the $\tau$-th time window).
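The time-window partition described above can be illustrated with a minimal sketch. It assumes that each fact carries an integer time stamp (for example a day index) and that windows have a fixed width; the function name `split_into_windows`, the parameter `window_size`, and the sample facts are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def split_into_windows(quadruples, window_size):
    """Group (head, relation, tail, time) facts into static graphs, one per window.

    Assumption: `time` is a non-negative integer (e.g. a day index) and every
    window covers `window_size` consecutive time units.
    """
    graphs = defaultdict(set)
    for head, relation, tail, time in quadruples:
        tau = time // window_size  # index of the time window the fact falls into
        graphs[tau].add((head, relation, tail))  # static triple inside G_tau
    return dict(graphs)


# Hypothetical facts following the "entity-relationship-entity-time" form.
quads = [
    ("enterprise_A", "happen", "fire equipment", 3),
    ("fire equipment", "co-occurrence", "clutter", 5),
    ("enterprise_A", "happen", "clutter", 12),
]
windows = split_into_windows(quads, window_size=10)
for tau in sorted(windows):
    print(tau, sorted(windows[tau]))
```

Each value in `windows` plays the role of one static graph $G_\tau$; a real pipeline would derive the window index from the inspection date rather than from a toy integer.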
The temporal knowledge graph is the set of knowledge graphs under the different time windows, defined as $G = \{G_1, G_2, \ldots, G_T\}$, where $T$ is the number of time windows. $E$ is the set of all entities, $R$ is the set of all relations, and $Q$ is the set of all quadruples $(h, r, t, \tau)$. The hidden danger network prediction of logistics enterprises is shown in Figure 2.

[Figure 2 Process of hidden danger prediction. Training quadruples (head, relation, tail, time) are embedded by DE (Diachronic Embedding) knowledge representation learning; true facts and deleted facts between consecutive graphs form positive and negative samples, which are scored with the ComplEx scoring function; the model is evaluated with HITS@n and MRR.]

This paper employs DE as the basic model for link prediction on the temporal knowledge graph and combines it with the ComplEx scoring function for model training. DE generalizes well: any static knowledge graph embedding model can be extended to temporal knowledge graph completion (TKGC) with DE, and combining DE with ComplEx gives a complete representation of the temporal knowledge graph. For entity embedding, DE takes both the entity and the time as input and outputs an entity embedding vector that integrates time information. Relation embedding is completed in the same way.

\[
m_e^{\tau}[n] =
\begin{cases}
a_e[n]\,\sigma(\omega_e[n]\,\tau + b_e[n]), & 1 \le n \le \gamma d \\
a_e[n], & \gamma d < n \le d
\end{cases}
\]

Here $m_e^{\tau}$ is the output embedding vector of entity $e \in E$ at time $\tau$, and $m_e^{\tau}[n]$ is the $n$-th element of that vector. The vectors $a_e \in \mathbb{R}^{d}$ and $\omega_e, b_e \in \mathbb{R}^{\gamma d}$ hold learnable parameters associated with a particular entity or relation, and $\sigma(\cdot)$ is the activation function. The formula shows that the first $\gamma d$ elements of the vector capture time sequence features that change with time, while the remaining $(1 - \gamma)d$ elements capture static features. $\gamma$ lies between 0 and 1 and is the hyperparameter that controls the percentage of time sequence features.
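The piecewise definition above can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the training code used in this paper: the activation $\sigma$ is taken to be the sine function, as in Goel et al. [6], the parameters are plain Python lists instead of learnable tensors, and the function name `diachronic_embedding` and all numeric values are hypothetical.

```python
import math


def diachronic_embedding(a, omega, b, tau, gamma):
    """Return the DE vector of one entity at time tau.

    a     : list of d amplitudes (learnable in a real model)
    omega : list of gamma*d frequencies for the temporal elements
    b     : list of gamma*d phase shifts for the temporal elements
    gamma : share of the d elements that depend on time
    Assumption: the activation sigma is sin, following Goel et al.
    """
    d = len(a)
    k = round(gamma * d)  # number of time-dependent elements
    temporal = [a[n] * math.sin(omega[n] * tau + b[n]) for n in range(k)]
    static = list(a[k:])  # remaining (1 - gamma) * d elements ignore tau
    return temporal + static


# Hypothetical parameters for a single entity with d = 4 and gamma = 0.5.
a = [1.0, 2.0, 3.0, 4.0]
omega = [0.5, 1.0]
b = [0.0, math.pi / 2]
embedding_at_0 = diachronic_embedding(a, omega, b, tau=0.0, gamma=0.5)
embedding_at_1 = diachronic_embedding(a, omega, b, tau=1.0, gamma=0.5)
print(embedding_at_0)
print(embedding_at_1)
```

Comparing the two printed vectors shows the intended behaviour: the first two elements change with $\tau$, while the last two stay fixed at the static amplitudes.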
The ComplEx model introduces the complex space and uses complex numbers $a + bi$ to represent entities and relations in the static knowledge graph, which allows it to model asymmetric relations better.

\[
\begin{aligned}
f(h, r, t) &= Re(\langle \mathbf{m}_h, \mathbf{m}_r, \bar{\mathbf{m}}_t \rangle) \\
&= \langle Re(\mathbf{m}_h), Re(\mathbf{m}_r), Re(\mathbf{m}_t) \rangle
 + \langle Re(\mathbf{m}_h), Im(\mathbf{m}_r), Im(\mathbf{m}_t) \rangle \\
&\quad + \langle Im(\mathbf{m}_h), Re(\mathbf{m}_r), Im(\mathbf{m}_t) \rangle
 - \langle Im(\mathbf{m}_h), Im(\mathbf{m}_r), Re(\mathbf{m}_t) \rangle
\end{aligned}
\]

Here $\mathbf{m}_h, \mathbf{m}_t \in \mathbb{C}^{d}$ are the complex vectors of the head and tail entities, $\mathbf{m}_r \in \mathbb{C}^{d}$ is the relation vector, $\bar{\cdot}$ denotes the conjugate vector, $Re(\cdot)$ takes the real part, $Im(\cdot)$ takes the imaginary part, and $\langle \cdot, \cdot, \cdot \rangle$ is the sum of the element-wise products of the three vectors. The function is antisymmetric when $\mathbf{m}_r$ has only an imaginary part, and symmetric when $\mathbf{m}_r$ has only a real part.

During model training, deleted facts are also used to construct negative samples, and positive and negative samples are combined. On the one hand, this improves the prediction performance and stability of the model. On the other hand, it improves the prediction accuracy for facts that were correct in the past but are wrong now (i.e. deleted).

\[
Q^{+} = \{(h, r, t, \tau) \mid (h, r, t, \tau) \in Q \wedge (h, r, t, \tau) \notin Q^{-}\}
\]

\[
Q^{-} = \{(h, r, t, \tau) \mid (h, r, t, \tau) \notin Q \wedge \exists\, (h, r, t, \tau') \in Q\}
\]

Here $Q^{+}$ is the set of positive samples, $Q^{-}$ is the set of deleted facts used as negative samples, and $\tau'$ is an earlier time at which the fact held.

2.2. Evaluation indicators of predictive effectiveness

The evaluation indexes are the same as those used for static knowledge graphs. For each quadruple in the test set, two types of candidate quadruples are constructed by replacing the head entity or the tail entity, and the candidates are ranked by their scores.

1. MRR (Mean Reciprocal Rank) is the mean of the reciprocal ranks of the correct quadruples in the prediction results. The larger the MRR value, the higher the correctly predicted quadruples rank, and the better the prediction performance of the model.
\[
MRR = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{rank_i}
= \frac{1}{|S|} \left( \frac{1}{rank_1} + \frac{1}{rank_2} + \cdots + \frac{1}{rank_{|S|}} \right)
\]

2. HITS@n is the proportion of quadruples whose rank in link prediction is no greater than n.

\[
HITS@n = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{I}(rank_i \le n)
\]

$|S|$ is the number of quadruples, and $rank_i$ is the link prediction rank of the $i$-th quadruple. $\mathbb{I}(\cdot)$ is the indicator function, whose value is 1 if the condition is true and 0 otherwise. In general, n is 1, 3, or 10. The larger HITS@n is, the more likely the correct result ranks within the top n, and the better the link prediction performance.

3. Construction and Prediction of Knowledge Graph

3.1. Basic information of data

This paper takes the inspection records of security risks in a city in a certain year as the core data. The data contain 512,840 records with 169 fields, such as the enterprise, time, place, and specific content related to the inspection of potential risks. After deleting the fields with more than 80% missing values, the remaining 68 fields are divided into 19 indicators related to the company, 27 indicators related to hidden dangers, and 22 indicators of other categories.

According to statistics, the top ten feature words of hidden dangers in logistics enterprises are security, fire extinguisher, logo, clutter, production, warehouse, use, channel, time, and line. The top ten feature words of all enterprises in the city are security, fire extinguisher, logo, clutter, distribution box, exit, cover, record, training, and use. The statistics show that logistics enterprises share common problems with enterprises in general but also differ from them considerably. If the analysis uses the data of all enterprises together, considerations of operational safety, workplace, and problem urgency will inevitably be ignored, which affects the reliability and value of real-world applications.

3.2. Hidden danger network construction and sample setting

Using fuzzy matching and enterprise classification, 14,831 records of logistics enterprises were selected, involving 4,042 logistics enterprises. According to the top-level structure design of the knowledge graph, 11 fields are extracted and six types of quadruple pattern are designed, as shown in Table 1.

Table 1 Basic quadruple form and quantity

| head | relation | tail | t |
|---|---|---|---|
| enterprise features | happen | hidden danger category | t |
| enterprise category | happen | hidden danger category | t |
| industry classification | happen | hidden danger category | t |
| hidden danger category | appear | address | t |
| hidden danger category | appear | place | t |
| hidden danger category | co-occurrence | hidden danger category | t |

According to the quadruple patterns of the concept layer, the knowledge graph under different time windows can be constructed. As shown in Figure 3, the time sequence changes of the corresponding facts in the knowledge graph can be used to construct positive and negative samples.

[Figure 3 Hidden danger network and positive and negative sample settings]

In Figure 3, numbers represent entities and letters represent relationships. Green marks facts that have been deleted in the current time window relative to the previous one, that is, negative samples. Red marks facts that hold in the current time window, that is, positive samples.

3.3. Prediction process and result analysis

In this paper, DE is used as the embedding method for the temporal knowledge graph: time information is embedded into the representations of entities and relations, and the model is trained with the ComplEx scoring function. 70% of the data is taken as the training set and 30% as the validation set.
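The ComplEx scoring function used for training, defined in Section 2.1, can be sketched with Python's built-in complex numbers. This is a minimal illustration, not the implementation used in the experiments: the function name `complex_score` and the toy vectors are hypothetical, and in the full model the inputs would be the time-aware DE embeddings rather than hand-written constants.

```python
def complex_score(head, relation, tail):
    """Compute Re(<m_h, m_r, conj(m_t)>) for equal-length lists of complex numbers."""
    return sum(
        (h * r * t.conjugate()).real  # real part of each element-wise product
        for h, r, t in zip(head, relation, tail)
    )


# Hypothetical two-dimensional complex embeddings.
head = [1 + 2j, 0.5 - 1j]
tail = [2 - 1j, 1 + 1j]
real_relation = [1 + 0j, 2 + 0j]       # purely real relation vector
imaginary_relation = [0 + 1j, 0 - 2j]  # purely imaginary relation vector

symmetric_forward = complex_score(head, real_relation, tail)
symmetric_backward = complex_score(tail, real_relation, head)
antisymmetric_forward = complex_score(head, imaginary_relation, tail)
antisymmetric_backward = complex_score(tail, imaginary_relation, head)
print(symmetric_forward, symmetric_backward)
print(antisymmetric_forward, antisymmetric_backward)
```

Swapping head and tail leaves the score unchanged for the purely real relation and flips its sign for the purely imaginary one, which is the symmetry property stated for the scoring function.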
The learning rate of the model is set to $10^{-3}$, and the embedding size for representation learning is set to 128. When positive and negative samples are used for model training, the total number of samples in each training step is kept below 1024. Based on performance tests and comparisons on the validation set, the negative sampling rate is set to 100, that is, 100 negative samples are generated for each predicted fact: 50 by replacing the head entity and 50 by replacing the tail entity. The experiments showed that this ratio achieves an appropriate trade-off between task performance and training time. When the loss of the model no longer decreases, the average rank of the true entity is between 2 and 3. The evaluation results on the test set are shown in Table 2.

Table 2 Evaluation indicators of model predictions

| Hits@10 | Hits@3 | Hits@1 | MRR |
|---|---|---|---|
| 0.87 | 0.59 | 0.33 | 0.51 |

The model achieves a good Hits@10: most correct facts are ranked within the top 10 in link prediction, and more than half of the correct facts are ranked within the top 3, which indicates that the ranking agrees well with the facts.

The importance of entities under different time windows is analysed using centrality measures from complex network analysis. Taking degree centrality as an example, the five most important hidden danger categories under three consecutive time windows are shown in Table 3.
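As a rough illustration of the degree centrality measure mentioned above, the sketch below ranks the entities of one time window by their normalized degree. It assumes the common definition in which a node's degree is divided by the number of other nodes and edge direction is ignored; the paper does not state which variant it uses, and the function name `degree_centrality` and the sample triples are hypothetical.

```python
from collections import defaultdict


def degree_centrality(triples):
    """Normalized degree centrality of every entity in one static graph.

    Assumption: edges are treated as undirected and parallel edges between
    the same pair of entities are counted once.
    """
    neighbours = defaultdict(set)
    for head, _relation, tail in triples:
        neighbours[head]  # touch the key so isolated self-loop nodes are kept
        neighbours[tail]
        if head != tail:
            neighbours[head].add(tail)
            neighbours[tail].add(head)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Hypothetical co-occurrence facts of one time window.
window_triples = [
    ("safety equipment and facilities", "co-occurrence", "operating environment"),
    ("safety equipment and facilities", "co-occurrence", "safety signs and identifiers"),
    ("operating environment", "co-occurrence", "material"),
]
centrality = degree_centrality(window_triples)
top_two = sorted(centrality, key=lambda name: (-centrality[name], name))[:2]
print(top_two)
```

Applying the same function to the graph of each time window and comparing the resulting rankings gives the kind of per-window importance comparison that Table 3 reports.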
Table 3 Entity importance under different time windows

| t=17 | t=18 | t=19 |
|---|---|---|
| safety equipment and facilities | emergency rescue plan and implementation | safety in production |
| operating environment | auxiliary system equipment and facilities | other security management |
| auxiliary system equipment and facilities | material | safety equipment and facilities |
| operation behavior of the practitioner | education and training of practitioners | safety signs and identifiers |
| other equipment and facilities | safety equipment and facilities | operating environment |

The importance of hidden danger entities under different time windows shows that the key hidden dangers differ from window to window, and that the attention paid to the same type of hidden danger changes over time. For example, safety equipment and facilities appears among the top five hidden danger categories in all three time windows, which indicates how widespread it is, but its importance differs between windows. The operating environment category ranked in the top five at t=17, dropped out at t=18 after improvement, and attracted attention again at t=19.

4. Summary

The experimental analysis above shows that the temporal knowledge graph can be used to predict the hidden dangers of logistics enterprises. The temporal knowledge graph can combine various kinds of information and learn from them as graph-structured data, and the triple relationships among the data can be used to improve hidden danger prediction. The method in this paper provides effective data support for daily inspection and hidden danger prevention in logistics enterprises, improves the comprehensiveness of hidden danger investigation, and helps reduce the probability that hidden dangers occur.

5. References

[1] Aditya Pingle, Aritran Piplai, Sudip Mittal, et al. Relext: Relation extraction using deep learning approaches for cybersecurity knowledge graph improvement[M].
Vancouver, BC, Canada: Association for Computing Machinery, Inc, 2019: 879-886.
[2] Celebi R, Uyar H, Yasar E, et al. Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings[J]. BMC Bioinformatics, 2019, 20(1).
[3] Sutphin C, Lee K, Yepes A J, et al. Adverse drug event detection using reason assignments in FDA drug labels[J]. Journal of Biomedical Informatics, 2020, 110: 103552.
[4] Elluri L, Nagar A, Joshi K P. An Integrated Knowledge Graph to Automate GDPR and PCI DSS Compliance[M]. Abe N, Liu H, Pu C, et al. IEEE International Conference on Big Data. 2018: 1266-1271.
[5] Wang Yibao, Yang Tinghui. Visual analysis of Knowledge graph of Urban security Research[J]. Urban Development Studies, 2019, 26(03): 116-124.
[6] Rishab Goel, Seyed Mehran Kazemi, Marcus Brubaker, Pascal Poupart. Diachronic Embedding for Temporal Knowledge Graph Completion[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(04).
[7] Trouillon T, Welbl J, Riedel S, et al. Complex Embeddings for Simple Link Prediction[J]. JMLR.org, 2016.