Using Lexical Link Analysis as a Tool to Improve Sustainment

Edwin Stevens, Ying Zhao ∗
Naval Postgraduate School, Monterey, CA, USA

∗ This will certify that all author(s) of the above article/paper are employees of the U.S. Government and performed this work as part of their employment, and that the article/paper is therefore not subject to U.S. copyright protection. No copyright. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: Proceedings of AAAI Symposium on the 2nd Workshop on Deep Models and Artificial Intelligence for Defense Applications: Potentials, Theories, Practices, Tools, and Risks, November 11-12, 2020, Virtual, published at http://ceur-ws.org

Abstract

A major challenge in the complex enterprise of US Navy global materiel distribution is that when a new operation condition occurs, the probability of failure or demand model of a Naval ship part or item needs to be modified to adapt to the new condition. Meanwhile, historical supply databases include demand patterns and associations that are critical when the new condition enters the system as a perturbation or disruption, which can propagate through the item association network. In this paper, we first show how the two types of item demand changes interact and can be integrated to calculate the total demand change (TDC). We then show a use case on how to apply lexical link analysis (LLA) to discover the item association network that propagates the TDC.

Introduction

There are many challenges in the complex enterprise of US Navy global materiel distribution. Forward deployed US Navy ships, particularly in high operating tempo (OPTEMPO) areas such as the Seventh and Fifth Fleet, face challenges in receiving logistical support when part failures occur. These failures manifest as either a demand on the supply system, a casualty report (CASREP), or a request for technical assistance. The toughest challenges arise when a high impact part fails and is not immediately available. This can cause a "redline," a failure that stops the unit from being able to complete its mission until the problem can be resolved. The goal of any operational commander is 100% operational availability (AO), meaning their unit is always ready to be tasked for any situation that arises. Failures in contentious environments will stop the mission and could have great effects on the international and political situation.

The goal of a Navy logistician is to "not let the logistical tail wag the operation dog." In other words, a good logistician does not want to be the reason that the mission cannot go on. Limited manpower, funding, storage space, and resources for repair are all in high demand. A good system needs to be in place to determine the most efficient and effective method of stocking, forward staging, or contracting for the materials that have the highest likelihood of demand, balanced with the potential impact of failure. Since even one ship has hundreds of thousands of failed parts, many of which could cause a "redline," it is of critical importance to consider the activities of all the parts as a complex system and to predict the demand as a whole, so that the supply system is as intelligently designed as possible in order to quickly handle part failures.

Uncertainty, Perturbation and Association

The probability of failure of a part can be affected by many factors. We need to consider the uncertainty, disruption, and perturbation that can impact the logistics plans as a whole. For example, uncertainty factors related to the environment and to events in wide geographic areas, such as a weather change, a mission change from peacetime to conflict, or a sudden event, can perturb and disrupt previous logistics and supply plans. Parts that were previously high impact but low failure may suddenly be in high demand.

The probability of failure is also embedded in the historical supply and maintenance data. A failed part is considered for repair before a new one is ordered. A part's order frequency in the historical supply data therefore reflects its demand only when the part cannot be repaired within a certain period of time, so the demand data in the supply data reflect only part of the probability of failure.

The complexity of predicting the total probability of failure for a large list of items calls for the integration of methods in data fusion, data mining, causal learning, and optimization for all the elements in a logistics system when facing particular uncertainty and perturbation. The goal of this paper is to demonstrate techniques such as data mining and lexical link analysis (LLA) to recalculate the probability of failure for the previously high impact and low failure parts or items when the whole system faces a perturbation, uncertainty, disruption, or a "redline" failure.

Lexical Link Analysis (LLA)

A data mining tool used for this research is Lexical Link Analysis (Zhao, MacKinnon, and Gallup 2015). LLA is an unsupervised machine learning method that describes the characteristics of a complex system using a list of attributes or features, or specific vocabularies or lexical terms. Because of the potentially vast number of lexical terms from big data, the model can be viewed as a deep model for big data. For example, we can describe a system using word pairs or bi-grams as lexical terms extracted from text data. LLA automatically discovers word pairs and displays them as word pair networks. Figure 1 shows an example of such a word network discovered from data. "Clean energy" and "renewable energy" are two bi-gram word pairs. For a text document, words are represented as nodes and word pairs as the links between nodes. A word center (e.g., "energy" in Figure 1) is formed around a word node connected with a list of other words to form more word pairs with the center word "energy."

Figure 1: An example of lexical link analysis

Discovering Item Associations Using LLA

Bi-grams allow LLA to be extended to numerical or categorical data. For example, using structured data, such as attributes from supply chain databases, we discretize numeric attributes and categorize their values into word-like features. The word pair model can further be extended to a context-concept-cluster model (Zhao and Zhou 2014). A context can represent a location, a time point, or an object shared across data sources. For example, the quarters in a year can be one of the contexts for item supply data. Items (parts) are the concepts.

In this paper, we use LLA for the structured data of supply databases. We want to show that the bi-grams generated by LLA can also be a form of discovery of associations among item demands for a Navy supply database.

The common consensus is that data-driven analysis or data mining can discover initial statistical correlations and associations from big data.

Figure 2 shows conceptually how the associations and correlations are discovered by LLA. We anticipate that the demand change (DC) of an item i might come from two types of sources:

Type 1): A collection of outside perturbations such as the change of missions or new operational conditions; and

Type 2): Item associations with other items, where the associations could be due to physical linkages or linked demand based on past business practices. If an item i is ordered, item j is also likely to be ordered based on the historical data.

Type 2) DCs can be mined from historical, potentially big, data, whereas Type 1) DCs may come from expert and engineering knowledge and simulations.

Figure 2: Total demand change (TDC) caused by new conditions and associations

In Figure 2, Assoc_ij measures how strongly items i and j are demanded together. Probability and lift are the two measures, defined in Equation (1) and Equation (2), that LLA uses to measure the strength of an association.

$$\mathrm{prob}_{ij} = \frac{\text{demand of items } i, j \text{ together}}{\text{demand of item } j} \tag{1}$$

$$\mathrm{lift}_{ij} = \frac{\text{demand of items } i, j \text{ together} \;/\; \text{demand of item } j}{\text{demand of item } i \;/\; \text{all demands}} \tag{2}$$

In LLA, we first use lift_ij to filter out the associations that are not strong enough, then apply prob_ij to compute the total demand change (TDC) for item i as in Equation (3):

$$TDC_i = \sum_{m=1}^{M} DC_i \mid C_m + \sum_{j=1}^{N} \mathrm{prob}_{ij} \ast TDC_j \tag{3}$$

In this paper, we show that LLA can be used to compute the association network, prob_ij, and lift_ij from historical demand data. When a perturbation occurs, such as a new operation condition C_m that generates a DC_j | C_m for item j, it causes a TDC_j for item j. Meanwhile, TDC_j propagates through the association network discovered by LLA to affect the whole demand system and forward predictions, as shown in Equation (3).

Data Description and Initial Analysis Results

Currently, a part is reviewed to be stocked if it has more than two reorders in one year. This simple system is effective overall, but it does not consider the reasons for failure, the reason the part is being reordered, or the effect that the failure has on the ship. There is a small number of parts, called "maintenance assist modules," that are carried onboard every ship due to engineering specifications calling for immediate availability if needed, but that is not enough to prevent "redline" failures. To show the feasibility of our methodology, we compiled a large selection of demand data over the last nine years, containing over 1,000,000 individual demands. This data was then compiled by Item Mission Essentiality Code (IMEC, the impact code), quarter in which the demand occurred, and number of demands logged.

Next, LLA was applied to the data to help discover historical associations among the failures. The associations reflect the items that are ordered in the same contexts (e.g., the same quarter or same ship) historically. Associated parts might be stockpiled in the same manner should one fail suddenly in a new and disrupted condition. On a sample run, there were 50 connections found across 65,000 demands, as illustrated in Figure 3. We only considered the associations among the high impact items (IMEC 4) with quarterly demand that was high (> 51) or low (= 1). For example, items "lwm048749" and "lwm048745" both have high impact 4, while "lwm048749" had high demand in some quarters when "lwm048745" had low demand. When drilling down using LLA as shown in Figure 4, "lwm048749" had high demand in two quarters (10 and 18) when "lwm048745" had low demand. "lwm048749" had high demand in two quarters out of the total 20 quarters. The probability for the association of the two items is 100% and the lift is 10, i.e., 100% divided by 2/20. Should demand for "lwm048745" increase in a new operation condition, demand for associated parts such as "lwm048749" may increase even more in the new condition. LLA calculates the lift measure, which is similar to counterfactual reasoning in causal learning (Mackenzie and Pearl 2018; Pearl 2018; Zhao, MacKinnon, and Jones 2019), i.e., it indicates that there is indeed a causal relationship between the two demands.

Figure 3: Total demand change (TDC) caused by new conditions and associations

Figure 4: LLA allows a drill-down to see how many times (quarters) the two items are associated

Conclusion and Future Work

In this paper, we showed the feasibility of applying LLA to improve demand change predictions for a complex Navy supply database.

In future research, we will consider setting the association contexts to ship type, unit identification code, IMEC, or time periods shorter than quarters, and then apply LLA to search for causal associations at higher or lower resolutions, or with stricter or looser requirements. In comparison, there is a current tool in place called Predictive Risk Sparing Matrix (PRiSM), which has been able to identify parts in various C4I systems that have had real world demands and that would not have been identified under the standard system. PRiSM uses mathematical algorithms from inventory sparing models to determine potential failures, and these algorithms could possibly be used in coordination with simulation and LLA to better determine future needs. We will also leverage the liaisons from NAVSUP and DLA at the Fifth and Seventh Fleet naval bases, whose job is to track demand and then to work with the DoD logistics organizations to improve operational availability. The LLA tool could be tested and then given to these liaisons to help them, to improve the overall area of operation (AO) for forward deployed ships, and to improve sustainment.

ACKNOWLEDGMENTS

The authors would like to thank the Office of Naval Research (ONR)'s Naval Enterprise Partnership Teaming with Universities for National Excellence (NEPTUNE 2.0) program. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

References

Mackenzie, D., and Pearl, J. 2018. The Book of Why: The New Science of Cause and Effect. Penguin.

Pearl, J. 2018. The Seven Pillars of Causal Reasoning with Reflections on Machine Learning. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r481.pdf

Zhao, Y.; MacKinnon, D. J.; and Gallup, S. P. 2015. Big data and deep learning for understanding DoD data. Journal of Defense Software Engineering, Special Issue: Data Mining and Metrics, July/August 2015, 4-10. Lumin Publishing. ISSN 2160-1577.

Zhao, Y.; MacKinnon, D.; and Jones, J. 2019. Causal Learning Using Pair-wise Associations to Discover Supply Chain Vulnerability. In Proceedings of the 11th International Conference on Knowledge Discovery and Information Retrieval (KDIR 2019), September 17-19, 2019, Vienna, Austria.

Zhao, Y., and Zhou, C. 2014. System and method for knowledge pattern search from networked agents. US Patent 8,903,756.
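
As an illustration of Equations (1) to (3), the following is a minimal Python sketch, not the authors' LLA implementation. It assumes each context (here, a quarter) is represented as a set of terms, and it reads "demand" as the number of contexts in which a term occurs. The function names `association_measures` and `propagate_one_hop`, the labels `item_i` and `item_j`, the direct demand change of 5.0, and the lift threshold of 2.0 are all hypothetical. Because the paper does not say how the recursion in Equation (3) is resolved, the sketch uses a first-order simplification in which TDC_j on the right-hand side is approximated by the direct demand change of item j. The counts mirror the drill-down example (two shared quarters out of 20), under the added assumption that item j's feature occurs only in those two quarters.

```python
from collections import Counter
from itertools import permutations


def association_measures(contexts):
    """Return {(i, j): (prob_ij, lift_ij)} for terms sharing at least one context.

    contexts: list of sets of terms, one set per context (e.g. per quarter).
    """
    n = len(contexts)
    term_count = Counter()   # number of contexts in which each term occurs
    pair_count = Counter()   # number of contexts in which an ordered pair co-occurs
    for terms in contexts:
        term_count.update(terms)
        pair_count.update(permutations(sorted(terms), 2))
    measures = {}
    for (i, j), together in pair_count.items():
        prob = together / term_count[j]                        # Equation (1)
        lift = together * n / (term_count[i] * term_count[j])  # Equation (2), rearranged
        measures[(i, j)] = (prob, lift)
    return measures


def propagate_one_hop(direct_dc, measures, min_lift):
    """First-order reading of Equation (3).

    direct_dc: {item: sum over conditions C_m of DC_item | C_m}.
    Assumption: TDC_j on the right-hand side is approximated by direct_dc[j].
    Associations with lift below min_lift are ignored.
    """
    items = set(direct_dc) | {i for i, _ in measures}
    tdc = {}
    for i in items:
        total = direct_dc.get(i, 0.0)
        for (a, j), (prob, lift) in measures.items():
            if a == i and lift >= min_lift:
                total += prob * direct_dc.get(j, 0.0)
        tdc[i] = total
    return tdc


# Hypothetical data: 20 quarters, both terms present only in quarters 10 and 18.
quarters = [set() for _ in range(20)]
for q in (10, 18):
    quarters[q - 1].update({"item_i", "item_j"})

measures = association_measures(quarters)
prob_ij, lift_ij = measures[("item_i", "item_j")]
tdc = propagate_one_hop({"item_j": 5.0}, measures, min_lift=2.0)
print(prob_ij, lift_ij, tdc["item_i"])  # prints 1.0 10.0 5.0
```

With these counts the sketch reproduces the reported probability of 100% and lift of 10. A full treatment of Equation (3) would need to handle mutually associated items, for example by solving the resulting linear system or by iterating with a damping rule, which the one-hop simplification deliberately avoids.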