=Paper=
{{Paper
|id=Vol-3318/short18
|storemode=property
|title=Knowledge Management System with NLP-Assisted Annotations: A Brief Survey and Outlook
|pdfUrl=https://ceur-ws.org/Vol-3318/short18.pdf
|volume=Vol-3318
|authors=Baihan Lin
|dblpUrl=https://dblp.org/rec/conf/cikm/Lin22
}}
==Knowledge Management System with NLP-Assisted Annotations: A Brief Survey and Outlook==
Baihan Lin (Columbia University, New York, NY 10027, USA)

CIKM 22: Workshop on Human-In-the-Loop Data Curation, October 21, 2022, Atlanta, GA. Corresponding author: B. Lin, baihan.lin@columbia.edu, https://www.neuroinference.com/, ORCID 0000-0002-7979-5509. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
Knowledge management systems (KMS) are in high demand among industrial researchers, chemical and research enterprises, and for evidence-based decision making. However, existing systems have limitations in categorizing and organizing paper insights or relationships. Traditional databases are usually disjoint from logging systems, which limits their utility in generating concise, collated overviews. In this work, we briefly survey existing approaches in this problem space and propose a unified framework that utilizes relational databases to log hierarchical information, either to facilitate the research and writing process or to generate useful knowledge from references or insights from connected concepts. Our framework of a bidirectional knowledge management system (BKMS) enables novel functionalities encompassing improved hierarchical note-taking, AI-assisted brainstorming, and multi-directional relationships. Potential applications include managing inventories and changes for manufacturing or research enterprises, and generating analytic reports for evidence-based decision making.

Keywords: knowledge management, insight annotation, relational databases, natural language processing, machine learning

===1. Introduction===
Knowledge management systems (KMS) are the driving engines of modern-day information technologies (IT). These IT systems store data in parsed form and retrieve knowledge insights to improve information understanding, team collaboration and process alignment within organizations and groups. As engineering entities in high demand among industrial researchers, chemical and research enterprises, and evidence-based decision makers, knowledge management systems are often used by organizations to affect innovation performance and generate accurate metrics on organizational capacity [1], but they can also be user-centric, centering the knowledge base around individual users or customers [2].

Take reference management for academic researchers as an example. KMS are often used by researchers to keep track of papers or subsets of papers [3]. Usually, each paper or reference carries metadata that can be filtered and sorted. An example scenario would be: a scientist logs a particular paper into a system, with each entry containing many metadata elements about the paper. These metadata elements can be filtered or sorted (e.g., by year, journal, author, etc.). Each paper might contain multiple concepts or topics, and each topic might contain multiple papers. In some cases, we might want the system to automatically assign topics to papers based on text data mining, so that the user can filter the papers by topic. While reading a paper, the scientist might want to log an insight or note on certain paragraphs. Sometimes a note concerns multiple papers, and the relationships among them can be of various types. These notes or insights also have topic tags, which can optionally be curated automatically. The system can also generate useful concepts or knowledge, together with their references, to facilitate the scientist's research and writing process.

We see from this example that papers in academic fields can have multiple, bidirectional relationships. Existing knowledge management systems for organizing research papers in scientific fields or organizing manufacturing enterprises use directed acyclic graphs, Bayesian networks, and machine learning [3], which have limitations in categorizing and organizing these multi-faceted insights or relationships. This is because many traditional databases are disjoint from logging systems, which limits their utility in generating concise, collated overviews. In this work, we briefly survey existing approaches in the general field of knowledge management systems, and propose a unified framework as a solution to these challenges. In our framework, we describe a knowledge management system that utilizes relational databases to log hierarchical information with connected concepts.

Returning to the example problem of reference management, our KMS would utilize relational databases to log hierarchical information to facilitate the research and writing process, or to help generate useful knowledge from references or insights from connected concepts. This would enable novel functionalities encompassing improved hierarchical note-taking, AI-assisted brainstorming, and multi-directional relationships. For instance, one can generate reports from keywords or topics by collating hierarchical and intra-connected records. With these automatic annotations, the system can also curate topic tags automatically using text data mining. Other applications include managing inventories and changes for manufacturing or research enterprises, and generating analytic reports for evidence-based decision making.

Although we have seen successful system designs in commercial products such as Mendeley and in recent community efforts such as the Open Research Knowledge Graph (ORKG), we believe that our survey can still bring useful and new insights on the practical considerations at the intersection of machine learning, database management and human-system collaboration. In the following sections, we first briefly survey existing knowledge management system approaches, and then propose a unified bidirectional KMS (BKMS) framework that utilizes relational databases to log hierarchical information to facilitate research and writing, and to generate helpful knowledge from references or insights from related concepts. We present a novel system design for this bidirectional information management, formulate a few potential use cases for this design, address the four routes of NLP-assisted annotation (semantic similarity, topic modeling, text summarization and symbolic reasoning), and discuss future design considerations.

===2. An Applied Perspective===
====2.1. Applications====
There are many application domains for knowledge management systems with relational databases and insight annotation enabled by machine learning, including but not limited to: a reference manager for academic researchers, an education and research tool, a report generator for consulting firms supporting evidence-based decision making, inventory management for manufacturing or research enterprises, an organizational tool for industries with high-volume data, and an internal auditing tool for customized employee metrics.

====2.2. User scenarios====
Besides the reference management example in our introduction, we include two additional applications. The first is managing inventories and changes for manufacturing, chemistry or research enterprises. The inventories or measurements of factories usually involve dependencies and hierarchical interactions. A knowledge management system that uses a relational database, instead of disjoint databases with separate logging systems, can enable a curation function that offers useful and concise reports on key events or phenomena (such as the topics). These are important insights for keeping factories or warehouses safe.

The second user scenario is evidence-based decision making. In large business entities, critical decisions are usually made with a group of market researchers or consulting firms that produce various analytic reports. A knowledge management system with AI-assisted insight annotation can provide a fast and evidence-based solution by generating a report (given a keyword or topic as input) curated from hierarchical and interconnected records. This hierarchical knowledge graph can serve as a useful primer in important decision-making processes and guide investigators to relevant resources.

====2.3. Case studies====
In this section, we outline three case studies of directions that recent real-world knowledge management systems are likely to adopt to become more interconnected and intelligent.

The concept of Internet of Things (IoT): IoT advancements consist of a series of disruptive digital technologies, semantic languages, and virtual identities that can increase efficiency and effectiveness in daily operations through interconnected communications among devices and systems [4]. Beyond these organizational benefits, IoT stimulates the innovation process in various aspects through fast iterations of knowledge flow and information gathering [5]. In [6], researchers employ structural equation modelling on a sample of 298 Italian firms from different sectors. Their study suggests that interconnected knowledge management systems facilitate the creation of an open and collaborative ecosystem by utilizing internal and external flows of knowledge and increasing internal knowledge management capacity, which in turn increases innovation capacity.

Reference architecture: In the era of Industry 4.0 [7], smart warehouses are envisioned to host production that contains modular and efficient manufacturing systems, in scenarios where products control their own manufacturing process. As in our user scenario of warehouse inventory management, an optimal reference architecture would be key to the warehouse knowledge management system. For instance, [8] describes a pipeline that performs a series of systematic analyses to identify the key concerns and processes and eventually arrives at a potential architecture for smart warehouses. The authors conduct a case study at a large warehouse in the food industry and illustrate that introducing a reference architecture can be effective and practical.

Conversational recommendation systems: A conversational recommendation system (CRS) is a computer system that is able to hold a conversation with a human user in order to make recommendations [9]. This differs from traditional recommendation systems, which do not interact with users. Often used in e-commerce, social media, and entertainment applications, CRS are becoming increasingly popular because they can provide a more personalized and interactive experience for users, but they pose additional challenges in managing different layers of knowledge at different states: the intent of the conversation, the entities matched by the intents, the long-term preferences of the users and of similar users, their state-dependent preferences related to the current context, and the relationships between different entities, intents and users. One practical example is recommending discussion topics to a therapist during psychotherapy in real time, given automatically speech-transcribed dialogue records [10] and helpful visual analytics [11].

===3. Bidirectional KMS Framework===
Figure 1: A unified framework of a knowledge management system with relational databases and NLP-assisted annotation.

Figure 1 outlines our framework of bidirectional knowledge management systems (BKMS) with relational databases and insight annotation powered by natural language processing (NLP). The user interface provides the entry points into our knowledge management system. Different interfaces introduce different routes, but they all involve a parsing and extraction process that atomizes the user inputs into nodes connected in a small knowledge graph. This graph is then placed into a relational database where its links are preserved. The orange and blue arrows indicate intra- and inter-database data flows. The relational databases include three parts. Some databases are used only for storage. Some are used for analysis and annotation. The remaining databases store annotated insights or other downstream analytical artifacts, which provide an additional data flow direction.

===4. NLP-Assisted Insight Annotation===
As shown in the annotation component of Figure 1, there are several routes by which we can use natural language processing to generate and annotate insights within our databases. Below, we elaborate on the role each route plays in knowledge management systems and survey modern machine learning methods for each.

Semantic similarity: In principle, any sentence or paragraph embedding can help us characterize our documents and inventories of interest. For instance, the Doc2Vec embedding [12] is a popular unsupervised learning model that learns vector representations of sentences and text documents. It improves upon the traditional bag-of-words representation by utilizing a distributed memory that remembers what is missing from the current context. SentenceBERT [13] is another popular option, which modifies a pre-trained BERT network using siamese and triplet network structures to infer semantically meaningful sentence embeddings. With word or sentence embeddings, we can embed the document entries from our relational databases into vectors, and then compute the cosine similarity between the vector of a given text (for example, one dialogue turn) and each inventory entry. For each text, we thus obtain an N-dimensional score for the property in question, where N is the number of inventory items. For instance, the inventory can be a set of written guidelines for evaluating certain documents, say, a list of leadership principles that some companies use to evaluate a candidate's resume, work report or performance review form, while the relational database hosts an employee's self-reported performance review form. The system can then automatically compute a score for each item of the guidelines and annotate the document entry accordingly. Another application is evaluating patient-doctor alignment in automatically transcribed psychotherapy sessions based on a clinical questionnaire inventory, as shown in [14, 15, 16].

Topic modeling: In natural language processing and machine learning, a topic model is a type of statistical graphical model that helps uncover the abstract "topics" that appear in a collection of documents. Topic modeling is frequently used in text-mining pipelines to unravel the hidden semantic structures of a text body, which makes it very handy for annotating database entries. For instance, in a clinical consumer-facing chatbot, the dialogue between the client and the agent can be transcribed, and a topic modeling analysis can be performed automatically to generate a list of discussed topics and their scores based on semantic similarity, as shown in [17]. State-of-the-art neural topic models include the Neural Variational Document Model (NVDM) [18] (an unsupervised text modeling approach based on the variational auto-encoder), the Gaussian softmax construction (GSM) [19] (an NVDM variant), the Wasserstein-based Topic Model (WTM) [20], and the Embedded Topic Model (ETM) [21], among others.

Text summarization: As our databases grow, maintaining the interpretability of the knowledge management system becomes increasingly challenging. An expanding collection of documents and entries cannot yield actionable insights without proper aggregation. Automatic text summarization addresses this problem by producing a concise and fluent summary while preserving key information content and overall meaning [22]. For instance, we can first group or cluster the database entries (such as paper abstracts, or reading notes as in our reference manager example) by their semantic similarity or inferred topics, and then generate a condensed description within each group. One use case is automatically generating writing outlines or topics from the available references and reading notes in a paper reference manager. In the active field of text summarization, extraction and abstraction are the two main approaches. Extractive summarization techniques generate summaries by choosing a subset of the sentences in the original text: they first compute an intermediate representation of the text, then score the sentences, and finally select a subset of the original sentences [23]. The abstraction approach uses latent semantic analysis, frequency-driven approaches [24] and topic modeling, which we covered above.

Symbolic reasoning: While topic modeling offers interpretable subjects and text summarization offers interpretable paragraphs, the logical and causal relationships between these insights can remain arbitrary. Symbolic AI bridges this gap by introducing high-level, human-readable symbolic representations into these practical problems. Symbolic methods can potentially derive logic programming rules and semantic relationships that can be used as actionable knowledge graphs [25]. Recently, there has also been increasing interest in neuro-symbolic AI [26, 27], where well-founded knowledge representation and reasoning from the symbolic perspective are integrated with deep learning from the statistical perspective. This offers both effective predictive power and the explainability needed for many real-world applications.

===5. Practical Considerations===
When designing an interconnected and intelligent knowledge management system for a domain-specific application, here are some practical questions to consider:
* Database consideration: What are the storage capacities of this technology?
* User interface: What visual and user interface do users prefer?
* Organizational benefits: What specific organizational functionality would this system provide over current systems?
* Latency and responsiveness: What are the synchronization capacities of this technology across devices?
* Customization: Can users modify or customize this system to their own preferences?
* Security: Would this technology allow secure encryption or storage of higher-value data?
* Collaboration: Would this system allow collaborative use by multiple stakeholders?
* Investigation: What kinds of insights or investigations do we wish to gain from this system?
* I/O: Would this system allow import from or export to other knowledge management systems?

Beyond these practical questions, a more thorough design process would involve market analysis (market size, emerging technologies, policies, challenges and new trends, as in [28]), domain analysis (the systematic activity of deriving and storing domain knowledge to support the engineering design process, as in [29]), business process modeling (i.e., identifying the lead processes and subprocesses of outgoing products [30]) and architecture design with viewpoints (stakeholder concerns, context diagram, decomposition view, uses view, and deployment view [31, 32]). Case studies can also be useful to clarify the problem setting.

Since we propose introducing relational databases and various AI and symbolic techniques into knowledge management systems, this proposition raises additional future research challenges in terms of the human-system "collaboration" these systems enable.
Methodologically, the machine learning engine that powers many human-in-the-loop (HIL) solutions in data curation is reinforcement learning, which has been shown to learn effectively from human interactions with speech- or text-based systems [33]. Operationally, on the human side, we need to encourage people to contribute their knowledge and expertise (e.g., through crowdsourcing) by creating an effective user interface that allows them to easily log in, search for, and find the information they need. On the system side, we need to ensure that knowledge is effectively captured and stored, consistently updated to remain accurate, and that different types of knowledge are managed so that they are accessible to the right people. Finally, there are also ethical and societal considerations when we use machine learning and AI to encode knowledge related to human biometrics and well-being, as reviewed in [34].

===6. Conclusions===
In summary, we describe the applied problem of knowledge management systems that host information containing multiple, bidirectional relationships across layers of metadata. We briefly survey the application domains, user scenarios and existing approaches in the field, and propose a framework for a knowledge management system with a relational database and NLP-assisted insight annotation. In our framework, a knowledge management system can comprise a user interface to provide input and present output relating to one or more documents or sensors. The system maintains a relational database storing information relating to the one or more documents, and a knowledge parsing unit, in communication with the user interface and the server, can determine at a first time instance the metadata information elements associated with a particular document entry. The databases can then be automatically annotated with NLP techniques such as semantic similarity analysis, topic modeling, text summarization and symbolic reasoning. A knowledge graph can then be learned from these language models and used as interpretable insights for real-world downstream tasks.

===References===
[1] B. Lawson, D. Samson, Developing innovation capability in organisations: a dynamic capabilities approach, International Journal of Innovation Management 5 (2001) 377–400.
[2] M. A. Kabir, J. Han, J. Yu, A. Colman, User-centric social context information management: an ontology-based approach and platform, Personal and Ubiquitous Computing 18 (2014) 1061–1083.
[3] Y. M. Yee, C. L. Tan, R. Thurasamy, Back to basics: building a knowledge management system, Strategic Direction (2019).
[4] V. Scuotto, A. Ferraris, S. Bresciani, Internet of things: applications and challenges in smart cities. A case study of IBM smart city projects, Business Process Management Journal (2016).
[5] Y. Malhotra, Knowledge management for e-business performance: advancing information strategy to "internet time", Information Strategy: The Executive's Journal 16 (2000) 5–16.
[6] G. Santoro, D. Vrontis, A. Thrassou, L. Dezi, The internet of things: Building a knowledge management system for open innovation and knowledge management capacity, Technological Forecasting and Social Change 136 (2018) 347–354.
[7] H. Lasi, P. Fettke, H.-G. Kemper, T. Feld, M. Hoffmann, Industry 4.0, Business & Information Systems Engineering 6 (2014) 239–242.
[8] M. van Geest, B. Tekinerdogan, C. Catal, Design of a reference architecture for developing smart warehouses in industry 4.0, Computers in Industry 124 (2021) 103343.
[9] Y. Sun, Y. Zhang, Conversational recommender system, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 235–244.
[10] B. Lin, G. Cecchi, D. Bouneffouf, SupervisorBot: NLP-annotated real-time recommendations of psychotherapy treatment strategies with deep reinforcement learning, arXiv preprint arXiv:2208.13077 (2022).
[11] B. Lin, Voice2Alliance: automatic speaker diarization and quality assurance of conversational alignment, in: INTERSPEECH, 2022.
[12] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: International Conference on Machine Learning, PMLR, 2014, pp. 1188–1196.
[13] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using siamese BERT-networks, arXiv preprint arXiv:1908.10084 (2019).
[14] B. Lin, G. Cecchi, D. Bouneffouf, Deep annotation of therapeutic working alliance in psychotherapy, arXiv preprint arXiv:2204.05522 (2022).
[15] B. Lin, Personality effect on psychotherapy outcome: A predictive natural language processing framework, arXiv preprint (2022).
[16] B. Lin, G. Cecchi, D. Bouneffouf, Working alliance transformer for psychotherapy dialogue classification, arXiv preprint arXiv:2210.15603 (2022).
[17] B. Lin, D. Bouneffouf, G. Cecchi, R. Tejwani, Neural topic modeling of psychotherapy sessions, arXiv preprint arXiv:2204.10189 (2022).
[18] Y. Miao, L. Yu, P. Blunsom, Neural variational inference for text processing, in: International Conference on Machine Learning, PMLR, 2016, pp. 1727–1736.
[19] Y. Miao, E. Grefenstette, P. Blunsom, Discovering discrete latent topics with neural variational inference, in: International Conference on Machine Learning, PMLR, 2017, pp. 2410–2419.
[20] F. Nan, R. Ding, R. Nallapati, B. Xiang, Topic modeling with Wasserstein autoencoders, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 6345–6381.
[21] A. B. Dieng, F. J. Ruiz, D. M. Blei, Topic modeling in embedding spaces, Transactions of the Association for Computational Linguistics 8 (2020) 439–453.
[22] M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez, K. Kochut, Text summarization techniques: a brief survey, arXiv preprint arXiv:1707.02268 (2017).
[23] A. Nenkova, K. McKeown, A survey of text summarization techniques, in: Mining Text Data, Springer, 2012, pp. 43–76.
[24] T. E. Dunning, Accurate methods for the statistics of surprise and coincidence, Computational Linguistics 19 (1993) 61–74.
[25] M. Garnelo, M. Shanahan, Reconciling deep learning with symbolic artificial intelligence: representing objects and relations, Current Opinion in Behavioral Sciences 29 (2019) 17–23.
[26] A. d. Garcez, L. C. Lamb, Neurosymbolic AI: the 3rd wave, arXiv preprint arXiv:2012.05876 (2020).
[27] J. Zhang, B. Chen, L. Zhang, X. Ke, H. Ding, Neural, symbolic and neural-symbolic reasoning on knowledge graphs, AI Open (2021).
[28] G. Giudici, A. Milne, D. Vinogradov, Cryptocurrencies: market analysis and perspectives, Journal of Industrial and Business Economics 47 (2020) 1–18.
[29] Ö. Köksal, B. Tekinerdogan, Feature-driven domain analysis of session layer protocols of internet of things, in: 2017 IEEE International Congress on Internet of Things (ICIOT), IEEE, 2017, pp. 105–112.
[30] M. Weske, Business process modelling foundation, in: Business Process Management, Springer, 2019, pp. 71–122.
[31] P. Clements, D. Garlan, R. Little, R. Nord, J. Stafford, Documenting software architectures: views and beyond, in: 25th International Conference on Software Engineering, 2003. Proceedings., IEEE, 2003, pp. 740–741.
[32] E. Demirli, B. Tekinerdogan, Software language engineering of architectural viewpoints, in: European Conference on Software Architecture, Springer, 2011, pp. 336–343.
[33] B. Lin, Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook, arXiv preprint arXiv:2210.13623 (2022).
[34] B. Lin, Computational inference in cognitive science: Operational, societal and ethical considerations, arXiv preprint arXiv:2210.13526 (2022).