A Cognitive Automation Approach for a Smart Lending and Early Warning Application

[Industrial and Application paper]

Ermelinda Oro
High Performance Computing and Networking Institute of the National Research Council; Altilia.ai
Rende (CS), Italy
linda.oro@icar.cnr.it

Massimo Ruffolo
High Performance Computing and Networking Institute of the National Research Council; Altilia.ai
Rende (CS), Italy
massimo.ruffolo@icar.cnr.it

Fausto Pupo
Altilia.ai
Rende (CS), Italy
fausto.pupo@altiliagroup.com

ABSTRACT
The rapid development of the Internet and the dissemination of information and documents through a myriad of heterogeneous data sources is having an ever-increasing impact on the financial domain. Corporate and Investment Banks (CIBs) need to improve and automate business and decision-making processes, simplifying the way they access data sources to obtain alternative data and answers. Manual or traditional approaches to data gathering are not sufficient to effectively and efficiently exploit the information contained in all available data sources and represent a bottleneck to process automation. This paper presents a cognitive automation approach that makes use of Artificial Intelligence (AI) algorithms for automatically and efficiently searching, reading, and understanding documents and content intended for humans. The paper also presents the system that implements the proposed approach through an application in the area of financial risk evaluation and lending automation. The presented approach allows CIBs to obtain answers and analyses useful to improve the ability of different bank areas to manage lending processes, forecast situations involving risks, facilitate lead generation, and develop customized marketing and sales strategies.

KEYWORDS
Augmented Intelligence, Machine Reading Comprehension, Question Answering, Cognitive Automation, Heterogeneous Data, Financial Services, Smart Lending, Early Warning, Information Extraction, Natural Language Processing, Document Layout Analysis.

© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Financial organizations are constantly looking for innovative ways to generate opportunities, automate and optimize business and decision-making processes, reduce risks, and mitigate adverse events. In order to build long-term partnerships with their customers, Corporate and Investment Banks (CIBs) need to develop customized marketing and sales strategies and, at the same time, manage financial risks, based on a deep knowledge of corporate customers and the markets in which they operate. The answers to CIBs' questions about the entities involved in business and decision-making processes must be sought within a myriad of heterogeneous data sources. Financial markets change rapidly, therefore CIBs need to quickly process big data available in both traditional data sources (such as financial statements) and alternative sources of information (such as social and online media, corporate websites, and online financial document repositories).

Traditional approaches are not sufficient to effectively and efficiently exploit the information contained in these sources. Indeed, the ability to select, collect, analyze, and interpret big data requires Artificial Intelligence (AI) algorithms capable of automatically and efficiently searching, reading, and understanding documents and content designed for humans.

In this paper, we present a cognitive automation approach and the related system, along with a financial application, that automate and simplify business and decision-making tasks and processes requiring human cognitive abilities. The presented cognitive automation approach allows CIBs to obtain answers and analyses useful to improve the ability of different bank areas to manage lending processes, forecast situations involving risks, facilitate lead generation, and optimize sales activities. Examples of required answers, alternatively referred to as data points in this paper, are: entities and the relationships between them, yes/no answers, sentiments, perceptions, and opinions.

The rest of the paper is organized as follows: Section 2 describes related work useful to comprehend the modules of the proposed system. Section 3 introduces the proposed approach and the related system. Section 4 presents how we address the needs of CIBs by implementing a smart lending and early warning application. Finally, Section 5 concludes the work.

2 RELATED WORK
The proposed approach and system rely on strong capabilities in machine reading comprehension (MRC) that exploit pre-trained language models and human-in-the-loop machine learning. In this section, we briefly review related work regarding these main aspects.

Machine Reading Comprehension. Machine Reading Comprehension (MRC) is the ability to answer questions asked in natural language by automatically reading texts. The objective is to greatly simplify the way in which humans interrogate large volumes of information sources [3]. MRC is related to Natural Language Processing (NLP) and, more specifically, to Natural Language Understanding (NLU), which refers to the ability of machines to understand natural language. NLU is considered an AI-hard problem, and all its activities can be framed within an MRC framework [10]. MRC allows for exploring many aspects of language understanding simply by posing questions. MRC can also be seen as an extended task of question answering (QA).

Recently, MRC methods have attracted a lot of attention among researchers around the world. Indeed, many new datasets for reading comprehension have been developed in recent years, such as SQuAD [22], NewsQA [26], SearchQA [6], TriviaQA [9], HotpotQA [28] (which requires multi-hop reasoning over paragraphs), and ReCoRD [30] and Cosmos QA [8], which are designed for challenging reading comprehension with commonsense reasoning. However, these datasets mainly concern the understanding of general text, and they are not related to specific knowledge domains. With deep learning (DL), end-to-end models have produced promising results on some MRC tasks. Unlike traditional machine learning, these models do not need complex feature engineering. Deep learning techniques for MRC have achieved very high performance on large standard datasets in general domains [4, 14, 29] and, more recently, big successes have been obtained with approaches based on pre-trained language models.

Pre-trained Language Models. We are entering the "Golden Age of NLP" (https://medium.com/@thresholdvc/neurips-2019-entering-the-golden-age-of-nlp-c8f8e4116f9d). With BERT by Google AI Language [5], initially published in 2018 as an e-print on arXiv, which obtained outstanding performance on multiple NLP tasks (such as sentiment analysis, question answering, and sentence similarity), pre-training with fine-tuning has become one of the most effective and widely used methods for solving NLP problems. Compared to word-level vectors (e.g., Word2Vec [13], released in 2013 and still quite popular, GloVe [18], and FastText [1]), BERT trains sentence-level vectors and captures more information from context. Before BERT, other pre-trained general language representations had been introduced. ELMo [19], which uses a bi-directional LSTM, generalizes traditional word embedding research along a different dimension by extracting context-sensitive features. OpenAI GPT [21] demonstrates that strong results can be obtained by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. ULMFiT [7] uses an LSTM and produces contextual token representations; it has been pre-trained on unlabeled text and fine-tuned for supervised downstream tasks. Unlike previous work, BERT uses a bi-directional Transformer. Transformers were introduced by Vaswani et al. [27]. Since then, many BERT-based approaches to natural language processing and understanding have shown even better results than BERT. The ERNIE model [31] is pre-trained by masking semantic units such as entity concepts, rather than tokens. Liu et al. [12] measure the impact of many key hyperparameters and training data sizes and present RoBERTa. Lan et al. [11] present ALBERT, which implements two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. In this paper, we use a BERT-based MRC method that allows us to extract data points.

Human-in-the-Loop Machine Learning. Deep learning, in particular when applied to unstructured data, needs very large training sets to learn parameters, hyperparameters, and the desired models [22]. Therefore, despite the obvious advantages of deep learning-based MRC systems, their use is often limited to academic contexts where the performance of MRC techniques is tested on artificial datasets. These datasets poorly match the characteristics of the data of real business contexts, such as the financial sector, where a complex language with specialized terminology is used.

To facilitate the learning of MRC models in the financial domain, it is necessary to develop methods and interfaces for human-in-the-loop machine learning. Using these tools, humans can transfer domain knowledge to machines by annotating and validating datasets and models that can be used in the learning process. Currently, the literature offers some weakly supervised machine learning methods and systems that allow for creating annotated datasets from a human-driven perspective. For example, Snorkel (https://www.snorkel.org/) [23], based on the data programming paradigm [24], is a recently proposed framework that enables users to generate large volumes of training data by writing labeling functions (such as rules and patterns) that capture domain knowledge. Under data programming, such labeling functions can vary in accuracy and coverage, and they may be arbitrarily correlated. Other weakly supervised machine learning tools are, for instance, Prodigy (https://prodi.gy), Figure Eight (https://www.figure-eight.com), and Amazon Mechanical Turk (https://www.mturk.com). These methods can use and be combined with: (i) transfer learning [17], which exploits labeled data, parameters, or knowledge available from other tasks to reduce the need for labeled data on a specific new task; (ii) active learning [25], which selects data points for human annotators to label; and (iii) reinforcement learning [20], which enables learning from feedback received through interactions with an external environment. Weakly supervised, human-driven methods can facilitate the adoption of MRC methods in complex domains, such as the financial one, in order to automate and simplify the extraction and interrogation of data of various formats in heterogeneous sources. For these reasons, in our approach, we implement human-driven annotation methods.

3 COGNITIVE AUTOMATION APPROACH
In this section, we present the proposed approach, which is useful to implement cognitive automation in decisional and operational business processes. The key steps of the presented approach are:
• Search, perform layout analysis on, and classify documents.
• Dynamically exploit the knowledge of users for training and correcting the extraction algorithms, thus enabling continuous learning.
• Extract answers to relevant questions concerning the entities involved in business processes by exploiting machine capabilities for reading and comprehending documents.
• Harmonize and store extracted information in knowledge graphs.
• Explore the obtained information, and visualize synthetic and easily interpretable charts.

In the following, we describe the modules of the system, shown in Figure 1, that implements the proposed approach.

Figure 1: Cognitive Automation System.

Documents and Contents Gathering and Analysis. This module allows, through specific connectors and methods for web scraping and wrapping, the acquisition of heterogeneous contents and documents from different information sources. To obtain a machine-readable format, it processes image documents using optical character recognition (OCR) algorithms. Then, it applies document layout analysis and understanding algorithms, also based on spatial reasoning [15, 16], to recognize the structures of documents (e.g., columns, sections, tables, lists of records) and the reading order. Finally, the module enables the indexing of documents and their portions.

Training Sets Modeling. This module allows the human-driven annotation of portions of documents that answer specific questions, exploiting a semi-automatic, interactive, and iterative process. This process involves the user by means of actions, mainly visual and/or based on simple rules, aimed at creating training sets for deep learning algorithms.

Machine Reading Comprehension (MRC). This module allows for learning models that extract data from documents in the form of answers to questions in natural language. It is based on different components:
(i) A Retriever that selects the list of documents and portions that are most likely to contain the answer to a question given as input. It is implemented as a voting system that considers different versions of matching (e.g., matching based on Elasticsearch (https://www.elastic.co/), the DrQA [2] retriever that uses TF-IDF features exploiting uni-grams and bi-grams, and S-Reader [14], which uses different embeddings and hyperparameters with respect to DrQA).
(ii) A Reader that takes as input the question and the portions chosen by the Retriever and outputs the most probable answers it can find. This sub-module is based on a pre-trained deep learning model, essentially a PyTorch version of the well-known NLP model BERT [5], made available by Hugging Face (https://github.com/huggingface/transformers). To fine-tune the model, the training sets created in the modeling phase are exploited.
(iii) A Selector that compares the answers' scores, obtained by using an internal function, and outputs the most likely answer according to the scores.
(iv) A graphical user interface that enables the human-machine interaction used to implement reinforcement learning. By exploiting a graphical user interface that highlights results on portions of documents, users validate and give feedback to the deep learning algorithms, which learn and improve their performance by exploiting the user feedback.

Data Harmonization. This module enables the scalable manipulation of data by using workflows based on Spark (https://spark.apache.org/). Workflows enable users to visually create complex processes that gather and process data, perform data analysis, and store results in knowledge graphs, simply by combining and concatenating blocks. A block embeds algorithms that implement a specific task, for instance, the learned model for extracting data points, or descriptive, predictive, and prescriptive analytics. For the same task, different blocks that embed different logics (e.g., various ways to collect data depending on the formats of the sources) can be used.

Data Storage. The obtained results, including answers and metadata (e.g., the paragraph where an answer was found and the title of the document), are stored in knowledge graphs (KGs). The current implementation of KGs is based on a multi-structured database that combines information retrieval capabilities with the ability to store data as graph databases.

Data Exploration and Visualization. Results can be explored through application programming interfaces (APIs) that allow integration with external applications, and they can be displayed in reports, dashboards, and presentations that visually track, analyze, and show key performance indicators (KPIs), metrics, and key data points.

4 SMART LENDING AND EARLY WARNING APPLICATION
The rapid development of web content and the dissemination of information through social networks, blogs, and newspapers have had an ever-increasing impact on the financial domain. How to rapidly and accurately mine key information from big data is a challenging research problem, and it has become one of the key issues for investors and decision-makers. Indeed, the ability to automatically answer business questions enables cognitive automation in the decisional and operational business processes of different Corporate and Investment Bank (CIB) areas. CIBs need to decide whether it is convenient to grant a loan to a company, to know the risk conditions of their customer portfolios, and to develop customized marketing and sales strategies. To this end, CIBs need deep knowledge and a careful evaluation of:
(i) corporate customers (e.g., board members, the environmental impact of the business, how they are perceived, the solidity of their business),
(ii) the markets in which their customers are located and operate (e.g., the solidity of the market, information about used commodities, competitors).

In practical terms, CIBs asked for a system capable of automatically: (i) answering specific questions asked in natural language, i.e., extracting data points, and (ii) visualizing queryable and navigable customer profiles that can be used for credit scoring, early warning, and marketing and sales activities. In the following, we describe our solution, which is based on the approach presented in the previous Section 3.

4.1 Documents and Contents Gathering and Analysis
Financial operators search for answers that can be obtained or inferred by reading and studying, even simultaneously, various information sources, such as financial documents (e.g., annual reports, 10-K forms, sustainability reports, notes to balance sheets), as well as web sources (e.g., news, blogs, social media). Examples of required answers (i.e., data points) are: the perception of a corporate brand on social media (customer brand perception), the geographical distribution of a company's debts, credits, and revenues, and the volume of R&D investments.

For this specific application, the proposed approach enables the extraction of interesting data points, in a scalable way, from a huge number of web sources related to a large number of companies. Data points enrich different aspects of customer profiles, such as Environmental, Social, and Governance (ESG) knowledge, which, for instance, can be used by credit scoring algorithms. The implemented web scraping tools are used to download news and financial documents from the websites of companies or from the SEC (Securities and Exchange Commission) website, and to collect information and reviews from booking websites. These tools are flexible, easily configurable, and maintainable. More in detail, the web scraping process consists of the definition of a configuration file for each typology of website to scrape. The wrapper uses DOM information and XPath, along with similarities between different websites, reducing the work needed to design wrappers. In addition, the kind of data/information to extract from the websites can be defined by a data model to fill. For instance, to collect information about restaurants and hotels, the scraping tools navigate booking websites, extracting reviews and attributes such as authors, titles, dates, and other relevant information. To collect PDF documents (e.g., annual reports, 10-K forms, notes to balance sheets), the scraping tools navigate companies' websites, searching the investor relations and press release sections. Alternatively, the scraping tools download documents from financial document providers like the SEC (https://www.sec.gov/edgar/searchedgar/companysearch.html). Downloaded PDF files are processed using document layout analysis algorithms, exploiting optical character recognition (OCR) techniques when needed, to extract portions (such as columns, paragraphs, tables, and notes). Then, the different portions of documents (along with their relations, reading-order information, links to the original documents, and metadata) are stored in knowledge graphs and indexed in the system for further elaboration.

4.2 Training Sets Modeling and MRC
During the training set modeling phase, a user can define labeling functions or visually annotate label-entity or question-answer pairs by looking at input documents and information stored in the system. Figure 2 shows the graphical user interface that aids the creation of labeling functions.

Figure 2: Labeling Functions GUI.

In the left part of the interface, the editor for defining labeling functions is displayed. These functions can exploit different kinds of syntactic, spatial, and ontological information. In addition, they can use: (i) built-ins that call machine learning procedures or complex algorithms used as black boxes, and (ii) functions and concepts defined in other imported labeling files. The editor provides some facilities to simplify the writing of labeling functions, exploiting relationships between labels and values, titles and paragraphs, or images and captions, table structures, and grammatical relationships such as subject-verb-object (facts). In the upper right part of the interface, taxonomies of the desired concepts to label are visualized. The GUI also shows the chosen PDF files used to visually evaluate the results of the executed labeling functions. Result details (attributes of the labeled concepts) can be visualized in the lower right-hand corner of the interface. In Figure 2, the labeling functions annotate revenues in a financial statement.

In addition, as shown in Figure 3, the GUI also enables users to visually annotate texts, for instance, to assign labels or to select the answers to questions in the documents.

Figure 3: A visual annotation of a concept related to the financial domain.

The created training sets are exploited within machine/deep learning algorithms, as described in Section 3.

4.3 Data Harmonization and Storage
To scale up KPI extraction, a workflow can be designed, deployed in the cloud, and executed in a parallel and scheduled way. In Figure 4, the designed workflow enables searching and extracting text portions from PDF documents related to the target questions.

Figure 4: Workflow to scale-up KPIs extraction.

In the shown example, we are interested in extracting data points from balance sheets related to financial information (e.g., customer, industry, year, financial costs, commodity prices, total revenues, EBIT, EBITDA) of more than 3000 companies. The extracted information is saved in knowledge graphs and can be provided in different formats selected by the customer (e.g., CSV, Excel, or JSON).

4.4 Data Exploration and Visualization
Banks are interested in creating reports, dashboards, and presentations to visualize customer profiles. In the following, we show some examples of dashboards and PowerPoint slides obtained by analyzing extracted data points related to a target company and considering peer companies used for benchmarking.

Figure 5 shows a comparison of the main financial data of the selected target client (e.g., revenue growth, EBITDA margin and growth, and net debt-to-EBITDA ratio) with the mean values of benchmarking companies (peers in the same industry as the target company). Target companies and peers can be dynamically selected to see real-time updates of the charts.

Figure 5: A comparison of main financial data between the target company and peers.

Figure 6 shows the exposure of a customer to financial risks in the form of a presentation slide exported by the system. In detail, the figure shows the exposure to the risk of interest rate changes on loans, to the risk of forex rate changes, and to the risk of commodity price variations, also making comparisons with peers (benchmark percentages). Figure 7 shows a deep dive on the foreign activities (forex risk) of the selected company (e.g., revenues and credits/debits by country, non-euro revenues as a percentage of the total) and the forex derivatives it already has in its portfolio (i.e., a derivatives usage table split by type of instrument).
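To make the Retriever component of the MRC module more concrete, the following minimal sketch ranks document portions against a question using TF-IDF scores over uni-grams and bi-grams, in the spirit of the DrQA-style matching mentioned in Section 3. All texts, function names, and the exact scoring scheme are illustrative assumptions, not the system's actual implementation.

```python
# Illustrative sketch of a TF-IDF uni-/bi-gram retriever (assumed scoring
# scheme; not the system's actual code). Given a question, it ranks document
# portions so that a downstream reader model can inspect the best ones first.
import math
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercased uni-grams and bi-grams of a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tfidf_rank(question, portions):
    """Return portion indices sorted from most to least relevant."""
    docs = [Counter(ngrams(p)) for p in portions]
    n_docs = len(docs)
    df = Counter()                      # document frequency of each n-gram
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        score = 0.0
        for g in set(ngrams(question)):
            if g in d:
                # Smoothed IDF; rare n-grams weigh more than common ones.
                idf = math.log((1 + n_docs) / (1 + df[g])) + 1.0
                score += d[g] * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: -scores[i])

# Toy portions standing in for extracted balance-sheet paragraphs.
portions = [
    "The board of directors approved the dividend policy.",
    "Total revenues for 2019 amounted to 4.2 billion euros.",
    "Research and development expenses were 310 million euros.",
]
ranking = tfidf_rank("What were the total revenues?", portions)
print(portions[ranking[0]])  # the portion most likely to contain the answer
```

In the full pipeline described above, the top-ranked portions would then be passed to the BERT-based Reader, and the Selector would compare the answer scores returned for each portion.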
Figure 6: Exposure of a company to financial risks.

Figure 7: Deep dive on the forex risk of a selected company.

5 CONCLUSION
In this paper, we presented a cognitive automation approach and the related system, along with a financial application, that enables CIBs to automatically: (i) extract data points from textual data sources, and (ii) visualize dashboards and presentations containing customers' data and comparisons between customers and their peers. The greater wealth and depth of information on risks and opportunities improves the ability to manage lending processes, provide real-time early warnings, and support sales activities. In particular, the implemented solution enables: (i) automatic, faster, and predictive credit/risk scoring (customer qualification); (ii) digitalization of lending processes (loan underwriting); (iii) smarter and more effective early warnings; and (iv) reduction of losses due to unforeseen defaults. In this way, different areas of banks can benefit from developing customized marketing and sales strategies, as well as building efficient and effective lending processes, based on a deep knowledge of corporate customers and the markets in which they operate.

REFERENCES
[1] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
[2] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In ACL.
[3] Hsinchun Chen, Roger H. L. Chiang, and Veda C. Storey. 2012. Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly 36 (2012), 1165–1188.
[4] Christopher Clark and Matt Gardner. 2017. Simple and Effective Multi-Paragraph Reading Comprehension. In ACL.
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
[6] Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, and Kyunghyun Cho. 2017. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. arXiv preprint arXiv:1704.05179 (2017).
[7] Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-tuning for Text Classification. In ACL.
[8] Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2019. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. In EMNLP/IJCNLP.
[9] Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In ACL.
[10] Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In International Conference on Machine Learning. 1378–1387.
[11] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv abs/1909.11942 (2019).
[12] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv abs/1907.11692 (2019).
[13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013).
[14] Sewon Min, Victor Zhong, Richard Socher, and Caiming Xiong. 2018. Efficient and Robust Question Answering from Minimal Context over Documents. arXiv preprint arXiv:1805.08092 (2018).
[15] Ermelinda Oro and Massimo Ruffolo. 2017. A Method for Web Content Extraction and Analysis in the Tourism Domain. In International Conference on Enterprise Information Systems, Vol. 2. SCITEPRESS, 365–370.
[16] Ermelinda Oro and Massimo Ruffolo. 2017. Object Extraction from Presentation-Oriented Documents Using a Semantic and Spatial Approach. US Patent 9,582,494.
[17] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22 (2010), 1345–1359.
[18] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In EMNLP, Vol. 14. 1532–1543.
[19] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. arXiv preprint arXiv:1802.05365 (2018).
[20] Junfei Qiu, Qihui Wu, Guoru Ding, Yuhua Xu, and Shuo Feng. 2016. A Survey of Machine Learning for Big Data Processing. EURASIP Journal on Advances in Signal Processing 2016, 1 (2016), 67.
[21] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf (2018).
[22] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP.
[23] Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: Rapid Training Data Creation with Weak Supervision. Proceedings of the VLDB Endowment 11, 3 (2017), 269–282.
[24] Alexander J. Ratner, Christopher M. De Sa, Sen Wu, Daniel Selsam, and Christopher Ré. 2016. Data Programming: Creating Large Training Sets, Quickly. In Advances in Neural Information Processing Systems. 3567–3575.
[25] Burr Settles. 2009. Active Learning Literature Survey. Technical Report. University of Wisconsin-Madison, Department of Computer Sciences.
[26] Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2016. NewsQA: A Machine Comprehension Dataset. In Rep4NLP@ACL.
[27] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In NIPS.
[28] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In EMNLP.
[29] Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv preprint arXiv:1804.09541 (2018).
[30] Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, and Benjamin Van Durme. 2018. ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension. arXiv abs/1810.12885 (2018).
[31] Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. 2019. ERNIE: Enhanced Language Representation with Informative Entities. In ACL.