Argumentation-driven information extraction for online crime reports

Marijn Schraagen (Information and Computing Sciences, Utrecht University, M.P.Schraagen@uu.nl)
Daphne Odekerken (Information and Computing Sciences, Utrecht University, D.Odekerken@uu.nl)
Bas Testerink (Dutch National Police, Bas.Testerink@politie.nl)
Floris Bex (Information and Computing Sciences, Utrecht University; Institute for Law, Technology and Society, Tilburg University, F.J.Bex@uu.nl)

Abstract

A new system is currently being developed to assist the Dutch National Police in the assessment of crime reports submitted by civilians. This system uses Natural Language Processing techniques to extract observations from text. These observations are used in a formal reasoning system to construct arguments supporting conclusions based on the extracted observations, and possibly to ask the complainant who files the report extra questions during the intake process. The aim is to develop a dynamic question-asking system which automatically learns effective and user-friendly strategies. The proposed approach is planned to be integrated in the daily workflow at the Dutch National Police, in order to provide increased efficiency and transparency in the processing of crime reports.

Keywords: Argumentation, Information Extraction, Relation Extraction

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

The ideas presented in this paper are part of a collaborative initiative of the Dutch National Police and Utrecht University for developing a framework for (semi-)autonomous business processes in the police organization, using technologies from text and data analytics together with computational argumentation and dialog. One project under the umbrella of this initiative concerns technologies to improve the intake of criminal reports submitted by civilians on the topic of online trade fraud, which concerns cases such as fake webshops and malicious second-hand traders on trading platforms (e.g., eBay). Around 40,000 reports are filed each year, and the legal background for trade fraud is a single article of the Dutch Criminal Code (art. 326) and a relatively small set of cases that are used as legal precedents. This high volume and the relative simplicity of such cases makes them ideal for further automated processing.

For the case of online trade fraud, the Dutch police currently collects online-submitted crime reports using a web interface which requires citizens to fill out several predefined fields (such as the name of the counterparty, bank account number, etc.) as well as a free text description of the situation. Using this information the police decides to either (a) discard the report because it does not concern trade fraud, (b) accept the report and include it in the police database for further processing, or (c) ask follow-up questions (by e-mail) to the complainant in case more information is needed. In the current situation, human analysts have to read through all incoming reports and decide on either (a), (b) or (c). To improve the efficiency of this assessment, we aim to develop a system that automatically determines the appropriate course of action given a report.

One way of handling (possible) trade fraud reports is to train an algorithm to automatically determine which action to take given a complete incoming report.
This was explored in previous research [KSBB17, va18], where classifiers were trained to classify reports as being of class (a - discard report) or of class (b - accept report), based on the elements of the fraud report (address of suspect, trade site that was used, shallow linguistic features). Given that the data is highly skewed – only 16% of the incoming reports is normally discarded by human analysts – the results are promising, with an F1-score of 67.5% for class discard, 95.2% for class accept and a macro-average F1-score of 80.8%.

One important issue with the above solution is that for a machine learning classifier it cannot be explained satisfactorily why a complaint was discarded or accepted. For example, one important feature that is used as input for the final classifier FC is the output of another classifier WC trained on the (lemmatized) words of the free text field. The explanation of FC's decision to accept a report is then, for instance, that the classifier WC gives a probability of 0.8 to accept, based on the occurrence of certain words (such as "never" and "tickets") in the report text. In a legal or law enforcement application, however, we need transparent explanations that make sense from a legal and common-sense perspective, not explanations that are based on certain patterns in the data. For example, we want to know that the complainant who filed the report bought tickets from the (suspect) counterparty, but that these tickets were never delivered.

In order to automatically assess trade fraud reports submitted online, we turn to a combination of symbolic, argument-based reasoning about a case (similar to [PS96]) and non-symbolic information extraction techniques that use machine learning. These extraction techniques are intended to find basic observations such as "this report concerns a ticket for a music concert", "money was paid by the complainant to the counterparty" and "nothing was delivered to the complainant", and to use these observations as premises in legal arguments to infer that, for example, the report concerns a possible case of fraud and should therefore be considered for further processing. Thus, the non-symbolic algorithms are fine-grained: the basic observations are close to sentences in the original report texts, so their occurrence can be explained by exactly those sentences, and more complex conclusions based on multiple factors in a case can be checked by means of the argumentation.

In the rest of this paper, we discuss the concepts of our intake system. The design and implementation of the system is part of ongoing research; the discussion in the current paper is therefore primarily intended to be conceptual (leaving a full evaluation for future work). The paper is structured as follows: Section 2 discusses the argumentation theory and inference mechanisms that constitute the basis of the automated reasoning about fraud. The process of collecting complaint information can be modeled in different ways, which is described in Section 3. One of the proposed approaches involves a dialog with the complainant, which requires a question-asking policy, as described in Section 4. As a prerequisite for argumentative inference, the basic observations need to be extracted from the input given by the complainant. For textual input, natural language processing (NLP) techniques are required for this task. The observations as used in the graph generally denote a relation between entities, e.g., a send-relation involving the complainant, the counterparty and a package as relation elements. The classification of entities and relations is described in Section 5, and the corresponding data annotation in Section 6. Section 7 concludes the paper and discusses next steps.

[Figure 1: Example argumentation graph. Observation nodes (shown with a gray background): not delivered, waited, paid, false location, false website. Rules R1–R4 connect these observations to the conclusions not sent, deception and fraud.]
2 Argumentation Theory

The Dutch Criminal Code defines fraud as "misleading through false contact details, deceptive tricks or an accumulation of lies". These elements can be traced back to observations or observable facts collected from the victim and relevant third parties. Based on the legal definitions in the Dutch Criminal Code, the relevant case law and knowledge of the working procedures of the police analysts who currently assess the fraud reports, we have constructed an argumentation theory about online trade fraud. To construct the argumentation theory, the right balance needs to be found in the level of detail for observations. On the one hand we want an observation to be directly observable from the input document, for instance 'no mention of payments occurs in this document'. On the other hand, observations that are too detailed lead to a large argumentation theory, which is more difficult to construct, maintain and use in argument inference. We try to find a balance by interacting with the police-side users of the system, such as the people that handle incoming complaints. If they think a statement is obvious from a document then we do not require an argumentation structure for that statement; such statements are candidates for becoming observables. For other statements, such as 'this document concerns fraud', it is not immediately obvious and we require some argumentation as to why a crime is committed in that case. Currently, we work with an argumentation theory of 46 rules and 26 observable facts [Ber18].

The argumentation rules and observables can be modeled in an argumentation graph, where (sets of) observations provide support or counter-evidence for other propositions. A simplified example argumentation graph is presented in Figure 1. Inference rules are conjunctive, e.g., 'if the package is not delivered and the complainant waited for a reasonable period of time then the package is not sent' (R1). The observation nodes are indicated with a gray background in Figure 1.

Once the graph is constructed, the observed nodes are used as input to infer conclusions. As per ASPIC+ [Pra10] definitions, we use inference trees as the data structure with which to represent arguments. An inference tree consists of a set of premises and a conclusion connected by rules, with possible intermediate conclusions (which are in turn premises for further conclusions) in between. For example, in Figure 1, the inference tree for the conclusion not sent contains the premises and the conclusion of rule R1, whereas the inference tree for the conclusion fraud contains all nodes in the graph. Arguments may attack each other because of inconsistent conclusions (rebutting attack) or because a conclusion contradicts a premise of another argument (premise attack). Given a set of arguments and the attack relation, we determine the set of acceptable arguments by calculating the grounded extension from Dung's abstract argumentation framework [Dun95]. The grounded extension contains all arguments that are conflict-free and that defend themselves against any attackers; that is, if argument A in the grounded extension is attacked by argument B which is not in the grounded extension, then there is an argument C in the grounded extension that attacks B and thus defends A. Other options than grounded semantics exist, but grounded semantics fit nicely with the conservative nature of legal processes and can be computed in polynomial time given the arguments.

Consider for example the situation that a package has been sent, but the recipient was not at home and the delivery service issued a note that the package has been returned to the sender. For the purposes of the example, assume that the counterparty in fact has bad intentions (e.g., sending a defective product) and has used a false address. In this case the propositions false location, not delivered, waited and paid are true, but not sent is observed to be false (given the note from the delivery service). Based on these observations and the argumentation graph presented in Figure 1, using a forward chaining algorithm the conclusions deception, not sent and fraud can be inferred. However, the conclusion not sent conflicts with the observation sent. Therefore not sent and the dependent conclusion fraud are not in the grounded extension, which consists of the observation set and the conclusion deception.
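The example can be made concrete in code. The following Python sketch is our own illustrative reading of Figure 1 (the rule names R1–R4 appear in the figure, but their exact bodies are assumptions; the actual theory has 46 rules): forward chaining derives every conclusion whose conjunctive premises hold, and a second pass blocks any conclusion whose negation is directly observed, together with everything that depends on it, mimicking grounded acceptance for this small acyclic graph.

```python
# Sketch of the Figure 1 example. The rule bodies below are illustrative
# assumptions; the paper's full theory contains 46 rules.
RULES = {
    "R1": ({"not delivered", "waited"}, "not sent"),
    "R2": ({"false location"}, "deception"),
    "R3": ({"false website"}, "deception"),
    "R4": ({"paid", "not sent", "deception"}, "fraud"),
}

def forward_chain(observations):
    """Derive every conclusion whose (conjunctive) premises hold."""
    derived, changed = set(observations), True
    while changed:
        changed = False
        for premises, conclusion in RULES.values():
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def grounded(observations):
    """Keep only conclusions that survive attacks by observations.

    A derived conclusion p is defeated when its negation ('not p', or p
    without the 'not ') is itself observed; anything inferred from a
    defeated conclusion is thereby defeated as well.
    """
    def negation(p):
        return p[4:] if p.startswith("not ") else "not " + p

    accepted, changed = set(observations), True
    while changed:
        changed = False
        for premises, conclusion in RULES.values():
            if (premises <= accepted and conclusion not in accepted
                    and negation(conclusion) not in observations):
                accepted.add(conclusion)
                changed = True
    return accepted

obs = {"false location", "not delivered", "waited", "paid", "sent"}
print(sorted(forward_chain(obs) - obs))  # → ['deception', 'fraud', 'not sent']
print(sorted(grounded(obs) - obs))       # → ['deception']
```

As in the text, forward chaining alone derives not sent and fraud, while the grounded pass keeps only deception, because the observation sent defeats not sent and everything built on it.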
When sufficient information is available, the argument inference will result in a stable state [1]. We say that a certain conclusion is stable if either A) an argument for it is included in the grounded extension and more information does not change this, or B) there is an argument for the conclusion but this argument, or any other argument for the conclusion, can never be in the grounded extension, or C) no argument can be made and neither will this be possible with more information. For instance, consider a case where the counterparty has refunded the payment to the complainant. In that case, there is no legal basis anymore to convict the counterparty of fraud. So if this proposition is observed for a case, then the system can establish that there will never be an argument for fraud in the grounded extension. The information necessary to reach a stable state needs to be provided by the complainant (possibly combined with information from third parties, such as banks or trade websites). The interaction with the complainant can be modeled in different ways, which will be discussed in the next section.

[1] Stability is not fully calculated (due to computational complexity). Instead, we deploy a heuristic that runs in time polynomial in the number of argument graph edges.
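The stability conditions can be illustrated with a brute-force check that, unlike the deployed polynomial heuristic, simply enumerates every possible completion of the still-unknown observables and requires the verdict to be identical in all of them. The mini-theory below (including the refunded observable and its blocking effect) is a simplified stand-in for the actual 46-rule theory.

```python
from itertools import product

# Illustrative mini-theory (an assumption, not the paper's actual rules):
# fraud requires deception, a missing delivery and a payment, and is
# blocked when the payment was refunded (condition B in the text).
RULES = [
    ({"false location"}, "deception"),
    ({"not delivered", "waited"}, "not sent"),
    ({"paid", "not sent", "deception"}, "fraud"),
]
OBSERVABLES = ["false location", "not delivered", "waited", "paid", "refunded"]

def derives_fraud(observations):
    derived, changed = set(observations), True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return "fraud" in derived and "refunded" not in observations

def is_stable(known_true, known_false):
    """Brute-force stability: the fraud verdict must be identical in
    every possible completion of the still-unknown observables."""
    unknown = [o for o in OBSERVABLES
               if o not in known_true and o not in known_false]
    verdicts = set()
    for values in product([True, False], repeat=len(unknown)):
        obs = set(known_true) | {o for o, v in zip(unknown, values) if v}
        verdicts.add(derives_fraud(obs))
    return len(verdicts) == 1

# Once a refund is observed, no completion can yield fraud: stable.
print(is_stable({"refunded"}, set()))  # True
# With everything unknown, the verdict still depends on future answers.
print(is_stable(set(), set()))         # False
```

This exhaustive check is exponential in the number of unknown observables, which is exactly why the deployed system resorts to the polynomial heuristic mentioned in the footnote.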
3 User interaction

As mentioned earlier, the Dutch police currently collects a report (including free text but also predefined fields for addresses, trade sites, etc.) using a web interface. The argumentation system as described in Section 2 can be based on this document by providing a conclusion (i.e., fraud or not fraud) if a stable state is reached, and suggesting to ask follow-up questions otherwise. Here, the full report document is used as input to instantiate propositions in the argumentation graph.

Alternatively, the user interaction model can be changed into a dialog paradigm. In this case the complainant does not file a report document; instead the system guides the complainant through the reporting process by asking a number of questions. After each question the argumentation graph is updated using the reply of the complainant, and the dialog is finished when the argumentation reaches a stable state. The questions can be selected dynamically, such that the argumentation advances towards a stable state with each question. This approach is similar to the current practice for reporting a crime at a police station, where a police officer asks a number of questions in order to fill out a crime report.

Note that, for practical purposes, the two approaches can be considered as opposite ends of the same methodology, i.e., providing a complete document to the argumentation graph is essentially a dialog with a single user response. Similarly, a question within a multi-step dialog can result in a complex user response which can be considered a short document. Regardless of the length of the dialog, the answers need to be parsed and processed in order to extract relevant information. This could be avoided by using closed-form questions with a list of predefined answers (e.g., 'Did you receive a package?', 'How long did you wait?'); however, such a dialog may prove to be insufficient for users to explain the details of the situation.

4 Question policy learning

When using a dialog between the system and the complainant, we want the system to get to a stable state as efficiently as possible. Determining whether a state is stable consists of hypothesizing over all possible future questions. As this is generally infeasible to do, we turn to machine learning methods to train a questioning strategy that approximates an ideal solution.

The policy that is to be learned maps observed propositions to questions that can be asked or to a terminating action (accept/reject). An action results in some response from the user, which consists of new observations and possibly inferred conclusions (both propositions) that are added to the already known propositions. As a result, we may view the state of the system as a set of propositions and the actions as non-deterministic transitions between states. If we model this as a Markov Decision Process, then we can use Q-learning [Wat89] to train a policy. Note that this requires the assumption that the answer to a question is independent of how the current set of propositions was obtained. For our Q-learning approach we require a reward function. In order to promote efficient dialogs, we give a small penalty for each action. To promote stability, we give a high reward for reaching stable states. Finally, alongside a reward function we need a user model that realistically provides responses to questions (the probabilities of transitions in the Markov Decision Process). To this end we currently work with hand-written models. When the system is deployed it will gather user data, and a data-driven model will then replace the initial model.

As an example, using the argumentation graph in Figure 1, consider the state in which false location is known to be true and all other propositions are unknown. Suppose the Q-learning algorithm selects 'ask for false website' as the next action to evaluate. This question can support deception as a conclusion, however this conclusion was already supported by false location. The new state after asking this question therefore has the same reward value as the previous state, while the penalty is increased by performing the question action. This will lead the Q-learning algorithm to reject this state-action pair as part of the policy, and to consider alternative actions instead.
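The dialog setting described above can be sketched as a small tabular Q-learning loop. Everything concrete below — the question list, the hand-written answer probabilities and the placeholder stability test — is an assumed stand-in, not the project's actual model; only the reward structure (a small penalty per question, a high reward on reaching a stable state) follows the description above.

```python
import random
from collections import defaultdict

# Hand-written user model (hypothetical probabilities): the chance that
# the complainant answers "yes" to each question, independent of history.
USER_MODEL = {"false location": 0.4, "false website": 0.3,
              "not delivered": 0.8, "waited": 0.7, "paid": 0.9}
QUESTIONS = list(USER_MODEL)

def is_stable(state):
    # Placeholder stability test: a deception ground is established and
    # the payment/delivery facts are all settled (either way).
    settled = {q for q in QUESTIONS if q in state or "not " + q in state}
    return bool(({"false location", "false website"} & state)
                and {"not delivered", "waited", "paid"} <= settled)

def step(state, question):
    """Simulate asking one question; returns (next_state, reward)."""
    yes = random.random() < USER_MODEL[question]
    nxt = frozenset(state | {question if yes else "not " + question})
    return nxt, (100.0 if is_stable(nxt) else 0.0) - 1.0  # -1 per action

def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state = frozenset()
        while not is_stable(state):
            open_qs = [q for q in QUESTIONS
                       if q not in state and "not " + q not in state]
            if not open_qs:
                break
            if random.random() < eps:
                q = random.choice(open_qs)
            else:
                q = max(open_qs, key=lambda a: Q[(state, a)])
            nxt, r = step(state, q)
            nxt_qs = [a for a in QUESTIONS
                      if a not in nxt and "not " + a not in nxt]
            best_next = 0.0 if is_stable(nxt) else max(
                (Q[(nxt, a)] for a in nxt_qs), default=0.0)
            Q[(state, q)] += alpha * (r + gamma * best_next - Q[(state, q)])
            state = nxt
    return Q

random.seed(0)
Q = train()
start = frozenset()
policy_first = max(QUESTIONS, key=lambda a: Q[(start, a)])
print("first question:", policy_first)
```

Because every question carries a penalty, the learned policy avoids questions that cannot change the reachable reward, which is exactly the false-website effect described in the example above.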
5 Extracting entities and relations

As described in Section 3, user input (either from report documents or from dialog responses) needs to be mapped to propositions in the argumentation graph. These propositions generally consist of a relation between relevant entities (people, objects, locations, etc.) described in the complaint. Various techniques can be used to extract entities and relations between entities from text, ranging from dictionary lists and syntactic patterns to complex parsing algorithms and machine learning models. For Dutch legal data the Dutch dependency parser Frog [BCD07] can be used for named entity recognition, for which the performance on legal data was evaluated in previous work [SBB17]. For relation extraction, the development and evaluation of automatic methods is an ongoing effort in the current research project, as described in the remainder of this section.

In order to use these techniques effectively in a law enforcement application, the expected result from text processing should be considered carefully. Given the domain, for example, knowing whether the victim has paid the counterparty is essential, whereas other information containing entities (e.g., details of contacts with other victims) is not relevant for legal reasoning. The relevance of certain types of information determines how data should be collected and processed in developing entity and relation extraction methods. This includes a mapping from nodes in the argumentation graph to entities and relations in the text. However, other propositions may not be represented directly in the text, such as the use of a false website. In such cases, partial information may be present in the text (e.g., the counterparty operated a website), while the proposition itself can only be validated after considering information from a third party (e.g., checking with the ISP to prove that the website is fake). In both cases, however, the legal definition of the crime (as expressed in the argumentation graph) is essential for the development of text processing methods.

6 Data annotation for relation extraction

As we stated in Section 5, some of the propositions in the argumentation graph are based on relations between entities in the crime report documents. We plan to use supervised machine learning techniques (see for example [XML+15]) to automatically extract these relations, which have been shown to provide high accuracy for the current dataset in preliminary experiments. Therefore, crime report documents need to be annotated with the concepts identified in the domain analysis process. Concepts of interest include residence, payment and delivery information. Each concept has a number of associated properties for which annotation could prove useful. These properties are listed in Table 1.

Table 1: Examples of properties of interest for annotation.

  concept        property
  residence      person name, location, large distance;
                 role: complainant, counterparty, related, unrelated
  send, receive  sender, recipient, object;
                 indicator of relation;
                 validity: true, false, unclear;
                 type: product, payment, contact, other;
                 roles sender, recipient: complainant, counterparty, other;
                 object: fake, broken, other

Residence relations are interesting as they may indicate the deceptive trick in which the fraudster gives a false address. In that case, we often find in the report that the actual occupant of the address was an unrelated person who did not know anything about the advertisement. Furthermore, the address is often in a remote location, facilitating the fraudster to (falsely) promise to send items per mail. To be able to detect these situations in the future, we annotate the person name, the location, the role of the person and the large distance property. In future research the processing of coreferential expressions (e.g., the token he referring to an earlier mention of John Smith in a document or dialog) will be addressed.
The the development of text processing methods. send and receive relations are ternary, having a sender, receiver and object, although some of the entities may 6 Data annotation for relation extrac- be omitted in the text: for instance, in the sentence ‘I did not receive anything’ the sender is missing. For tion the sender and receiver, we annotate the corresponding As we stated in Section 5, some of the propositions character indices and the role (complainant, counter- in the argumentation graph are based on relations be- party or other). In some complaints, the complainant tween entities in the crime report documents. We plan reports that he or she received a broken product. This to use supervised machine learning techniques (see for suggests a civil case instead of a fraud case. There- example [XML+ 15]) to automatically extract these re- fore, we annotate not only the character indices but lations, which has shown to provide high accuracy also the state of the product. Furthermore, we anno- for the current dataset in preliminary experiments. tate the word(s) indicating a send or receive relation Therefore, crime report documents need to be anno- and the validity of the relation. An example anno- tated with the concepts identified in the domain anal- tated sentence (translated for illustration purposes) is ysis process. Concepts of interest include residence, provided in Figure 2. payment and delivery information. Each concept has We plan to use the annotations in a classifier that, a number of associated properties for which annota- given a set of tokens, decides if they are entities in tion could prove useful. These properties are listed in one of the aforementioned relations. The output of Table 1. this classifier can then be mapped to propositions for Residence relations are interesting as they may indi- the argumentation graph. 
We plan to use the annotations in a classifier that, given a set of tokens, decides whether they are entities in one of the aforementioned relations. The output of this classifier can then be mapped to propositions for the argumentation graph. Such a classifier is intended to operate on free text input, using simple features such as the presence of selected keywords as well as more complex features such as lemmas or grammatical dependency paths. For real-world free text the computation of these features may be unreliable (e.g., as a result of misspellings in the source text) or, even with reliable features, a real-world example may not conform to the regularities found in the training set. However, using a suitable classifier and an appropriate training set size, the model is expected to generalize over such irregularities to a certain extent. Moreover, as mentioned in Section 3, using a dialog component within the system will provide some context to interpret the results of the relation classifier.
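The intended pipeline — keyword-presence features, a relation decision, and a mapping to a proposition for the argumentation graph — can be illustrated with a tiny perceptron over keyword features. The keyword list, the four training sentences and the perceptron itself are stand-ins for the project's actual classifier and data, shown only to make the feature-to-proposition flow concrete.

```python
# Keyword-presence features -> send-relation decision -> proposition.
# Keywords, training data and model are illustrative assumptions.
KEYWORDS = ["transferred", "paid", "sent", "received", "nothing", "never"]

def features(sentence):
    tokens = sentence.lower().split()
    return [1.0 if kw in tokens else 0.0 for kw in KEYWORDS]

def train_perceptron(data, epochs=20):
    """Plain perceptron updates; label 1 = a send relation is present."""
    w, b = [0.0] * len(KEYWORDS), 0.0
    for _ in range(epochs):
        for sentence, label in data:
            x = features(sentence)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != label:
                for i, xi in enumerate(x):
                    w[i] += (label - pred) * xi
                b += (label - pred)
    return w, b

TRAIN = [
    ("i transferred the money to his account", 1),
    ("i paid for the tickets", 1),
    ("the advertisement looked professional", 0),
    ("we met at the market", 0),
]
w, b = train_perceptron(TRAIN)

def to_propositions(sentence):
    """Map a detected send relation to a proposition for the graph."""
    score = sum(wi * xi for wi, xi in zip(w, features(sentence))) + b
    return {"paid"} if score > 0 else set()

print(to_propositions("I transferred 100 euro but never received the package"))
# → {'paid'}
```

A production version would of course replace the bag of keywords with the lemma and dependency-path features mentioned above, but the output contract stays the same: a set of propositions that can be fed to the argumentation graph.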
7 Conclusion

In this paper we have described an approach to automatically handle the intake of criminal reports filed online by citizens. The proposed approach combines different types of techniques (i.e., natural language processing, argumentation and Q-learning) to obtain a system that A) handles natural language, B) produces arguments for complex conclusions and hence provides understandable and legally sensible explanations for decisions regarding complaint reports, and C) is capable of gathering information from its environment efficiently, by only asking the most relevant questions to the user and terminating the process if no more relevant information is to be found.

The algorithms and implementations presented in this paper are currently under development, and a number of prototypes are working or nearing completion. Furthermore, parts of the system, such as automatically drawing conclusions using the argumentation graph, the named entity recognition and basic relation extraction, have been implemented in the existing development systems at the Dutch National Police.

The techniques developed are generalizable beyond the domain of online trade fraud. Extending the system to other domains will involve a substantial (knowledge) engineering effort: argumentation theories will have to be built for different domains, and algorithms for extracting new types of observations will have to be trained. While our solution thus suffers from the classical "knowledge engineering bottleneck" that has hampered knowledge-based systems for decades, we believe the focus on smaller, relatively simple assessments makes truly autonomous systems more feasible. Furthermore, building and maintaining a small argumentation theory may be more suitable for general IT personnel at the Dutch Police than training machine learning algorithms on a new dataset. Finally, the algorithms for entity and relation extraction are aimed to be as general as possible, with good performance in different domains. Thus, other tasks and processes within the police organization can be gradually incorporated into the framework.

References

[Ber18] Jeroen Bergers. Improving online trade fraud complaint handling using argumentation theory. BSc thesis, Utrecht University, 2018.

[BCD07] Antal van den Bosch, Bertjan Busser, Sander Canisius, and Walter Daelemans. An efficient memory-based morphosyntactic tagger and parser for Dutch. In Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, pages 99–114. Netherlands Graduate School of Linguistics, 2007.

[Dun95] Phan Minh Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77:321–357, 1995.

[va18] Ilse van 't Hul. Improving online trade fraud complaint classification by applying machine learning techniques. BSc thesis, Utrecht University, 2018.

[KSBB17] William Kos, Marijn Schraagen, Matthieu Brinkhuis, and Floris Bex. Classification in a skewed online trade fraud complaint corpus. In Preproceedings of the 29th Benelux Conference on Artificial Intelligence, pages 172–183, 2017.

[Pra10] Henry Prakken. An abstract framework for argumentation with structured arguments. Argument & Computation, 1(2):93–124, 2010.

[PS96] Henry Prakken and Giovanni Sartor. A dialectical model of assessing conflicting arguments in legal reasoning. Artificial Intelligence and Law, 4:331–368, 1996.

[SBB17] Marijn Schraagen, Matthieu Brinkhuis, and Floris Bex. Evaluation of named entity recognition in Dutch online criminal complaints. Computational Linguistics in The Netherlands Journal, 7:3–16, 2017.

[Wat89] Chris Watkins. Learning from Delayed Rewards. PhD thesis, King's College London, 1989.

[XML+15] Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of EMNLP 2015, pages 1785–1794. Association for Computational Linguistics, 2015.