Manual and Automatic Annotation of Meeting Reports with Young Offenders for Quality Assessment of Interventions

Pierre André Ménard (1), Sylvie Ratté (2), Geneviève Parent (3), Franck Barbedor (2)
(1) Computer Research Institute of Montréal, (2) École de Technologie Supérieure, (3) Université du Québec en Outaouais
pierre-andre.menard@crim.ca, sylvie.ratte@etsmtl.ca, genevieve.parent@uqo.ca, franck.barbedor.1@ens.etsmtl.ca

Abstract
We present an annotation project in criminology using meeting reports between clinicians and criminalized young offenders. The domain-specific goal is to assess the quality of the interventions versus the profile of criminal needs established for each offender. The project requires both the manual annotation of a significant number of reports by experts as well as the development of an automatic annotation process to classify the unannotated reports. Both annotation experiments help identify the needs and challenges of providing helpful, linguistically relevant annotations for this type of task. The performance of a first classification effort is reported along with the related manual process.

1. Introduction

Organizations often collect masses of textual information from their daily activities, storing and using it to provide better monitoring for managers. But using this information to assess the quality of activities is no small task, often requiring in-depth analysis, annotations and, depending on the quantity of data, machine learning methods to extract useful knowledge and give a clear view of the process and its output.

This is one such project, involving textual records of meetings between clinicians and teenagers convicted of various offences, who received a penal sentence and attend mandatory follow-up meetings in relation to their convictions. These records contain much information about young offenders as well as the type and focus of the interventions done by the clinicians.

From a criminology perspective, the goal of this research project is to validate whether the interventions are relevant to the criminal profile of the young offenders (YLS/CMI (Hoge and Andrews, 2010)). This matters for the quality of these activities, as research shows (Baglivio et al., 2018; Bonta et al., 2008) that interventions targeting the relevant risk factors diminish the risk of reoffending of young offenders. On the other hand, doing interventions on irrelevant aspects is counterproductive and time-consuming.

From a computational linguistics perspective, the goal is to enable the automatic classification of thousands of reports using manual annotations from experts. Correctly classifying these reports will provide better insight into the monitoring process of young offenders. Manual annotations also provide insights about which linguistic or semantic knowledge is relevant to identify the different types of interventions.

The following section presents a more in-depth context of the project, the analyzed data and the type of information sought by experts in the criminology field. Section 3. presents an overview of the manual annotation process. The fourth section provides insights about the requirements and challenges for this task, followed by classification experiments using the annotated data. The conclusion presents the current state of the project.

2. Context

In Canada, since the Youth Criminal Justice Act came into force in 2003, each time a teenager is convicted in court and receives a sentence, an organization responsible for youth protection takes action to protect the public and promote the rehabilitation and reintegration of the youth. To do so, many countries, including Canada, rely on the Risk-Need-Responsivity intervention model (well known as the RNR model) (Andrews and Bonta, 2010). The RNR model is one of the most effective, that is, the one most likely to reduce recidivism among juvenile offenders (Dowden and Andrews, 1999; Koehler et al., 2013; Lipsey, 2009). First, the clinician responsible for the follow-up with the young offender reduces the risk of future offences by establishing a risk profile for each teenager. This profile is built from interviews and questionnaires filled out with the offender, like the Youth Level of Service / Case Management Inventory (YLS/CMI) (Hoge and Andrews, 2010), to target his criminogenic needs.

Criminogenic needs are dynamic risk factors, like antisocial attitudes, associated with reoffending that can be reduced with clinical interventions. A clinical intervention (hereafter called intervention) is defined as any discussion, communication or interaction between a clinician and a young offender that aims to reduce a risk factor. The offender's criminogenic needs profile is usually established once after the sentence is received and every six months thereafter for cases that exceed this duration. Not all offenders have the same profile. It is therefore important to adapt the interventions according to the needs of each person.

For each sentence received by a young offender, a specific period is imposed by the judge during which he or she will meet with a clinician in order to help reduce the risk of reoffending. For each of these meetings, the clinician must detail in a report where it happened, the interactions, the topics discussed, the exchange of information, news related to the sentence, etc. By reading a report, an expert should be able to judge whether the described interventions are aligned with the profile of the offender or whether they were on another topic.

These reports and other activity entries (requests for medical record access, added documents, etc.) related to a case are stored in a secured database for five to seven years following the end of each sentence, at which time they are destroyed. This amounts to more than 150,000 reports for the entire follow-up period for our sample of 750 young offenders. Of these entries, about 30% are reports relevant to our project in which an intervention could take place.

2.1. Intervention types

Each intervention falls into one of the following categories:
— Administrative
— Antecedents
— Attitude
— Consumption
— Family/Couple
— Hobbies
— Peers
— Personality
— Occupational (school/work)

The Administrative category contains any discussion about the conditions of the penal case (contact restrictions, apology letters, etc.), the mandatory community work, or the intervention plan. While these are not interventions aimed at lowering reoffence risk, they often take up a large part of meetings, often in competition with useful interventions of other categories.

Antecedents relates to reinforcing beliefs or alternative behaviors which lower the chance of criminal reoffence in the presence of a high-risk situation.

Attitude regroups interventions that seek change in motivation, valorisation of prosocial institutions and value restructuring to recognize antisocial attitudes or a criminal lifestyle and promote alternative prosocial identities and attitudes.

Consumption interventions favor the reduction of alcohol and drug consumption or abuse, reduce the personal and interpersonal behaviors that lead to consumption and develop new substitutes for these habits.

The Family/Couple category contains interventions about developing or maintaining positive family relationships, respecting house rules and supervision, as well as valuing a couple relationship with a prosocial partner with a long-term or positive outcome.

Any intervention that fosters participation or engagement in an organized prosocial activity like sports, gym, extracurricular or religious activities falls into the Hobbies category.

Interventions about Peers target the reduction of interactions with criminals and the valorisation of relationships with prosocial persons.

Trying to help the young offender cope with Personality issues can also reduce the risk of reoffence. This includes discussions about anger management, improving problem-resolution skills, and discouraging manipulation of others or egocentrism.

Finally, Occupational interventions relate to school or work and can include helping and accompanying the offender through the school registration procedure, valuing active participation and attendance at either school or work, pointing out the positive rewards brought by an occupation and developing positive relationships with new colleagues or persons of authority.

2.2. Criminology research objectives

As mentioned earlier, studies in criminology show that interventions aligned with the criminogenic needs of a young offender reduce the chance of recidivism. Acting on this knowledge, the main goal of this project from a criminology standpoint is to validate whether the interventions described in the reports are aligned with the profile of each individual. For example, if a young offender is sentenced for theft, but his criminogenic needs profile reveals that he stole to support his consumption habits, interventions should mainly target this last topic. Focusing the interventions on the first topic would thus be misaligned and much less effective, if effective at all.

In order to attain this goal, the reports of each young offender must be annotated, either manually or automatically, to give an estimate of the number of interventions for each category. This will enable researchers to compare the estimated number of typed interventions with the main criminogenic risk of each young offender to validate whether they match.

A secondary goal is to assess the intrinsic quality of the reports, validating whether the interventions are clearly mentioned and explained. While this is not part of the current project involving natural language processing techniques, a qualitative evaluation will be done following the manual annotation effort to make recommendations to improve future reports. Providing a more detailed account of interventions might help the automatic annotation of future reports and thus enhance the performance of natural language processing tasks like classification or information extraction.

2.3. Computational linguistic aspects

In order to correctly assess the alignment between interventions and risk profile, each intervention reported by clinicians should be fully identified for each meeting regarding a case. We obtained a subset of more than 56,000 reports related to various cases, which is too much data for manual annotation alone. Also, as new meetings occur every week, this would be an ongoing manual task that would require much effort.

All the reports are written in Canadian French, as the organization targets mostly French-speaking young offenders. While linguistic analysis tools exist for French, none are trained on the Canadian French variation and, more importantly, on its register, which typically includes more anglicisms as well as different idiomatic expressions and semantic senses. In addition, most reports contain various amounts of abbreviations, truncated words, missing words, missing letters, typos, domain-specific lingo, agglomerated words (missing spaces), implicit acronyms, colloquial terms, missing punctuation, anglicisms and spoken sentence formulations. As such, the reports can be viewed as noisy texts, which implies that usual natural language processing tools will fail to analyze them correctly.

The granularity of annotation should also be tuned to fit the need for precision, either for the estimated number of interventions of the different categories or for the exact expressions used to detail an intervention. As such, two natural language processing tasks can be devised in order to obtain the necessary information: multilabel report classification, and intervention recognition and typing.

Multilabel report classification implies the automatic annotation of a report with all the categories corresponding to the detailed interventions mentioned in it. While it does not give an exact count when multiple annotations of the same category are found in one report, it still gives a good estimate, as reports with multiple annotations usually contain different categories instead of multiple instances of the same one.

Intervention recognition and typing is akin to named entity recognition and typing, as relevant candidate expressions must first be identified in a text and then classified into one of many types. As the exact expression is not needed, we view this as a sentence-level classification, since there is seldom more than one intervention in a single sentence.

As these reports are highly confidential, largely involving minors, no similar training data was available to help the classification process, thus prompting a manual annotation effort. As these data were obtained through an ethics committee and a court hearing, the data cannot be released publicly. The examples shown in this article have been redacted to remove any possibility of individual identification.

3. Annotation process

To simplify the interaction between the experts of each team from different institutions, we used the online text annotation platform PACTE (Ménard and Barrière, 2017) as the central repository for manual annotation, annotation curation and classification results for this project. PACTE enables an annotation project manager to import large text corpora, define custom annotation schemas, add participants to a project, define a project's steps and allocate documents to be annotated by the project's participants, as shown in Figure 1 (with unrelated text for confidentiality).

Figure 1 – PACTE manual annotation interface.

3.1. Dataset

In order to have a significant quantity of reports to train and evaluate the machine-learning algorithm, 10,811 single reports from randomly chosen young offenders' cases were selected. For each individual case, all the reports were taken, thus giving a full historical account of each case.

These reports were split into two sets for training and evaluation. The training set contains 8,189 randomly selected reports, while the remaining 2,622 reports were kept aside for the evaluation set. A case-based random split was applied, which means that all the reports from one case are entirely found in either the training or the evaluation set.

Both datasets were split into batches of 500 documents, with a larger last batch for the remaining documents. This was done in order to better organize the work of annotators in the subsequent steps and provide them with a positive sense of advancement throughout the entire effort. The batches were then uploaded into PACTE as separate corpora but included in the same annotation project.

Annotation type         Training          Evaluation
                        Ann.     Doc.     Ann.     Doc.
Administrative          2,114    1,783    553      484
Antecedents             9        9        1        1
Attitude                79       76       6        5
Consumption             113      113      31       31
Family and couple       55       53       15       15
Hobbies                 367      365      47       47
Occupational            1,817    1,597    584      492
Peers                   77       74       17       16
Personality             333      302      49       44
Without annotations     —        4,720    —        1,637
Total                   4,964    8,189    1,303    2,622

Table 1 – Training and evaluation set distribution for annotations (Ann.) and documents (Doc.).

Looking at Table 1, we can readily see that the datasets are heavily unbalanced. For the training set, 79.2% of all annotations come from either the Administrative or the Occupational category. Antecedents makes up less than 0.2% of the entire set. This will likely require further manual annotation of this type to enrich the sample size.

3.2. Annotation schemas

Using the schema designer in PACTE, we defined nine different schemas corresponding to the nine categories listed at the top of Section 2.1. For each of them we defined an attribute to specify the type of risk targeted by the intervention. For example, the Consumption schema has the Reduction and Solutions values for the type attribute, while Family/Couple has Relationship, Supervision and Couple as the enumeration for the type attribute. One exception is the Occupational category, which has a type attribute with School and Work and a subtype attribute with Help, Participation, Engagement, Satisfaction and Relationship. All the type and subtype attributes were defined as mandatory when creating a new annotation in PACTE. Finally, an additional Comment attribute was added in order for the annotators to provide additional information to the curator about uncertain annotations or edge cases.

We defined each schema as an annotation targeting the text surface (as opposed to document or corpus annotation). A text surface annotation schema in PACTE enables the annotators to create contiguous zones spanning from one letter to the whole document. It also enables them to create a single annotation with multiple segments. This was quite useful, as the reports often contain contextual information between parentheses or in apposition which is not part of the interventions. Using multipart annotations in this case provided a way to target precisely the sentence parts containing the relevant information.

3.3. Workflow

Once the reports and annotation schemas were imported and created in the platform, the annotation process could begin, as shown in Figure 2. The first step (1) was done by two annotators who collaboratively annotated each batch of 500 reports with the web user interface. The curator was then able to review (2) and possibly correct the manual annotations for this batch of reports. After retrieving all the available manual annotations via an online web API, a machine-learning algorithm was used to train a classification model (3) with the N-1 batches and automatically annotate the last batch of reports (4) in a separate storage. Using the user interface, the curator could then (5) validate the performance of the classification and analyze potential issues.

Figure 2 – Annotation process.

The order of the selected reports was randomized to minimize the chance of two contiguous reports on the same case appearing during an annotation session, thereby reducing the prior knowledge issue where two reports of the same case with similar information would be annotated differently.

3.4. Effort metrics

Table 2 shows some statistics for the manual annotation effort of this project. The project in PACTE had 21 steps, one for each of the 16 batches for training and 5 for evaluation. This amounts to approximately 189 hours of annotation for each annotator, giving a total of 378 hours for this project. For each batch, half of the reports had annotations, for an average of 274 annotations.

Avg. time per batch      9 hours     Avg. number of annotations           274
Avg. time per document   1 min       Avg. annotations in single report    1.41
Avg. size of reports     19 words    Max. annotations in single report    5

Table 2 – Annotation effort statistics per batch.

Still, the unannotated reports took time to process, as some had information that led to a discussion about whether it was an intervention or not. Of course, some reports were very short (a few words, like "He didn't show up and we rescheduled for later") while others were many paragraphs long.

4. Challenges

Automatic annotation of the reports in this project proved a challenge, as there were no off-the-shelf tools suited to the enrichment of these texts. This section presents the analysis of some aspects of the data which represent a challenge for automatic processing.

4.1. Noisy data

As the processed documents in this study are internal reports, often hastily written at the end of the day, most of them contain misspellings, typos, phonological writing, structural inconsistencies and so on.

There are also many truncations (e.g. "reso" for "résolution", "ds" for "dans" ["in"]), abbreviations and acronyms used across the reports, the amount depending mostly on the author of the report. For acronyms, the implicit short forms (without an explicit link to the long form) are often used, as the report is intended for readers familiar with the domain of activity. This can hinder information extraction tasks applied to the dataset if no external reference list is used to explicitly link the two forms. It might also reduce the performance of the bag-of-words approach, as concepts with multiple different surface forms will be separated in the tf-idf processing.

4.2. Report versus reality

One key issue for annotation is trying to differentiate between the young offender simply relating an event or fact and the clinician making an active intervention on the same subject. Because reporting this difference is not a requirement asked of clinicians, there is much variation in the ways it is expressed in the reports. As an example, one could only report that "He told me that he quit school", which does not count as an intervention. On the other hand, if the previous sentence was followed by "I asked him what he intends to do next", this would be considered an active intervention and would have to be annotated as such. Then again, if there are discrepancies between what was said at the meeting and what appears in the report, like missing information about the intervention, neither a human nor a machine-learning agent could deduce what happened.

As there is no way of knowing, without recorded proof, exactly what was discussed and how during meetings, the manual annotation was done in an optimistic mindset. This implies that what may look like a young offender simply telling the clinician about something was annotated as an intervention. This will be taken into account when estimating the number of interventions of each type, as the number will likely be inflated.

4.3. Expressing interventions

Without regard to what actually took place, the texts narrate the history of discussions, attempts, failures and commitments. Despite their simplicity, each snippet contains tacit knowledge and presents subtle characteristics. The analysis presented in the next subsections provides potential goals for automated annotation tools in order to help the detection of interventions in reports.

Expression     Examples
Add            "Nous ajoutons dans son CV" (We add to his resume)
Address        "J'aborde la révision..." (I address the revision)
               "On aborde chacun des point" (We address each point)
Admit          "Admet impulsivité." (Admits impulsivity)
Announce       "Je lui annonce" (I announce to him)
Ask            "Je lui demande si" (I ask him if)
Congratulate   "Le félicite d'emblée pr..." (I readily congratulate him for)
Discuss        "Discutons du plan..." (Discussing the plan)
               "Discussion sur" (Discussion on)
               "nous avons discuté des ..." (We have discussed the)
Do             "Nous faisons une première ébauche" (We do a first draft)
               "On fait ensemble son devoir" (We do her homework together)
Explore        "Nous tentons d'explorer ses pensées..." (We try to explore his thoughts)
Expose         "Je lui expose la situation" (I lay out the situation to him)
Explain        "M'explqieu qu'entre chaque cours" ((He) explains that between each course)
Inform         "Informons que nous avons" (We inform that we have)
Invite         "Je l'invite à faire les bons choix" (I invite him to make the right choices)
Mention        "je lui mentionne que" (I mention to him that)
Question       "Questionne à savoir où il se trouve" (Question to know where he is)
Respond        "Je lui répond que..." (I answer him that)
Return         "Retour sur la révision" (Return on the revision)
Read           "Je lui lit les conditions" (I read him the conditions)
Reinforce      "Je le renforce en le félicitant" (I validate him by congratulating him)
Repeat         "Nous devons lui faire répéter certians propos" (We must make him repeat some points)
Talk           "Nous parlons de ses travaux" (We talk about his work)
               "Lui parlons du ..." (Talk to him about the)
Try            "On tente de mêtre en place" (We try to put in place)
Underline      "Je lui souligne aussi que" (I also point out to him that)
Understand     "Il semble comprendre" (He seems to understand)

Table 3 – Samples (verbatim, noise included) of expressions used for interventions.

4.3.1. Speech acts in interactions

From the speech acts (Searle, 1969) perspective, the narratives contain many constative expressions that represent a state of things or the recollection of an ascertainment by the clinician (e.g. the expressions Add, Address, Announce, Discuss, Explore, Expose, Inform, Return, Read, Repeat and Talk, as shown in Table 3). They are pervasive in each category and reflect the continuing interaction with, and accompaniment of, the youth.

On the other hand, reinforcement expressions (e.g. Congratulate, Reinforce, Underline) and commissive expressions (e.g. Admit) are absent or nearly absent from the Antecedents and Family/Couple categories. These ratios are understandable since both Personality and Consumption are part of a solution that can be within the control of the young offender, and thus merit reinforcement and commitment, while Antecedents and Family/Couple are less likely to be solved directly by the youth.

Directive expressions (e.g. Ask, Explain, Invite, Question, Respond) appear with high ratios in the Administrative and Occupational categories, as they consist of the clinician transferring administrative information to the young offender, or of the latter informing the clinician about his everyday activities such as school and work.

In terms of usage, groups of categories are positively correlated in their use of these last three types of expressions (reinforcement, commissive and directive): Consumption with Personality and Hobbies, Antecedents with Family/Couple, and Attitude with Peers.

4.3.2. Explicit versus implicit discussion

The first and third person pronouns are often used together, but the third person alone may express either a passive event or an interaction, depending on the accompanying verb.

4.3.3. Implied third person

Since each snippet of text is intended to be read by people in the field, the youth and the clinician are often mentioned implicitly (e.g. "J'aborde [avec lui]"). The same can also be noticed for specialized subjects (e.g. "le positif du suivi") for which the intended reader understands the implied meaning without further explanation. In this specific example, "Le positif du suivi. Un endroit pour extérioriser ses émotions, pour ventiler." (The positive side of the follow-up. A place to externalize his emotions, to vent.), we can probably surmise that this was not a spontaneous expression of a personality aspect, but derived from a discussion.

4.4. Structure of interventions

While most interventions are expressed in the same sentence fragment, some of them span multiple sentences. For example, the commissive-commissive-reinforce structure often unfolds over three separate sentences, the first two commissive sentences bringing contextual knowledge to the last reinforcement expression. Using this type of structure and other relevant patterns across sentences could help to better identify important interventions and annotate them with the complete contextual knowledge.
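The cue expressions of Table 3 suggest that a simple lexicon-based detector could serve as a first pass before classification. The following sketch is purely illustrative and is not the system described in this paper: the cue lists are abbreviated, accent-stripped stems taken from Table 3, the speech-act grouping follows Section 4.3.1., and the accent normalization is one hypothetical way to tolerate the noisy spellings discussed in Section 4.1.

```python
import unicodedata

# Abbreviated cue lexicon derived from Table 3, grouped by speech-act family
# as in Section 4.3.1. Stems are accent-free to match noisy spellings.
SPEECH_ACTS = {
    "constative": ["aborde", "annonce", "discut", "explor", "expose",
                   "informe", "retour sur", "parl"],
    "reinforcement": ["felicit", "renforce", "souligne"],
    "commissive": ["admet"],
    "directive": ["demande", "expliqu", "invite", "question", "repond"],
}

def normalize(text):
    """Lowercase and strip diacritics to be tolerant to noisy spellings."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def tag_sentence(sentence):
    """Return the speech-act family of the first matching cue, else None."""
    norm = normalize(sentence)
    for family, cues in SPEECH_ACTS.items():
        if any(cue in norm for cue in cues):
            return family
    return None

if __name__ == "__main__":
    report = ["Retour sur la révision des conditions.",
              "Je le félicite pour ses efforts à l'école.",
              "Il dit qu'il a quitté l'école."]
    for sentence in report:
        print(sentence, "->", tag_sentence(sentence))
```

A detector of this kind would only flag candidate intervention sentences; distinguishing a reported event from an active intervention (Section 4.2.) would still require the surrounding context.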
5. Classification results

We present in this section some of the classification experiments done for the second task, sentence-level classification. Using the manual annotations from the training set, a prediction model was built and then applied on the evaluation set to assess the performance. As some reports have no annotations at all, a null class was added to the nine relevant classes listed in Section 2. As shown in Table 1, the dataset is unbalanced, mostly in favour of the Administrative and Occupational categories and, to a lesser extent, Hobbies and Personality, which predicts better results on the modal classes and lower scores on the less represented ones.

5.1. Preprocessing

Using a tokenizer and sentence splitter, each report was broken down into a single instance per sentence in the dataset. For simplicity, the few consecutive sentences that were spanned by the same annotation were kept together and tagged with the annotation type. The stop words were removed, with the exception of personal pronouns, as they can be helpful, as explained in the last section.

The baseline uses a bag-of-words approach with tf-idf vectors built from generated ngrams from 1 to 5 words long. A subset composed of the 2,000 most discriminating features was kept for training and evaluation. Table 5 shows a sample of ngrams generated for the Occupational category, with stop words left in place for clarity.

Original                        Translation
cumulé autre absence            accumulated another absence
démarches scolaires             school actions
retour sur ses absences         feedback on his school absences
va toujours à l'école           still going to school
il est encore suspendu          he was suspended again
imprime des copies de son cv    prints copies of his resume
donne ses preuves d'emploi      gives his proof of employment
toujours pas d'emploi           still no job
jeune dit aimer                 youngster says he likes

Table 5 – Some frequent ngrams from Occupational manual annotations.

5.2. Performance

We applied a Naive Bayes network (Friedman et al., 1997), Complement Naive Bayes (Rennie et al., 2003), Random Forest (Breiman, 2001), SimpleCART (Breiman et al., 1984), J48 Consolidated (Pérez et al., 2007) and REPTree (Quinlan, 1987) to the current data to compare the performance of classic machine-learning algorithms of different types (rule-based, decision tree, function). While these approaches are not cutting edge, they provide a quick view to assess the potential performance of the current data.

Algorithm                 Recall    Precision    F1
Complement Naive Bayes    0.8160    0.6301       0.7116
Naive Bayes network       0.5801    0.5663       0.5731
Random Forest             0.3196    0.6105       0.4195
REPTree                   0.4258    0.6184       0.5043
SimpleCART                0.7013    0.4846       0.5731
J48 Consolidated          0.5478    0.6250       0.6000

Table 4 – Average performance on all types for the sentence-level classification task.

The scores shown in Table 4 are averages combining the performance on all ten categories (the nine basic ones plus the null class). We can see that Complement Naive Bayes outperforms the others, as it was specifically designed to overcome the challenges of text classification. The less frequent classes like Antecedents and Family/Couple had a 0% score, as none were correctly classified in the evaluation set.

5.3. Performance and error analysis

As the datasets are created from noisy data, one issue is the frequency restriction on the ngrams used as features. In order to lower the number of generated features, we used a cut-off frequency of 2, so that any ngram occurring only once was not used as a feature. This means that relevant but uniquely miswritten words are eliminated from the datasets, which impacts the representation power of the features, especially for categories with few instances.

The same sentence can be classified in different categories depending on its surroundings. This is not captured by the current model of vectorizing single sentences without contextual information. Thus a short sentence like "We discuss his continuing effort" creates confusion for the prediction process, as such sentences were manually classified in different categories in two separate instances.

The unbalanced nature of the datasets, both training and evaluation, also influences the results. For example, as shown in Table 1, Antecedents has a single instance in the evaluation set and only nine for training the model.

We can see that the performance is not yet useful to provide an adequate estimation of intervention numbers and types in the reports of this corpus. Still, taking into account the task of assigning one of ten classes to a sentence (the nine categories plus the null class), it is far better than the average 0.1 performance provided by a random baseline.

6. Conclusion

We presented a first set of experiments and analyses using reports relating meetings with young offenders. While the analysis provided in Section 4.3. is at a preliminary stage, it will be further explored to evaluate its potential as a helping linguistic annotation for the automatic detection and classification of interventions in young offenders' reports. The next step in this project is to address the issue of noisy data in order to single out expressions detailing interventions. If the number of raw reports allows it, an approach using neural networks will be applied to profit from the manual annotations while being able to use the noisy text from the whole corpus.

7. References

Andrews, D. A. and Bonta, J. (2010). The psychology of criminal conduct (5th ed.). Lexis Nexis.

Baglivio, M. T., Wolff, K. T., Howell, J. C., Jackowski, K., and Greenwald, M. A. (2018). The search for the holy grail: Criminogenic needs matching, intervention dosage, and subsequent recidivism among serious juvenile offenders in residential placement. Journal of Criminal Justice, 55:46–57.

Bonta, J., Rugge, T., Scott, T.-L., Bourgon, G., and Yessine, A. K. (2008). Exploring the black box of community supervision. Journal of Offender Rehabilitation, 47(3):248–270.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Dowden, C. and Andrews, D. A. (1999). What works in young offender treatment: A meta-analysis. In Forum on Corrections Research, volume 1, pages 21–24.

Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2-3):131–163.

Hoge, R. D. and Andrews, D. A. (2010). Youth Level of Service/Case Management Inventory 2.0 (YLS/CMI 2.0): User's Manual.

Koehler, J. A., Lösel, F., Akoensi, T. D., and Humphreys, D. K. (2013). A systematic review and meta-analysis on the effects of young offender treatment programs in Europe. Journal of Experimental Criminology, 9(1):19–43.

Lipsey, M. W. (2009). The primary factors that characterize effective interventions with juvenile offenders: A meta-analytic overview. Victims & Offenders, 4(2):124–147.

Ménard, P. A. and Barrière, C. (2017). PACTE: a collaborative platform for textual annotation. In Proceedings of the 12th International Conference on Computational Semantics.

Pérez, J. M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., and Martín, J. I. (2007). Combining multiple class distribution modified subsamples in a single tree. Pattern Recognition Letters, 28(4):414–422.

Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3):221–234.

Rennie, J. D. M., Shih, L., Teevan, J., and Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In Proceedings of the Twentieth International Conference on Machine Learning, ICML'03, pages 616–623. AAAI Press.

Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.