Manual and Automatic Annotation of Meeting Reports with Young Offenders for Quality Assessment of Interventions

Pierre André Ménard (1), Sylvie Ratté (2), Geneviève Parent (3), Franck Barbedor (2)
(1) Computer Research Institute of Montréal, (2) École de Technologie Supérieure, (3) Université du Québec en Outaouais
pierre-andre.menard@crim.ca, sylvie.ratte@etsmtl.ca, genevieve.parent@uqo.ca, franck.barbedor.1@ens.etsmtl.ca

Abstract
We present an annotation project in criminology using meeting reports between clinicians and criminalized young offenders. The domain-specific goal is to assess the quality of the interventions versus the profile of criminal needs established for each offender. The project requires both the manual annotation of a significant number of reports by experts as well as the development of an automatic annotation process to classify the unannotated reports. Both annotation experiments help identify the needs and challenges of providing helpful, linguistically relevant annotations for this type of task. The performance of a first classification effort is reported along with the related manual process.

1. Introduction

Organizations often collect masses of textual information from their daily activities, storing and using it to provide better monitoring for managers. But using this information to assess the quality of activities is no small task, often requiring in-depth analysis, annotations and, depending on the quantity of data, machine learning methods to extract useful knowledge and give a clear view of the process and its output.

This is one such project, involving textual records of meetings between clinicians and teenagers convicted of various offences, who received a penal sentence and attend mandatory follow-up meetings in relation to their convictions. These records contain much information about young offenders as well as the type and focus of the interventions done by the clinicians.

From a criminology perspective, the goal of this research project is to validate whether the interventions are relevant to the criminal profile of the young offenders (YLS/CMI (Hoge and Andrews, 2010)). This matters for the quality of these activities, as research shows (Baglivio et al., 2018; Bonta et al., 2008) that interventions targeting the relevant risk factors diminish the risk of reoffending of young offenders. On the other hand, doing interventions on irrelevant aspects is counterproductive and time-consuming.

From a computational linguistics perspective, the goal is to enable the automatic classification of thousands of reports using manual annotations from experts. Correctly classifying these reports will provide better insight into the monitoring process of young offenders. Manual annotations also provide insights about which linguistic or semantic knowledge is relevant to identify the different types of interventions.

The following section presents a more in-depth context of the project, the analyzed data and the type of information sought by experts in the criminology field. Section 3. presents an overview of the manual annotation process. The fourth section provides insights about the requirements and challenges for this task, followed by classification experiments using the annotated data. The conclusion presents the current state of the project.

2. Context

In Canada, since the Youth Criminal Justice Act came into force in 2003, each time a teenager is convicted in court and receives a sentence, an organization responsible for youth protection takes action to protect the public and promote the rehabilitation and reintegration of the youth. To do so, many countries, including Canada, rely on the Risk-Need-Responsivity intervention model (well known as the RNR model) (Andrews and Bonta, 2010). The RNR model is one of the most effective, that is, the one most likely to reduce recidivism among juvenile offenders (Dowden and Andrews, 1999; Koehler et al., 2013; Lipsey, 2009). First, the clinician responsible for the follow-up with the young offender reduces the risk of future offences by establishing a risk profile for each teenager. This profile is built from interviews and questionnaires filled out with the offender, like the Youth Level of Service / Case Management Inventory (YLS/CMI) (Hoge and Andrews, 2010), to target his criminogenic needs.

Criminogenic needs are dynamic risk factors, like antisocial attitudes, associated with reoffending that can be reduced with clinical interventions. A clinical intervention (hereafter called intervention) is defined as any discussion, communication or interaction between a clinician and a young offender that aims to reduce a risk factor. The offender's criminogenic needs profile is usually established once after the sentence is received and every six months thereafter for cases that exceed this duration. Not all offenders have the same profile. It is therefore important to adapt the interventions according to the needs of each person.

For each sentence received by a young offender, a specific period is imposed by the judge during which he or she will meet with a clinician in order to help reduce the risk of reoffending. For each of these meetings, the clinician must detail in a report where it happened, the interactions, the topics discussed, the exchange of information, news related to the sentence, etc. By reading a report, an expert should be able to judge whether the described interventions are aligned with the profile of the offender or whether they were on another topic.

These reports and other activity entries (requests for medical record access, added documents, etc.) related to a case are stored in a secured database for five to seven years following the end of each sentence, at which time they are destroyed. This amounts to more than 150,000 reports for the entire follow-up period for our sample of 750 young offenders. Of these entries, about 30% are reports relevant to our project in which an intervention could take place.

2.1. Intervention types

Each intervention falls into one of the following categories:
— Administrative
— Antecedents
— Attitude
— Consumption
— Family/Couple
— Hobbies
— Peers
— Personality
— Occupational (school/work)

The Administrative category contains any discussion about the conditions of the penal case (contact restrictions, apology letters, etc.), the mandatory community work, or the intervention plan. While these are not interventions aimed at lowering reoffence risk, they often take up a large part of meetings, often in competition with useful interventions of other categories.

Antecedents relates to reinforcing beliefs or alternative behaviors which lower the chance of criminal reoffence in the presence of a high-risk situation.

Attitude regroups interventions that seek change in motivation, valorisation of prosocial institutions and value restructuring to recognize antisocial attitudes or a criminal lifestyle and promote alternative prosocial identities and attitudes.

Consumption interventions favor the reduction of alcohol and drug consumption or abuse, reduce the personal and interpersonal behaviors that lead to consumption and develop new substitutes for these habits.

The Family/Couple category contains interventions about developing or maintaining positive family relationships, respecting house rules and supervision, as well as valuing a couple relationship with a prosocial partner with a long-term or positive outcome.

Any intervention that fosters participation or engagement in an organized prosocial activity like sports, gym, extracurricular or religious activities falls into the Hobbies category.

Interventions about Peers target the reduction of interactions with criminals and the valorisation of relationships with prosocial persons.

Trying to help the young offender cope with Personality issues can also reduce the risk of reoffence. This includes discussions about anger management, improving problem-resolution skills, and discouraging manipulation of others or egocentrism.

Finally, Occupational interventions relate to school or work and can include helping and accompanying the offender through the school registration procedure, valuing active participation and attendance at either school or work, pointing out the positive rewards brought by an occupation and developing positive relationships with new colleagues or persons of authority.

2.2. Criminology research objectives

As mentioned earlier, studies in criminology show that interventions aligned with the criminogenic needs of a young offender reduce the chance of recidivism. Acting on this knowledge, the main goal of this project from a criminology standpoint is to validate whether the interventions described in the reports are aligned with the profile of each individual. For example, if a young offender is sentenced for theft, but his criminogenic needs profile reveals that he stole to support his consumption habits, interventions should mainly target this last topic. Focusing the interventions on the first topic would thus be misaligned and much less effective, if effective at all.

In order to attain this goal, the reports of each young offender must be annotated, either manually or automatically, to give an estimate of the number of interventions for each category. This will enable researchers to compare the estimated number of typed interventions with the main criminogenic risk of each young offender to validate whether they match.

A secondary goal is to assess the intrinsic quality of the reports, validating whether the interventions are clearly mentioned and explained. While this is not part of the current project involving natural language processing techniques, a qualitative evaluation will be done following the manual annotation effort to make recommendations to improve future reports. Providing a more detailed account of interventions might help the automatic annotation of future reports and thus enhance the performance of natural language processing tasks like classification or information extraction.

2.3. Computational linguistic aspects

In order to correctly assess the alignment between interventions and risk profile, each intervention reported by clinicians should be fully identified for each meeting regarding a case. We obtained a subset of more than 56,000 reports related to various cases, which is too much data for manual annotation alone. Also, as new meetings occur every week, this would be an ongoing manual task that would require much effort.

All the reports are written in Canadian French, as the organization targets mostly French-speaking young offenders. While linguistic analysis tools exist for French, none are trained on the Canadian French variation and, more importantly, on its register, which typically includes more anglicisms as well as different idiomatic expressions and semantic senses. In addition, most reports contain various amounts of abbreviations, truncated words, missing words, missing letters, typos, domain-specific lingo, agglomerated words (missing spaces), implicit acronyms, colloquial terms, missing punctuation, anglicisms and spoken sentence formulations. As such, the reports can be viewed as noisy texts, which implies that usual natural language processing tools will fail to analyze them correctly.

The granularity of annotation should also be tuned to fit the need for precision, either for the estimated number of interventions of the different categories or for the exact expressions used to detail an intervention. As such, two natural language processing tasks can be devised in order to obtain the necessary information: multilabel report classification, and intervention recognition and typing.

Multilabel report classification implies the automatic annotation of a report with all the categories corresponding to the detailed interventions mentioned in it. While it does not give an exact count when multiple annotations of the same category are found in one report, it still gives a good estimate, as reports with multiple annotations usually contain different categories instead of multiple instances of the same one.

Intervention recognition and typing is akin to named entity recognition and typing, as relevant candidate expressions must first be identified in a text and then classified into one of many types. As the exact expression is not needed, we view this as a sentence-level classification, since there is seldom more than one intervention in a single sentence.

As these reports are highly confidential, largely involving minors, no similar training data was available to help the classification process, thus prompting a manual annotation effort. As these data were obtained through an ethics committee and a court hearing, the data cannot be released publicly. The examples shown in this article have been redacted to remove any possibility of individual identification.

3. Annotation process

To simplify the interaction between the experts of each team from different institutions, we used the online text annotation platform PACTE (Ménard and Barrière, 2017) as the central repository for manual annotation, annotation curation and classification results for this project. PACTE enables an annotation project manager to import large text corpora, define custom annotation schemas, add participants to a project, define a project's steps and allocate documents to be annotated by the project's participants, as shown in Figure 1 (with unrelated text for confidentiality).

Figure 1 – PACTE manual annotation interface.

3.1. Dataset

In order to have a significant quantity of reports to train and evaluate the machine-learning algorithm, 10,811 single reports from randomly chosen young offenders' cases were selected. For each individual case, all the reports were taken, thus giving a full historical account of each case.

These reports were split into two sets for training and evaluation. The training set contains 8,189 randomly selected reports, while the remaining 2,622 reports were kept aside for the evaluation set. A case-based random split was applied, which means that all the reports from one case are entirely found in either the training or the evaluation set.

Both datasets were split into batches of 500 documents, with a larger last batch for the remaining documents. This was done in order to better organize the work of annotators in the subsequent steps and provide them with a positive sense of advancement throughout the entire effort. The batches were then uploaded into PACTE as separate corpora but included in the same annotation project.

Annotation type         Training          Evaluation
                        Ann.     Doc.     Ann.     Doc.
Administrative          2,114    1,783    553      484
Antecedents             9        9        1        1
Attitude                79       76       6        5
Consumption             113      113      31       31
Family and couple       55       53       15       15
Hobbies                 367      365      47       47
Occupational            1,817    1,597    584      492
Peers                   77       74       17       16
Personality             333      302      49       44
Without annotations     —        4,720    —        1,637
Total                   4,964    8,189    1,303    2,622

Table 1 – Training and evaluation set distribution for annotations (Ann.) and documents (Doc.).

Looking at Table 1, we can readily see that the datasets are heavily unbalanced. For the training set, 79.2% of all annotations come from either the Administrative or the Occupational category. Antecedents makes up less than 0.2% of the entire set. This will likely require further manual annotation of this type to enrich the sample size.

3.2. Annotation schemas

Using the schema designer in PACTE, we defined nine different schemas corresponding to the nine categories listed at the top of Section 2.1. For each of them we defined an attribute to specify the type of risk targeted by the intervention. For example, the Consumption schema has the Reduction and Solutions values for the type attribute, while Family/Couple has Relationship, Supervision and Couple as the enumeration for the type attribute. One exception is the Occupational category, which has a type attribute with School and Work and a subtype attribute with Help, Participation, Engagement, Satisfaction and Relationship. All the type and subtype attributes were defined as mandatory when creating a new annotation in PACTE. Finally, an additional Comment attribute was added in order for the annotators to provide additional information to the curator about uncertain annotations or edge cases.

We defined each schema as an annotation targeting the text surface (as opposed to document or corpus annotation). A text surface annotation schema in PACTE enables the annotators to create contiguous zones spanning from one letter to the whole document. It also enables them to create a single annotation with multiple segments. This was quite useful, as the reports often contain contextual information between parentheses or in apposition which is not part of the interventions. Using multipart annotations in this case provided a way to target precisely the sentence parts containing the relevant information.

3.3. Workflow

Once the reports and annotation schemas were imported and created in the platform, the annotation process could begin, as shown in Figure 2. The first step (1) was done by two annotators who collaboratively annotated each batch of 500 reports with the web user interface. The curator was then able to review (2) and possibly correct the manual annotations for this batch of reports. After retrieving all the available manual annotations via an online web API, a machine-learning algorithm was used to train a classification model (3) with the N-1 batches and automatically annotate the last batch of reports (4) in a separate storage. Using the user interface, the curator could then (5) validate the performance of the classification and analyze potential issues.

Figure 2 – Annotation process.

The order of the selected reports was randomized to minimize the chance of two contiguous reports on the same case appearing during an annotation session, thereby reducing the prior knowledge issue where two reports of the same case with similar information would be annotated differently.

3.4. Effort metrics

Table 2 shows some statistics for the manual annotation effort of this project. The project in PACTE had 21 steps, one for each of the 16 batches for training and 5 for evaluation. This amounts to approximately 189 hours of annotation for each annotator, giving a total of 378 hours for this project. For each batch, half of the reports had annotations, for an average of 274 annotations.

Avg. time per batch      9 hours     Avg. number of annotations           274
Avg. time per document   1 min       Avg. annotations in single report    1.41
Avg. size of reports     19 words    Max. annotations in single report    5

Table 2 – Annotation effort statistics per batch.

Still, the unannotated reports took time to process, as some had information that led to a discussion about whether it was an intervention or not. Of course, some reports were very short (a few words, like "He didn't show up and we rescheduled for later") while others were many paragraphs long.

4. Challenges

Automatic annotation of the reports in this project proved a challenge, as there were no off-the-shelf tools suited to the enrichment of these texts. This section presents the analysis of some aspects of the data which represent a challenge for automatic processing.

4.1. Noisy data

As the processed documents in this study are internal reports, often hastily written at the end of the day, most of them contain misspellings, typos, phonological writing, structural inconsistencies and so on.

There are also many truncations (e.g. "reso" for "résolution", "ds" for "dans" ["in"]), abbreviations and acronyms used across the reports, the amount depending mostly on the author of the report. For acronyms, the implicit short forms (without an explicit link to the long form) are often used, as the report is intended for readers familiar with the domain of activity. This can hinder information extraction tasks applied to the dataset if no external reference list is used to explicitly link the two forms. It might also reduce the performance of the bag-of-words approach, as concepts with multiple different surface forms will be separated in the tf-idf processing.

4.2. Report versus reality

One key issue for annotation is trying to differentiate between the young offender simply relating an event or fact and the clinician making an active intervention on the same subject. Because reporting this difference is not a requirement asked of clinicians, there is much variation in the ways it is expressed in the reports. As an example, one could only report that "He told me that he quit school", which does not count as an intervention. On the other hand, if the previous sentence was followed by "I asked him what he intends to do next", this would be considered an active intervention and would have to be annotated as such. Then again, if there are discrepancies between what was said at the meeting and what appears in the report, like missing information about the intervention, neither a human nor a machine-learning agent could deduce what happened.

As there is no way of knowing, without recorded proof, exactly what was discussed and how during meetings, the manual annotation was done in an optimistic mindset. This implies that what may look like a young offender simply telling the clinician about something was annotated as an intervention. This will be taken into account when estimating the number of interventions of each type, as the number will likely be inflated.

4.3. Expressing interventions

Without regard to what actually took place, the texts narrate the history of discussions, attempts, failures and commitments. Despite their simplicity, each snippet contains tacit knowledge and presents subtle characteristics. The analysis presented in the next subsections provides potential goals for automated annotation tools in order to help the detection of interventions in reports.

Expression     Examples
Add            "Nous ajoutons dans son CV" (We add to his resume)
Address        "J'aborde la révision..." (I address the revision)
               "On aborde chacun des point" (We address each point)
Admit          "Admet impulsivité." (Admits impulsivity)
Announce       "Je lui annonce" (I announce to him)
Ask            "Je lui demande si" (I ask him if)
Congratulate   "Le félicite d'emblée pr..." (I readily congratulate him for)
Discuss        "Discutons du plan..." (Discussing the plan)
               "Discussion sur" (Discussion on)
               "nous avons discuté des ..." (We have discussed the)
Do             "Nous faisons une première ébauche" (We do a first draft)
               "On fait ensemble son devoir" (We do her homework together)
Explore        "Nous tentons d'explorer ses pensées..." (We try to explore his thoughts)
Expose         "Je lui expose la situation" (I lay out the situation to him)
Explain        "M'explqieu qu'entre chaque cours" ((He) explains that between each course)
Inform         "Informons que nous avons" (We inform that we have)
Invite         "Je l'invite à faire les bons choix" (I invite him to make the right choices)
Mention        "je lui mentionne que" (I mention to him that)
Question       "Questionne à savoir où il se trouve" (Question to know where he is)
Respond        "Je lui répond que..." (I answer him that)
Return         "Retour sur la révision" (Return on the revision)
Read           "Je lui lit les conditions" (I read him the conditions)
Reinforce      "Je le renforce en le félicitant" (I validate him by congratulating him)
Repeat         "Nous devons lui faire répéter certians propos" (We must make him repeat some points)
Talk           "Nous parlons de ses travaux" (We talk about his work)
               "Lui parlons du ..." (Talk to him about the)
Try            "On tente de mêtre en place" (We try to put in place)
Underline      "Je lui souligne aussi que" (I also point out to him that)
Understand     "Il semble comprendre" (He seems to understand)

Table 3 – Samples (verbatim, noise included) of expressions used for interventions.

4.3.1. Speech acts in interactions

From the speech acts (Searle, 1969) perspective, the narratives contain many constative expressions that represent a state of things or the recollection of an ascertainment by the clinician (e.g. the expressions Add, Address, Announce, Discuss, Explore, Expose, Inform, Return, Read, Repeat and Talk, as shown in Table 3). They are pervasive in each category and reflect the continuing interaction with, and accompaniment of, the youth.

On the other hand, reinforcement expressions (e.g. Congratulate, Reinforce, Underline) and commissive expressions (e.g. Admit) are absent or nearly absent from the Antecedents and Family/Couple categories. These ratios are understandable since both Personality and Consumption are part of a solution that can be within the control of the young offender, and thus merit reinforcement and commitment, while Antecedents and Family/Couple are less likely to be solved directly by the youth.

Directive expressions (e.g. Ask, Explain, Invite, Question, Respond) appear with high ratios in the Administrative and Occupational categories, as they consist of the clinician transferring administrative information to the young offender, or of the latter informing the clinician about his everyday activities such as school and work.

In terms of usage, groups of categories are positively correlated in their use of these last three types of expressions (reinforcement, commissive and directive): Consumption with Personality and Hobbies, Antecedents with Family/Couple, and Attitude with Peers.

4.3.2. Explicit versus implicit discussion

The first and third person pronouns are often used together, but the third person alone may express either a passive event or an interaction, depending on the accompanying verb.

4.3.3. Implied third person

Since each snippet of text is intended to be read by people in the field, the youth and the clinician are often mentioned implicitly (e.g. "J'aborde [avec lui]"). The same can also be noticed for specialized subjects (e.g. "le positif du suivi") for which the intended reader understands the implied meaning without further explanation. In this specific example, "Le positif du suivi. Un endroit pour extérioriser ses émotions, pour ventiler." (The positive side of the follow-up. A place to externalize his emotions, to vent.), we can probably surmise that this was not a spontaneous expression of a personality aspect, but derived from a discussion.

4.4. Structure of interventions

While most interventions are expressed in the same sentence fragment, some of them span multiple sentences. For example, the commissive-commissive-reinforce structure often unfolds over three separate sentences, the first two commissive sentences bringing contextual knowledge to the last reinforcement expression. Using this type of structure and other relevant patterns across sentences could help to better identify important interventions and annotate them with the complete contextual knowledge.
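The cue expressions of Table 3 suggest that a simple lexicon-based detector could serve as a first pass before classification. The following sketch is purely illustrative and is not the system described in this paper: the cue lists are abbreviated, accent-stripped stems taken from Table 3, the speech-act grouping follows Section 4.3.1., and the accent normalization is one hypothetical way to tolerate the noisy spellings discussed in Section 4.1.

```python
import unicodedata

# Abbreviated cue lexicon derived from Table 3, grouped by speech-act family
# as in Section 4.3.1. Stems are accent-free to match noisy spellings.
SPEECH_ACTS = {
    "constative": ["aborde", "annonce", "discut", "explor", "expose",
                   "informe", "retour sur", "parl"],
    "reinforcement": ["felicit", "renforce", "souligne"],
    "commissive": ["admet"],
    "directive": ["demande", "expliqu", "invite", "question", "repond"],
}

def normalize(text):
    """Lowercase and strip diacritics to be tolerant to noisy spellings."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def tag_sentence(sentence):
    """Return the speech-act family of the first matching cue, else None."""
    norm = normalize(sentence)
    for family, cues in SPEECH_ACTS.items():
        if any(cue in norm for cue in cues):
            return family
    return None

if __name__ == "__main__":
    report = ["Retour sur la révision des conditions.",
              "Je le félicite pour ses efforts à l'école.",
              "Il dit qu'il a quitté l'école."]
    for sentence in report:
        print(sentence, "->", tag_sentence(sentence))
```

A detector of this kind would only flag candidate intervention sentences; distinguishing a reported event from an active intervention (Section 4.2.) would still require the surrounding context.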
5. Classification results

We present in this section some of the classification experiments done for the second task, sentence-level classification. Using the manual annotations from the training set, a prediction model was built and then applied on the evaluation set to assess the performance. As some reports have no annotations at all, a null class was added to the nine relevant classes listed in Section 2. As shown in Table 1, the dataset is unbalanced, mostly in favour of the Administrative and Occupational categories and, to a lesser extent, Hobbies and Personality, which predicts better results on the modal classes and lower scores on the less represented ones.

5.1. Preprocessing

Using a tokenizer and sentence splitter, each report was broken down into a single instance per sentence in the dataset. For simplicity, the few consecutive sentences that were spanned by the same annotation were kept together and tagged with the annotation type. The stop words were removed, with the exception of personal pronouns, as they can be helpful, as explained in the last section.

The baseline uses a bag-of-words approach with tf-idf vectors built from generated ngrams from 1 to 5 words long. A subset composed of the 2,000 most discriminating features was kept for training and evaluation. Table 5 shows a sample of ngrams generated for the Occupational category, with stop words left in place for clarity.

Original                        Translation
cumulé autre absence            accumulated another absence
démarches scolaires             school actions
retour sur ses absences         feedback on his school absences
va toujours à l'école           still going to school
il est encore suspendu          he was suspended again
imprime des copies de son cv    prints copies of his resume
donne ses preuves d'emploi      gives his proof of employment
toujours pas d'emploi           still no job
jeune dit aimer                 youngster says he likes

Table 5 – Some frequent ngrams from Occupational manual annotations.

5.2. Performance

We applied a Naive Bayes network (Friedman et al., 1997), Complement Naive Bayes (Rennie et al., 2003), Random Forest (Breiman, 2001), SimpleCART (Breiman et al., 1984), J48 Consolidated (Pérez et al., 2007) and REPTree (Quinlan, 1987) to the current data to compare the performance of classic machine-learning algorithms of different types (rule-based, decision tree, function). While these approaches are not cutting edge, they provide a quick view to assess the potential performance of the current data.

Algorithm                 Recall    Precision    F1
Complement Naive Bayes    0.8160    0.6301       0.7116
Naive Bayes network       0.5801    0.5663       0.5731
Random Forest             0.3196    0.6105       0.4195
REPTree                   0.4258    0.6184       0.5043
SimpleCART                0.7013    0.4846       0.5731
J48 Consolidated          0.5478    0.6250       0.6000

Table 4 – Average performance on all types for the sentence-level classification task.

The scores shown in Table 4 are averages combining the performance on all ten categories (the nine basic ones plus the null class). We can see that Complement Naive Bayes outperforms the others, as it was specifically designed to overcome the challenges of text classification. The less frequent classes like Antecedents and Family/Couple had a 0% score, as none were correctly classified in the evaluation set.

5.3. Performance and error analysis

As the datasets are created from noisy data, one issue is the frequency restriction on the ngrams used as features. In order to lower the number of generated features, we used a cut-off frequency of 2, so that any ngram occurring only once was not used as a feature. This means that relevant but uniquely miswritten words are eliminated from the datasets, which impacts the representation power of the features, especially for categories with few instances.

The same sentence can be classified in different categories depending on its surroundings. This is not captured by the current model of vectorizing single sentences without contextual information. Thus a short sentence like "We discuss his continuing effort" creates confusion for the prediction process, as such sentences were manually classified in different categories in two separate instances.

The unbalanced nature of the datasets, both training and evaluation, also influences the results. For example, as shown in Table 1, Antecedents has a single instance in the evaluation set and only nine for training the model.

We can see that the performance is not yet useful to provide an adequate estimation of intervention numbers and types in the reports of this corpus. Still, taking into account the task of assigning one of ten classes to a sentence (the nine categories plus the null class), it is far better than the average 0.1 performance provided by a random baseline.

6. Conclusion

We presented a first set of experiments and analyses using reports relating meetings with young offenders. While the analysis provided in Section 4.3. is at a preliminary stage, it will be further explored to evaluate its potential as a helping linguistic annotation for the automatic detection and classification of interventions in young offenders' reports. The next step in this project is to address the issue of noisy data in order to single out expressions detailing interventions. If the number of raw reports allows it, an approach using neural networks will be applied to profit from the manual annotations while being able to use the noisy text from the whole corpus.

7. References

Andrews, D. A. and Bonta, J. (2010). The psychology of criminal conduct (5th ed.). Lexis Nexis.

Baglivio, M. T., Wolff, K. T., Howell, J. C., Jackowski, K., and Greenwald, M. A. (2018). The search for the holy grail: Criminogenic needs matching, intervention dosage, and subsequent recidivism among serious juvenile offenders in residential placement. Journal of Criminal Justice, 55:46–57.

Bonta, J., Rugge, T., Scott, T.-L., Bourgon, G., and Yessine, A. K. (2008). Exploring the black box of community supervision. Journal of Offender Rehabilitation, 47(3):248–270.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Dowden, C. and Andrews, D. A. (1999). What works in young offender treatment: A meta-analysis. In Forum on Corrections Research, volume 1, pages 21–24.

Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2-3):131–163.

Hoge, R. D. and Andrews, D. A. (2010). Youth Level of Service/Case Management Inventory 2.0 (YLS/CMI 2.0): User's Manual.

Koehler, J. A., Lösel, F., Akoensi, T. D., and Humphreys, D. K. (2013). A systematic review and meta-analysis on the effects of young offender treatment programs in Europe. Journal of Experimental Criminology, 9(1):19–43.

Lipsey, M. W. (2009). The primary factors that characterize effective interventions with juvenile offenders: A meta-analytic overview. Victims & Offenders, 4(2):124–147.

Ménard, P. A. and Barrière, C. (2017). PACTE: a collaborative platform for textual annotation. In Proceedings of the 12th International Conference on Computational Semantics.

Pérez, J. M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., and Martín, J. I. (2007). Combining multiple class distribution modified subsamples in a single tree. Pattern Recognition Letters, 28(4):414–422.

Quinlan, J. R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3):221–234.

Rennie, J. D. M., Shih, L., Teevan, J., and Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In Proceedings of the Twentieth International Conference on Machine Learning, ICML'03, pages 616–623. AAAI Press.

Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge, London.