Road Accidents: Information Extraction from Clinical Reports

Enrico Mensa1,†, Daniele Liberatore1,†, Davide Colla1,†, Matteo Delsanto1,†, Marco Giustini2,† and Daniele P. Radicioni1,*,†

1 Dipartimento di Informatica, Università degli Studi di Torino
2 Istituto Superiore di Sanità, Roma

HC@AIxIA 2022: 1st AIxIA Workshop on Artificial Intelligence For Healthcare, November 28 - December 2, 2022, Udine
* Corresponding author.
† These authors contributed equally.
Email: enrico.mensa@unito.it (E. Mensa); daniele.liberator@edu.unito.it (D. Liberatore); davide.colla@unito.it (D. Colla); matteo.delsanto@unito.it (M. Delsanto); marco.giustini@iss.it (M. Giustini); daniele.radicioni@unito.it (D. P. Radicioni)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract

In this paper we present a novel approach for the computation of statistical insights on road traffic accidents (RTAs). Instead of relying on numerical and categorical reports, we propose a system that extracts crucial information from clinical reports concerning RTAs; this information can then be used to enrich more traditional data with details on the injuries reported by accident victims. After testing and evaluating the system, we illustrate and discuss the information extracted automatically from over 30,000 medical records.

Keywords

Information Extraction, Road Traffic Accidents, Clinical Reports, BERT

1. Introduction

In recent years Electronic Health Records (EHRs) have become more and more common, leading to the collection of increasing amounts of health and clinical data. Among these, Emergency Room records are particularly interesting, since they combine structured data concerning the patient with unstructured free-text data describing the events that led to the hospitalization. These records are usually categorized by the type of event that caused the patient's injuries, which allows for multiple analyses centered on specific events. This paper focuses in particular on road traffic accidents. A road traffic accident (RTA) is defined by the French National Institute of Statistics and Economic Studies as an accident that involves at least one road vehicle, occurs on an open public road, and results in at least one person being killed or injured [1].

Analyzing road accidents is a relevant task since, according to the World Health Organization (WHO) [2, 3], RTAs (i) are likely to become the seventh leading cause of death by 2030, and (ii) mostly affect vulnerable road users, since more than half of the people who die on the roads are cyclists, motorcyclists, or pedestrians. For these reasons the study and prevention of RTAs is a priority in transportation management. Studies have been carried out on RTAs [4, 5, 6], mostly employing data mining techniques on structured data (e.g., traffic crash data), often paired with further specific information concerning road quality or weather conditions. This sort of data is naturally suited to the analysis of RTAs, although it often lacks information on the specific injuries reported by the patients. In this work we investigate a novel approach to the study of RTAs: we extract relevant information concerning road accidents from medical records, so as to develop a system providing detailed statistical insights on the injuries possibly caused by a given accident. To the best of our knowledge, this task has not been previously addressed in the literature.
The feasibility of such a system, however, depends on whether current state-of-the-art language models (e.g., BERT) are at least able to identify and extract the vehicles mentioned in a medical report. More specifically, we are interested in extracting the vehicles involved in the accident and the vehicle the patient was in when the accident occurred. This information may then enrich patients' data, which typically include age, gender, and the reported injuries. The whole extraction process needs to be completely automated: asking medical operators to promptly collect such data (for instance, by choosing a vehicle from a complex list of tens of options) is unfeasible, given the high time pressure and workload that customarily afflict emergency departments.

The paper is structured as follows: after describing the state of the art in Section 2, we introduce the adopted dataset and illustrate the annotation process in Section 3. We then propose and evaluate an extraction algorithm based on BERT contextual embeddings in Section 4. Finally, we report some aggregated statistics obtained by running the system on the entire dataset, focusing on the axes of age and gender (Section 5), and defer the more complex study of injuries to future work, described in Section 6.

2. Related Work

Many works have been published on the topic of RTAs, showing that the problem can be analyzed from different points of view and with different aims. A great deal of effort has naturally been invested in accident severity prediction [4, 7, 8, 9, 10] and traffic accident anticipation [11], with the aim of understanding which factors (such as weather conditions, time, and road quality) contribute the most to the severity of accidents. Other efforts have been spent in comparing different methods to understand which ones are better suited to the modeling of RTAs.
In particular, a very recent study [5] compares many machine learning approaches, including naïve Bayes, logistic regression, K-nearest neighbors, AdaBoost, support vector machines, and random forests, and finds the latter to be the best suited to the task. Particular emphasis has also been put on the explainability of the adopted algorithms: in [12], random forests and decision trees achieved good results and made it possible to show that weather conditions are strongly related to car accidents. In [13] a substantial sub-category of accidents has been examined, namely collisions with poles and trees. The authors rely on text mining and on other interpretable machine learning techniques to analyze the available crash narratives, so as to complement the results with explanations.

Each work typically examines a different dataset, released by a particular State or Region of the world. A plethora of such datasets can be found, for instance the National Road Network dataset from Canada [14], the Road Safety Data dataset from the UK Department for Transport [15], the Montreal Vehicle Collisions dataset from the City of Montreal [16], the Setúbal (Portugal) reports [7], and the Gauteng (South Africa) reports from the Gauteng Department of Community Safety [5]. The amount of information provided by these datasets is highly variable. The crash reports from Victoria (Australia) [8] also include the gender and age of the drivers. Among the surveyed datasets, the one from Louisiana treated in [13] is a rare instance of a dataset providing some insight into the gravity of the injuries incurred by the drivers. A complete review of many of these datasets can be found in [17].

Our work is hardly comparable with the current state of the art, since we do not rely on categorical and numerical data in our analysis, but rather focus on information extraction from the free-text narratives included in clinical data.
The application of NLP techniques in this field is, however, not completely new: in recent years some works have tried to extract and classify RTAs from social media [18, 19]. In summary, the novelty of our approach stems from the radically different dataset that we employ, which requires specific NLP techniques. This preliminary work paves the way for the future extraction of detailed information regarding injuries in accidents, which can be paired with the data computed through more traditional approaches to better understand the impact of RTAs.

3. Dataset and Annotation

The data employed in the present study are real-world Emergency Room Reports (ERRs) collected in Italian hospitals and made available by the Italian National Institute of Health in the frame of the SINIACA project [20]. The SINIACA project (see footnote 1) is the Italian branch of the European Injury Database (EU-IDB), an EU-wide surveillance system concerned with accidents, which collects data from hospital emergency department patients according to the EU recommendation [21]. The SINIACA-IDB is a data collection on injuries, based on a sample of hospital emergency departments, implementing the recommendation of the Council of the European Union no. C 164/2007/01 on injury prevention and safety promotion.

The original dataset consists of 153,826 clinical records from the SINIACA-IDB, which were annotated by hospital staff as referring to road traffic accidents or not. In this work we only consider the 35,952 records that concern RTAs. It is important to note that these reports are very challenging to process. ERRs are compiled by medical staff under huge time pressure, which leads to typos and disjointed fragments of text, at times resembling bullet lists rather than actual sentences.
By randomly sampling and analyzing 592 records of the dataset, we measured that over 10% of the tokens are typos, abbreviations, or acronyms (on average 2.25 per record) [22, 23]. All of these elements significantly increase the difficulty of extracting relevant data from the records.

Footnote 1: 'Sistema Informativo Nazionale sugli Incidenti in Ambiente di Civile Abitazione', National Information System on Accidents in Civil Housing Environment.

Data annotation. In this preliminary work we focus on the extraction of two types of information. Namely, given a record, we want to retrieve which kinds of vehicles were actually involved in the incident (Task 1), and which vehicle the patient was in when the accident occurred (Task 2). The dataset was annotated separately for these two tasks, which will be referred to as T1 and T2, respectively. The label collection used to annotate vehicles for both tasks was designed on the basis of the Wikipedia page on vehicle categories (see footnote 2), which in turn relies on the UNECE (United Nations Economic Commission for Europe) categories and regulations [24]. Some classes were merged for simplicity, while the PEDESTRIAN class was added to account for the annotation of pedestrians. The final taxonomy is illustrated in Figure 2 (Appendix A). Only the leaves of the taxonomy were adopted as labels throughout the annotation process; however, just a few were actually found in our dataset.

Table 1: Label distribution for the annotated medical records, before (1,000 records) and after (987 records) pruning the under-represented labels. A dash marks labels removed by pruning.

Label                T1 before   T2 before   T1 after   T2 after
NS (not specified)         405         412        405        412
CAR                        392         284        386        279
MOTORCYCLE                 186         173        185        172
PEDESTRIAN                  78          77         74         73
BUS                         30          27         30         27
BICYCLE                     26          24         26         24
VAN                          5           1          -          -
TRUCK                        5           0          -          -
TRACTOR                      1           1          -          -
AUTO-RICKSHAW                1           1          -          -
ARMORED-CAR                  1           0          -          -
Total                    1,130       1,000      1,106        987
Table 1 (left side) reports the label distribution on the two tasks for the 1,000 randomly selected records. Since testing the classification of labels with very few occurrences (e.g., one or zero for T2) is not meaningful, we removed from the test bed the 13 entries labeled as VAN, TRUCK, TRACTOR, AUTO-RICKSHAW, or ARMORED-CAR for either T1 or T2. Table 1 (right side) illustrates the final dataset of 987 records. Future work will focus on the annotation of entries containing less common vehicles, so as to evaluate the system on the classification of these labels. As the totals in Table 1 show, in the T2 task only one label per entry was annotated (since the patient could only be in one vehicle), while T1 allowed for multiple labels, since multiple vehicles can be involved in one accident. Moreover, the NS (not specified) tag allows for the classification of records that do not provide enough information to determine which vehicles were involved in the accident.

The annotation process was carried out by two annotators with the text annotation tool Doccano [25]. The resulting Inter Annotator Agreement (IAA), calculated as Cohen's kappa coefficient, was 0.9957 for T1 and 0.9930 for T2. Such a high IAA shows that this task is quite easy for humans; however, we will show that for artificial systems some language nuances are still difficult to decipher, especially for T2.

Footnote 2: https://en.wikipedia.org/wiki/Vehicle_category.

4. System and Evaluation

Our system is based on a multilingual version of the BERT neural model [26] (see footnote 3). We employed the same model for both T1 and T2. More specifically, the tasks are defined as follows: in T1 the system is requested to retrieve all the vehicles involved in the RTA, while in T2 the system has to detect the means of transportation used by the patient at the time of the accident. We fine-tuned the existing language model on the whole set of 153,826 records for ten epochs, so as to specialize the model on the type of language used (Italian, deeply blended with medical jargon). Finally, we stacked a linear classifier from the off-the-shelf Transformers library by HuggingFace [27] on top of the pre-trained model; the classifier exploits the dense representation of the input record to assign it the appropriate label(s). The final model employs two different loss functions according to the task definition. For T1 we employed the Binary Cross-Entropy loss, so that the model is allowed to assign multiple labels to each entry (more specifically, we have N loss functions, one for each class); for T2 we adopted the Cross-Entropy loss, since in this setting the model can only return one class.

Footnote 3: https://huggingface.co/bert-base-multilingual-cased.

The evaluation was conducted on the 987 annotated entries described in Section 3, in a 10-fold setup, with 10 training epochs for each fold. The system was also evaluated against a simple baseline based on string matching for both T1 and T2: for each label l, a set S_l of trigger words (vehicle names and their synonyms) was obtained from the Treccani Italian Dictionary [28]. For T1 the baseline classifies a record as l if the record contains a trigger word from the corresponding set S_l. For T2 the baseline works similarly, but the trigger word must also be in the proximity of a trigger expression. Trigger expressions are a collection of words and phrases (e.g., was driving, guiding, run over, passenger of, etc.) often used in the records to indicate that the patient was either the driver or a passenger in a vehicle. The baseline was run on the same 10 folds as the BERT system, discarding the training portion of each fold, since no training was performed for this system.

Discussion. Results for the T1 and T2 tasks are reported in Table 2 and Table 3, respectively.

Table 2: Precision (P), Recall (R) and F1 score on the T1 task (10 fold).

                     Baseline               BERT
Label              P      R      F1      P      R      F1
NS               0.64   1.00   0.78    0.98   0.96   0.97
CAR              0.99   0.44   0.60    0.95   0.96   0.95
MOTORCYCLE       1.00   0.82   0.90    0.96   0.98   0.97
PEDESTRIAN       0.90   0.26   0.38    0.89   0.73   0.80
BUS              1.00   0.93   0.96    1.00   0.70   0.82
BICYCLE          0.90   0.77   0.82    1.00   0.50   0.67
All (weighted)   0.86   0.72   0.72    0.96   0.93   0.94

Table 3: Precision (P), Recall (R) and F1 score on the T2 task (10 fold).

                     Baseline               BERT
Label              P      R      F1      P      R      F1
NS               0.48   1.00   0.65    0.96   0.93   0.95
CAR              0.86   0.13   0.22    0.92   0.93   0.92
MOTORCYCLE       1.00   0.23   0.36    0.96   0.97   0.97
PEDESTRIAN       0.80   0.65   0.71    0.75   0.73   0.74
BUS              0.11   0.04   0.06    0.79   0.85   0.82
BICYCLE          0.20   0.04   0.07    0.76   0.92   0.83
All (weighted)   0.68   0.54   0.45    0.92   0.92   0.92

Concerning T1, both BERT and the baseline perform well overall. However, the two systems perform very differently depending on the class. The CAR and PEDESTRIAN classes appear to be more challenging than the MOTORCYCLE class: this is due to the linguistic variability adopted by the medical personnel when describing car or pedestrian accidents. As an example, PEDESTRIAN accidents are mostly described as 'hit by vehicle'; however, some specific instances such as 'grazed by a rearview mirror', 'slid against a car', 'crushed by a tire', etc. can also be found. Such variance seems to be handled well by the neural model, while the baseline falls short in managing this complexity. On the other hand, BUS and BICYCLE accidents are so few in the dataset that the neural model cannot properly generalize, whilst the baseline can deal with them by simply matching the words bicycle and bus, which are included in the respective trigger sets. The overall performance on the dataset (bottom row) shows that BERT mostly succeeds in this task, with a weighted F1 score of 0.94.
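As a sanity check on how the "All (weighted)" row relates to the per-class rows, the following minimal sketch recomputes the BERT T1 averages from Table 2. It assumes that the weights are the per-label T1 supports after pruning (Table 1, right side); the paper does not state how the weighted row was obtained, so this is one plausible reading rather than the authors' procedure.

```python
# Per-label T1 supports after pruning (Table 1, right side).
support = {
    "NS": 405, "CAR": 386, "MOTORCYCLE": 185,
    "PEDESTRIAN": 74, "BUS": 30, "BICYCLE": 26,
}

# Per-class (precision, recall, F1) for BERT on T1 (Table 2).
bert_t1 = {
    "NS": (0.98, 0.96, 0.97),
    "CAR": (0.95, 0.96, 0.95),
    "MOTORCYCLE": (0.96, 0.98, 0.97),
    "PEDESTRIAN": (0.89, 0.73, 0.80),
    "BUS": (1.00, 0.70, 0.82),
    "BICYCLE": (1.00, 0.50, 0.67),
}


def weighted_average(scores, weights):
    """Average each metric over classes, weighting every class by its support."""
    total = sum(weights[label] for label in scores)
    n_metrics = len(next(iter(scores.values())))
    return tuple(
        sum(scores[label][k] * weights[label] for label in scores) / total
        for k in range(n_metrics)
    )


p, r, f1 = weighted_average(bert_t1, support)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.96 R=0.93 F1=0.94
```

Under this assumption the recomputed values match the bottom row of Table 2, which also makes clear why the weak BUS and BICYCLE recall barely affects the overall score: together these two classes account for 56 of the 1,106 T1 labels.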
BERT is also very precise on all classes (even on NS), which is a key feature for our goal: being able to rule out the records where no relevant information is available (thus limiting false positives) is a priority.

The results on T2 show to what extent the language model is able to distinguish the vehicles involved in the incident from the one in which the patient was either a passenger or the driver. PEDESTRIAN is the most challenging class: by looking at the records, this is probably due to the fact that the notion of being a pedestrian is sometimes expressed in more convoluted and less clear ways than the notion of being a passenger or a driver. Increasing the size of the training set may help to solve this issue. Given the huge drop in the performance of the baseline, this task is evidently too complex for a string-matching approach. Once again, the overall performance on the dataset (bottom row) shows that BERT can successfully deal with the task (0.92 weighted F1 score). By looking specifically at precision and recall, we observe that the system is more precise in finding T1 occurrences, but it lacks recall. This phenomenon is interestingly reversed in T2, which may be explained by the fact that the notion of driving or being a passenger is more complex to detect.

5. Road Traffic Accidents Insights

In this Section we report some insights on the whole dataset, obtained by running the BERT system on the 35,952 entries concerning RTAs. In order to improve the stability of the system, we used all of the 987 annotated entries as the training set.
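When the system is run on unlabeled records, the per-class scores produced by the classifier have to be turned into labels. The sketch below illustrates, in plain Python, the two loss functions named in Section 4 and one way the corresponding outputs could be decoded: independent sigmoid scores for T1 (multi-label) and a single arg-max for T2 (single-label). The 0.5 threshold, the label order, and the example scores are assumptions made for illustration; the paper does not specify how predictions are decoded.

```python
import math

# Label order is an assumption made for this illustration.
LABELS = ["NS", "CAR", "MOTORCYCLE", "PEDESTRIAN", "BUS", "BICYCLE"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(logits, targets):
    """T1 training objective: one binary cross-entropy term per class, averaged."""
    terms = []
    for z, y in zip(logits, targets):
        prob = sigmoid(z)
        terms.append(-(y * math.log(prob) + (1 - y) * math.log(1 - prob)))
    return sum(terms) / len(terms)


def ce_loss(logits, target_index):
    """T2 training objective: softmax cross-entropy for the single gold class."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def decode_t1(logits, threshold=0.5):
    """Multi-label decoding: keep every class whose sigmoid score exceeds the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) > threshold]


def decode_t2(logits):
    """Single-label decoding: return the class with the highest score."""
    best = max(range(len(LABELS)), key=lambda i: logits[i])
    return LABELS[best]


# Hypothetical classifier scores for one record (not taken from the paper).
example_logits = [-3.0, 2.1, 1.4, -2.5, -4.0, -3.2]
print(decode_t1(example_logits))  # ['CAR', 'MOTORCYCLE']
print(decode_t2(example_logits))  # CAR
```

The same scores thus yield two vehicles for T1 (a car and a motorcycle were involved) but only one for T2 (the patient was in the car), which mirrors the difference between the two task definitions.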
We presently discuss the results on the T2 task (concerned with the vehicle in which the patient was traveling at the time of the accident), which is the more interesting of the two. To this end, we provide aggregated data on gender and age, deferring the investigation of injuries to future work.

[Figure 1: Statistical insights on road traffic accidents obtained from the full dataset. The reported statistics refer to the T2 task, i.e., the vehicle the patient was in at the time of the accident. Panels: (a) Age by Vehicle; (b) Vehicle by Age; (c) Gender by Vehicle; (d) Vehicle by Gender. Vehicle classes: CAR, MOTORCYCLE, PEDESTRIAN, BUS, BICYCLE. Age bands: ten-year intervals from 0-10 to 91-100. Genders: FEMALE, MALE. Vertical axes report percentages.]

Figure 1a reports the distribution of road accidents for each age interval with respect to the vehicles involved. According to these figures, the age of people involved in car and motorcycle accidents mainly falls between 21 and 60. In contrast, bus accidents mainly involve people over 60 years old, while 21% of bicycle accidents concern people from 41 to 50 years old. Accidents involving pedestrians seem to mainly concern the lower and upper ends of the age range: almost 50% of such collisions concern people under 30, and 25% involve people over 70.

In Figure 1b we illustrate the distribution of road accidents for each vehicle class with respect to the age interval of the patient. People aged between 0 and 10 are involved in accidents mainly as car passengers (53%) and pedestrians (29%), probably because children are typically accompanied by parents.
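Panels (a) and (b) of Figure 1 can be read as two normalisations of the same vehicle-by-age contingency table: in (a) the shares sum to 100% within each vehicle class, in (b) within each age band. The following sketch shows the two computations on a handful of invented (vehicle, age band) pairs; these toy records are hypothetical and are not data from the paper.

```python
from collections import Counter

# Hypothetical (vehicle, age_band) pairs standing in for T2 predictions.
records = [
    ("CAR", "0-10"), ("CAR", "0-10"), ("PEDESTRIAN", "0-10"),
    ("MOTORCYCLE", "11-20"), ("MOTORCYCLE", "11-20"), ("CAR", "11-20"),
    ("BUS", "61-70"), ("CAR", "61-70"),
]

counts = Counter(records)


def share_within(pair_counts, axis):
    """Normalise counts so that shares sum to 1 within each value of the given axis.

    axis=0 groups by vehicle, axis=1 groups by age band.
    """
    totals = Counter()
    for key, n in pair_counts.items():
        totals[key[axis]] += n
    return {key: n / totals[key[axis]] for key, n in pair_counts.items()}


# Panel (a), "Age by Vehicle": age shares within each vehicle class.
age_by_vehicle = share_within(counts, axis=0)
# Panel (b), "Vehicle by Age": vehicle shares within each age band.
vehicle_by_age = share_within(counts, axis=1)

print(age_by_vehicle[("CAR", "0-10")])   # share of car patients aged 0-10
print(vehicle_by_age[("CAR", "0-10")])   # share of 0-10 patients travelling by car
```

The two numbers answer different questions (what fraction of car patients are children, versus what fraction of children were in a car), which is why percentages quoted from panel (a) and panel (b) are not directly comparable.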
Conversely, people aged from 11 to 20 are often involved in motorcycle accidents (42%), consistently with the changes in travel habits during adolescence. In the following age bands, up to the age of 50, the car becomes the main vehicle involved in accidents, while only about 30% of accidents concern motorcycles. After the age of 50, the percentage of car accidents progressively decreases as patients age; at the same time, accidents involving pedestrians and buses increase, in accordance with the changes in movement habits.

Figure 1c shows the distribution of road accidents by gender of the patient with respect to the vehicles involved. Accidents involving cars and pedestrians are equally distributed between males and females, while events where the patient travels by bus mainly concern females (71%). Motorcycle (68%) and bicycle (77%) accidents mainly involve males.

Figure 1d reports the distribution of road accidents for each vehicle class with respect to the gender of the patient. Of the accidents involving females, 55% concern cars and 22% involve motorcycles. Consistently with such figures, the events involving males mainly concern cars (42%), followed by motorcycle accidents (38%). Accidents involving pedestrians are of the same order of magnitude for both genders, while males experience three times more bicycle accidents than females.

6. Conclusions and Future Work

In this work we presented a novel approach for the computation of statistical insights on road traffic accidents. We annotated a portion of the SINIACA dataset, which includes Italian medical records concerning RTAs, for the automatic extraction of vehicles, focusing on the extraction of the vehicle in which the patient was either the driver or a passenger. We observed that a simple baseline based on string matching obtains a 0.72 F1 score in the first task, but only 0.45 in the second.
Conversely, the system based on modern contextual language models yields 0.94 and 0.92 F1 scores in the two tasks, respectively: this result suggests that such models are mostly able to manage the language nuances and the medical jargon surrounding the relevant portions of text. We then reported and discussed some preliminary statistics extracted from our dataset through the system presented.

Future work includes the release of the dataset, in order to allow other researchers to propose and compare different approaches on the extraction tasks. We also plan to develop a pipeline that relies on the T1 results to guide the system through the T2 task. Additionally, we will explore how to employ both static [29] and contextual sense-enabled embeddings [30] (along with their related proximity measures [31]) to exploit their lexicographic precision and to find links and associations among the extracted pieces of information. This work constitutes the first step in the development of a more complete system aimed at the integration of different sorts of information extracted from injury data. Extracted information may in fact be coupled with categorical and numerical data to improve the understanding of the impact of RTAs and to pinpoint details relevant to the phenomenon of road injuries, thus assisting policies on injury prevention and transportation management.

References

[1] Institut National de la Statistique et des Études Économiques, Road Accident definition, Accessed: 10-10-2022. https://www.insee.fr/en/metadonnees/definition/c1116.
[2] World Health Organization, Global status report on road safety 2015, World Health Organization, 2015.
[3] World Health Organization, Global status report on road safety 2018, World Health Organization, 2018.
[4] M. Chong, A. Abraham, M. Paprzycki, Traffic accident analysis using machine learning paradigms, Informatica 29 (2005).
[5] T. Bokaba, W. Doorsamy, B. S. Paul, Comparative study of machine learning classifiers for modelling road traffic accidents, Applied Sciences 12 (2022) 828.
[6] B. Kumeda, F. Zhang, F. Zhou, S. Hussain, A. Almasri, M. Assefa, Classification of road traffic accident data using machine learning algorithms, in: 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN), IEEE, 2019, pp. 682–687.
[7] D. Santos, J. Saias, P. Quaresma, V. B. Nogueira, Machine learning approaches to traffic accident analysis and hotspot prediction, Computers 10 (2021) 157.
[8] K. Assi, Traffic crash severity prediction - a synergy by hybrid principal component analysis and machine learning models, International Journal of Environmental Research and Public Health 17 (2020) 7598.
[9] R. E. AlMamlook, K. M. Kwayu, M. R. Alkasisbeh, A. A. Frefer, Comparison of machine learning algorithms for predicting traffic accident severity, in: 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), IEEE, 2019, pp. 272–276.
[10] Z. Li, P. Liu, W. Wang, C. Xu, Using support vector machine models for crash injury severity analysis, Accident Analysis & Prevention 45 (2012) 478–486.
[11] W. Bao, Q. Yu, Y. Kong, Uncertainty-based traffic accident anticipation with spatio-temporal relational learning, in: Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 2682–2690.
[12] C. Parra, C. Ponce, S. F. Rodrigo, Evaluating the performance of explainable machine learning models in traffic accidents prediction in California, in: 2020 39th International Conference of the Chilean Computer Science Society (SCCC), IEEE, 2020, pp. 1–8.
[13] S. Das, S. Datta, H. A. Zubaidi, I. A. Obaid, Applying interpretable machine learning to classify tree and utility pole related crash injury types, IATSS Research 45 (2021) 310–316.
[14] Government of Canada, National Road Network, Accessed: 10-10-2022. https://open.canada.ca/data/en/dataset/3d282116-e556-400c-9306-ca1a3cada77f.
[15] UK Department for Transport, Road Safety Data, Accessed: 10-10-2022. https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data.
[16] City of Montreal, Montreal Vehicle Collisions, Accessed: 10-10-2022. http://donnees.ville.montreal.qc.ca/dataset/collisions-routieres.
[17] C. Gutierrez-Osorio, C. Pedraza, Modern data sources and techniques for analysis and forecast of road accidents: A review, Journal of Traffic and Transportation Engineering (English Edition) 7 (2020) 432–446.
[18] A. Salas, P. Georgakis, Y. Petalas, Incident detection using data from social media, in: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2017, pp. 751–755.
[19] A. Salas, P. Georgakis, C. Nwagboso, A. Ammari, I. Petalas, Traffic event detection framework using social media, in: 2017 IEEE International Conference on Smart Grid and Smart Cities (ICSGSC), IEEE, 2017, pp. 303–307.
[20] A. Pitidis, G. Fondi, M. Giustini, E. Longo, G. Balducci, Gruppo di lavoro SINIACA-IDB, Dipartimento di Ambiente e Connessa Prevenzione Primaria, ISS, Il Sistema SINIACA-IDB per la sorveglianza degli incidenti, Notiziario dell'Istituto Superiore di Sanità 27 (2014) 11–16.
[21] R. Lyons, R. Kisse, W. Rogmans, EU-Injury Database: Introduction to the functioning of the Injury Database (IDB), 2015. https://bit.ly/37FAKaB.
[22] E. Mensa, D. Colla, M. Dalmasso, M. Giustini, C. Mamo, A. Pitidis, D. P. Radicioni, Violence detection explanation via semantic roles embeddings, BMC Medical Informatics Decis. Mak. 20 (2020) 263. doi:10.1186/s12911-020-01237-4.
[23] E. Mensa, G. M. Marino, D. Colla, M. Delsanto, D. P. Radicioni, A resource for detecting misspellings and denoising medical text data, in: J. Monti, F. Dell'Orletta, F. Tamburini (Eds.), Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021, volume 2769 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 1–7. URL: http://ceur-ws.org/Vol-2769/paper_48.pdf.
[24] UNECE, Definitions of Vehicles (pages 5-11), Accessed: 10-10-2022. https://unece.org/fileadmin/DAM/trans/main/wp29/wp29resolutions/ECE-TRANS-WP.29-78r6e.pdf.
[25] H. Nakayama, T. Kubo, J. Kamura, Y. Taniguchi, X. Liang, doccano: Text annotation tool for human, 2018. Software available from https://github.com/doccano/doccano.
[26] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[27] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.
[28] Treccani, Treccani Italian Dictionary, Accessed: 10-10-2022. https://treccani.it.
[29] D. Colla, E. Mensa, D. P. Radicioni, LessLex: Linking multilingual embeddings to sense representations of lexical items, Computational Linguistics 46 (2020) 289–333.
[30] D. Loureiro, A. M. Jorge, J. Camacho-Collados, LMMS reloaded: Transformer-based sense embeddings for disambiguation and beyond, Artificial Intelligence 305 (2022) 103661.
[31] D. Colla, E. Mensa, D. P. Radicioni, Novel metrics for computing semantic similarity with sense embeddings, Knowledge-Based Systems 206 (2020) 106346.

A. Appendix

[Figure 2: Taxonomy of vehicles adopted for the annotation process. Node labels recoverable from the figure, in extraction order (the tree structure is not recoverable from the text): Mean of Transportation, Spacecraft, Aircraft, Vehicle, Non-Vehicle, Pedestrian, Seacraft, Landcraft, Railed, Human-Powered, Power-Driven, Less than Four Wheels, Goods Carriage 4+ Wheels, Passengers Carriage 4+ Wheels, Special Purpose, Tram, Bicycle, Train, Roller Skates, Motorcycle, Ambulance, Truck, Bus, Skateboard, Golf Cart, Armored Car, Van, Car, Ski, Electric Bicycle, Camper, Agricultural Tractor, Cycle Rickshaw, Auto Rickshaw, Hearse, Kick Scooter, Microcar, Off-Road, Pickup, Snowboard, Mobility Scooter.]