Semantic Checking of Different Type Information Sources About Permitted Speeds in Railway Transport

Viktor Shynkarenko and Larysa Zhuchyi
Ukrainian State University of Science and Technologies, 2, Lazarian str., Dnipro, Ukraine

Abstract
The infrastructure of railway stations must ensure a high level of safety when trains move at the declared speeds. The various infrastructure elements are operated in accordance with the normative and technical regulations of the railways, based on their current state. Information about this state is stored in electronic documents and databases of various types. Means are proposed to improve the safety of train traffic based on the semantic checking of data from various sources about the permitted speeds on the elements of railway tracks. To formalize the restrictions of the technical operation rules of Ukrainian railways, it is proposed to use the method of semantic annotation. The ontology is formed based on the composition of relations. The previously proposed methods of conceptualizing the tabular representation of knowledge and of multi-level concretization are applied. A modular ontology has been developed for integrating the data of the railway switch and track lists, the orders that set the permitted speeds on the elements of the railway infrastructure, the technical operation rules and the building norms. This approach provides a connection between natural-language regulations, information systems and ontologies on the issue of train speed restriction. Harmonizing heterogeneous and diverse data sources will increase their accuracy and, as a result, the reliability of the corresponding subsystems of railway transport operation.

Keywords
Ontology, permanent speed restriction, railway, natural-language regulations, tabular data, semantic checking, concept

1. Introduction
This paper explores the possibilities of semantic annotation of railway transport legal regulations.
The checking of permitted train speeds is based on the integration of data from the lists of switches and tracks, the orders establishing permitted speeds on elements of the railway infrastructure, the technical operation rules and the building norms. The approach converts part of the restrictions of legal regulations not into an ontology schema (such as OWL class restrictions) but into an annotation in TSV format (and then into RDF). This enables the subsequent integration of these regulations with the track lists and their consistency checking. Formalized document annotations provide a link between regulation texts and ontologies.

COLINS-2022: 6th International Conference on Computational Linguistics and Intelligent Systems, May 12-13, 2022, Gliwice, Poland
EMAIL: shinkarenko_vi@ua.fm (V. Shynkarenko); larisa_zhuchiy@ukr.net (L. Zhuchyi)
ORCID: 0000-0001-8738-7225 (V. Shynkarenko); 0000-0002-9209-7262 (L. Zhuchyi)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2. Problem statement and purpose
The permitted speed of a section of railway track depends on many factors, such as the rail type of the track, the frog type of the switches and their condition. Data on the characteristics of infrastructure elements and speeds are stored in sources of various formats: drawings, databases and MS Word documents. The compliance of the speed and the rail type of the track is checked against legal regulations, which are expressed as non-formalized restrictions in natural-language texts.
The purpose of the work is to improve railway safety by linking the track speed values of the order establishing permitted speeds on the elements of the railway infrastructure with the characteristics of the track, and by ensuring the consistency of these tables with the formalized restrictions of the technical exploitation rules (TER) and the state building norms (SBN) using ontological means.

3. Related works
Much attention is paid to transport ontologies [1-9], as well as to the formalization of regulatory documents, for example, the European Union EUR-Lex database [10]. Ontologies allow one to represent constraints as axioms and data as triples in order to perform semantic checking. To populate ontologies, automated methods of extracting data from tabular representations and natural-language texts are often used.
In the transport domain, a semantic dataset [11] has been developed by extracting data on the suitability of transport infrastructure for people with disabilities from various sources and integrating it to enrich public transport data. The Mobility and Accessibility Ontology (MAnto) is based on models such as Transmodel [12] and IFOPT [13]. Structured texts such as GTFS data descriptions are annotated to map them onto the MAnto ontology.
Natural-language texts can be annotated both in the ontology editor Protégé [14] and outside it [15]. The functionality of INCEpTION includes annotation with ontology concepts using the entity linking method and annotation with tags using the semantic role labelling method. For annotation, we use the INCEpTION web service [16] and the semantic role labelling method. Some developments use tagsets to annotate text by semantic role labelling with ontology concepts, for example [17]. As part of the S-CASE (Scaffolding Scalable Software Services) project, software requirements are annotated [17] to check them for consistency.
Tools have been developed whose architecture includes a module for "translating" requirements into software specifications. Annotation is performed automatically with methods such as feature-based extraction, dependency parsing and semantic role labelling in mate tools. Actor-Action-Object triples can be marked manually or by the parser.

3.1. Entity linking
Entity linking is an annotation method that assigns to named entities in a text (mostly proper names) the URIs of ontology individuals according to their context. The method has become widespread in the biological and historical sciences. The output of an entity linker is a JSON table, which is used, for example, to separate proper names in a text so that they are not checked for errors, as in [19], where the scispaCy entity linker [20] is used. The Python library scispaCy is based on ontologies such as the Human Phenotype Ontology [21], the Gene Ontology [22], RxNorm [23], the Medical Subject Headings thesaurus [24] and the Unified Medical Language System dictionaries [25].
In [26], ontologies of Irish cultural heritage and the Cultural Heritage entity Linker (CHEL) were developed. The ontologies are populated with individuals from The Statute Staple, The Down Survey, The Books of Survey and Distribution, the Dictionary of Irish Biography and the Oxford Dictionary of National Biography. A feature of CHEL is the generation of a Globally Unique Identifier for identical instances of different ontologies. Digitized cultural heritage data are used in the integration of digital library and museum data [27], as in the case of the Europeana project [28].

3.2. Manual data annotation
Manual annotation is actively used in biological DNA sequencing tasks for two reasons: the greater reliability of manual annotations [29] and the impossibility of annotating some genes (for example, pseudogenes) automatically.
In [29], the DNA of a biofuel-producing bacterium is manually annotated; the International Protein Nomenclature developed by the National Center for Biotechnology Information is used for naming, and errors of several automatic annotation systems are corrected. The Gene Ontology contains millions of manual and automatic annotations [30]; it is not just a vocabulary but contains class logical definitions [22], and it has been reused in many ontological developments.
Software requirements are annotated to increase the efficiency of software testing [31, 32]. A dedicated markup language has been developed for annotating software requirements. Software testing is done through simulation, and annotation makes it possible to develop a model that associates software requirements represented as natural-language text with simulation signals. Signal names and requirement parts are mapped manually. Annotations have several levels of detail. Annotation is done manually because the engineer must choose the level of detail and, for example, determine under what conditions a car's turn signal should turn on.

3.3. Semantic Annotation of Legal Documents
A domain-oriented software package was developed for the automatic annotation of regulatory documents [33]; it can export files in RDF format and allows one to execute SPARQL queries on text documents in the construction domain. The task is relevant because the verification of models for compliance with regulations is performed manually. To perform annotations, an ontology was developed that reuses concepts from the dc [34], doco [35] and lemon [36] ontologies.
The modular Financial Industry Regulatory Ontology [37] was developed to annotate the financial regulatory documents of the Anti-Money Laundering domain in order to check business processes for consistency. Documents are annotated automatically using machine learning to populate the ontology with instances. The training data are annotated manually in the GATE system.
The software package allows one to execute SPARQL queries on text documents to search for relevant restrictions, obligations, prohibitions, etc. In [38], a domain-specific system was developed for recognizing named entities in legislative acts in order to perform intelligent analysis and integration of the national and European legal frameworks. For annotation, the Inter-Active Terminology for Europe vocabulary and the Wikipedia knowledge base are used.

4. Ontology development methods
In this work, the ontology is populated with instances from the INCEpTION output files [16] obtained by annotating the railway legal documents. Annotation is the process of enriching a text with metadata (tags, ontology concepts). Structure validation and data transformation of drawings are based on the tabular knowledge representation model [39]. The modular approach to ontology development and the method of separating rules and data follow the ontology framework for the integration of railway information systems [40].
The railway switch and track list tables are linked to check the consistency of track names and railway switch frog types. Semantic checking is demonstrated on the TER restriction stating that only railway switches with the 1/11 frog type should be installed on main tracks.
The tables of the track list and of the order establishing permitted speeds on the elements of the railway infrastructure are linked to check the consistency of the rail type of the track with the speed of the railway line section. Semantic checking is demonstrated on the SBN restriction that matches the maximum speed of 120 km/h on a line track with the P65 rail type.
The ontology is developed in the Protégé ontology editor, the vocabulary in MRcube; data are extracted from drawing tables in Tabula and wrangled in OpenRefine.

5. Ontology formation
Figure 1 shows the process of forming the railway infrastructure ontology.

Figure 1: Modular ontology of railway track characteristics

In Figure 1:
RV – Resources Vocabulary
TH – Table Header
TS – Table Structure
TSCR – Table Software Classification Rules
TTCR – Table Type Classification Rules
SVR – Structure Validation Rules
TSTCR – Table Station Classification Rules
SWM – Structure Wrangling Model
SVM – Structure Validation Model
DWM – Data Wrangling Model

The ontology is formed in the following sequence:
• common vocabularies (ontologies 1 and 5 in Figure 1) are developed in the ontology of the abstract model of sources and the ontology of the abstract model of the railway infrastructure to describe the structure and data of the track list, the railway switch list, the order speeds table and the SBN table «Capacity of the upper structure of the main tracks in the design of new railway lines»;
• rules are formalized in the ontology of a concrete source model that includes rules: rules for validating the table structure (ontology 3c) and rules for classifying tables by type and program (ontologies 3a, b) and by station (ontology 3d), where a classification rule states the properties a table must have for the reasoner to classify it into a class;
• the header of the tables is described in the ontology of a concrete resources model that includes data (ontology 2a);
• instances of the table header (ontology 2a) and the rules for classifying by type and program (ontologies 3a, b) are imported into the ontology of a concrete resources model of the second level (ontology 4a), and the reasoner classifies the instances into classes such as «track list table»;
• in annotations to classes, OpenRefine
scripts are obtained to transform the structure of tables into ontology individuals;
• table files with a weak structure (converted from AutoCAD and MS Word to PDF) go through data extraction in Tabula (transformation to CSV);
• instances generated by OpenRefine as part of a concrete data source model ontology (ontology 2b) are imported into a second-level concrete resources model ontology (ontology 4b) along with the data validation rules (ontology 3c); if the ontology is consistent, one proceeds to the next step, otherwise the errors are corrected;
• the rules for classifying tables by station (ontology 3d) are imported into the ontology (ontology 4c); tables related to the same station are linked, and the reasoner classifies them into classes such as «abstract station track record»;
• scripts for data transformation are obtained from the class annotations;
• the first part of the ontology of a concrete railway infrastructure model that includes data (ontology 6a) is generated in OpenRefine;
• classes and relations of the railway vocabulary ontology (ontology 5) are exported from Protégé in CSV format and converted in OpenRefine to an INCEpTION tagset in JSON format;
• the regulation text is annotated in INCEpTION by semantic role labelling with the tagsets.
Annotated text is exported to a TSV file;
• TSV files are converted to railway infrastructure ontology instances in OpenRefine, generating the second part of the ontology of a concrete railway infrastructure model that includes data (ontology 6c);
• the matching rules are formalized in the ontology of a concrete railway infrastructure model that includes rules (ontology 7);
• the instances generated by OpenRefine from the track list, the switch list and the table of the order establishing permitted speeds (ontology 6a), the SBN tables of the capacity of the superstructure of the main tracks (ontology 6b) and the output file of the TER annotation (ontology 6c), together with the rules for linking tables and checking data consistency (ontology 7), are imported into the ontology of a second-level concrete railway infrastructure model (ontology 8). The consistency of these tables with the TER and SBN restrictions is checked.

6. Checking of factors affecting the speed of the track
Consider the implementation of the ontology of data sources and the ontology of the railway infrastructure.

6.1. Data resources ontology
The abstract source model is developed as a vocabulary for describing the structure of the switch list and track list tables, the table of the order establishing permitted speeds, the SBN table of the capacity of the superstructure of the main tracks and the table of the TER annotation output file; it includes the names of these tables, the names of their columns and the station attributes. The population of the ontologies of a concrete data source model with instances is partially automated by generating instances in OpenRefine, where empty cells are filled with literals such as "error" and the station attribute is retrieved from the table of station owners. A concrete source model with rules is developed as separate ontologies for the table structure classification rules and the table structure validation rules.
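The population step just described (empty cells filled with the literal "error", the station attribute attached from the station-owner table, then validation of the "error" cells) can be illustrated with a minimal plain-Python sketch. It is not the authors' OpenRefine script: the CSV layout, the identifier scheme and the property names `has station`, `has column` and `has value` are hypothetical; only «has part», «has element» and the "error" literal come from the text.

```python
import csv
import io

Triple = tuple[str, str, str]


def table_to_triples(table_id: str, csv_text: str,
                     station_of_table: dict[str, str]) -> list[Triple]:
    """Turn a CSV table into table/tuple/value triples (hypothetical scheme)."""
    triples: list[Triple] = [(table_id, "rdf:type", "table")]
    # Station attribute looked up in a hypothetical station-owner mapping;
    # a missing station is marked with the "error" literal as well.
    triples.append((table_id, "has station",
                    station_of_table.get(table_id, "error")))
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):
        tuple_id = f"{table_id}_t{row_no}"
        triples.append((table_id, "has part", tuple_id))
        for column, cell in row.items():
            # Empty (or missing) cells are replaced with the "error" literal.
            value = (cell or "").strip() or "error"
            value_id = f"{tuple_id}_{column}"
            triples.append((tuple_id, "has element", value_id))
            triples.append((value_id, "has column", column))
            triples.append((value_id, "has value", value))
    return triples


def invalid_values(triples: list[Triple]) -> list[str]:
    """Validation step: value individuals whose literal is "error"."""
    return [s for s, p, o in triples if p == "has value" and o == "error"]


if __name__ == "__main__":
    # Illustrative data only: track 2 has an empty speed cell.
    data = "track,section,speed\n1,A-B,120\n2,A-B,\n"
    triples = table_to_triples("trackList1", data,
                               {"trackList1": "some station 1"})
    print(invalid_values(triples))  # ['trackList1_t2_speed']
```

In the ontology itself this last check is done by SWRL rules rather than a Python filter; the function only shows which individuals such a rule would flag.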
The tables are classified by program, type and station, so the hierarchy is organized using logical definitions (Figure 2). The station attribute is retrieved from the station owner drawing table at the table structure transformation step and is associated with all tables of the station by a property chain. Validation is performed with SWRL rules [41] for cells whose content was replaced with the "error" literal in OpenRefine during the table structure transformation step. In the ontology of a second-level concrete source model, the tables are validated before they are checked for compliance with legal documents and before their structure and data are converted into ontology instances.
Consider an example of the second-level concrete resources model for the track list table. First, the rules for classifying tables by type and program are imported into the ontology. The table contains columns with railway line sections, tracks and speeds and is made in a database, so it is classified into the «track list table» class.

Figure 2: The logical definition of the permitted speed order table

The class annotations provide the OpenRefine script that converts the table structure into RDF triples. Instances of «table», «tuple», «value», etc. are generated and interconnected by relations such as «has part» and «has element».

6.2. Railway infrastructure ontology
The abstract model of the railway infrastructure is a vocabulary that integrates the concepts of railway tracks, railway switches, orders establishing permitted speeds, technical operation rules and building norms. The abstract model also contains the relation «word id corresponds to the individual» for processing the table received from INCEpTION. Relationships in INCEpTION are not directed; the direction is indicated by the ids of the subject and the object of the relationship.
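Because INCEpTION relations carry no direction of their own, the subject and object have to be recovered from word ids. The sketch below is a hypothetical reading of the row format quoted in this paper («3-34 587-608 siding track name trackNameTERCorrespondsToFrogType 3-41»): it assumes six tab-separated columns (word id, character span, word, tag, relation label, referenced word id) and takes the row's own word as the subject and the referenced word as the object. The actual INCEpTION export format is richer than this, so the parser is an illustration, not a reader for it. Mapping a referenced id back to a word imitates the «word id corresponds to the individual» relation.

```python
from typing import NamedTuple


class Row(NamedTuple):
    word_id: str    # e.g. "3-34"
    span: str       # character offsets, e.g. "587-608"
    word: str       # annotated word, used as the individual's name
    label: str      # tag from the tagset, e.g. "track name"
    relation: str   # relation label, "_" if none
    target_id: str  # id of the related word, "_" if none


def parse_rows(tsv: str) -> list[Row]:
    """Parse the simplified six-column rows assumed above."""
    rows = []
    for line in tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        rows.append(Row(*line.split("\t")))
    return rows


def resolve_relations(rows: list[Row]) -> set[tuple[str, str, str]]:
    """Replace referenced word ids by individuals, fixing the direction."""
    # «word id corresponds to the individual»: word id -> individual name
    individual_of = {r.word_id: r.word for r in rows}
    triples = set()
    for r in rows:
        if r.relation != "_" and r.target_id in individual_of:
            # subject: the row's word; object: the word behind target_id
            triples.add((r.word, r.relation, individual_of[r.target_id]))
    return triples


if __name__ == "__main__":
    # Hypothetical export; the frog-type row and its "_" span are illustrative.
    sample = ("3-34\t587-608\tsiding\ttrack name\t"
              "trackNameTERCorrespondsToFrogType\t3-41\n"
              "3-41\t_\t1slash9\tfrog type\t_\t_\n")
    print(resolve_relations(parse_rows(sample)))
    # {('siding', 'trackNameTERCorrespondsToFrogType', '1slash9')}
```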
Relations such as «frog type TER corresponds to track name» and «track name TER corresponds to frog type» are linked by the «inverse of» relationship in the ontology containing rules. Instances (frog type and track name) are associated in two contexts (regulation and list), so two different relationships are developed: «drawing corresponds to» for the triple generated from the list and «frog type TER corresponds to track name» for the triple generated from the regulations.
A concrete model of the railway infrastructure with data is developed as five ontologies:
• the track list table ontology;
• the railway switch list table ontology;
• the ontology of the table of the order establishing permitted speeds;
• the ontology of the table of the capacity of the upper structure of the track;
• the ontology of the INCEpTION output file table generated by the TER annotation (the annotation process is shown in Figure 3).

Figure 3: Annotation of TER with railway infrastructure ontology concepts in INCEpTION

The URIs of track names and frog types are their actual names, such as «main» and «1/11». Names such as «1slash11» are used instead of «1/11», because INCEpTION interprets «1/11» as three separate words and «1slash11» as one. Therefore, after data are extracted from the text and the tables, each track name and frog type must be associated by two relationships: for example, «frog type TER corresponds to track name» and «list corresponds to» for the railway switch frog type and the rail type.
Another feature of the INCEpTION output file table is that concepts occur in different grammatical cases and numbers (in the Ukrainian regulation text, inflected forms of the word for «main»). OpenRefine finds all values containing the stem «гол» (from Ukrainian «головний», "main") and replaces them with «main».
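The two text-level conventions above (spelling «1/11» as the single token «1slash11», and collapsing inflected forms that contain the stem «гол» to «main») can be sketched as small string functions. This is an illustration under assumptions, not the authors' OpenRefine transformation: the stem table holds only the one stem named in the text, and the base namespace `http://example.org/railway#` in `individual_uri` is a hypothetical placeholder for the ontology's real namespace.

```python
# Assumed mapping from Ukrainian word stems to canonical individual names;
# only the stem "гол" (from "головний", main) is taken from the text.
STEM_TO_NAME = {"гол": "main"}


def to_token(frog_type: str) -> str:
    """Spell a frog-type ratio as one word: "1/11" -> "1slash11"."""
    return frog_type.replace("/", "slash")


def from_token(token: str) -> str:
    """Inverse of to_token: "1slash11" -> "1/11"."""
    return token.replace("slash", "/")


def normalize_track_name(word: str) -> str:
    """Map any inflected form containing a known stem to one name."""
    lowered = word.lower()
    for stem, name in STEM_TO_NAME.items():
        if stem in lowered:  # substring search, like a "contains" filter
            return name
    return word


def individual_uri(name: str, base: str = "http://example.org/railway#") -> str:
    """Hypothetical URI scheme: base namespace plus the normalized name."""
    return base + to_token(normalize_track_name(name))


if __name__ == "__main__":
    print(normalize_track_name("головних"))  # main
    print(individual_uri("1/11"))  # http://example.org/railway#1slash11
```

A plain substring test is as broad as the search described above: it would also match unrelated words that happen to contain the same letters, so in practice the stem list should be reviewed against the actual vocabulary of the regulation.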
A concrete model of the railway infrastructure with rules is developed as an ontology that includes compositions of relations for linking the railway switch and track list tables, compositions of relations for processing annotations, and restrictions for checking the compliance of the track list and order speed table data with the SBN and TER rules. Consider the relation compositions for processing text annotations (Figure 4).

Figure 4: Relationship compositions for data linking of tables of tracks and railway switches (panels: annotation processing property chain; tables linking property chain)

The red color in Figure 4 marks the relation that connects the same instances as the relation «list corresponds to» in the track list tables. In the INCEpTION output file table, words are associated not with other words but with the ids of other words, for example, «3-34 587-608 siding track name trackNameTERCorrespondsToFrogType 3-41». Concepts are linked to concepts using relation compositions such as the one in Figure 5.
Consider the relation compositions linking the railway switch list and railway track list tables (Figure 4), which are used to check whether the frog type of a switch corresponds to the name of the railway track. These compositions link the railway switch list and the track list so as to obtain a TER regulation fact, that is, to link the name of the track with the frog type of the railway switch.
To link the values of the track and switch list tables, the following operations are performed:
• the station railway track and the railway switch frog type are linked by the relation «is switch frog type of» (Figure 6);
• the name of the station railway track and the railway switch frog type are linked by the relation «list corresponds to» (Figure 7).

Figure 5: Relations composition for processing of the railway regulation text annotations
Figure 6: Relations composition for linking railway track and railway switch lists tables
Figure 7: Composition of relations to obtain TER triple from track lists

The reasoner classifies track names according to axioms such as those in Figure 8.

Figure 8: Logical definitions of track names

The reasoner classifies and checks the railway switch frog types according to axioms such as those in Figure 9.

Figure 9: Logical definition and restrictions of the railway switch frog type of the station main track

Compositions of relations for linking the track list tables and the order speed table (Figure 10) are used to check the correspondence between the rail type and the speed, and to link the track list with the order speed table so as to obtain an SBN regulation fact, that is, to link the rail type of the railway track with the speed of the line section. The difference between the speed and frog type values is that the frog type is discrete, while the speed is continuous. Permitted speed intervals correspond to different rail types, and checking is performed both when the order speed equals an extreme value of the interval (red arrows in Figure 10) and when it is an intermediate value (green arrows in Figure 10).
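Returning to the frog-type check, the linking of Figures 6 and 7 and the restriction of Figure 9 can be imitated with plain set operations, where an OWL property chain becomes the composition of binary relations. This is a sketch under assumptions: the input relation names and the identifiers are hypothetical, and the Figure 9 restriction is read as "a frog type found on a track named main in the lists may correspond, in the TER annotation, only to the track name main", since the paper's actual axioms appear only in the figures. Where an OWL reasoner would report the ontology as inconsistent, the function returns the offending pairs.

```python
Pair = tuple[str, str]


def compose(r1: set[Pair], r2: set[Pair]) -> set[Pair]:
    """Property chain r1 o r2: (a, c) whenever (a, b) in r1 and (b, c) in r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def inverse(r: set[Pair]) -> set[Pair]:
    """Inverse relation: swap subject and object."""
    return {(b, a) for a, b in r}


def frog_type_violations(has_switch: set[Pair], has_frog_type: set[Pair],
                         has_name: set[Pair],
                         ter_frog_to_name: set[Pair]) -> set[Pair]:
    # Figure 6 analogue: frog type «is switch frog type of» track
    is_switch_frog_type_of = inverse(compose(has_switch, has_frog_type))
    # Figure 7 analogue: frog type «list corresponds to» track name
    list_corresponds_to = compose(is_switch_frog_type_of, has_name)
    # Figure 8/9 analogue: frog types used on tracks named "main"
    main_frog_types = {f for f, n in list_corresponds_to if n == "main"}
    # Assumed Figure 9 restriction: such a frog type may correspond, in the
    # TER annotation, only to the track name "main".
    return {(f, n) for f, n in ter_frog_to_name
            if f in main_frog_types and n != "main"}


if __name__ == "__main__":
    # Test instance mirroring the text: track 1 ("main") has a 1/9 switch,
    # while the TER annotation links 1/9 with the siding track name.
    print(frog_type_violations(
        has_switch={("track1", "switch1")},
        has_frog_type={("switch1", "1slash9")},
        has_name={("track1", "main")},
        ter_frog_to_name={("1slash9", "siding"), ("1slash11", "main")},
    ))  # {('1slash9', 'siding')}
```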
Figure 10: Compositions of relations for linking the list of tracks and the order of permitted speeds

Consider the case when the order speed equals an extreme value of the interval. To link the values of the track list and the table of the order establishing permitted speeds, the following operations are performed sequentially:
• the station track and the line section are linked by the relation «line has track» (Figure 11);
• the track and the railway line speed are linked by the relation «is speed of track» (Figure 12);
• the rail type of the track and the railway line speed are linked by the relation «speed drawing corresponds to rail type» (Figure 13).

Figure 11: Relations composition for determining whether a station track belongs to the line section
Figure 12: Relations composition for linking track and railway line section speed
Figure 13: Relations composition for linking rail type and railway line section speed

The reasoner classifies rail types according to axioms such as those in Figure 14. Classification and speed checking are performed according to axioms such as those in Figure 15.
Now consider the case when the order speed is an intermediate value of the interval. Since it does not coincide with an interval bound, the two speeds are related through the «equals» relation composition of Figure 16.
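The speed side of the check can be sketched in the same style. Here a plain numeric closed-interval test stands in for the ontology's class-based speed classification and the «equals» composition used for intermediate values, so this is a swapped-in technique, not the paper's axioms. The relation chain follows Figures 11 to 13; the identifiers and the speed bounds in the example are illustrative assumptions, not values from the SBN table.

```python
Pair = tuple[str, str]


def compose(r1: set[Pair], r2: set[Pair]) -> set[Pair]:
    """Property chain r1 o r2: (a, c) whenever (a, b) in r1 and (b, c) in r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def speed_violations(line_has_station: set[Pair],
                     station_has_track: set[Pair],
                     is_speed_of_line_section: set[Pair],
                     track_has_rail_type: set[Pair],
                     speed_value: dict[str, float],
                     permitted: dict[str, tuple[float, float]]) -> set[Pair]:
    # Figure 11 analogue: «line has track»
    line_has_track = compose(line_has_station, station_has_track)
    # Figure 12 analogue: «is speed of track»
    is_speed_of_track = compose(is_speed_of_line_section, line_has_track)
    # Figure 13 analogue: «speed drawing corresponds to rail type»
    speed_to_rail = compose(is_speed_of_track, track_has_rail_type)
    violations = set()
    for speed, rail in speed_to_rail:
        low, high = permitted[rail]
        # Closed interval: extreme values (bounds) and intermediate values
        # are both accepted, mirroring the two cases in the text.
        if not low <= speed_value[speed] <= high:
            violations.add((speed, rail))
    return violations


if __name__ == "__main__":
    # Hypothetical bounds chosen so that 120 is the upper bound for P65 only.
    bounds = {"P50": (0, 100), "P65": (101, 120)}
    print(speed_violations({("S1-S2", "S1")}, {("S1", "track1")},
                           {("speed120", "S1-S2")}, {("track1", "P50")},
                           {"speed120": 120}, bounds))
    # {('speed120', 'P50')}
```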
Figure 14: Logical definitions for classifying rail types
Figure 15: Logical definition and restriction of the permitted speed of the P50 rail type track class
Figure 16: Relationship composition to determine if a station track belongs to a segment

Axioms such as those in Figures 17 and 18 are developed for a second classification of speed by its value and for a restriction on each interval, such that the order speed corresponding to the rail type of the railway track (by the Figure 15 axiom) can be related by the «equals» relation only to a speed within the speed range of this rail type.

Figure 17: Logical definitions for the speed classification by value
Figure 18: P50 rail type track speed logical definition and restriction

Consider the reasoning path for the main track test instance of the second-level ontology of the concrete railway infrastructure model, in which a 1/9 frog type railway switch corresponds to the siding track. From the track list, the name instance "main" of the track №1 instance is extracted and associated with the literal "main". The reasoner classifies the track name instance into the «main track name» class by the axiom of Figure 8. In the track list, the track instance is also linked to railway switch №1, and in the switch list, switch №1 is associated with the 1/9 frog type. By the composition of relations in Figure 6, the track list and the switch list are linked: the frog type of the railway switch of track №1 and railway track №1 are connected by the relation «is switch frog type of». By the composition of relations in Figure 7, the name of track №1 («main») and the 1/9 railway switch frog type are linked by the relation «list corresponds to». The reasoner then classifies the 1/9 frog type into the «main track name frog type» class according to the logical definition of Figure 9.
The frog type 1/11 individual extracted from the TER text annotations is linked with the word identifier «3-41» of the word «siding» by the relation «frog type TER corresponds to track name», and the track name «siding» is linked with the identifier «3-41» by the relation «word id corresponds to the individual». According to the composition of relations in Figure 5, the 1/9 frog type is associated with the track name «siding». The reasoner classifies this track name instance into the «siding track name» class by an axiom such as the one in Figure 8. Since the reasoner has classified the 1/9 frog type into the class of frog types of the main track name, and the composition of relations in Figure 4 connects it with the track name «siding», the ontology becomes inconsistent by the restriction of Figure 9.
Consider the reasoning path for a test instance of a track with the P50 rail type and a permitted speed of 120 km/h, i.e. equal to an extreme value of the speed range corresponding to the P65 rail type. In the table of the order, the station «some station 1» is associated with track №1 by the relation «station has track» and with the railway line section «some station 1 - some station 2» by the relation «line has station». By the composition of relations in Figure 11, track №1 is linked with the railway line section by the relation «line has track». In the order speed table, speed 120 is associated with the railway line section «some station 1 - some station 2» by the relation «is speed of line section». By the composition of relations in Figure 12, track №1 is linked with speed 120 by the relation «is speed of track». In the track list, track №1 is linked with the P50 rail type, and the P50 rail type is linked with the literal «P50». By the composition of relations in Figure 13, speed 120 is linked with the P50 rail type by the relation «speed drawing corresponds to rail type».
The reasoner classifies the P50 instance into the «P50 rail type» class and speed 120 into the «P50 rail type speed» class according to the logical definitions of Figures 14 and 15. In the SBN ontology, the speed 120 individual is linked with the P65 rail type individual. The ontology therefore becomes inconsistent with the Figure 15 restriction.

7. Conclusions and future work
The safety of train traffic can be improved by enhancing the reliability of the data in information systems through the integration of heterogeneous data sources. An approach is proposed for formalizing the restrictions of legal regulations by annotating natural-language texts. A modular ontology of the permitted speeds of railway line sections has been developed using the composition of relations. This ontology allows one to check the consistency of the speeds and characteristics of the railway infrastructure stored in the relevant information systems with TER and SBN. Future work will combine permanent speed restrictions with temporary speed restrictions imposed due to infrastructure element failures.

8. References
[1] V. Skalozub, V. Ilman, V. Shynkarenko, Development of ontological support of constructive synthesizing modeling of information systems, Eastern-European Journal of Enterprise Technologies 6 (2017) 58-69. doi:10.15587/1729-4061.2017.119497
[2] V. Skalozub, V. Ilman, V. Shynkarenko, Ontological support formation for constructive-synthesizing modeling of information systems development processes, Eastern-European Journal of Enterprise Technologies 5 (2018) 55-63. doi:10.15587/1729-4061.2018.143968
[3] R. Lewis, A semantic approach to railway data integration and decision support, Ph.D. thesis, University of Birmingham, United Kingdom, Electrical and Computer Engineering, 2015.
[4] J. Tutcher, Development of semantic data models to support data interoperability in the rail industry, Ph.D. thesis, University of Birmingham, United Kingdom, Electronic, Electrical, and Systems Engineering, 2016.
[5] S. Bischof, G. Schenner, Rail Topology Ontology: A Rail Infrastructure Base Ontology, in: International Semantic Web Conference, Springer, Cham, 2021, pp. 597-612. doi:10.1007/978-3-030-88361-4_35
[6] M. Katsumi, M. Fox, iCity Transportation Planning Suite of Ontologies, 2020.
[7] D. Corsar, M. Markovic, P. Edwards, The transport disruption ontology, in: International Semantic Web Conference, Springer, Cham, 2015, pp. 329-336. doi:10.1007/978-3-319-25010-6_22
[8] L. Zhao, R. Ichise, S. Mita et al., An ontology-based intelligent speed adaptation system for autonomous cars, in: Joint International Semantic Technology Conference, Springer, Cham, 2014, pp. 397-413. doi:10.1007/978-3-319-15615-6_30
[9] S. Verstichel, F. Ongenae, L. Loeve et al., Efficient data integration in the railway domain through an ontology-based methodology, Transportation Research Part C: Emerging Technologies (2011) 617-643. doi:10.1016/j.trc.2010.10.003
[10] F. Benvenuti, C. Diamantini, D. Potena et al., An ontology-based framework to support performance monitoring in public transport systems, Transportation Research Part C: Emerging Technologies (2017) 188-208. doi:10.1016/j.trc.2017.06.001
[11] R. Balakrishnan, M. A. Harris, R. Huntley, K. Van Auken et al., A guide to best practices for Gene Ontology (GO) manual annotation, Database (2013). doi:10.1093/database/bat054
[12] P. Cáceres, A. Sierra-Alonso, B. Vela et al., Adding semantics to enrich public transport and accessibility data from the Web, Open Journal of Web Technologies 1 (2020) 1-18.
[13] CEN, European reference data model for public transport information (Transmodel). URL: https://www.transmodel-cen.eu/
[14] IFOPT, Identification of Fixed Objects in Public Transport, Standard CEN/TC 278, EN 28701, European Committee for Standardization, 2012.
[15] P. Ogren, Knowtator: a Protégé plug-in for annotated corpus construction, in: Proceedings of the Human Language Technology Conference of the NAACL, Association for Computational Linguistics, USA, 2006, pp. 273-275. doi:10.3115/1225785.1225791
[16] J. C. Klie, M. Bugert, B. Boullosa, The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation, in: 27th International Conference on Computational Linguistics, USA, 2018.
[17] T. Thongkrau, P. Lalitrojwong, Ontopop: An ontology population system for the semantic web, in: IEICE Transactions on Information and Systems, CEUR-WS Team, Montenegro, 2012, pp. 921-931.
[18] T. Diamantopoulos, M. Roth, A. Symeonidis et al., Software requirements as an application domain for natural language processing, Language Resources and Evaluation 51 (2017) 495-524. doi:10.1007/s10579-017-9381-z
[19] S. Karthikeyan, A. G. S. de Herrera, F. Doctor et al., An OCR Post-Correction Approach Using Deep Learning for Processing Medical Reports, IEEE Transactions on Circuits and Systems for Video Technology (2021). doi:10.1109/TCSVT.2021.3087641
[20] M. Neumann, D. King, I. Beltagy, ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing, in: Proceedings of the 18th BioNLP Workshop and Shared Task, Association for Computational Linguistics, Italy, 2019, pp. 319-327. doi:10.18653/v1/W19-5034
[21] P. N. Robinson, S. Mundlos, The human phenotype ontology, Clinical Genetics (2010) 525-534. doi:10.1111/j.1399-0004.2010.01436.x
[22] R. Balakrishnan, M. A. Harris, R. Huntley, A guide to best practices for Gene Ontology (GO) manual annotation, Database (2013). doi:10.1093/database/bat054
[23] bioportal.bioontology.org, RxNorm Vocabulary, 2021. URL: https://bioportal.bioontology.org/ontologies/RXNORM
[24] nlm.nih.gov, Medical Subject Headings. URL: https://www.nlm.nih.gov/mesh/meshhome.html
[25] nlm.nih.gov, Unified Medical Language System. URL: https://www.nlm.nih.gov/research/umls/index.html
[26] G. Munnelly, Entity Linking for Text Based Digital Cultural Heritage Collections, Ph.D. thesis, Trinity College Dublin, Ireland, School of Computer Science & Statistics, 2020.
[27] G. Skevakis, EUROMUSE: A web-based system for the management of MUSEum objects and their interoperability with EUROpeana, Ph.D. thesis, Technical University of Crete, Greece, Electronic and Computer Engineering, 2011.
[28] A. Isaac, B. Haslhofer, Europeana linked open data - data.europeana.eu, Semantic Web 3 (2013) 291-297. doi:10.3233/SW-120092
[29] C. M. Humphreys, S. McLean, S. Schatschneider, Whole genome sequence and manual annotation of Clostridium autoethanogenum, an industrially relevant bacterium, BMC Genomics 16 (2015) 1-10. doi:10.1186/s12864-015-2287-5
[30] J. A. Blake, K. R. Christie, M. E. Dolan, The gene ontology resource: 20 years and still GOing strong, Nucleic Acids Research D1 (2019) D330-D338.
[31] F. Pudlitz, F. Brokhausen, A. Vogelsang, What am I testing and where? Comparing testing procedures based on lightweight requirements annotations, Empirical Software Engineering 4 (2020) 2809-2843. doi:10.1007/s10664-020-09815-w
[32] F. Pudlitz, A. Vogelsang, F. Brokhausen, A lightweight multilevel markup language for connecting software requirements and simulations, in: International Working Conference on Requirements Engineering: Foundation for Software Quality, Springer, Cham, 2019, pp. 151-166. doi:10.1007/978-3-030-15538-4_11
[33] D. I. Mouromtsev, I. A. Shilin, D. A. Pliukhin et al., Building knowledge graphs of regulatory documentation based on semantic modeling and automatic term extraction, Scientific and Technical Journal of Information Technologies, Mechanics and Optics 21 (2021) 256-266. doi:10.17586/2226-1494-2021-21-2-256-266
[34] E. G. Hernández, J. M. Piulachs, Application of the Dublin Core format for automatic metadata generation and extraction, in: Proceedings of the 5th International Conference on Dublin Core and Metadata Applications, 2005, pp. 213-216.
[35] A. Constantin, S.
Peroni, Pettifer S. et al., The document components ontology (DoCO), Semantic Web 7 (2016) 167–181. doi:10.3233/SW-150177 [36] M. Villegas, N. Bel, PAROLE/SIMPLE ‘lemon’ ontology and lexicons, Semantic Web 6 (2015) 363–369. doi:10.3233/SW-140148 [37] K. Asooja, G. Bordea, G. Vulcu, L. O’Brien, Semantic annotation of finance regulatory text using multilabel classification, in: Proceedings of the International Workshop on Legal Domain and Semantic Web Applications, Springer, Slovenia, 2015. [38] R. Nanda, G. Siragusa, L. Di Caro et.al., Concept Recognition in European and National Law, in: JURIX, IOS Press, Luxembourg, 2017, 193-198. doi:10.3233/978-1-61499-838-9-193 [39] V. Shynkarenko, L. Zhuchyi, O. Ivanov, Conceptualization of the tabular representation of knowledge, in: IEEE 16th International Conference on Computer Sciences and Information Technologies, IEEE, Lviv, 2021. [40] V. Shynkarenko, L. Zhuchyi, Ontological harmonization of railway transport information systems, in: 5th International Conference on Computational Linguistics and Intelligent Systems, CEUR-WS Team, Lviv, 2021, 541–554. [41] S. Peroni, The Error Ontology, 2010. URL: https://sparontologies.github.io/error/current/error.html