Tool for Automatic Annotation of Clinical Texts in Bulgarian – BGMedAnno

Sylvia Vassileva 1, Svetla Boytcheva 2 and Ivan Koychev 1
1 Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Sofia, Bulgaria
2 Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria

Abstract
This paper describes the design of BGMedAnno, an automatic annotation tool for clinical text in Bulgarian. The proposed solution combines classical rule-based and dictionary-based approaches for named entity recognition (NER) with deep learning methods to identify different categories of medical terms, as well as nested entities. The following categories of medical terms are currently identified: symptoms, complaints, diagnoses, anatomical organs and systems, risk factors, and family history. In addition, the negation relation and its scope are recognized, and a location relation connects symptoms and complaints to the corresponding anatomical organs and systems. All identified concepts are normalized to standard medical classifications and ontologies such as UMLS, MeSH, and ICD-10, and are mapped to concepts in the linked open data knowledge graph WikiData. The proposed approach for recognizing medical terms and their relations shows high accuracy: the rule-based method achieves an F1 score of 75%, while the trained BERT-based model achieves 73%. Although the BERT model performs slightly worse on the test set, it finds entities in sentences that the rule-based method does not cover. For the entity linking task, the developed method based on the BERT language model achieves an F1 score of 61%, significantly outperforming direct string comparison, which achieves an F1 score of 45%. The developed user interface allows direct application of the annotation tool to individual texts.
The service API outputs data in JSON format, which enables interoperability with other systems and can be used to process large collections of clinical data.

Keywords: annotation tools, natural language processing, health informatics, deep learning, machine learning

Information Systems & Grid Technologies: Fifteenth International Conference ISGT'2022, May 27–28, 2022, Sofia, Bulgaria
EMAIL: sylvia.vassileva@gmail.com (S. Vassileva); svetla.boytcheva@iict.bas.bg (S. Boytcheva); koychev@fmi.uni-sofia.bg (I. Koychev)
ORCID: 0000-0002-5542-9168 (S. Boytcheva); 0000-0003-3919-030X (I. Koychev)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Medical term recognition and linking to corresponding entities in knowledge bases, standard classifications, and ontologies is an important task in clinical text analysis. It allows extracting structured data from clinical text and subsequently processing the extracted information. Structured clinical data is useful in many real-life scenarios, such as automatic analysis of drug effects in clinical trials, extraction of hidden relations between risk factors and diseases, search for similar patient cases, and clinical decision support systems. By improving clinical software systems, such tools let clinicians focus on patient care.

In this paper, we present BGMedAnno2: an automatic annotation tool for clinical text in Bulgarian. We propose an innovative approach for named entity recognition (NER) of medical terms (symptoms, complaints, diagnoses, anatomical organs, and systems) and their automatic linking to relevant concepts from the Unified Medical Language System (UMLS3), Medical Subject Headings (MeSH4), and the International Classification of Diseases (ICD-10-CM5).

2.
Related work

There are many tools for manual annotation of clinical text [1], but automatic annotation remains a challenging task. Automatic annotation tools perform two main tasks: named entity recognition (NER) and relation extraction (RE). Most systems divide the process into two separate steps, choosing a different approach for each of them, while others perform an end-to-end process that discovers and links concepts simultaneously. Classical NER systems were based on manually created rules and features tailored to a specific domain. The current state-of-the-art systems are based on deep neural networks using vector representations of text [2]. The main approaches to finding entities in text can be organized into the following groups: rule-based, unsupervised machine learning (ML), supervised ML, and deep learning.

2.1. Rule-based and dictionary-based NER systems

The rules are manually created using dictionaries and templates, which is a time-consuming, laborious, and complex task that requires domain-specific knowledge. This makes these approaches suitable mainly for specific domains. Their advantage is that they do not require annotated training data, and they work very well if the dictionaries are complete. In the biomedical domain, synonyms from classifications such as UMLS, MeSH, and ICD-10-CM can be used. For example, Savova et al. [3] propose a dictionary lookup approach in cTAKES, built from terms from UMLS, SNOMED CT6, and RxNORM7, showing an F1 score of 71.5% on Mayo Clinic EMR data. Chen's system [4] is also rule-based, developed on data from the n2c2-1 [5] competition, with an F1 score of 90%.

2 https://bgmedanno.fmi.uni-sofia.bg
3 https://www.nlm.nih.gov/research/umls/index.html
4 https://www.ncbi.nlm.nih.gov/mesh
5 https://www.cdc.gov/nchs/icd/icd10cm.htm
In the clinical field in Bulgarian, Todorova [6] implements a system based on rules and dictionaries, which recognizes symptoms, complaints, organs, anatomical systems, and other categories and demonstrates an F1 score of 91.4% on a small dataset of anonymized outpatient records (ORs) from the Bulgarian National Diabetes Registry [7].

2.2. NER using unsupervised ML

These approaches do not require manually annotated data. Very often such systems use clustering, which categorizes the entities in the text based on their proximity. Clusters are defined based on lexical resources, templates, and statistics from large corpora [2]. In the biomedical domain, Zhang et al. [8] proposed an unsupervised approach using dictionaries of terms, corpus statistics such as IDF and contextual vectors, and shallow syntactic knowledge (noun phrases). Experiments conducted with data from GENIA and i2b2-Pittsburgh show better results than other unsupervised approaches (15.2% and 26.5% micro-F1 with a complete match, respectively), but worse results than supervised approaches.

2.3. NER using supervised ML

In supervised ML, the NER task can be considered either as a multi-class classification task over each word or as a sequence labeling task. Classical supervised ML algorithms require careful selection of input features of the text. Each word in the text is represented as a vector, which may consist of one or many boolean, numerical, or nominal values. This group of approaches often uses properties of words such as morphology and parts of speech, as well as properties of the document or corpus, such as word frequency. The survey by Li et al. [2] gives examples of common algorithms in this approach: Hidden Markov Models (HMM), decision trees, maximum entropy models, Support Vector Machines (SVM), and Conditional Random Fields (CRF).
McNamee and Mayfield [9] consider the task as a classification of each word of the document into one of 8 classes and train one SVM classifier per class. This approach does not use the word's context to make a decision. In contrast, the CRF approach views the task as a sequence labeling task and does use context. It has been applied in many NER papers, including in the biomedical domain: McCallum and Li [10] used CRF and showed an F1 score of 84.04% on CoNLL03 data; Settles [11] uses CRF for biomedical entities and shows an F1 of 69.5% on data from the BioNLP/NLPBA 2004 shared task. The main challenge of this approach is the need for annotated training data.

6 https://www.snomed.org
7 https://www.nlm.nih.gov/research/umls/rxnorm/index.html

2.4. NER based on Deep Learning (DL)

With these approaches, deep neural networks automatically detect important text properties that help entity identification. A major challenge is the need for a large amount of annotated training data. Many systems in the biomedical domain are based on DL approaches: Magge et al. [12] use a bidirectional LSTM model with a CRF output layer, which achieves a macro F1 score of 81%; Wu et al. [13] compared CRF models with CNN and RNN networks and showed that the RNN performed best on i2b2 2010 data, achieving an F1 score of 85.94%; Vunikili et al. [14] use BERT [15] and Spanish BERT (BETO) [16] for transfer learning. Their model extracts tumor information from clinical records in Spanish and achieves an F1 score of 73.4%.

3. Clinical data and medical ontologies

3.1. Ontologies and standard medical classifications

3.1.1. Unified Medical Language System (UMLS)

The Unified Medical Language System (UMLS) is an international system of vocabularies, classifications, and standards for encoding medical terms. It aims to facilitate the creation and integration of biomedical information systems and services.
UMLS consists of three parts: a metathesaurus, a semantic network, and specialized lexicons and tools. The metathesaurus consists of concepts and their corresponding encodings in different systems such as CPT8, ICD-10-CM, LOINC9, MeSH, RxNORM, and SNOMED CT. It contains more than 5 million terms related to concepts with unique identifiers. Each concept is related to one or more lexical variations of the term. For example, the term "atrial fibrillation" has the code C4015486. Unfortunately, UMLS does not have an official translation in Bulgarian.

3.1.2. Medical Subject Headings (MeSH)

Medical Subject Headings (MeSH) is a thesaurus for indexing PubMed10 articles; PubMed comprises more than 33 million biomedical articles from Medline, scientific journals, and online books. MeSH contains about 27,000 terms structured in a hierarchy. Each MeSH descriptor uniquely identifies a concept and can appear in one or more places in the hierarchy. The code for "atrial fibrillation" in MeSH is D001281. Unfortunately, MeSH does not have an official translation in Bulgarian.

8 https://www.ama-assn.org/practice-management/cpt
9 https://loinc.org
10 https://pubmed.ncbi.nlm.nih.gov

3.1.3. International Classification of Diseases, 10th revision

The International Classification of Diseases, 10th revision (ICD-10) is a statistical classification of diseases used worldwide. It is used in the Bulgarian health insurance domain for statistical and reimbursement purposes. It is a hierarchical classification grouping diseases at several different levels.

Figure 1: Structure of ICD-10 codes

Figure 1 shows the structure of ICD-10 codes: they can be 3 to 7 characters in length, depending on the specificity of the classification. ICD-10 codes of 4 characters are used in Bulgaria; the number of 4-character codes is almost 11,000. For this paper, we use the 4-character codes, as they are most relevant for Bulgarian healthcare.
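The code shapes described above can be checked with simple patterns. The following is our own illustrative sketch, not part of BGMedAnno; the regular expressions are assumptions based on the structure described for Figure 1 (a letter, two digits, and optionally a dot plus further characters, for 3 to 7 significant characters; the Bulgarian 4-character codes add exactly one digit after the dot).

```python
import re

# General ICD-10 code shape: letter + two digits, optionally followed
# by a dot and 1-4 more alphanumeric characters (3-7 characters total,
# not counting the dot). Pattern is an illustrative assumption.
ICD10_PATTERN = re.compile(r"^[A-Z]\d{2}(?:\.[A-Z0-9]{1,4})?$")

# The 4-character codes used in Bulgaria: letter + two digits + dot + one digit.
ICD10_BG_PATTERN = re.compile(r"^[A-Z]\d{2}\.\d$")

def is_icd10(code: str) -> bool:
    return ICD10_PATTERN.match(code) is not None

def is_icd10_bg(code: str) -> bool:
    return ICD10_BG_PATTERN.match(code) is not None

print(is_icd10("H36.0"), is_icd10_bg("H36.0"))      # True True
print(is_icd10("E11.621"), is_icd10_bg("E11.621"))  # True False
```

A 7-character ICD-10-CM code such as "E11.621" passes the general check but is rejected by the 4-character Bulgarian check.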
Each disease can be represented with one or more codes, depending on the underlying cause or its localization. For example, "diabetic polyneuropathy, retinopathy, and neuropathy" can be encoded as G63.2, as a disease of the nervous system, or H36.0, as a retinal disease. There is an official ICD-10 translation in Bulgarian, which is used in this paper.

3.2. Clinical texts

3.2.1. Medical term vocabularies

For this paper, we gather medical term vocabularies in different categories using the following sources: the vocabularies gathered by Todorova [6]; the website of Alexandrovska hospital11; the Puls website12; and the diagnosis dataset by Boytcheva et al. [17] with more than 170,000 unique diagnoses and their ICD-10 codes.

11 https://www.alexandrovska.com/display.php
12 https://www.puls.bg/diagnostic/symptom

Table 1 shows the different term categories and the number of terms in each vocabulary. The total number of terms in all vocabularies is 180,553.

Table 1: Vocabulary categories and the number of terms per category

Category            Description                                        Number of terms
DIAGNOSIS           Diseases                                           177,271
SYMPTOM             A doctor's assessment of the patient's problems    762
COMPLAINT           A patient's subjective report of their problems    1,112
ORGAN               Human organs                                       1,213
ANATOMICAL SYSTEM   Anatomical systems in the human body               38
FAMILY              Family relation types                              53
FAMILY HISTORY      Family history phrases                             27
RISK FACTOR         Risk factors to health                             51
NEGATION            Negation phrases                                   26

3.2.2. Patient history dataset

We extract patient history records from anonymized patient data, including hospital discharge letters and outpatient records. All names and dates were removed in advance, and the patient history records were split into sentences and shuffled, resulting in 41,066 records. The average length of a record is 21.25 words. This dataset does not have any labels. To generate labels, we use the rules-and-dictionaries approach by Todorova [6]. As a result, we have a labeled corpus of patient history records.
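The core of a dictionary-based labeler like the one used for the automatic annotation can be sketched as greedy longest-match over a phrase vocabulary. This is our own minimal illustration, not the actual implementation from [6]; the toy vocabulary below is invented and stands in for the 180,553-term resource.

```python
# Minimal sketch of dictionary-based NER: scan left to right and take
# the longest vocabulary phrase starting at each position. The toy
# vocabulary is illustrative only.
VOCAB = {
    ("chest", "area"): "ORGAN",
    ("pain",): "COMPLAINT",
    ("hypertension",): "DIAGNOSIS",
}
MAX_LEN = max(len(phrase) for phrase in VOCAB)

def dictionary_ner(tokens):
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first (greedy longest match).
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            if phrase in VOCAB:
                spans.append((i, i + n, VOCAB[phrase]))
                i += n
                break
        else:
            i += 1  # no phrase starts here, move on
    return spans

print(dictionary_ner("pain in the chest area".split()))
# [(0, 1, 'COMPLAINT'), (3, 5, 'ORGAN')]
```

The longest-match rule is what lets a multi-word phrase like "chest area" win over any shorter entry it might contain.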
Since this method is automatic, some entities may be missed or annotated incorrectly. Figure 2 shows the number of labeled entities per category produced by the automatic approach.

Figure 2: Automatically annotated entities per category

3.2.3. Medical term knowledge base

A knowledge base was gathered from WikiData, including diseases, organs, anatomical systems, symptoms, and complaints that have a UMLS, MeSH, or ICD-10 code. The knowledge base contains 122,922 terms related to 21,174 concepts. Since not all WikiData terms are translated into Bulgarian, we use Google Translate to translate the English terms automatically and then manually review and clean the translation. Some terms are part of more than one classification and/or can have one or more codes related to them. Table 2 shows the number of examples and concepts from each system.

Table 2: The number of examples and concepts from each medical system in the knowledge base

Medical System   Number of examples   Number of concepts
UMLS             114,883              21,481
MeSH             63,181               9,422
ICD-10           58,175               4,073

3.2.4. Outpatient records dataset

A small dataset of outpatient records was manually labeled with medical terms and their corresponding entities in WikiData. The outpatient records are written by general practitioners. The dataset consists of 30 records used for testing purposes. Nested entities are labeled in the following categories: organs, anatomical systems, diagnoses, symptoms, complaints, family history, risk factors, and negations. The data contains 170 sentences and 582 labeled entity tokens. The number of labeled WikiData concepts is 446 words linked to 111 unique entities.

4. Methods for medical terms recognition and linking

We split the task of recognizing and linking medical terms into two parts: the first is to recognize the term mentioned in the text and assign it a category (named entity recognition), and the second is to find the corresponding concept in the knowledge base (entity linking).
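This two-part split can be written as a minimal pipeline interface. The function names, the toy vocabulary, and the concept identifier below are ours, for illustration only; they are not taken from the BGMedAnno code or from WikiData.

```python
# Hypothetical two-stage pipeline: recognize mentions first, then link
# each mention to a knowledge-base concept (or NIL when nothing matches).
TOY_KB = {"hypertension": "Q-EXAMPLE-1"}  # term -> concept id (illustrative)

def recognize(text):
    # Stand-in for the NER stage: return (mention, category) pairs.
    return [(w, "DIAGNOSIS") for w in text.split() if w in TOY_KB]

def link(mention):
    # Stand-in for the linking stage; unknown mentions map to NIL.
    return TOY_KB.get(mention, "NIL")

def annotate(text):
    return [(m, cat, link(m)) for m, cat in recognize(text)]

print(annotate("long standing hypertension"))
# [('hypertension', 'DIAGNOSIS', 'Q-EXAMPLE-1')]
```

Keeping the two stages behind separate functions mirrors the modular design: either stage can be swapped (rules vs. BERT-based NER, string search vs. embedding search) without touching the other.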
The overall architecture of the BGMedAnno tool is shown in Figure 3. Each of the three main tasks (tokenization, entity recognition, and entity linking) is implemented in a separate module. For tokenization, the spaCy13 pipeline is used; the methods used in the remaining modules are described in the subsections below. The system has a user interface that can be used to input a patient history record and visualize the annotated result, and an application programming interface (API) that other services can use to consume the annotated results. The annotation results are returned in JSON format.

13 https://spacy.io

Figure 3: Overall architecture of the tool for automatic annotation BGMedAnno

4.1. Medical terms recognition

For the task of recognizing medical terms, a model is trained using the automatically labeled data in the patient history dataset. We use a transfer learning approach based on a model from the BERT family. BERT (Bidirectional Encoder Representations from Transformers) [15] is a widely used deep learning model based on the transformer architecture. The model is pre-trained on a huge corpus through unsupervised learning and can be further trained on specific tasks such as text classification, token classification (named entity recognition), and others. BERT models use context information to generate vector representations of each word. As a result of the automatic labeling, we have nested entities at 4 different levels. We use MBG-ClinicalBERT by Velichkov et al. [18], a public model based on ClinicalBERT, which was initially trained on 2 million clinical texts in English and further trained on 10,000 medical articles in Bulgarian. We train the MBG-ClinicalBERT-NER model to recognize entities from all four levels at the same time. The standard BERT training architecture for named entity recognition does not support recognizing overlapping entities.
For this purpose, we adopt a multi-task learning approach. Figure 4 shows the architecture of the model for training on multiple tasks. We add four "heads" on top of BERT, i.e., four multi-class classifiers using the standard architecture by Devlin et al. [15] and sharing the same MBG-ClinicalBERT encoder, and we train each classifier on one level of the entities. The different levels of entities are related to each other by rules; for example, "pain in the chest area" has the level 1 entities "pain" and "chest area" and the level 3 entity "pain in the chest area". As input for each task, we use all sentences from the training set that contain entities from the task's level. The training algorithm follows the steps outlined by Liu et al. [19]: it combines the data for all tasks and trains the shared encoder on all of them.

4.2. Medical terms linking

Given the terms recognized by the named entity recognition process, the next step is to link them to the corresponding entity in the knowledge base (if it exists). The method for entity linking uses the medical term knowledge base introduced in Section 3.2.3. It searches for the closest entity using cosine similarity between the vector representations of the entities in the knowledge base and the input text. The vector representations are generated using MBG-ClinicalBERT as an encoder. For each entity mention, the entities with the highest cosine similarity in the knowledge base are identified. If the highest cosine similarity exceeds the threshold (0.8), the corresponding concept is linked to the mentioned term. If there are no entities with scores above the threshold, the mention is linked to the NIL entity.
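The linking rule just described (nearest knowledge-base entry by cosine similarity, accepted only above the 0.8 threshold, NIL otherwise) can be sketched with hand-picked toy vectors. This is our own illustration: the 3-dimensional vectors below are chosen so the geometry is obvious, whereas the real system uses MBG-ClinicalBERT embeddings.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def link(term_vec, kb_names, kb_vecs, threshold=0.8):
    # Best knowledge-base entry by cosine similarity; NIL below threshold.
    sims = [cosine(term_vec, v) for v in kb_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return kb_names[best] if sims[best] >= threshold else "NIL"

# Toy "embeddings": two orthogonal knowledge-base entries.
kb_names = ["atrial fibrillation", "retinopathy"]
kb_vecs = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]

# A mention nearly parallel to the first entry clears the threshold:
print(link([0.9, 0.1, 0.0], kb_names, kb_vecs))  # atrial fibrillation
# A mention orthogonal to both entries falls below it:
print(link([0.0, 0.0, 1.0], kb_names, kb_vecs))  # NIL
```

The threshold is what turns nearest-neighbor search into an abstaining linker: without it, every mention would be forced onto some concept, however dissimilar.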
Figure 4: Architecture of the entity recognition model using multi-task learning

Since each word can be represented by one or more tokens of the BERT tokenizer, we use a pooling strategy to combine the vector representations of the tokens making up the word. Devlin et al. [15] compare different pooling strategies such as mean, sum, or using the [CLS] token. It is possible to combine different layers of the BERT model to create the token representation: using the second-to-last layer, summing the last few layers, or concatenating the last few layers. In our paper, we use the sum of the last four layers, so that we use additional feature information from multiple layers without increasing the vector dimensions, which would increase the execution time and memory requirements. For each term in the knowledge base, a vector representation is generated using the MBG-ClinicalBERT encoder by summing the last 4 layers for each token and averaging the token representations to obtain the term representation. The vector representations are stored so that they can be easily used for cosine similarity search during the entity linking process.

5. Experiments for evaluation of the used methods

We conducted experiments to evaluate the performance of the implemented entity recognition and entity linking methods.

5.1. Evaluation of medical terms recognition

For the evaluation of the medical terms recognition methods, a small test dataset of 100 records is manually annotated with nested entities of the different categories. The test dataset is initially labeled using the rules-and-dictionaries approach and then manually reviewed and corrected. The automatic entity recognition using rules and dictionaries is used as a baseline. We compare the baseline and the trained MBG-ClinicalBERT-NER model using macro-F1. The F1 evaluation metric uses the exact match of named entities and is calculated as the harmonic mean of precision and recall.
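The exact-match F1 used here can be illustrated with plain sets: an entity counts as a true positive only if its span and category both match exactly. The paper uses the seqeval library on tagged sequences; the following is an equivalent set-based sketch of ours, not that library's code, and the example spans are invented.

```python
# Exact-match entity-level F1: a predicted entity is correct only when
# its (start, end, category) triple appears in the gold annotations.
def entity_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [(0, 1, "COMPLAINT"), (3, 5, "ORGAN"), (7, 8, "DIAGNOSIS")]
pred = [(0, 1, "COMPLAINT"), (3, 5, "ANATOMICAL SYSTEM")]
# One exact match out of 2 predictions (P = 0.5) and 3 gold entities
# (R = 1/3); a category mismatch on the same span earns no credit.
print(round(entity_f1(gold, pred), 2))  # 0.4
```

Macro-F1 then averages this per-category score over all categories, so rare categories weigh as much as frequent ones.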
Macro-F1 is used because the dataset is highly imbalanced; macro-F1 averages the F1 scores over the classes. The seqeval14 library is used for evaluation.

14 https://huggingface.co/metrics/seqeval

The results from the evaluation of the medical terms recognition for each level are shown in Figure 5. The annotation using rules and dictionaries does better on the test set than MBG-ClinicalBERT-NER on all levels except level 2. On average, the rules-and-dictionaries approach shows a slightly better F1 score of 75%, while MBG-ClinicalBERT-NER has a 73% F1 score. This could be explained by the fact that the test set contains only records in which the automatic annotation has found entities. Further experiments with different examples, in which the rules-and-dictionaries approach finds nothing, show that MBG-ClinicalBERT-NER successfully recognizes some entities. For example, in the sentence "Д. панкреатикус е недилатиран." ("ductus pancreaticus is not dilated"), the MBG-ClinicalBERT-NER model correctly recognizes "ductus pancreaticus" as an organ. A hybrid combination of the two approaches, using rules together with MBG-ClinicalBERT-NER, could recognize more entities and compensate for the limitations of the dictionaries.

Figure 5: Results of evaluation of the NER model MBG-ClinicalBERT-NER

5.2. Evaluation of medical terms linking

For the evaluation of the medical terms linking methods, we use the outpatient records dataset from Section 3.2.4, which was manually annotated with terms, categories, and WikiData concepts. We compare a direct string search in the knowledge base, as a baseline, with the developed approach using cosine similarity between vector representations. As evaluation metrics, we use token-based accuracy and F1 score. A token is correctly linked if the model predicts one of the concepts that is true for the token.
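The token-correctness rule above (a token counts as correct when the predicted concept is among the gold concepts for that token) can be sketched in a few lines. This is our own illustration; the concept identifiers are placeholders, not real WikiData entities.

```python
# Token-based accuracy: each token may have several acceptable gold
# concepts (a term can carry more than one code), and a prediction is
# correct if it hits any of them.
def token_accuracy(gold, pred):
    # gold: list of sets of acceptable concept ids, one set per token
    # pred: list of predicted concept ids, one per token
    correct = sum(p in g for g, p in zip(gold, pred))
    return correct / len(gold)

gold = [{"C1", "C2"}, {"C3"}, {"C4"}, {"NIL"}]
pred = ["C2", "C3", "C9", "NIL"]
print(token_accuracy(gold, pred))  # 0.75
```

Note that predicting NIL for a token whose gold label is NIL counts as correct, so the metric also rewards abstaining on unlinkable mentions.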
We calculate precision, recall, and F1 using the sklearn.metrics15 library for the multi-class classification task. The results from the evaluation of the entity linking method show that the MBG-ClinicalBERT model has an accuracy of 57% and an F1 score of 61%, while the baseline has an accuracy of 46% and an F1 score of 45%.

15 https://scikit-learn.org/stable/index.html

We perform several combined experiments with different approaches for entity recognition and linking. The success of the entity linking is limited by the number of correctly recognized terms. The experiments were performed using the outpatient records dataset; the results are shown in Table 3. The outpatient records dataset is quite different from the dataset used to train MBG-ClinicalBERT-NER, but the model shows a consistent F1 score of 73%, while the rules-and-dictionaries approach, which was built using data similar to the test set, shows an 87% F1 score. The entities that were not recognized limit the success of the entity linking, and thus the linking results are lower when using MBG-ClinicalBERT-NER: an F1 score of 53% compared to 61% when using the rules-and-dictionaries approach.

Table 3: Combined evaluation results for entity recognition and linking

Entity Recognition Model   Entity Linking Model              Recognition (F1)   Linking (F1)   Total (F1)
Rules and dictionaries     Direct string search              87%                45%            66%
Rules and dictionaries     Linking using MBG-ClinicalBERT    87%                61%            74%
MBG-ClinicalBERT-NER       Direct string search              73%                37%            55%
MBG-ClinicalBERT-NER       Linking using MBG-ClinicalBERT    73%                53%            63%

6. BGMedAnno user interface

Figure 6: Example of annotated clinical text in Bulgarian with medical terms in the BGMedAnno tool. The annotated text translates into English: "Long standing hypertension, frequent headaches, palpitations and easy fatigue. continuing complaints of tightening in the chest area, palpitations, nausea, proven prostate carcinoma.
redirected for hospitalization for chemotherapy." The tool recognized 2 diagnoses (highlighted in green), 2 anatomical organs (in orange), 2 symptoms (in cyan), and 4 complaints (in pink). For 6 out of 10 medical concepts, the corresponding WikiData entities are identified and links are provided.

The automatic annotation tool provides an API for clinical text annotation as well as a web user interface, which allows entering a text and visualizing the extracted entities, their categories, and links to the corresponding WikiData concept pages. The API returns the annotated results in JSON format and supports nested entities of different categories. The user interface sends the entered text to the API and visualizes the results in the browser. The user can also download the JSON results as a file for further processing. An example of the user interface with annotated clinical text in Bulgarian is shown in Figure 6.

7. Conclusion

In this paper, we presented BGMedAnno, a system for automatic annotation of clinical text. The system detects medical terms from the categories of symptoms, complaints, diagnoses, anatomical organs and systems, risk factors, and family history, as well as negation of symptoms, complaints, and diagnoses. The system connects the detected entities with the relevant concepts from WikiData, thus linking them to the concepts in UMLS, MeSH, and ICD-10. BGMedAnno detects nested entities and visualizes them.

The paper explores and compares different approaches to discovering and linking medical terms. Methods based on rules and dictionaries, as well as on a trained BERT language model, have been developed for the task of finding nested entities. The rule-based method shows an F1 score of 75%, while the trained BERT-based model achieves an F1 score of 73%. Although the BERT model performs slightly worse on the test set, observations show that it finds entities in sentences in which the rule-based method finds nothing.
For the entity linking task, the developed method based on the BERT language model shows an F1 score of 61%, significantly outperforming direct string comparison, which achieves an F1 score of 45%.

As future work, additional data can be collected and annotated to improve the results of the recognition model. It is possible to study hybrid models combining rule-based models and deep neural networks, as well as hierarchical multi-task models in which the results of lower-level tasks are used as input for subsequent levels. For the linking task, training a linking model on automatically annotated data, similar to the recognition approach, can be explored.

8. Acknowledgments

This research is funded by the Bulgarian Ministry of Education and Science, grant DO1-200/2018 "Electronic health care in Bulgaria" (e-Zdrave) and is also partially supported by Project UNITe BG05M2OP001-1.001-0004, funded by the OP "Science and Education for Smart Growth" and co-funded by the EU through the ESI Funds.

9. References

[1] M. Neves, J. Ševa, An extensive review of tools for manual annotation of documents, Briefings in Bioinformatics 22 (2021) 146–163.
[2] J. Li, A. Sun, J. Han, C. Li, A survey on deep learning for named entity recognition, IEEE Transactions on Knowledge and Data Engineering 34 (2020) 50–70.
[3] G. K. Savova, J. J. Masanz, P. V. Ogren, J. Zheng, S. Sohn, K. C. Kipper-Schuler, C. G. Chute, Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications, Journal of the American Medical Informatics Association 17 (2010) 507–513.
[4] L. Chen, Y. Gu, X. Ji, C. Lou, Z. Sun, H. Li, Y. Gao, Y. Huang, Clinical trial cohort selection based on multi-level rule-based natural language processing system, Journal of the American Medical Informatics Association 26 (2019) 1218–1226.
[5] A. Stubbs, M. Filannino, E. Soysal, S. Henry, Ö.
Uzuner, Cohort selection for clinical trials: n2c2 2018 shared task track 1, Journal of the American Medical Informatics Association 26 (2019) 1163–1171.
[6] G. Todorova, Information extraction from medical texts in Bulgarian, Master's thesis, Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Sofia, Bulgaria, 2020.
[7] D. Tcharaktchiev, Z. Angelov, S. Boytcheva, G. Angelova, Automatic generation of a national diabetes register from outpatient records, Math. Modeling 2 (2018) 163–166.
[8] S. Zhang, N. Elhadad, Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts, Journal of Biomedical Informatics 46 (2013) 1088–1098.
[9] P. McNamee, J. Mayfield, Entity extraction without language-specific resources, in: COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002), 2002.
[10] A. McCallum, W. Li, Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons (2003).
[11] B. Settles, Biomedical named entity recognition using conditional random fields and rich feature sets, in: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP), 2004, pp. 107–110.
[12] A. Magge, M. Scotch, G. Gonzalez-Hernandez, Clinical NER and relation extraction using bi-char-LSTMs and random forest classifiers, in: International Workshop on Medication and Adverse Drug Event Detection, PMLR, 2018, pp. 25–30.
[13] Y. Wu, M. Jiang, J. Xu, D. Zhi, H. Xu, Clinical named entity recognition using deep learning models, in: AMIA Annual Symposium Proceedings, volume 2017, American Medical Informatics Association, 2017, p. 1812.
[14] R. Vunikili, H. Supriya, V. G. Marica, O. Farri, Clinical NER using Spanish BERT embeddings, in: IberLEF@SEPLN, 2020, pp. 505–511.
[15] J. Devlin, M.-W. Chang, K. Lee, K.
Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[16] J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, Spanish pre-trained BERT model and evaluation data, PML4DC at ICLR 2020 (2020).
[17] S. Boytcheva, B. Velichkov, G. Velchev, I. Koychev, Automatic generation of annotated corpora of diagnoses with ICD-10 codes based on open data and linked open data, in: FedCSIS 2020, IEEE, 2020, pp. 163–167.
[18] B. Velichkov, S. Vassileva, S. Gerginov, B. Kraychev, I. Ivanov, P. Ivanov, I. Koychev, S. Boytcheva, Comparative analysis of fine-tuned deep learning language models for ICD-10 classification task for Bulgarian language, in: RANLP 2021, INCOMA Ltd., Held Online, 2021, pp. 1448–1454. URL: https://aclanthology.org/2021.ranlp-1.162.
[19] X. Liu, P. He, W. Chen, J. Gao, Multi-task deep neural networks for natural language understanding, CoRR abs/1901.11504 (2019). arXiv:1901.11504.