Features of designing and implementing an information system for
studying and determining the level of foreign language proficiency

Taras Basyuk (a), Andrii Vasyliuk (a), Vasyl Lytvyn (a), Olha Vlasenko (b)
    (a) Lviv Polytechnic National University, Bandera str. 12, Lviv, 79013, Ukraine
    (b) Osnabrück University, Institute of Education Science, Heger-Tor-Wall 9, 49074 Osnabrück, Germany

                                  This article analyzes existing methods, international certifications and known systems that
                                  provide tools for learning foreign languages, and describes the mechanisms for assessing these
                                  skills; the analysis revealed the main shortcomings of existing approaches and showed the
                                  relevance of the study. The main requirements for systems of this class are identified and
                                  presented as work scenarios, using the example of the Learning Materials module, which is
                                  available to users with the roles "Teacher" and "Student". The subject area is described
                                  mathematically using the algebra of algorithms, which made it possible to minimize the number
                                  of uniterms in the created models. The presented models give a complete picture of the
                                  features of the system. The software system was designed using an object-oriented approach,
                                  and the resulting diagrams follow UML notation. The study presents use case and activity
                                  diagrams, which simplify understanding of the features of the information system for learning
                                  foreign languages and determining the level of proficiency. The result of the study is a system
                                  designed and implemented in the Go (Golang) programming language. Natural language
                                  processing is implemented in a separate module that provides tokenization and parsing,
                                  lemmatization/stemming, part-of-speech tagging and identification of semantic relations. The
                                  created software product works in prototype mode and implements the described functionality.

                                  Keywords
                                  foreign language, learning, skills assessment, algebra of algorithms, information system

1. Introduction

    The present day can be called the era of computer science and telecommunications. Thanks to the
telecommunication infrastructure, it is possible to create systems for general information exchange and
mass continuous self-education, regardless of time and space constraints. In the 21st century, distance
learning has become one of the most effective systems for training specialists and continuously
maintaining their high qualification [1,2]. Interactive cooperation between teachers and students via
information and communication networks, above all the Internet, is very promising today.
    One of the most common areas of distance learning is the study of foreign languages, as this
knowledge is key to success in today's world, where processing and exchanging vast amounts of
information is becoming increasingly important. Owing to the globalization of the labor market,
knowledge of a foreign language significantly increases the professional competitiveness of
specialists in various fields and promotes business contacts in all spheres of life [3,4]. That is why
knowledge of a foreign language is a necessity for a person's professional activity and successful career.

   Creating learning conditions and technologies aimed at personal development and professional
competence is the main goal in the study of a foreign language by future professionals in various fields.
Both in research and in practice, the access to sources of information that knowledge of a foreign
language provides is essential. However, the assessment of foreign language skills has some features
that need to be considered, in particular the use of the Common European Framework of Reference for
Languages (CEFR), which is used to define and describe language levels [5]. Given these features, the
urgent task is to design and implement an information system for studying and determining the level of
foreign language proficiency.

1.1 Analysis of recent research and publications

1.1.1 Known methods of study
    The analysis showed that a significant number of approaches to learning a foreign language have
been developed to date. Each of them is considered in detail below.
    Grammar-translation method. This method is based on the traditional approach to teaching Greek
and Latin, which aimed to teach students to read and translate literary works, while the communicative
component as such was not used. It is the oldest of the known methods of learning foreign languages,
but it has many disadvantages, including neglect of the oral aspects of communication.
    Audiolingual method. In this method, students use only the target language in the learning process.
New words and grammar are explained orally in the foreign language. Grammar is acquired automatically
through practice of the correct forms until the learner has mastered it intuitively. The main
disadvantages of this method include the passive role of students and the fact that students master only
a memorized set of phrases [7,8].
    Situational language learning. This method belongs to the functional group and is an oral approach
developed by British linguists. Its main principle is the study of vocabulary and the practice of reading.
The learning process is divided into three stages: acquisition of knowledge, memorization through
repetition, and use of this knowledge in practice. Language material is presented first orally and then
in writing, because it is learned more effectively this way [9].
    Communicative language learning. This is one of the modern standards of language learning. Its
main principle is to focus on the communicative aspect rather than on the study of grammar: the main
thing is that the student can convey messages in a foreign language despite grammatical errors.
Communicative language learning has several features that distinguish it from the previous methods [10]:
understanding is achieved through active use of the foreign language, and learning relies on authentic texts.
    Immersion method. Many consider it the most successful way to learn a second language, since the
language being studied is used constantly. Learning takes place mainly through communication.
Immersion can be complete or partial; complete immersion in the early stages of language learning is
considered ineffective and even stressful. After learning the language by this method, students often
feel confident in communication [11], and their speech resembles that of native speakers.
    The method of storytelling. According to this method, classes combine reading and storytelling to
help students better understand a foreign language. The method works in three stages: in the first stage,
new vocabulary structures are studied using a combination of translation, gestures and personalized
questions; in the second, these structures are used to retell stories; and in the third, the same structures
are used during reading. Before introducing new material, the teacher should make sure that students
have mastered and actively use what they learned in previous lessons. No more than three phrases are
studied in each lesson, which allows students to understand and remember them well [12].
    Method of physical reaction (Total Physical Response). The method, developed by an American
psychologist, holds that a person cannot learn anything without "passing it through" themselves.
According to this method, the learner is initially a passive performer: during the first twenty classes the
learner does not speak but only perceives what is heard. Once minimal language experience has been
gained, the learner can respond to what is heard, but only through action. For example, when learning
the phrase "get up", everyone gets up while pronouncing it. In this way a person literally experiences
these words physically.
    The method of quiet learning (the Silent Way). According to this method, students must learn
independently, and the teacher should only help them develop responsibility and autonomy. The teacher
remains silent for most of each class. Learning in silence, as opposed to repetition after the teacher,
becomes a technique that promotes mental activity and concentration on the task [11]. This method is
difficult to use at the initial stage, as it requires a high level of student interest and intrinsic
motivation, which is not always present, especially at a young age.
    In general, this list of methods for learning a foreign language is not exhaustive. The ideal method
has not yet been invented, and it can hardly exist at all, because everyone has a different mindset and
different skills; the best results in learning a foreign language can therefore be achieved only with an
individual approach. Our software system is intended to take all these peculiarities and individual
requests into account.

1.1.2 International certifications analysis

    Knowledge of a foreign language increases the professional competitiveness of specialists in
various fields and facilitates the establishment of business contacts in all spheres of life. According to
research, the biggest incentive to learn foreign languages is career growth (about 51% of respondents);
23% of respondents learn for an internship or study abroad, 12% to make leisure more comfortable and
enjoyable, 7% because they travel abroad on business, 4% to participate in international conferences,
and only 3% because they work with foreign partners. These statistics show that the vast majority of
answers are directly related to work and career advancement. The conclusion is that for most people
learning a foreign language is not an end in itself but a means of achieving a broader goal. To simplify
the assessment of candidates' language proficiency, the Common European Framework of Reference for
Languages (CEFR) was created as a system for assessing and describing language levels [5]. The
framework distinguishes general competences (knowledge, skills and existential competence) from
communicative language competences: linguistic, sociolinguistic and pragmatic. General and
communicative competences are developed by producing or understanding texts in different contexts
and under different conditions. These contexts correspond to different areas of social life, called
domains. There are four main domains: occupational, public, educational and personal. A user can
reach different levels of proficiency in each of these domains, and to describe them the framework
provides a set of common reference levels: A - Basic User (A1 - Breakthrough, A2 - Waystage),
B - Independent User (B1 - Threshold, B2 - Vantage), C - Proficient User (C1 - Effective Operational
Proficiency, C2 - Mastery). Each level corresponds to a defined set of skills that the user must have.
There are various test systems for language proficiency certification.
    The International English Language Testing System (IELTS) is one of the most famous testing
systems, jointly supported by the University of Cambridge and the British Council. There are two
versions of the test [13]: Academic (designed for admission to universities and other higher education
institutions, as well as for professionals from different fields who want to study or practice in an
English-speaking country) and General Training (designed for extracurricular or gaining experience in one or another area, or
    IELTS has the following features [14,15]:
    •    variety of accents and writing styles presented in text materials to minimize possible linguistic
    •    English language proficiency is tested in four forms: listening, reading, writing and speaking;
    •    use of separate grades for each skill;
    •    speaking module - a key component of IELTS. It is conducted in the form of an individual
    interview with the examiner, who evaluates the candidate, taking into account his methods of
    The Test of English as a Foreign Language (TOEFL) is a standardized test of the English proficiency
of non-native speakers who wish to enter higher education institutions in the United States and Canada
[16]. There are several versions of the test: the paper version (Paper-Based Test), the computer version
(Computer-Based Test), the Internet version (Internet-Based Test) and the online version that can be
taken at home (iBT Home Edition). Currently, TOEFL iBT is the only version recognized by many
universities around the world, as it includes the most complete and relevant tasks not only in reading,
listening and writing, but also in speaking, as well as integrated tasks.
    One of the disadvantages of TOEFL is that the certificate is valid for only two years, and some
universities require the test to have been taken no more than 18 months earlier. IELTS exams in Ukraine
are administered by the British Council, which makes it impossible to use IELTS for remote certification
of knowledge; this is inconvenient under the current pandemic and wartime conditions.
    Given the peculiarities of these international certifications, the system under development needs to
combine the advantages of both, namely free remote certification together with recommendations for
improving language proficiency to the required level.

1.1.3 Known systems analysis

    The main problem in learning a foreign language is that, although there are many analogous systems,
the most effective comprehensive solution has not yet been found. In addition, available research shows
that the best way to study is one that combines auditory and visual attention, which confirms the
relevance of using multimedia. The known analogous systems are analyzed below.
    •     Awabe is a free application that helps users learn more than 4,000 common words and phrases.
    The application works offline and offers translation, audio and video tutorials for learning the
    language. A distinctive feature of the system is daily tests that improve communication skills.
    •     Hello English uses interactive games in English lessons [18]. While learning, the user earns
    coins that unlock subsequent tasks. The application also offers popular new audiobooks and other
    up-to-date materials to help users further improve their language.
    •     Duolingo is a popular application for learning a foreign language from scratch. Duolingo uses
    interactive games to help the user learn many different languages. For beginners, the program
    focuses on learning verbs, phrases and sentences [19]; advanced users can also improve their
    knowledge by taking courses in writing, language and vocabulary.
    •     Lingbe is a community where people help each other learn foreign languages: learners are
    connected with native speakers of the target language to gain practical communication experience.
    Although many applications like Lingbe connect learners with native speakers, Lingbe stands out for
    its extremely simple user interface and usage rules: the user chooses a native language and the
    language to be studied, and then provides information about their interests. This information is not
    visible to other users and serves only to select suitable native speakers as conversation partners [20].
    •     The Sounding Out Machine is very useful for users who have difficulty understanding and
    pronouncing foreign language content. It plays back complex words and phrases syllable by syllable.
    The system is useful in both individual and group learning, as it helps identify "problematic
    phrases" and practice pronouncing them, with the recordings saved to a storage medium [3].
    •     Google Translate is a Google service that automatically translates words, phrases and web
    pages from one language to another. It was originally built on Google's own statistical machine
    translation software (the service has since moved to neural machine translation), which is based on
    comparing large volumes of language pairs. Language pairs are texts containing sentences in one
    language and the corresponding sentences in another; they can be written by a person who is a
    native speaker of both languages, or be a set of sentences with translations made by a third party.
    Statistical machine translation therefore has the property of "self-learning": the more language
    pairs are available and the more accurately they correspond to each other, the better the result [21].
    The analysis showed that many features need to be taken into account when creating such a system:
simplicity of the interface, the way information is perceived, availability of information, etc. Given
this, the urgent task is to create an information system for learning and determining the level of foreign
language proficiency.

1.2 The main study objectives and their significance

    The purpose of the study is to design and implement a system for studying and determining the level
of foreign language proficiency. The system will provide tools for learning foreign languages depending
on the user's initial level, learning goal and need for skills certification, which will increase the user's
competitiveness in the labor market.
    To achieve this goal, the following main tasks need to be solved: analyze existing approaches,
methods and software tools used in learning and assessing foreign language skills; identify the main
tasks that arise; present system scenarios for users with different access levels; give a mathematical
description of the subject area using the algebra of algorithms; design the software system using an
object-oriented approach and present the created diagrams in UML notation; develop a software system
for studying and determining the level of foreign language proficiency.
    The results of the research solve the current scientific and practical problem of creating a
methodological approach and software for learning foreign languages.

2. Major research results

    At the initial stage of creating the system, it was necessary to determine the basic requirements it
should meet. One way to do this is to present the requirements as scenarios [22]. Scenarios are a
mechanism for identifying user needs that keeps the focus on the goals stakeholders want to achieve.
Each scenario is a set of consecutive results (or achieved states) obtained in chronological order. This
chronological sequence allows the possible behaviors of the system to be reviewed repeatedly and lets
stakeholders test the functionality step by step. In the process of forming the work scenarios, the
"Learning Materials" subsystem was identified; it is presented to users with the roles "Teacher" and
"Student".

Figure 1: Model of the subsystem "Learning Materials" for the user with the role of "Teacher"

    The "Learning Materials" subsystem is designed to enter and update a database of teaching materials,
manuals, video and audio files and web resources, and to provide users of different roles with access
to these data. The input data for the subsystem are the search criteria (for example, type or level of
complexity of tasks, choice of the ultimate goal, etc.) and the keys for accessing the selected training
materials. After processing the input data, the system gives the user access to all the necessary
training material.
Figure 2: Model of the subsystem "Learning Materials" for the user with the role of "Student"

   The subsystem performs the following functions:
   •     updating the database of educational materials: adding new materials and editing existing ones;
   •     displaying the necessary educational materials according to the specified criteria (course, level
   of complexity, type of tasks, etc.);
   •     searching for available educational materials according to the specified criteria;
   •     checking the keys for access to the relevant training materials;
   •     online testing and storage of the results.
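   The search and key-checking functions listed above can be illustrated with a minimal Go sketch. It
assumes an in-memory slice instead of the real database; the type names Material and Criteria, their
fields and the key values are hypothetical and are not taken from the authors' implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Material is a hypothetical record in the "Learning Materials" database.
type Material struct {
	ID        int
	Title     string
	Course    string
	Level     string // CEFR level, e.g. "B1"
	Type      string // e.g. "video", "audio", "text", "podcast"
	AccessKey string // key that unlocks the item; empty means open access
}

// Criteria holds optional search filters; an empty field matches anything.
type Criteria struct {
	Course, Level, Type string
}

// matches reports whether m satisfies every non-empty field of c (case-insensitive).
func (c Criteria) matches(m Material) bool {
	return (c.Course == "" || strings.EqualFold(c.Course, m.Course)) &&
		(c.Level == "" || strings.EqualFold(c.Level, m.Level)) &&
		(c.Type == "" || strings.EqualFold(c.Type, m.Type))
}

// Search returns the materials that match the criteria and that the user may open:
// open items are always returned, locked items only if the user holds their key.
func Search(db []Material, c Criteria, keys map[string]bool) []Material {
	var out []Material
	for _, m := range db {
		if !c.matches(m) {
			continue
		}
		if m.AccessKey != "" && !keys[m.AccessKey] { // key check
			continue
		}
		out = append(out, m)
	}
	return out
}

func main() {
	db := []Material{
		{1, "Present Perfect explained", "Grammar", "B1", "video", ""},
		{2, "Business e-mails", "Business English", "B2", "text", "biz-key"},
		{3, "IT slang podcast", "IT English", "B2", "podcast", "it-key"},
	}
	for _, m := range Search(db, Criteria{Level: "B2"}, map[string]bool{"biz-key": true}) {
		fmt.Println(m.ID, m.Title) // 2 Business e-mails
	}
}
```

Keeping the filters optional means the same function serves both browsing (no criteria) and targeted
search, while the key check stays in one place.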
   Further research was aimed at building models of system operation using the apparatus of the algebra
of algorithms. The first stage in applying the algebra of algorithms was the description of uniterms and
the synthesis of sequences [23], given below.
   Formed uniterms: L - uniterm of entering the system; R - uniterm of creating the user's cabinet; N -
uniterm of registering a new user; T - uniterm of testing to determine the language level; P - uniterm of
selecting the section that needs to be improved; M - uniterm of forming educational materials for the
selected section; A - uniterm of listening comprehension and testing of language skills; C - uniterm of
describing wishes for possible skill improvement; Rec - uniterm of forming a list of recommendations on
ways to improve the current level; u1 - condition checking for missing authentication data; u2 - condition
checking compliance with the established requirements; *L - uniterm of exiting the cycle after checking
the availability of authentication data; *M - uniterm of exiting the cycle after checking compliance with
the established requirements. As a result of applying the algebra of algorithms apparatus, the following
sequences and eliminations are synthesized:
   S1 - sequence of system operation when authentication data are present and the established
requirements are not met:

   S2 - sequence of system operation when authentication data are present and the established
requirements are met:

   S3 - system operation sequence in case of no authentication data:

   L1 - verification of compliance with the established requirements:
   L2 - authentication data check:

   The next step is substituting the corresponding sequences and eliminating L2. Using the properties of
the algebra of algorithms [23], we take the common uniterms outside the sign of the elimination operation
and obtain the following formula of the algebra of algorithms:

   The next stage of the study was designing the system using an object-oriented approach [24, 25].
First, a use case diagram was created in the Visual Paradigm software environment; it is shown in Fig. 3.

Figure 3: Use case diagram
    The main actors in the diagram are the following. Student is a user of the system who has access to
the course materials and can perform certain activities; it is enough for the user to register, create a
personal cabinet with individual wishes for the course, and take a test to establish the initial level of
language proficiency. Teacher creates and edits the necessary initial attributes, provides technical
support and supplies training materials. After training, the student's knowledge is tested with the formation of
    The activity diagram of the designed system is shown in Fig. 4. From the user's point of view,
registering and passing a language proficiency test are mandatory. The platform provides feedback that
allows the user to leave comments and interact with the teacher online. Training takes place through the
interactive features of the platform: graphic and multimedia presentation of educational material.

Figure 4: User activity diagram

    The teacher mainly performs mentoring and teaching functions. The main operations are entrusted
to the system, which tests user knowledge in various forms: reading, listening and writing. For data
extraction and natural language understanding, it is proposed to use NLP methods that combine
statistical, machine learning and deep learning models with computational linguistics, i.e. rule-based
modeling of human language [26,27]. These technologies allow computers to process human speech in
the form of text or voice data and "understand" its full meaning, including the speaker's sentiment.
With the help of NLP, the designed system evaluates the following communication skills:
    •    the ability to carry out conditioned communication;
    •    the ability to understand authentic texts by listening;
    •    the ability to read and comprehend authentic texts of different genres and types with different
    levels of understanding of the content, treating them as a source of various information and as a
    means of mastering it;
    •    the ability to communicate in writing in accordance with the tasks.

    As for the architecture, after analyzing the market of such systems [28], it was decided to implement
the information system on a three-tier architecture. A three-tier architecture is a client-server
architecture in which the functional process logic, data access, computer data storage and user interface
are developed and maintained as independent modules on separate platforms. The three tiers are the
following [29,30]: the presentation tier (displays information related to the services available on the
website), the application tier (controls the system's behavior and implements its main functionality) and
the data tier (contains a database server where information is stored and processed). Taking these
features into account, the web server is implemented in Go (Golang), which provides built-in tools for
concurrent computing and for importing packages directly from remote repositories.
    A Petri net was used as a tool for modeling the sequence of system actions [31]. In general, a simple
labeled Petri net can be represented as MN = {S, T, F}, where S = {S1, S2, …, Sn} is the set of positions
(places) of the net, T = {t1, t2, …, tm} is the set of transitions, and F ⊆ (S × T) ∪ (T × S) is the set of
arcs. The Petri net for the designed system is shown in Figure 5.

Figure 5: Petri net illustrating the system operation

   The purpose of each position and transition is given in Tables 1 and 2.

Table 1
Petri net positions
               Position      Purpose
               S1            Making initial settings
               S2            Entering authentication data
               S3            Waiting for user actions
               S4            Granting access
               S5            Conducting initial testing
               S6            Displaying the results
               S7            Improving the level of grammar
               S8            Improving writing skills
               S9            Learning IT slang
               S10           Studying the peculiarities of business communication
               S11           Changing the service
               S12           Forming recommendations

Table 2
Petri net transitions
               Transition    Purpose
               t1            Start the system
               t2            Start processing authentication data
               t3            Complete authentication data processing
               t4            Enter the personal account
               t5            Complete the testing process
               t6            Select the type of service
               t7            Save the results to the user profile
               t8            Shut down the system
   The next stage was the construction of a system for studying and determining the level of foreign
language proficiency. The developed system has a simple and intuitive interface implemented in
accordance with SEO requirements [32]. When the user first visits the system page (Fig. 6), only the
forum and news are accessible. To start training, the user must register or, if already registered, log in
with a login and password. The reference information in the "About" section is also available.

Figure 6: Home page of the system

   At the first login, a user profile containing the user's personal data is created. This profile is later
supplemented with data on the course, exercises, testing and learning progress. At the first login, the
user must also pass a test to determine their foreign language level (Fig. 7).

Figure 7: System page after user registration
    The test is implemented by a separate module, which is responsible for establishing the level of
language proficiency and testing language skills [33]. The module is based on Natural Language
Processing methods [27]. For the practical implementation of NLP concepts, the library
github.com/james-bowman/nlp was used, which implements selected machine learning algorithms for
natural language processing in Go [34,35]. The main features of the library used to implement the
module that checks the tasks performed while studying foreign languages include: Latent Semantic
Analysis (also known as Latent Semantic Indexing) implemented using truncated Singular Value
Decomposition for dimensionality reduction; fast comparison and retrieval of semantically similar
documents using the SimHash algorithm; Random Indexing and Reflective Random Indexing; Latent
Dirichlet Allocation (LDA) using a parallel implementation of the fast SCVB0 (Stochastic Collapsed
Variational Bayesian inference) algorithm for unsupervised topic extraction. An example of the module
using the library to analyze and correct input text is shown in Fig. 8.
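   To illustrate the SimHash idea mentioned above, here is a standard-library Go sketch of the algorithm
itself; it does not use or reproduce the james-bowman/nlp API. Each token's 64-bit FNV-1a hash votes on
every bit of the fingerprint, and the Hamming distance between two fingerprints then serves as a cheap
similarity estimate: texts with largely overlapping tokens tend to differ in few bits. The tokenizer
(splitting on non-alphanumeric characters) is a simplification, and any distance threshold for accepting
a learner's answer would be an assumption to calibrate.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
	"unicode"
)

// tokens lower-cases text and splits it on every rune that is not a letter or digit.
func tokens(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// SimHash returns a 64-bit fingerprint of text. Every token occurrence adds +1 to
// the tally of each bit set in its FNV-1a hash and -1 to each bit not set; the
// fingerprint keeps a 1 wherever the tally is positive.
func SimHash(text string) uint64 {
	var tally [64]int
	for _, tok := range tokens(text) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		v := h.Sum64()
		for b := 0; b < 64; b++ {
			if v&(1<<uint(b)) != 0 {
				tally[b]++
			} else {
				tally[b]--
			}
		}
	}
	var fp uint64
	for b, t := range tally {
		if t > 0 {
			fp |= 1 << uint(b)
		}
	}
	return fp
}

// Distance is the Hamming distance between two fingerprints (0 means identical).
func Distance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	ref := SimHash("She has lived in Lviv for five years.")
	answers := []string{
		"She has lived in Lviv for five years",
		"She have lived in Lviv for five year.",
		"The weather is nice today.",
	}
	for _, a := range answers {
		fmt.Printf("%2d bits  %q\n", Distance(ref, SimHash(a)), a)
	}
}
```

Because a fingerprint is a single integer, reference answers can be fingerprinted once and compared
against every submission with one XOR and a popcount.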

Figure 8: Module for text analysis

   After passing the English language test, the user can start learning and choose the material to work
with: videos, audio recordings, podcasts or authentic texts (Fig. 9). After studying the material, the
user must complete training exercises and tests. The user can view information about the completed
material, as well as the results of all tests, on the profile page.

Figure 9: System page after passing the test
   It should also be noted that all training materials displayed to the user are adapted to the user's skill
level based on the test results and progress in completing tasks. For questions about using the system
or to request online help, the user can open the system's Help section, which contains detailed
instructions, tips on using the system more effectively, and a way to contact the teacher. A distinctive
feature of the created system is that it lets users select the skills they want to improve (Fig. 10).

Figure 10: Page for choosing skills to improve

   For example, after selecting the grammar section, the user gets access to video and audio materials,
as well as texts and podcasts on grammar topics. This allows the user to focus on specific gaps in their
knowledge.

3. Conclusion

   As a result of the study, existing methods, international certifications and known systems that
provide tools for learning foreign languages were analyzed, together with the mechanisms they use to
assess these skills. The analysis showed that, although many software systems exist today, they all have
certain shortcomings, ranging from paid access to limited functionality, which makes designing and
implementing a system for learning and determining the level of foreign language proficiency an urgent
task. To determine the basic requirements for systems of this class, they were presented as scenarios
using the example of the Learning Materials module, which is available to users with the roles
"Teacher" and "Student". The next stage of the work was a mathematical description of the subject area
using the apparatus of the algebra of algorithms, which provided the necessary basis for the
mathematical software. The software system was designed using an object-oriented approach by
constructing a set of UML diagrams, two of which (the use case and activity diagrams) are presented in
the article. Based on the study, the system was designed and implemented in the Go programming
language. The created software product works in prototype mode and implements the described
functionality.
   Further research will focus on creating a mobile version of the system, resolving conflicts and
expanding the functionality in accordance with the defined requirements.

