Remote Host Operating System Type Detection Based on a Machine Learning Approach

Leonid Kupershtein, Tatiana Martyniuk, Olesia Voitovych and Artur Borusevych
Vinnytsia National Technical University, Khmelnytske shose str., 95, Vinnytsia, 21021, Ukraine

Abstract
This article presents the results of research on using machine learning to solve the problem of remote host operating system detection. Existing methods and tools for remote host operating system detection are analyzed, and the main advantages and disadvantages of their use are identified. Machine learning methods are modeled, the software architecture is designed, and an experimental application is developed. The application uses a trained machine learning model that detects the type and version of the operating system with high accuracy.

Keywords
Operating system detection, machine learning, computer networks, network protocol, scanning.

1. Introduction

Today, the number of devices in computer networks grows every day: routers, printers, IP phones, smart things, personal computers, laptops, smartphones, etc. However, not all network devices run the latest version of the operating system (OS) and the updates related to its security. The reasons may be: a lack of necessary funding or of the hardware required for the new OS version to work properly; unwillingness of device users to master a new interface or capabilities; or lack of support for the software used in the new OS version. The need for constant OS updating is driven by the ever-increasing number of identified vulnerabilities [1, 2]. Exploiting these vulnerabilities could lead to breaches of the confidentiality, integrity, and availability of data and of other software, such as web services [3, 4]. OS vulnerabilities also open the possibility of unauthorized access to database-oriented applications, which in turn require additional protection [5].
Network administrators must be ready for possible attacks. This requires constant network monitoring to detect unauthorized devices or devices running an old and/or vulnerable version of the operating system. Penetration testing specialists and ethical hackers need to gather as much information about the target as possible in the initial stages of an authorized attack. This is necessary to form the most effective attack vector for identifying potential vulnerabilities in the protection system of the researched infrastructure [6]. Knowledge of the family, type, and version of the operating system installed on network hosts helps here, since each OS is associated with certain vulnerabilities in its software [1]. Currently, a significant number of software tools for operating system detection exist, but they determine the OS family only with some probability, let alone the type and version. Therefore, it is important to research and develop methods and tools that determine detailed information about the remote host operating system with high reliability; this will increase the efficiency of identifying vulnerabilities and, consequently, the level of cyber security in general.

II International Scientific Symposium «Intelligent Solutions» IntSol-2021, September 28–30, 2021, Kyiv-Uzhhorod, Ukraine
EMAIL: kupershtein.lm@gmail.com (L. Kupershtein); martyniuk.t.b@gmail.com (T. Martyniuk); voytovych.op@gmail.com (O. Voitovych); borusevych.av@gmail.com (A. Borusevych)
ORCID: 0000-0003-4712-3916 (L. Kupershtein); 0000-0003-3811-6183 (T. Martyniuk); 0000-0001-8964-7000 (O. Voitovych)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2. Methods and tools for operating system detection

There are two main approaches to detecting a remote host operating system: active and passive. Active methods are based on sending specially built service packets to the target machine [7]; after analyzing the received answers, a conclusion about the target node's operating system is formed. The advantages of this approach are:
• speed – since packets are sent to the target node directly, a response is obtained quickly, without waiting for the necessary packets to appear in the network;
• simplicity – usually it is enough to compare the received answers with a database of signatures, without analyzing parameters or their combinations;
• flexibility – since the packets are formed manually, their contents can be adjusted and new ones added as needed.
However, there are also disadvantages:
• visibility – packets are sent over the network, so they can be detected and appropriate countermeasures applied;
• signature database dependence – if a record is absent from the database, detection is wrong or yields no answer at all;
• dependence on the node's response – if no response from the target node is received, it is impossible to detect the operating system.
Passive methods of operating system detection are based on listening to network traffic, collecting transmitted packets, and then analyzing their contents to form a conclusion about the remote host OS [8]. The advantages of this approach are:
• invisibility – listening generates no activity on the listening network device, or the activity is so low that it can be perceived as normal traffic during the business day;
• no need to receive a response from the target node – traffic from the target node to other devices can be analyzed as well.
The disadvantages of this approach are:
• speed – it is necessary to wait for certain packets to appear in order to form a conclusion, which takes a long time if there is no network activity;
• implementation complexity – since it is not possible to send self-generated packets, only the information from intercepted packets can be used.
In conclusion, the appropriate approach should be chosen depending on whether the scan must be performed imperceptibly or the result is needed quickly.
Existing tools for remote node operating system detection are also considered. Currently, a small number of software products perform the task of detecting a remote node's operating system; often this task is not their main function, but only one of the menu options. The most common tools that use the active detection method are Nmap, NetScanTools Pro and Xprobe.
Nmap is free, open source software designed for network scanning and security auditing. The program also detects available nodes in the network, active network services, types of firewalls, etc. [9]. Nmap uses an active method to detect the operating system: it creates a "fingerprint" by sending TCP, UDP and ICMP packets to known potentially open and closed ports and analyzes the responses. As a result, a conclusion is formed that indicates the type of the node's operating system and the reliability of this conclusion. If there is no complete match with a signature, a score is computed (each parameter has a corresponding weight in points). The tool cannot detect the operating system passively; it is designed specifically for active analysis [7, 10].
NetScanTools Pro uses responses to ICMP packets to detect the operating system. This is the main disadvantage of the tool: using only ICMP packets does not allow obtaining a reliable answer [11].
Xprobe is a software tool that relies on fuzzy signature matching, probabilistic assumptions, and analysis of multiple matches in the signature database simultaneously [12].
The most common tools that use the passive detection method are p0f, Satori, NetworkMiner, PRADS and Ettercap. They all have a similar principle of operation, namely the analysis of incoming traffic based on a signature database [13-17]. The p0f tool is worth noting: it uses many complex, purely passive mechanisms to detect the node operating system from any random connections, and it can detect node operating systems in networks where Nmap packets would trigger the security system. The Ettercap tool is also quite interesting: it is designed to implement man-in-the-middle attacks, while also offering passive operating system detection.
The comparative description of the software for host operating system detection is presented in Table 1.

Table 1
OS detection software characteristics

Software         | Method  | Protocols            | Last update                                 | OS family
Nmap             | Active  | TCP, UDP, ICMP       | 23-04-2021                                  | FreeBSD, iOS, Mac OSX, OpenSolaris, Linux, Windows
p0f              | Passive | TCP, HTTP            | 18-04-2016                                  | FreeBSD, iOS, Mac OSX, OpenSolaris, Linux, Windows
NetScanTools Pro | Active  | ICMP                 | 02-09-2020                                  | -
Xprobe           | Active  | ICMP, TCP            | 27-07-2005                                  | FreeBSD, Mac OSX, Linux, Windows
Satori           | Passive | DHCP, TCP, HTTP, SMB | 04-05-2021                                  | FreeBSD, iOS, Mac OSX, OpenSolaris, Linux, Windows
NetworkMiner     | Passive | TCP, HTTP            | 23-09-2020                                  | FreeBSD, iOS, Mac OSX, OpenSolaris, Linux, Windows
PRADS            | Passive | TCP, UDP, DHCP, ICMP | 19-09-2020 (app), 16-02-2010 (DB signature) | FreeBSD, iOS, Mac OSX, OpenSolaris, Linux, Windows
Ettercap         | Passive | TCP                  | 06-01-2021                                  | FreeBSD, iOS, Mac OSX, OpenSolaris, Linux, Windows

Wireshark can also be used for passive detection [18]. In this case, certain fields must be analyzed manually: TTL, User-Agent, etc.
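Such manual field checks can be scripted. A minimal illustrative sketch is shown below; the thresholds and rules are simplified assumptions for demonstration (based on common default initial TTLs), not an actual fingerprint database:

```python
# Illustrative passive check on two packet fields: IP TTL and HTTP User-Agent.
# The rules are simplified assumptions, not a real signature database.
def guess_os(ttl: int, user_agent: str = "") -> str:
    if "Windows NT 10.0" in user_agent and ttl > 64:
        return "Windows 10"
    # Typical default initial TTLs: ~64 for Linux/macOS, ~128 for Windows.
    # Observed TTL is lower than the initial value by the hop count.
    if ttl <= 64:
        return "Linux/macOS family"
    if ttl <= 128:
        return "Windows family"
    return "Unknown (possibly a network device)"

print(guess_os(128, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # Windows 10
print(guess_os(53))  # Linux/macOS family (TTL 64 decremented over 11 hops)
```

Real passive tools such as p0f apply far richer rule sets over many header fields, but the principle is the same: match observed field values against known per-OS defaults.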
For example, if the TTL value is 128 and the User-Agent parameter contains the value "Windows NT 10.0", one can conclude that the device runs the Windows 10 operating system [9, 19]. However, in this case it is necessary to have a database that maps packet content values to operating system types [20].
There are also tools that can use both active and passive operating system detection methods, for example SinFP [21] and queso [22]. However, support for these tools is currently discontinued and their download pages are unavailable.

3. Related works

In [23] only mobile operating systems were analyzed: Android v2.3, Android v4.4, iOS 5, iOS 8, Symbian 3 and Windows Phone 7.5. The dataset size is 489 GB of data gathered over several months. The traffic used to identify the OS was captured while watching videos on YouTube, downloading files, making video calls on Skype, as well as combined traffic; the combined traffic included all of the described actions performed on OSs that support multitasking. The results show a detection accuracy of around 70% when analyzing traffic for 30 seconds, around 90% when analyzing traffic for 5 minutes, and 100% when using combined traffic for a 30-second period.
In [24] only TCP SYN packets were captured for analysis; there is no description of how the packets were gathered. The developed system searches for a signature in a signature database. The experiments gave the following results: an accuracy of 86.3% when finding an exact match, with an additional 9.2% detected correctly when using minimal distance matching; the type I error is 4.5%. The system classifies the OS into one of three classes: Windows 7 or 8, Windows 7 or Vista, and Linux. It does not detect the exact OS version, and the classes are very general, since each unites two or more OSs.
SVM was used as a machine learning method in [25] to detect the OS. The Nmap signature database was used as a dataset for machine learning: the training set consisted of 1503 samples and the testing set of 1023 samples. Most samples were classified as Other system, Windows, or Linux. The developed system classifies the detected OS as one of: Windows, Linux, FreeBSD, OpenBSD, Mac OS, Sun Solaris, Cisco, Other system. The average accuracy is 86.63%. The error rate depends on the detected OS: for Windows it is 3.91%, for Linux 5.19%, FreeBSD 17.71%, OpenBSD 15.85%, Mac 25.8%, Solaris 4.53%, Cisco 24.22%, Other system 9.74%.
In [26] only the headers of ICMP packets were analyzed. No system was developed; OS detection was conducted manually by analyzing the TTL value and the identification field increment. Only five OSs were used in the experiments: Windows 7, Windows 8.1, Windows 10, Linux 18.x, and Debian 7.x.
In [27] two algorithms similar to those in our research were used: Decision Table and J48. The Decision Table used data from the ICMP protocol (checksum, checksum_status, ext.checksum, ident, length); J48 used data from the IP, UDP and DNS protocols (IP: checksum_status, dsfield, dsfield.dscp, dsfield.ecn, flags, flags.df, flags.mf, flags.rb, frag_offset, hdr_len, len, proto, ttl, version). For both algorithms the dataset size is around 79,000 packets. The classified operating systems are: Linux (Raspberry, Xubuntu), Mac OS (10.7, 10.11), Windows (7, 8, 10). Decision Table has an accuracy of 0.994 and J48 an accuracy of 0.94.
In [28] the Decision Tree/C4.5 algorithm was used with data from the TCP protocol (window size, TTL, don't fragment bit, packet size, options order, and the window size, TTL, don't fragment bit and packet size of the FIN packet). The dataset size is around 30,000 packets. The classified OSs are: Windows Vista SP0-2, Windows 7 SP1, Windows 2000 SP2,4, Windows XP SP1+, Linux. The algorithm has an accuracy of 0.9086.

4. Problem statement

Analysis of methods and tools for operating system type detection shows that all approaches are based on a signature database. This imposes limitations: if the signature of an OS is missing from the database, credibility is low. It is possible to form a huge database for all possible OS types, but this leads to considerable time spent on finding the corresponding signature. An alternative solution for remote host OS type detection is the use of machine learning methods [23-29]. Many researchers present results of using these methods. They are also based on a signature database, but with the purpose of model learning. Consequently, the method of obtaining signatures and their preliminary analysis and processing may significantly affect the credibility of OS type detection.
If the focus is on real software development, the signature database should be formed manually. This is necessary for understanding the signature forming principles, so that in the future the tool can be used for actual tasks rather than remaining at the stage of a developed model. In addition, this approach allows system scaling, namely gradually increasing the number of detected OSs. To make the OS type detection software more convenient, both active and passive modes should be implemented. It is also important to detect not only the type of OS but also its version, which can significantly influence the user's decision-making.
Since OS type detection is a classification task, the criterion for obtaining a qualitative model is maximizing its accuracy and precision. In addition, an important metric of tool quality is the response time. Most of the time is spent on sending trial packets and receiving responses, i.e., network delay. Herewith, the number of attempts can also greatly influence the response time.
The complexity of the machine learning model can also significantly affect system efficiency, so upon reaching sufficient accuracy it should be as simple as possible. The number of studied test packet parameters and their pre-processing affect the time delays as well; therefore, it is advisable to select relevant protocol headers. Thereby, the main goals and criteria for developing OS detection software based on machine learning methods are:
• precision maximization;
• response time minimization;
• self-dependent signature forming;
• scalability;
• cross-platform support;
• universality of use.

5. Data gathering and preprocessing

Solving the object classification problem, namely the problem of OS type detection, involves the use of supervised learning algorithms [30], although it is also possible to formulate the problem as clustering of objects using unsupervised learning. In any case, a dataset is needed to build a machine learning model; with supervised learning this dataset must be labeled, i.e., each operating system class is associated with a specific label. A ready-made dataset could be used [31], but it contains only families of operating systems, while the user may be interested in the type and version of the operating system. Therefore, the authors decided to form a dataset independently.
The main idea of detecting the family/type of a remote node OS is based on the analysis of network protocol headers of the application, transport and network layers of the OSI model. Despite network protocol standards (RFC, IEEE), some header fields, such as TTL, DF and ToS of IP, can differ significantly between operating systems even of the same family [32]. The essence of the training dataset formation experiment is to generate certain traffic, capture it and analyze it.
The analysis consists of parsing packets, converting them to one data format, and selecting relevant parameters (features). The dataset was created under laboratory conditions. The following OS versions were studied: Linux (version 5.4.0), Mac OS (versions 10.12.4 and 11.4), Windows 10 (Corporate 20h2, Home 20h2), Windows 7 (Professional), Windows XP (Professional SP3). Windows XP and Linux 5.4 were studied using virtualization with the VirtualBox software [33]; the others were installed on real PCs. The steps of the dataset creation experiment are shown in Figure 1.

Figure 1: Experiment flow (1 – traffic generation, 2 – traffic gathering, 3 – parsing, 4 – preprocessing, 5 – feature selection)

Step 1. Traffic generation. Different types of traffic were used to form the dataset. To obtain it, the following actions were performed for each operating system:
• sending 20 ICMP packets with the Type 8 Echo Request value from the studied OS;
• viewing video content on the YouTube web resource for at least 30 seconds from the studied OS;
• viewing different web pages from the studied OS;
• downloading images from the web to the studied OS.
Step 2. Traffic capture. The most popular traffic analyzer, Wireshark, was used to capture packets from the network; it allows recording an interception session to a file for further processing without having to be connected to the network [19]. This software is distributed under the GNU GPL license, and versions of Wireshark exist for various operating systems: Linux, Windows, MacOS, FreeBSD, Solaris. It is also possible to use the no less popular console utilities "tshark" and "tcpdump" [34]. Traffic files are saved in pcap format. As a result, the following numbers of packets were collected: MacOS 11.4 – 67949, Windows XP Professional SP3 – 21294, Windows 10 Home 20h2 – 19291, Windows 7 Professional – 17741, Mac OS X 10.12.4 – 16204, Linux 5.4.0 – 14307, Windows 10 Corporate 20h2 – 14072.
Step 3. Traffic parsing.
Packets in which the source IP address corresponds to the IP address of the PC with the studied OS were selected from all captured traffic. After that, the packet structure is disassembled into the header fields of the main protocols: IP, ICMP, TCP, DNS and HTTP. As a result of parsing, the following field values were obtained:
1. IP – version, hdr_len, dsfield, dsfield_dscp, dsfield_ecn, len, id, flags, flags_rb, flags_df, flags_mf, frag_offset, ttl, proto, checksum, checksum_status.
2. ICMP – type, code, checksum, checksum_status, ident, seq, seq_le, data, data_data, data_len.
3. TCP – hdr_len, flags, flags_res, flags_ns, flags_cwr, flags_ecn, flags_urg, flags_ack, flags_push, flags_reset, flags_syn, flags_fin, flags_str, window_size_value, window_size, window_size_scalefactor, checksum, checksum_status.
4. DNS – id, flags, flags_response, flags_opcode, flags_truncated, flags_recdesired, flags_z, flags_checkdisable, count_queries, count_answers, count_auth_rr, count_add_rr, qry_name_len, count_labels, qry_type, qry_class.
5. HTTP – user_agent.
Parsing is performed using Python 3.8 and the PyShark library, which allows working effectively with pcap files. The parsing results of each packet are saved in a csv file for further analysis; in addition to the field values, each entry in the csv file also has an OS version. As a result, a dataset containing 42,318 records was formed. It can also be used directly as a database of signatures to search against.
Step 4. Preprocessing. Almost no classifiers work directly with text data: they expect features as numbers or booleans (translated into 0/1) indicating whether some feature is present or not. Therefore, it is necessary to convert all categorical (text) features (IP: dsfield, flags; TCP: flags, flags_str; DNS: flags, qry_class) into numbers.
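Such a conversion can be sketched as follows. The column and OS values are illustrative, not taken from the actual dataset, and `pd.get_dummies` is shown as an equivalent of one hot encoding; this is not the authors' exact preprocessing script:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with one categorical protocol field and the OS label column;
# the values are illustrative stand-ins for the parsed packet data.
df = pd.DataFrame({
    "ip_flags": ["0x00000000", "0x00004000", "0x00004000"],
    "os": ["Linux 5.4.0", "Windows 10 Home 20h2", "Windows 10 Home 20h2"],
})

# Label encoding: map each OS class name to an integer target.
le = LabelEncoder()
df["os_label"] = le.fit_transform(df["os"])

# One hot encoding: one 0/1 column per distinct attribute value.
df = pd.get_dummies(df, columns=["ip_flags"])
print(df.columns.tolist())
```

Label encoding is applied to the target (OS class) because its values are mutually exclusive categories, while one hot encoding is applied to the protocol parameters so that no artificial ordering is imposed on their values.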
The following methods were used for this purpose [35]:
• label encoding – to encode the operating system class values by assigning a certain number to each OS value;
• one hot encoding – to encode attribute values (protocol parameters) by creating columns, where each column is responsible for a single attribute value; the attribute value is set to 1 in the corresponding column and 0 in the other columns of the attribute.
The preprocessing module from the Scikit-learn (Python) library was used as a tool. Examples of the converted data are shown in Tables 2 and 3.

Table 2
Results of label encoding

Type/version OS             | Label
Linux 5.4.0                 | 0
Mac OS 10.12.4              | 1
MacOS 11.4                  | 2
Windows 10 Corporate 20h2   | 3
Windows 10 Home 20h2        | 4
Windows 7 Professional      | 5
Windows XP Professional SP3 | 6

Table 3
Example of one hot encoding

Before (ip_flags) | After (ip_flags_0x00000000) | After (ip_flags_0x00004000)
0x00000000        | 1                           | 0
0x00004000        | 0                           | 1

The id, checksum, and data fields of the TCP/IP protocols were not used during the transformation and were removed from the feature list due to the lack of clear differences between OS families. Also, for the study of some classifiers, data normalization was performed, i.e., all values of each parameter were scaled to a mean of 0 and a standard deviation of 1 by the expression [36]:

z_j^i = (x_j^i − μ_j) / σ_j,    (1)

where z_j^i is the i-th normalized value of the j-th feature, x_j^i is the i-th original value of the j-th feature, μ_j is the mean of the j-th feature, and σ_j is the standard deviation of the j-th feature. For this transformation, the StandardScaler method from Scikit-learn (Python) is used.
Step 5. Feature selection/importance. Since the number of features increased from 54 to 156 after the transformation of some features, it is advisable to reduce the dimensionality and select the most relevant features.
This is necessary to increase the learning speed of the model, reduce computation time when using the trained model, avoid overfitting, and increase the generalizing ability of the model. Among the significant number of methods for selecting features by importance, the Recursive Feature Elimination (RFE) method was used. The essence of this method is to build a model that includes all factors and to exclude the factor (feature) that is least significant from the point of view of the model; after that, a new model is built containing all factors except those excluded in the previous stage, and so on [37]. The RFE module from the Scikit-learn library was used, with a decision tree model with default hyperparameters as the estimator. As a result of selection, 14 features were identified; their parameters and importance degrees are shown in Table 4. The importance degree is obtained from the "feature_importances_" attribute of the trained model. Although some features have little impact, in our opinion they are important and give some expressiveness to different operating systems.
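The selection step can be sketched as follows. Synthetic data stands in for the parsed packet features, and `n_features_to_select=14` mirrors the number of features reported above; this is a sketch, not the authors' exact script:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded packet-header feature matrix.
X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=10, random_state=0)

# Recursively drop the least important feature until 14 remain,
# using a default decision tree as the estimator (as in the paper).
selector = RFE(DecisionTreeClassifier(random_state=0),
               n_features_to_select=14)
selector.fit(X, y)

print(selector.support_.sum())   # number of features kept
print(selector.estimator_.feature_importances_)
```

`selector.support_` is a boolean mask over the original columns, and `feature_importances_` of the final estimator provides the importance degrees of the retained features.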
Table 4
Characteristics of dataset features

Protocol | Parameter name          | Description                                    | Value example | Importance (%)
IP       | hdr_len                 | the length of the IP header                    | 20            | 0.0001
IP       | flags                   | list of set flags                              | 0x00004000    | 0.0001
IP       | flags_df                | the value of the Don't Fragment flag           | 1             | 0.0001
IP       | ttl                     | packet lifetime                                | 64            | 24.8843
IP       | proto                   | the protocol used below                        | 17 (UDP)      | 0.0001
ICMP     | ident                   | identification of parts of the protocol packet | 1             | 8.71
ICMP     | seq                     | packet number in the sequence                  | 558           | 0.0001
ICMP     | seq_le                  | the current length of the sequence             | 142848        | 0.0088
ICMP     | data_data               | the contents of the message in the packet      | 61:62:…:76:77:61:62:63:64:65:66:67:68:69 | 0.0001
ICMP     | data_len                | the length of the message in the packet        | 32            | 36.87
TCP      | hdr_len                 | the length of the protocol header              | 32            | 0.0001
TCP      | window_size_value       | window size value                              | 513           | 0.008
TCP      | window_size             | the calculated value of the window             | 131328        | 0.0001
TCP      | window_size_scalefactor | the calculated window size modifier (2^n)      | 256           | 29.5181

6. Model selection and training

When solving a classification problem with machine learning methods, the question of choosing a classifier model arises. At present, a significant number of classifiers are known and implemented, which differ in their approaches to constructing the decision rule as well as in hyperparameters. Correct tuning of hyperparameters is one of the key points; it allows obtaining the desired results. In some machine learning models the number of hyperparameters can exceed 10, and each hyperparameter can take different values, so finding the optimal combination is not an easy task. One option for solving this problem is to build a model for each possible combination over all given domains of the hyperparameters. As a result of preliminary data processing, a dataset was obtained whose structure is shown in Table 5.
Table 5
Dataset structure

Full dataset:
OS                   | Samples | Percent, %
MacOS 11.4           | 16949   | 40.05
Windows XP Pro SP3   | 5281    | 12.48
Windows 10 Home      | 4717    | 11.14
Windows 7 Pro        | 4385    | 10.36
Mac OS X 10.12.4     | 4024    | 9.51
Linux 5.4.0          | 3549    | 8.39
Windows 10 Corp 20h2 | 3413    | 8.07
Total:               | 42318   | 100

Small dataset:
OS                   | Samples | Percent, %
MacOS 11.4           | 999     | 14.67
Windows XP Pro SP3   | 999     | 14.67
Windows 10 Home      | 999     | 14.67
Windows 7 Pro        | 999     | 14.67
Mac OS X 10.12.4     | 999     | 14.67
Linux 5.4.0          | 816     | 11.98
Windows 10 Corp      | 999     | 14.67
Total:               | 6810    | 100

For each model the desired metric is calculated, and the best value determines the best model. This approach is implemented in the GridSearch package of the Scikit-learn library [38]. Although this approach is quite costly in terms of machine time, it gives the best result. In addition, although the Scikit-learn library does not support GPU computing, one can use the cuML libraries from the RAPIDS project to parallelize calculations based on CUDA technology [39, 40]. Using GridSearch, the best parameters for the most used classifier models were selected automatically:
1. Decision Tree (DT): criterion='entropy', max_depth=6;
2. Multilayer Perceptron (MLP): alpha=0.05, hidden_layer_sizes=(50, 50, 50);
3. Gaussian Naive Bayes (GNB): var_smoothing=1e-9;
4. K-Nearest Neighbors (KNN): leaf_size=5, n_neighbors=1;
5. Support Vector Machine (SVM): C=1000, gamma=0.0001;
6. Logistic Regression (LR): C=5, max_iter=500, solver='newton-cg';
7. Random Forest (RF): criterion='entropy', max_features='sqrt', min_samples_leaf=4, n_estimators=1900.
Since the use of GridSearch is quite time consuming, even more so with a large number of models, it was decided to use the small dataset (Table 5). It is representative enough to obtain adequate results.
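The hyperparameter search can be sketched with scikit-learn's GridSearchCV. The grid below is a reduced illustrative example on synthetic data (the paper's full per-classifier grids are not reproduced here), but it shows the exhaustive search and the 70/30 train/test split described above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the small dataset.
X, y = make_classification(n_samples=600, n_features=14, n_informative=8,
                           n_classes=4, random_state=0)
# 70/30 train/test split, as used in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Exhaustive search over a small illustrative grid, maximizing accuracy.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"criterion": ["gini", "entropy"], "max_depth": [4, 6, 8]},
    scoring="accuracy", cv=5)
grid.fit(X_tr, y_tr)

print(grid.best_params_)   # e.g. {'criterion': 'entropy', 'max_depth': 6}
print(grid.score(X_te, y_te))
```

GridSearchCV fits one model per grid point with cross-validation, which is why the search is costly in machine time and why a smaller, representative dataset is reasonable at this stage.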
The following metrics were used to assess the quality of the classifiers [3, 35]:
• Accuracy – the share of correctly determined classes;
• F1 – the weighted average estimate of type I and type II errors;
• Precision – the ratio of the number of correctly determined classes to the sum of the correctly determined classes and type I errors (false positives);
• Recall – the ratio of the number of correctly determined classes to the sum of the correctly determined classes and type II errors (false negatives);
• Confusion Matrix – a matrix showing the numbers of correct and erroneous determinations;
• FP – type I error (False Positive);
• FN – type II error (False Negative).
After obtaining the optimal parameters of the classifiers according to the accuracy maximization criterion, they were trained on the full dataset. Naive Bayes trained the fastest and Logistic Regression the slowest: DT – 0.194 s, GNB – 0.0409 s, KNN – 0.742 s, LR – 482.51 s, RF – 88.369 s, SVM – 5.08 s, MLP – 39.43 s. Training of the models was performed on a PC with an AMD A8-4500M APU, 1.9 GHz (4 cores, 4 threads) and 8 GB RAM. The dataset was divided into training and test sets in the proportion of 70% and 30%, respectively. The test results of the classifiers are presented in Table 6.
Table 6
Classifiers metrics

Classifier              | Accuracy | Precision | Recall | F1     | FP, % | FN, %  | Average cross-validation score
DT                      | 1.0      | 1.0       | 1.0    | 1.0    | 0     | 0      | 0.9999
MLP                     | 0.99964  | 0.99952   | 0.9995 | 0.9995 | 0.002 | 0.004  | 0.9998
GNB                     | 0.90548  | 0.91690   | 0.8736 | 0.8474 | 1.7   | 0.189  | 0.9031
KNN                     | 0.99763  | 0.99682   | 0.9960 | 0.9964 | 0.016 | 0.031  | 0.9949
SVM                     | 0.99952  | 0.99926   | 0.9994 | 0.9993 | 0.009 | 0      | 0.9987
LR                      | 0.97861  | 0.97287   | 0.9880 | 0.9792 | 0.378 | 0.059  | 0.9784
RF                      | 1.0      | 1.0       | 1.0    | 1.0    | 0     | 0      | 0.9999
Frequency analysis [23] | 0.7-1    | -         | -      | -      | -     | -      | -
Euclidean distance [24] | 0.955    | -         | -      | -      | 4.5   | -      | -
SVM [25]                | 0.8663   | -         | -      | -      | -     | 13.36  | -
Decision Table [26]     | 0.994    | -         | -      | -      | -     | -      | -
DT/J48 [27]             | 0.94     | -         | -      | -      | -     | -      | -
DT/C4.5 [28]            | 0.9086   | -         | -      | -      | -     | -      | -

As can be seen from Table 6, the Decision Tree model and its ensemble modification, Random Forest, show the best results, without any errors. The worst classifier is Gaussian Naive Bayes, which showed the highest number of erroneous predictions, but in general all classifiers have high metrics. These results are confirmed by 5-fold cross-validation. Compared with the known research results, our metrics are better; however, for a more adequate comparison it should be carried out at least for identical families/types/versions of OS. According to the simulation results, confusion matrices were obtained, which are shown in Appendix A. They contain the results of model prediction, namely the True Positive, True Negative, FP and FN indicators.
Analyzing the classifier confusion matrices (Figs. A.1–A.6), we can note that most classifiers make errors within an OS family:
• Multilayer Perceptron: incorrect determination of Windows 10 Home (determined as Windows 10 Corporate) and Windows 7 Professional (determined as Windows XP Professional);
• Gaussian Naive Bayes: incorrect determination of Linux (determined as Windows 10 Home and Corporate), MacOS (determined as Linux), Windows 10 Corporate (determined as Linux), and Windows 10 Home (determined as Windows 10 Corporate and Linux);
• K-Nearest Neighbors: incorrect determination of Windows 10 Corporate (determined as MacOS 11.4 and Windows 10 Home) and Windows 10 Home (determined as Windows 10 Corporate);
• Support Vector Machine: incorrect determination of Windows 10 Home (determined as Windows 10 Corporate) and Windows XP (determined as Windows 10 Home);
• Logistic Regression: incorrect determination of MacOS 11.4 (determined as MacOS 10.12.4) and of Windows 10 Corporate (determined as Windows 10 Home and Linux).
Because the Decision Tree model is architecturally simpler than Random Forest, it is more appropriate to use it in the software. The tree consists of five levels and has 13 leaves; due to its bulkiness, its image is not given in this article.

7. Application architecture and experiments

Software was developed for applied use of the trained classifier model. The software architecture is shown in Figure 2. It consists of four modules: a visualization module, a scanning module, a preprocessing module and an intelligent analysis module; the system also has its own database. The visualization module is the first to start. It initializes the graphical components of the program (icons, buttons, windows, etc.) and is responsible for providing dialogue windows for user interaction and for presenting detection results to the user.
The scanning module has two purposes: sniffing network traffic to capture packets from the host whose OS needs to be detected, and sending probes (specific packets) to that host in order to receive responses and conduct OS detection. The preprocessing module is responsible for gathering the fields from the captured packets that are needed to create the OS signature, and for creating that signature. A signature is a string in which all required fields from the received packets are written, separated by commas.

Figure 2: Application architecture

The intelligent analysis module uses the trained machine learning classification algorithm to conduct OS detection on the provided signature. Classification results are sent to the visualization module to be shown to the user. The application database is used to store previous OS detection results for later use; it can also store signatures and pcap files. Based on the proposed architecture, the software is developed in the Python 3.8 programming language with the PyQt5 library; SQLite is used as the database. The software has two modes: online and offline. In offline mode, the user can upload a pcap file from an external or internal resource. This is useful when traffic has already been captured and the OS type/version needs to be detected some time later. The pcap file should contain information about one specific host; otherwise, it can be filtered by IP address and re-saved. Online mode supports both active and passive scanning. Passive scanning takes more time since, as stated before, all needed packets are required for the analysis; the software stays in waiting mode until all needed packets are captured. However, the process can be sped up, for example, by visiting the detected host if it is a web server available over HTTP or FTP.
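The signature-building step of the preprocessing module can be sketched as follows. Note that the concrete field names used here (TTL, TCP window size, DF flag, TCP options order) are illustrative assumptions; the paper does not enumerate the exact feature set at this point.

```python
# Sketch of the preprocessing module's signature step: collecting packet
# fields into a comma-separated OS signature string. The field names below
# are illustrative assumptions, not the paper's exact feature set.
def build_signature(fields: dict) -> str:
    """Serialize selected packet fields into a comma-separated signature."""
    order = ["ttl", "win_size", "df_flag", "tcp_options"]
    # Missing fields are emitted as empty slots so the column order is stable.
    return ",".join(str(fields.get(key, "")) for key in order)

packet_fields = {"ttl": 128, "win_size": 64240, "df_flag": 1,
                 "tcp_options": "MSS-NOP-WS-SACK"}
signature = build_signature(packet_fields)
print(signature)  # -> 128,64240,1,MSS-NOP-WS-SACK
```

A fixed field order matters here: the intelligent analysis module parses the signature back into feature columns, so every signature must serialize the same fields in the same positions.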
In addition, the software provides a way to detect the OS by analyzing the User-Agent header of HTTP packets. Active mode is more appropriate since it does not require waiting: the software sends probes, namely four ICMP requests along with up to 10 TCP SYN requests. The procedure is repeated up to 5 times if no response is received. After the preprocessing module receives the packets, it sends the formed signature to the intelligent analysis module. The classifier predicts the OS family/type/version of the target host along with the prediction probability.
Next, an experiment is conducted to check how the software works and to compare its results with Nmap. First, a scan of a Windows 10 Corporate host with the local IP address 192.168.1.170 was conducted. The results of the active scan mode are shown in Appendix B. They show that Nmap could not detect the exact installed operating system; its output lists a wide spectrum of possible operating systems (Windows 10, 7, Windows Phone, Windows Server, FreeBSD) with a probability of 0.92, compared to 0.955 for the developed software (Figs. B.1, B.2). By executing similar actions for the host with IP address 192.168.43.60, the developed software correctly detected Windows 10 Home 20H2 with a probability of 0.937 (Fig. B.3), while Nmap did not provide a result, stating that the host has too many signature matches (Fig. B.4). A rented host with a public IP address and Linux 5.4 installed beforehand was also used in the experiments. The developed software detected it with a probability of 0.987 (Fig. B.5), whereas Nmap reported that it runs Windows 7 (Fig. B.6).

8. Conclusion and future work

The analysis of existing methods and means of remote host operating system detection shows the main advantages and disadvantages of their use. Two main approaches to remote host OS detection, active and passive, are considered. Combining these approaches can make the detection process more flexible and accurate.
Existing methods of remote host OS detection are mostly based on a signature model of decision making, and therefore produce many errors due to undecidability when a proper signature is absent. That is why machine learning methods are investigated. To form the dataset for model training, a five-stage process is realized: traffic generation, gathering, parsing, preprocessing, and feature selection. As a result of the research, a classifier based on the Decision Tree model was trained. At the same time, other classifiers also showed high metrics with low error rates within the OS family. The high accuracy of OS type detection indicates well-chosen features as well as a sufficient size of the input dataset, which helped to avoid overfitting. The trained model can detect seven OS versions within several families with absolute accuracy. A system architecture for remote OS detection is proposed and realized as a cross-platform application based on the trained model. This software tool allows scanning hosts in both offline and online modes; in online mode, both active and passive scanning are implemented. This adds versatility compared to analogues. The developed tool can be used by an ethical hacker for penetration testing, or by a network administrator for auditing and checking the network for new, unknown devices. The tool can also be used to test the effectiveness of server protection tools against OS detection. In the experiment, the developed software was more effective than Nmap; however, Nmap can identify many more OS types. Therefore, further research will be associated with scaling the model as well as expanding the functionality of the application. One such future function will be the ability to provide a list of ranked vulnerabilities inherent in a particular OS, which enables faster attack-vector creation or decision making for network protection.

9.
References

[1] CVE details: The ultimate security vulnerability datasource - Operating systems. URL: https://www.cvedetails.com/product-list/product_type-o/vendor_id-0/firstchar-W/Operating-Systems.html.
[2] Vulnerability and threat trends 2020: Research report. URL: https://lp.skyboxsecurity.com/rs/440-MPQ-510/images/Skybox_Report_2020-VT_Trends.pdf.
[3] Leonid Kupershtein, Tatiana Martyniuk, Olesia Voitovych, Bohdan Kulchytskyi, Andrii Kozhemiako et al., "DDoS-attack detection using artificial neural networks in Matlab," Proc. SPIE 11176, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019. doi: 10.1117/12.2536478.
[4] Voitovych, O.P., Yuvkovetskyi, O.S., Kupershtein, L.M., "SQL injection prevention system," 2016 IEEE International Scientific Conference "Radio Electronics and Info Communications," UkrMiCo 2016 - Conference Proceedings. doi: 10.1109/UkrMiCo.2016.7739642.
[5] Voitovych, O., Kupershtein, L., Lukichov, V., Mikityuk, I., "Multilayer Access for Database Protection," 2018 International Scientific-Practical Conference on Problems of Infocommunications Science and Technology, PICS&T'2018 - Proceedings, pp. 474-478. doi: 10.1109/INFOCOMMST.2018.8632152.
[6] Gurpreet K. Juneja, "Ethical hacking: a technique to enhance information security," International Journal of Innovative Research in Science, Engineering and Technology, Vol. 2, Issue 12, pp. 7575-7580.
[7] A. Orebaugh, B. Pinkard, Nmap in the Enterprise: Your Guide to Network Scanning, Burlington, MA: Syngress Pub., 2008.
[8] G. Fyodor Lyon, Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning, Nmap Project, 2009, 468 p.
[9] Nmap: the Network Mapper - Free Security Scanner. URL: https://nmap.org.
[10] Fingerprinting Methods Avoided by Nmap. URL: https://nmap.org/book/osdetect-other-methods.html#osdetect-passive.
[11] NetScanTools Pro OS Fingerprinting Tool Description.
URL: https://www.netscantools.com/nstpro_os_fingerprinting.html.
[12] xprobe2(1) - Linux man page. URL: https://linux.die.net/man/1.
[13] p0f v3 (version 3.09b). URL: https://lcamtuf.coredump.cx/p0f3.
[14] GitHub - xnih/satori: Python rewrite of passive OS fingerprinting tool. URL: https://github.com/xnih/satori.
[15] NetworkMiner. URL: https://www.netresec.com/?page=NetworkMiner.
[16] GitHub - gamelinux/prads: Passive Real-time Asset Detection System. URL: https://github.com/gamelinux/prads.
[17] Ettercap Home Page. URL: https://www.ettercap-project.org.
[18] OS Fingerprinting using Wireshark. URL: https://andytanoko.wordpress.com/2020/07/19/os-fingerprinting-using-wireshark.
[19] Wireshark Tutorial: Identifying Hosts and Users. URL: https://unit42.paloaltonetworks.com/using-wireshark-identifying-hosts-and-users.
[20] OS Detection Techniques. URL: https://jonathansblog.co.uk/os-detection-techniques.
[21] P. Auffret, SinFP, unification of active and passive operating system fingerprinting, Journal in Computer Virology, 2008, 6(3), pp. 197-205. doi: 10.1007/s11416-008-0107-z.
[22] C. Trowbridge, An Overview of Remote Operating System Fingerprinting, White paper, 2003. URL: https://sansorg.egnyte.com/dl/dp8wFpM37k/?
[23] Gurary, Jonathan & Zhu, Ye & Bettati, Riccardo & Guan, Yong, Operating System Fingerprinting, 2016. doi: 10.1007/978-1-4939-6601-1_7.
[24] Tyagi, Rohit & Paul, Tuhin & Bs, Manoj & B., Thanudas, Packet Inspection for Unauthorized OS Detection in Enterprises, IEEE Security & Privacy, 13, 2015, pp. 60-65. doi: 10.1109/MSP.2015.86.
[25] B. Zhang, T. Zou, Y. Wang and B. Zhang, "Remote Operation System Detection Base on Machine Learning," 2009 Fourth International Conference on Frontier of Computer Science and Technology, 2009, pp. 539-542. doi: 10.1109/FCST.2009.21.
[26] Song, Jinho & Kim, Yonggun & Won, Yoojae, Operating System Fingerprint Recognition Using ICMP, 2020. doi: 10.1007/978-981-13-9341-9_49.
[27] Aksoy, Ahmet & Louis, Sushil & Gunes, Mehmet, Operating system fingerprinting via automated network traffic analysis, 2017, pp. 2502-2509. doi: 10.1109/CEC.2017.7969609.
[28] Al-Shehari, Taher & Shahzad, Farrukh, Improving Operating System Fingerprinting using Machine Learning Techniques, International Journal of Computer Theory and Engineering, 2014.
[29] A. Aksoy, S. Louis and M. H. Gunes, "Operating system fingerprinting via automated network traffic analysis," 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 2017, pp. 2502-2509. doi: 10.1109/CEC.2017.7969609.
[30] Martyniuk, T.B., Kozhemiako, A.V., Kupershtein, L.M., "Formalization of the Object Classification Algorithm," Cybernetics and Systems Analysis, 2015, 51 (5), pp. 751-756. doi: 10.1007/s10559-015-9767-0.
[31] Dataset Using TLS Fingerprints for OS Identification in Encrypted Traffic. URL: https://zenodo.org/record/3461771.
[32] De Montigny-Leboeuf, A., A Multi-Packet Signature Approach to Passive Operating System Detection, Communications Research Centre, Canada, 2005. URL: https://apps.dtic.mil/sti/pdfs/ADA436420.pdf.
[33] Virtualization: IBM Cloud Education. URL: https://www.ibm.com/cloud/learn/virtualization-a-complete-guide.
[34] Tracing network traffic using tcpdump and tshark. URL: https://techzone.ergon.ch/tcpdump.
[35] Albon, C., Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning, O'Reilly Media, Inc., 2018, 336 p.
[36] D. Singh, B. Singh, Investigating the impact of data normalization on classification performance, Applied Soft Computing, Vol. 97, Part B, 2020. doi: 10.1016/j.asoc.2019.105524.
[37] Kuhn, M., Johnson, K., Feature Engineering and Selection: A Practical Approach for Predictive Models, Chapman and Hall/CRC, 2019, 310 p.
[38] Tuning the hyper-parameters of an estimator. URL: https://scikit-learn.org/stable/modules/grid_search.html.
[39] Scikit-learn Tutorial - Beginner's Guide to GPU Accelerating ML Pipeline.
URL: https://developer.nvidia.com/blog/scikit-learn-tutorial-beginners-guide-to-gpu-accelerating-ml-pipeline.
[40] Mahler, P., RAPIDS Release 21.06. URL: https://medium.com/rapids-ai/rapids-release-21-06-f9bd2e5a9aa4.

10. Appendix

Appendix A. Confusion matrices
Figure A.1: Multilayer Perceptron
Figure A.2: K-Nearest Neighbors
Figure A.3: Support Vector Machine
Figure A.4: Gaussian Naive Bayes
Figure A.5: Logistic Regression
Figure A.6: Random Forest

Appendix B. Experiment results
Figure B.1: Developed application
Figure B.2: Nmap
Figure B.3: Developed application
Figure B.4: Nmap
Figure B.5: Developed application
Figure B.6: Nmap