=Paper= {{Paper |id=Vol-3118/p07 |storemode=property |title=Detecting Phishing Websites by using Neural Network Models |pdfUrl=https://ceur-ws.org/Vol-3118/p07.pdf |volume=Vol-3118 |authors=Dominika Zurawska |dblpUrl=https://dblp.org/rec/conf/icyrime/Zurawska21 }} ==Detecting Phishing Websites by using Neural Network Models== https://ceur-ws.org/Vol-3118/p07.pdf
Detecting Phishing Websites by using Neural Network Models
Dominika Zurawska
Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44100 Gliwice, POLAND


Abstract
This article addresses the problem of classifying domains as potentially phishing, using parameters and information extracted from sample pages. The tests use several machine learning classification models taken from open libraries in the selected programming language. The methods are implemented in a simple way, only to test the selected models and compare them using standard metrics. For the tests I selected neural networks, a decision tree, SVM, logistic regression and random forest, and evaluated their effectiveness in order to choose the best option for phishing detection.

Keywords
neural network, classification, phishing, security domain



1. Introduction

Machine learning methods have become very popular in recent years [1, 2, 3, 4]. In the development of IT, many applications use such methods to improve important aspects of their operation. In [5], [6], and [7] there are several applications of neural networks in image processing. The models presented in [8, 9] show that neural networks are very good at detecting potentially dangerous situations on the internet. Tests on classifiers for IoT environments show that both neural networks and fuzzy systems perform very well [10, 11].

Phishing attacks attempt to gain sensitive, confidential information such as usernames, passwords, credit card information, network credentials and more [12]. By posing as a legitimate individual or institution via phone or email, cyber attackers use social engineering to manipulate victims into performing specific actions, such as clicking on a malicious link or attachment or willfully divulging confidential information. Both individuals and organizations are at risk; almost any kind of personal or organizational data can be valuable, whether it be to commit fraud or access an organization’s network. In addition, some phishing scams can target organizational data in order to support espionage efforts or state-backed spying on opposition groups. Very interesting comments on this topic can be found in the online resources at https://www.antiphishing.org/resources/apwg-reports/.

To properly classify our domains, we decided to check and compare different classifiers to see whether there are any significant differences between the results and which one is best suited to this problem. We also check whether we can extract the features of a given domain that are most important in the classification of phishing.

2. Phishing Websites Features

In this project, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. We classified our domains based on features such as: having IP Address, URL Length, Shortening Service, having At Symbol, double slash redirecting, Prefix Suffix, having Sub Domain, SSLfinal State, Domain registration length, Favicon, port, HTTPS token, Request URL, URL of Anchor, Links in tags, SFH, Submitting to email, Abnormal URL, Redirect, on mouseover, RightClick, pop up window, Iframe, age of a domain, DNSRecord, web traffic, Page Rank, Google Index, Links pointing to the page, Statistical report, Result.

3. Main decision parameters

This section describes the features that matter the most in the context of phishing website detection.

3.1. SSL final State

The Subject Common Name of the certificate has to match the hostname of the phishing site that returned it. Some sites will return the hosting company’s certificate when requested over HTTPS. As most modern browsers display warnings when a non-matching certificate is encountered, such certificates only serve to make the user more suspicious instead of increasing the perceived security of the site.
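As a rough illustration of the hostname check described in Section 3.1, the following sketch compares a certificate's Subject Common Name with a hostname. The function name `common_name_matches`, the example hostnames, and the handling of a single left-most wildcard are assumptions made for this example and are not taken from the paper. Modern TLS clients also consult subjectAltName entries, which this sketch ignores.

```python
def common_name_matches(hostname: str, common_name: str) -> bool:
    """Return True if the certificate Common Name covers the hostname.

    Hypothetical helper: supports an exact match or a single wildcard
    in the left-most label (e.g. "*.example.com").
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    cn_labels = common_name.lower().rstrip(".").split(".")

    # A name and a pattern with different label counts never match.
    if len(host_labels) != len(cn_labels):
        return False

    # The left-most label may be a wildcard; all others must be equal.
    first_ok = cn_labels[0] == "*" or cn_labels[0] == host_labels[0]
    return first_ok and cn_labels[1:] == host_labels[1:]
```

For a site that returns its hosting company's certificate, a call such as `common_name_matches("phish.example.net", "shared.hosting-company.com")` evaluates to `False`, which corresponds to the browser-warning case described above.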
ICYRIME 2021 @ International Conference of Yearly Reports on Informatics Mathematics and Engineering, online, July 9, 2021
domizur257@student.polsl.pl (D. Zurawska)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073, pp. 45–50

3.2. URL of Anchor

An anchor is an element defined by the <a> tag. This feature is treated exactly as “Request URL”. However, for this feature we examine:
    1. If the <a> tags and the website have different domain names. This is similar to the Request URL feature.
    2. If the anchor does not link to any webpage, e.g.:
           a) <a href="#">
           b) <a href="#content">
           c) <a href="#skip">
           d) <a href="JavaScript ::void(0)">

Rule:
% of URL Of Anchor < 31% → Legitimate
% of URL Of Anchor ≥ 31% and ≤ 67% → Suspicious
Otherwise → Phishing

3.6. Links pointing to page

The number of links pointing to the webpage indicates its legitimacy level, even if some links are of the same domain. In our datasets, and because phishing websites are short-lived, we find that 98% of phishing dataset items have no links pointing to them. On the other hand, legitimate websites have at least 2 external links pointing to them.

Number of Links Pointing to the Webpage = 0 → Phishing
Number of Links Pointing to the Webpage > 0 and ≤ 2 → Suspicious
Otherwise → Legitimate
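The threshold rules for “URL of Anchor” and “Links pointing to page” can be sketched as two small functions. This is a minimal sketch, not the implementation used in the paper: the function names, the string labels, and the assumption that the anchor percentage has already been computed (the share of anchors that point to a different domain or to no webpage) are introduced here for illustration only.

```python
def classify_url_of_anchor(suspicious_anchor_pct: float) -> str:
    """Apply the URL of Anchor rule to a percentage in the range 0-100."""
    if suspicious_anchor_pct < 31:
        return "Legitimate"
    if suspicious_anchor_pct <= 67:
        return "Suspicious"
    return "Phishing"


def classify_links_pointing_to_page(link_count: int) -> str:
    """Apply the Links pointing to page rule to a count of inbound links."""
    if link_count == 0:
        return "Phishing"
    if link_count <= 2:
        return "Suspicious"
    return "Legitimate"
```

Note that the two rules run in opposite directions: a high anchor percentage points towards phishing, whereas a high number of inbound links points towards a legitimate site.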

4. Classification Algorithms
In this work, several selected models were tested. The presented results come from open libraries that were available for student tests in online services.

3.3. Links in tags
Given that our investigation covers all angles likely to be used in the webpage source code, we find that it is common for legitimate websites to use <meta> tags to offer metadata about the HTML document;