<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ankita Panigrahi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rakesh Sharma</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sujata Chakravarty</string-name>
          <email>chakravartys69@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijay K. Paikaray</string-name>
          <email>bijaypaikaray87@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Management Studies, Sri Sri University</institution>
          ,
          <addr-line>Odisha</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information &amp; Communication Technology, Medhavi Skills University</institution>
          ,
          <addr-line>Sikkim</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>172</fpage>
      <lpage>178</lpage>
      <abstract>
        <p>Air travel is now a common choice, and the price of a flight ticket changes frequently, including between day and night. It also changes with the season and around festivals. The cost of air transport depends on several distinct factors. The seller has data on each of these variables, whereas buyers see only limited information, which is not sufficient to predict airfares. Features such as the time of day, the number of days remaining before departure, and the departure time can indicate the best time to purchase a ticket. The aim of this paper is to study each factor that influences the variation in airfares and how these factors relate to that variation, and then to use this information to build a system that helps buyers decide when to purchase a ticket. Machine learning algorithms are well suited to this problem. In this work, an Artificial Neural Network (ANN), Linear Regression (LR), a Decision Tree (DT), and a Random Forest (RF) are implemented.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning Algorithms</kwd>
        <kwd>airfare</kwd>
        <kwd>supervised learning</kwd>
        <kwd>predictions</kwd>
        <kwd>flight</kwd>
        <kwd>Linear Regression</kwd>
        <kwd>Artificial Neural Network</kwd>
        <kwd>Random Forest</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Anyone who has booked a flight ticket knows how sharply the price of the ticket can change [<xref ref-type="bibr" rid="ref1">1</xref>]. Airlines use sophisticated techniques, collectively called Revenue Management, to implement a distinctive pricing strategy [<xref ref-type="bibr" rid="ref2">2</xref>]. The cheapest available ticket changes over time, and fares for the same booking can vary widely. This pricing strategy typically adjusts the fare according to the time of day, namely morning, evening, or night. Fares may also change with the season (summer, monsoon, or winter) and during festival periods. Buyers look for the cheapest ticket, while the airline's ultimate objective is to generate as much revenue as possible. Travelers generally try to buy a ticket well ahead of their departure day, in the belief that prices are highest when a booking is made close to the flight date, but this is not always true. As a result, a buyer may end up paying more than necessary for a comparable seat. Given the difficulty travelers face in finding an affordable seat, various strategies are used to identify the day on which the fare will be lowest, and this is where machine learning comes in. Groves and Gini developed a model using Partial Least Squares Regression (PLSR) to predict the appropriate time to book the ticket [<xref ref-type="bibr" rid="ref3">3</xref>].</p>
        <p>Using the Linear Quantile Mixed Regression methodology, Janssen [<xref ref-type="bibr" rid="ref4">4</xref>] developed a prediction model for the San Francisco to New York route, based on existing daily fare data provided by www.infare.com. The two important features were the number of days until departure and whether the departure fell on a weekday or a weekend. The model predicted fares well for days far from the departure date, but the results were unsatisfactory for days close to the date of the journey. Wohlfarth [<xref ref-type="bibr" rid="ref5">5</xref>] proposed a ticket-purchase timing model based on marked point processes, data mining techniques, and statistical analysis. The proposed framework converts heterogeneous price series data into an aggregated price series trajectory. A tree-based classification algorithm is then used to select the best-matching cluster and the corresponding model.</p>
      </sec>
      <sec id="sec-1-2">
        <p>Papadakis [<xref ref-type="bibr" rid="ref6">6</xref>] predicted whether the airfare would fall in the future by framing the problem as a classification task, using Logistic Regression, Linear SVM, and Ripple Down Rule Learner models. Ren, Yang, and Yuan [<xref ref-type="bibr" rid="ref7">7</xref>] applied Linear Regression, Naïve Bayes, Softmax Regression, and SVM models to predict prices.</p>
        <p>2020 Copyright for this paper by its authors.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Data Collection</title>
      <sec id="sec-2-1">
        <p>Data collection is the first step in a machine learning project. Numerous websites offer data that can be used to build models, covering a wide variety of airlines, routes, times, and fares. In this section, the data gathered from the available sources are examined. For this work, the data were obtained from Kaggle.</p>
      </sec>
      <sec id="sec-2-2">
        <p>Python is used to collect the data and to implement the models [8-15]. The collected dataset contains information about different airlines in India. It includes the various factors that affect the price of a flight ticket, together with the price of each flight. It contains 10683 rows of data.</p>
      </sec>
      <sec id="sec-2-3">
        <p>The features present in the dataset are the airline name, date of journey, origin, destination, route, departure time, arrival time, travel duration, total stops, additional info, and price.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Cleaning and Preparing of Data</title>
      <sec id="sec-3-1">
        <p>Cleaning and preparing the data is a very important step in machine learning. The collected data cannot be used raw, because it may contain fields that are of no use, and some values cannot be used in the form in which they appear in the dataset. Before the actual modelling, the data must therefore be filtered and cleaned. To achieve this, all duplicate rows and null values are removed from the dataset, and specific fields are converted to a usable format.</p>
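        <p>The cleaning steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the column names (Airline, Duration, Price), the sample rows, and the assumption that durations are stored as text such as "2h 50m" are all hypothetical.</p>

```python
import re

def duration_to_minutes(text):
    """Convert a duration such as '2h 50m' or '19h' to minutes."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total

def clean_rows(rows):
    """Drop rows with null values and exact duplicates, then convert durations."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None or value == "" for value in row.values()):
            continue  # null value: skip the row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate: skip the row
        seen.add(key)
        converted = dict(row)
        converted["Duration_min"] = duration_to_minutes(row["Duration"])
        cleaned.append(converted)
    return cleaned

# Hypothetical sample rows, made up for illustration only.
rows = [
    {"Airline": "IndiGo", "Duration": "2h 50m", "Price": 3897},
    {"Airline": "IndiGo", "Duration": "2h 50m", "Price": 3897},     # duplicate
    {"Airline": "Air India", "Duration": "7h 25m", "Price": None},  # null price
    {"Airline": "Jet Airways", "Duration": "19h", "Price": 13882},
]
cleaned = clean_rows(rows)
print(len(cleaned), [r["Duration_min"] for r in cleaned])  # 2 [170, 1140]
```

        <p>Converting the duration to a single number of minutes is one example of putting a field into a usable format; date and time fields could be handled in a similar way.</p>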
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Machine Learning Techniques</title>
      <sec id="sec-4-1">
        <p>Several conventional machine learning algorithms are used to build the flight fare prediction model: ANN, LR, DT, and RF. All of these techniques are implemented using the scikit-learn library available in Python. To assess the performance of these algorithms, two metrics are considered: MAPE (Mean Absolute Percentage Error) and RMSE (Root Mean Square Error).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.1 RMSE</title>
        <p>RMSE indicates how accurately the model makes its predictions by measuring the error in those predictions. Mathematically, it is defined as the square root of the average of the squared errors, where the error is the difference between the actual and the predicted value:</p>
        <disp-formula id="eq-1"><label>(1)</label><tex-math>\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2}</tex-math></disp-formula>
        <p>The lower the RMSE, the better the model performs. Usually, an RMSE below 1 is considered the best, although RMSE is expressed in the units of the target, so the attainable value depends on the scale of the prices.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.2 MAPE</title>
        <p>Mean Absolute Percentage Error is most often used in regression problems and is especially popular for measuring forecasting error. It indicates how accurately the model makes its predictions. Statistically, it is the mean of the absolute percentage errors of the forecasts, where the error is again the difference between the actual and the predicted value:</p>
        <disp-formula id="eq-2"><label>(2)</label><tex-math>\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|</tex-math></disp-formula>
        <p>Here, n is the number of observations, A<sub>t</sub> is the actual value, and F<sub>t</sub> is the forecasted value. The lower the MAPE, the better the model performs. Typically, a MAPE below 1 is considered excellent.</p>
      </sec>
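      <p>As a small worked example, the two metrics can be computed directly from their definitions. The fare values below are made up for illustration, and the factor of 100, which expresses MAPE as a percentage, is an assumption about the intended scale.</p>

```python
import math

def rmse(actual, forecast):
    """Square root of the mean of the squared errors."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mape(actual, forecast):
    """Mean of the absolute percentage errors, in percent (assumed scale)."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n

# Hypothetical actual and forecasted fares.
actual = [4000.0, 5000.0, 8000.0]
forecast = [4200.0, 4500.0, 8000.0]

print(round(rmse(actual, forecast), 2))  # 310.91
print(round(mape(actual, forecast), 2))  # 5.0
```

      <p>The example shows why the two scores are read differently: the RMSE of about 311 is in the same units as the fares, while the MAPE of 5 is a relative error that does not depend on the price scale.</p>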
    </sec>
    <sec id="sec-5">
      <title>5. Machine Learning Algorithms Used</title>
    </sec>
    <sec id="sec-6">
      <title>5.1 Artificial Neural Network (ANN)</title>
      <sec id="sec-6-1">
        <p>An Artificial Neural Network (ANN) is a neural network modelled on the biological neural network of the human brain, and it is designed to function in a similar way. It is a collection of a large number of artificial neurons, which are the building blocks of the ANN model. An artificial neuron consists of inputs and their corresponding weights. A chosen activation function takes the inputs multiplied by their corresponding weights and produces the output:</p>
        <disp-formula id="eq-3"><label>(3)</label><tex-math>y_j = f\left(\sum_{i} w_{i,j}\, x_i - \theta_j\right)</tex-math></disp-formula>
        <p>Here, x<sub>i</sub> are the inputs, w<sub>i,j</sub> are the corresponding weights, θ<sub>j</sub> is a threshold term, f is the activation function, and y<sub>j</sub> is the output of neuron j.</p>
        <p>Every ANN has three kinds of layers: the input layer, which takes the input; the hidden layer, where the computations take place; and the output layer, which produces the output. A linear activation function with weight 1 is used for the final output, and the Adam optimizer is used.</p>
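        <p>The computation performed by a single artificial neuron, and by a small network built from such neurons, can be sketched as follows. This is an illustrative forward pass only, with no training. The ReLU activation in the hidden layer, the threshold term, and all numeric weights are assumptions chosen for the example; only the linear activation at the output follows the text.</p>

```python
def linear(z):
    """Linear activation with weight 1: the output equals the input."""
    return z

def relu(z):
    """Rectified linear unit, assumed here for the hidden layer."""
    return max(0.0, z)

def neuron(inputs, weights, threshold, activation):
    """Weighted sum of the inputs minus a threshold, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return activation(z)

def forward(inputs, hidden_layer, output_neuron):
    """One hidden layer followed by a single linear output neuron."""
    hidden = [neuron(inputs, w, t, relu) for w, t in hidden_layer]
    w_out, t_out = output_neuron
    return neuron(hidden, w_out, t_out, linear)

# Hypothetical weights and thresholds, for illustration only.
hidden_layer = [([0.5, -1.0], 0.0), ([1.0, 1.0], 1.0)]
output_neuron = ([2.0, 1.0], 0.5)

print(forward([1.0, 2.0], hidden_layer, output_neuron))  # 1.5
```

        <p>In practice the weights are not set by hand but learned from the training data by an optimizer such as Adam.</p>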
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5.2 Linear Regression</title>
      <sec id="sec-7-1">
        <p>Linear Regression is a machine learning algorithm that models the relationship between one or more input variables and the output variable. These relationships are built with linear predictor functions. As its name suggests, the graph of a linear regression model is a straight line. For a single input variable, the model is:</p>
        <disp-formula id="eq-4"><label>(4)</label><tex-math>y = b_0 + b_1 x</tex-math></disp-formula>
        <p>Here, y is the dependent variable, x is the independent variable, b<sub>0</sub> is the constant, and b<sub>1</sub> is the slope.</p>
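        <p>For a single input variable, the constant and the slope of a linear regression model can be estimated in closed form. The sketch below assumes an ordinary least squares fit, which the text does not state explicitly, and uses made-up values for the number of days left before departure and the fare.</p>

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares estimates of the constant b0 and the slope b1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: fare falls by 1000 for each extra day before departure.
days_left = [1, 2, 3, 4, 5]
price = [9000, 8000, 7000, 6000, 5000]

b0, b1 = fit_simple_linear_regression(days_left, price)
print(b0, b1)  # 10000.0 -1000.0
```

        <p>With these estimates, the predicted fare for a given number of days left is b0 + b1 * days_left, for example 4000 for six days.</p>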
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5.3 Decision Tree</title>
      <sec id="sec-8-1">
        <p>The Decision Tree belongs to the family of supervised learning models. It is suitable for both classification and regression problems. As its name suggests, it is structured like a tree consisting of decision nodes and leaf nodes.</p>
      </sec>
      <sec id="sec-8-2">
        <p>Decision nodes have multiple branches and are used for decision making, whereas leaf nodes represent the outcomes of these decisions and are not divided into further branches.</p>
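        <p>The smallest possible regression tree has one decision node and two leaf nodes. The sketch below picks the split threshold that minimises the summed squared error of the two leaves, which is a common splitting criterion and is assumed here rather than taken from the text. The feature (number of stops) and the fares are made up.</p>

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (cost, threshold, left leaf value, right leaf value) of the best split."""
    best = None
    for threshold in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if threshold > xi]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        cost = sse(left) + sse(right)
        if best is None or best[0] > cost:
            best = (cost, threshold, mean(left), mean(right))
    return best

def predict(tree, xi):
    """Decision node: go left if xi is below the threshold, otherwise right."""
    _, threshold, left_value, right_value = tree
    return left_value if threshold > xi else right_value

# Hypothetical data: number of stops and fare.
stops = [0, 0, 1, 1, 2, 2]
price = [3000, 3200, 6000, 6200, 9000, 9400]

tree = best_split(stops, price)
print(tree[1], tree[2], tree[3])  # 2 4600.0 9200.0
```

        <p>A full decision tree repeats this splitting step inside each branch until a stopping rule is reached, so that each decision node leads either to further decision nodes or to leaf nodes.</p>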
      </sec>
      <sec id="sec-8-3">
        <p>[Figure: structure of a decision tree, showing the root node, splitting, a branch/sub-tree, decision nodes, and terminal nodes.]</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>5.4 Random Forest</title>
      <p>Like the Decision Tree, Random Forest (RF) is a supervised learning technique. A random forest works with multiple decision trees that operate as an ensemble. In classification, each tree in the forest produces a class prediction, and the class with the most votes becomes the model's prediction. For a regression task such as fare prediction, the predictions of the individual trees are averaged instead.</p>
    </sec>
    <sec id="sec-10">
      <title>6. Algorithms Evaluation</title>
      <sec id="sec-10-1">
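        <p>The ensemble idea behind Random Forest can be illustrated with a deliberately simplified sketch: several depth-1 regression trees are each fitted on a bootstrap sample of the data, and their predictions are averaged. A real random forest also samples features at each split and grows deeper trees, so this is bagging of single-split trees on one feature, not a full implementation. The data, the number of trees, and the seed are hypothetical.</p>

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(x, y):
    """Depth-1 regression tree; falls back to a constant if no split exists."""
    best_cost, stump = None, (None, mean(y), mean(y))
    for threshold in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if threshold > xi]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        cost = sse(left) + sse(right)
        if best_cost is None or best_cost > cost:
            best_cost, stump = cost, (threshold, mean(left), mean(right))
    return stump

def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    if threshold is None or threshold > xi:
        return left_value
    return right_value

def fit_forest(x, y, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(x)) for _ in x]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, xi):
    """Regression ensemble: average the predictions of all trees."""
    return mean([predict_stump(stump, xi) for stump in forest])

# Hypothetical data: number of stops and fare.
stops = [0, 0, 1, 1, 2, 2]
price = [3000, 3200, 6000, 6200, 9000, 9400]

forest = fit_forest(stops, price)
print(predict_forest(forest, 0), predict_forest(forest, 2))
```

        <p>Because every tree sees a slightly different sample, the individual trees disagree, and averaging them smooths out the errors of any single tree.</p>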
        <sec id="sec-10-1-1">
          <title>RMSE RESULTS</title>
          <p>[Figure: RMSE results for each algorithm applied: Artificial Neural Network, Linear Regression, Decision Tree, Random Forest.]</p>
        </sec>
        <sec id="sec-10-1-2">
          <title>MAPE RESULTS</title>
          <p>[Figure: MAPE results for each algorithm applied: Artificial Neural Network, Linear Regression, Decision Tree, Random Forest.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>7. Conclusion</title>
      <sec id="sec-11-1">
        <p>We find that ML models can be used to predict ticket prices from historical data. This paper reflects the dynamic change in the cost of flight tickets, from which we learn how the price increases or decreases with the day, the weekend, and the time of day. Applying ML algorithms to more varied datasets may give better prediction results.</p>
      </sec>
      <sec id="sec-11-2">
        <p>The error values obtained for the Artificial Neural Network are comparatively high. To reduce them, evolutionary approaches to training ANNs, such as genetic algorithms, can be used in the future.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] <string-name><surname>Rajankar</surname>, <given-names>Supriya</given-names></string-name>, and <string-name><given-names>Neha</given-names> <surname>Sakharkar</surname></string-name>. <article-title>"A Survey on Flight Pricing Prediction using Machine Learning."</article-title> <source>International Journal of Engineering Research &amp; Technology (IJERT)</source> <volume>8</volume>.<issue>6</issue> (<year>2019</year>): <fpage>1281</fpage>-<lpage>1284</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] <string-name><surname>Smith</surname>, <given-names>Barry C.</given-names></string-name>, <string-name><given-names>John F.</given-names> <surname>Leimkuhler</surname></string-name>, and <string-name><given-names>Ross M.</given-names> <surname>Darrow</surname></string-name>. <article-title>"Yield management at American Airlines."</article-title> <source>Interfaces</source> <volume>22</volume>.<issue>1</issue> (<year>1992</year>): <fpage>8</fpage>-<lpage>31</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] <string-name><surname>Groves</surname>, <given-names>William</given-names></string-name>, and <string-name><given-names>Maria</given-names> <surname>Gini</surname></string-name>. <article-title>"An agent for optimizing airline ticket purchasing."</article-title> <source>Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems</source>. <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] <string-name><surname>Janssen</surname>, <given-names>Tim</given-names></string-name>, et al. <article-title>"A linear quantile mixed regression model for prediction of airline ticket prices."</article-title> Radboud University (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] <string-name><surname>Wohlfarth</surname>, <given-names>Till</given-names></string-name>, et al. <article-title>"A data-mining approach to travel price forecasting."</article-title> <source>2011 10th International Conference on Machine Learning and Applications and Workshops</source>. Vol. <volume>1</volume>. IEEE, <year>2011</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] <string-name><surname>Papadakis</surname>, <given-names>Manolis</given-names></string-name>. <article-title>"Predicting Airfare Prices."</article-title> (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] <string-name><surname>Ren</surname>, <given-names>Ruixuan</given-names></string-name>, <string-name><given-names>Yunzhe</given-names> <surname>Yang</surname></string-name>, and <string-name><given-names>Shenli</given-names> <surname>Yuan</surname></string-name>. <article-title>"Prediction of airline ticket price."</article-title> University of Stanford (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] <string-name><surname>Tziridis</surname>, <given-names>Konstantinos</given-names></string-name>, et al. <article-title>"Airfare prices prediction using machine learning techniques."</article-title> <source>2017 25th European Signal Processing Conference (EUSIPCO)</source>. IEEE, <year>2017</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] <string-name><surname>Boruah</surname>, <given-names>Abhijit</given-names></string-name>, et al. <article-title>"A Bayesian Approach for Flight Fare Prediction Based on Kalman Filter."</article-title> <source>Progress in Advanced Computing and Intelligent Engineering</source>. Springer, Singapore, <year>2019</year>. <fpage>191</fpage>-<lpage>203</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] <string-name><surname>Chakravarty</surname>, <given-names>S.</given-names></string-name>, <string-name><given-names>B. K.</given-names> <surname>Paikaray</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Mishra</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Dash</surname></string-name>. <article-title>"Hyperspectral Image Classification using Spectral Angle Mapper."</article-title> <source>2021 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE)</source>. <year>2021</year>. <fpage>87</fpage>-<lpage>90</lpage>. doi: 10.1109/WIECONECE54711.2021.9829585.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] <string-name><surname>Wang</surname>, <given-names>Tianyi</given-names></string-name>, et al. <article-title>"A framework for airfare price prediction: A machine learning approach."</article-title> <source>2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)</source>. IEEE, <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] <string-name><surname>Abdella</surname>, <given-names>Juhar Ahmed</given-names></string-name>, et al. <article-title>"Airline ticket price and demand prediction: A survey."</article-title> <source>Journal of King Saud University-Computer and Information Sciences</source> <volume>33</volume>.<issue>4</issue> (<year>2021</year>): <fpage>375</fpage>-<lpage>391</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] <string-name><surname>Zhao-Jun</surname>, <given-names>Gu</given-names></string-name>, <string-name><given-names>Wang</given-names> <surname>Shuang</surname></string-name>, and <string-name><given-names>Zhao</given-names> <surname>Yi</surname></string-name>. <article-title>"Flight ticket fare prediction model based on timeserial."</article-title> <source>Journal of Civil Aviation University of China</source> <volume>31</volume>.<issue>2</issue> (<year>2013</year>): <fpage>80</fpage>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] <string-name><surname>Huang</surname>, <given-names>Tenghui</given-names></string-name>, <string-name><given-names>Chih-Chien</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>Zvi</given-names> <surname>Schwartz</surname></string-name>. <article-title>"Do I book at exactly the right time? Airfare forecast accuracy across three price-prediction platforms."</article-title> <source>Journal of Revenue and Pricing Management</source> <volume>18</volume>.<issue>4</issue> (<year>2019</year>): <fpage>281</fpage>-<lpage>290</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15] <string-name><surname>Chakravarty</surname>, <given-names>S.</given-names></string-name>, <string-name><given-names>P.</given-names> <surname>Mohapatra</surname></string-name>, and <string-name><given-names>P. K.</given-names> <surname>Dash</surname></string-name>. <article-title>"Evolutionary Extreme Learning Machine for Energy Price Forecasting."</article-title> <source>International Journal of Knowledge-Based and Intelligent Engineering Systems</source> <volume>20</volume> (<year>2016</year>): <fpage>75</fpage>-<lpage>96</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] https://www.kaggle.com/nikhilmittal/flight-fare-prediction-mh/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] https://github.com/rishabdhar12/Flight-Price-Prediction/tree/main/Dataset
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>