<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Reduction Complexity with Regression Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Deineha</string-name>
          <email>oleksandr.deineha@karazin.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Donets</string-name>
          <email>v.donets@karazin.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grygoriy Zholtkevych</string-name>
          <email>g.zholtkevych@karazin.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>V.N. Karazin Kharkiv National University</institution>
          ,
          <addr-line>4 Svoboda Sqr, Kharkiv, 61022</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>In this study, we dive deep into the Pure Lambda Calculus' evaluation process, specifically zooming in on term reduction steps. Commonly, these steps are seen as the same, but our research suggests otherwise - some steps are more complex than others. We investigated how specific term features play a role in computing efficiency, aiming to shed light on how different reduction strategies perform. Through testing, our Linear Regression and ANN regression models showed potential in predicting the time taken for a reduction step based on term details. Yet, when we used richer datasets to pin down complexity, our predictions didn't sharpen as much as hoped. This leads us to believe that term details, while useful, might not be the full puzzle pieces we need for spot-on strategy predictions. This research opens the door for further exploration into refining our models and unearthing other crucial factors that could shape term reduction time. Ultimately, our findings could help optimize programming tools like compilers and interpreters, steering them toward swifter operations. pure lambda calculus, term computational complexity, estimating computational complexity, regression analysis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>reduction, cost of reduction, functional programming,</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        In Pure Lambda Calculus exists one of the fundamental problems of the evaluating program
execution process [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The existing approach only analyzes term reduction steps as equal processes [
        <xref ref-type="bibr" rid="ref1 ref2">1,
2</xref>
        ]. The problem is that they are different, so we are trying to find a way to estimate that inequality.
      </p>
      <p>This study aims to research the factors influencing time complexity in term reduction within the
Pure Calculus environment. We focus on measuring different term features and their impact on
computational efficiency. That approach must show that some reduction strategies are less effective in
terms of computational resources but, at the same time, have smaller amounts of reduction steps.</p>
      <p>Potentially, it may have an impact on improving programming language compilers and interpreters
in a way to analyze programming code for choosing a faster execution strategy.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Background and related work</title>
      <p>
        The idea that the impracticality of estimating term reduction strategy effectiveness by counting
reduction steps as cost-equal processes has been known since early works in this area of math [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Also,
many studies have explored various reduction strategies in functional programming but haven't focused
on how time spent in term reduction varies with the topology of terms and choice of reduction strategy
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. But we must admit that there are works where authors tried to investigate methods for measuring
computational reduction complexity by analyzing some reduction strategy and its memory consumption
[
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] or measuring complexity via cost models [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In the work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the author tried to define some
classes of terms by computational cost and define a type of reduction strategy. Also, other researchers
proposed a weak invariant cost model for the lambda calculus, the model based on theoretical
      </p>
      <p>2023 Copyright for this paper by its authors.
CEUR</p>
      <p>
        ceur-ws.org
assumptions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In previous articles [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], we showed our Pure Lambda Calculus environment, which
we developed using the Python programming language, in order to find an average optimal strategy for
reducing lambda terms. The lambda environment is built on the idea of term tree representation, which
was shown in many articles [
        <xref ref-type="bibr" rid="ref10 ref2 ref3 ref5">2, 3, 5, 10</xref>
        ], and we reduce all operations with terms to the idea of tree
operations. So, simple operations like finding redexes, finding vertices, and finding height and width
are the tree traversal operations, and operation reduction is actually a combination of the node removing
and copying operations, which was shown in the paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Obviously, tree topology and its volume
affect processing tree operations, but the lambda term in the tree representation is a more complicated
construction, which may have many insert operations for one reduction.
      </p>
      <p>
        We must admit that this task is practically impossible to solve due to the infinite amount of possible
lambda terms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], but it is possible to find a solution that will work in many practical applications.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Problem Statement</title>
      <p>In our Pure Lambda Calculus environment we tried to find the best average strategy for reducing
Lambda terms, and during experiments, we noticed that the time spent by the program on a one-step
reduction of different terms differs and depends on the term topology and the selected redex. Thus, the
necessity of a metric for defining the complexity of term reduction arose both before and after reduction.</p>
      <p>After analyzing a certain amount of literature we can conclude that there is no good metric that could
precisely measure the computational complexity of the term reduction strategy. There are no studies
where authors tried to combine the lambda calculus environment for investigating computational
measures based on the environment's computational costs. So, we need to find a way of defining this
metric using knowledge about a term and redex. The metric can be used as a smart way to define the
term reduction complexity and a way to define new computationally effective reduction strategies.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Hypothesis</title>
      <p>We suggested that some combination of the parameters of the number of vertices, the number of
redexes, the height and width of the term, as well as the depth of the redex can determine the time spent
on one step of the reduction and, therefore, determine the complexity of the reduction process. In other
words, we hypothesize that parameters for describing term topology can be used for accurate prediction
time spent on reducing a specific redex, which shows the computational complexity of a term with
specific redexes. Thereby, we can suggest two assumptions:</p>
      <p>1. The term parameters before the reduction process with the redex data can define the
computational complexity of the reduction process before performing one, or in other words,
complexity prediction;</p>
      <p>2. The second assumption requires term data before and after the reduction process with the redex
data, which can define the computational complexity of the reduction process afterward, in other words,
complexity determination.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Methodology</title>
      <p>The central idea of Machine Learning methods is to automatically construct a complex function that
best (according to the selected ML method) can approximate the real function describing the data.
Because we assume time spent on one reduction step is a measure of computational complexity, we can
use some method of Machine Learning for the prediction of time spent on one reduction step or solving
a regression problem, in other words.</p>
      <p>
        For regression analysis, we have to highlight the main regression methods:
 Linear Regression: one of the most important and widely used statistical techniques [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for
modeling the relationship between some feature data and target value, which is based on the equation
 =  +  .
      </p>
      <p>
        Advantages: it is relatively easy to train and get the final dependency expression, and this method could
be relatively easily extended for use on non-linear convolution of input data.
Disadvantages: the assumption that input and output data have a linear dependency, and sensitivity on
outliers, especially on small datasets.
 Decision Tree Regression: the main idea of this method [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is to split the data into subsets based
on features and make predictions dependent on the average value of the sample in the leaf node.
Advantage: interpretable, handling nonlinear and complex dependencies.
Disadvantage: prone to overfitting, hard to configure optimal effectiveness, sensitive to noises in data.
      </p>
      <p>
         Support Vector Regression is an extension of the Support Vector Machine method for regression
analysis. The method constructs a hyperplane in a high-dimensional input data space, which tends to
get the largest distance to the nearest training data point for better generalization [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Advantages: versatility due to different kernel functions, effective in capturing complex relationships
in high-dimensional data.
      </p>
      <p>Disadvantage: sensitive to the choice of kernel.</p>
      <p>
         K-Nearest Neighbors Regression: uses a weighted average of the k nearest neighbors for
estimating continuous variables [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Advantages: do not make strong assumptions about data distribution, can handle nonlinear
relationships.</p>
      <p>Disadvantages: it is difficult to obtain the objective function, sensitive to the choice of K and distance
metric.</p>
      <p>
         ANN Regression: for an artificial neural network, we can set the output layer without activation
function and it makes ANN a continuous value predictor. This is a widely used approach for obtaining
continuous data without constraining the ANN model [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
Advantages: ability to capture complex data relationships, high level of generalization, simple set of
approaches to modify the effectiveness of the ANN.
      </p>
      <p>Disadvantages: prone to overfitting, hard to tune many hyperparameters, sensitive to loss function
choice.</p>
      <p>
        As a metric for the estimating effectiveness of the regression model we are supposed to use default
measures like [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]:
      </p>
      <p> Mean Absolute Error (MAE): measures the average absolute difference between the expected
values and predicted.</p>
      <p>Advantages: easy to interpret, resistant to outliers because of equal weights to all errors.
Disadvantages: can’t define if it overestimates or underestimates, doesn’t provide information about
the error distribution.</p>
      <p> Mean Squared Error (MSE): squares the difference between the expected and predicted values
and takes the average of its sum.</p>
      <p>Advantage: gives higher penalties for larger errors, which is important for use as a loss function.
Disadvantage: sensitive to outliers and large errors can have a significant impact on the results. The
squared units make interpretation less intuitive.</p>
      <p> Root Mean Squared Error (RMSE): which is the square root of MSE.
Advantages: provide a more interpretable measure compared to MSE, weighted penalties for larger
error value.</p>
      <p>Disadvantage: sensitive to outliers like MSE.</p>
      <p> Mean Absolute Percentage Error (MAPE): measures the percentage of the difference between
the expected and predicted values, averaged over all observation values.
Advantages: it is easy to interpret and compare across different datasets, a percentage-based measure
of accuracy.</p>
      <p>Disadvantage: undefined when actual values are zero or close to zero, can be sensitive to outliers.</p>
      <p>Over a list of the main regression analysis methods and metrics for measuring regression accuracy,
we can use Linear Regression with complex data preprocessing and ANN Regression as methods that
can help us to formulate dependencies between term characteristics and time spent on reduction steps.
Other methods could be indicators of the best accuracy. Also, for ANN Regression, we use the MSE
measure as a loss function and RMSE and MAE as more interpretable metrics for the cross-comparing
model effectiveness.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Experiments and data collection</title>
    </sec>
    <sec id="sec-8">
      <title>6.1. Data generation</title>
      <p>
        In the Pure Lambda Calculus environment, we have a possibility to artificially generate terms. For
the generating procedure, we can set min, and max counts vertices and redexes, after generating we
have a filtering procedure that filters out terms with too long reduction trace with the selected strategy
(by default, it is the leftmost outermost strategy) or with too big memory consumptions. The term
generating procedure was inspired by the article [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For experiments, we generated 220 terms, which
we normalized with the leftmost outermost and the rightmost innermost strategies, which gave us 3931
records with time in nanoseconds spent on each reduction. We also must admit that in this scenario time
spent on reduction consists of time spent on searching for appropriate redex by selected strategy plus
time spent on doing reduction. We use a Unix-based operating system for collecting time data because
it gives a more clear representation of time spent on processes than other operating systems. That
approach should give as accurate computational resource consumption as possible.
6.2.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Data points and metrics</title>
      <p>In Fig. 1, we can see that the term could be easily represented as a tree, so features that we collect
about the term are actually parameters that describe a tree:</p>
      <p>1. Height – the longest sequence from the tree root to the term tree (for the term in Fig. 1 the height
value equals 4).</p>
      <p>2. Width – count leaves of the term tree (for the example equals 4).</p>
      <p>3. Redexes – count redexes in a term (in Fig. 1, you can find 2 redexes, marked as red Apps in the
tree representation, and underlined in the formula representation).</p>
      <p>4. Vertices – count vertices in the term tree, in terms of lambda calculus, are count all applications,
abstractions, and atom variables (in Fig. 1, it equals 12).</p>
      <p>5. Redex depth – length of sequence from the tree root to the selected redex application node (for
the first redex, which is the root node in the tree, it equals 1, and for the second redex it equals 3).</p>
      <p>6. Step time – time spent on finding appropriate redex by selected strategy plus time spent on
reduction step by selected redex in nanoseconds.
b</p>
      <p>Also, we collect parameters that describe the term after the reduction procedure, which are the same
except for redex depth, which isn’t required now. We also got differences between parameter values
before and after the reduction procedure.</p>
    </sec>
    <sec id="sec-10">
      <title>7. Data analysis</title>
    </sec>
    <sec id="sec-11">
      <title>7.1. Correlation matrices</title>
      <p>In the domain of regression analysis, the quest for precision and accuracy demands a meticulous
understanding of the relationship between variables in input data and the relationship between inputs
and outputs, which will help us decide if further research makes sense. In the broadest sense, correlation
may indicate any association type, but in statistical analysis, it usually refers to linear relationships, and
this is an important drawback to be aware of.</p>
      <p>Therefore, it became necessary to analyze the obtained data for correlations. Therefore, correlation
matrices were obtained, you can find them in Fig, 2, it was shown that the term features data and the
redex data have the highest correlation with the reduction time (0.56 - 0.77, shown in Fig. 2.a), the term
features data after reduction have a slightly smaller correlation with reduction time (0.48 – 0.68, shown
in Fig. 2.b) and given features difference before and after reduction have the smallest correlation
(0.033 – 0.13, shown in Fig. 2.c).</p>
      <p>We must admit that parameters like vertices, redexes, heights, widths, their post-reduction variant,
and the difference variant have high correlation values, close to 1, which simply indicates that these
parameters are related due to the term tree topology. Also, we should emphasize that the selected
reduction strategy doesn’t have correlation to the reduction step time and other term parameters.
According to the results of the correlation analysis, we decided that solving the regression problem for
these data and the target value made sense.
7.2.</p>
    </sec>
    <sec id="sec-12">
      <title>Assumptions</title>
      <p>We previously defined two assumptions. Based on these assumptions, we can now define three
datasets for training and testing:</p>
      <p>1. For the complexity prediction, we can use the term data before the reduction process and the
redex data. Input data: widths, heights, vertices, redexes, and redex_depths. Target data: steps_time.
2. For the complexity determination, we can use the term data before and after the reduction process
and the redex data. Input data: widths, heights, vertices, redexes, widths_post, heights_post,
vertices_post, redexes_post, and redex_depths. Target data: steps_time.</p>
      <p>3. For the complexity determination, we can use the term data before and after the reduction process,
the step difference term data, and the redex data. Input data: widths, heights, vertices, redexes,
widths_post, heights_post, vertices_post, redexes_post, widths_diff, heights_diff, vertices_diff,
redexes_diff, and redex_depths. Target data: steps_time.
7.3.</p>
    </sec>
    <sec id="sec-13">
      <title>Data preprocessing</title>
      <p>The typical problem for any Machine Learning method is data distribution inputs and outputs: many
methods require close distributions like (0; 1), (-1; 1), etc., which guarantee as fast method convergence
as possible. Hence, we decided to analyze the term features distributions and the target data distribution.
The results of the data distribution analysis are shown in Fig. 3, and we can see two problems: the first
problem with collected data is too wide distributions (unscaled in other words) and the second problem
for some features is non-normal data distribution, which might impair the learning process. A
nonnormal distribution of data is characteristic of width (Fig. 3.a and Fig. 3.e), vertices (Fig. 3.c and Fig.
3.g), redexes (Fig. 3.d and Fig. 3.h) before and after the reduction step. All parameters, which are
detailed in Figure 3, are presented in their unscaled form.</p>
      <p>
        To increase model regression performance we decided to take a logarithm of target value (step time).
For all input fields, we apply Yeo-Jonson power transformation [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], because it can take zero values in
data, process data logarithmically, normalize data distribution, and automatically scale data. After
applying the Yeo-Jonson power transformation for input data and taking the logarithm of the target
value, we got new data distributions which are shown in Fig. 4. It’s easy to see that new data
distributions for widths (Fig. 4.a and Fig. 4.e), vertices (Fig. 4.c and Fig. 4.g), and redexes (Fig. 4.d and
Fig. 4.h) before and after reduction step, also, the target variable step time (Fig. 4.n) became close to
normal distribution. Other features: heights (Fig. 4.b and Fig. 4.f), and all differences (Fig. 4.i, Fig. 4.j,
Fig. 4.k, and Fig. 4.l) did not change their distribution law. The transformation ensures setting compact
value distribution for all features and the target variable. In total, all these changes will improve model
quality and speed up regression model convergence.
      </p>
      <p>Before testing regression methods, we should consider that usual practice in Machine Learning is
dividing a dataset into tree sets for training, validation and testing. These sets are required for model
training episodes, tuning model hyperparameters, and final model testing in accordance, which allows
us to reveal the real quality of learning on previously unseen data. Due to relatively small amounts of
data and small models, we decided to use the test set as the validation. So, we divided our term records
in all datasets into two subsets namely training and testing ones in proportion to 70% / 30% in
accordance, so we got 2782 samples in the train set and 1149 samples in the test set. Also, we should
highlight that terms data in three datasets are identical, and train-test splitting happens in the same order.
So, model performance depends not on the random data splitting but mostly on itself.</p>
    </sec>
    <sec id="sec-14">
      <title>8. Regression models</title>
    </sec>
    <sec id="sec-15">
      <title>8.1. Standard approaches</title>
      <p>Further, standard approaches for solving regression problems were tested on the 1-st dataset
(complexity prediction problem). Results are represented in Table 1, where you can find results for the
following methods: 1. Linear Regression, 2. Decision Tree Regression, 3. Support Vector Regression,
4. K-Nearest Neighbors Regression and 5. ANN Regression. Methods 2 and 4 showed the lowest error
values (RMSE = 0.0497 – 0.2105) on the train data, but the error value on the test data was relatively
high (RMSE = 0.30 – 0.4), indicating the inability of these models to generalize the data. Methods 1, 3,
and 5 showed a higher error rate during training (RMSE = 0.27 – 0.29), but on test data, the error was
identical (RMSE = 0.28), which indicates the sufficient ability of the algorithms to generalize the data.
We decided to continue testing methods 1 and 5 due to the high generalizability. Linear Regression was
also practically proven that the linear model is quite sufficient for estimating the reduction time, and
it’s easy to get the final complexity estimation equation. ANNs are effective in terms of modifiability
and their ability to approximate complex functions within data.</p>
      <p>As you can see in Table 1, the Linear Regression and the ANN Regression models showed high
generalizability with RMSE of 0.28-0.29 on the test set with the 1st dataset. So we decided to test the
second assumption about complexity definition (on the 2nd and the 3rd datasets), where we achieved
RMSE of 0.27-0.29, which didn’t seem as much improvement withholding more data as expected.</p>
      <p>Also, for the 1st dataset, you can see in Fig. 5, drawings for ANN predictions and expected values.
These plots are achieved by plotting a dot in the space of predicted and expected values, where closer
values to the diagonal (marked as a gray line) mean a better model approximation. Here, dots over the
diagonal indicate overestimation, and dots under the diagonal indicate underestimation. Also, these
plots for the 2nd and the 3rd datasets are approximately the same as for ANN Regression and for Linear
Regression, which means that even more complex models cannot get better results. In Table 2 we can
find other metrics like MSE, MAE, and MAPE for estimating efficiency of Linear Regression and ANN
Regression models on the 1st, 2nd, and 3rd datasets. We conclude that using other metrics did not give
many benefits in distinguishing the quality of models, but some metrics like MAE give an average error
value, and MAPE gives an average percent of error, which might help in understanding plots in Fig. 5.
1. Linear Regression
2. Decision Tree Regression
3. Support Vector Regression
4. K-Nearest Neighbors</p>
      <p>Regression
5. ANN Regression
9. Discussion
1st dataset
2nd dataset
3rd dataset
train</p>
      <p>Performance testing of our model indicates that both the Linear Regression and ANN regression
models effectively predict the time spent on a reduction step based on term parameters. So we can
conclude that the first assumption about complexity prediction is relevant. However, the second
assumption about complexity determination with more detailed datasets didn’t show much effectiveness
improvements despite available data about the term state after the reduction procedure and differences
in these states. In particular, we can admit that the effect on the ANN model on which we can increase
model volume, but without much impact on MSE. So, we can conclude that term parameters have
relations to term reduction complexity, but these parameters aren’t sufficient, which is a weak sign, that
might help with estimating the best reduction strategy but without prominent accuracy.
10.</p>
    </sec>
    <sec id="sec-16">
      <title>Conclusion</title>
      <p>We hypothesized that parameters that describe term topology can be used for accurate prediction
time spent on reducing a specific redex, which shows the computational complexity of a term with
specific redexes. For this, we have proposed two assumptions: complexity prediction and complexity
determination, and we formulated three datasets on data obtained by normalizing artificially generated
terms. The first conclusion: we showed on two datasets for complexity determination, we didn’t achieve
much improvement in RMSE value even with increasing model capacity, which indicates that state
term after the reduction process and difference data don’t offer much information about the reduction
process. The second conclusion: we achieved a stable level of RMSE on train and test data for the 1-st
dataset for complexity prediction, which means that with the Linear Regression model or with the ANN
Regression model we can get a weak term reduction estimator, which doesn’t require big computational
resources, but doesn’t provide quite accurate estimation.</p>
      <p>For further research, we could focus on enhancing the regression models or investigating new
parameters that might affect term reduction time. For investigating new parameters, we could use
parameters that describe redex in more detail, like count bound variables in the object body (which
indicates count inserts in the term tree), parameters for redex subject, etc. We should also consider
improving the accuracy of our time measurements. Our experiments were conducted on a Unix-based
operating system using the Python interpreter, which does not guarantee precise time accuracy.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Mariangiola</given-names>
            <surname>Dezani-Ciancaglini</surname>
          </string-name>
          ,
          <article-title>Simona Ronchi Della Rocca, and Lorenza Saitta. Complexity of lambda-term reductions</article-title>
          .
          <source>RAIRO Theor. Informatics Appl</source>
          .
          <volume>13</volume>
          (
          <year>1979</year>
          ):
          <fpage>257</fpage>
          -
          <lpage>287</lpage>
          . doi:
          <volume>10</volume>
          .1051/ita/1979130302571.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Kazuyuki</given-names>
            <surname>Asada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Naoki</given-names>
            <surname>Kobayashi</surname>
          </string-name>
          ,
          <article-title>Ryoma Sin'ya, and Takeshi Tsukada. Almost Every Simply Typed Lambda-Term Has a Long Beta-Reduction Sequence</article-title>
          . Logical Methods in Computer Science,
          <year>February 2019</year>
          , Volume
          <volume>15</volume>
          . doi:
          <volume>10</volume>
          .23638/LMCS-
          <volume>15</volume>
          (
          <issue>1</issue>
          :16)
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Clemens</given-names>
            <surname>Grabmayer</surname>
          </string-name>
          .
          <article-title>Linear Depth Increase of Lambda Terms along Leftmost-Outermost BetaReduction</article-title>
          .
          <source>November</source>
          <year>2019</year>
          . URL: https://doi.org/10.48550/arXiv.1604.07030.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Xiaochu</given-names>
            <surname>Qi</surname>
          </string-name>
          .
          <source>Reduction Strategies in Lambda Term Normalization and their Effects on Heap Usage</source>
          .
          <year>2004</year>
          . URL: https://arxiv.org/abs/cs/0405075.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ugo</given-names>
            <surname>Dal Lago</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Simone</given-names>
            <surname>Martini</surname>
          </string-name>
          .
          <source>On Constructor Rewrite Systems and the Lambda Calculus. Logical Methods in Computer Science</source>
          ,
          <year>August 2012</year>
          ,
          <article-title>Volume 8</article-title>
          . URL: https://doi.org/10.2168/LMCS-8(
          <issue>3</issue>
          :12)
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Maciej</given-names>
            <surname>Bendkowski</surname>
          </string-name>
          .
          <source>Towards the Average-Case Analysis of Substitution Resolution in LambdaCalculus. International Conference on Formal Structures for Computation and Deduction</source>
          (
          <year>2018</year>
          ). URL: https://arxiv.org/pdf/
          <year>1812</year>
          .04452.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ugo</given-names>
            <surname>Dal</surname>
          </string-name>
          Lago and
          <string-name>
            <given-names>Simone</given-names>
            <surname>Martini</surname>
          </string-name>
          .
          <article-title>An Invariant Cost Model for the Lambda Calculus</article-title>
          .
          <year>2005</year>
          . URL: https://arxiv.org/pdf/cs/0511045.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Vladyslav</given-names>
            <surname>Shramenko</surname>
          </string-name>
          , Victoriya Kuznietcova, and
          <string-name>
            <given-names>Grygoriy</given-names>
            <surname>Zholtkevych</surname>
          </string-name>
          .
          <source>Studying Mixed Normalization Strategies of Lambda Terms. Proceedings of the 2nd International Workshop of ITprofessionals on Artificial Intelligence</source>
          , Vol.
          <volume>3348</volume>
          (
          <year>2022</year>
          ):
          <fpage>57</fpage>
          -
          <lpage>68</lpage>
          . URL: https://ceur-ws.org/Vol3348/paper5.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Oleksandr</given-names>
            <surname>Deineha</surname>
          </string-name>
          , Volodymyr Donets, and
          <string-name>
            <given-names>Grygoriy</given-names>
            <surname>Zholtkevych</surname>
          </string-name>
          .
          <article-title>On Randomization of Reduction Strategies for Typeless Lambda Calculus</article-title>
          .
          <source>Proceedings of conference ICTERY</source>
          <year>2023</year>
          , in press. URL: https://icteri.org/icteri-2023/proceedings/preview/01000021.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Tarau</surname>
          </string-name>
          .
          <article-title>On lambda-term skeletons , with applications to all-term and random-term generation of simply-typed closed lambda terms</article-title>
          .
          <year>2017</year>
          . URL: http://www.cse.unt.edu/~tarau/research/2017/repel.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Maulud</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Abdulazeez</surname>
          </string-name>
          .
          <article-title>A Review on Linear Regression Comprehensive in Machine Learning</article-title>
          .
          <source>JASTT</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>147</lpage>
          , Dec.
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .38094/jastt1457.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Chang</surname>
            <given-names>Liu</given-names>
          </string-name>
          , Zhenhua Hu,
          <string-name>
            <given-names>Yan</given-names>
            <surname>Li</surname>
          </string-name>
          , Shaojun Liu.
          <article-title>Forecasting copper prices by decision tree learning</article-title>
          .
          <source>Resources Policy</source>
          , Volume
          <volume>52</volume>
          ,
          <year>2017</year>
          , Pages
          <fpage>427</fpage>
          -
          <lpage>434</lpage>
          . ISSN 0301-
          <fpage>4207</fpage>
          . URL: https://doi.org/10.1016/j.resourpol.
          <year>2017</year>
          .
          <volume>05</volume>
          .007.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Smola</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          .
          <article-title>A tutorial on support vector regression</article-title>
          .
          <source>Statistics and Computing</source>
          <volume>14</volume>
          ,
          <fpage>199</fpage>
          -
          <lpage>222</lpage>
          (
          <year>2004</year>
          ). URL: https://doi.org/10.1023/B:STCO.
          <volume>0000035301</volume>
          .49549.88.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cover</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hart</surname>
          </string-name>
          .
          <article-title>Nearest neighbor pattern classification</article-title>
          .
          <source>In IEEE Transactions on Information Theory</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
          ,
          <year>January 1967</year>
          . doi:
          <volume>10</volume>
          .1109/TIT.
          <year>1967</year>
          .
          <volume>1053964</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Malte</given-names>
            <surname>Jahn</surname>
          </string-name>
          .
          <article-title>Artificial neural network regression models: Predicting GDP growth</article-title>
          .
          <source>HWWI Research Paper No. 185</source>
          ,
          <year>2018</year>
          . URL: https://www.econstor.eu/handle/10419/182108.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>M. M. Krell</surname>
            , and
            <given-names>Bilal</given-names>
          </string-name>
          <string-name>
            <surname>Wehbe</surname>
            .
            <given-names>A First</given-names>
          </string-name>
          <string-name>
            <surname>Step Towards Distribution Invariant Regression Metrics</surname>
          </string-name>
          .
          <year>2020</year>
          . URL: https://doi.org/10.48550/arXiv.
          <year>2009</year>
          .
          <volume>05176</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tieppo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Barddal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Nievola</surname>
          </string-name>
          .
          <article-title>Improving Data Stream Classification using Incremental Yeo-Johnson Power Transformation</article-title>
          .
          <source>2022 IEEE International Conference on Systems, Man, and Cybernetics</source>
          (SMC), Prague, Czech Republic,
          <year>2022</year>
          , pp.
          <fpage>3286</fpage>
          -
          <lpage>3292</lpage>
          , doi:10.1109/SMC53654.
          <year>2022</year>
          .
          <volume>9945302</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>