Predicting child labor in Peru: A comparison of logistic regression and
neural networks techniques

Christian Fernando Libaque-Saenz¹, Juan Lazo¹, Karla Gabriela Lopez-Yucra², Edgardo R. Bravo¹

¹ Universidad del Pacífico
Avenida Salaverry 2020, Jesús María, Lima 11, Peru

² Pontificia Universidad Católica del Perú
Av. Universitaria 1801, San Miguel, Lima 32, Peru

cf.libaques@up.edu.pe, jg.lazol@up.edu.pe, karla.lopez@pucp.pe, er.bravoo@up.edu.pe

Abstract

Child labor is a relevant problem in developing countries because it may have a negative impact on economic growth. Policy makers and government agencies need information to correctly allocate their scarce resources to deal with this problem. Although there is research attempting to identify the causes of child labor, previous studies have used only linear statistical models. Non-linear models may improve predictive capacity and thus optimize resource allocation. However, the use of these techniques in this field remains unexplored. Using data from Peru, our study compares the predictive capability of the traditional logit model with that of artificial neural networks. Our results show that neural networks could provide better predictions than the logit model. Findings suggest that geographical indicators, income levels, gender, family composition, and educational levels significantly predict child labor. Moreover, the neural network indicates the relative importance of each factor, which could be useful to prioritize strategies. As a whole, the neural network could help government agencies to tailor their strategies and allocate resources more efficiently.

1 Introduction

Child labor is a critical problem in developing countries because it could negatively affect economic growth (Hanushek, 2013). Child labor has a negative effect on human capital, which is defined as the stock of skills that the labor force possesses (Goldin, 2016). Children who work have a high probability of becoming individuals with a low stock of skills in both quantity and quality (Becker, 1962). In fact, these children usually do not dedicate their efforts to study, and some of them do not even attend school at all. In turn, this low level of human capital and the associated lack of skills have a negative impact on individuals' earnings and income (Hanushek, 2013). Therefore, as a country's human capital decreases, its economic performance declines as well.

According to the International Labor Organization (ILO), child labor in Latin America affected 12.5 million children and teenagers between 5 and 17 years old in 2014 (Lopez, 2016). Although this number had decreased from 20 million in 2010, the number of children working in dangerous activities increased from 9 million in 2010 to 9.6 million in 2014 (Lopez, 2016). In the case of Peru, the National Housing Survey (ENAHO in Spanish) shows that 21% of teenagers between 12 and 17 years old were working in 2014 (Lopez, 2016). In other words, roughly 1 out of 5 teenagers in Peru works.

Child labor can lead to gaps not only between countries but also within a country. In Peru, for example, the child labor rate in rural areas is twice as high as in urban areas (Sausa, 2016). By region, Huancavelica has the highest rate of child labor (58%), more than 10 times that of Tumbes (5%), the region with the lowest rate (Sausa, 2016). Therefore, this phenomenon could negatively impact social and economic inclusion by increasing socioeconomic differences. It is important that governments formulate adequate programs and policies to reduce child labor. It is also important that they identify the children with a high probability of becoming workers in order to allocate resources where they are needed. There are various techniques to achieve this goal: traditional techniques such as logit models, and modern techniques such as neural networks. The principal difference is that the former capture only linear effects (on the log-odds of the outcome), while the latter can also capture non-linear relationships. It is important to have a model with high predictive capability; therefore, it is necessary to compare the predictive power of the different models.

Table 1 shows a summary of the issues covered by previous research in this field. All these studies used traditional techniques; to the best of our knowledge, few studies in this field use modern techniques such as neural networks. For example, Rodrigues, Prata, and Silva (2015) applied decision trees to Brazilian data to search for patterns in the variables explaining child labor. The objective of the present study is to compare the predictive power of traditional and modern models in regard to child labor (i.e., their ability to correctly identify the children who work). We expect our results to shed light on the differences in predictive power between these models. By identifying the antecedents to child labor and the technique with the best predictive power, we will be able to provide recommendations to the Peruvian government.

Table 1: Literature review

Author                                  Topic
Rodríguez (2002)                        Impact of family factors on education
Emerson and Souza (2002)                Impact of gender on child labor
Sapelli and Torche (2004)               Determinants of school dropout
Lavado and Gallegos (2005)              Characteristics of children with high probability of leaving school
García (2006)                           Relationship between home responsibilities and work
Gunnarsson, Orazem, and Sánchez (2006)  Impact of child labor on educational performance
Alcázar (2008)                          Determinants of school dropout in rural areas
Rodríguez and Vargas (2008)             Consequences of child labor
Rodríguez and Vargas (2009)             Characteristics and nature of economic activity in child labor
Lima, Mesquita, and Wanamaker (2015)    Effect of family wealth on the utilization of child labor
Le and Homel (2015)                     Impact of child labor on educational performance
He (2016)                               Relationship between child labor and a child's academic achievement

2 Theoretical background

Classification problems such as the child labor issue can be addressed by several techniques, both parametric and non-parametric. Parametric techniques (e.g., discriminant analysis, the logit model) require the prior specification of a function (or model) that relates the independent variables ($X_i$) to the dependent variable ($Y$). In practice, this function may be grounded in theory or simply assumed. These techniques use observations of $Y$ and $X_i$ to estimate the parameters of the function. Once the parameters have been estimated, they can be used to make predictions for new cases. One disadvantage of parametric techniques is their rigid structure: the mathematical form of the function does not change, and only its parameters are estimated. Thus, these techniques may not be appropriate to represent phenomena that do not follow well-known mathematical functions.

In contrast, non-parametric techniques (e.g., artificial neural networks) do not assume a function a priori but instead approximate it from the observed data. Once the function has been approximated, it can be used to predict new cases. One relative advantage of these techniques is that they can represent complex non-linear mathematical functions. In other research areas, this flexibility has, under certain conditions, given non-parametric techniques greater predictive power than parametric ones (e.g., Abdou et al., 2008; Altman et al., 1994).

Our research compares the logit model (a parametric technique) with artificial neural networks (a non-parametric technique) in the field of child labor. Applying these models for predictive purposes involves the following steps:

• The sample is randomly divided into two subsamples.
• The parameters of the model are estimated with one of the subsamples (the training data).
• The predictive capacity of the model on the training data (number of hits over total observations) is assessed.
• With the estimated parameters, the dependent variable is predicted for the other subsample (the test data).
• The predictive capacity of the model on the test data is assessed.

2.1 Logit model

The logit model is a method that uses independent variables to estimate the probability of occurrence of a discrete outcome of the dependent variable (Lattin et al., 2003). Depending on the number of discrete outcomes, this technique can be a binary logit or a multinomial logit model (Hosmer et al., 2013; Lattin et al., 2003). The former has a dependent variable with two discrete outcomes, whereas the latter has a dependent variable with more than two discrete outcomes (Hosmer et al., 2013; Lattin et al., 2003). In both cases, the discrete outcomes of the dependent variable should be mutually exclusive (Lattin et al., 2003).

The logit model has a straightforward, closed functional form that is easily estimated using maximum likelihood methods (Lattin et al.,
2003, p. 475). The logit technique does not as-                 is applied in this design because of its “(. . . ) capa-
sume restrictions on the normality of the distribu-             bility to organize its structural constituents, known
tion of variables (Press and Wilson, 1978). Also,               as neurons, so as to perform certain computations
independent variables can be both continuous and                (e.g., pattern recognition, perception, and motor
categorical variables (Lattin et al., 2003). This               control) many times faster than the fastest digi-
technique is a special case of regression, which                tal computer in existence today” (Haykin, 1998,
uses a transformation of the discrete dependent                 p. 23). Therefore, a neural network resem-
variable. This model assumes: 1) a categorical                  bles the brain mainly in two aspects: 1) the way
dependent variable with mutually exclusive out-                 knowledge is acquired by the network from its
comes, 2) independent variables can be continuous               environment (i.e., learning process); and 2) the
or categorical, 3) independence of observations, 4)             strength of interneuron connections (i.e., synap-
absence of multicollinearity between independent                tic weights), which are used to store the acquired
variables, 5) a linear relationship between the con-            knowledge (Haykin, 1998). Accordingly, an artifi-
tinuous independent variables and the logit trans-              cial neural network is a physical cellular network
formation of the dependent variable, and 6) ab-                 that is able to acquire, store, and utilize experi-
sence of outliers.                                              ential knowledge (Zurada, 1992). A fundamental
   The logit model is defined by the following                  unit in the operation of a neural network is the neu-
function:                                                       ron. It is an information-processing unit which has
                                  !                             three basic elements: a set of synapses or connect-
                        pi                                      ing links, each one with a weight or strength of its
 Logit(pi ) = Ln                      = ↵ + XiT + "i (1)
                    1        pi                                 own; an adder for summing the input signals; and
                                                                an activation function for limiting the amplitude of
where pi is the probability that an observation                 the output of a neuron (Haykin, 1998, p. 32). The
takes a specific outcome of the dependent variable,             neurons perform simple operations, transmitting
↵ is the constant term; is the corresponding vec-               their results to neighboring processors. Hence, the
tor of the coefficients; and "i is the error term.              ability of a neural network to perform non-linear
                                                                relationships between its inputs and outputs makes
2.2 Artificial neural networks                                  it a useful technique for pattern recognition and
                                                                modeling of complex systems (Bishop, 1995).
A neural network is, in a general sense, a ma-
                                                                   According to their topology, neural networks
chine designed to model the way in which the
                                                                can be feedforward or feedback networks. In the
brain performs a particular task or function of in-
                                                                former, the mapping goes from an input to an out-
terest (Haykin, 1998). The functioning of the brain


                                                           71
put layer instantaneously since there is no delay               In the case of Peru, the minimum age for a child
between them. This type of network is character-             to be allowed to legally work is 14 years old, as
ized by its lack of feedback which implies that the          long as these activities do not harm their integrity
neural network has no explicit connection between            nor negatively impact their studies (Lopez, 2016).
layers (Zurada, 1992). In contrast, the latter has           Also, they must have the permission of their par-
a connection between the output and input layers             ents or legal guardians to engage in these activi-
(Zurada, 1992).                                              ties. In exceptional cases, children between 12 and
   Another typology of neural networks is re-                14 years old could also work as long as the work
lated to the learning paradigm which distinguishes           meets the same requirements (Lopez, 2016). In
between supervised learning and non-supervised               the present research, a child was considered to be
learning. The first implies that the knowledge of            a worker if he/she helps in the family business, in
the environment available to the teacher is trans-           domestic tasks in a house that is not his or her own,
ferred to the neural network through training as             in producing products to be sold, in agriculture ac-
fully as possible (Haykin, 1998). Also, it implies           tivities, in selling products or providing services.
an error-correction learning in which the network               According to the National Housing Survey,
parameters are adjusted under the combined in-               child labor between 6 and 13 years old in rural
fluence of the training vector (i.e., example) and           areas (67.5%) is twice as prevalent as child labor
the error signal (i.e., difference between the de-           in urban areas (32.5%). However, in the range
sired response and the actual response of the net-           from 14 to 17 years old, the values are similar
work). This adjustment is carried out step by step           (49.7% and 50.3% for rural and urban areas, re-
in order to make the neural network emulate the              spectively). Another important issue is that child
teacher (Haykin, 1998). On the other hand, the               labor rates significantly differ between cities. For
second does not consider a teacher to oversee the            example, Huancavelica is the city with the highest
learning process. In this case, there are no labeled         rate of child labor with 79.0%, followed by Puno,
examples of the function to be learned by the net-           Huanuco, and Amazonas with 69.0%, 65.0%, and
work. The learning of an input-output mapping is             64.0% respectively. Trujillo has the lowest child
performed through continued interaction with the             labor rate, at about 5.0%, which is significantly
environment or based on the optimization of its pa-          lower than the others. Not surprisingly, the cities
rameters in order to develop the ability to form in-         with the highest rates of child labor are also those
ternal representations (Haykin, 1998).                       with the lowest incomes per capita. Furthermore,
   This research uses a Multilayer Perceptron neu-           according to the National Institute of Statistics and
ral network with a back-propagation algorithm                Informatics (INEI in Spanish), economic activity
which consists of applying a family of gradient-             for females (63.3%) is considerable lower than for
based optimization methods to find the optimal               males (81.4%).
value of the weights based on minimizing the error              Based on the above paragraph, we included
norm between the desired output and the output               variables capturing: 1) age and gender; 2) type of
calculated by the neural network (Rumelhart et al.,          residence area such as urban/rural, region, stratum,
1986). In this type of network, the processing is            and schooling available; and 3) socioeconomic
performed by the inputs. The output obtained is              variables such as expenses, education of the fam-
compared to the expected output. From the ob-                ily head, type of housing, housing ownership, and
tained error, a process of adjustment of weights is          housing status (adequacy, coverage of basic needs,
applied, attempting to minimize the error.                   sanitation). In addition, following (Lopez, 2016),
                                                             we included family characteristics as potential an-
                                                             tecedents to child labor. Indeed, families where
2.3 Child labor in Peru                                      both parents work are less likely to have their chil-
                                                             dren working, while the number of children could
The concept of child labor varies from country to
                                                             increase the probability that one or more children
country depending on the cultural context. Ac-
                                                             work. In these cases, the oldest child is the one
cording to the ILO, child labor refers to a work that
                                                             with the highest probability of engaging in eco-
is dangerous and harmful to the physical, men-
                                                             nomic activities. Finally, current schooling status
tal, or moral wellness of the child, interfering with
                                                             could also be a potential factor for child labor be-
his/her education.


                                                        72
cause those children who are behind in their stud-           a better estimation of the weights of the network.
ies are potentially engaged in other activities.             Finally, we assessed the predictive power of the
                                                             model with the test subsample.
3   Research method
                                                             4     Results
3.1 Measurement model
Table 2 defines our variables and shows the mea-             4.1    Logit results
surement items used in each one.                             We conducted a preliminary analysis including
                                                             all 17 independent variables. Results show that
3.2 Data collection and analysis
                                                             only 9 variables were statistically significant (vari-
Data were collected from the Peruvian National               ables with coefficients with p-value less than 0.05)
Housing Survey (ENAHO) for the year 2014. We                 in explaining the variance of our dependent vari-
eliminated the data for the months of January,               able (WORK). The other 8 variables (p-values
February and March to eliminate seasonality. The             higher than 0.05) were not considered in the sub-
rationale is that those months are holidays in Peru-         sequent analysis given that they do not have any
vian schools and thus the probability of child labor         impact on the dependent variable. Retained vari-
is high but does not imply that children stop study-         ables are divided into 6 categorical variables: UR-
ing to carry it out. Data include children between           BAN, AREA, STRATUM, OWN, ADEQ, and
12 and 17 years old at the national level who meet           UNMET; and 3 continuous variables: EXPENSE,
the following criteria: 1) is the son/daughter of the        EDU HEAD, and SIBLINGS. We calculated the
head of the family, and 2) he/she has not yet fin-           coefficients of the model using equation (1), where
ished school.                                                pi is the probability that child i becomes a worker.
   For analysis, we used logit and neural networks              We assessed whether assumptions of logistic re-
techniques to find the antecedents to child labor            gression were met. Assumptions 1, 2, and 3 were
and to classify children according to the proba-             determined by the model and data collection. For
bility of becoming a worker. We used these two               assumption 4, we conducted a linear regression
techniques to compare predictive power because a             to obtain VIF values. All VIF values were lower
correct prediction may allow governments to cor-             than 5 (the independent variable URBAN has the
rectly allocate resources to deal with this prob-            highest VIF value at 2.274). Therefore, there is
lem. The first technique is based on linear rela-            no evidence of multicollinearity problems in our
tionships, while the latter can manage non-linear            model (Hair et al., 2011). For the fifth assump-
effects. Thus, differences in their results are ex-          tion, we used the Box and Tidwell (1962) proce-
pected. In the case of the logit model, we ran-              dure. This procedure establishes that if the inter-
domly divided the full sample into 2 subsamples:             action between an independent continuous vari-
1) a training subsample consisting of 85% of the             able and its natural logarithm transformation is
full sample, and 2) a test subsample made up of              found to be significant, this variable is not lin-
the remaining 15%. We used the training subsam-              early related to the logit of the dependent vari-
ple to calibrate the model (i.e., estimate the pa-           able. In addition, following Tabachnick and Fi-
rameters of the function), and the test subsample            dells (2007) recommendation, we used a Bon-
to assess the predictive power of these results. In          ferroni correction for the statistical significance
the case of neural networks, we randomly divided             level by dividing it by the number of independent
the sample into 3 subsamples: 1) a training sub-             variables running this test including the constant
sample (70% of the total data), 2) a validation sub-         term. This correction provided a significance level
sample (15% of the total data), and 3) a test sub-           of 0.0038 (i.e., 0.05/13, where 0.05 is the origi-
sample (15% of the total data). We used the train-           nal significance level and 13 is the sum of vari-
ing and validation subsamples together to estimate           ables including the constant term: 1 constant term,
the parameters of the model. To avoid overfitting            6 categorical independent variables, 3 continuous
and guarantee that the results of this stage could           independent variables, and 3 interaction terms).
be generalized, we validated the predictive qual-            P-values for the interaction terms were 0.688 for
ity of the model with only the validation subsam-            EDU HEAD, 0.999 for SIBLINGS, and 0.0041
ple every 1000 interactions. This process allows             for EXPENSE. Based on this assessment, all p-


                                                        73
                                Table 2: Measurement items
Variable                       Description
                                   Dependent Variable
Worker (WORK)                  1 = If the child works
                               0 = If the child exclusively studies
                            Continuous Independent Variables
Age (AGE)                      Age of the child (in years)
Education of the family head
                               Level of schooling of the head of the family (in years)
(EDU HEAD)
Younger siblings (SIBLINGS)    Number of children under 5 years old in the family
                               Ratio of the number of adults (18 years old or older) to the number
Family composition (COMPO)
                               of children (younger than 18 years old) in the family
                               Ratio of the number of education centers to the number of school-
Education centers (CENTER)
                               age children in the province of residence of the family
                               Natural logarithm of the total monthly expense per family mem-
Monthly expense (EXPENSE)
                               ber
                            Categorical Independent Variables
Maleness (MALE)                1 = If the child is male
                               0 = If the child is female
Urban (URBAN)                  1 = If the residence of the family is located in the urban area
                               0 = If the residence of the family is located in a non-urban area
Oldest child (OLD CHI)         1 = If the child is the oldest in the family
                               0 = If the child is not the oldest in the family
School backwardness (DELAY) 1 = If the child presents school backwardness
                               0 = If the child does not present school backwardness
Geographic area (AREA)         1 = North Coast
                               2 = Center Coast
                               3 = South Coast
                               4 = North Highlands
                               5 = Center Highlands
                               6 = South Highlands
                               7 = Jungle
                               8 = Lima Metropolitan Area
Geographic stratum             1 = More than 100,000 dwellings
(STRATUM)                      2 = From 20,001 to 100,000 dwellings
                               3 = From 10,001 to 20,000 dwellings
                               4 = From 4,001 to 10,000 dwellings
                               5 = From 401 to 4,000 dwellings
                               6 = 400 dwellings or fewer
                               7 = Composite rural area
                               8 = Simple rural area
Type of housing (TYPE)         1 = Independent house
                               2 = Apartment in building
                               3 = Chalet
                               4 = Neighborhood house
                               5 = Shack or cottage
                               6 = Improvised housing
                               7 = Non-housing premises
                               8 = Other


                                               74
 Housing ownership (OWN)              1 = Rented
                                      2 = Owned by the family, totally paid
                                      3 = Owned by the family, as result of squatting
                                      4 = Owned by the family, paying off a loan
                                      5 = Given by the workplace of one of the members
                                      6 = Given by other family or institution
                                      7 = Other
 Housing inadequacy (ADEQ)            1 = If the housing is inadequate
                                      0 = If the housing is adequate
 Uncovered basic needs                1 = If the housing has unmet basic needs
 (UNMET)                              0 = If the housing has not unmet basic needs
 Absence of sanitation                1 = If the house does not have sanitation
 (HYGIENIC)                           0 = If the housing has sanitation


values were above 0.0038, and thus our model satisfied the linearity assumption. For the sixth assumption, we found 4 outliers of concern, which were excluded from subsequent analyses. Results of the logistic model are presented in Table 3. Our model is statistically significant (χ² = 2300.885, df = 25, p < 0.001), and its pseudo-R² values (Cox & Snell and Nagelkerke) indicate that it explains between 33.2% and 47.8% of the variance in child labor.
   In terms of predictive value, our model correctly predicted 82.61% of cases, with 55.80% correct positive classifications (sensitivity) and 93.02% correct negative classifications (specificity). Accordingly, our model has an efficiency (the average of sensitivity and specificity) of 74.41% and a mean absolute percentage error (MAPE) of 17.39%. Although the model has adequate overall predictive power, the Hosmer and Lemeshow goodness-of-fit test was significant (χ² = 39.889, df = 8, p < 0.001), indicating a poor fit between the predicted probabilities and the observed outcomes. This may be related to the large gap between sensitivity and specificity. Finally, all reported coefficients (B) were significant according to the Wald test. Table 3 also shows the standard errors (SE) of the coefficients and their odds ratios (OR). We then assessed the model with our test subsample, in which it correctly predicted 80.64% of all cases, with a sensitivity of 52.78%, a specificity of 91.88%, an efficiency of 72.33%, and a MAPE of 19.36%.

4.2 Neural network results

For purposes of comparison, we chose a simple neural network: one hidden layer with a hyperbolic tangent sigmoid activation function and an output layer with a log-sigmoid activation function. Weights and biases were updated by gradient descent with momentum and an adaptive learning rate. The training parameters were: maximum number of epochs, 40000; learning rate, 0.01; momentum constant, 0.7; and performance goal, 10⁻⁵. These values were set following the literature (Haykin, 1998; Zurada, 1992) and were also adjusted during training by an adaptive algorithm to find better parameters.
   The first neural network used the 17 proposed independent variables as inputs. Using the training and validation subsamples, the best network had 38 neurons in the hidden layer and 1 neuron in the output layer. This model correctly predicted 88.26% of all cases, with a sensitivity of 90.97% and a specificity of 87.21%. Its efficiency was thus 89.09%, and its MAPE was 11.74%. When applied to the test subsample, it correctly predicted 85.11% of all cases, with a sensitivity of 90.02%, a specificity of 79.42%, an efficiency of 84.72%, and a MAPE of 14.89%.
   In addition, by analyzing the weights of the network's inputs, we ranked the independent variables from the highest to the lowest effect: AREA (7.7), EXPENSE (7.4), HYGIENIC (7.4), STRATUM (7.0), MALE (6.2), OWN (6.0), SIBLINGS (5.9), TYPE (5.9), OLD CHI (5.9), AGE (5.7), EDU HEAD (5.5), COMPO (5.5), ADEQ (5.1), DELAY (5.0), CENTER (4.9), URBAN (4.6), and UNMET (4.4).
   The second neural network used only the 9 variables that were statistically significant in the logit model, to allow a direct comparison. Using the training and validation subsamples, the best network had 30 neurons in the hidden layer and 1 neuron in the output layer. It correctly predicted 84.45% of all cases, with a sensitivity of 79.61%, a specificity of 86.34%, an efficiency of 82.97%, and a MAPE of 15.55%. When applied to the test subsample, it correctly predicted 81.69% of all cases, with a sensitivity of 78.86%, a specificity of 84.23%, an efficiency of 81.55%, and a MAPE of 18.31%.
   For this model, the ranking of the inputs according to their weights is: AREA (12.9), STRATUM (12.6), EDU HEAD (12.4), SIBLINGS (12.3), URBAN (10.9), ADEQ (10.6), UNMET (9.7), OWN (9.5), and EXPENSE (9.1).

4.3 Technique comparison

The results of the previous sections are summarized in Table 4. The logit model used 9 variables. The other 8 were dropped because they had no significant effect on the dependent variable. To ensure a fair comparison, the neural network was trained on the same 9 variables and the same instances. Table 4 also shows the results of the neural network with all 17 variables. This assesses whether the non-linear model could extract useful information from the 8 variables that had no significant linear effect on the dependent variable. Overall, the neural networks performed better than the logit model. Both achieved higher accuracy (the share of correct positive and negative predictions), and the 17-variable network achieved the highest accuracy of all (85.11%). Moreover, predicting that a child is likely to become a worker matters more than predicting that a child will not work. Sensitivity is therefore our most important metric when comparing models. As Table 4 shows, both neural networks were far more sensitive (78.86% and 90.02%) than the logit model (52.78%). The logit model, on the other hand, had the highest specificity (91.88%). However, specificity measures correct predictions of non-workers, which is not our main concern. Finally, Figure 1 shows the ROC curves of these techniques.

                 Table 3: Logistic regression with training sample

                           Model 1 (N=700)
Variables             B      SE      Wald           OR
URBAN             3.649   0.441    68.563***    38.455
EDU HEAD         -0.056    0.01    28.969***     0.946
SIBLINGS          0.181   0.081     4.978*       1.198
AREA                 SS      SS   360.946***        SS
STRATUM              SS      SS    71.179***        SS
OWN                  SS      SS    13.571*          SS
ADEQ              0.644   0.146    19.408***     1.905
UNMET            -0.366   0.117     9.721**      0.693
EXPENSE          -0.262   0.084     9.707**      0.769
Constant         -1.423   0.663     4.604
-2 log likelihood               4459.518
Chi-square (model)              2300.885*** (df = 25, p < 0.001)
Hosmer & Lemeshow               39.889*** (df = 8, p < 0.001)
Cox & Snell R²                  33.20%
Nagelkerke R²                   47.80%
Overall predicted %             82.61%
Sensitivity                     55.80%
Specificity                     93.02%
  *p < 0.05, **p < 0.01, ***p < 0.001
  B = coefficient; SE = standard error; OR = odds ratio.
  SS = skipped for simplicity. (For categorical variables with more than 2 categories, there is a coefficient for each category. We chose not to report them all because our focus is the predictive power of the model.)
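The odds ratios and Wald statistics in Table 3 follow from the coefficients and standard errors. For a logistic regression coefficient B with standard error SE, the odds ratio is OR = exp(B) and the 1-df Wald statistic is (B/SE)². The Python check below recomputes both for the rows that report all four values; the SS rows are omitted. `TABLE3` and `recompute` are illustrative names, not part of the authors' analysis. Small discrepancies are expected because B and SE are printed to three decimals. For EDU HEAD, whose SE is printed as 0.01, the recomputed Wald value (31.36) differs noticeably from the reported 28.969, most likely for that reason.

```python
import math

# (B, SE, Wald, OR) as printed in Table 3; rows marked SS are omitted.
TABLE3 = {
    "URBAN": (3.649, 0.441, 68.563, 38.455),
    "EDU HEAD": (-0.056, 0.01, 28.969, 0.946),
    "SIBLINGS": (0.181, 0.081, 4.978, 1.198),
    "ADEQ": (0.644, 0.146, 19.408, 1.905),
    "UNMET": (-0.366, 0.117, 9.721, 0.693),
    "EXPENSE": (-0.262, 0.084, 9.707, 0.769),
}


def recompute(b, se):
    """Return the odds ratio exp(B) and the 1-df Wald statistic (B / SE)^2."""
    return math.exp(b), (b / se) ** 2


for name, (b, se, wald, odds) in TABLE3.items():
    or_hat, wald_hat = recompute(b, se)
    print(f"{name:<9} exp(B) = {or_hat:6.3f} (reported {odds})   "
          f"(B/SE)^2 = {wald_hat:6.2f} (reported {wald})")
```

Every recomputed odds ratio falls within 1% of the printed one. The Wald values also agree within 1%, except for EDU HEAD.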


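Table 4 compares five measures. Accuracy, sensitivity and specificity come from the confusion matrix, with label 1 taken to mean "child worker". Efficiency is the mean of sensitivity and specificity, as defined in Section 4.1. The paper labels the fifth measure MAPE. In every column of Table 4, and for the training-sample figures in Sections 4.1 and 4.2, it equals 100 minus accuracy. That is what averaging |y − ŷ| over 0/1 labels and 0/1 predictions gives. The Python sketch below assumes that definition. `Metrics` and `classification_metrics` are hypothetical names. The sketch also checks that the reported Table 4 values are consistent with both relationships up to rounding.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float      # % of all cases classified correctly
    sensitivity: float   # % of child workers (label 1) predicted as workers
    specificity: float   # % of non-workers (label 0) predicted as non-workers
    efficiency: float    # mean of sensitivity and specificity
    mape: float          # assumed: mean |y - y_hat| over 0/1 labels, in %


def classification_metrics(y_true, y_pred):
    """Table 4 measures for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    n = len(pairs)
    sens = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    return Metrics(
        accuracy=100.0 * (tp + tn) / n,
        sensitivity=sens,
        specificity=spec,
        efficiency=(sens + spec) / 2,
        mape=100.0 * sum(abs(t - p) for t, p in pairs) / n,
    )


# Reported test-subsample values: accuracy, sensitivity, specificity, efficiency, MAPE.
TABLE4 = {
    "Logit, 9 variables": (80.64, 52.78, 91.88, 72.33, 19.36),
    "Neural network, 9 variables": (81.69, 78.86, 84.23, 81.55, 18.31),
    "Neural network, 17 variables": (85.11, 90.02, 79.42, 84.72, 14.89),
}
for name, (acc, sens, spec, eff, mape) in TABLE4.items():
    assert abs((sens + spec) / 2 - eff) < 0.01, name   # efficiency definition
    assert abs((100 - acc) - mape) < 0.01, name        # MAPE = 100 - accuracy

# Toy labels and predictions (not from the paper):
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
# accuracy=60.0, sensitivity≈66.67, specificity=50.0, efficiency≈58.33, mape=40.0
```

With hard 0/1 predictions, this MAPE carries no information beyond accuracy. The comparison therefore rests mainly on sensitivity and specificity.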
Table 4: Comparison of predictive capacity in the test subsample
Predictive measures   Logit           Neural network   Neural network
                      (9 variables)   (9 variables)    (17 variables)
Accuracy              80.64%          81.69%           85.11%
Sensitivity           52.78%          78.86%           90.02%
Specificity           91.88%          84.23%           79.42%
Efficiency            72.33%          81.55%           84.72%
MAPE                  19.36%          18.31%           14.89%
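The neural-network columns of Table 4 come from networks with one hyperbolic-tangent hidden layer and a log-sigmoid output unit. Section 4.2 gives the training settings: gradient descent with momentum (0.7), an adaptive learning rate starting at 0.01, up to 40000 epochs, and a performance goal of 10⁻⁵. The sketch below is a minimal pure-Python reconstruction, not the authors' code. It rests on several assumptions the paper does not state:

- mean squared error as the performance measure;
- full-batch updates;
- uniform random initialization;
- a learning-rate rule that multiplies the rate by 1.05 after an improving epoch, and undoes the step and multiplies it by 0.7 when the error grows by more than 4%.

Section 4.2 also ranks inputs "by analyzing the weight of the inputs" without naming a method. Both reported rankings sum to about 100, which is consistent with a percentage-normalized method such as Garson's algorithm. `garson_importance` implements that algorithm as one plausible choice. `forward`, `loss_and_grad` and `train` are likewise hypothetical names, and the toy data are synthetic.

```python
import math
import random


def logsig(z):
    """Log-sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(p, x, d, H):
    """Flat parameter layout (an assumption of this sketch):
    W1[j][i] = p[j*d + i], b1[j] = p[H*d + j], w2[j] = p[H*d + H + j], b2 = p[H*d + 2*H]."""
    h = [math.tanh(sum(p[j * d + i] * x[i] for i in range(d)) + p[H * d + j]) for j in range(H)]
    z = sum(p[H * d + H + j] * h[j] for j in range(H)) + p[H * d + 2 * H]
    return h, logsig(z)


def loss_and_grad(p, X, y, d, H):
    """Mean squared error over the batch and its gradient by backpropagation."""
    n = len(X)
    g = [0.0] * len(p)
    loss = 0.0
    for x, t in zip(X, y):
        h, out = forward(p, x, d, H)
        err = out - t
        loss += err * err / n
        delta = 2.0 * err * out * (1.0 - out) / n             # dLoss/dz at the output
        g[H * d + 2 * H] += delta
        for j in range(H):
            g[H * d + H + j] += delta * h[j]
            dh = delta * p[H * d + H + j] * (1.0 - h[j] ** 2)  # back through tanh
            g[H * d + j] += dh
            for i in range(d):
                g[j * d + i] += dh * x[i]
    return loss, g


def train(X, y, H, lr=0.01, mc=0.7, epochs=40000, goal=1e-5, seed=0):
    """Full-batch gradient descent with momentum and an adaptive learning rate."""
    d = len(X[0])
    rng = random.Random(seed)
    p = [rng.uniform(-0.5, 0.5) for _ in range(H * d + 2 * H + 1)]
    v = [0.0] * len(p)                    # momentum (velocity) terms
    saved, prev = p[:], float("inf")
    for _ in range(epochs):
        loss, g = loss_and_grad(p, X, y, d, H)
        if loss <= goal:                  # performance goal reached
            break
        if loss > 1.04 * prev:            # error grew too much: undo step, shrink rate
            p, v, lr = saved[:], [0.0] * len(p), lr * 0.7
            continue
        if loss < prev:                   # error fell: grow rate
            lr *= 1.05
        saved, prev = p[:], loss
        v = [mc * vk - lr * gk for vk, gk in zip(v, g)]
        p = [pk + vk for pk, vk in zip(p, v)]
    return p, loss_and_grad(p, X, y, d, H)[0]


def garson_importance(p, d, H):
    """Garson's algorithm: relative importance (%) of each input for one output unit."""
    share = [0.0] * d
    for j in range(H):
        c = [abs(p[j * d + i]) * abs(p[H * d + H + j]) for i in range(d)]
        total = sum(c)
        for i in range(d):
            share[i] += c[i] / total
    s = sum(share)
    return [100.0 * v / s for v in share]


# Synthetic toy data: the label depends only on the sign of the first input.
X = [[-1.0, 0.3], [-0.8, -0.5], [-0.6, 0.9], [0.6, -0.2], [0.9, 0.4], [1.0, -0.7]]
y = [0, 0, 0, 1, 1, 1]
params, mse = train(X, y, H=4, epochs=2000)
print(round(mse, 5), [round(v, 1) for v in garson_importance(params, d=2, H=4)])
```

The usage line shortens training to 2000 epochs because this pure-Python loop is slow. A numerical library would be the practical choice for the actual networks (17 inputs with 38 hidden neurons, and 9 inputs with 30).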


Figure 1: Comparison of ROC curves
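An ROC curve plots sensitivity (the true-positive rate) against 1 − specificity (the false-positive rate). It is traced by sweeping the threshold applied to the predicted probability of child labor from high to low. The area under the curve (AUC) summarizes it in a single number. The paper does not report AUC values or the threshold used to produce the 0/1 predictions of Table 4. The Python sketch below therefore only illustrates the computation on made-up scores. `roc_points` and `auc` are hypothetical helper names. Tied scores are grouped so that they cross the threshold together.

```python
def roc_points(y_true, scores):
    """(false-positive rate, true-positive rate) points, threshold swept high to low.
    Label 1 = child worker; scores are predicted probabilities of label 1."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda st: st[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # examples with equal scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy labels and scores (not from the paper).
pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

A curve that rises steeply near the origin means high sensitivity at a low false-positive rate. This is the region that matters most when, as here, sensitivity is the priority.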


5 Discussion

Overall, the results show that the neural network technique outperforms the logit model in predicting child labor, particularly in terms of sensitivity. This phenomenon may have a more complex structure than the logit model assumes. Consequently, the neural network, which can represent non-linear relationships, could capture sources of variation that the logit technique does not identify. Policy makers and government agencies could use an accurate prediction of this phenomenon to design adequate strategies and to invest their scarce resources efficiently in dealing with this problem.
   Our findings also show that the neural network model with 17 variables performed better than the 9-variable models (logit and neural network). This result suggests that the additional 8 variables capture important variation in child labor. In other words, the 17-variable neural network does not ignore information that is relevant to the prediction. Decision makers could use this result to avoid discarding relevant factors when dealing with this phenomenon.
   Another important result is that the neural network model identifies geographical indicators, income levels, gender, family composition, and educational levels as important predictors of child labor. These results are consistent with those of the logit model, which show that stratum, geographic area, and housing conditions have a significant impact on our dependent variable. These results can be used to determine the relevance of each factor. In turn, this relevance-based ranking could help government agencies better allocate their resources and implement strategies to reduce child labor.
   Finally, previous studies in this field have used linear statistical models to predict child labor. Our study shows that computational intelligence techniques, such as neural networks, could provide better predictions and thus support better decision making.

References

Hussein Abdou, John Pointon, and Ahmed El-Masry. 2008. Neural nets versus conventional techniques in credit scoring in Egyptian banking. Expert Systems with Applications 35(3):1275–1292.

Lorena Alcázar. 2008. Asistencia y deserción en escuelas secundarias rurales del Perú, pages 41–82.

Edward I. Altman, Giancarlo Marco, and Franco Varetto. 1994. Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). Journal of Banking & Finance 18(3):505–529.

Gary Becker. 1962. Investment in Human Capital: A Theoretical Analysis. Journal of Political Economy 70.

Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.

George E. P. Box and Paul W. Tidwell. 1962. Transformation of the Independent Variables. Technometrics 4(4):531–550. https://doi.org/10.1080/00401706.1962.10490038.

Patrick M. Emerson and André Portela Souza. 2002. Bargaining over sons and daughters: Child labor, school attendance and intra-household gender bias in Brazil.

Luis García. 2006. The supply of child labor and household work. MPRA Paper 31402, University Library of Munich, Germany.

Claudia Goldin. 2016. Human Capital. Springer Verlag, Heidelberg, Germany.

Victoria Gunnarsson, Peter F. Orazem, and Mario A. Sánchez. 2006. Child labor and school achievement in Latin America.

Joe F. Hair, Christian M. Ringle, and Marko Sarstedt. 2011. PLS-SEM: Indeed a Silver Bullet. Journal of Marketing Theory and Practice 19(2):139–152. https://doi.org/10.2753/MTP1069-6679190202.

Eric A. Hanushek. 2013. Economic growth in developing countries: The role of human capital. Economics of Education Review 37:204–212.

Simon Haykin. 1998. Neural Networks: A Comprehensive Foundation. 2nd edition. Prentice Hall PTR, Upper Saddle River, NJ, USA.

Huajing He. 2016. Child labour and academic achievement: Evidence from Gansu Province in China. China Economic Review 38(C):130–150.

D. W. Hosmer, S. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. Wiley Series in Probability and Statistics. Wiley.

James M. Lattin, J. D. Carroll, and P. E. Green. 2003. Analyzing Multivariate Data. Thomson Brooks/Cole.

Pablo Lavado and José Gallegos. 2005. La dinámica de la deserción escolar en el Perú: un enfoque usando modelos de duración. Working Papers 05-08, Departamento de Economía, Universidad del Pacífico. https://ideas.repec.org/p/pai/wpaper/05-08.html.

Huong Thu Le and Ross Homel. 2015. The impact of child labor on children's educational performance: Evidence from rural Vietnam. Journal of Asian Economics 36:1–13.

Luiz Renato Lima, Shirley Mesquita, and Marianne Wanamaker. 2015. Child labor and the wealth paradox: The role of altruistic parents. Economics Letters 130:80–82.

Karla Lopez. 2016. Determinantes del trabajo infantil y la deserción escolar en menores de 12 a 17 años en el Perú para los años 2006 y 2014. Bachelor's degree, PUCP, Perú.

S. James Press and Sandra Wilson. 1978. Choosing between Logistic Regression and Discriminant Analysis. Journal of the American Statistical Association 73(364):699–705.

Diego C. Rodrigues, David N. Prata, and Michel A. Silva. 2015. Exploring social data to understand child labor. International Journal of Social Science and Humanity 5(1):29.

Jose Rodriguez. 2002. Adquisición de educación escolar básica en el Perú: uso del tiempo de los menores en edad escolar. Departamento de Economía, Pontificia Universidad Católica del Perú. http://EconPapers.repec.org/RePEc:pcp:pucotr:otr-2002-02.

José Rodríguez and Silvana Vargas. 2009. Trabajo infantil en el Perú. Magnitud y perfiles vulnerables. Informe Nacional 2007-2008. Departamento de Economía, Pontificia Universidad Católica del Perú.

José Rodríguez and Silvia Vargas. 2008. Escolaridad y trabajo infantil: patrones y determinantes de la asignación del tiempo de niños y adolescentes en Lima Metropolitana. Technical report, PUCP, Peru.

David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, MIT Press, pages 318–362.

Claudio Sapelli and Arístides Torche. 2004. Deserción Escolar y Trabajo Juvenil: ¿Dos Caras de Una Misma Decisión? Cuadernos de Economía, pages 173–198.

Mariela Sausa. 2016. El trabajo infantil es más alto y más penoso en las zonas rurales. Perú 21.

B. G. Tabachnick and L. S. Fidell. 2007. Multivariate analysis of variance and covariance. In Using Multivariate Statistics, Allyn and Bacon, Boston.

Jacek M. Zurada. 1992. Introduction to Artificial Neural Systems. West, St. Paul.