Predicting child labor in Peru: A comparison of logistic regression and neural networks techniques

Christian Fernando Libaque-Saenz¹, Juan Lazo¹, Karla Gabriela Lopez-Yucra², Edgardo R. Bravo¹
¹ Universidad del Pacífico, Avenida Salaverry 2020, Jesús María, Lima 11, Peru
² Pontificia Universidad Católica del Perú, Av. Universitaria 1801, San Miguel, Lima 32, Peru
cf.libaques@up.edu.pe, jg.lazol@up.edu.pe, karla.lopez@pucp.pe, er.bravoo@up.edu.pe

Abstract

Child labor is a relevant problem in developing countries because it may have a negative impact on economic growth. Policy makers and government agencies need information to correctly allocate their scarce resources to deal with this problem. Although there is research attempting to identify the causes of child labor, previous studies have used only linear statistical models. Non-linear models may improve predictive capacity and thus optimize resource allocation. However, the use of these techniques in this field remains unexplored. Using data from Peru, our study compares the prediction capability of the traditional logit model with that of artificial neural networks. Our results show that neural networks could provide better predictions than the logit model. Findings suggest that geographical indicators, income levels, gender, family composition, and educational levels significantly predict child labor. Moreover, the neural network suggests the relevance of each factor, which could be useful to prioritize strategies. As a whole, the neural network could help government agencies to tailor their strategies and allocate resources more efficiently.

1 Introduction

Child labor is a critical problem in developing countries because it could negatively affect economic growth (Hanushek, 2013). Child labor has a negative effect on human capital, which is defined as the stock of skills that the labor force possesses (Goldin, 2016). Children who work have a high probability of becoming individuals with a low stock of skills in both quantity and quality (Becker, 1962). In fact, children who work usually do not dedicate their efforts to study, and sometimes they do not even attend school at all. In turn, this low level of human capital and the associated lack of skills have a negative impact on individuals' earnings and income (Hanushek, 2013). Therefore, as a country's human capital decreases, its economy suffers as well.

According to the International Labor Organization (ILO), in Latin America this phenomenon reached 12.5 million children and teenagers between 5 and 17 years old in 2014 (Lopez, 2016). Although this number has decreased from 20 million in 2010, the number of children working in dangerous activities increased from 9 million in 2010 to 9.6 million in 2014 (Lopez, 2016). In the case of Peru, the National Household Survey (ENAHO, by its Spanish acronym) shows that 21% of teenagers between 12 and 17 years old had been working in 2014 (Lopez, 2016). In other words, 1 out of 5 teenagers works in Peru.

Child labor can lead to gaps not only among countries but also within a country. In Peru, for example, the child labor rate in rural areas is twice as high as in urban areas (Sausa, 2016). By region, Huancavelica presents the highest rate of child labor (58%), which is more than 10 times that of Tumbes (5%), the region with the lowest rate of child labor (Sausa, 2016). Therefore, this phenomenon could negatively impact social and economic inclusion by increasing socioeconomic differences. It is important that governments formulate adequate programs and policies to reduce child labor. It is also important that they identify those children with a high probability of becoming workers in order to allocate resources where they are needed.

There are various techniques to achieve this goal, ranging from traditional techniques such as logit models to modern techniques such as neural networks. The principal difference is that the former capture linear effects (on the log-odds of the outcome), while the latter can capture non-linear relationships. Because a model with high predictive capability is needed, it is necessary to compare the predictive power of the different models.

Table 1 summarizes the issues covered by previous research in this field. All these studies used traditional techniques; to the best of our knowledge, few studies in this field use modern techniques such as neural networks. One exception is Rodrigues, Prata, and Silva (2015), who used data from Brazil and decision trees to search for patterns in the variables explaining child labor. The objective of the present study is to compare the predictive power of traditional and modern models with regard to child labor (i.e., correctly identifying those children who work). We expect our results to shed light on the differences between models in terms of predictive power. By identifying the antecedents to child labor and the technique with the best predictive power, we will be able to provide recommendations to the Peruvian government.

Table 1: Literature review
Author | Topic
Rodríguez (2002) | Impact of family factors on education
Emerson and Souza (2002) | Impact of gender on child labor
Sapelli and Torche (2004) | Determinants of school dropout
Lavado and Gallegos (2005) | Characteristics of children with a high probability of leaving school
García (2006) | Relationship between home responsibilities and work
Gunnarsson, Orazem, and Sánchez (2006) | Impact of child labor on educational performance
Alcázar (2008) | Determinants of school dropout in rural areas
Rodríguez and Vargas (2008) | Consequences of child labor
Rodríguez and Vargas (2009) | Characteristics and nature of economic activity in child labor
Lima, Mesquita, and Wanamaker (2015) | Effect of family wealth on the utilization of child labor
Le and Homel (2015) | Impact of child labor on educational performance
He (2016) | Relationship between child labor and a child's academic achievement

2 Theoretical background

Classification problems such as the child labor issue can be addressed by several techniques, both parametric and non-parametric. Parametric techniques (e.g., discriminant analysis, the logit model) require the prior specification of a function (or model) that relates the independent variables ($X_i$) to the dependent variable ($Y$). In practical terms, this function may be grounded in theory or simply assumed. These techniques use observations of $Y$ and $X_i$ to estimate the parameters of the function. Once the parameters have been estimated, they can be used for prediction with new participants. One disadvantage of parametric techniques is their rigid structure: the mathematical function does not change, and only its parameters are estimated. Thus, these techniques may not be appropriate to represent phenomena that do not follow well-known mathematical functions.

In contrast, non-parametric techniques (e.g., artificial neural networks) do not assume a function a priori but instead approximate the function based on observation. Once the function has been approximated, it can be used to predict new cases. One relative advantage of these techniques is that they can represent complex non-linear mathematical functions. In other research arenas, this flexibility has, under certain conditions, given non-parametric techniques greater predictive power than parametric techniques (e.g., Abdou et al., 2008; Altman et al., 1994).

Our research compares the logit model (a parametric technique) with artificial neural networks (a non-parametric technique) in the field of child labor. The application of these models for predictive purposes involves the following steps:

• The sample is randomly divided into two subsamples.
• The parameters of the model are estimated with one of the subsamples (the training data).
• The predictive capacity of the model (number of hits over total observations) is assessed on the training data.
• With these estimated parameters, the dependent variable is predicted for the other subsample.
• The predictive capacity of the model is assessed with these test data.

2.1 Logit model

The logit model is a method that uses independent variables to estimate the probability of occurrence of a discrete outcome in the dependent variable (Lattin et al., 2003). According to the number of discrete outcomes, this technique can be divided into binary logit and multinomial logit models (Hosmer et al., 2013; Lattin et al., 2003). The former defines a dependent variable with two discrete outcomes, whereas the latter represents a logit model with more than two discrete outcomes for the dependent variable (Hosmer et al., 2013; Lattin et al., 2003). In both cases, the discrete outcomes of the dependent variable should be mutually exclusive (Lattin et al., 2003).

The logit model has a straightforward, closed functional form that is easily estimated using maximum likelihood methods (Lattin et al., 2003, p. 475). The logit technique does not require the variables to be normally distributed (Press and Wilson, 1978). Also, independent variables can be both continuous and categorical (Lattin et al., 2003). This technique is a special case of regression that models a logit transformation of the probability of the discrete dependent variable. This model assumes: 1) a categorical dependent variable with mutually exclusive outcomes, 2) independent variables that can be continuous or categorical, 3) independence of observations, 4) absence of multicollinearity between independent variables, 5) a linear relationship between the continuous independent variables and the logit transformation of the dependent variable, and 6) absence of outliers.

The logit model is defined by the following function:

$$\mathrm{Logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + X_i^{T}\beta + \varepsilon_i \qquad (1)$$

where $p_i$ is the probability that an observation takes a specific outcome of the dependent variable, $\alpha$ is the constant term, $\beta$ is the corresponding vector of coefficients, and $\varepsilon_i$ is the error term.

2.2 Artificial neural networks

A neural network is, in a general sense, a machine designed to model the way in which the brain performs a particular task or function of interest (Haykin, 1998). The functioning of the brain is applied in this design because of its "(...) capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today" (Haykin, 1998, p. 23). Therefore, a neural network resembles the brain mainly in two aspects: 1) knowledge is acquired by the network from its environment through a learning process; and 2) interneuron connection strengths (i.e., synaptic weights) are used to store the acquired knowledge (Haykin, 1998). Accordingly, an artificial neural network is a physical cellular network that is able to acquire, store, and utilize experiential knowledge (Zurada, 1992).

A fundamental unit in the operation of a neural network is the neuron. It is an information-processing unit with three basic elements: a set of synapses or connecting links, each one with a weight or strength of its own; an adder for summing the input signals; and an activation function for limiting the amplitude of the output of a neuron (Haykin, 1998, p. 32). The neurons perform simple operations, transmitting their results to neighboring processors. Hence, the ability of a neural network to represent non-linear relationships between its inputs and outputs makes it a useful technique for pattern recognition and modeling of complex systems (Bishop, 1995).

According to their topology, neural networks can be feedforward or feedback networks. In the former, the mapping goes from an input to an output layer instantaneously, since there is no delay between them. This type of network is characterized by its lack of feedback, which implies that there are no connections feeding a layer's output back to an earlier layer (Zurada, 1992). In contrast, the latter has a connection between the output and input layers (Zurada, 1992).

Another typology of neural networks is related to the learning paradigm, which distinguishes between supervised and unsupervised learning. The first implies that the knowledge of the environment available to a teacher is transferred to the neural network through training as fully as possible (Haykin, 1998). It also implies error-correction learning, in which the network parameters are adjusted under the combined influence of the training vector (i.e., example) and the error signal (i.e., the difference between the desired response and the actual response of the network). This adjustment is carried out step by step in order to make the neural network emulate the teacher (Haykin, 1998). The second, on the other hand, does not involve a teacher overseeing the learning process. In this case, there are no labeled examples of the function to be learned by the network. The learning of an input-output mapping is performed through continued interaction with the environment or by optimizing the network's parameters so that it develops the ability to form internal representations (Haykin, 1998).

This research uses a Multilayer Perceptron neural network with a back-propagation algorithm, which applies a family of gradient-based optimization methods to find the optimal values of the weights by minimizing the error norm between the desired output and the output calculated by the neural network (Rumelhart et al., 1986). In this type of network, the inputs are propagated forward through the layers, and the output obtained is compared to the expected output. Based on the resulting error, the weights are adjusted in an attempt to minimize that error.

2.3 Child labor in Peru

The concept of child labor varies from country to country depending on the cultural context. According to the ILO, child labor refers to work that is dangerous and harmful to the physical, mental, or moral well-being of the child, interfering with his/her education. In the case of Peru, the minimum age for a child to be allowed to work legally is 14 years old, as long as these activities neither harm their integrity nor negatively impact their studies (Lopez, 2016). Also, they must have the permission of their parents or legal guardians to engage in these activities. In exceptional cases, children between 12 and 14 years old may also work, as long as the work meets the same requirements (Lopez, 2016). In the present research, a child was considered to be a worker if he/she helps in the family business, performs domestic tasks in a house that is not his or her own, or takes part in producing products to be sold, in agricultural activities, or in selling products or providing services.

According to the National Household Survey, child labor among children between 6 and 13 years old in rural areas (67.5%) is twice as prevalent as in urban areas (32.5%). However, in the range from 14 to 17 years old, the values are similar (49.7% and 50.3% for rural and urban areas, respectively). Another important issue is that child labor rates differ significantly between cities. For example, Huancavelica is the city with the highest rate of child labor, at 79.0%, followed by Puno, Huanuco, and Amazonas, with 69.0%, 65.0%, and 64.0%, respectively. Trujillo has the lowest child labor rate, at about 5.0%, which is significantly lower than the others. Not surprisingly, the cities with the highest rates of child labor are also those with the lowest incomes per capita. Furthermore, according to the National Institute of Statistics and Informatics (INEI, by its Spanish acronym), economic activity for females (63.3%) is considerably lower than for males (81.4%).

Based on the above, we included variables capturing: 1) age and gender; 2) type of residence area, such as urban/rural, region, stratum, and schooling available; and 3) socioeconomic characteristics, such as expenses, education of the family head, type of housing, housing ownership, and housing status (adequacy, coverage of basic needs, sanitation). In addition, following Lopez (2016), we included family characteristics as potential antecedents to child labor. Indeed, families where both parents work are less likely to have their children working, while the number of children could increase the probability that one or more children work. In these cases, the oldest child is the one with the highest probability of engaging in economic activities. Finally, current schooling status could also be a potential factor for child labor, because those children who are behind in their studies are potentially engaged in other activities.

3 Research method

3.1 Measurement model

Table 2 defines our variables and shows the measurement items used for each one.

Table 2: Measurement items
Variable | Description
Dependent variable
Worker (WORK) | 1 = The child works; 0 = The child exclusively studies
Continuous independent variables
Age (AGE) | Age of the child (in years)
Education of the family head (EDU_HEAD) | Level of schooling of the head of the family (in years)
Younger siblings (SIBLINGS) | Number of children under 5 years old in the family
Family composition (COMPO) | Ratio of the number of adults (18 years old or older) to the number of children (younger than 18 years old) in the family
Education centers (CENTER) | Ratio of the number of education centers to the number of school-age children in the province of residence of the family
Monthly expense (EXPENSE) | Natural logarithm of the total monthly expense per family member
Categorical independent variables
Male (MALE) | 1 = The child is male; 0 = The child is female
Urban (URBAN) | 1 = The family's residence is located in an urban area; 0 = The family's residence is located in a non-urban area
Oldest child (OLD_CHI) | 1 = The child is the oldest in the family; 0 = The child is not the oldest in the family
School backwardness (DELAY) | 1 = The child is behind in school; 0 = The child is not behind in school
Geographic area (AREA) | 1 = North Coast; 2 = Center Coast; 3 = South Coast; 4 = North Highlands; 5 = Center Highlands; 6 = South Highlands; 7 = Jungle; 8 = Lima Metropolitan Area
Geographic stratum (STRATUM) | 1 = More than 100,000 dwellings; 2 = From 20,001 to 100,000 dwellings; 3 = From 10,001 to 20,000 dwellings; 4 = From 4,001 to 10,000 dwellings; 5 = From 401 to 4,000 dwellings; 6 = 400 dwellings or fewer; 7 = Composite rural area; 8 = Simple rural area
Type of housing (TYPE) | 1 = Independent house; 2 = Apartment in building; 3 = Chalet; 4 = Neighborhood house; 5 = Shack or cottage; 6 = Improvised housing; 7 = Non-housing premises; 8 = Other
Housing ownership (OWN) | 1 = Rented; 2 = Owned by the family, fully paid; 3 = Owned by the family, as a result of squatting; 4 = Owned by the family, paying off a loan; 5 = Provided by the workplace of one of the members; 6 = Provided by another family or institution; 7 = Other
Housing inadequacy (ADEQ) | 1 = The housing is inadequate; 0 = The housing is adequate
Unmet basic needs (UNMET) | 1 = The housing has unmet basic needs; 0 = The housing has no unmet basic needs
Absence of sanitation (HYGIENIC) | 1 = The house does not have sanitation; 0 = The house has sanitation

3.2 Data collection and analysis

Data were collected from the Peruvian National Household Survey (ENAHO) for the year 2014. We excluded the data for the months of January, February, and March to eliminate seasonality. The rationale is that those months are school holidays in Peru, so the probability of child labor is high but does not imply that children stop studying to carry it out. Data include children between 12 and 17 years old at the national level who meet the following criteria: 1) he/she is the son/daughter of the head of the family, and 2) he/she has not yet finished school.

For analysis, we used logit and neural network techniques to find the antecedents to child labor and to classify children according to their probability of becoming workers. We used these two techniques to compare predictive power because a correct prediction may allow governments to allocate resources correctly to deal with this problem. The first technique is based on linear relationships, while the second can manage non-linear effects. Thus, differences in their results are expected.

In the case of the logit model, we randomly divided the full sample into 2 subsamples: 1) a training subsample consisting of 85% of the full sample, and 2) a test subsample made up of the remaining 15%. We used the training subsample to calibrate the model (i.e., estimate the parameters of the function) and the test subsample to assess the predictive power of these results. In the case of neural networks, we randomly divided the sample into 3 subsamples: 1) a training subsample (70% of the total data), 2) a validation subsample (15% of the total data), and 3) a test subsample (15% of the total data). We used the training and validation subsamples together to estimate the parameters of the model. To avoid overfitting and ensure that the results of this stage could be generalized, we validated the predictive quality of the model with only the validation subsample every 1000 iterations. This process allows a better estimation of the weights of the network. Finally, we assessed the predictive power of the model with the test subsample.

4 Results

4.1 Logit results

We conducted a preliminary analysis including all 17 independent variables. Results show that only 9 variables were statistically significant (coefficients with p-values below 0.05) in explaining the variance of our dependent variable (WORK). The other 8 variables (p-values above 0.05) were not considered in the subsequent analysis, given that they did not have a statistically significant effect on the dependent variable. The retained variables comprise 6 categorical variables (URBAN, AREA, STRATUM, OWN, ADEQ, and UNMET) and 3 continuous variables (EXPENSE, EDU_HEAD, and SIBLINGS). We calculated the coefficients of the model using equation (1), where $p_i$ is the probability that child $i$ becomes a worker.

We assessed whether the assumptions of logistic regression were met. Assumptions 1, 2, and 3 were ensured by the model specification and the data collection design. For assumption 4, we ran a linear regression to obtain VIF values. All VIF values were lower than 5 (the independent variable URBAN has the highest VIF value, at 2.274). Therefore, there is no evidence of multicollinearity problems in our model (Hair et al., 2011). For the fifth assumption, we used the Box and Tidwell (1962) procedure. This procedure establishes that if the interaction between a continuous independent variable and its natural logarithm is found to be significant, this variable is not linearly related to the logit of the dependent variable. In addition, following Tabachnick and Fidell's (2007) recommendation, we applied a Bonferroni correction to the significance level by dividing it by the number of terms in this test, including the constant term. This correction provided a significance level of 0.0038 (i.e., 0.05/13, where 0.05 is the original significance level and 13 is the number of terms including the constant: 1 constant term, 6 categorical independent variables, 3 continuous independent variables, and 3 interaction terms). P-values for the interaction terms were 0.688 for EDU_HEAD, 0.999 for SIBLINGS, and 0.0041 for EXPENSE. Based on this assessment, all p-values were above 0.0038, and thus our model satisfied the linearity assumption. For the sixth assumption, we found 4 outliers of concern, which were excluded from subsequent analysis.

Results of the logistic model are presented in Table 3. Our model is statistically significant (χ² = 2300.885, df = 25, p < 0.001) and explains between 33.2% (Cox & Snell R²) and 47.8% (Nagelkerke R²) of the variance in child labor. In terms of predictive value, our model correctly predicted 82.61% of cases, with 55.80% correct positive classifications (sensitivity) and 93.02% correct negative classifications (specificity). Accordingly, our model has an efficiency (average of sensitivity and specificity) of 74.41% and a mean absolute percentage error (MAPE) of 17.39%. Although our model has adequate overall predictive power, the Hosmer and Lemeshow goodness-of-fit test was significant (χ² = 39.889, df = 8, p < 0.001), indicating that the model's predicted probabilities do not fit the observed outcomes well. The reason for this finding may be the difference between sensitivity and specificity. Finally, the coefficients (B) were found to be significant based on the Wald test. Table 3 also shows the standard errors (SE) of the coefficients and their odds ratios (OR).

Table 3: Logistic regression with training sample (Model 1, N = 700)
Variable | B | SE | Wald | OR
URBAN | 3.649 | 0.441 | 68.563*** | 38.455
EDU_HEAD | -0.056 | 0.01 | 28.969*** | 0.946
SIBLINGS | 0.181 | 0.081 | 4.978* | 1.198
AREA | SS | SS | 360.946*** | SS
STRATUM | SS | SS | 71.179*** | SS
OWN | SS | SS | 13.571* | SS
ADEQ | 0.644 | 0.146 | 19.408*** | 1.905
UNMET | -0.366 | 0.117 | 9.721** | 0.693
EXPENSE | -0.262 | 0.084 | 9.707** | 0.769
Constant | -1.423 | 0.663 | 4.604 |
-2 log likelihood: 4459.518
Chi-square (model): 2300.885*** (df = 25, p < 0.001)
Hosmer & Lemeshow: 39.889*** (df = 8, p < 0.001)
Cox & Snell R²: 33.20%
Nagelkerke R²: 47.80%
Overall predicted %: 82.61%
Sensitivity: 55.80%
Specificity: 93.02%
Notes: *p < 0.05, **p < 0.01, ***p < 0.001. B = coefficient; SE = standard error; OR = odds ratio; SS = skipped for simplicity (for categorical variables with more than 2 categories, there is a coefficient for each category; we chose not to report them all because our focus is the predictive power of the model).

We then assessed the model with our test subsample. Our model correctly predicted 80.64% of all the cases in this sample, with a sensitivity of 52.78%, a specificity of 91.88%, an efficiency of 72.33%, and a MAPE of 19.36%.

4.2 Neural network results

For purposes of comparison, we chose a simple neural network. Accordingly, we used a hidden layer with hyperbolic tangent sigmoid activation functions and an output layer with log-sigmoid activation functions. The weights and biases were updated by gradient descent with momentum and an adaptive learning rate. The training parameters of the neural network were: maximum number of epochs to train, 40000; learning rate, 0.01; momentum constant, 0.7; and performance goal, 10⁻⁵. These values were set following current literature (Haykin, 1998; Zurada, 1992). They were also adjusted during the training process using an adaptive algorithm to find better parameters.

The first neural network used the 17 proposed independent variables (inputs). With the training and validation subsamples, we obtained the best neural network, made up of 38 neurons in the hidden layer and 1 neuron in the output layer. This model predicted 88.26% of all the cases, with a sensitivity of 90.97% and a specificity of 87.21%. The efficiency of the model was thus 89.09%, and the MAPE was 11.74%. When applied to the test subsample, this model predicted 85.11% of all the cases, with a sensitivity of 90.02%, a specificity of 79.42%, an efficiency of 84.72%, and a MAPE of 14.89%.

In addition, by analyzing the weights of the inputs of the neural network, we ranked the independent variables from the highest to the lowest effect: AREA (7.7), EXPENSE (7.4), HYGIENIC (7.4), STRATUM (7.0), MALE (6.2), OWN (6.0), SIBLINGS (5.9), TYPE (5.9), OLD_CHI (5.9), AGE (5.7), EDU_HEAD (5.5), COMPO (5.5), ADEQ (5.1), DELAY (5.0), CENTER (4.9), URBAN (4.6), and UNMET (4.4).

The second neural network used only the 9 variables that were statistically significant in the logit model, to allow a direct comparison. In this model, with the training and validation subsamples, we obtained the best neural network, made up of 30 neurons in the hidden layer and 1 neuron in the output layer. Our model achieved 84.45% correct total predictions, with a sensitivity of 79.61%, a specificity of 86.34%, an efficiency of 82.97%, and a MAPE of 15.55%. When our model's parameters were applied to the test subsample, it predicted 81.69% of all the cases, with a sensitivity of 78.86%, a specificity of 84.23%, an efficiency of 81.55%, and a MAPE of 18.31%. For this model, the ranking of the inputs according to their weights is: AREA (12.9), STRATUM (12.6), EDU_HEAD (12.4), SIBLINGS (12.3), URBAN (10.9), ADEQ (10.6), UNMET (9.7), OWN (9.5), and EXPENSE (9.1).

4.3 Technique comparison

The results of the previous section are summarized in Table 4. Because the logit model used 9 variables (8 were not considered because they had no significant impact on the dependent variable), the neural network used these same 9 variables and the same instances to ensure a fair comparison. In addition, Table 4 shows the results of the neural network technique with the complete set of 17 variables, to assess whether this non-linear model could extract important information from those 8 variables without a linear impact on the dependent variable. This table shows that, overall, the neural network technique performed better than the logit model. In fact, the neural network obtained the highest values of accuracy (correct total predictions, both positive and negative). Also, considering that it is more important to predict when a child has a high probability of becoming a worker than to predict that a child will be a non-worker, sensitivity stands as our most important metric when comparing models. As Table 4 shows, the sensitivity of the neural network technique was superior to that of the logit model. In spite of these results, the logit model was superior in terms of specificity. However, specificity is a metric for correct predictions of non-workers, which is less relevant in our case. In addition, Figure 1 shows the ROC curves of the predictions of these techniques.

Table 4: Comparison of results of predictive capacity in the test subsample
Predictive measure | Logit (9 variables) | Neural network (9 variables) | Neural network (17 variables)
Accuracy | 80.64% | 81.69% | 85.11%
Sensitivity | 52.78% | 78.86% | 90.02%
Specificity | 91.88% | 84.23% | 79.42%
Efficiency | 72.33% | 81.55% | 84.72%
MAPE | 19.36% | 18.31% | 14.89%

Figure 1: Comparison of ROC curves

5 Discussion

Overall, the results show that the neural network technique surpasses the logit model in its capacity to predict child labor (sensitivity). Indeed, this phenomenon may have a more complex structure than is assumed by the logit model. In consequence, the neural network (which can model non-linear relationships) could capture sources of variation that are not identified by the logit technique. An accurate prediction of this phenomenon could be used by policy makers and government agencies to design adequate strategies or to invest scarce resources efficiently to deal with this problem.

Also, our findings show that the neural network model with 17 variables performed better than the 9-variable models (logit or neural network) in accuracy, sensitivity, efficiency, and MAPE. This result suggests that the additional set of variables captures important variability in explaining child labor. In other words, the neural network model with 17 variables does not ignore information that is relevant to the prediction. This result could be used by decision makers to avoid discarding relevant factors when dealing with this phenomenon.

Another important result is that the neural network model shows that geographical indicators, income levels, gender, family composition, and educational levels significantly predict child labor. These results are aligned with those of the logit model, which show that stratum, geographic area, and housing conditions have a significant impact on our dependent variable. These results can be used to determine the relevance of each factor. In turn, this relevance-based ranking of factors could further help government agencies to allocate their resources better and implement their strategies to reduce child labor.

Finally, previous studies in this field have used linear statistical models to predict child labor. Our study shows that the use of computational intelligence techniques, such as the neural network, could provide better predictions, which can lead to better decision making.

References

Hussein Abdou, John Pointon, and Ahmed El-Masry. 2008. Neural nets versus conventional techniques in credit scoring in Egyptian banking. Expert Systems with Applications 35(3):1275–1292.

Lorena Alcázar. 2008. Asistencia y deserción en escuelas secundarias rurales del Perú, pages 41–82.

Edward I Altman, Giancarlo Marco, and Franco Varetto. 1994. Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). Journal of Banking & Finance 18(3):505–529.

Gary Becker. 1962. Investment in Human Capital: A Theoretical Analysis. Journal of Political Economy 70.

Christopher M Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.

George EP Box and Paul W Tidwell. 1962. Transformation of the Independent Variables. Technometrics 4(4):531–550. https://doi.org/10.1080/00401706.1962.10490038.

Patrick M. Emerson and André Portela Souza. 2002. Bargaining over sons and daughters: Child labor, school attendance and intra-household gender bias in Brazil.

Luis García. 2006. The supply of child labor and household work. MPRA Paper 31402, University Library of Munich, Germany.

Claudia Goldin. 2016. Human Capital. Springer Verlag, Heidelberg, Germany.

Victoria Gunnarsson, Peter F. Orazem, and Mario A. Sánchez. 2006. Child labor and school achievement in Latin America.

Joe F Hair, Christian M Ringle, and Marko Sarstedt. 2011. PLS-SEM: Indeed a Silver Bullet. Journal of Marketing Theory and Practice 19(2):139–152. https://doi.org/10.2753/MTP1069-6679190202.

Eric A Hanushek. 2013.

D W Hosmer, S Lemeshow, and R X Sturdivant. 2013. Applied Logistic Regression. Wiley Series in Probability and Statistics. Wiley.

James M Lattin, J D Carroll, and P E Green. 2003. Analyzing Multivariate Data. Number v. 1 in Analyzing Multivariate Data. Thomson Brooks/Cole.

Pablo Lavado and José Gallegos. 2005. La dinámica de la deserción escolar en el Perú: un enfoque usando modelos de duración. Working Papers 05-08, Departamento de Economía, Universidad del Pacífico. https://ideas.repec.org/p/pai/wpaper/05-08.html.

Huong Thu Le and Ross Homel. 2015. The impact of child labor on children's educational performance: Evidence from rural Vietnam. Journal of Asian Economics 36:1–13.

Luiz Renato Lima, Shirley Mesquita, and Marianne Wanamaker. 2015. Child labor and the wealth paradox: The role of altruistic parents. Economics Letters 130:80–82.

Karla Lopez. 2016. Determinantes del trabajo infantil y la deserción escolar en menores de 12 a 17 años en el Perú para los años 2006 y 2014. Bachelor's degree, PUCP, Perú.

S James Press and Sandra Wilson. 1978. Choosing between Logistic Regression and Discriminant Analysis. Journal of the American Statistical Association 73(364):699–705.

Diego C Rodrigues, David N Prata, and Michel A Silva. 2015. Exploring social data to understand child labor. International Journal of Social Science and Humanity 5(1):29.

Jose Rodriguez. 2002. Adquisición de educación escolar básica en el Perú: uso del tiempo de los menores en edad escolar. Departamento de Economía, Pontificia Universidad Católica del Perú. http://EconPapers.repec.org/RePEc:pcp:pucotr:otr-2002-02.

Jose Rodríguez and Silvana Vargas. 2009. Trabajo infantil en el Perú. Magnitud y perfiles vulnerables. Informe Nacional 2007-2008. Departamento de Economía, Pontificia Universidad Católica del Perú.

José Rodríguez and Silvia Vargas. 2008. Escolaridad y
Economic growth in develop- trabajo infantil: patrones y determinantes de la asig- ing countries: The role of human capital. Economics nación del tiempo de niños y adolescentes en Lima of Education Review 37:204–212. Metropolitana. Technical report, PUCP, Peru. Simon Haykin. 1998. Neural Networks: A Compre- hensive Foundation. Prentice Hall PTR, Upper Sad- David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. dle River, NJ, USA, 2nd edition. Williams. 1986. Learning internal representations by error propagation. In Parallel Distributed Pro- Huajing He. 2016. Child labour and academic achieve- cessing: Explorations in the Microstructure of Cog- ment: Evidence from gansu province in china. nition, Volume 1: Foundations, MIT Press, pages China Economic Review 38(C):130–150. 318–362. 78 Claudio Sapelli and Arı́stides Torche. 2004. Deserción Escolar y Trabajo Juvenil: ¿Dos Caras de Una Misma Decisión? Cuadernos de economı́a pages 173–198. Mariela Sausa. 2016. El trabajo infantil es más alto y más penoso en las zonas rurales. Perú 21. BG Tabachnick and LS Fidell. 2007. Multivariate anal- ysis of variance and covariance. In Using Multivari- ate Statistics, Allyn and Bacon Boston. Jacek M Zurada. 1992. Introduction to artificial neural systems. West St. Paul. 79
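As an illustrative sketch of how the predictive measures in Table 4 relate to a confusion matrix, the Python snippet below computes them for a binary worker/non-worker classifier. Two definitions are assumptions inferred from the reported figures, not definitions stated in Section 4.3: efficiency as the mean of sensitivity and specificity, and MAPE as the mean absolute error of 0/1 predictions (which equals the misclassification rate, i.e. 100% minus accuracy). The confusion-matrix counts in the example are hypothetical. The `TABLE_4` check only verifies that the published values are consistent with these assumed definitions up to rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical counts for a binary classifier; 'positive' = child works."""
    tp: int  # workers correctly predicted as workers
    fn: int  # workers wrongly predicted as non-workers
    tn: int  # non-workers correctly predicted as non-workers
    fp: int  # non-workers wrongly predicted as workers


def predictive_measures(cm: ConfusionMatrix) -> dict[str, float]:
    """Return the Table 4 measures in percent, rounded to two decimals."""
    total = cm.tp + cm.fn + cm.tn + cm.fp
    accuracy = (cm.tp + cm.tn) / total
    sensitivity = cm.tp / (cm.tp + cm.fn)  # share of working children detected
    specificity = cm.tn / (cm.tn + cm.fp)  # share of non-working children detected
    # Assumption: efficiency is the mean of sensitivity and specificity.
    efficiency = (sensitivity + specificity) / 2
    # Assumption: MAPE = mean |y - y_hat|; with 0/1 predictions this is the error rate.
    mape = 1 - accuracy
    measures = {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "efficiency": efficiency,
        "mape": mape,
    }
    return {name: round(100 * value, 2) for name, value in measures.items()}


# Reported test-subsample values from Table 4, in percent:
# (accuracy, sensitivity, specificity, efficiency, MAPE).
TABLE_4 = {
    "logit_9": (80.64, 52.78, 91.88, 72.33, 19.36),
    "nn_9": (81.69, 78.86, 84.23, 81.55, 18.31),
    "nn_17": (85.11, 90.02, 79.42, 84.72, 14.89),
}


def consistent_with_assumed_definitions(row, tol=0.01) -> bool:
    """Check efficiency = (sens + spec) / 2 and MAPE = 100 - accuracy, up to rounding."""
    acc, sens, spec, eff, mape = row
    return abs((sens + spec) / 2 - eff) <= tol and abs((100 - acc) - mape) <= tol


if __name__ == "__main__":
    print(predictive_measures(ConfusionMatrix(tp=80, fn=20, tn=170, fp=30)))
    print({name: consistent_with_assumed_definitions(row) for name, row in TABLE_4.items()})
```

Running the module prints the measures for the hypothetical counts and `True` for all three Table 4 columns. This makes explicit why sensitivity, rather than accuracy alone, separates the models here: the logit model's accuracy is close to the 9-variable network's, but its sensitivity is much lower.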
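The input ranking reported for the 9-variable neural network sums to 100 (12.9 + 12.6 + 12.4 + 12.3 + 10.9 + 10.6 + 9.7 + 9.5 + 9.1 = 100.0), so it reads as relative importance in percent. The section does not state how these values were derived from the weights. As one hedged illustration, Garson's algorithm is a common weight-based method for turning the connection weights of a single-hidden-layer network into such percentages. The sketch below assumes one hidden layer and a single output unit. The weights and the input names `x1`, `x2`, `x3` are hypothetical, not the paper's network.

```python
def garson_importance(w_in: list[list[float]], w_out: list[float]) -> list[float]:
    """Relative importance (percent) of each input via Garson's algorithm.

    w_in[i][j]: weight from input i to hidden unit j.
    w_out[j]:   weight from hidden unit j to the single output unit.
    """
    n_inputs, n_hidden = len(w_in), len(w_out)
    shares = [0.0] * n_inputs
    for j in range(n_hidden):
        # Absolute contribution of each input routed through hidden unit j.
        contrib = [abs(w_in[i][j] * w_out[j]) for i in range(n_inputs)]
        column_total = sum(contrib)
        if column_total == 0:
            continue  # a hidden unit that carries no signal adds no importance
        for i in range(n_inputs):
            shares[i] += contrib[i] / column_total
    grand_total = sum(shares)
    if grand_total == 0:
        return [0.0] * n_inputs
    return [100 * s / grand_total for s in shares]


def rank_inputs(names: list[str], importances: list[float]) -> list[tuple[str, float]]:
    """Sort (name, importance) pairs from most to least important."""
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights: 3 inputs, 2 hidden units.
    w_in = [[1.0, -2.0], [3.0, 1.0], [0.0, 1.0]]
    w_out = [2.0, -1.0]
    print(rank_inputs(["x1", "x2", "x3"], garson_importance(w_in, w_out)))
```

With these weights the importances are 37.5, 50.0 and 12.5 percent, so `x2` ranks first. Because Garson's method uses absolute weights, it shows how strongly each input drives the output, not the direction of its effect. This is one reason the logit coefficients remain useful alongside such a ranking.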
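Figure 1 compares the techniques through ROC curves. The section does not report how the curves (or any area under them) were computed. The following is therefore only a minimal sketch of the standard construction: sweep the classification threshold over predicted probabilities from highest to lowest, record the (false positive rate, true positive rate) pair at each step, and integrate with the trapezoidal rule. The labels and scores in the example are hypothetical.

```python
def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) pairs for every score threshold.

    labels: 1 = child works, 0 = does not work; scores: predicted probabilities.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative case")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at this score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
    print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
    print(auc(pts))  # 0.75
```

Each point corresponds to one cut-off on the predicted probability. A single cut-off (such as the one behind the sensitivity and specificity in Table 4) is therefore one point on the curve. The curve as a whole shows the sensitivity/specificity trade-off that Section 4.3 weighs in favor of sensitivity.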