Exploratory network analysis of large social science questionnaires

Robert J. B. Goudie
Department of Statistics, University of Warwick, Coventry, UK

Sach Mukherjee
Department of Statistics & Centre for Complexity Science, University of Warwick, Coventry, UK

Frances Griffiths
Health Sciences Research Institute, University of Warwick, Coventry, UK

Abstract

There are now many large surveys of individuals that include questions covering a wide range of behaviours. We investigate longitudinal data from the Add Health survey of adolescents in the US. We describe how structural inference for (dynamic) Bayesian networks can be used to explore relationships between variables in such data and present this information in an interpretable format for subject-matter practitioners. Surveys such as this often have a large sample size, which, whilst increasing the precision of inference, may mean that the posterior distribution over Bayesian networks (or graphs) is concentrated on disparate graphs. In such situations, the standard MC3 sampler converges very slowly to the posterior distribution. Instead, we use a Gibbs sampler (1), which moves more freely through graph space. We present and discuss the resulting Bayesian network, focusing on depression, and provide estimates of how different variables affect the probability of depression via the overall probabilistic structure given by the Bayesian network.

1 INTRODUCTION

Hypotheses of multifactorial causes of symptoms and outcomes play an important role in the social sciences and in public health. Regression-based approaches are widely used in these fields to explore such hypotheses. A great deal of insight can be gained through such approaches, but it is sometimes overly constraining to fix a particular quantity as the dependent variable, especially if the goal is to explore the possibility of unexpected relationships in the data. Instead, we can consider a number of variables on an equal footing, and study the possibility of unexpected relationships in the data.

Graphical models provide a statistical framework within which the relationship between variables can be studied. These models enable complex multivariate distributions to be decomposed into simpler local distributions. This can reveal a great deal about the relationships between the variables, as well as provide a statistically and computationally tractable description of their (often large) joint distribution. The decomposition is formed by the conditional independence structure, which can be represented by a graph. The use of graphs helps to make the interpretation of the model simpler. In this paper, we focus on the structure of the model, as given by the graph. We aim to make inference about this using statistical model selection. The structure of the model suggests how the different components of the system interact, which may be helpful in understanding the system as a whole. These methods have been widely adopted in molecular biology (2, 3), and have been used in some areas of medical sciences (4).

Consideration of unexpected relationships between factors requires datasets that incorporate a wide range of topics. Such data are now widely available for representative samples of populations in many countries, and for many sub-groups of interest. Many of these datasets are derived from surveys that are general in scope, and are not collected to study any one particular question. For example, in the US, the health of the whole population is representatively sampled annually for the Behavioral Risk Factor Surveillance System (BRFSS) survey, and the Add Health study, which we use here, followed a cohort of young people from 1994 until 2008. Data from both of these have been used in scores of studies, but these commonly focus on one specific aspect, often using the data to evaluate existing hypotheses. Given the wide scope inherent in the design of these studies and the large samples available in many cases, it is possible to broaden the scope of the analysis by considering richer structures. In this paper, we discuss the potential that such a more explorative approach yields. We do not seek to make conclusive causal claims, but instead suggest that a broader approach may uncover important aspects that have been neglected.

Our focus will be on depression among adolescents in the US, drawing on data from the National Longitudinal Study of Adolescent Health (Add Health). It is estimated that around 1–6% of adolescents each year are affected by depression (5, 6). The effects of depression in this age-group are wide-ranging (7), and include the stigma associated with poor mental health more generally (8). There is considerable evidence that there are a wide range of causal factors for depression amongst adolescents, spanning biological, psychological and social domains. Understanding these causal factors and separating them from the consequences of depression has been recognised as an important aim (9). Some of the relevant causal factors may interact and the approach taken here accounts for this.

The remainder of this paper is organised as follows. We first introduce the Add Health dataset and describe the Bayesian network framework. Inference for Bayesian networks is performed using Markov Chain Monte Carlo (MCMC), but the large sample size of the dataset we consider makes achieving convergence difficult because the posterior distribution may be concentrated on disparate graphs, and so we describe an alternative sampler that has superior properties in this situation. Whilst the PC-algorithm (10, 11) has properties that often make it attractive in such contexts, we found that the results in this situation were not robust (see Discussion). We then present and discuss the results for the Add Health dataset.

2 MATERIALS AND METHODS

2.1 Add Health

The data that we use are drawn from the National Longitudinal Study of Adolescent Health (Add Health) that explores health-related behavior of adolescents (12) in the US. The questionnaire contains over 2000 questions that cover many aspects of adolescent behaviours and attitudes. We consider the representative sample of adolescents from Waves I and II of the in-home section, and the parental questionnaire from Wave I of the study. The analysis we perform is not feasible when the data are incomplete (see Discussion), and so individuals with missing data were removed from the study. Removing incomplete samples leaves 5975 individuals in the study.

Our measure of depression is a self-assessed scale based upon the Centre for Epidemiologic Studies Depression Scale (CES-D) (13). Two questions from the 20-item scale are omitted from Add Health, and two are modified, and so we scale the score given by the available questions (14). A Receiver Operating Characteristic (ROC) analysis showed that thresholds of 24 for females and 22 for males provided the best agreement with clinical assessments of depression (15). We use these thresholds to create a binary indicator of depression status.

Many of the remaining variables that we consider (Table 1) are drawn from the risk factors described in the depression literature, and the mental health literature more generally. A recent review (8) described a wide range of factors that are associated with poor mental health in young people, including gender, poverty, violence and the absence of social networks in the local neighbourhood. The quality of relationships with parents is also thought to be important, especially with the mother (16), as are parental alcohol problems (17) and parental discord (16). The individual's use of alcohol and drugs, smoking, and HIV/AIDS are all also associated with depression (18, 19). Physical exercise has been proposed in some studies as a useful intervention for the management of depression, but many of these studies have been deemed to be of poor quality (20).

2.2 Bayesian Networks

Our study uses Bayesian networks to explore the relationships between variables in the Add Health study. Bayesian networks are a particular type of graphical model that enable classes of probability distributions to be specified using a directed acyclic graph (DAG).

A Bayesian network $G$ is represented using a DAG with vertices $V = (V_1, \ldots, V_p)$ and directed edges $E \subset V \times V$. The vertices correspond to the components of a random vector $X = [X_1, \ldots, X_p]^T$, subsets of which will be denoted by $X_A$ for sets $A \subseteq \{1, \ldots, p\}$. For $1 \le i, j \le p$, we define the parents $G_j$ of each node $V_j$ to be the subset of vertices $V$ such that $V_i \in G_j \Leftrightarrow (V_i, V_j) \in E$. Specifying the parents of the vertices determines the edges $E$ of the graph $G$. We denote by $\mathcal{G}$ the space of all possible directed acyclic graphs with $p$ vertices. We will use $X_{G_i}$ to refer to the random variables that are parents of $X_i$ in the graph $G$.

The graph specifies that the joint distribution for $X$, with parameters $\theta = (\theta_1, \ldots, \theta_p)$, can be written as a product of conditional distributions $p(X_i \mid X_{G_i}, \theta_i)$, given the variables $X_{G_i}$ corresponding to the parents of $X_i$ in the graph.

$$p(X \mid G, \theta) = \prod_{i=1}^{p} p(X_i \mid X_{G_i}, \theta_i)$$

We will need to be able to evaluate the marginal likelihood $p(X \mid G)$ easily, and so we consider only a conjugate analysis in which the conditional distributions $p(X_i \mid X_{G_i}, \theta_i)$ are multinomial, with Dirichlet priors $p(\theta_i)$ for each $\theta_i$. In this case, the marginal likelihood can be evaluated analytically. Suppose each $X_i$ takes one of $r_i$ values, and define $q_i$ as the number of levels of the sample space of $X_{G_i}$, each element of which we call a configuration. For each configuration $j$ of $X_{G_i}$, let $N_{ijk}$ be the number of observations in which $X_{G_i}$ is in configuration $j$ and $X_i$ takes value $k$. We assume the Dirichlet priors for each $\theta_i$, each with hyperparameters $N'_{ijk}$, are independent. We define $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$ and $N'_{ij} = \sum_{k=1}^{r_i} N'_{ijk}$, and the local score $p(X_i \mid X_{G_i})$ to be

$$p(X_i \mid X_{G_i}) = \prod_{j=1}^{q_i} \frac{\Gamma(N'_{ij})}{\Gamma(N_{ij} + N'_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(N_{ijk} + N'_{ijk})}{\Gamma(N'_{ijk})}.$$

The marginal likelihood can be shown to equal the product $p(X \mid G) = \prod_{i=1}^{p} p(X_i \mid X_{G_i})$ of these local scores (21).

2.3 Structural inference for Bayesian Networks

We aim to make inference about the DAG $G$, given data $X$, and so our interest focuses on the posterior distribution $\Pr(G \mid X)$ on Bayesian networks. Under the assumptions we have made, this can be written in terms of the marginal likelihood $p(X \mid G)$ and a prior $\pi(G)$ for the Bayesian network structure.

$$\Pr(G \mid X) \propto \pi(G) \prod_{i=1}^{p} p(X_i \mid X_{G_i})$$

The priors $\pi(G)$ can be chosen to encode domain information (3). For the analyses in this paper, we choose an improper prior $\pi(G) \propto 1$ that is flat across the space of graphs.

The posterior distribution $\Pr(G \mid X)$ is difficult to evaluate, because the cardinality of $\mathcal{G}$ grows super-exponentially in $p$. This motivates the use of approximations to $\Pr(G \mid X)$, which are usually based on Markov chain Monte Carlo (MCMC).

2.4 Approximate inference for Bayesian Networks

The standard form of MCMC that is used for structural inference for Bayesian networks is MC3 (22). This is a Metropolis-Hastings sampler that explores $\mathcal{G}$ by proposing to add or remove a single edge from the current graph $G$. This sampler works surprisingly well in many situations, but if the posterior distribution is not unimodal, the local moves may fail to explore the space fully because the sampler may become 'trapped' in one mode. This issue becomes more severe as the sample size increases because the posterior distribution becomes more concentrated. A natural approach in such situations is to use the PC-algorithm (10, 11), which has been shown to be asymptotically consistent (23), but we found in this case that the results were not robust (see Discussion).

Our analyses in this paper were performed using a Gibbs sampler (1), which we found to converge rapidly to its equilibrium state. A naïve Gibbs sampler for structural inference that proposes single-edge additions and removals can easily be constructed, but this sampler offers no advantages over the analogous MC3. This naïve scheme, however, can be improved by 'blocking' together a number of components, and sampling from their joint conditional distribution. In theory, any group of components can be taken as a block, but sampling from their joint conditional distribution needs to be possible and, ideally, computationally quick.

For Bayesian networks, the most natural blocks are those consisting of parent sets $G_1, \ldots, G_p$. This is natural because the marginal likelihood $p(X \mid G)$ for a graph $G$ factorises across vertices into conditionals $p(X_j \mid X_{G_j})$ and these conditionals depend on the parent set of the vertex. Therefore, since any graph $G \in \mathcal{G}$ can be specified by a vector $G = (G_1, \ldots, G_p)$ of parent sets, the posterior distribution on Bayesian networks $G \in \mathcal{G}$ can be written as a function of $G_1, \ldots, G_p$ in the following way.

$$\Pr(G_1, \ldots, G_p \mid X) \propto \pi(G_1, \ldots, G_p) \prod_{i=1}^{p} p(X_i \mid X_{G_i})$$

In the following, we will denote subsets of the vector $G = (G_1, \ldots, G_p)$ by $G_A = \{G_k : k \in A\}$, and the subset given by the complement $A^C = \{1, \ldots, p\} \setminus A$ of a set $A$ will be denoted by $G_{-A} = \{G_k : k \in A^C\}$. In particular, the complete graph can be specified by $G = (G_1, \ldots, G_p) = (G_i, G_{-i})$ for any $i \in \{1, \ldots, p\}$.

To be able to construct a Gibbs sampler using parent sets, we need to find their conditional distribution, given the other parent sets $G_{-j} = \{G_1, \ldots, G_{j-1}, G_{j+1}, \ldots, G_p\}$. Parent sets $G_j$ for which $G = (G_j, G_{-j})$ is cyclic will have no probability mass in the conditional distribution. Let $K_j^\star$ be the set of parent sets $G_j$ such that $G = (G_j, G_{-j})$ is acyclic. The conditional posterior distribution of $G_j$ is multinomial, with weights given by the posterior distribution of $G = (G_j, G_{-j})$. When the cardinality of $K_j^\star$ is constrained (for example, by restricting the maximum number of parents of each node) the conditional posterior distribution for $G_j \in K_j^\star$ can be evaluated exactly.

$$\Pr(G_j \mid G_{-j}, X) = \frac{\Pr(G_j, G_{-j} \mid X)}{\Pr(G_{-j} \mid X)} = \frac{\Pr(G_j, G_{-j} \mid X)}{\sum_{G_j \in K_j^\star} \Pr(G_j, G_{-j} \mid X)} \qquad (1)$$

We can improve the speed of convergence of this sampler by allowing pairs of parent sets to be sampled together. At each step of the Gibbs sampler we conditionally sample pairs of parent sets $(G_{j_1}, G_{j_2})$, given the remainder of the graph $G_{-\{j_1, j_2\}}$. Pairs of parent sets $(G_{j_1}, G_{j_2})$ such that $G = (G_{j_1}, G_{j_2}, G_{-\{j_1, j_2\}})$ is cyclic have no probability mass in the conditional distribution. Let $K_{j_1, j_2}^\star$ be the set of pairs of parent sets $(G_{j_1}, G_{j_2})$ such that $G = (G_{j_1}, G_{j_2}, G_{-\{j_1, j_2\}})$ is acyclic. For $(G_{j_1}, G_{j_2}) \in K_{j_1, j_2}^\star$, the conditional posterior distribution is multinomial, by analogy with (1), with weights given by the posterior distribution of $G = (G_{j_1}, G_{j_2}, G_{-\{j_1, j_2\}})$.

$$\Pr(G_{j_1}, G_{j_2} \mid G_{-\{j_1, j_2\}}, X) = \frac{\Pr(G_{j_1}, G_{j_2}, G_{-\{j_1, j_2\}} \mid X)}{\sum_{(G_{j_1}, G_{j_2}) \in K_{j_1, j_2}^\star} \Pr(G_{j_1}, G_{j_2}, G_{-\{j_1, j_2\}} \mid X)}$$

Similarly, sets of three parent sets can be conditionally sampled. Full technical details are presented in (1).

[Figure 1 (image not reproduced): two scatter-plot panels, "MC3 sampler" and "Gibbs sampler"; x-axis "Run 1", y-axis "Run 2", both from 0.0 to 1.0.]

Figure 1: Diagnostic runs for MC3 (left) and the Gibbs sampler (right). The posterior edge probabilities given by two independent runs are plotted against each other. When the two runs give the same estimates of the posterior edge probabilities, all of the points appear on the line y = x. We observe that the two Gibbs runs give similar posterior edge probabilities, but the MC3 runs do not. (Five runs of each sampler were performed, with 750,000 samples (MC3) or 100,000 samples (Gibbs) per run; the first half of the samples were discarded as burn-in; mean Pearson correlation between runs was 0.9999 ± 0.0002 (standard deviation) for Gibbs and 0.6322 ± 0.0477 for MC3.)

[Figure 3 (image not reproduced): x-axis "Prob. of depression, time point 2"; variables shown: Depressed (time point 1), Didn't present to doctor (2), Female, Didn't present to doctor (1), Good health (2), Good health (1), Victim of violence (2), Strong academically (2), Drug user (1).]

Figure 3: Conditional probability of depression. The conditional probability of being depressed at Wave II when the variable indicated is changed to the level indicated by the colours, conditional on the DAG shown in Figure 2. For binary variables, one marker denotes true and another denotes false; shades of grey indicate intermediate levels. Wave number (time point) is indicated in parentheses. Only variables for which the conditional probability differed between levels by at least 0.005 are displayed.

3 RESULTS

The variables that we consider are detailed in Table 1. As is common when using graphical models (24), all of these variables were grouped, initially into 'Background', 'Wave I' and 'Wave II', and then refined according to whether the question asked about the long or the short term, as shown in Table 2. These groups define constraints on the Bayesian networks that are considered. Specifically, no edges can be directed backwards through the groups. Edges, however, are allowed within groups. For example, no edge is allowed to be directed into 'Gender', and no edge can pass backwards in time, for example, from Depression at Wave II to Depression at Wave I. Additionally, no edge can pass from a short-term variable to a long-term variable, for example, from Depressed at Wave I to Have HIV/AIDS at Wave I.

We precomputed the local scores, and then drew 100,000 samples (the first half of which were discarded as burn-in) using the Gibbs sampler (Section 2.4), which took 30 minutes (on a single core of a cluster computer). The graph space was constrained such that no node had more than 3 parents, to ensure Equation 1 could be evaluated.

We ran 5 independent samplers, with disparate initial states. This enables a simple test of convergence to be performed that compares the posterior edge probabilities obtained from each of the independent runs (25). The agreement between runs can be examined
graphically by plotting the edge probabilities against each other (Figure 1). Mean Pearson correlation coefficients between edge probabilities from pairs of runs were 0.9999 ± 0.0002 (standard deviation) for the Gibbs sampler and 0.6322 ± 0.0477 for MC3. The agreement between the independent runs of the Gibbs sampler gave us confidence in our results, in contrast to the large disagreements between MC3 runs. In addition, cumulative edge probability plots for each edge showed regular excursions around the mean (26), and a numerical diagnostic (27) monitoring the number of edges in the sampled graph also clearly suggested that sufficient samples had been drawn ($\hat{R} \approx 1.0$).

The samples drawn using MCMC allow the posterior distribution of Bayesian networks to be approximated. In particular, the samples can be used to estimate the posterior edge probability $P(e \mid X)$ with $e \in E$. Figure 2 displays all edges with posterior probability of at least 0.5.

[Figure 2 (image not reproduced): network diagram whose nodes are the variables listed in Table 1, with the wave number in parentheses.]

Figure 2: Summary network for the Add Health variables considered. The edge colors are given by the Kendall correlation coefficients between the two variables, with green edges corresponding to positive correlation, and red edges to negative correlation. The strength of the correlation is indicated by the transparency of the line, with greater transparency indicating weaker correlation. The variables 'Depressed (1)', 'Depressed (2)' and their parents are shown in bold.

Our focus is on depression, the parents of which in Figure 2 we observe are "Didn't present to doctor" and "Gender". It is important, however, to note that the model does not say that these are the only factors that are important. For example, "Drug user" at Wave I is related to depression through "Didn't present to doctor" at Wave I and II (Figure 2).

This is shown in Figure 3, which gives the conditional probability of being depressed at Wave II when a particular variable is set to a specific value. We see that general health, violence, academic performance and drug use all affect the conditional probability of depression at Wave II. Note that to compute this probability, links from the parents of the variable in which we 'intervene' are removed; this is equivalent to the 'do-operator' in the terminology of Pearl (28).

The analysis reveals the interaction between the many aspects of life that have an impact on depression. The connections between depression and its two parents in Figure 2 have been previously discussed in the literature. The importance of gender in depression is particularly extensively documented (8). The connection to a failure to seek medical care even when the individual thinks they should has also been discussed in the literature, often in terms of poor accessibility of health care services for young people (29, 8). Several decades of research have revealed the complex causation of depression in young people, as suggested by this study (8).

4 DISCUSSION

There is a large amount of information held in large social science questionnaires. In this paper we have examined a graphical model approach to inferring structure amongst the variables in such questionnaires. In contrast to the standard regression-based approaches, a graphical model approach forgoes the need to specify a particular variable as the response. Instead, a more comprehensive estimate of the entire structure of the underlying system can be obtained. Regression approaches posit a particular conditional-independence structure, while graphical approaches allow consideration of more general structures.

The limitations of this study include those of all similar studies using observational data that are collected for multiple audiences. These forms of data, including the longitudinal data used here, do not permit strong causal conclusions to be drawn. In particular, there may be important variables that we have not included in the analysis. However, the results are consistent with studies that have used other research approaches, including experimental designs. The connection between an individual not seeking medical care when they think they should and depression supports current practice guidance in the UK (30), where there is an emphasis on providing access to health care through the school system rather than expecting young people to seek health care themselves. Not seeking medical care despite believing it should be sought is a complex factor because it captures both barriers to getting medical care within the individual, such as lacking motivation to seek care, and barriers within the individual's environment, such as poor access to care. This may mean that the variable encapsulates a number of different characteristics related to depression, and thus may form a 'marker' for depression. However, the use of a form of the question "Has there been any time over the past year when you thought you should get medical care, but you did not?" as a screening question in different contexts needs further consideration.

This method of analysis clarifies the complexity of depression and suggests why, when using traditional methods of analysis, it can be difficult to clarify whether or not factors such as experiences in the family, in the wider community and at school impact on the experience of depression for young people. It may also suggest why interventions for prevention of depression have not yet been demonstrated to be cost effective (31).

We performed structural inference for the Bayesian network using a Gibbs sampler (1), because MC3 did not mix in a reasonable time. We have also found (1) this algorithm to be superior to the REV sampler (32), and it has the advantage of avoiding the need to consider an order prior as required by order MCMC methods (33, 34), which induces a bias that can only be corrected exactly by NP-hard computation of a correction factor.

An alternative to the MCMC method used here is the PC-algorithm (10, 11). This method is computationally efficient and is asymptotically consistent. However, to test whether the sample size available here is sufficient to reach the asymptotic regime, we applied the PC-algorithm (without constraints) to 10 different subsamples, each containing 90% of the data. We found that these results differed significantly, with a mean structural Hamming distance of 84 between the pairs of completed partially directed acyclic graphs (CPDAGs) given for the subsamples.

We used a Multinomial-Dirichlet model for the local conditional distributions, which yields a closed-form marginal likelihood. This model posits an entirely general discrete distribution, allowing its form to be guided by the data. However, the number of parameters in the local distributions for this model increases exponentially with the number of parents, which may mean that overly-sparse models are preferred. This is problematic when the sample size of the available data is small, because models with many parameters cannot be assessed adequately without a large dataset. The large sample size of the dataset used here minimises this issue, but it would nonetheless be worthwhile to consider more compact parameterisations. However, estimating such models (35) significantly increases the complexity of the model space, which makes such an approach computationally challenging in this setting.

For this paper, we removed samples with missing data. It is possible to handle missing data formally, for example by using structural EM (36), and similarly to consider latent variables (e.g. shared genetics driving both child and parent behaviour). However, at present, doing so whilst robustly exploring large model spaces remains an open challenge. Tackling these computational and inferential issues is a key area for future research.

References

[1] Goudie, R. J. B. and Mukherjee, S. M. (2011). An Efficient Gibbs Sampler for Structural Inference in Bayesian Networks. CRiSM Working Paper 11-21 (Dept. of Statistics, University of Warwick).
[2] Friedman, N. (2004) Science, 303, 5659, 799–805.
[3] Mukherjee, S. and Speed, T. P. (2008) Proc Natl Acad Sci USA, 105, 38, 14313–14318.
[4] Acid, S., de Campos, L. M., Fernández-Luna, J. M., Rodríguez, S., Rodríguez, J. M. and Salcedo, J. L. (2004) Artif Intell in Med, 30, 3, 215–232.
[5] Costello, E. J., Mustillo, S., Erkanli, A., Keeler, G. and Angold, A. (2003) Arch Gen Psych, 60, 8, 837–844.
[6] Costello, E. J., Erkanli, A. and Angold, A. (2006) J Child Psychol Psych, 47, 12, 1263–1271.
[7] Thapar, A., Collishaw, S., Potter, R. and Thapar, A. K. (2010) Br Med J, 340, c209.
[8] Patel, V., Flisher, A. J., Hetrick, S. and McGorry, P. (2007) Lancet, 369, 9569, 1302–1313.
[9] Barnett, P. A. and Gotlib, I. H. (1988) Psych Bull, 104, 1, 97–126.
[10] Spirtes, P., Glymour, C. and Scheines, R. (2000) Causation, Prediction, and Search (The MIT Press, Cambridge, MA).
[11] Korb, K. B. and Nicholson, A. E. (2011) Bayesian Artificial Intelligence (CRC Press, Boca Raton, FL).
[12] Harris, K. M., Halpern, C. T., Whitsel, E. A., Hussey, J., Tabor, J., Entzel, P. and Udry, J. R. (2009) The National Longitudinal Study of Adolescent Health: Research Design.
[13] Radloff, L. (1977) App Psych Meas, 1, 3, 385–401.
[14] Goodman, E. (1999) Am J Pub Health, 89, 10, 1522–1528.
[15] Roberts, R. E., Lewinsohn, P. M. and Seeley, J. R. (1991) J Am Acad Child Adolesc Psych, 30, 1, 58–66.
[16] Holt, S., Buckley, H. and Whelan, S. (2008) Child Abuse & Neglect, 32, 8, 797–810.
[17] Obot, I. S. and Anthony, J. C. (2004) J Child Adolesc Subst Abuse, 13, 4, 83–96.
[18] Brown, R. A., Lewinsohn, P. M., Seeley, J. R. and Wagner, E. F. (1996) J Am Acad Child Adolesc Psych, 35, 12, 1602–1610.
[19] Battles, H. B. and Wiener, L. S. (2002) J Adoles Health, 30, 3, 161–168.
[20] Larun, L., Nordheim, L. V., Ekeland, E., Hagen, K. B. and Heian, F. (2006) Cochrane Database Syst Rev, 3, CD004691.
[21] Heckerman, D., Geiger, D. and Chickering, D. M. (1995) Mach Learn, 20, 197–243.
[22] Madigan, D. and York, J. C. (1995) Int Stat Rev, 63, 2, 215–232.
[23] Kalisch, M. and Bühlmann, P. (2007) J Mach Learn Res, 8, 613–636.
[24] Cox, D. and Wermuth, N. (1996) Multivariate Dependencies: Models, Analysis and Interpretation (Chapman & Hall, London).
[25] Robert, C. P. and Casella, G. (2004) Monte Carlo Statistical Methods (Springer, New York).
[26] Yu, B. and Mykland, P. (1998) Statistics and Computing, 8, 3, 275–286.
[27] Gelman, A. and Rubin, D. B. (1992) Statistical Science, 7, 4, 457–472.
[28] Pearl, J. (2000) Causality: Models, Reasoning, and Inference (Cambridge University Press, New York).
[29] Rickwood, D. J., Deane, F. P. and Wilson, C. J. (2007) Med J Aus, 187, 7 Suppl, S35–S39.
[30] National Institute for Health and Clinical Excellence (2005) Depression in Children and Young People (NICE, London).
[31] Merry, S. N. (2007) Curr Opin Psych, 20, 4, 325–329.
[32] Grzegorczyk, M. and Husmeier, D. (2008) Mach Learn, 71, 2-3, 265–305.
[33] Ellis, B. and Wong, W. H. (2008) J Am Stat Assoc, 103, 482, 778–789.
[34] Friedman, N. and Koller, D. (2003) Mach Learn, 50, 1-2, 95–125.
[35] Friedman, N. and Goldszmidt, M. (1996) In Proc. Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96) (Morgan Kaufmann Publishers Inc.), 252–260.
[36] Friedman, N. (1998) In Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98) (Morgan Kaufmann Publishers Inc.), 129–138.

Table 1: The table shows the label used in the plots above, the number of levels (r), and the exact wording of the question. The ID(s) of the relevant variables in the Add Health dataset are in parentheses. See www.cpc.unc.edu/projects/addhealth for full details of all of these questions.

Label | r | Question
Female | 2 | Interviewer, please confirm that R's sex is (male) female. (BIO SEX)
Hisp/Latino | 2 | Are you of Hispanic or Latino origin? (H1GI4)
White | 2 | What is your race? [White] You may give more than one answer (H1GI6A)
Black/Af Am | 2 | What is your race? [Black or African American] You may give more than one answer (H1GI6B)
Am Ind/Nat Am | 2 | What is your race? [American Indian or Native American] You may give more than one answer (H1GI6C)
Asian/Pac Isl. | 2 | What is your race? [Asian or Pacific Islander] You may give more than one answer (H1GI6D)
Other race | 2 | What is your race? [Other] You may give more than one answer (H1GI6E)
Skips school | 4 | [If SCHOOL YEAR:] During this school year [If SUMMER:] During the 1994-1995 school year how many times HAVE YOU SKIPPED/DID YOU SKIP school for a full day without an excuse? (H1ED2; H2ED2)
Experiences prejudice | 3 | [If SCHOOL YEAR:] Students at your school are prejudiced [If SUMMER:] Last year, the students at your school were prejudiced. (H1ED21; H2ED17)
In physical fights | 4 | In the past 12 months, how often did you get into a serious physical fight? (H1DS5; H2FV16)
Didn't present to doctor | 2 | Has there been any time over the past year when you thought you should get medical care, but you did not? (H1GH26; H2GH28)
Severely injured | 3 | Which of these best describes your worst injury during the past year? (H1GH54; H2GH47)
Have HIV/AIDS | 2 | Have you ever been told by a doctor or a nurse that you had... HIV/AIDS (H1CO16D; H2CO19D)
Seen shooting | 3 | During the past 12 months, how often did each of the following things happen? You saw someone shoot or stab another person. (H1FV1; H2FV1)
Mother warm/loving | 4 | Most of the time, your mother is warm and loving toward you. (H1PF1; H2PF1)
Been suspended | 2 | Have you ever received an out-of-school suspension from school? (H1ED7; H2ED3)
Been expelled | 2 | Have you ever been expelled from school? (H1ED9; H2ED5)
Good health | 3 | In general, how is your health? Would you say... (H1GH1; H2GH1)
Talks to neighbours | 2 | In the past month, you have stopped on the street to talk with someone who lives in your neighborhood? (H1NB2; H2NB2)
Age | 5 | Age at interview, computed from date of birth, and date of interview (Constructed from IYEAR, IMONTH, IDAY, H1GI1Y, H1GI1M)
Live with mother | 2 | Indicator variable (Constructed from H1HR3A-T; H2HR4A-Q)
Live with father | 2 | Indicator variable (Constructed from H1HR3A-T; H2HR4A-Q)
Smoker | 4 | Frequency of smoking (Constructed from H1TO1/2/5; H2TO1/5)
Drinks alcohol | 4 | Frequency and amount of drinking alcohol (Constructed from H1TO12/15/18; H2TO15/19/22)
Exercises | 3 | Amount of exercise (Constructed from H1DA4/5/6; H2DA4-6)
Depressed | 2 | Rescaled CES-D, following (14) (Constructed from H1FS1-18; H2FS1-18)
Victim of violence | 2 | Indicator variable (Constructed from H1FV2-6; H2FV2-5)
Family bereavement | 3 | Number of bereavements (Constructed from H1NM2/F2, H1FP24A1-5; H2NM4/F4, H2FP28A1-3)
Strong academically | 4 | Quartiles (Constructed from H1ED11-4; H2ED7-10)
Drug user | 2 | Indicator variable (Constructed from H1TO30/34/37/41; H2TO44/50/54/58)
Family poor | 5 | Census Bureau measure of poverty (Constructed from H1HR2/3/7/8, PA55)
Parents unhappy together | 4 | (Parent asked.) Do you and your partner argue/talk of separating? (Constructed from PB19/20)
Parent drinks | 4 | (Parent asked.) Number/frequency of drinks (Constructed from PA61/2)
Householder smokes | 3 | (Parent asked.) Either parent or others in household smokes (Constructed from PA63/4)
Has learning disability | 2 | (Parent asked.) Does (he/ she) have a specific learning disability, such as difficulties with attention, dyslexia, or some other reading, spelling, writing, or math disability? (PC38)
Parents aid decisions | 5 | (Parent asked.) How often would it be true for you to make each of the following statements about {child's name}? {Child's name} and you make decisions about (his/ her) life together. (PC34B)

Table 2: The groupings of the variables that were used to determine constraints on the Bayesian networks.
Each variable in the analysis is either a Background variable, or from Wave I or Wave II of the Add Health study. Within each wave of the study, variables were further classified according to whether they asked about the short or the long term.

| Background | Wave I Long-term | Wave I Short-term | Wave II Long-term | Wave II Short-term |
|---|---|---|---|---|
| Female | Skips school | Househol. smokes | Seen shooting | Smoker |
| Age | Experiences prejudice | Smoker | Alcohol | Live with mother |
| Hisp/Latino | In physical fights | Live with mother | Drug user | Live with father |
| White | Didn’t pres. to doctor | Live with father | Mother warm/loving | Talks neighbours |
| Black/Af Am | Severely injured | Parent drinks | Have HIV/AIDS | Exercises |
| Am Ind/Nat Am | Have HIV/AIDS | Talks neighbours | Family bereavement | Depressed |
| Asian/Pac Isl. | Seen shooting | Exercises | Experiences prejudice | |
| Other race | Mother warm/loving | Depressed | Been expelled | |
| Has learning dis. | Been suspended | | Been suspended | |
| | Been expelled | | Victim of violence | |
| | Good health | | In physical fights | |
| | Alcohol | | Strong academically | |
| | Victim of violence | | Didn’t pres. to doctor | |
| | Family bereavement | | Skips school | |
| | Strong academically | | Severely injured | |
| | Drug user | | Good health | |
| | Family poor | | | |
| | Parents unhappy togth. | | | |
| | Parents aid decisions | | | |