Department of Psychology and Health Science Research Institute, Warwick Medical School, University of Warwick, Coventry
Department of Oral and Dental Science, University of Bristol
Department of Psychology, University of Warwick, Coventry
Department of Community-based Medicine, University of Bristol
Department of Child and Adolescent Psychiatry, Institute of Psychiatry, King's College London
Institute of Health Service Research, Peninsula College of Medicine and Dentistry, Exeter
Department of Psychology, University of Warwick, Coventry, UK
Correspondence: Dieter Wolke, Department of Psychology, University of Warwick, Coventry CV4 7AL, UK. Email: D.Wolke@warwick.ac.uk
The UK Medical Research Council, the Wellcome Trust and the University of Bristol provide core support for ALSPAC. This research was specifically funded by the Health Foundation to D.W., R.G., Jean Golding and Mike Beveridge (Grant 265/1981).
Background
Participant drop-out occurs in all longitudinal studies, and if systematic, may lead to selection biases and erroneous conclusions being drawn from a study.
Aims
We investigated whether drop-out in the Avon Longitudinal Study of Parents and Children (ALSPAC) was systematic or random, and if systematic, whether it had an impact on the prediction of disruptive behaviour disorders.
Method
Teacher reports of disruptive behaviour among currently participating, previously participating and never participating children aged 8 years in the ALSPAC longitudinal study were collected. Data on family factors were obtained in pregnancy. Simulations were conducted to explain the impact of selective drop-out on the strength of prediction.
Results
Drop-out from the ALSPAC cohort was systematic and children who dropped out were more likely to suffer from disruptive behaviour disorder. Systematic participant drop-out according to the family variables, however, did not alter the association between family factors obtained in pregnancy and disruptive behaviour disorder at 8 years of age.
Conclusions
Cohort studies are prone to selective drop-out and are likely to underestimate the prevalence of psychiatric disorder. This empirical study and the simulations confirm that the validity of regression models is only marginally affected despite range restrictions after selective drop-out.
Fig. 1 Description of ALSPAC sample: flow chart.
Procedures
During pregnancy, and annually since then, detailed information about the mothers and their partners has been collected via self-report questionnaire with regard to medication, symptoms, diet and lifestyle, attitudes and behaviour, and social–environmental features.5 From 4 weeks after the birth of the child, mothers completed questionnaires about the child's health, development and environment (biannually on average).
When the children were 7 years and 9 months, teachers were asked to complete the Development and Well-Being Assessment (DAWBA)6 as part of a study on disruptive behaviour disorders. The teacher version of the DAWBA is a brief structured questionnaire that covers the operationalised diagnostic criteria for the main disruptive behavioural disorders included in DSM–IV,7 namely oppositional defiant disorder, conduct disorder and ADHD. Thirty-nine cases were excluded where there were insufficient data from teachers for a diagnosis to be made.8
Data collection from teachers occurred over three academic years (1999, 2000 and 2001), with response rates varying from year to year. A minority of schools declined to participate (5%, 13% and 6% respectively) and some failed to respond to the invitation (17%, 37% and 16%), but the response rate from the schools who agreed to participate was high (80%, 99% and 80%) leading to an overall response rate of 62%, 50% and 63% for each year.
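The yearly overall response rates can be approximately reproduced from the other percentages in this paragraph. The sketch below assumes that the overall rate is the share of schools that agreed (100% minus those that declined or did not respond) multiplied by the response rate within agreeing schools; this decomposition is an assumption made for illustration, not a calculation described in the original text, and the function name is hypothetical.

```python
# Percentages reported for the three academic years (1999, 2000, 2001).
declined = [5, 13, 6]      # schools that declined to participate
no_reply = [17, 37, 16]    # schools that did not respond to the invitation
returned = [80, 99, 80]    # response rate within schools that agreed
reported_overall = [62, 50, 63]

def overall_response(declined_pct, no_reply_pct, returned_pct):
    """Assumed decomposition: share of agreeing schools x response within them."""
    agreed_pct = 100 - declined_pct - no_reply_pct
    return agreed_pct * returned_pct / 100

overall = [overall_response(d, n, r)
           for d, n, r in zip(declined, no_reply, returned)]

for year, value, reported in zip((1999, 2000, 2001), overall, reported_overall):
    print(year, round(value, 1), "reported:", reported)
```

Under this assumption the computed values agree with the reported rates to within about one percentage point; the small discrepancies are probably due to rounding of the published percentages.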
The following family-based risk factors were assessed during pregnancy: marital status (married v. single); education (any qualification v. no educational qualifications (i.e., no O-levels, professional qualifications or higher)); financial difficulties (yes v. no); family size (0–4; 5 or more children); smoking v. nonsmoking; critical partner relationship derived from the Family Adversity Index,9 (low affection and high aggression, physical or emotional cruelty, no partner social support v. not present); poor housing defects (a summary variable of three indicators: inadequacy; basic living; and defects/infestation present v. not present); crime (in trouble with police) or conviction of the mother or father (yes v. no); and psychopathology of the mother (affective disorder, suicide attempts v. none).
In addition, the child's gender and whether or not they were born prematurely (before 37 weeks' gestation) were recorded.
Statistical analysis
ALSPAC cohort
Data were collected on standardised forms that were returned to the study centre and encoded for computer analysis using SPSS 12.0 on a PC. The data for each child were double entered, checked and cleaned before being combined with the main data-set for analysis. The prevalence of disruptive behaviour disorders among current ALSPAC children was compared with that among never ALSPAC children and among previous ALSPAC children using categorical χ² tests (Question 1). Combining current and previous ALSPAC children provides an approximate estimate of the prevalence that would be found in the original ALSPAC cohort, excluding those who dropped out for whom we did not have teacher data. The prevalence of disruptive behaviour disorder in this total ALSPAC group was then compared with that in the never ALSPAC group.
To determine whether participant drop-out was random or systematic,
previous ALSPAC children were compared with current ALSPAC children on factors
previously shown to predict disruptive behaviour problems (Question
2).10–12
Categorical outcomes were compared using χ² tests, and
continuous outcomes with the use of Mann–Whitney tests for ordinal data.
To determine the independent factors best predicting drop-out, all precursors
were entered into multiple logistic regression (outcome: previous ALSPAC
v. current ALSPAC) and individually adjusted for all other precursor
variables. To answer whether prediction models are still valid despite
participant drop-out, univariate logistic regressions were computed separately
for the current ALSPAC and previous ALSPAC children employing factors
previously reported to predict disruptive behaviour disorder (Question 3). The
outcome was any disruptive behaviour disorder (ADHD and behaviour disorders
combined) v. no disruptive behaviour diagnosis. Individual factors
assessed in pregnancy and previously reported to predict disruptive behaviour
disorder (i.e. male gender,13 prematurity,14,15 socioeconomic disadvantage,10 smoking in pregnancy,11 critical partner relationship,16,17 parents' previous crime involvement18,19 or maternal psychopathology20)
were entered as predictors of any disruptive behaviour disorder v. no
disorder in separate univariate regression analyses for the current ALSPAC
participants (260 with a positive diagnosis v. 3712 with no positive
diagnosis) and previous ALSPAC participants (72 with a positive diagnosis
v. 1058 with no positive diagnosis). To determine statistical
difference in prediction, previous and current ALSPAC (factor: group
membership) were combined and the interaction between group membership and
individual predictor was computed. None of the interaction terms should be
statistically significant if the prediction model did not differ between
current and previous ALSPAC children.
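For a single binary predictor, the interaction test described above has a simple closed form: in a logistic model containing group, predictor and their product, the interaction coefficient equals the difference between the two group-specific log odds ratios, and its Wald variance is the sum of the reciprocals of the eight cell counts. The sketch below illustrates this in plain Python. The counts are HYPOTHETICAL and chosen only for illustration (they are not ALSPAC data), and the function names are not taken from the original analysis, which was run in SPSS.

```python
import math

def log_odds_ratio(exposed_cases, exposed_noncases,
                   unexposed_cases, unexposed_noncases):
    """Return the log odds ratio and its Wald variance for one 2x2 table."""
    log_or = math.log((exposed_cases * unexposed_noncases) /
                      (exposed_noncases * unexposed_cases))
    variance = sum(1 / n for n in (exposed_cases, exposed_noncases,
                                   unexposed_cases, unexposed_noncases))
    return log_or, variance

def interaction_z(table_a, table_b):
    """Wald z and two-sided p for the difference between two log odds ratios."""
    log_or_a, var_a = log_odds_ratio(*table_a)
    log_or_b, var_b = log_odds_ratio(*table_b)
    z = (log_or_a - log_or_b) / math.sqrt(var_a + var_b)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# HYPOTHETICAL counts, for illustration only (not ALSPAC data):
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
current_group = (80, 700, 180, 3000)
previous_group = (30, 300, 40, 750)

z, p = interaction_z(current_group, previous_group)
print(f"interaction z = {z:.2f}, p = {p:.2f}")
```

A non-significant z indicates that the predictor's odds ratio does not differ detectably between the two groups, which is the pattern expected if drop-out left the prediction model intact.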
Simulations
A series of 36 simulations was carried out to explore the impact of
selective participant drop-out on the prediction of Y (disruptive behaviour)
from a predictor X. Of primary interest were simulations in which drop-out and
disruptive behaviour were predicted by the same factor (X) (i.e. the drop-out
occurred by selection on a predictor X in regression) and the degree of
selection was varied between simulations. In each simulation, we generated a
sample of 5000 cases, which was then subjected to a drop-out process. Each
case i was characterised by a predictor value Xi
and a criterion value Yi, such that X and Y approximated a
bivariate standard normal distribution in the sample. The correlation between
X and Y varied between simulations, in the range of 0.1–0.9, in steps of
0.1 (note that, because the variables were standardised, the Pearson
correlation coefficient is identical to the linear regression coefficient in
an ordinary least-squares model). For each correlation level, we simulated
four stochastic drop-out processes, which differed in selectivity (although
keeping the overall drop-out rate constant). We used the following drop-out
rule:
$$\pi_i = \frac{1}{1 + \exp(-X_i/\alpha)}$$
where π_i was the probability that case i was dropped from the sample, and α was a scaling parameter that was manipulated between simulations. The general form of this logistic rule is shown in Fig. 2. For each value of α, the expected proportion of dropped cases is 0.5. In all the simulations, the proportion of dropped cases was within the 0.49–0.51 range. The drop-out process was more selective (i.e. dependent on the value of X) for lower values of α. Across a typical simulated 5000-case sample, the point-biserial correlations between X and a binary drop-out indicator were 0.10, 0.42, 0.61 and 0.78 for α values of 5, 1, 0.5, and 0.1 respectively, confirming the high selectivity of drop-out for the lower values of α.
Fig. 2 Probability of dropping out (π) as a function of X, for different values of α.
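The drop-out process described in this section can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the authors' code: it assumes a logistic rule in which the drop-out probability rises with X and is scaled by a parameter called `alpha` here, and it generates Y as a linear function of X plus normal noise. The seed, the chosen correlation of 0.5 and all function names are arbitrary choices made for the example.

```python
import math
import random

def simulate_sample(n, rho, rng):
    """Draw n (x, y) pairs from a bivariate standard normal with correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs

def drop_probability(x, alpha):
    """Assumed logistic rule: higher x gives a higher chance of dropping out."""
    z = -x / alpha
    if z > 700:  # guard against overflow in exp for extreme values
        return 0.0
    return 1 / (1 + math.exp(z))

def pearson(a, b):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(2009)  # arbitrary seed for reproducibility
sample = simulate_sample(5000, 0.5, rng)
xs = [x for x, _ in sample]

summary = {}
for alpha in (5, 1, 0.5, 0.1):
    dropped = [1 if rng.random() < drop_probability(x, alpha) else 0 for x in xs]
    summary[alpha] = (sum(dropped) / len(dropped), pearson(xs, dropped))
    print(alpha, summary[alpha])
```

Each entry of `summary` holds the proportion of dropped cases and the point-biserial correlation between X and the drop-out indicator, so the output can be compared directly with the values reported above.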
Table 1 Prevalence of disruptive behaviour disorder diagnoses according to cohort
Is drop-out selective or random?
The comparisons between the current and previous ALSPAC children are shown in Table 2. Drop-out from ALSPAC was systematically related to being raised in a large family and to having a mother who was single, had no educational qualifications, encountered financial difficulties, smoked, had a poor relationship with her partner, lived in poor housing, had been involved in crime or been convicted, or suffered psychopathology during pregnancy. When prediction was adjusted for all other factors, being single (odds ratio (OR) = 1.45, 95% CI 1.19–1.77), family size (OR = 3.17, 95% CI 1.55–6.46), smoking (OR = 1.41, 95% CI 1.15–1.73), no educational qualifications (OR = 1.35, 95% CI 1.07–1.71) and financial difficulties (OR = 1.39, 95% CI 1.07–1.81) remained significant independent predictors of drop-out.
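As a reader's consistency check on the adjusted odds ratios above, the standard error and Wald z statistic implied by each confidence interval can be backed out, assuming the intervals are conventional Wald intervals computed on the log-odds scale (an assumption; the text does not state the interval method). Under that assumption the geometric mean of the two limits should reproduce the reported odds ratio. The function name is hypothetical.

```python
import math

# Adjusted odds ratios and 95% CIs as reported in the text: (OR, lower, upper).
reported = {
    "single": (1.45, 1.19, 1.77),
    "family size": (3.17, 1.55, 6.46),
    "smoking": (1.41, 1.15, 1.73),
    "no educational qualifications": (1.35, 1.07, 1.71),
    "financial difficulties": (1.39, 1.07, 1.81),
}

def wald_summary(odds_ratio, lower, upper, z_crit=1.96):
    """Back out the log-odds SE and Wald z implied by an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    midpoint = math.sqrt(lower * upper)  # geometric mean of the CI limits
    return se, z, midpoint

summaries = {name: wald_summary(*values) for name, values in reported.items()}
for name, (se, z, midpoint) in summaries.items():
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, CI midpoint = {midpoint:.2f}")
```

For all five predictors the geometric midpoint of the interval matches the reported odds ratio to within rounding, and each implied z exceeds 1.96, consistent with the statement that these predictors remained significant.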
Table 2 Prediction of drop-out (current v. previous ALSPAC participants)
Does drop-out reduce the validity of prediction of disruptive behaviour disorder?
Disruptive behaviour prediction with the ALSPAC data
The same variables that were related to the drop-out process were used as
predictors for the disruptive behaviour disorder criterion. The individual
predictors and the magnitude of prediction were very similar for the previous
and current ALSPAC groups. Teacher-reported disruptive behaviour disorder in
middle childhood was more likely when parents had low education, financial
difficulties or critical partner relationships, when the mother had
psychopathology or smoked in pregnancy, and for boys
(Table 3). There were no
significant interactions between group membership (previous ALSPAC v.
current ALSPAC) and individual predictors (e.g. financial difficulties) when
predicting the presence or absence of disruptive behaviour disorder, i.e. the
same predictive model seemed to apply equally well to previous and current
ALSPAC participants.
Table 3 Simple univariable prediction of disruptive behaviour disorder for the current ALSPAC and previous ALSPAC children (those who have dropped out) using factors assessed during pregnancy
The simulations
Figure 3 gives an overview of the observed correlations between X and Y before and after the drop-out process in the simulations in which drop-out was selective on X. The results show that the drop-out process related to X has an effect on the correlations between X and Y. In all simulations the correlation between X and Y was reduced, and the suppression effect was somewhat larger for the more selective drop-out processes (i.e. in those simulations in which α was small).
Fig. 3 Correlation between predictor X and criterion Y before and after drop-out, as a function of α.
Figure 4 gives an example from a single simulation (r = 0.90 before drop-out, α = 0.1). The plot shows that the variance in the sample was reduced on both predictor (X) and criterion variable (Y). However, the non-standardised slope of the best-fitting regression line was practically unaltered by the drop-out process. The correlation (which corresponds to the standardised regression coefficient) was reduced from 0.90 to 0.78 after drop-out, as can be seen in Fig. 4.
Fig. 4 Simulated effect of selective drop-out according to the predictor variable X on least-squares linear regression model. X = predictor, Y = criterion. (a) before drop-out and (b) after drop-out.
The simulations demonstrate that selection on X in a regression has the effect of reducing the variance in X (and Y) and attenuates the correlation between X and Y. As shown here, the effects of selective drop-out in X on predictor–criterion correlation (and, by implication, regression) can be relatively small, even under a highly selective drop-out regime. Range restriction as a result of selective drop-out does not necessarily affect the validity of a regression model, although it can lead to underestimation of the criterion–predictor correlation.
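The contrast between an almost unchanged slope and an attenuated correlation can be checked with a small simulation. The sketch below is illustrative only and is not the authors' code: it assumes a logistic drop-out rule that depends on X alone, with a scaling value of 0.1, a pre-drop-out correlation of 0.90, and an arbitrary seed; the function names are hypothetical.

```python
import math
import random

def fit_line(pairs):
    """Return (OLS slope of y on x, Pearson correlation) for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

def retain_after_dropout(pairs, alpha, rng):
    """Keep each case with probability 1 - p, where p is logistic in x / alpha."""
    kept = []
    for x, y in pairs:
        p_drop = 1 / (1 + math.exp(-x / alpha))
        if rng.random() >= p_drop:
            kept.append((x, y))
    return kept

rng = random.Random(1)  # arbitrary seed
rho = 0.9
full = []
for _ in range(5000):
    x = rng.gauss(0, 1)
    full.append((x, rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)))

kept = retain_after_dropout(full, 0.1, rng)
slope_before, r_before = fit_line(full)
slope_after, r_after = fit_line(kept)
print(f"slope: {slope_before:.2f} -> {slope_after:.2f}")
print(f"r:     {r_before:.2f} -> {r_after:.2f}")
```

Because selection here depends only on X, the conditional distribution of Y given X is untouched, so the fitted slope should stay close to its original value while the correlation falls as the spread of X shrinks.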
It is important to note that drop-out by selection on the criterion
variable (Y) can have a very different effect on the regression coefficients.
Figure 5(a) shows an example,
based on the same original simulated sample as in
Fig. 4 (r = 0.90 before drop-out, α = 0.1). Figure 5(a) shows that both the regression and correlation coefficient
were reduced as a result of drop-out of participants with higher scores on the
criterion variable (r = 0.79 after drop-out).
Figure 5(b) provides an example
in which there was selective drop-out on both the predictor (X) and the
criterion variable (Y), with participants that scored highly on both variables
more likely to drop out. Drop-out that was selective on both variables
suppressed the regression coefficient (but less so than in the example in
which drop-out was selective on the criterion only) and also reduced the
correlation between predictor and criterion (r = 0.77 after
drop-out).
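Selection on the criterion can be explored with the same kind of sketch. The code below is an illustration under stated assumptions rather than the authors' procedure: drop-out follows a logistic rule with a scaling value of 0.1, applied either to Y alone or, for the combined case, to the mean of X and Y. The combined rule in particular is an assumption, since the text does not specify how selection on both variables was implemented, and the function names and seed are arbitrary.

```python
import math
import random

def slope_and_r(pairs):
    """Return (OLS slope of y on x, Pearson correlation) for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

def keep(pairs, score, alpha, rng):
    """Retain cases; drop-out probability is logistic in score(x, y) / alpha."""
    kept = []
    for x, y in pairs:
        p_drop = 1 / (1 + math.exp(-score(x, y) / alpha))
        if rng.random() >= p_drop:
            kept.append((x, y))
    return kept

def run(n=5000, rho=0.9, alpha=0.1, seed=3):
    """Simulate one sample and summarise it under two assumed drop-out rules."""
    rng = random.Random(seed)
    full = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        full.append((x, rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)))
    return {
        "before": slope_and_r(full),
        "drop on Y": slope_and_r(keep(full, lambda x, y: y, alpha, rng)),
        # Assumed combined rule: selection on the mean of X and Y.
        "drop on X and Y": slope_and_r(
            keep(full, lambda x, y: (x + y) / 2, alpha, rng)),
    }

results = run()
for label, (slope, r) in results.items():
    print(f"{label}: slope = {slope:.2f}, r = {r:.2f}")
```

Under these assumptions the slope is pulled down most when drop-out depends on Y alone and less when it depends on both variables, in line with the pattern described for Fig. 5.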
Fig. 5 Simulated effect of selective drop-out (a) after drop-out according to the criterion variable Y on least-squares linear regression model and (b) after drop-out according to the predictor variable X and criterion variable Y on least-squares linear regression model. X = predictor; Y = criterion.
Selective drop-out and prevalence
Drop-out was considerable, with teacher returns on 37% (5115/13 971) of
those believed to be alive or 49% (5119/10 431) of those eligible to be
contacted. We only consider here the response to one particular assessment
during the eighth year of life of the child. The participation rate is higher
for any contact in a given year, whether for face-to-face assessments or other
questionnaires.5
Overall, the follow-up rate is similar to recent comparable large-scale
longitudinal studies with repeated
assessments.23,24
In general, participation rates are higher in older cohorts enrolled some
decades
ago,25,26
for studies focused on specific high-risk samples in the first
place2,27
or for samples that were small and
selective.28
The attrition from the sample we studied was systematically related to family characteristics, which supports the conclusions of previous work3,4,27 that psychosocial factors are associated with attrition in longitudinal studies. The selective drop-out of participants had an impact on the prevalence of teacher-reported disruptive behaviour disorders, with the prevalence among children who were still participating being approximately half that of children who had dropped out. The factors that influenced retention in the ALSPAC sample also influenced the likelihood of disruptive behaviour disorder, i.e. the missingness was non-ignorable.29 Longitudinal studies are likely to underestimate the prevalence and incidence of disorders as shown here and elsewhere.30 Cross-sectional studies requiring only a single assessment are likely to be more accurate in estimating prevalence.31
Selective drop-out and prediction
Finally, we investigated whether selective drop-out of participants does
reduce the validity of prediction from longitudinal analysis. Prospective
studies can only rely on the data of the individuals who continue to
participate or they have to estimate missing data using sophisticated missing
value substitution modelling and
imputations.32,33
To our knowledge, this is the first investigation that could compare the
prediction of outcomes of current and previous participants in a prospective
study. We found that selective drop-out of participants according to a range
of predictor variables did not invalidate the prediction of teacher-reported
disruptive behaviour disorders by factors that were assessed as early as
pregnancy and birth that have previously been shown to predict these
difficulties.10–12
Boys were significantly more likely to develop teacher-reported disruptive
behaviour disorder, as were the children of mothers who suffered
psychopathology or smoked during pregnancy, who had poor partner relationships
or who were single, poorly educated or suffered financial
hardship.34 These
same predictions were found for those who were still participating in the
ALSPAC study as well as for those who had dropped out. Despite reduction to a
super-normal current ALSPAC sample, the predictive factors and their strength
were about the same as for the previous participants. Contrary to common
assumptions,21,35
the presence of a substantial selection bias did not markedly attenuate the
relationship between exposure and outcome in this study. Although prevalence
rates do have an impact on statistical power, differences in prevalence
per se did not alter prediction in this instance. Similarly, Moffitt
and colleagues13
investigated factors suspected to predict disruptive behaviour disorders in a
sample of approximately 1000 children, half of them girls. They found that,
despite girls being much less likely to develop disruptive behaviour disorder
(low prevalence), the same factors predicted disruptive behaviour problems in
both girls and boys.
We conducted simulations to explain why the effects of selective participant drop-out on predictor–criterion correlation (and regression) were relatively small in our empirical study. We found that a range of social and parental variables previously described as precursors of disruptive behaviour disorder in children affected the drop-out process. The simulations confirmed that if the selection is on X in a regression, the effect is one of reducing the variance in X (and Y), not affecting the regression but attenuating the correlation between X and Y (see Berk,36 p. 389). Our simulations add that even under a highly selective drop-out regime related to X, the overall reduction in the correlations is small to moderate (Fig. 3). Therefore, range restriction as a result of selective drop-out according to X does not affect the internal or external validity of the regression model,36 although the correlation coefficient after selective drop-out may underestimate the true correlation between the predictor and criterion variable.
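The size of this attenuation can be illustrated with the standard formula for direct range restriction. The derivation assumes that selection acts on X only and that the relation between X and Y is linear with constant residual variance; the symbols r′ (correlation after selection) and u (ratio of the retained to the original standard deviation of X) are introduced here for illustration and are not part of the original analysis. Because the slope and the residual variance are unchanged by selection on X, the correlation after selection is

$$r' = \frac{r\,u}{\sqrt{1 - r^2 + r^2 u^2}}$$

In the limiting case in which only cases on one side of the mean of a standard normal X are retained, u² ≈ 0.363, so for r = 0.90

$$r' = \frac{0.90 \times 0.603}{\sqrt{0.19 + 0.81 \times 0.363}} \approx 0.78$$

which agrees with the reduction from 0.90 to 0.78 seen in the most selective simulation, while the unstandardised slope remains at its original value.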
In our empirical ALSPAC study, we see little evidence that teachers selectively underreported on children with disruptive behaviour. It seems unlikely that teachers would have been less likely to report on those with more disruptive behaviour since teachers are usually well aware of those who disturb lessons.37 Nevertheless, we carried out a second set of simulations (examples shown in Fig. 5(a) and (b)) that showed that if selection on the criterion (Y) had occurred (i.e. if those with high disruptive behaviour disorder were less likely to be included in the sample), then the regression would be attenuated and the original regression line would no longer fit the data. Although confirming that the internal and external validity are weakened in these circumstances36 and the true relationship between X and Y is systematically underestimated, our simulation also demonstrated that when drop-out is influenced by the predictor as well as the criterion variable, this only mildly reduces estimates of the slope of the true regression line.
We conclude that the regression coefficients hold for the current, previous and entire cohort because, despite selection bias on X (and thus restricted range), the differences between the current and previous groups with disruptive behaviour disorder are small. Where the predictor variables have small to moderate (linear) associations with both the drop-out of participants and the outcome variable, the impact on the predictor–criterion regression is small. However, if the drop-out process is dependent on the criterion variable (e.g. high scorers systematically excluded), then internal and external validity are threatened and the true relationship between predictor and criterion can no longer be estimated reliably. Internal and external validity are particularly under threat where the selection process follows a complex pattern (e.g. with dependencies on several variables or non-monotonic dependencies; see Berk36 for a full discussion).
Limitations and conclusions
There are limitations to our study. Even fewer teachers than parents
completed the diagnostic instrument in the current sample and this itself
could have introduced bias. For example, teachers may have been more likely to
complete the DAWBA in well-organised affluent schools. However, as these
schools are also likely to have a lower prevalence of disruptive behaviour
disorders, and possibly better strategies of managing them, this would be
likely to lower prevalence across all three groups. Teachers would not have
been aware of which children were or had been participants in ALSPAC when they
completed measures on all the children in their class. Our diagnosis of a
disruptive behaviour disorder was based only on teacher reports, although the
limitation of having just one informant is partly offset by the fact that
teacher reports are particularly informative for diagnoses for externalising
disorders.37
Nevertheless, our findings may not be applicable to diagnoses of a disruptive
behaviour disorder based on parent data, self-report data or multi-informant
data, or indeed to other outcomes within this or other studies.
In conclusion, participant loss in the ALSPAC cohort was systematic, with children with teacher-reported disruptive behaviour disorder being more frequently lost to follow-up. Our results suggest that longitudinal studies are likely to underestimate the prevalence and incidence of disorders,4 but that this might not negate findings in relation to the predictors of disorder if selection occurs according to the predictor variables. Our results need replication in relation to other cohorts and other outcomes. However, the simulations indicate that despite highly selective drop-out as a result of X and reduced range in both predictor and outcome variables, the regression parameter estimates are only mildly affected. Our demonstrations do not imply that selective drop-out is always harmless. For instance, selective drop-out effects can have significant implications if the selection is according to the outcome variable, if the drop-out process is complex or incidental36 or there is a non-linear relation between predictor(s) and criterion. In such cases, explicit modelling of the drop-out process (e.g. Diggle & Kenward38 and Little39) might help to clarify the implications of drop-out for model validity. Nevertheless, although everything should be done to reduce participant loss in cohort studies,40,41 it is reassuring to find that aetiological models from longitudinal samples can be valid and robust under specific conditions of selective loss of participants.