Background Worldwide, 340 million people are affected by depression, which carries high comorbidity and high social and economic costs.
Aims To identify potential predictors of effect in prevention programmes.
Method A meta-analysis was conducted of 69 programmes designed to reduce depression or depressive symptoms.
Results The weighted mean effect size was 0.22. Programmes were effective for different age groups and different levels of risk, and in reducing risk factors and depressive or psychiatric symptoms. Programmes with larger effect sizes were multi-component, included competence techniques, had more than eight sessions, had sessions 60–90 min long, had a high quality of research design and, in targeted programmes, were delivered by a health care provider. Older people benefited from social support, whereas behavioural methods were detrimental.
Conclusions An 11% improvement in depressive symptoms can be achieved through prevention programmes. Single trial evaluations should ensure high quality of the research design and detailed reporting of results and potential predictors.
Unipolar major depression is predicted to become the second leading cause of disease burden worldwide by the year 2020, accounting for 12% of disability-adjusted life-years (Murray & Lopez, 1996). Depression is comorbid with other psychiatric disorders, including substance use disorders (Rohde et al, 1991; Gilvarry, 2000), and anxiety and personality disorders (Andrews et al, 2001). Personality, low competence, vulnerability to stress, and marital, occupational, financial and neighbourhood stressors are some of the risk factors for depression, and can be used to identify potential target groups for prevention (Gillham et al, 2000).
Can depression be prevented?
Since the 1980s there has been increased development and implementation of universal, selective and indicated programmes (Mrazek & Haggerty, 1994) that aim to reduce risk factors for depression, depressive symptoms and depressive disorders. Universal prevention interventions target a whole population group that has not been identified on the basis of increased risk; selective prevention targets subgroups of the population whose risk of developing a mental disorder is significantly higher than average, as evidenced by biological, psychological or social risk factors; indicated prevention targets high-risk persons who are identified as having minimal symptoms foreshadowing mental disorder, or biological markers indicating predisposition for mental disorder, but who do not meet diagnostic criteria for disorder at that time. Although there is substantial evidence that depressive symptoms can be reduced (Muñoz et al, 1993; Gillham et al, 2000), only a few programmes have shown that depression can be prevented (Clarke et al, 1995, 2001). This meta-analysis aims to identify some of the population, programme and research design characteristics that predict the effect of primary prevention programmes targeting depression.
Search procedures and inclusion criteria
Trials were identified through systematic literature searches in four databases (Current Contents, ERIC, Medline, PsycINFO), published meta-analyses, review articles, reference lists and through contact with members of the Society for Prevention Research. Key words for the literature searches were divided into five groups (details available from the author upon request). Two groups aimed to identify studies dealing with depression and mental health: resilience and other protective mental health factors related to depression (e.g. MASTERY, SELF-ESTEEM), and specific risk factors for depression (e.g. NEGATIVE THINKING, DYSTHYMIA). Three groups were defined to restrict the hits to the inclusion criteria: prevention and outcome measures (e.g. UNIVERSAL, SELECTIVE, DEPRESSIVE SYMPTOMS), evaluation (e.g. EFFICACY, CONTROL GROUP) and research design (e.g. RANDOMISED CONTROLLED TRIAL).
Studies were selected within the definitions of universal, selective and indicated prevention, excluding any pharmacological intervention, if they included the prevention of depression as a primary or secondary goal or outcome, the improvement of protective factors for depression or mental health (e.g. self-esteem) or the reduction of risk factors related to depression (e.g. negative thinking). Selection was restricted to English-language publications between the years 1985 and 2000 that could be retrieved through the library system, had either a randomly allocated control or equivalent comparison group, had pre–post measures, had objective outcome measures and had sufficient statistical information to calculate an effect size. From the selected trials, only those that had depressive symptoms or incidence of depression as an outcome measure were included for analysis.
Coding system and procedures
A coding instrument was developed to operationalise programme outcomes and hypothesised effect predictors. The coding instrument included trial descriptors, target group characteristics, programme characteristics, programme development characteristics, implementation characteristics, quality of the research design, and outcome indicators. The coding system comprised a code book with codes for each variable, a coding sheet to operationalise the information, and a coding instructions book with definitions and instructions (details available from the author upon request).
A trained coder and E.J.-L. undertook the coding process. Measures were taken to minimise bias, such as training the coders (Cooper & Hedges, 1994). A random sample of one in five trials was double-coded to assess interrater reliability. The kappa coefficient averaged across codes was 0.91, indicating excellent agreement beyond chance between the two coders (Cooper & Hedges, 1994). Data were entered into a Statistical Package for the Social Sciences (SPSS, Version 10) data file, then checked and cleaned to control for data entry and coding errors.
Calculation of effect sizes and weighted effect sizes
An effect size estimate using the standardised mean difference (Hedges & Olkin, 1985: p. 79, formula 3) was calculated from the published data for every outcome measure reported, corrected for pre-test measures and small sample sizes (Lipsey & Wilson, 2001). Positive effect sizes indicate improvement for the intervention group. Each effect size was weighted based on the inverse of its variance (Hedges & Olkin, 1985: p. 86, formula 15). For multiple programmes within one study, weights were calculated according to Gleser & Olkin (1994: p. 346, formula 22.13). Weighted mean effect sizes (Hedges & Olkin, 1985: p. 111, formula 6) and 95% confidence intervals were calculated (Hedges & Olkin, 1985: p. 86, formula 16).
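The effect size steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact procedure used in the study: the pre-test correction is assumed to subtract the baseline standardised difference from the post-test one, the sign is oriented so that lower symptom scores in the intervention group give a positive effect size, and the `Arm` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    """Summary statistics for one trial arm at one time point (hypothetical)."""
    mean: float
    sd: float
    n: int


def smd(treat: Arm, ctrl: Arm) -> float:
    """Standardised mean difference using the pooled SD.

    Assumption: outcome is a symptom scale where lower is better, so a
    lower intervention mean yields a positive value (improvement).
    """
    pooled_var = ((treat.n - 1) * treat.sd ** 2 + (ctrl.n - 1) * ctrl.sd ** 2) / (
        treat.n + ctrl.n - 2
    )
    return (ctrl.mean - treat.mean) / math.sqrt(pooled_var)


def hedges_g(pre_t: Arm, pre_c: Arm, post_t: Arm, post_c: Arm):
    """Pre-test-adjusted, small-sample-corrected effect size and its variance."""
    n1, n2 = post_t.n, post_c.n
    # Assumed pre-test correction: remove any baseline difference.
    d = smd(post_t, post_c) - smd(pre_t, pre_c)
    # Hedges' small-sample correction factor J.
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = j * d
    # Large-sample sampling variance of the standardised mean difference.
    var = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var


def weighted_mean(effects):
    """Inverse-variance weighted mean effect size with a 95% CI.

    effects: list of (effect_size, variance) pairs.
    """
    weights = [1 / v for _, v in effects]
    total_w = sum(weights)
    mean = sum(w * g for w, (g, _) in zip(weights, effects)) / total_w
    se = math.sqrt(1 / total_w)
    return mean, (mean - 1.96 * se, mean + 1.96 * se)
```

Each programme's `(g, var)` pair would then be passed to `weighted_mean` to obtain the pooled estimate and its confidence interval.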
Unit of analysis and sample heterogeneity
The unit of analysis in this study is at the programme level, where a programme was defined as an intervention with a preventive goal and a measure of depressive symptoms. When the efficacy of more than one programme within a study was compared with a control condition, the different interventions were treated independently and an effect size was calculated for each (Gleser & Olkin, 1994: p. 346, formula 22.13). Effect sizes were averaged within each programme across outcome measures and follow-up times, resulting in one effect size per programme.
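Collapsing several outcome measures and follow-up points into a single effect size per programme can be sketched as a simple unweighted average. The record layout `(programme_id, outcome_name, follow_up_months, effect_size)` and the function name are hypothetical, and the sketch does not reproduce the Gleser & Olkin adjustment for programmes that share a control group.

```python
from collections import defaultdict
from statistics import mean


def one_effect_per_programme(records):
    """Average all effect sizes belonging to the same programme.

    records: iterable of (programme_id, outcome_name, follow_up_months,
    effect_size) tuples (hypothetical layout). Outcome name and follow-up
    time are ignored: every measure and time point counts equally.
    """
    by_programme = defaultdict(list)
    for programme_id, _outcome, _follow_up, es in records:
        by_programme[programme_id].append(es)
    return {pid: mean(values) for pid, values in by_programme.items()}
```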
The Q statistic was calculated to test for heterogeneity (Hedges & Olkin, 1985). The sample was heterogeneous (Q=474.72, d.f.=69, P<0.001) and characteristics of the programmes, target groups and research methodology were examined as independent variables to account for this heterogeneity (Lipsey & Wilson, 2001).
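A minimal sketch of the homogeneity statistic, assuming fixed-effect inverse-variance weights: Q is the weighted sum of squared deviations of each programme effect size from the weighted mean, referred to a chi-square distribution with k − 1 degrees of freedom for k effect sizes. The function name is hypothetical.

```python
def q_statistic(effects):
    """Hedges & Olkin Q test of homogeneity (fixed-effect weights).

    effects: list of (effect_size, variance) pairs, one per programme.
    Returns (Q, degrees_of_freedom).
    """
    weights = [1 / v for _, v in effects]
    mean = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    # Weighted squared deviations from the weighted mean effect size.
    q = sum(w * (g - mean) ** 2 for w, (g, _) in zip(weights, effects))
    return q, len(effects) - 1
```

A P value can then be obtained from the chi-square upper tail, for example with `scipy.stats.chi2.sf(q, df)`.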
Predictors in the study
Gender was coded as a continuous variable, as the percentage of male participants in the programmes. Initial level of risk was defined as universal, selective or indicated (Mrazek & Haggerty, 1994). The duration of programmes was coded in months, and the length of individual programme sessions in minutes. The quality of the research design was assessed with the Cochrane nine-item dichotomous scale (items scored 1 or 0); high-quality programmes were considered to be those with a score of 8 or above (Brown et al, 2000). Intervention methods were classified into one of five groups: behaviour (e.g. behaviour change, pleasant activities, modelling); cognition (e.g. cognitive restructuring, counselling, explanatory-style training); competence (e.g. broad skill training, social resistance skills); education (e.g. direct instruction, lectures and workshops); and social support (e.g. network building, fostering socialisation). Programme providers were divided into healthcare personnel (physical and mental health professionals) and lay personnel (peers, family members, schoolteachers).
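The dichotomisation of research quality and the classification of intervention methods can be expressed as simple coding rules. In this hypothetical sketch, each programme's nine Cochrane items are supplied as a list of 0/1 scores (the item wording is not reproduced), and the `Method` enum mirrors the five method groups listed above. The threshold of three or more method types for a multi-component programme is taken from the results on methods and techniques.

```python
from enum import Enum


class Method(Enum):
    """The five intervention method groups used for coding."""
    BEHAVIOUR = "behaviour"
    COGNITION = "cognition"
    COMPETENCE = "competence"
    EDUCATION = "education"
    SOCIAL_SUPPORT = "social support"


def quality_score(items):
    """Sum of the nine dichotomous (0/1) Cochrane quality items."""
    if len(items) != 9 or any(item not in (0, 1) for item in items):
        raise ValueError("expected nine items scored 0 or 1")
    return sum(items)


def is_high_quality(items):
    """High-quality design: a quality score of 8 or above."""
    return quality_score(items) >= 8


def is_multi_component(methods, threshold=3):
    """True if the programme uses at least `threshold` distinct method groups."""
    return len(set(methods)) >= threshold
```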
Statistical analyses to test hypotheses
The z-scores were used as significance tests to compare the weighted mean effect sizes of the different values of the categorical independent variables. Weighted least squares regression analyses were used to identify possible relationships and interactions between predictors in explaining the variation in effect sizes between studies for continuous and categorical independent variables, the latter recoded to dummy variables. Each programme effect size was weighted by its inverse-variance weight as defined above. Regression coefficients were obtained and the adjusted R2 was used to measure the proportion of variance accounted for by an independent variable. The standard errors of the regression coefficients (B) were corrected according to Hedges & Olkin (1985: p. 174) and used in a z test. Separate regression models were built, first testing for main effects and then for interaction effects of the target group characteristics gender, age and level of risk, which had been identified in previous research as possible moderators of effect (Price et al, 1992; Gillham et al, 2000). Scatter plots and plots of residuals found no evidence for violation of regression model assumptions. All the tests for statistical significance were two-tailed.
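For a single continuous predictor (for example, percentage of male participants), the weighted least squares step and the Hedges & Olkin standard error correction can be sketched without any libraries. The assumption here is a fixed-effect model with inverse-variance weights: general-purpose WLS software scales the slope variance by the weighted residual mean square (MSE), and dividing that standard error by √MSE recovers the fixed-effect standard error √(1/Σw(x − x̄)²), which is then used in a z test. Names are hypothetical, and the sketch does not cover dummy-coded categorical predictors, interaction terms or the adjusted R2.

```python
import math


def wls_meta_regression(x, g, v):
    """Fixed-effect WLS regression of effect sizes g on one predictor x.

    x: predictor values (e.g. percentage of male participants)
    g: programme effect sizes
    v: sampling variances of the effect sizes (weights are 1 / v)
    """
    w = [1 / vi for vi in v]
    sw = sum(w)
    # Weighted means of predictor and outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (gi - g_bar) for wi, xi, gi in zip(w, x, g))
    slope = sxy / sxx
    intercept = g_bar - slope * x_bar
    # Weighted residual mean square, as general-purpose WLS software reports it.
    k = len(g)
    resid = [gi - (intercept + slope * xi) for xi, gi in zip(x, g)]
    mse = sum(wi * e ** 2 for wi, e in zip(w, resid)) / (k - 2)
    se_software = math.sqrt(mse / sxx)
    # Corrected SE equals se_software / sqrt(mse), i.e. sqrt(1 / sxx).
    se_corrected = math.sqrt(1 / sxx)
    z = slope / se_corrected
    # Two-tailed P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"intercept": intercept, "slope": slope, "se_software": se_software,
            "se_corrected": se_corrected, "z": z, "p": p}
```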
Trial flow and distribution of effect sizes
The searches identified 1474 publications. Screening of titles and abstracts against the inclusion criteria reduced the number of publications to 201. Detailed inspection of these 201 publications reduced the sample to 108 studies that reported sufficient information to calculate an effect size (excluding those with missing data) and that had either a randomly allocated control or equivalent comparison group. Of these 108 studies, only 54 trials reported an outcome measure for depressive symptoms, 11 of which compared more than one type of programme, resulting in 69 programmes for analysis. Table 1 provides a description of the included programmes. The effect sizes of the 69 programmes were normally distributed. The unweighted mean effect size was 0.25 (range −1.08 to 1.8; 95% CI 0.16–0.35) and the weighted mean effect size was 0.22 (95% CI 0.14–0.30).
About a quarter of the programmes (16) targeted children, 9 targeted adolescents, almost a half (32) were aimed at adults and 12 were for older people (Table 2). There was no significant difference in effect size between the different age groups. There was also no significant difference in effect size between universal, selective and indicated programmes. Of the 63 programmes in which gender distribution was specified, weighted least squares regression analyses indicated a direct positive relationship between percentage of male participants and effect size (Table 3). There was an interaction between percentage of male participants and level of risk, so that the relationship between percentage of males in the programme and effect size was present for universal and selective programmes, but not for indicated programmes (Table 3).
Programmes with more than eight sessions were significantly better than those with eight sessions or fewer. Programmes with session lengths of 60–90 min were significantly better than those with sessions lasting less than 60 min or longer than 90 min. No significant difference was found for duration of programmes or distribution of sessions (Table 4).
There were 102 programme providers reported in 62 programmes (Table 5). Programmes that used a combination of health care professionals and lay personnel had the largest effect sizes. Programmes provided by health care professionals (physical and mental health personnel) and those provided by both health care and lay personnel yielded significantly larger effect sizes than programmes provided by lay personnel alone. Programmes provided by health care professionals had larger effect sizes than programmes run by lay personnel only for selective (z=2.04, P=0.045) and indicated populations (z=2.37, P=0.016), but the difference was not significant for universal populations (z=1.90, P=0.057).
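Comparisons between subgroups, such as health care versus lay providers, rest on a z test of two weighted mean effect sizes. The sketch below assumes independent subgroups, fixed-effect inverse-variance weights and a two-tailed P value from the normal distribution. The function names are hypothetical, and the sketch is not intended to reproduce the exact z and P values reported above.

```python
import math


def subgroup_mean(effects):
    """Inverse-variance weighted mean and its standard error for one subgroup.

    effects: list of (effect_size, variance) pairs.
    """
    weights = [1 / v for _, v in effects]
    total_w = sum(weights)
    mean = sum(w * g for w, (g, _) in zip(weights, effects)) / total_w
    return mean, math.sqrt(1 / total_w)


def compare_subgroups(group_a, group_b):
    """z test for the difference between two independent weighted means.

    Returns (z, two-tailed P).
    """
    mean_a, se_a = subgroup_mean(group_a)
    mean_b, se_b = subgroup_mean(group_b)
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```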
Methods and techniques
Programmes that involved a competence enhancement component yielded the largest effect sizes, whereas programmes including behavioural methods yielded the lowest effect sizes (Table 6). When analysed by age group, the worse performance of programmes that included behavioural methods was present for all age groups, although it was significant only for the older population (with a behavioural component, the weighted effect size (WES) was –0.10; without, WES=0.95; z=7.14, P<0.001). Programmes that included competence enhancement techniques did significantly better than those that did not include them. Programmes that included social support did generally worse than those that did not, except for the older group, for which social support programmes yielded larger weighted effect sizes (with social support, WES=0.92; without, WES=–0.12; z=7.13, P<0.001). Programmes that included three or more different types of methods were significantly better than those that included only one or two.
Research methodological characteristics
The programmes with a high quality of research design were significantly more effective than those of low quality (Table 7). Programmes that reported attrition rates were significantly better than those that did not. Programmes rated as having a well-defined intervention were better than those that were not.
Changes in risk factors and symptoms
The outcome measures of each programme were subsequently grouped into averaged measures per programme indicating changes in risk factors (n=49), changes in depressive symptoms (n=69), and changes in psychiatric symptoms other than depression, such as anxiety (n=51). Measures for each group were averaged across programmes to obtain three mean effect sizes, one per group. There was no significant difference in effect size between depressive symptoms (WES=0.24, 95% CI 0.13–0.35), risk factors (WES=0.28, 95% CI 0.15–0.41) and other psychiatric symptoms (WES=0.18, 95% CI 0.09–0.27), all three of which showed significant and independent positive outcomes. Weighted mean effect sizes were further subdivided within these three groups into universal, selective and indicated approaches (Fig. 1). Comparisons of means indicated no significant difference between the type of preventive approach and changes in depressive symptoms, risk factors or other psychiatric symptoms.
Limitations of the study: sampling bias
Caution is needed in interpreting meta-analytical findings because of the potential upward bias of the mean effect size (Williams & Garner, 2002), which can be examined on a funnel plot (Begg, 1994). The funnel plot indicated no evidence of publication bias (Fig. 2). The fail-safe N is an estimate of the number of unpublished studies reporting null results that would be needed to reduce the cumulative effect to the point of non-significance (Wolf, 1986). Calculating the fail-safe N to a criterion level of 0.1 gave a figure of 104 studies that would need to be included to reduce the effects to this criterion level. As the availability of such a large number of studies with null effects is unlikely, we assume that the studies included in the analysis give a reasonably representative estimate of the mean effect size.
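The text does not give the fail-safe N formula. The version usually attributed to Orwin treats each missing study as having an effect size of zero and solves k·ES_mean / (k + N) = ES_criterion for N, giving N = k(ES_mean − ES_criterion)/ES_criterion. As an arithmetic check only, plugging in k = 69 and the unweighted mean of 0.25 gives 103.5, which rounds up to 104; whether this is how the reported figure was obtained is an assumption. The function name is hypothetical.

```python
import math


def orwin_fail_safe_n(k, mean_es, criterion):
    """Number of additional null-result studies (effect size 0) needed to
    pull the unweighted mean effect size of k studies down to `criterion`.

    Assumes Orwin's formulation; result is rounded up to a whole study.
    """
    if not 0 < criterion < mean_es:
        raise ValueError("criterion must lie between 0 and the observed mean")
    return math.ceil(k * (mean_es - criterion) / criterion)
```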
Effective prevention and variation in outcome
Consistent with earlier meta-analyses for mental health promotion (Durlak & Wells, 1997; Tobler & Stratton, 1997; Brown et al, 2000), our meta-analysis found a weighted mean effect size of 0.22. This is equivalent to an 11% improvement in the intervention groups compared with the control groups. Effect sizes for prevention programmes tend to be smaller than those of treatment, largely because prevention applies the same strategies to a population group that might or might not be at risk for a later mental health problem. However, from a public health perspective the prevention strategy can be cost-effective, as a small effect size in a large number of people can lead to a greater population gain than a large effect size in a small number of people (Rose, 1993).
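The text does not state how an effect size of 0.22 was translated into an 11% improvement. One standard conversion that yields this figure is the binomial effect size display (BESD) of Rosenthal & Rubin, sketched below under the assumption of equal group sizes: d is converted to a correlation r = d/√(d² + 4), and the "success" rates are 0.5 + r/2 for the intervention group and 0.5 − r/2 for the control group, so their difference equals r. The function names are hypothetical.

```python
import math


def d_to_r(d):
    """Convert a standardised mean difference to a correlation (equal n assumed)."""
    return d / math.sqrt(d ** 2 + 4)


def besd(d):
    """Binomial effect size display: (intervention, control) success rates."""
    r = d_to_r(d)
    return 0.5 + r / 2, 0.5 - r / 2
```

For d = 0.22, r is about 0.109, giving success rates of roughly 55% versus 45%, a difference of about 11 percentage points.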
What leads to increased effects in depression prevention?
There was a large variation in programme outcomes. Subsequent analyses aimed to identify what might predict this variation.
There was a relationship between the percentage of male participants in universal and selective programmes and effect size, but not in indicated programmes. This finding is consistent with some within-trial findings (Gillham et al, 1995) but not with others (Seligman et al, 1999). It is possible that indicated programmes, which target specific problems with focused techniques, are more tailored to depressive symptoms and disorder than universal and selective programmes, and also take into account gender differences in the development of the programme. The results should be interpreted with caution, because the relationship is between the proportion of male participants in the programme and the effect size, not the actual effect size for each gender subgroup, which unfortunately is rarely reported. The results stress the importance of analysing and reporting gender differences in single trial evaluations to understand gender-specific programme effectiveness.
Initial level of risk
No difference was found between universal, selective and indicated programmes. There has been a marked preference for targeted interventions for depression prevention, because of evidence that they reduce symptoms and incidence (Clarke et al, 1995) and because subgroups identified as being at increased risk have seemed to benefit the most (Price et al, 1992; Gillham et al, 1995). However, evidence has also accumulated that universal preventive interventions can be beneficial for those at risk, because of lowered stigma and better socialisation (Kellam et al, 1998; Reid et al, 1999). The results of our analysis support both these directions, and there seems to be merit in interventions that combine universal and targeted prevention (Conduct Problems Prevention Research Group, 2000).
Number and length of sessions
Research has focused on testing the efficacy of shortened versions of existing prevention programmes (Muñoz et al, 1993). The results of our analysis indicated that programmes with more than eight sessions and programmes with session lengths of 60–90 min yielded the larger effect sizes. The number of sessions is relevant for participants’ ability to internalise methods and processes offered by the interventions; fewer than nine sessions might not be enough. The length of sessions is important because of the group focus of prevention, where sufficient time needs to be allocated for interaction and group processes; less than an hour might not allow participants to feel engaged in a group process.
The promise of competence enhancement techniques
In addition to cognitive techniques (Price & Bennett Johnson, 1999; Seligman et al, 1999; Gillham et al, 2000; Clarke et al, 2001), competence methods were also found to be effective across different age groups. Programmes that included behavioural techniques were detrimental for the elderly and were not superior for the other age groups. Programmes that combined three or more intervention methods were more effective than those that did not, suggesting the importance of multi-component programmes.
Lay personnel have been proposed as potentially efficient programme providers for preventing depression (Muñoz et al, 1993). However, our meta-analysis found that lay personnel alone were not the best providers for selective and indicated programmes. The specificity and severity of depression in targeted populations who are already experiencing risk factors or symptoms may require trained personnel who are aware of and skilled in dealing with depressive symptoms.
Quality of the research design
Consistent with earlier findings (Tobler & Stratton, 1997; Brown et al, 2000) high-quality research trials were predictive of better outcomes. Well-defined intervention aims and accounting for attrition rates were independent predictors of effect size. Well-defined aims have already been identified as effect predictors in health promotion (Kok et al, 1997). Reporting attrition rates might indicate a deeper analysis of intervention effects, and studies that do so might be more likely to have accounted for patient withdrawal at the outset, and to have provided the target group with incentives to continue in the programme.
Changes in depressive outcomes, risk factors and other psychiatric symptoms
Simultaneous positive changes in risk and protective factors and in related psychiatric symptoms (e.g. anxiety) were found in addition to the reductions in depressive symptoms, indicating the multiple outcome potential of prevention programmes. However, despite the evidence that prevention programmes can reduce depressive symptoms for both universal and targeted populations, few have demonstrated that the incidence of depression can be reduced (Clarke et al, 1995, 2001). There is an urgent need for further trials of sufficient power to study the impact of preventing the onset of depression and the role of moderating and mediating variables.
Clinical Implications and Limitations
Prevention programmes to reduce depressive symptoms can lead to an 11% improvement in the intervention groups compared with control groups. However, the large variation in outcome stresses the importance of implementing only practices for which there is evidence of effect.
Health and mental health care providers should be informed and provided with training in interventions to reduce and prevent depressive symptoms for targeted populations.
Programmes that do not primarily target depression can lead to reductions in depressive symptoms, although unfortunately this is not often measured. When making choices for implementation, programmes targeting common risk and protective factors in addition to those focusing on depressive symptoms could lead to larger gains in other associated symptoms and disorders.
The inclusion criteria set for this study and insufficient information reported in single trial evaluations might have excluded other programmes that have targeted depressive symptoms.
Although there was no evidence of publication bias, as with all meta-analyses caution is needed when interpreting the results because of non-included studies and unreported findings.
The gender result should be treated with caution because the finding is related to the proportion of male participants and the programme effect size, not the actual effect sizes of the two gender subgroups, which unfortunately are rarely reported.
This research was supported by the Dutch Health Research and Development Council (ZON), grant number 2200.0020.
We express our deep appreciation to Dr Hendricks Brown for his statistical and methodological support and feedback during the different research phases of the project. We also thank Sietske van Haren, the second coder, for her dedicated input during the coding process and Rianne Kassander for her support during the revision of the paper.
- Received November 19, 2002.
- Revision received April 14, 2003.
- Accepted May 6, 2003.
- © 2003 Royal College of Psychiatrists