Effect of exercise on depression severity in older people: systematic review and meta-analysis of randomised controlled trials

Christopher Bridle, Kathleen Spanjers, Shilpa Patel, Nicola M. Atherton, Sarah E. Lamb




The prevalence of depression in older people is high, treatment is inadequate and the condition creates a substantial burden. It is a public health priority for which exercise has been proposed as a therapeutic strategy.


To estimate the effect of exercise on depressive symptoms among older people, and assess whether treatment effect varies depending on the depression criteria used to determine participant eligibility.


Systematic review and meta-analysis of randomised controlled trials of exercise for depression in older people.


Nine trials met the inclusion criteria and seven were meta-analysed. Exercise was associated with significantly lower depression severity (standardised mean difference (SMD) = –0.34, 95% CI –0.52 to –0.17), irrespective of whether participant eligibility was determined by clinical diagnosis (SMD = –0.38, 95% CI –0.67 to –0.10) or symptom checklist (SMD = –0.34, 95% CI –0.62 to –0.06). Results remained significant in sensitivity analyses.


Our findings suggest that, for older people who present with clinically meaningful symptoms of depression, prescribing structured exercise tailored to individual ability will reduce depression severity.

Depression is the most common mental illness among older people, and is associated with increased morbidity, premature mortality and greater healthcare utilisation.1–3 Treatment of depression is inadequate for most older people, being complicated by poor recognition and an increased prevalence of medication side-effects, polypharmacy and poor adherence to treatment.4–6 Depression is predicted to become the leading cause of disease burden among older people by 2020,7 at which time one in five of the population will be aged over 60 years.8 Effective treatment of depression in older people is a salient public health priority, for which exercise has been increasingly evaluated as a potential therapeutic strategy.9,10 Findings from recent reviews, however, are difficult to interpret clinically, since they reflect qualitative syntheses of evidence from randomised and non-randomised trials, and trials in which pre-existing depression has not been an eligibility criterion.11,12 There is uncertainty concerning the effect of exercise on depression among older people with clinically significant symptoms of depression. The aim of this study was to provide a clinically meaningful synthesis of evidence to support treatment decisions. The primary objective was to estimate the effect of exercise on depression severity among older people with clinically significant symptoms of depression. The secondary objective was to investigate any potential variation in treatment effect among pre-specified subgroups of the study stratified by depression eligibility criteria, specifically the selection of participants according to clinician-diagnosed depression or a symptom checklist threshold.


Eligibility criteria

Studies were considered for inclusion if they were randomised controlled trials (RCTs) of exercise interventions for depression among older people. A trial was accepted as an RCT if the allocation of participants to treatment and comparison groups was reported as randomised. Studies were considered for inclusion if the sample mean age was ⩾60 years. Setting the minimum age criterion at ⩾60 years is consistent with previous reviews11,12 and the World Health Organization’s classification of older age,13 and linking age to the sample, rather than the individual, recognises that trials vary in the use and precise specification of the minimum age criterion. The review included studies in which participant eligibility required pre-existing depression determined by a clinically valid method of assessment, such as a clinical interview, clinician diagnosis or symptom checklist threshold. Trials of any exercise intervention compared with any concurrent control were eligible. Exercise was defined as any planned or structured movement of the body performed systematically in terms of frequency, intensity and duration. Included trials reported depression as an outcome assessed at follow-up of ⩾3 months.

Study identification

To identify relevant published, unpublished and ongoing trials, as well as existing systematic reviews, the following electronic databases were searched from inception to January 2011: CDSR, DARE, UK-NRR, CCT, HSRProj, CENTRAL, Medline, Embase, PsycINFO, SSCI, SportsDiscus, AMED, CINAHL, BioMed Central, HealthPromis, Index of Conference Proceedings, Theses, SIGLE and GreyLit. Search parameters were adapted to database requirements, and combined exploded MeSH terms and text words related to exercise, depression and age (see online supplement). The bibliographies of all included studies and review articles were screened for further references. Search results were recorded to bibliographic software, and two reviewers independently screened each citation for potential relevance against eligibility criteria. For all potentially relevant citations, full-text papers were obtained and assessed against eligibility criteria by two reviewers independently, with disagreements resolved by discussion.

Data abstraction

Data were extracted by one reviewer and checked for accuracy by another, using a template that included: (a) design, for example depression eligibility criteria, sample size and recruitment context; (b) participants, for example age, gender and baseline depression; (c) intervention, for example type, frequency and format of exercise; (d) outcome, for example depression measure, follow-up schedule and depression severity (mean and standard deviation) for each group at each follow-up; and (e) process, for example number of eligible patients invited and, for the exercise group, adherence, including the criteria used and the level achieved.

Two reviewers independently assessed risk of bias in each trial according to the adequacy of sequence generation, allocation concealment, masking of outcome assessors, completeness of follow-up and analysis by intention to treat. Each component was assessed as either adequate, inadequate or unclear, using Cochrane risk of bias criteria.14 Risk of bias in each study was classified as either low (all criteria graded adequate), moderate (one criterion graded inadequate, or two graded unclear) or high (two or more criteria graded inadequate, or more than two graded unclear).
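The classification rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the reviewers' actual procedure: the function name and grade labels are hypothetical, and the rule as stated does not cover every combination of grades (for example, a single criterion graded unclear), so the sketch assumes that any combination not explicitly low or high is treated as moderate.

```python
from collections import Counter


def classify_risk_of_bias(grades):
    """Classify a trial from its per-criterion grades.

    grades: iterable of 'adequate', 'inadequate' or 'unclear',
    one entry per assessed criterion (five criteria in this review).
    """
    counts = Counter(grades)
    inadequate = counts["inadequate"]
    unclear = counts["unclear"]

    # High: two or more criteria inadequate, or more than two unclear.
    if inadequate >= 2 or unclear > 2:
        return "high"
    # Low: every criterion graded adequate.
    if inadequate == 0 and unclear == 0:
        return "low"
    # Assumption: every remaining combination is treated as moderate,
    # including cases the stated rule does not mention explicitly.
    return "moderate"
```

For instance, a trial with four adequate criteria and one inadequate criterion would be classified as moderate under this sketch.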

Data analysis

All analyses were conducted using Review Manager version 5.1 software for Windows. All trials reported depression as a continuous outcome, but different measures were used in the assessment. Thus, the summary measure of treatment effect was the between-groups difference in mean severity of depression, expressed as a standardised mean difference (SMD) using Hedges’ (adjusted) g, which includes a correction term for sample size bias.15 Statistical heterogeneity was assessed by the I2 statistic, which describes the percentage of variability among effect estimates beyond that expected by chance. Heterogeneity can be considered as unlikely to be important for I2 values up to 40%.14 In the absence of statistical heterogeneity (I2 ⩽ 40%), individual effect sizes were combined statistically using the inverse variance random-effects method, which assumes that true effects are normally distributed. The random-effects model is more conservative than the fixed-effect model since, by incorporating both within- and between-study variance, confidence intervals for the summary effect are wider. Risk of small study bias was assessed by visual assessment of funnel symmetry in the plots of each trial’s SMD against its standard error (s.e.).14
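To make the summary measure concrete, the following Python sketch computes Hedges' g for a single trial and pools several trial-level estimates with an inverse variance random-effects model. It is not a reproduction of Review Manager: the variance formula for g is one common approximation, the between-study variance is estimated with the DerSimonian and Laird method (assumed here, not stated in the text), and the function names are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, se) for group 1 (e.g. exercise) versus group 2 (control)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    # Common large-sample approximation to the variance of d (an assumption).
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, math.sqrt(j**2 * var_d)


def pool_random_effects(effects, ses):
    """Inverse variance random-effects pooling (DerSimonian-Laird tau^2)."""
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add the between-study variance to each trial.
    w_star = [1 / (se**2 + tau2) for se in ses]
    smd = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "smd": smd,
        "se": se,
        "ci95": (smd - 1.96 * se, smd + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same pooled estimate; otherwise the added variance widens the confidence interval, which is the sense in which the random-effects model is more conservative.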

The effect of exercise on depression severity was estimated in pre-specified subgroups of the study stratified by depression eligibility criteria. Specifically, we distinguished between trials in which participant eligibility was dependent on either satisfying clinical diagnostic criteria for depression or achieving a threshold score on a depression symptom checklist. The robustness of results was assessed in separate sensitivity analyses that excluded trials with moderate or high risk of bias, non-active or no intervention control comparators and end-points within rather than beyond the intervention period.


After removal of duplicates, the search strategy identified 2933 distinct citations, of which 2757 (94%) were excluded during the initial screening phase (Fig. 1). For the remaining 176 citations, full-text papers were ordered, obtained and independently assessed against the eligibility criteria, with five discrepancies resolved by discussion (97% agreement, κ = 0.75). Nine studies met the inclusion criteria.16–24 The main reasons for exclusion of full-text papers were use of non-randomised designs, primary end-points less than 3 months and depression not required for participant eligibility.

Fig. 1

Flow diagram of study selection.

a. Some studies excluded for multiple reasons.

Characteristics of included studies

Of the nine included trials (Table 1), four were conducted in the USA,16–19 and one each in the UK,20 Australia,21 New Zealand,22 China23 and Hong Kong.24 Three trials were explicitly identified in the study report as being either feasibility,21 pilot16 or efficacy studies.18 The nine trials randomised 667 participants (69% female), with sample size ranging from 14 to 193. The mean age of trial populations ranged from 65 years20 to over 80 years.19,22,24

Table 1

Characteristics of included studies

Depression eligibility was determined by clinician diagnosis,17,18,20 symptom checklist,16,19,21,23 either a diagnosis or symptom checklist,24 or a three-question depression screen validated for use in primary care.22 Baseline receipt of antidepressant medication was required for eligibility in one trial,20 was an exclusion criterion in three16,18,21 and allowed but not required in four.17,19,22,24 Common exclusion criteria included medical conditions for which exercise was contraindicated, psychiatric illness, cognitive impairment, alcohol or substance misuse and, to a lesser extent, being a regular exerciser18,20 or lacking motivation to exercise.21

In two trials the exercise intervention was classified as three-dimensional (3D) training, which included Tai Chi23 and Qi Gong.24 The remaining seven trials included elements of endurance and strength training, and were classified as mixed exercise, including four trials described in the trial’s report as mixed, two18,21 that were based mostly on strength training but included elements of endurance training, and one17 in which endurance training was prescribed and strength training activities were encouraged. Interventions typically involved three to five exercise sessions per week, each lasting 30–45 min, for 3–4 months. Exercise was completed in participants’ homes,17,22 including care homes,19,24 and various community-based facilities.16,18,20,21,23 Exercise was supervised in all but two trials17,22 and completed in either group16,18,20,23,24 or individual17,19,21,22 formats.

Four trials compared exercise alone to an active usual care control, which included a referral letter sent to the primary care clinician recommending usual care,17 brief advice about exercise,21 a structured health education programme20 and telephone discussions of health status.16 A waiting-list control group (i.e. no contact intervention) was used in one trial,23 whereas in four trials exercise was compared with a non-active control intervention, for example equal contact or attentional control.18,19,22,24 Outcomes were assessed at follow-up ranging from 3 to 12 months. In four trials the primary end-point (3–6 months) coincided with the end of the intervention period16,18,19,23 and in five it was 2–6 months post-intervention. In one study18 outcomes were assessed at 20 weeks and at 26 months, but only data from the former are synthesised so as to avoid introducing between-study variation in follow-up assessments.

In four trials17,20–22 that reported the number of eligible patients invited to participate in the trial, the uptake, or recruitment rate, was 52% (38 of 73),21 55% (193 of 353),22 76% (86 of 113)20 and 92% (138 of 150).17 Four trials18,20–22 assessed adherence to the exercise intervention. In two trials, 75% (12/16)18 and 58% (11/19)21 of participants satisfied the adherence criterion of attendance at ⩾20 of 30 exercise sessions. One trial20 reported that the mean attendance at exercise sessions was 67%, which approximates to 13 of 20 exercise sessions. In the final trial,22 adherence was defined as completing ⩾2 of 3 prescribed exercise sessions per week as well as ⩾2 of 3 recommended walking sessions per week. At 6 months (1 month post-intervention), 64% of participants satisfied the criteria for adherence and, at 12 months (7 months post-intervention), 57% were adherent.

Risk of bias varied among the included studies (Appendix 1). Risk was assessed as high in two trials,23,24 moderate in four16,18,19,21 and low in three.17,20,22 Across the nine trials a total of 27 (60%) risk of bias items were assessed as adequate, 11 (24%) were unclear and 7 (16%) were inadequate. Common methodological limitations included failure to analyse data according to the intention-to-treat principle, lack of masked outcome assessment and incomplete follow-up of participants. Risk of bias assessment was hindered by poor reporting practices, including both inconsistent and insufficient reporting.

Effect of exercise on depression

The point estimate of effect for each trial indicated lower depression severity among participants allocated to the exercise group compared with those allocated to the non-exercise control (Fig. 2). In four17,20,23,24 of nine trials the difference in depression severity was statistically significant. Two trials23,24 of 3D exercise reported effect sizes of far greater magnitude than the remaining trials, and statistical heterogeneity was detected among trial-level effects (I2 = 58%, χ2 = 18.97, d.f. = 8, P = 0.02). The two trials were removed and neither contributed to subsequent analyses. The decision not to combine the trials in a separate synthesis was based on the detection of statistical heterogeneity between the trials (I2 = 60%, χ2 = 2.48, d.f. = 1, P = 0.12) and assessment of each trial as having high risk of bias.
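The reported I2 values follow from the χ2 statistics and their degrees of freedom. The short Python sketch below shows the arithmetic, assuming that the reported χ2 is Cochran's Q and using the standard definition I2 = 100% × (Q − d.f.)/Q, truncated at zero. The function name is hypothetical.

```python
def i_squared(q, df):
    """Percentage of variability beyond chance, from Cochran's Q and its d.f."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Values reported in the text above.
all_nine_trials = i_squared(18.97, 8)   # nine trials, d.f. = 8
two_3d_trials = i_squared(2.48, 1)      # Tai Chi and Qi Gong trials, d.f. = 1

print(round(all_nine_trials), round(two_3d_trials))
```

Both values round to the percentages given in the text (58% and 60%), which exceed the 40% threshold used in this review.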

Fig. 2

Trial-level data, effect estimates and forest plots for depression severity. SMD, standardised mean difference.

Only trials of mixed exercise contributed data to the pooled analyses (Table 2). The synthesis of data from seven trials produced a small but statistically significant effect in which exercise was associated with lower severity of depression (SMD = –0.34, 95% CI –0.52 to –0.17). There was no evidence of statistical heterogeneity among the pooled estimates (I2 = 0%), and no indication of small study bias (Egger –0.52, 95% CI –3.72 to 2.69, P = 0.72).

Table 2

Summary results for pooled analyses


Risk of bias within trials

Small, statistically significant effects emerged from the synthesis of three trials17,18,20 in which participant eligibility required a current diagnosis of depression (SMD = –0.38, 95% CI –0.67 to –0.10), and in four trials16,19,21,22 using a symptom checklist threshold (SMD = –0.34, 95% CI –0.62 to –0.06). For the latter synthesis there was some indication of variation among the pooled estimates (I2 = 25%), but this was unlikely to be important and did not exceed what would be expected by chance alone (χ2 = 4.02, d.f. = 3, P = 0.26).

Small, statistically significant effects favouring exercise were observed in three trials17,20,22 with low risk of bias (SMD = –0.36, 95% CI –0.61 to –0.10), four trials17,18,20,21 using an active intervention control (SMD = –0.44, 95% CI –0.67 to –0.20), and four trials17,20–22 with extended follow-up (SMD = –0.32, 95% CI –0.54 to –0.10). Variation among pooled estimates was detected but did not exceed what would be expected by chance alone in the analyses for risk of bias (I2 = 25%, χ2 = 3.20, d.f. = 2, P = 0.20) and follow-up period (I2 = 19%, χ2 = 3.72, d.f. = 3, P = 0.29). There was no evidence of statistical heterogeneity among trials comparing exercise with an active usual care control (I2 = 0%, χ2 = 2.43, d.f. = 3, P = 0.49).


Summary of main results

The review identified nine RCTs evaluating the medium-term (3–12 months) effect of exercise on the severity of depressive symptoms in older people. Synthesised data from seven trials of mixed exercise indicated a small but statistically significant effect favouring exercise. Small, statistically significant effects favouring exercise were similarly observed in a pre-planned analysis of trials stratified by depression eligibility criteria (clinician diagnosis or symptom checklist threshold). These findings were robust in sensitivity analyses that excluded trials with higher risk of bias, non-active intervention comparison groups or in which the primary end-point was within rather than beyond the intervention period.

Strengths and weaknesses of the study

The study adhered to the pre-specified protocol, adopted procedures to limit the potential for bias and used appropriate methods to select, evaluate and synthesise relevant evidence. A comprehensive search for published and unpublished studies, which included multiple electronic databases and scanning of bibliographies, yielded nine trials, all of which were published studies. Absence of data from unpublished studies is a potential weakness, since effects estimated from published studies may be inflated because of bias towards the non-publication of small studies with null effects. However, null effects on the outcome of interest in five of nine included trials, and five of seven synthesised trials, mitigate concerns about publication bias, since decisions to publish appear independent of the observed effect.

There was at least a moderate risk of bias in all but three of the included trials. As study quality and effect size typically show an inverse association, the underlying risk of bias may have inflated our estimate of the treatment effect. Sensitivity analysis restricted to three trials of low risk of bias yielded a pooled estimate of a nearly identical magnitude and precision as the estimate derived from synthesis that included trials of higher risk of bias. These data are inconsistent with the suggestion that bias, due to poor methodological quality, may have inflated the observed effect of mixed exercise on symptoms of depression.

A strength of the review is that it not only provides data crucial to healthcare decision-making, such as uptake of and adherence to exercise among the target population, but that data are derived from trials conducted under conditions that most closely match the context of usual healthcare practice. Specifically, in three trials of low risk of bias, with older people who were not excluded for or prohibited from use of antidepressant medication, who were identified, approached and invited to participate in the context of routine clinical practice, 68% (417 of 616) of eligible patients agreed to participate in a trial of exercise for treatment of depression, at least three-quarters of whom achieved the minimum criteria for adherence.
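The pooled uptake figure can be traced back to the trial-level recruitment counts reported in the Results for the three trials with low risk of bias. The Python sketch below simply sums those counts. The dictionary keys use the citation numbers as labels, and the function name is hypothetical.

```python
# (agreed to participate, eligible patients invited), as reported in the Results.
low_risk_uptake = {
    "trial 17": (138, 150),
    "trial 20": (86, 113),
    "trial 22": (193, 353),
}


def pooled_uptake(counts):
    """Sum recruitment counts across trials and return the overall percentage."""
    agreed = sum(a for a, _ in counts.values())
    invited = sum(n for _, n in counts.values())
    return agreed, invited, 100 * agreed / invited


agreed, invited, pct = pooled_uptake(low_risk_uptake)
print(f"{agreed} of {invited} ({pct:.0f}%)")
```

The totals match the 417 of 616 (68%) quoted above. Note that this is a crude pooled proportion weighted by the number invited, so the largest trial contributes most to the figure.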

Comparison with other studies

Other reviews of exercise for depression in older people11,12 have included both randomised and non-randomised study designs, and trials in which current depression was not required for participant eligibility. Findings based on different levels of evidence in clinically heterogeneous populations are not easy to use to inform healthcare decisions. Moreover, previous reviews have used qualitative methods of synthesis and relied on quasiquantitative methods for interpretation based on a simple count of studies with/without significant results. This approach is less than ideal, not least because there is an increased potential to conclude that exercise is beneficial for depression, when the magnitude of the effect is too small to be meaningful. Our study provides the first quantitative estimate of the effect of exercise on depression severity among older people with clinically significant symptoms of depression, and pre-planned subgroup and sensitivity analyses suggest that the effect is both stable and robust.

The pooled effect of exercise on depression severity observed in this review (SMD = –0.34) is comparable with the range of effects estimated for different classes of antidepressant medication (SMD = 0.2–0.5)25 and psychotherapy (SMD = 0.18–0.34).26 However, although age-associated factors can complicate use of antidepressant medication and resource-related factors can impede timely access to psychotherapy, for older people with or without medical morbidity, individualised mixed exercise has very few risks, is easy to access and has the potential to improve a wide range of additional health outcomes.

Meaning and implications of the study

The clinical relevance of an SMD can more easily be considered when converted back into units of the original scale, or when represented as the overlap of distributions. At a group level, an SMD of –0.34 is equivalent to 63% of exercise participants having lower severity of depression than the average control participant or, put another way, 13% of the population of exercise participants doing better than would otherwise have been expected. For individuals at the symptom checklist threshold, an SMD of –0.34 translates into a reduction of approximately 20% in the severity of depressive symptoms. The magnitude of effect estimated in this study is clinically meaningful at the individual level, and may have substantial public health significance at the population level.
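The overlap interpretation can be illustrated with the standard normal distribution. The Python sketch below assumes that depression scores in both groups are normally distributed with equal variance, which is an assumption of this illustration rather than something stated in the text. Under that assumption, the proportion of exercise participants scoring below the control group mean is the normal cumulative probability at the absolute value of the SMD. The function name is hypothetical.

```python
from statistics import NormalDist


def proportion_below_control_mean(smd):
    """Share of the treated group scoring below the control mean.

    Assumes normal outcomes with equal variance in both groups; a negative
    SMD means lower (better) depression scores in the treated group.
    """
    return NormalDist().cdf(-smd)


share = proportion_below_control_mean(-0.34)
excess = share - 0.5  # gain over the 50% expected with no treatment effect
print(f"{share:.0%} below the control mean, {excess:.0%} more than expected")
```

This reproduces the 63% and 13% figures quoted above. The 20% reduction at the symptom checklist threshold depends on scale-specific standard deviations that are not given in the text, so it is not reproduced here.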

Our findings must be interpreted in relation to the quantity and quality of available evidence. For exercise interventions involving 3D training (Tai Chi and Qi Gong), two trials with a high risk of bias demonstrate clearly that evidence was insufficient in both quantity and quality. For interventions involving mixed exercise, the available evidence comprised seven trials with low to moderate risk of bias. Although the quantity and quality of evidence was less than ideal, these limitations are not sufficient to dismiss the findings of the review. Evidence is drawn from RCTs of direct relevance to the population, intervention and outcome of interest. All analyses were pre-specified, synthesised results yielded consistent effects and there was no evidence of small study effects, including publication bias. Thus, there is a moderate-quality evidence base for the medium-term effect of mixed exercise on depression severity.

The finding that mixed exercise has a small but clinically important effect on symptoms of depression has general applicability to people aged over 60 years who are experiencing elevated symptoms of depression. However, as depression may reduce the appeal of exercise, participants in exercise trials may not be representative of the population of older people with depression. As none of the included trials stratified randomisation by depression severity, it is unclear whether our findings are equally applicable to patients with elevated, but subthreshold, symptoms as they are to patients with more severe symptoms, such as those that satisfy diagnostic criteria. Similarly, the findings may have limited applicability for patients who are more frequent exercisers or who have more severe comorbid physical illness, since several trials excluded patients classified as regular exercisers or as too ill to participate.

Research to reduce residual uncertainty concerning the applicability of moderate-quality evidence should be considered a public health priority. This research should be in the form of a pragmatic RCT with sufficient power to detect an effect equivalent to an SMD of at least 0.3. Such research might usefully stratify randomisation by depression severity, receipt of antidepressant medication and/or level of regular exercise. As uptake of exercise in this population will be the crucial driver for cost-effectiveness, interventions should include integrated strategies, based on behaviour change techniques, to maximise uptake of and adherence to exercise regimens.
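As a rough indication of what "sufficient power" might imply, the Python sketch below applies the usual normal-approximation formula for a two-arm parallel trial with equal allocation, n per group = 2(z₁₋α/₂ + z_power)² / SMD². The two-sided significance level of 0.05 and the power values of 80% and 90% are assumptions of this illustration and are not taken from the text, and the calculation ignores attrition, clustering and stratification. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(smd, alpha=0.05, power=0.80):
    """Approximate participants per arm to detect a given SMD.

    Normal approximation, two-sided test, equal allocation. The alpha and
    power defaults are illustrative assumptions, not values from the review.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / smd**2)


for power in (0.80, 0.90):
    print(f"power {power:.0%}: {n_per_group(0.3, power=power)} per group")
```

Under these assumptions the trial would need roughly 175 participants per group for 80% power, or 234 per group for 90% power, before any allowance for loss to follow-up. That exceeds the total sample of the largest included trial (193 participants).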

The findings of this review are consistent with the suggestion that, for older people who present with clinically meaningful symptoms of depression, prescribing structured exercise with mixed elements of endurance and strength training, tailored to individual ability, will likely reduce the severity of depression. Whereas the evidence on the effect of mixed exercise is minimally sufficient, for Tai Chi and Qi Gong the available evidence is insufficient in both quantity and quality.

  • Received April 4, 2011.
  • Revision received November 20, 2011.
  • Accepted January 17, 2012.

