The British Journal of Psychiatry
Randomised controlled trials investigating pharmacological and psychological interventions for treatment-refractory depression
Systematic review


Background About 30% of people with depression do not respond to an antidepressant at the recommended dose and can be described as having treatment-refractory depression.

Aims To summarise the findings from all randomised controlled trials (RCTs) that have assessed the efficacy of a pharmacological or psychological intervention for treatment-refractory depression.

Method We used a systematic search strategy to identify RCTs that included adults aged 18-75 years with a diagnosis of unipolar depression that had not responded to a 4-week course of a recommended dose of an antidepressant.

Results We identified 16 RCTs. None of the included trials assessed the efficacy of psychotherapy. All the trials were too small to detect an important clinical response. We found only two trials on lithium augmentation, which randomised 50 subjects in total.

Conclusions There is little evidence to guide the management of depression that has not responded to a course of antidepressants. Treatment-refractory depression is an important public health problem and large pragmatic trials are needed to inform clinical practice.

Approximately 30% of people with depressive illness do not respond to the usual recommended dose of antidepressants. The World Psychiatric Association proposed one of the earliest definitions of ‘resistant’ depression: ‘an absence of clinical response to treatment with a tricyclic antidepressant at a minimum dose of 150 mg/day of imipramine (or equivalent drug) for 4 to 6 weeks’ (World Psychiatric Association, 1974). A number of alternative definitions have been used, but the term ‘treatment-refractory depression’ as we use it here follows the World Psychiatric Association definition with a 4-week time criterion. Most other definitions require more ‘severe’ treatment-refractory depression, in the sense that patients have failed to respond to more than a single course of antidepressant (Thase & Rush, 1995).

Current guidance

There is little current guidance on the management of treatment-refractory depression. Current guidelines (American Psychiatric Association, 1993; Anderson et al, 2000) suggest increasing the dose of antidepressant, switching to a different class, adding psychotherapy or augmenting with lithium or electroconvulsive treatment. The lack of guidance is reflected by variation in the management of treatment-refractory depression. A third of psychiatrists in the north-east of the USA preferred lithium augmentation (Nierenberg & White, 1990). Canadian psychiatrists (Chaimowitz et al, 1991) had an equal preference for a second tricyclic, augmentation with a monoamine oxidase inhibitor and augmentation with lithium. The most popular choice in the UK (Shergill & Katona, 1996) was to increase the dose or to change class. However, 39% of respondents in this study stated that they were not confident when treating this condition.

Previous systematic reviews

Systematic reviews of the literature attempt to provide an unbiased and succinct summary of all of the available evidence and, when possible, produce a meta-analysis that summarises results more precisely (Chalmers & Altman, 1995; Lewis et al, 1997). Previous systematic reviews have assessed the efficacy of lithium augmentation (Austin et al, 1991; Bauer & Dopfmer, 1999) and triiodothyronine augmentation (Aronson et al, 1996). The systematic review of Austin et al included 5 trials, but 4 of these used only 3 weeks to define treatment resistance. One of the trials treated subjects with lithium for only 48 hours, and another reported very low (less than 0.3 mmol/l) blood lithium levels. Bauer & Dopfmer (1999) included randomised controlled trials (RCTs) in their review that studied both unipolar and bipolar depression. It would seem unwise to generalise from patients with bipolar depression to those with unipolar depression, especially in relation to lithium use. The systematic review of four randomised double-blind studies of triiodothyronine (Aronson et al, 1996) also included studies that used a 3-week criterion and patients with bipolar depression.

The aim of this systematic review was to identify and summarise all the RCTs that had investigated the pharmacological and psychological management of patients with treatment-refractory depression.


A literature search was carried out in association with the Cochrane Collaboration (Depression, Anxiety and Neurosis Group). The Cochrane Controlled Trials register (CCTR) 2000 edition was searched, as were the following electronic databases: EMBASE (1980-1999), Medline (1966-1999), PsycLIT and PsycINFO (1974-1999), LILACS (1982-1999). The standard search strategy for identifying RCTs developed by the Cochrane Collaboration was used. Keywords to identify treatment-refractory depression trials included DEPRESS*; THERAPY or TREATMENT, REFRACT*; RESISTANT; NON-RESPOND*; UNRESPONS*; FAIL*; AUGMENT*; POTENTIATION and COMBIN*. The abstracts of these trials were read to identify those that appeared to meet the inclusion criteria. Paper or electronic copies of trials that appeared, from the abstract, to meet the inclusion criteria were collected for further inspection.

When the search strategy had been completed, the authors of all identified trials (both those to be included and the ‘near misses’) and all known experts in the field were contacted for any further information on trials that were unpublished, in press or were currently in progress. If trials presented data on both unipolar and bipolar depression the authors were asked for the results of the unipolar participants.

Inclusion criteria

Randomised controlled trials were included in the review if the participants had a diagnosis of unipolar depression that had not responded to a minimum of 4 weeks of antidepressant treatment at a recommended dose (at least 150 mg/day imipramine or equivalent). This definition was chosen in order to include as much evidence as possible. Trials that concentrated solely on patient groups either under the age of 18 years or over the age of 75 years were excluded, as were trials including patients with comorbid schizophrenia. Participants with bipolar disorder were excluded. These criteria and the details of the search strategy were decided before beginning the review and published as a protocol in the Cochrane Database of Systematic Reviews (Stimpson et al, 2000).

Summary data from each of the identified trials were extracted independently by at least two of the three reviewers and entered onto predesigned data extraction forms. Any disagreements were discussed until a consensus was reached. If additional information was needed the first author of the trials was contacted.

Statistical methods

Where possible we planned to carry out meta-analysis of the results from trials. We wished to use a dichotomous outcome, the number who had ‘recovered’. This is usually reported as a 50% reduction in Hamilton Rating Scale for Depression (HRSD) scores (Hamilton, 1960). This outcome was chosen for two main reasons. First, it avoids the difficulty of establishing whether a continuous variable has a normal distribution. Second, it allows fairly simple analyses that aid interpretation, particularly from a clinical perspective. We chose to calculate the absolute risk difference (i.e. the difference in the proportion recovered). The reciprocal of this measure is the number needed to treat (Sackett & Cook, 1995). A positive value for a risk difference was given when the proportion recovered was greater in the intervention group than in the placebo group. For the small trials, exact confidence intervals were calculated. Otherwise, risk differences, 95% confidence intervals and tests for heterogeneity were calculated using the Metan command within Stata (StataCorp, 1999).
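As an illustration of the outcome measure described above, the following minimal Python sketch computes an absolute risk difference with a normal-approximation (Wald) confidence interval. It is not the analysis used in the review, which relied on exact intervals for small trials and on Stata; the function name and the recovery counts are hypothetical and chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference(recovered_tx, n_tx, recovered_pl, n_pl, level=0.95):
    """Absolute risk difference (intervention minus placebo) with a Wald CI.

    Positive values mean a higher proportion recovered on the intervention.
    The Wald interval is a large-sample approximation and can be poor for
    very small groups.
    """
    p_tx = recovered_tx / n_tx
    p_pl = recovered_pl / n_pl
    rd = p_tx - p_pl
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_tx * (1 - p_tx) / n_tx + p_pl * (1 - p_pl) / n_pl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rd, rd - z * se, rd + z * se


# Hypothetical trial: 10 of 20 recover on the intervention, 5 of 20 on placebo.
rd, lower, upper = risk_difference(10, 20, 5, 20)
print(f"risk difference = {rd:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With groups of this size the interval spans zero even though the point estimate is a 25 percentage point difference, which is the pattern that recurs in the small trials discussed below.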


Using our search strategy, 753 potential trials were initially identified and this number increased as the search was updated quarterly until January 2001 to give a total of 919 trials. Forty studies were excluded from the review, in accordance with our published protocol (Stimpson et al, 2000). The search and identification of studies is summarised in Fig. 1.

Fig. 1

Flowchart of progress through systematic review. RCTs, randomised controlled trials.


Fourteen trials were excluded from the review as they included participants with unipolar and with bipolar depression and it was not possible to extract data on unipolar depression alone. In 11, participants had been on antidepressant medication for less than 4 weeks or at a dose of less than 150 mg imipramine or equivalent. Three trials were excluded on the grounds of the randomisation. In one relevant trial the randomisation had given rise to a striking imbalance between the randomised groups (Gitlin et al, 1987). This may well have resulted from the small size of this trial (n=16). One trial randomised participants to identical treatments (Antonuccio et al, 1984). A full list of excluded studies is available from the author upon request.

Two crossover trials were also excluded because it was impossible to extract data from the initial phase of the trial before the crossover took place. One published (Gagiano et al, 1993) and one unpublished trial (source available from the author upon request) had to be excluded as they did not describe the study in sufficient detail to know whether the inclusion criteria were met. One trial had to be excluded as data were not available on the subset of participants that were randomly assigned to cognitive-behavioural therapy (Barker et al, 1987). Two papers presented previously published results and the duplicated results are not included in the review (Zohar et al, 1985; Joffe & Singer, 1992).

Included trials

Seventeen RCTs were identified, which included a total of 645 participants. A variety of different designs were adopted. After extracting the data we have chosen to classify these designs according to the following four categories.

Antidepressant (or other) v. placebo (Table 1)

Table 1

Antidepressant/other v. placebo

There were four trials which compared a pharmacological agent with a placebo (Table 1). The agents investigated were oestrogen (Klaiber et al, 1979), viqualine (Faravelli et al, 1988), ketoconazole (Malison et al, 1999) and paroxetine (Tyrer et al, 1987). Two of these studies were also crossover trials from which we extracted data for the 2 weeks prior to crossover.

Two of these trials (Klaiber et al, 1979; Faravelli et al, 1988) found a significant advantage compared with placebo, despite their low statistical power. The largest of these four trials randomised 47 subjects. In three trials that reported recovery rates, none of the 38 subjects randomised to placebo recovered (97.5% CI 0-9%).
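The one-sided 97.5% interval quoted for the placebo groups can be checked by hand. When no events are observed among n subjects, the exact (Clopper-Pearson) upper bound has the closed form 1 - α^(1/n). The sketch below assumes this is the kind of exact interval meant; the function name is illustrative.

```python
def exact_upper_bound_zero_events(n, alpha=0.025):
    """One-sided exact upper confidence bound for a proportion when 0 of n
    subjects have the event: the largest p with (1 - p)**n >= alpha."""
    return 1 - alpha ** (1 / n)


# None of the 38 subjects randomised to placebo recovered.
upper = exact_upper_bound_zero_events(38)
print(f"97.5% upper bound: {upper:.1%}")
```

For n = 38 the bound is a little over 9%, in line with the interval of 0-9% given above.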

We excluded the results from the second phase of the crossover designs.

Comparison of two active treatments

There were four trials that compared two pharmacological agents (Table 2). The comparisons made were: intravenous maprotiline v. intravenous clomipramine (Drago et al, 1983); brofaromine v. tranylcypromine (Nolen et al, 1993); venlafaxine v. paroxetine (Poirier & Boyer, 1999); and olanzapine v. fluoxetine (Shelton et al, 2001).

Table 2

Antidepressant 1 v. antidepressant 2

The venlafaxine v. paroxetine comparison seems most relevant to current practice. The results of this trial did not support the superiority of one or other compound. Three of the performed analyses led to a result that favoured venlafaxine, but two of these did not adopt an intention-to-treat policy and most were of marginal statistical significance. Almost two-thirds of the subjects had been on a selective serotonin reuptake inhibitor previously. The Shelton study examined the policy of ‘switching’ between fluoxetine and olanzapine as all the subjects had failed to respond to fluoxetine. There was little information on previous medication for the other studies.

Antidepressant+augmenter v. antidepressant+placebo

The comparison of an augmentation strategy with a placebo seems the most relevant to clinical practice. Two trials of lithium as an augmentation agent (Zusky et al, 1988; Joffe et al, 1993; Table 3) could be included and a meta-analysis performed. In summary, lithium had a recovery rate by the end of the trial 25% greater than placebo (95% CI 2-49%), corresponding to a number needed to treat of 4 (95% CI 2-50). In all, there were only 50 patients in the two lithium trials. There was no statistical evidence to support heterogeneity between the trials (χ2=0.6, d.f.=1, P=0.44).
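The number needed to treat and its interval follow from the risk difference by taking reciprocals, with the interval limits swapping places. The sketch below applies this to the lithium figures reported above; the function name is illustrative, and the simple inversion is only meaningful when the whole risk difference interval lies above zero.

```python
def nnt_interval(rd, rd_lower, rd_upper):
    """Number needed to treat and its interval from a risk difference.

    The reciprocal reverses the ordering, so the upper limit of the risk
    difference gives the lower limit of the NNT and vice versa.
    """
    if rd_lower <= 0:
        raise ValueError("interval includes no benefit; NNT interval is unbounded")
    return 1 / rd, 1 / rd_upper, 1 / rd_lower


# Reported pooled risk difference for lithium: 25% (95% CI 2-49%).
nnt, nnt_lower, nnt_upper = nnt_interval(0.25, 0.02, 0.49)
print(f"NNT = {nnt:.0f} (95% CI {nnt_lower:.0f} to {nnt_upper:.0f})")
```

The lower limit of the risk difference (2%) corresponds to treating 50 patients for one additional recovery, which shows how little the interval rules out.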

Table 3

Antidepressant+augmenter v. antidepressant+placebo

There were also three trials of pindolol as an augmenter (Maes et al, 1996; Moreno et al, 1997; Perez et al, 1999) reporting on 106 subjects, although one of these (Moreno et al, 1997) did not report any recoveries and therefore does not contribute towards the summary estimate. Overall, those given pindolol had an 8% better recovery rate (95% CI -6% to 21%), but this was not statistically significant. There was little evidence of heterogeneity between the three pindolol trials (χ2=5.46, d.f.=2, P=0.07). Three further trials also used this design but investigated different augmentation strategies (Maes et al, 1996; Clifford et al, 1999; Shelton et al, 2001).
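The P values quoted for the heterogeneity tests can be recovered from the χ2 statistics and their degrees of freedom. For 1 and 2 degrees of freedom the upper tail of the χ2 distribution has simple closed forms, so a short Python sketch using only the standard library is enough; the function name is illustrative.

```python
from math import erfc, exp, sqrt


def chi2_upper_tail(statistic, df):
    """Upper-tail probability of a chi-squared statistic for df = 1 or 2."""
    if df == 1:
        # Chi-squared with 1 d.f. is the square of a standard normal variable.
        return erfc(sqrt(statistic / 2))
    if df == 2:
        # Chi-squared with 2 d.f. is exponential with mean 2.
        return exp(-statistic / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


p_lithium = chi2_upper_tail(0.6, df=1)    # two lithium trials
p_pindolol = chi2_upper_tail(5.46, df=2)  # three pindolol trials
print(f"lithium P = {p_lithium:.2f}, pindolol P = {p_pindolol:.2f}")
```

Both values agree with those reported (0.44 and 0.07). With so few trials such tests have little power, so a non-significant result is weak evidence of homogeneity.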

The overall recovery rate on placebo in all eight trials was 14 out of 107 subjects or 14.4% (95% CI 7.9-23.4%).

Augmentation without a placebo

There were three trials that investigated augmentation of an antidepressant but did not compare with a placebo (Joffe & Singer, 1990; Fava et al, 1994; Rybakowski et al, 1999) (Table 4).

Table 4

Augmentation trials without placebo group

Methodological quality of trials

None of the trials would have met all the requirements of the CONSORT guidelines on reporting results of randomised trials (Begg et al, 1996). Two of the trials mentioned that the random numbers were generated with a computer program. Of the ten trials that used a placebo, four mentioned that the placebos were identical in appearance to the active treatment. None of the trials gave an indication of how the allocation of randomisation was conducted, and only one trial (Perez et al, 1999) described how the randomisation was concealed. The two lithium trials mentioned that faked blood results were used to maintain blindness.

Four studies (Joffe & Singer, 1990; Joffe et al, 1993; Perez et al, 1999; Poirier & Boyer, 1999) reported a power calculation, although one reported a power of 20%. Two trials recruited the exact number of participants required by their power calculations (Joffe & Singer, 1990; Perez et al, 1999). One trial reported that the small sample size recruited had limited the power of their trial (Joffe et al, 1993) and one trial reported a power calculation incorrectly and did not report the sample size it required (Poirier & Boyer, 1999). The size of the randomised groups ranged from a maximum of 62 participants to a minimum of 5 participants. Only 2 of the 17 trials had a group with 25 or more subjects.

Issues not addressed by studies

No RCTs were identified that assessed the efficacy of psychotherapy and also met the inclusion criteria. A number of trials of psychotherapy were excluded on various grounds (further details available from the author upon request).

No RCTs were identified that investigated increasing the dose of antidepressant, or that compared switching to a new class of antidepressant with remaining on the original antidepressant.


Only 17 RCTs were identified, including 645 participants, covering any pharmacological or psychological intervention for treatment-refractory depression. The most striking impression is that there is currently very little evidence to guide the management of those who have not responded to a standard dose of antidepressant for 4 weeks. Augmentation of existing antidepressant medication was the strategy that had received most investigation, whereas there were no studies of any psychological treatment. It was possible to conduct a meta-analysis with the results from two trials that investigated lithium and the three that studied pindolol. The remaining studies mostly investigated a range of therapeutic options that, overall, did not address questions of current clinical relevance. Treatment-refractory depression is a common clinical problem and this lack of evidence is reflected in an absence of consensus among clinicians and the vagueness of current guidelines.


The systematic review used a thorough search strategy as part of the Cochrane Collaboration. It is still possible, however, that some trials have not been identified despite our efforts, and we would welcome any information about trials, particularly those that are unpublished.

The major limitation of the review reflects the major weakness of the constituent trials. Almost all the studies were small in size. Only 2 of the 17 trials had 25 or more subjects in a randomised group. A trial with 25 subjects in each group would be able to detect the difference between 10% and 50% recovery with 80% power and 5% significance. This is a large difference in outcome, much larger than the 14% difference reported in a recent meta-analysis of fluoxetine v. placebo (Bech et al, 2000). A trial would have to randomise 219 subjects to each group to detect a difference between 10% and 20% recovery with 80% power and 5% significance. All the trials in this study were therefore severely underpowered. Small trials can also lead to a failure of randomisation, resulting in an imbalance between the randomised groups. We came across two studies where this had occurred and excluded them, but smaller degrees of imbalance might still be present.
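The sample sizes quoted in this paragraph can be reproduced with a standard formula for comparing two proportions. The text does not say which formula was used, so the sketch below is an assumption: a two-sided normal-approximation calculation with a continuity correction, which happens to give the same figures. The function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed in each of two equal groups to detect p1 v. p2.

    Two-sided test at significance level alpha, normal approximation,
    followed by a continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    # Uncorrected sample size per group.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    # Continuity correction.
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_corrected)


print(n_per_group(0.10, 0.50))  # 10% v. 50% recovery
print(n_per_group(0.10, 0.20))  # 10% v. 20% recovery
```

Under these assumptions the calculation gives 25 per group for 10% v. 50% recovery and 219 per group for 10% v. 20%, matching the figures in the text. Without the continuity correction the second figure would be about 199.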

Publication bias was impossible to assess as the trials studied such a diverse range of interventions. It is usually assumed that systematic reviews of small trials are likely to be more susceptible to publication bias than those that include larger trials. Even meta-analysis of moderately sized trials can provide biased conclusions (LeLorier et al, 1997).

Since 1996, the CONSORT statement has provided guidance on the reporting of RCTs (Begg et al, 1996). None of the 17 studies, including those published after the CONSORT statement, followed all aspects of its guidance. Trials with inadequate concealment of allocation are associated with an increased estimate of benefit (Moher et al, 1998). Only one trial described how they kept the allocation of subjects concealed from the clinicians involved in their care (Perez et al, 1999). Overall, the trials did not meet the current expectations concerning the adequate reporting of randomised trials.

Inclusion criteria

The World Psychiatric Association (1974) defined treatment-refractory depression as a failure to respond after a 4- to 6-week period on a recommended dose of antidepressant. When planning the review and without prior knowledge of the included studies, we chose to set our inclusion criteria using a time limit of 4 weeks. This minimum time limit was considered appropriate for a systematic review as it would ensure that we collected all relevant studies. It also reflected the commonest clinical dilemma: what to do next after lack of response to an antidepressant. We were surprised that we excluded nine trials on the grounds that they defined treatment-refractory depression using a time limit of 3 weeks. Because the response to antidepressants can be delayed, we think this definition is rather too broad. We also excluded 14 trials on the grounds that they included both patients with bipolar and with unipolar depression. The management of depression in those with bipolar depression differs in some important respects from those with unipolar depression. Antidepressants are used more cautiously in case this precipitates a manic relapse. In the context of a trial, a manic relapse might lead to an apparent ‘improvement’ in depression scores. Most people with established bipolar disorder would also be on a mood stabiliser such as lithium.

Design of trials

We excluded the second phase of crossover designs as these are inappropriate for antidepressant trials in which subjects may recover. Antidepressants have a delay of 2-3 weeks before they take effect and so short periods before crossover are uninformative, as acknowledged by Tyrer et al (1987).

We identified four different designs in our included studies. Four studies compared an antidepressant v. a placebo, thus investigating the effect of removing an antidepressant agent and replacing it with placebo. Because some subjects with ‘treatment-refractory depression’ will have had a partial response, removal of the antidepressant would be expected to lead to a worsening of symptoms. Two of the four trials using this design found improved recovery on active antidepressant. These results argue against stopping antidepressant medication in those who have not had a good response.

Four trials compared two active treatments. This also investigates switching to another antidepressant following failure to respond. However, the most relevant trial (Poirier & Boyer, 1999), which compared venlafaxine and paroxetine, included subjects that had been exposed to either selective serotonin reuptake inhibitors, tricyclics or both. To study the policy of switching to a new antidepressant, a more informative design would be to recruit subjects who had been treated with a single class of antidepressant and then randomise to either staying on the same class of antidepressant or switching to an alternative class. This design was used (Shelton et al, 2001) to compare remaining on fluoxetine with switching to olanzapine.


The most informative designs were those in which an augmenting agent was added to antidepressant medication and compared with a placebo and antidepressant. Our finding that 14% (95% CI 8-23%) of the placebo group recovered emphasises the necessity of a placebo comparison for studies of augmentation.

The two lithium trials were small, with only 50 patients in all, and treated subjects for 1-2 weeks, a relatively short duration. Although there was a statistically significant benefit for lithium, the confidence interval is so wide (2-49%) that it does not exclude an inconsequential benefit. Meta-analysis of small trials often leads to unreliable results as randomisation is less effective and publication bias more common. These studies provide very weak evidence to support the use of lithium, although it is a common strategy and has widespread clinical support.

Pindolol is a β-adrenoceptor/5-HT1A receptor antagonist and has been investigated as an augmentation agent in three randomised trials. Overall, there was no significant benefit demonstrated in these three trials. In aggregate, only 106 patients were studied and the wide confidence intervals did not exclude the possibility that pindolol would be an effective augmenting agent.

Further research

The results of our review support the view that further RCTs need to be conducted to investigate the management of treatment-refractory depression. The STAR*D project, funded by the US National Institute of Mental Health, will hopefully address a number of the deficiencies in the current literature. We suggest that future RCTs should concentrate on studying the effectiveness of psychotherapy as it is a popular and acceptable option for many patients. The second area of research should be into augmentation strategies. Lithium is supported by the most encouraging results at present, but the evidence is still weak. Further trials should estimate the likely benefits of lithium more accurately and also attempt to refine the indications for its use.

Clinical Implications and Limitations

Clinical implications

  • Treatment-refractory depression is common in clinical practice but there is little evidence to inform management.

  • There was some evidence of benefit for lithium augmentation, but the evidence was very weak.

  • In the absence of good evidence, clinicians will have to rely upon their own clinical judgement in deciding upon treatment.

Limitations

  • Like all systematic reviews it is limited by the quality of the constituent studies.

  • The main conclusion is that further research is required as the findings are not strong enough to support any clinical guidance.

  • It proved difficult to perform much quantitative synthesis because the interventions were so diverse.

  • Received November 29, 2001.
  • Revision received May 14, 2002.
  • Accepted May 17, 2002.

