Background Previous meta-analyses of fluoxetine as an antidepressant have many methodological problems, including diagnosis of major depression, validity of outcome measures and lack of intention-to-treat analyses.
Aims To provide an estimate of the effect of fluoxetine compared with placebo and tricyclic antidepressants (TCAs), and to investigate reasons for early discontinuation from acute treatment.
Method Randomised trials were analysed using three approaches: intention-to-treat, efficacy and end-point analyses.
Results Fluoxetine was superior to placebo but effect size was low. In trials comparing fluoxetine v. TCA, the results for all trials and for the USA trials showed a trend in favour of fluoxetine. Those for the non-USA trials showed a trend in favour of TCA. When combined, the results showed that significantly fewer patients on fluoxetine discontinued treatment because of adverse events.
Conclusion Fluoxetine is superior to placebo, irrespective of the analytical approach used, whereas the results obtained v. TCAs depend on the approach used. Hence, the results should be interpreted in this light.
Previously published meta-analyses of selective serotonin reuptake inhibitors (SSRIs) v. tricyclic antidepressants (TCAs) or placebo (Anderson & Tomenson, 1994; Greenberg et al, 1994; Anderson, 1998) were based on published data only and did not analyse data for all randomised patients (the intention-to-treat approach) since these were not available in the published reports. The only previous meta-analysis with this approach was reported by Bech & Cialdella (1992). In the present analysis we used the Eli Lilly and Company (Lilly) fluoxetine database and included patients from published and unpublished randomised clinical short-term trials of fluoxetine. A protocol described our objectives, inclusion and exclusion criteria for trials, and the analyses to be performed. We used different analytical approaches for completers and non-completers. Our objectives were to obtain quantitative estimates of the fluoxetine treatment effect compared with: (a) placebo; (b) TCAs; and (c) to analyse the reasons for early discontinuation from treatment.
MATERIAL AND METHOD
Types and sources of data
In keeping with our original protocol all randomised clinical trials completed and analysed up to the end of December 1992 (the Lilly fluoxetine database) that satisfied the selection criteria were included. After this date, no pertinent trials comparing fluoxetine with placebo or TCAs were added to the database. We analysed the trials performed in the USA (USA trials) separately from those performed elsewhere (Canada and Europe; non-USA trials) because the psychiatric methods and clinical trial procedures were sufficiently different in the USA compared with elsewhere, and this could be a source of heterogeneity between the trials (Ansseau, 1992).
In our protocol for this meta-analysis we defined the criteria for selecting trials before we accessed the trials database: (a) identical or very similar clinical inclusion criteria for patients (major depression as defined by DSM-III; American Psychiatric Association, 1980); (b) use of the Hamilton Depression Rating Scale (HDRS-17; Hamilton, 1967; and the first 17 items from trials that used more than 17); and (c) a double-blind follow-up phase of at least six weeks. For the non-USA trials, we analysed only trials of fluoxetine v. TCAs since the three non-USA placebo-controlled trials (116 patients) in the database did not satisfy our inclusion criteria or included very few patients. The same inclusion criteria were used for non-USA trials, except that trials with a five-week, double-blind follow-up period were also included since their exclusion would have led to only a handful of trials with a small number of patients being included. The database contained only one USA trial with a five-week double-blind follow-up period, but this was not included.
Trials without a control treatment (e.g. dose-ranging trials) and those with a control treatment other than placebo or a TCA were excluded. In addition, trials in which all control patients received fixed doses ≤75 mg/day of a TCA were eliminated, as were those in which treated patients received less than 10 mg/day of fluoxetine. Within a trial, all patients were pooled according to the treatment received, irrespective of the dose received, this being equivalent to comparing a single fluoxetine-treated group with a single TCA-treated group and a single placebo-treated group.
The first evidence-based diagnostic system in psychiatry is the DSM-III. New-generation antidepressants are indicated for major depression as defined using this diagnostic system in most countries, and this is the reason we decided to use DSM-III major depression as the only diagnostic inclusion criterion.
The database contained information for 69 trials, including 6633 patients; of these, 21 trials were USA trials and 48 had been performed elsewhere (non-USA trials). Of the 21 USA trials, five including 400 patients were excluded for the following reasons: three because of the diagnostic system used (Research Diagnostic Criteria; RDC; Spitzer et al, 1978); one because the double-blind follow-up was only for five weeks; and one because the TCA dose was too low. In addition 96 patients randomised to receive a fixed dose of 5 mg/day were excluded, as per our protocol. Of the 48 non-USA trials, 34 trials including 2047 patients were excluded for the following reasons: 23 because the DSM-III was not used (RDC; Feighner diagnostic criteria; ICD-9; World Health Organization, 1978); five because the control treatment was not a TCA (maprotiline, a monoamine reuptake inhibitor, was considered to be similar to TCAs although it is tetracyclic, but mianserin was not); two because they were open-label uncontrolled trials; two because only one and two patients, respectively, had been recruited; one because a fixed dose of a TCA was used (clomipramine 75 mg); and one because there were only sparse data available for the 11 patients included. In total, 30 trials (16 USA and 14 non-USA) and 4120 patients (3447 USA and 673 non-USA) were included (62% of the total database) in accordance with the criteria defined in our protocol.
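The trial and patient accounting above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the dictionary labels are shorthand introduced here for illustration.

```python
# Trial and patient accounting, using only the counts reported in the text.
total_trials, total_patients = 69, 6633
usa_trials, non_usa_trials = 21, 48

# Exclusions by reason (labels are illustrative shorthand).
usa_excluded = {
    "RDC diagnosis": 3,
    "five-week follow-up": 1,
    "TCA dose too low": 1,
}
non_usa_excluded = {
    "DSM-III not used": 23,
    "control not a TCA": 5,
    "open-label uncontrolled": 2,
    "only one or two patients recruited": 2,
    "fixed-dose TCA": 1,
    "sparse data": 1,
}

usa_included = usa_trials - sum(usa_excluded.values())
non_usa_included = non_usa_trials - sum(non_usa_excluded.values())
included_trials = usa_included + non_usa_included

# Included patients: USA plus non-USA, as reported.
included_patients = 3447 + 673
share = included_patients / total_patients

print(usa_included, non_usa_included, included_trials)
print(included_patients, f"{share:.0%}")
```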
ANALYSIS GROUPS AND METHODS
The analyses of continuous data were performed by M.B. and confirmed by M.H. using the Cochrane Collaboration software (Review Manager, 1997). The analyses of the binary outcomes were performed by P.C. and M.H. using a specific software package (EasyMA; Cucherat et al, 1997).
The trials were analysed in groups defined by where they were performed (USA and non-USA trials) and type of control treatment (placebo or TCA). Three types of analyses were performed for each outcome: (a) all randomised patients, classifying prematurely discontinued patients (before Day 42 in USA trials and Day 35 in non-USA trials) as failures (intention-to-treat); (b) all randomised patients who completed at least four weeks of therapy, using a ‘last-observation-carried-forward’ technique (efficacy analysis); and (c) all randomised patients with at least one post-baseline visit, using a ‘last-observation-carried-forward’ technique (end-point analysis).
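The three analysis populations can be illustrated with a minimal sketch. It is not the software used for the analyses; the `Patient` record, its field names and the example values are hypothetical, and response is taken to mean at least a 50% reduction in HDRS-17 from baseline. "At least four weeks" is assumed here to mean 28 days or more on treatment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Patient:
    baseline: float          # HDRS-17 total at randomisation
    days_on_treatment: int   # day of last dose (hypothetical field)
    # Post-baseline visits as (day, HDRS-17 total), in chronological order.
    visits: List[Tuple[int, float]] = field(default_factory=list)


def locf_score(p: Patient) -> Optional[float]:
    """Last observed post-baseline score, or None without any such visit."""
    return p.visits[-1][1] if p.visits else None


def responded(baseline: float, score: float) -> bool:
    """At least a 50% reduction from the baseline score."""
    return score <= 0.5 * baseline


def classify(p: Patient, full_course_days: int = 42) -> Dict[str, Optional[bool]]:
    """Outcome under each approach; None means 'not in that population'.

    full_course_days is 42 for USA trials and 35 for non-USA trials.
    """
    last = locf_score(p)
    completed = p.days_on_treatment >= full_course_days

    # (a) Intention-to-treat: everyone counts, early discontinuation = failure.
    itt = completed and last is not None and responded(p.baseline, last)

    # (b) Efficacy: at least four weeks of therapy, last observation carried forward.
    in_efficacy = p.days_on_treatment >= 28 and last is not None
    efficacy = responded(p.baseline, last) if in_efficacy else None

    # (c) End-point: at least one post-baseline visit, last observation carried forward.
    end_point = responded(p.baseline, last) if last is not None else None

    return {"intention_to_treat": itt, "efficacy": efficacy, "end_point": end_point}


# Hypothetical patients: an early drop-out, a late drop-out and a completer.
early = Patient(24, 14, [(7, 20), (14, 11)])
late = Patient(20, 30, [(7, 18), (14, 15), (28, 9)])
completer = Patient(22, 42, [(7, 18), (14, 15), (28, 12), (42, 10)])

for p in (early, late, completer):
    print(classify(p))
```

The early drop-out shows why the approaches can disagree: the same patient is a failure under intention-to-treat, is absent from the efficacy analysis and is a responder in the end-point analysis.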
Frank et al (1991) suggested using the term remission, rather than recovery, when defining response to drug therapy in the short-term treatment of depression. Partial remission after 4-6 weeks of treatment can be defined as at least a 50% reduction compared with the baseline value for the HDRS-17 score, which corresponds to very much or much improved on the Clinical Global Impression Scale (CGI) (Guy, 1976). The CGI was used in all the USA trials, but only in a few of the non-USA trials. The primary outcome for USA and non-USA trials was defined as a binary variable on the HDRS-17: partial remission, that is, at least a 50% reduction compared with the baseline score on the HDRS-17 instrument. The secondary outcome in the USA trials was also a binary variable, defined as much improved or very much improved on the CGI scale. Another secondary, but quantitative, outcome was the mean change in HDRS-17 scores from baseline to end-point. In this part of the analysis an HDRS subscale, the depression factor (including the six items of depressed mood, guilt, work and interests, retardation, psychic anxiety and general somatic), was also used (HDRS-6; Bech, 1989; O'Sullivan et al, 1997).
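As an illustration of these outcome definitions, the sketch below computes the HDRS-6 subscale from item scores and applies the 50% criterion for partial remission. The item keys and the example scores are hypothetical; only the six item names and the 50% threshold come from the text.

```python
from typing import Mapping

# The six items of the depression factor (HDRS-6); the key spellings are
# illustrative assumptions, not an official coding scheme.
HDRS6_ITEMS = (
    "depressed_mood",
    "guilt",
    "work_and_interests",
    "retardation",
    "psychic_anxiety",
    "general_somatic",
)


def hdrs6(item_scores: Mapping[str, int]) -> int:
    """Sum of the six core items; other HDRS items are ignored."""
    return sum(item_scores[name] for name in HDRS6_ITEMS)


def percent_reduction(baseline: float, end_point: float) -> float:
    """Reduction from baseline as a percentage of the baseline score."""
    return 100.0 * (baseline - end_point) / baseline


def partial_remission(baseline: float, end_point: float) -> bool:
    """Primary binary outcome: at least a 50% reduction from baseline."""
    return percent_reduction(baseline, end_point) >= 50.0


# Hypothetical item scores for one visit (only some HDRS-17 items shown).
visit = {
    "depressed_mood": 3,
    "guilt": 2,
    "work_and_interests": 3,
    "retardation": 1,
    "psychic_anxiety": 2,
    "general_somatic": 1,
    "insomnia_early": 2,  # not part of HDRS-6
}

print(hdrs6(visit))
print(partial_remission(22, 10), partial_remission(22, 12))
```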
The reasons for early treatment discontinuation were analysed as binary variables (adverse event, lack of efficacy or any reason).
Log odds ratio analysis for binary data
We used the logarithm of the odds ratio method, which is based on a multiplicative model, that is the success rate (partial remission) in the treatment group is assumed to be a multiplicative function of that in the control group (Boissel et al, 1989). Due to the large number of statistical tests performed the level of statistical significance was set at a robust value P=0.01 or less. A test for heterogeneity was also performed, and because this is an insensitive test, the level of statistical significance was set at a value of P=0.10 or less. When heterogeneity was detected we analysed the data using a random effects model, which gives more conservative results, but can deal with a certain amount of heterogeneity.
An odds ratio equal to one indicates that there is no difference between the two treatment groups. A value greater than one indicates that more patients in the fluoxetine group were classified as being in partial remission, and therefore that fluoxetine was better; a value of less than one indicates that more patients in the control group were classified as being in partial remission, and therefore that control treatment (placebo or TCA) was better. However, in the analyses of early treatment discontinuations an odds ratio of less than one indicates fewer discontinuations in the fluoxetine group, and that fluoxetine was better. Conversely, an odds ratio of greater than one indicates that there were fewer discontinuations in the control (placebo or TCA) group, and therefore that the control treatment was better.
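To make the interpretation of the odds ratio concrete, the sketch below pools log odds ratios across trials with inverse-variance weights under a fixed effects model and computes Cochran's Q. This is a common textbook formulation chosen here as an assumption; it is not necessarily the exact algorithm implemented in EasyMA, and the trial counts are invented for illustration.

```python
import math
from typing import List, Tuple

# One trial = (responders on fluoxetine, n on fluoxetine,
#              responders on control,    n on control)
Trial = Tuple[int, int, int, int]


def log_odds_ratio(resp_t: int, n_t: int, resp_c: int, n_c: int) -> Tuple[float, float]:
    """Log odds ratio and its large-sample variance for one trial.

    No continuity correction is applied, so every cell must be non-zero.
    """
    a, b = resp_t, n_t - resp_t
    c, d = resp_c, n_c - resp_c
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def pool_fixed(trials: List[Trial]) -> Tuple[float, float, float]:
    """Inverse-variance pooled log OR, its standard error and Cochran's Q."""
    stats = [log_odds_ratio(*t) for t in trials]
    weights = [1 / v for _, v in stats]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, stats)) / total
    se = math.sqrt(1 / total)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, stats))
    return pooled, se, q


# Hypothetical response counts for three trials.
trials = [(30, 60, 20, 60), (45, 100, 30, 100), (25, 50, 15, 50)]
pooled_log_or, se, q = pool_fixed(trials)

low = math.exp(pooled_log_or - 1.96 * se)
high = math.exp(pooled_log_or + 1.96 * se)
print(f"pooled OR {math.exp(pooled_log_or):.2f} (95% CI {low:.2f} to {high:.2f}), Q = {q:.2f}")
```

With partial remission as the outcome, a pooled odds ratio above one favours fluoxetine; with discontinuation as the outcome, the same code applies but a value below one favours fluoxetine.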
Effect size for the meta-analysis of quantitative data
Effect size analysis was introduced by Glass (1976) as a means of combining data from several independent clinical trials. In our analysis the effect size was defined as the difference between the two groups under investigation in the mean change of HDRS from baseline to end-point, divided by the standard deviation of the change score (Cohen, 1977). The 95% confidence intervals (95% CIs) were calculated according to Hedges & Olkin (1985). Data for all randomised patients with at least one post-baseline visit (end-point analysis), using a ‘last-observation-carried-forward’ technique, were included in these analyses. The method of calculation used is in accordance with that described by Whitehead & Whitehead (1991), using either a fixed or random effects model as deemed appropriate. As for the meta-analysis of binary data, a test of heterogeneity (Cochran's Q-test; Laird & Der-Simonian, 1986) and a test of significance of the effect size were performed.
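A single-trial effect size of this kind can be sketched as follows. The code computes a standardised mean difference of change scores with the usual large-sample variance approximation and a 95% CI; it assumes a pooled standard deviation, omits any small-sample correction, and uses invented summary statistics. Change is taken as end-point minus baseline, so a negative effect size favours fluoxetine.

```python
import math
from typing import Tuple


def effect_size(
    mean_t: float, sd_t: float, n_t: int,
    mean_c: float, sd_c: float, n_c: int,
) -> Tuple[float, float, float]:
    """Standardised mean difference of change scores with a 95% CI.

    mean_* are mean changes in HDRS (end-point minus baseline),
    sd_* the standard deviations of the change scores.
    """
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Large-sample variance of a standardised mean difference.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    half_width = 1.96 * math.sqrt(var_d)
    return d, d - half_width, d + half_width


# Hypothetical trial: fluoxetine improves by 11.0 points, placebo by 8.6.
d, ci_low, ci_high = effect_size(-11.0, 8.0, 100, -8.6, 8.0, 100)
print(f"effect size {d:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```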
The list of trials, with details of their characteristics (for example, number of investigators, number of patients and dose of medication), is given in Tables 1a and 1b (further details and references of published trials can be obtained from the authors upon request). One trial (non-USA-10) was excluded after the initial analyses, which showed that the trial was responsible for a statistically significant heterogeneity; inspection of the results suggested that they were unlike the others (i.e. partial HDRS-17 response rates were 86.7% and 6.7% for the fluoxetine-treated and TCA-treated groups, respectively). For the USA-trial analysis, data were available from 16 single- and multicentre, randomised, double-blind trials involving 3543 patients (see Table 1a). A total of 96 patients were excluded, per our protocol, because they had been randomised to receive a fixed dose of 5 mg/day of fluoxetine. Therefore, data for a total of 3447 patients were included in the meta-analyses: 1914 in the fluoxetine-treated group, 847 in the placebo-treated group and 686 in the TCA-treated group (amitriptyline, desipramine, doxepin, imipramine or nortriptyline).
For the non-USA trials, data were analysed from 13 single- and multicentre, randomised, double-blind trials in which fluoxetine was compared with a TCA in 643 patients (i.e. without non-USA trial 10; see Table 1b). Of these 643 patients, 314 had received fluoxetine and 329 had received TCAs (either amitriptyline, clomipramine, dothiepin, doxepin, imipramine or maprotiline). There were no statistically significant differences between the treatment groups in the percentage of men included in the trials (approximately 40% overall), the mean age (approximately 45 years), or the baseline HDRS-17 score (total mean score approximately 22). Only one of the USA trials v. placebo included both in- and out-patients; the others included only out-patients.
The dose ranges for the individual trials are shown in Tables 1a and 1b. Only two of the USA trials v. TCA included both in- and out-patients (both started with only in-patients and the protocols were amended during the trials); the other trials included only out-patients. Three non-USA trials v. TCA included only in-patients, three included only out-patients, six included both, and this was not specified for the remaining trial. The percentage of patients completing the trial was generally higher in the non-USA trials (Table 1b) than in the USA trials (Table 1a).
Meta-analysis of binary data for treatment effects
Table 2 shows the results obtained with HDRS-17, using both the percentage of responders and odds ratio analysis. The efficacy analysis had the highest response rates in the comparisons. The overall difference for fluoxetine v. placebo was 21.4% in the efficacy analysis but only 13.6% in the intention-to-treat analysis. In all the analyses fluoxetine showed a statistically significant benefit compared with placebo. In the USA trials of fluoxetine v. TCA no statistically significant differences were observed. In the non-USA trials no statistically significant differences were observed.
Table 3 shows the results for the CGI outcome, both the remission rates (percentage of ‘very much improved’ and ‘much improved’) and the odds ratio analyses. The analyses for the fluoxetine v. placebo trials gave results that were similar to those obtained for HDRS-17 outcome, that is, all differences were statistically significant.
Meta-analysis of quantitative data (effect size)
When the results for all seven trials assessing fluoxetine v. placebo were pooled, an effect size of -0.30 in favour of fluoxetine was obtained, with a 95% CI of -0.39 to -0.21 (see Fig. 1). For the HDRS-6 outcome an effect size of -0.37 was observed (95% CI: -0.46 to -0.28). Figure 2 shows the results for the trials v. TCAs. The pooled effect size for the HDRS-17 outcome in the USA trials was 0.00 with a 95% CI of -0.18 to 0.10. The pooled effect size for the HDRS-6 outcome showed a non-significant trend in favour of fluoxetine (-0.10; 95% CI -0.21 to 0.01). A trend in favour of TCAs was observed for the non-USA trials v. TCAs, with a pooled effect size for the HDRS-17 outcome of 0.17 (95% CI 0.01 to 0.34). There was a stronger trend in favour of TCAs for the HDRS-6 outcome, with a pooled effect size of 0.18 (95% CI 0.01 to 0.34). When the results from all the trials comparing fluoxetine v. TCAs were pooled, the effect size for the HDRS-17 outcome showed a non-significant trend in favour of TCAs (0.05; 95% CI -0.04 to 0.14). The pooled effect size for the HDRS-6 outcome, by contrast, showed a non-significant trend in favour of fluoxetine (-0.02; 95% CI -0.11 to 0.07).
Meta-analysis of early treatment discontinuation data (binary)
The results of the analyses of the reasons for discontinuations in the trials v. placebo were as predicted, that is significantly more discontinuations in the fluoxetine-treated group due to an adverse event, and significantly more discontinuations in the placebo-treated group due to lack of efficacy, with a non-significant trend for discontinuation for any reason favouring fluoxetine (see Table 4). Using the fixed effects model the test for homogeneity was significant indicating heterogeneity among the trials for the three outcomes, and visual inspection of the graphical results (not shown) suggested this was due to two trials (USA-trial-15 and USA-trial-16). We therefore decided to use a random effects model, which gave more conservative results, but removed the heterogeneity.
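The move from a fixed to a random effects model can be sketched with the moment estimator of the between-trial variance usually attributed to DerSimonian and Laird. Whether the software used here implements exactly this estimator is an assumption, and the effects and variances below are invented log odds ratios chosen to be visibly heterogeneous.

```python
import math
from typing import Dict, List


def random_effects(effects: List[float], variances: List[float]) -> Dict[str, float]:
    """Fixed and random effects pooled estimates (moment estimator of tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)

    # Fixed effects estimate and Cochran's Q (k - 1 degrees of freedom).
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-trial variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau^2 to each within-trial variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sw),
        "q": q,
        "tau2": tau2,
        "random": pooled,
        "se_random": math.sqrt(1.0 / sw_star),
    }


# Hypothetical log odds ratios and variances from four trials.
result = random_effects([0.10, 0.35, -0.60, -0.75], [0.04, 0.05, 0.03, 0.06])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because the between-trial variance is added to every trial's variance, the random effects standard error is never smaller than the fixed effects one, which is the sense in which the results are more conservative.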
The analysis of the reasons for discontinuation in the USA trials of fluoxetine v. TCA showed that, while receiving fluoxetine, significantly fewer patients discontinued their treatment because of an adverse event, and significantly fewer patients discontinued for any reason. No significant difference was seen with respect to discontinuations due to lack of efficacy. The results from a similar analysis for the non-USA trials v. TCA did not indicate any significant differences between the two groups; however, the width of the confidence intervals suggests a potential lack of power to detect clinically significant differences. When the USA and non-USA trials were combined the results showed that significantly fewer patients on fluoxetine discontinued treatment due to adverse events or for any reason.
DSM-III major depression
In our protocol for this meta-analysis it was our intention to compare USA trials with those performed elsewhere (non-USA trials). An a priori condition for such a comparison required that the diagnostic system should be evidence-based and accepted by the health care regulators in both the USA and elsewhere. More non-USA trials than USA trials were excluded because the diagnosis of depression in Europe was made using a classification system other than the DSM-III criteria. We had not anticipated that this would lead to 23 non-USA trials (including around 1400 patients) being excluded. However, since the official indication for the use of SSRIs such as fluoxetine in patients with depression worldwide, including Europe, is major depression, we felt that it was not justified to change the original inclusion criteria in our protocol.
Antidepressive responsiveness to fluoxetine in major depression
A 50% reduction in the baseline HDRS-17 score was the primary outcome in our study. In the intention-to-treat analysis, both for HDRS-17 and for CGI, fluoxetine showed an advantage of approximately 15% over placebo. This is a similar result to that found in one of the first overviews comparing TCAs with placebo (Smith et al, 1969) as well as that reported in the Medical Research Council trial (Medical Research Council, 1965). The odds ratio analysis confirmed that fluoxetine was significantly superior to placebo, although no difference was seen between the USA trials and non-USA trials.
Improved safety acceptance of fluoxetine
In this meta-analysis the results for discontinuation due to adverse reactions were evaluated by the intention-to-treat analysis. Compared with placebo we observed that significantly more patients ceased treatment with fluoxetine due to adverse events, while significantly more patients dropped out on placebo due to lack of efficacy. This is reflected in the relatively lower differences in antidepressive improvement in the intention-to-treat analysis. However, compared with patients in the TCA groups, in the USA trials, and to a lesser extent the non-USA trials, we observed that significantly fewer patients in the fluoxetine group stopped treatment due to adverse events. This seems to explain why the intention-to-treat analysis for the USA trials favoured fluoxetine while that for the non-USA trials did not. However, when combined, significantly fewer patients on fluoxetine compared with those on TCAs discontinued treatment due to adverse events.
These results are in agreement with the results of the meta-analyses published by Anderson & Tomenson (1995) and Hotopf et al (1997). In the latter meta-analysis, Hotopf et al analysed the ‘old’ TCAs (e.g. imipramine and amitriptyline) separately from the ‘newer’ TCAs (e.g. dothiepin, nortriptyline, clomipramine and doxepin). They found that the lower rate of discontinuation in patients on SSRIs was observed in the comparison with the old TCAs. This may explain our finding concerning the intention-to-treat analysis in the USA v. non-USA trials, as the old TCAs were used in 65% of the USA trials compared with 47% of the non-USA trials.
Comparison with other meta-analyses with fluoxetine
In previous meta-analyses the effect size was mainly used, for example, Song et al (1993), Greenberg et al (1994) or Anderson & Tomenson (1994). Our results for fluoxetine v. placebo are in agreement with Greenberg et al (1994), although our effect size of -0.30 for the HDRS-17 remission outcome is low. In the Greenberg et al (1994) analysis we have detected some publication bias (i.e. unpublished trials not included) and double publication (i.e. data included from two publications of the same trial). When using the core symptoms of depression, the HDRS-6 outcome, we showed an effect size of -0.37, indicating that fluoxetine has an effect on the specific symptoms for major depression. This is in agreement with the results from our previous meta-analyses on citalopram and fluvoxamine (Bech, 1989; Bech & Cialdella, 1992). Our results for fluoxetine v. TCAs are in agreement with those published by Anderson & Tomenson (1994), that is, there is no difference in the antidepressive effect. This was confirmed by the HDRS-6 outcome results.
In conclusion, we have shown that results from meta-analyses can differ depending on how patients who withdraw from treatment early are counted in the analyses. Generally, the approach used is intention-to-treat, whereby patients who withdraw from treatment early are considered failures in the trial group to which they were allocated, and it may be important in the future to consider using other approaches (efficacy and end-point) in meta-analyses, to determine if there is a difference. Thus, in our analyses, we have confirmed the superiority of fluoxetine over placebo for the short-term treatment of major depression, and although we were unable to show a difference in efficacy with TCAs, fewer patients on fluoxetine withdrew due to adverse effects.
Clinical Implications and Limitations
The different statistical analyses converged in showing that fluoxetine is significantly superior to placebo and equal to tricyclic antidepressants (TCAs).
In clinical terms fluoxetine had a 15-20% improvement advantage over placebo, which was maintained in the core symptoms of depression on the Hamilton Rating Scale.
The discontinuation rate of fluoxetine due to adverse drug events was significantly lower than with TCAs.
The meta-analysis criterion of DSM-III major depression excluded a rather high proportion of the European trials.
The trials were insufficient for evaluating the relationship between dose of fluoxetine and clinical response.
The USA trials have used older reference TCAs whereas the European trials used newer TCAs.
- Received June 22, 1998.
- Revision received October 18, 1999.
- Accepted October 18, 1999.
- © 2000 Royal College of Psychiatrists