Background Extracts of Hypericum perforatum (St John's wort) are widely used to treat depression. Evidence for its efficacy has been criticised on methodological grounds.
Aims To update evidence from randomised trials regarding the effectiveness of Hypericum extracts.
Methods We performed a systematic review and meta-analysis of 37 double-blind randomised controlled trials that compared clinical effects of Hypericum monopreparations with either placebo or a standard antidepressant in adults with depressive disorders.
Results Larger placebo-controlled trials restricted to patients with major depression showed only minor effects over placebo, whereas older and smaller trials not restricted to patients with major depression showed marked effects. Compared with standard antidepressants, Hypericum extracts had similar effects.
Conclusions Current evidence regarding Hypericum extracts is inconsistent and confusing. In patients who meet criteria for major depression, several recent placebo-controlled trials suggest that Hypericum has minimal beneficial effects, whereas other trials suggest that Hypericum and standard antidepressants have similar beneficial effects.
Extracts of Hypericum perforatum (St John's wort) are widely used to treat depression. Systematic reviews published between 1996 and 2000 concluded that such extracts are more effective than placebo and are comparable with older antidepressants in the treatment of mild to moderate depression (Linde et al, 1996; Volz, 1997; Linde & Mulrow, 1998; Josey & Tacket, 1999; Gaster & Holroyd, 2000; Williams et al, 2000). Several older trials included in these reviews were criticised because they included patients with few or mild symptoms who did not meet criteria for major depression, were conducted by primary care physicians who were not experienced in depression research, or used low doses of comparator drugs (Shelton et al, 2001). Also, smaller trials included in the reviews tended to report larger treatment effects, which might be explained by publication bias or lower methodological quality of smaller trials (Sterne et al, 2000).
Several large studies, including some with negative findings, have been published recently (Montgomery et al, 2000; Shelton et al, 2001; Hypericum Depression Trial Study Group, 2002). We therefore updated our previous review (Linde et al, 1996; Linde & Mulrow, 1998), paying particular attention to factors such as type and severity of depression and trial size that might explain conflicting results. Our updated review addresses the following specific questions. Are extracts of St John's wort (Hypericum perforatum) more effective than placebo, and as effective as standard antidepressants, in improving symptoms in adults with depression? Are Hypericum extracts less effective in patients who meet criteria for major depression than in patients with depressive symptoms who may not meet criteria for major depression? Do trials show that Hypericum extracts have fewer adverse effects than standard antidepressants?
We searched for English and non-English language and published and unpublished trials indexed in the register of the Cochrane Collaborative Review Group for Depression, Anxiety and Neuroses (last search July 2003) and PubMed (text word HYPERICUM, search dates 1998 to May 2004). We also checked reference lists of trials and reviews, contacted manufacturers and experts in the field, and relied on our prior extensive searches (Linde et al, 1996; Linde & Mulrow, 1998). One reviewer (K.L.) initially screened reference lists to identify controlled clinical studies of Hypericum preparations in humans. At least two reviewers independently reviewed the full text of all such articles to assess whether they met inclusion criteria. Disagreements occurred for two studies; these were resolved by consensus.
We selected studies that met the following criteria:
study design - double-blind, randomised, controlled trial;
participants - adult patients treated for depressive disorders;
experimental intervention - Hypericum monopreparation for at least 4 weeks;
control intervention - placebo or a synthetic standard antidepressant;
outcome measure - assessment of symptoms with a depression scale or general assessment of clinical response.
These criteria were more restrictive than those used in our prior reviews, which allowed single-blind trials, controlled trials without explicit randomisation, trials shorter than 4 weeks, combinations of Hypericum and other plant extracts, and comparison groups that were treated with drugs other than standard antidepressants, for example diazepam (Linde et al, 1996; Linde & Mulrow, 1998).
Data extraction, outcome definition and assessment of methodological quality
Using a pre-tested form, two reviewers independently extracted information regarding trial participants, methods, interventions, outcomes and study quality. Authors and/or sponsors were contacted to provide missing information. Disagreements were resolved through discussion. We extracted the numbers of patients who were randomised and analysed and who completed protocols, the number and reasons for drop-outs and withdrawals, numbers of patients reporting adverse effects, and the number and type of adverse effects that were reported. We assessed numbers of patients who were classified as responders based on score improvements on the Hamilton Rating Scale for Depression (HRSD; first preference), the Clinical Global Impression index (CGI; sub-scale global improvement rating as at least ‘much improved’; second preference) or any other clinical response measurement (third preference). We used the Jadad scale (items on randomisation, masking and reporting of drop-outs and withdrawals) and a checklist developed by one of us (items on treatment allocation, concealment of allocation, baseline comparability, physician and patient masking, and selection bias after allocation) to help guide assessments of study quality (Jadad et al, 1996; Linde et al, 2001).
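The preference order for classifying responders can be written as a simple selection rule. The following is a minimal sketch, not the form the reviewers actually used: the function name and the measure labels (`hrsd`, `cgi_much_improved`, `other`) are hypothetical, and each trial is assumed to supply responder counts keyed by measure.

```python
from typing import Optional

# Preference order described in the text: HRSD first, CGI global improvement
# ("much improved" or better) second, any other clinical response measure third.
# The labels are hypothetical, not field names from any real data set.
PREFERENCE = ("hrsd", "cgi_much_improved", "other")


def select_responders(reported: dict) -> Optional[tuple]:
    """Return (measure, responder_count) for the most preferred measure reported.

    `reported` maps a measure label to a responder count; measures a trial did
    not report are absent or None. Returns None if no usable measure exists.
    """
    for measure in PREFERENCE:
        count = reported.get(measure)
        if count is not None:
            return measure, count
    return None


# Usage with made-up counts: the HRSD count is chosen whenever it is available.
print(select_responders({"cgi_much_improved": 30, "hrsd": 25}))  # ('hrsd', 25)
print(select_responders({"other": 12}))  # ('other', 12)
```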
We considered the proportion of responders at the end of treatment as the main outcome measure; for treatment phases longer than 6 weeks, we used the time point that the study investigators defined for primary outcome measurement. Treatment response was analysed using response rate ratios (the proportion of patients classified as responders among those randomised to the Hypericum group, divided by the corresponding proportion in the control group) and their 95% confidence intervals. Rate ratios greater than 1 indicate better response in the Hypericum group. The main outcome measure for the safety analysis was the number of patients who dropped out because of adverse effects. Secondary measures were the total number of patients who dropped out and the number of patients reporting adverse effects. Because the reported frequency of side-effects or adverse effects was highly variable, odds ratios rather than rate ratios were calculated for these outcomes. Odds ratios less than 1 indicate that fewer events occurred in the Hypericum group. We combined rate ratios or odds ratios using fixed or random effects models in the Cochrane Collaboration's Review Manager software 4.1 (Update Software, Oxford, UK). In addition, meta-regression analyses were performed using Stata 8.0 (Stata Corporation, College Station, TX, USA). To investigate the degree of between-trial heterogeneity, we performed the chi-squared test and calculated I² (Higgins et al, 2003) and τ² (Thompson & Sharp, 1999). A statistical test of funnel plot asymmetry, which may indicate the presence of publication bias, was performed (Egger et al, 1997). The extent to which one or more study-level variables explained heterogeneity in the treatment effects was then explored by fitting random effects meta-regression models (Thompson & Sharp, 1999; Sterne et al, 2001). The following variables were entered in the model: type of depression (major depression v. other); severity of depression (HRSD scores at baseline; because both the 17-item and the 21-item HRSD were used, baseline scores from the 21-item scale were standardised by multiplying them by 0.81 (17/21)); dosage of Hypericum extract (mg per day); type of extract (LI 160 v. other); study location (German-speaking Europe v. other); study duration (weeks); and year of publication. Two variables relating to trial quality were also included (whether or not an adequate method of allocation concealment was described, and whether or not patients dropping out were reported). Finally, we included the variance of the rate ratio or odds ratio to explore the importance of small-study effects (the tendency for smaller studies to show larger treatment effects; Sterne et al, 2001). For simplicity, more precise studies (trials with smaller variance) are described in the results as larger trials, and less precise studies as smaller trials.
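The per-trial rate ratio and the heterogeneity statistics described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of the Review Manager or Stata calculations: fixed-effect pooling here uses inverse-variance weights on the log scale (Review Manager's default fixed-effect method for dichotomous data may differ), τ² uses the DerSimonian-Laird moment estimator, and all function names and trial counts are hypothetical.

```python
import math


def log_rate_ratio(resp_h, n_h, resp_c, n_c):
    """Log response rate ratio (Hypericum v. control) and its large-sample variance.

    Each response rate is responders / patients randomised to that group. The
    variance formula is undefined when a group has no responders.
    """
    rr = (resp_h / n_h) / (resp_c / n_c)
    var = 1 / resp_h - 1 / n_h + 1 / resp_c - 1 / n_c
    return math.log(rr), var


def pool_fixed(log_effects, variances):
    """Inverse-variance fixed-effect pooled log effect and its standard error."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / total
    return pooled, math.sqrt(1 / total)


def heterogeneity(log_effects, variances):
    """Cochran's Q, I^2 (as a proportion) and DerSimonian-Laird tau^2."""
    if len(log_effects) < 2:
        raise ValueError("need at least two trials")
    weights = [1 / v for v in variances]
    pooled, _ = pool_fixed(log_effects, variances)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


def ci95(log_estimate, se):
    """Back-transform a log-scale estimate to (ratio, lower 95% limit, upper 95% limit)."""
    return tuple(math.exp(log_estimate + k * 1.96 * se) for k in (0, -1, 1))


def standardise_hrsd21(score21):
    """Rescale a 21-item HRSD baseline score to the 17-item metric (factor 17/21)."""
    return score21 * 17 / 21


# Hypothetical trials: (responders, randomised) for Hypericum, then control.
trials = [(30, 50, 20, 50), (45, 100, 40, 100), (12, 20, 5, 20)]
effects = [log_rate_ratio(*t) for t in trials]
log_rrs = [y for y, _ in effects]
variances = [v for _, v in effects]
pooled, se = pool_fixed(log_rrs, variances)
print("pooled RR and 95% CI:", ci95(pooled, se))
print("Q, I^2, tau^2:", heterogeneity(log_rrs, variances))
```

The log scale is used throughout because ratio measures are approximately normally distributed only after log transformation; confidence limits are then exponentiated back to the ratio scale.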
Identification of eligible trials
Of 68 possible trials, 37 met inclusion criteria and contributed 26 comparisons with placebo and 14 comparisons with standard antidepressants (Fig. 1). We excluded 18 trials that involved either healthy volunteers (Herberg, 1991; Johnson et al, 1992, 1993; Schmidt et al, 1993; Schulz & Jobert, 1993; Staffeldt et al, 1993; Brockmöller et al, 1997; plus one unpublished trial by Wienert et al, described at the Third Phytotherapy Congress in Lübeck-Travemünde in 1991) or patients without depression (Bendre & Dharmadhikari, 1980; Panijel, 1985; Albertini, 1986; Werth, 1989; Dittmer, 1992; Maisenbacher et al, 1995; Häring et al, 1996; Hottenrott et al, 1997; Sindrup et al, 2000; Volz et al, 2002); five that lacked placebo or standard antidepressant control groups (Spielberger, 1985; Martinez et al, 1993; Lenoir et al, 1999; Zeller, 2000; plus one unpublished trial by Bernhardt et al, described at the Fifth Phytotherapy Congress in Bonn in 1993); two that measured only physiological outcomes (electroencephalography) (Czekalla et al, 1997; Kugler et al, 1990a); two that were not masked (Warnecke, 1986; Kugler et al, 1990b); and three that tested combinations of Hypericum and other plant extracts (Steger, 1985; Ditzler et al, 1994; Hiller & Rahlfs, 1995). Seven of the 30 excluded trials had been included in previous versions of our reviews. We were unable to obtain the report of one trial (Agrawal et al, 1994), and for another, a study by Bjerkenstedt et al, we had only a report of an oral presentation (Anonymous, 2000); this trial was included in the descriptive review but not in the meta-analyses. One trial was available only as a thesis (König, 1993). The published abstracts of two trials were supplemented with additional information from an author (Osterheider et al, 1992) and with a detailed hand-out and additional information from a sponsor (Montgomery et al, 2000). Overall, we obtained additional information from authors, sponsors or both for 31 trials.
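The trial flow can be checked with simple arithmetic. The counts below are those stated in the text; treating the unobtainable report (Agrawal et al, 1994) as the one candidate trial that was neither excluded nor included is our reading of the figures, not a statement from the flow diagram.

```python
# Exclusion categories and counts as stated in the text.
excluded = {
    "healthy volunteers or patients without depression": 18,
    "no placebo or standard antidepressant control": 5,
    "physiological (EEG) outcomes only": 2,
    "not masked": 2,
    "Hypericum combined with other plant extracts": 3,
}
unobtainable = 1   # assumed: the trial whose report could not be obtained
candidates = 68
included = 37

total_excluded = sum(excluded.values())
assert total_excluded == 30
assert candidates - total_excluded - unobtainable == included

# Three trials had both a placebo arm and a standard-antidepressant arm,
# so 26 + 14 comparisons come from 37 distinct trials.
placebo_comparisons, active_comparisons, three_arm_trials = 26, 14, 3
assert placebo_comparisons + active_comparisons - three_arm_trials == included
print(total_excluded, candidates - total_excluded - unobtainable)  # 30 37
```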
Twenty-six trials involving 3320 patients had placebo-control groups (Table 1). Twenty-one originated from German-speaking countries (Germany, Austria and Switzerland), two from the USA and one each from the UK, France and Sweden. The latter five trials, as well as eight trials from German-speaking countries, were restricted to patients with a diagnosis of major depression according to DSM (III or later) (American Psychiatric Association, 1980, 1987, 1994) or ICD-10 (World Health Organization, 1993) criteria. Severity of depression was classified as mild to moderate in most trials.
Older trials differed from more recent ones in several respects (Table 2). Older trials were performed exclusively in German-speaking countries. Newer trials had larger sample sizes, were of longer duration and more often used a placebo run-in design. Newer trials were also more often restricted to patients who met criteria for major depression, and tended to include patients with more severe depression (i.e. higher scores on depression scales). Indicators of methodological quality and daily dosage were also slightly higher in more recent trials.
Of 24 trials with data on response to treatment, 21 used HRSD scores to characterise response, but definitions of response were not uniform across trials (see Table 1). One trial (Osterheider et al, 1992) was excluded from pooled analyses because no response occurred in either group. For the remaining 23 trials, responder rate ratios were heterogeneous (I²=75.4%, τ²=0.191, P<0.0001) and the funnel plot was asymmetric (P<0.0001, Fig. 2). In univariate meta-regression analysis, larger trials with smaller variances of rate ratios (P<0.0001), trials limited to patients with major depression (P=0.026) and trials enrolling patients with higher HRSD scores (P=0.010) showed smaller treatment effects. Other factors associated with smaller treatment effects included more recent year of publication (P=0.001), origin from a non-German-speaking country (P=0.005) and longer trial duration (P=0.005). There was little evidence for an association of response with the daily dosage (P=0.33), the type of extract (P=0.74) or indicators of trial quality (method of concealment, P=0.15; reporting on drop-outs, P=0.12).
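The funnel plot asymmetry test cited in the methods (Egger et al, 1997) regresses each trial's standardised effect on its precision and asks whether the intercept differs from zero. Below is a minimal ordinary-least-squares sketch using only the standard library; it returns the intercept, its standard error and the t statistic, which would be compared with a t distribution on k − 2 degrees of freedom. The function name and the input values are hypothetical.

```python
import math


def egger_test(log_effects, std_errors):
    """Egger regression test for funnel plot asymmetry.

    Regresses each trial's standardised effect (log effect / SE) on its
    precision (1 / SE) by ordinary least squares and returns the intercept,
    its standard error and the t statistic (k - 2 degrees of freedom).
    """
    y = [e / s for e, s in zip(log_effects, std_errors)]
    x = [1 / s for s in std_errors]
    k = len(x)
    if k < 3:
        raise ValueError("need at least three trials")
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(resid_ss / (k - 2) * (1 / k + x_bar ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t


# Hypothetical log rate ratios and standard errors: the less precise trials
# show the larger effects, the pattern associated with small-study effects.
log_rr = [0.9, 0.7, 0.4, 0.15, 0.1]
se = [0.45, 0.35, 0.2, 0.1, 0.08]
print(egger_test(log_rr, se))
```

A positive intercept means that small, imprecise trials report larger benefits than precise ones would predict; it signals asymmetry but cannot by itself distinguish publication bias from other small-study effects.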
A bivariate model, which included the two variables related to our a priori hypotheses (type of depression and variance of rate ratio), explained a large proportion of between-trial heterogeneity (reducing τ² from 0.191 to 0.030). The results from this model are illustrated in Figure 3, which shows a fixed-effects meta-analysis stratified by type of depression (major v. other) and precision (above or below the median variance). In the six smaller trials that were restricted to patients with major depression, the combined response rate ratio was 2.06 (95% CI 1.65-2.59), whereas in the six larger trials it was 1.15 (95% CI 1.02-1.29). In trials not restricted to patients with major depression, the rate ratio was 6.13 (95% CI 3.63-10.38) in five smaller trials and 1.71 (95% CI 1.40-2.09) in six larger trials.
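Because precision is expressed through the variance of the log rate ratio, a reported 95% confidence interval can be converted back to a standard error, assuming the interval was computed symmetrically on the log scale: SE = (ln upper − ln lower)/(2 × 1.96). The sketch below applies this to the four stratum estimates quoted above. These are pooled stratum results, not the individual trial variances used for the median split, and rounding in the published figures makes the recovered values approximate.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log ratio recovered from its 95% CI (log-symmetric CI assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Stratum estimates quoted in the text: (rate ratio, lower limit, upper limit).
strata = {
    "major depression, smaller trials": (2.06, 1.65, 2.59),
    "major depression, larger trials": (1.15, 1.02, 1.29),
    "other depression, smaller trials": (6.13, 3.63, 10.38),
    "other depression, larger trials": (1.71, 1.40, 2.09),
}

for label, (rr, lo, hi) in strata.items():
    se = se_from_ci(lo, hi)
    # On the log scale the point estimate sits at the geometric midpoint of the CI.
    midpoint = math.sqrt(lo * hi)
    print(f"{label}: RR {rr}, SE(log RR) {se:.3f}, variance {se ** 2:.4f}, "
          f"CI midpoint {midpoint:.2f}")
```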
Response rates in both placebo and intervention groups changed over time (Fig. 4). Weighted linear regression analysis showed that response rates in the placebo groups increased by 1.5% per year (P=0.013), whereas response rates in the Hypericum groups decreased by 1.1% per year (P=0.049).
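The time trends were estimated by weighted linear regression of group response rates on year of publication. The weights are not stated in the text; the sketch below assumes, hypothetically, weights equal to group size and uses the closed-form weighted least-squares slope. The data are invented solely to show the mechanics, with rates constructed to rise by exactly 0.015 per year.

```python
def weighted_slope(x, y, w):
    """Weighted least-squares slope and intercept of y on x with weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical placebo-group data: publication year, response rate, group size.
years = [1988, 1992, 1996, 2000]
rates = [0.20, 0.26, 0.32, 0.38]   # rises by 0.015 per year by construction
sizes = [40, 60, 100, 150]         # assumed weights (group sizes)
slope, _ = weighted_slope(years, rates, sizes)
print(f"{slope * 100:.1f} percentage points per year")  # 1.5 percentage points per year
```

With real data the points would scatter around the line, and the weights would determine how strongly large trials pull the fitted trend.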
Comparisons with standard antidepressants
Fourteen trials with a total of 2283 patients compared Hypericum extracts with standard antidepressants (Table 3); 13 provided sufficient data for efficacy and safety analyses. In six of these, the comparator drug was a selective serotonin reuptake inhibitor (SSRI; fluoxetine in four studies, sertraline in two). Eight studies were performed in German-speaking countries. All trials but one were restricted to patients with a diagnosis of major depression according to DSM or ICD-10 criteria. Responder rates were similar among patients receiving Hypericum extracts and those receiving standard antidepressants, with little evidence of between-trial heterogeneity (I²=4.2%, P=0.40) or funnel plot asymmetry (P=0.55). Combining trials using a fixed effects model gave a responder rate ratio of 1.01 (95% CI 0.93-1.10) for all 13 trials, a rate ratio of 1.03 (95% CI 0.93-1.14) for seven trials comparing Hypericum extracts with older antidepressants, and a rate ratio of 0.98 (95% CI 0.85-1.12) for six trials comparing Hypericum extracts with SSRIs (Fig. 5). In meta-regression analysis there was some evidence (P=0.033) that Hypericum extracts showed better results in the eight trials from German-speaking countries (RR 1.05, 95% CI 0.95-1.16) whereas in the five trials from other countries standard antidepressants were slightly more effective (RR 0.85; 95% CI 0.71-1.01).
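The country-of-origin finding can be cross-checked approximately with a simpler, different technique: a z test comparing two independent subgroup estimates on the log scale (the ratio of rate ratios). This is not the meta-regression the review used, and because the standard errors must be recovered from rounded confidence intervals, it will not reproduce the reported P value exactly. The function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log ratio recovered from its 95% CI (log-symmetric CI assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_subgroups(rr1, ci1, rr2, ci2):
    """Ratio of two independent rate ratios with a two-sided normal-approximation P value."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    diff = math.log(rr1) - math.log(rr2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return math.exp(diff), z, p


# German-speaking countries v. other countries, values quoted in the text.
ratio, z, p = compare_subgroups(1.05, (0.95, 1.16), 0.85, (0.71, 1.01))
print(f"ratio of rate ratios {ratio:.2f}, z = {z:.2f}, P = {p:.3f}")
```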
In all safety analyses there was little evidence of between-trial heterogeneity or funnel plot asymmetry. Comparing Hypericum extracts with placebo, there was a trend for fewer patients to drop out for any reason (OR 0.83, 95% CI 0.64-1.06), fewer to drop out because of adverse effects (OR 0.60, 95% CI 0.28-1.30) and less reporting of adverse effects (OR 0.79, 95% CI 0.61-1.03) among patients receiving Hypericum. In a comparison with older antidepressants, patients on Hypericum extracts were less likely to drop out (OR 0.65, 95% CI 0.46-0.92), to drop out owing to adverse effects (OR 0.25, 95% CI 0.14-0.45; Fig. 6) and to report adverse effects (OR 0.39, 95% CI 0.31-0.50). There was a trend towards a lower probability of dropping out because of adverse effects (OR 0.60, 95% CI 0.31-1.15; Fig. 6) and lower reporting of adverse effects (OR 0.75, 95% CI 0.52-1.08) for patients treated with Hypericum extracts compared with patients treated with SSRIs. The proportions of patients dropping out for any reason did not differ (OR 0.95, 95% CI 0.65-1.40).
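For the safety outcomes the review reports odds ratios, where values below 1 favour Hypericum. Below is a minimal sketch of a single-trial odds ratio with a Woolf (log-scale) 95% CI; pooling across trials would then proceed on the log scale, as for the rate ratios. The 0.5 continuity correction for zero cells is a common convention assumed here, not something the review specifies, and the counts are hypothetical.

```python
import math


def odds_ratio(events_h, n_h, events_c, n_c, correction=0.5):
    """Odds ratio (Hypericum v. comparator) with a Woolf 95% CI.

    Events are, for example, drop-outs owing to adverse effects. If any cell of
    the 2x2 table is zero, `correction` is added to all four cells.
    """
    a, b = events_h, n_h - events_h      # Hypericum: events, non-events
    c, d = events_c, n_c - events_c      # comparator: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = [cell + correction for cell in (a, b, c, d)]
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return tuple(math.exp(log_or + k * 1.96 * se) for k in (0, -1, 1))


# Hypothetical trial: 5 of 100 Hypericum patients and 15 of 100 comparator
# patients dropped out because of adverse effects.
or_, lower, upper = odds_ratio(5, 100, 15, 100)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```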
In this updated meta-analysis, we found that Hypericum perforatum extracts improved symptoms more than placebo and similarly to standard antidepressants in adults with mild to moderate depression. However, pooled analysis of six recent, large, more precise trials restricted to patients with major depression showed only minimal benefits of Hypericum extract compared with placebo. Hypericum extracts caused fewer adverse effects than older antidepressants, and might have caused slightly fewer adverse effects than SSRIs.
We cannot rule out the possibility that selective publication of over-optimistic results in small trials explains our finding that the older trials more often had positive results than the newer ones, although we doubt that this is the case. Extensive searches identified three ‘negative’ trials that were published only as abstracts or theses (Osterheider et al, 1992; König, 1993; Montgomery et al, 2000). However, we suspect that there are few (if any) additional unpublished trials; the five manufacturers whose products were tested in most of the trials told us they had no other unpublished research that met our criteria, apart from three trials currently being analysed or in the publication process.
We found no systematic difference between trials in major factors generally related to trial quality, but our subjective judgement was that more recent trials were of better overall quality than older trials. All trials were double-blind. Although adequacy of blinding was usually not formally assessed, achieving similarity between Hypericum extract and placebo preparations is not particularly difficult. Most trials concealed allocation assignments by using consecutively numbered identical medication containers, and drop-out rates were generally low. Some investigators in older trials might have had little experience with diagnostic standards and rating scales (Shelton et al, 2001), but even so such inexperience is unlikely to have biased findings in double-blind trials.
Newer trials more often included only patients with documented major depression and patients with higher HRSD values at baseline. Two of the newer trials from the USA (Shelton et al, 2001; Hypericum Depression Trial Study Group, 2002) included large proportions of patients who had been suffering from their current depressive episode for more than 2 years. Older trials were more often carried out in German-speaking countries where extracts are registered as drugs. Primary care physicians in these countries use Hypericum extracts mainly in patients with mild to moderate depressive complaints and use standard antidepressants in patients with more severe and/or long-lasting depression. Accordingly, older trials often included patients with neurotic depression (ICD-9 code 300.4; World Health Organization, 1977) or brief depression (309.0). Some explicitly excluded patients with a current depressive episode lasting longer than 6 months (Hänsgen & Vesper, 1996; Volz et al, 2000). Older trials could have involved more patients with atypical depressive features and somatisation, whereas newer trials could have involved more patients with melancholic symptoms who might be diagnosed as suffering from endogenous depression according to ICD-9 (Murck, 2002). If so, newer trials might have excluded groups that are particularly responsive to Hypericum extract.
Response rates observed in trials have changed over time. In trials of standard antidepressants, response rates increased over the past 20 years among both treatment and control groups (Walsh et al, 2002). In trials of Hypericum v. placebo, response rates in the placebo groups increased markedly over time, whereas response rates in the Hypericum groups decreased slightly over time. Explanations for these changes over time are not clear, but older trials with unusually low placebo response rates are likely to provide overoptimistic estimates of the benefits of Hypericum.
Most trials that compared Hypericum extracts with standard antidepressants were restricted to patients with major depression. They showed that Hypericum extracts and older and newer antidepressants had similar efficacy. Do these findings contradict those of the recent placebo-controlled Hypericum trials and prove the efficacy of these extracts in patients with major depression? We do not believe so. Although summary estimates of trials comparing antidepressants with placebo consistently show that antidepressants are better than placebo in treating major depression (Williams et al, 2000), a considerable proportion of placebo-controlled trials show no statistically significant benefits of antidepressants (Khan et al, 2000; Kirsch et al, 2002). It is possible that patients in the trials comparing Hypericum extracts with standard antidepressants did not benefit from either the extracts or the antidepressants. Several of the older trials used low dosages of standard antidepressants. More recent trials used dosages generally considered adequate, but still in the lower range of recommended dosages. Theoretically, the dosages used in the trials could have led to underestimates of the efficacy of standard antidepressants, although meta-analyses do not conclusively show that higher doses of standard antidepressants are more effective than lower doses (Furukawa et al, 2002; Kirsch et al, 2002). Three trials of Hypericum included both a placebo and a standard antidepressant control group; however, one of these has not yet been fully published (Anonymous, 2000). Of the two fully published trials, one (Philipp et al, 1999) showed that Hypericum extract and standard antidepressants had similar efficacy and that both were superior to placebo, whereas the other (Hypericum Depression Trial Study Group, 2002) showed no statistically significant difference between any of the groups.
In summary, accumulating evidence regarding the efficacy of Hypericum extracts is complex. We believe that the heterogeneous findings of placebo-controlled trials of these extracts are partly due to an overestimation of their effects in smaller, older studies, and partly to variable efficacy of the extracts in different patient populations. Even though most available comparisons between Hypericum extracts and standard antidepressants suggest similar effects, we believe that current best evidence from placebo comparisons suggests only minor benefits of Hypericum in patients with major depression and no benefit in patients with prolonged duration of depression. There is no evidence about effectiveness in severe depression. We found that current best evidence, derived primarily from older studies in German-speaking countries in primary care settings, still suggests benefits in patients with mild to moderate depressive symptoms who do not necessarily meet criteria for major depression.
Many patients buy St John's wort products from health-food stores and might not disclose this to their physicians. Such uncontrolled use is problematic, because serious interactions can occur with a number of frequently used drugs: see systematic reviews by Hammerness et al (2003) and Knüppel & Linde (2004). Physicians should therefore regularly ask their patients about their Hypericum intake. Also, the quality of Hypericum preparations can differ considerably, and a number of products contain only minor amounts of bioactive constituents (Wurglics et al, 2003). Products that do not provide important information on their content, such as the amount of total extract (e.g. 900 mg), the extraction fluid (e.g. methanol 80% or ethanol 60%) and the ratio of raw material to extract (e.g. 3-6:1), should be avoided. Finally, current best evidence regarding efficacy of Hypericum extracts is not definitive. Mechanisms and specificity of actions of single components need further study. Ultimately, more trials that compare specific extracts with both placebo and standard synthetic antidepressants in clearly defined patient populations with and without major depression are needed.
We thank authors and manufacturers who provided additional information.
- Received November 6, 2003.
- Revision received August 11, 2004.
- Accepted September 21, 2004.
- © 2005 Royal College of Psychiatrists