Background Repetitive transcranial magnetic stimulation (rTMS) may be useful in the treatment of depression but results from trials have been inconclusive to date.
Aims To assess the efficacy of rTMS in treating depression.
Method We conducted a systematic review of randomised controlled trials that compared rTMS with sham in patients with depression. We assessed the quality of design of all studies and conducted a meta-analysis of data from trials with similar rTMS delivery.
Results We included a total of 14 trials. The quality of the included studies was low. Pooled analysis using the Hamilton Rating Scale for Depression showed an effect in favour of rTMS compared with sham after 2 weeks of treatment (standardised mean difference=–0.35; 95% CI –0.66 to –0.04), but this was not significant at the 2-week follow-up (standardised mean difference=–0.33; 95% CI –0.84 to 0.17).
Conclusions Current trials are of low quality and provide insufficient evidence to support the use of rTMS in the treatment of depression.
Repetitive transcranial magnetic stimulation (rTMS) is a non-invasive technique used to stimulate the human brain in vivo using very strong, pulsed magnetic fields. The technique involves the delivery of a magnetic pulse to the cortex of a subject through a hand-held stimulating coil applied directly to the head. The magnetic pulses pass unimpeded through the skull and induce an electrical current in the underlying tissue, which in turn is able to depolarise neurons (Hallett, 2000). Given its relative non-invasiveness, its potential to stimulate very focused areas of the brain and indications that it may have therapeutic effects in neuropsychiatric disorders, especially affective disorders (George et al, 1999), rTMS has been the focus of considerable research and clinical interest in recent years. Clinical interest originates mainly from about 15 placebo-controlled clinical studies involving around 200 subjects with depressive disorder or bipolar disorder in the depressed phase. It has been shown that rTMS has effects on the brain (Ji et al, 1998; Keck et al, 2000), but whether its properties are clinically useful and constitute meaningful alternatives to currently available treatments remains to be determined. Today rTMS presents an interesting and potentially promising technique but to our knowledge there has been no systematic evaluation of its efficacy using meta-analysis techniques. We undertook a systematic review of all available randomised trials and conducted a meta-analysis of relevant data to assess the efficacy of rTMS in treating depression.
Identification of studies
We searched Medline (1966–March 2002), Embase (1974–March 2002), PsycLit (1980–2001), the Register of Clinical Trials of the Cochrane Collaboration Depression, Neurosis and Anxiety Review Group (January 2002) and the Cochrane Controlled Trials Register (January 2002) using the search terms MAGNETIC-STIMULATION, TMS, rTMS, DEPRESSION, DEPRESSIVE DISORDER and DYSTHYMIC DISORDER and included papers published in all languages. Where possible, we contacted authors of identified randomised controlled trials (RCTs) for additional information or other relevant studies.
Studies included were randomised trials that compared rTMS given at any frequency and at any localisation with a sham intervention in patients of any age and gender with a diagnosis of depression (depressive disorders or bipolar disorders in depressed phase), with or without psychotic symptoms according to either DSM–IV ( American Psychiatric Association, 1994) or ICD–10 ( World Health Organization, 1993).
Selection procedure, data extraction and quality assessment
Potentially relevant studies were obtained and examined independently, and quantitative and qualitative data were extracted independently using a standard form.
Our quality assessment of the studies addressed three main criteria: adequate concealment of randomisation; intention-to-treat analysis; and blinding. To assess the adequacy of randomisation concealment, we looked for evidence from the study report of robust concealment of group allocation, such as a centralised system or a process in which allocations were pre-numbered, coded and kept in locked files or in sequentially numbered, sealed, opaque envelopes (Clarke & Oxman, 2001). For intention-to-treat analysis we looked for evidence that all patients initially randomised had been included in the analysis, regardless of whether they had completed the study or not. We also looked for post-treatment follow-up. With respect to blinding, for practical reasons the professional giving the rTMS intervention itself (whether active rTMS or sham) cannot be blinded. If the patients had been blinded to the treatment allocation, and the outcomes had been assessed either by an assessor who was also blinded to the allocation or by the patients themselves, we classified the trial as being single blind with evaluation by external assessors.
We also looked at details of each trial design and noted whether there had been factors such as concurrent medication or therapeutic setting that may have influenced health outcomes and consequently the apparent performance of the interventions.
The main outcome measure was remission of symptoms, determined by any of the following measures: time to adjunctive treatment; readmission to hospital or hospital discharge; time off work; or appropriate psychometric scales. Acceptability of treatment (as measured by withdrawals from trial) was considered as a secondary outcome.
We undertook a methodological quality assessment of all the included studies. We conducted a pooled analysis of data from those trials in which the intervention given was homogeneous (same localisation, frequency and duration of treatment), using scores from the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1960, 1967), because this psychometric scale was the only outcome measure that was reported by all the studies. In addition we conducted a second pooled analysis of data from those studies with homogeneous interventions that had used the Beck Depression Inventory (BDI; Beck et al, 1961) as a secondary outcome measure. A third pooled analysis was conducted for acceptability of treatment (measured by withdrawals).
In the cross-over studies we excluded a possible carry-over effect between the different phases of the trials by using information only from the first phase (Jadad, 1998). For continuous data the studies included in the pooled analysis were tested for statistical homogeneity using a χ² test and, because homogeneity was found, the pooled standardised mean difference was calculated under a fixed-effect model weighted by the inverse variance method (Cochrane Collaboration, 2000; Sutton et al, 2000). For binary outcomes the relative risks were calculated using a Mantel–Haenszel fixed-effect model (Cochrane Collaboration, 2000; Sutton et al, 2000) and 95% confidence intervals were calculated. Standardised mean difference (Geddes et al, 2002) rather than weighted mean difference was used in the pooled analysis to take account of the different versions of HRSD and BDI used in the different studies.
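The inverse-variance fixed-effect pooling and the homogeneity statistic described above can be illustrated with a minimal sketch. The function name `pool_fixed_effect` and the input values are hypothetical and are not taken from the included trials; the sketch assumes each study supplies an effect size (for example an SMD) and its standard error, and it computes Cochran's Q, the statistic usually referred to a χ² distribution with k–1 degrees of freedom in a homogeneity test of this kind.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    effects    -- per-study effect sizes (e.g. standardised mean differences)
    std_errors -- matching per-study standard errors
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return {"pooled": pooled, "se": pooled_se, "ci": ci, "q": q, "df": df}


# Hypothetical SMDs and standard errors, for illustration only.
smds = [-0.5, -0.2, -0.4]
ses = [0.4, 0.3, 0.5]
result = pool_fixed_effect(smds, ses)
print(result)
```

A Q value that is small relative to its degrees of freedom gives no evidence of statistical heterogeneity, which is the situation in which a fixed-effect model is normally considered acceptable.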
We identified 85 references, from which we excluded 48 (see Appendix 1 and Fig. 1): seventeen trials with no control group (Höflich et al, 1993; George et al, 1995, 1998; Geller et al, 1997; Epstein et al, 1998; Feinsod et al, 1998; Figiel et al, 1998; García et al, 1998; Menkes et al, 1999; Nahas et al, 1999; Pridmore, 1999; Pridmore et al, 1999; Reid & Pridmore, 1999; Schouten et al, 1999; Triggs et al, 1999; Conca et al, 2000; Zheng, 2000); nine review articles (Markwort et al, 1997; George et al, 1999a, b; Pridmore & Belmaker, 1999; Tormos et al, 1999; Hansen, 2000; Krystal et al, 2000; Szuba et al, 2000; Walter et al, 2001); seven trial reports whose efficacy outcomes had been published elsewhere (Koppi et al, 1997; Teneback et al, 1999; Kozel et al, 2000; Little et al, 2000; Speer et al, 2000; Loo et al, 2001; Moser et al, 2002); nine studies on healthy volunteers (George et al, 1996; Pascual-Leone et al, 1996a; Bohning et al, 1999; Clark et al, 2000; D'Alfonso et al, 2000; Mosimann, 2000; Loo et al, 2000; Habel et al, 2001); one with no report of any randomisation process (Stikhina et al, 1999); one descriptive study (Turnier-Shea et al, 1999); two with outcomes other than depression (Grisaru et al, 1998; Nahas et al, 2000); and two in which rTMS was given after a previous intervention of sleep deprivation (Eichhammer et al, 2002; Padberg et al, 2002). We also excluded a further sixteen studies that were either still in progress without completed quantitative data available or for which we are awaiting data (Shajahan, 2000; Woodruff, 2000; the Avery–George–Holtzheimer database of rTMS Depression Studies, at http://www.ists.unibe.ch/ists/TMSAvery.htm).
We excluded four RCTs and one controlled clinical trial (CCT) from the identified trials because there was either no sham comparison group (Conca et al, 1996; Grunhaus et al, 2000; Pridmore, 2000; Pridmore et al, 2000) or because, although a sham group was included alongside two randomly allocated active treatment groups, the sham group itself had not been generated by a randomisation process (Kolbinger et al, 1995). A detailed analysis of these five studies was included as part of a wider systematic review (Martin et al, 2002).
Among the remaining sixteen randomised controlled studies there was clinical heterogeneity with respect to four variables: localisation of rTMS application (left dorsolateral prefrontal cortex, right prefrontal cortex, vertex or multiple sites); frequency of rTMS (high or low); duration of treatment (10 consecutive working days, i.e. 2 weeks, or 5 consecutive working days, i.e. 1 week); and number of interventions a day (one or more) (see Fig. 1). Two studies (Pascual-Leone et al, 1996b; Speer et al, 2001) are awaiting evaluation of design data and methodological quality to be included, and additional quantitative information is needed for these studies to be analysed.
A total of thirteen published studies (George et al, 1997, 2000; Avery et al, 1999; Kimbrell et al, 1999; Klein et al, 1999; Loo et al, 1999; Padberg et al, 1999; Berman et al, 2000; Eschweiler et al, 2000; García-Toro et al, 2001a, b; Manes et al, 2001; Szuba et al, 2001) and one study in preparation (further details available from the authors upon request) met the inclusion criteria for assessing the effectiveness of rTMS v. a sham intervention (see Appendix 2). The majority of these studies (13/14) compared left-sided, high-frequency rTMS (left–high) with a group receiving sham, whereas one study (Klein et al, 1999) compared right-sided, low-frequency rTMS for 2 weeks (right–low–2). Treatment duration was for 2 weeks in nine of the left–high studies (left–high–2) and for 1 week (left–high–1) in the remaining three studies. Among the 12 left–high studies, all used the HRSD as a primary indicator of efficacy, whereas nine (seven with available data) also used the BDI as a secondary outcome. A quantitative analysis of pooled data from the left–high–1 and left–high–2 studies on each of these outcome scales was possible. Two studies (Kimbrell et al, 1999; Padberg et al, 1999) included a third left-sided, low-frequency (left–low) comparison arm. Because of differences in the nature of the intervention applied with respect to localisation and frequency, these two comparisons (left–low–1 and left–low–2) were not included in the quantitative analysis, but were included in the qualitative review along with the one right–low–2 study and with one study that compared different doses of rTMS per day (Szuba et al, 2001).
The mean age of study participants ranged from 41.8 to 60.87 years and the ratio of males to females ranged from 0.09 to 2.33 ( Table 1). Thirteen of fourteen studies included in the qualitative review recruited patients who only fulfilled the criteria for major depression or major depressive illness as classified by DSM–IV criteria. Only one study recruited patients with criteria that included minor depression ( Manes et al, 2001). Some studies recruited only patients with unipolar depression whereas others also recruited patients with bipolar depression in the depressed phase. Almost all studies specified that patients who were at a high risk of suicide and/or possible risk of convulsions were excluded.
Quality of included studies
Most of the studies were of low methodological quality. Apart from one study with 70 patients ( Klein et al, 1999), the rest used sample sizes of 6–40 patients (median=19).
Most studies gave only general descriptions of the randomisation process and none described the methods of concealing allocation. One study ( Klein et al, 1999) described only the generation of the allocation sequence through randomised lists of numbers generated by a computer program and another (further details available from the authors upon request) reported a ‘generation of randomised numbers’ without giving details of the allocation process involved.
Although there were withdrawals from six of the included thirteen studies, only two studies ( Berman et al, 2000; Eschweiler et al, 2000) undertook an intention-to-treat analysis by including the last observation carried forward in the analysis. Three studies ( Avery et al, 1999; García-Toro et al, 2001a, b) included a period of post-treatment follow-up of 2 weeks, one study included a period of post-treatment follow-up of 1 week ( Manes et al, 2001) and another study ( Eschweiler et al, 2000) included a period of post-treatment follow-up of 1 week between the first and second phase of the cross-over design. One study used the post-treatment follow-up for only one patient who responded totally and for three who responded partially, but it did not report on the rest of the patients treated in the study ( Berman et al, 2000).
Although most of the studies stated that they were double blind or double masked, they were, more accurately, single blind with evaluation by external assessors. Nine (seven with available data) studies also used the BDI, in which the patients themselves evaluated their response to treatment.
Only three of the studies (Berman et al, 2000; Manes et al, 2001; Szuba et al, 2001) stated that the patients were all free of psychotropic medication for 1 week before the study and during the study period itself. In seven of fourteen studies the patients were described as medication resistant (failed at least one trial of pharmacological therapy during the current depressive episode) but in some cases pharmacological treatments were continued and in some cases not (Table 2).
Although most studies stated that they excluded patients at a high risk of suicide, some studies recruited only out-patients (George et al, 1997, 2000; Avery et al, 1999; Manes et al, 2001; further details available from the authors upon request), others recruited only in-patients ( Klein et al, 1999; Loo et al, 1999), some both in-patients and out-patients ( Berman et al, 2000; García-Toro et al, 2001b) and others did not specify ( Kimbrell et al, 1999; Padberg et al, 1999; Eschweiler et al, 2000; García-Toro et al, 2001a; Szuba et al, 2001).
Repetitive TMS (left dorsolateral prefrontal cortex and high frequency) v. sham TMS
Hamilton Rating Scale for Depression. Twelve studies contributed to this analysis, giving an overall sample of 217 patients (119 in the treatment group and 98 in the placebo group). A subgroup analysis was conducted by duration of treatment (1 or 2 weeks) and for those studies that included follow-up data (at 1 or 2 weeks). After 2 weeks of treatment the standardised mean difference (SMD) for rTMS (left dorsolateral prefrontal cortex, high frequency) v. sham TMS was –0.35 (95% CI –0.66 to –0.04; P=0.03; n=9), showing a difference in favour of rTMS. For those studies that reported data after 1 week of treatment or only gave treatment for 1 week, the SMD for rTMS (left dorsolateral prefrontal cortex, high frequency) v. sham rTMS was not significant, at –0.18 (95% CI –0.64 to 0.27; P=0.4; n=5). After 1 week of post-treatment follow-up, the SMD was 0.08 (95% CI –0.64 to 0.81; P=0.8; n=2). After 2 weeks of post-treatment follow-up, the SMD was not statistically significant: –0.33 (95% CI –0.84 to 0.17; P=0.2; n=3) (Fig. 2).
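As a reading aid, the relationship between a reported SMD, its 95% confidence interval and its P value can be checked with a short sketch. This is not the authors' software: the helper `z_and_p_from_ci` is a hypothetical name, and the sketch assumes a symmetric normal-approximation interval (estimate ± 1.96 standard errors).

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error, z statistic and two-sided P value
    from a point estimate and its symmetric 95% confidence interval."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# HRSD after 2 weeks of treatment: SMD -0.35, 95% CI -0.66 to -0.04.
se_treat, z_treat, p_treat = z_and_p_from_ci(-0.35, -0.66, -0.04)
# HRSD at 2 weeks of follow-up: SMD -0.33, 95% CI -0.84 to 0.17.
se_fu, z_fu, p_fu = z_and_p_from_ci(-0.33, -0.84, 0.17)
print(round(p_treat, 2), round(p_fu, 1))
```

Under this approximation the first interval corresponds to P of about 0.03 and the second to P of about 0.2, in line with the values reported in the text.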
Beck Depression Inventory. Seven studies contributed to this analysis, giving an overall sample size of 145 patients (81 in the treatment group and 64 in the placebo group). No difference between rTMS and sham TMS was shown for any of the time periods. After 1 week of treatment, the SMD for the rTMS over the left dorsolateral prefrontal cortex and high-frequency v. sham TMS was 0.18 (95% CI –0.47 to 0.82; P=0.6; n=3). The SMD after 2 weeks of treatment was –0.24 (95% CI –0.58 to 0.11; P=0.18; n=6). The SMD after 2 weeks of post-treatment follow-up was –0.06 (95% CI –0.56 to 0.43; P=0.8; n=3) ( Fig. 3).
The analyses were repeated using a random-effects model, but this did not alter the results.
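The text does not say which random-effects estimator was used. The sketch below assumes the DerSimonian–Laird method, a common choice, purely for illustration; the function name and any inputs are hypothetical. It shows why a random-effects analysis leaves the result unchanged when there is little heterogeneity: the between-study variance estimate is truncated at zero and the weights reduce to the fixed-effect weights.

```python
import math


def pool_random_effects(effects, std_errors):
    """DerSimonian-Laird random-effects pooling (illustrative assumption).

    Returns (pooled estimate, standard error, tau squared)."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    return pooled, math.sqrt(1.0 / sw_star), tau2


# Hypothetical, fairly homogeneous SMDs: tau squared is zero here, so the
# estimate coincides with the fixed-effect one.
pooled_re, se_re, tau2_re = pool_random_effects([-0.5, -0.2, -0.4], [0.4, 0.3, 0.5])
print(pooled_re, se_re, tau2_re)
```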
Acceptability of treatment. Four studies (left–high–2) reported withdrawals of patients during the intervention period, with a total sample size of 114 patients (63 in the treatment group and 51 in the placebo group). The relative risk, using a fixed-effect model for rTMS v. sham rTMS for all patients was 0.88 (95% CI 0.37 to 2.13; P=0.8), which is a statistically non-significant difference.
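A Mantel–Haenszel fixed-effect relative risk of the kind reported above can be sketched as follows. The withdrawal counts per study are not given in the text, so the 2×2 data below are hypothetical; the function name is also an assumption, and the confidence interval uses the Greenland–Robins variance estimator for the log relative risk, which may differ from the software the authors used.

```python
import math


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled relative risk with a 95% confidence interval.

    tables -- list of (events_trt, n_trt, events_ctl, n_ctl), one per study.
    Assumes at least one event in each arm across studies; otherwise the
    ratio is undefined.
    """
    r = s = p = 0.0
    for a, n1, c, n2 in tables:
        n = n1 + n2
        r += a * n2 / n          # numerator terms of the pooled RR
        s += c * n1 / n          # denominator terms of the pooled RR
        p += (n1 * n2 * (a + c) - a * c * n) / n ** 2
    rr = r / s
    se_log = math.sqrt(p / (r * s))  # Greenland-Robins standard error
    lower = math.exp(math.log(rr) - 1.96 * se_log)
    upper = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, (lower, upper)


# Hypothetical withdrawals (events) in active and sham arms of four trials.
example = [(2, 15, 3, 12), (1, 20, 1, 15), (3, 18, 2, 14), (1, 10, 2, 10)]
rr, ci = mantel_haenszel_rr(example)
print(rr, ci)
```

A confidence interval that spans 1, as in the reported result, indicates no statistically significant difference in withdrawals between the groups.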
We found that there is currently insufficient evidence to suggest that rTMS is effective in the treatment of depression and that the trials conducted to date have been of relatively low quality. These results do not, however, exclude the possibility that the intervention may be of benefit.
Although the SMD between active treatment and sham groups was significant in favour of the active group when measured by the HRSD immediately after 2 weeks of treatment with left-sided, high-frequency rTMS, this difference was not corroborated by a significant difference in the BDI. Furthermore, analysis of the results of those studies that tested patients at 2 weeks after the intervention period showed that any differences between the two groups had disappeared. Equally, analysis of data from trials that provided results after 1 week of treatment showed no significant effect.
The included studies all had serious methodological weaknesses. Of particular note was the small sample size (median=19), a factor that is known to introduce bias because uncontrolled variables that may influence outcomes may not be sufficiently evenly distributed between treatment and control groups ( Colton, 1974). In all except three of the studies, all or a proportion of the patients (in both treatment and control groups) enrolled in the trials were on some form of psychotropic medication. Although, in some cases, the authors stated that patients were ‘medication resistant’ the definition of resistant was unclear and the potential for concurrent medication to interfere with the possible performance of the rTMS intervention cannot be discounted.
None of the included studies provided information in the published report on the method of allocation concealment used. One of the major sources of selection bias in randomised trials is failure to conceal adequately the group to which a particular patient has been assigned until after that patient's eligibility for the trial has been assessed ( Clarke & Oxman, 2001). Indeed, lack of allocation concealment has been reported to cause more bias than other components of the allocation process ( Schulz et al, 1995). For example, in certain RCTs the patients who are most likely to respond are included only in the active treatment arm ( Berger & Exner, 1999).
The person who gives an intervention such as rTMS obviously cannot be blinded as to whether they are actually administering the active treatment or a sham intervention. It is therefore more accurate to consider the trials as having been single blind with an evaluation by external blinded assessors ( Martin & Casado Collado, 2002). Although, as Day ( 2000) has observed, blinding of outcome assessment may be more important than blinding administration, there is nevertheless potential for patients to guess their group allocation through non-verbal (albeit unintentional) communication with the administrator of the intervention. A further threat to the efficacy of the blind arises from the nature of the sham intervention. As recent authors have commented ( Wassermann & Lisanby, 2001), depending on the way in which the sham is delivered, the physical sensation experienced can differ when receiving sham and active treatment, effectively unblinding the patient.
Measurement of treatment outcomes in depression is difficult and most clinical studies have made use of scales or inventories, of which the most common is the HRSD, based on a semi-structured interview. Some authors ( Hotopf et al, 1999) have reported that rating scales based on semi-structured interviews are more susceptible to observation bias than are self-applied questionnaires such as the BDI. The lack of consistency in effect as determined by the two scales–a positive result after 2 weeks of treatment as measured by the HRSD and a negative result for the BDI–makes definitive conclusions about the nature of the change in mood of the patients impossible. Because of difficulties with interpreting results from psychometric scales ( Rosenberg, 2000) and the subjective or unstable character of this psychopathology, the use of other more objective outcome measures such as readmissions to hospital, time to hospital discharge, time to adjunctive treatment and time off work should be taken into account in the assessment of rTMS in the treatment of depression.
The complexity of the possible combinations for administering rTMS makes the comparison of like with like particularly difficult. For our meta-analysis we categorised the three main variations in administration method: localisation of the intervention on the skull; frequency given; and the duration of the treatment period. In the majority of included studies, rTMS was applied to the left dorsolateral prefrontal cortex, but it has been pointed out recently that the method for precisely targeting the stimulation in this area is inherently unreliable (Wassermann & Lisanby, 2001). Evidence that this is the optimal localisation is also lacking. With respect to the frequency given, we classified into high (>1 Hz) and low (≤1 Hz) frequency, according to customary practice. Although localisation, frequency and treatment duration were the main variations, other potential differences in the administration of rTMS that we did not categorise include shape of the coil, number of trains per session and the duration of each train.
Data analysis considerations
In eight of the twelve studies included in the meta-analysis for the HRSD and in six of the seven included in the meta-analysis for the BDI, the baseline mean values for the severity of depression were higher in the treatment group than in the placebo group. Although these differences were not statistically significant at the level of each individual study, they would have introduced a potential bias within the meta-analysis of pooled data by accentuating the tendency for regression to the mean of the more extreme values ( Davis, 1976). Our study was limited because individual patient data were not available from all the studies and an appropriate adjustment according to baseline severity was not possible. In order to reduce, as much as possible, any potential bias caused by these differences in baseline values, we compared final values on depression severity between active and sham groups.
Before our study, a meta-analysis ( McNamara et al, 2001) that included five studies found demonstrable beneficial effects of rTMS in depression. Our findings differ from this earlier paper with respect to the main unit of analysis: this earlier study used the difference in an undefined rate of improvement between groups from psychometric scales used in the trials. In our meta-analysis we used the means and standard deviations because we considered that, owing to probable baseline imbalance between the studies, these estimates reflect a more precise effect size than a dichotomous measure such as the rate of improvement apparently derived from the continuous data of the rating scales.
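To make the unit of analysis concrete, the sketch below computes a standardised mean difference from the final-value means, standard deviations and sample sizes of two groups. It assumes the small-sample-corrected form often called Hedges' g and a common large-sample approximation for its standard error; the text does not state which SMD variant was used, and the function name and inputs are hypothetical.

```python
import math


def hedges_g(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    """Standardised mean difference (Hedges' g) and its approximate
    standard error from group means, SDs and sample sizes."""
    df = n_trt + n_ctl - 2
    pooled_sd = math.sqrt(((n_trt - 1) * sd_trt ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (mean_trt - mean_ctl) / pooled_sd
    # Small-sample correction factor.
    j = 1.0 - 3.0 / (4.0 * (n_trt + n_ctl) - 9.0)
    g = j * d
    # Large-sample approximation to the variance of d, scaled by j squared.
    var_d = (n_trt + n_ctl) / (n_trt * n_ctl) + d ** 2 / (2.0 * (n_trt + n_ctl))
    return g, j * math.sqrt(var_d)


# Hypothetical final depression scores: lower scores in the active group
# give a negative SMD, i.e. a difference in favour of the active treatment.
g, se_g = hedges_g(15.0, 5.0, 10, 20.0, 5.0, 10)
print(g, se_g)
```

Because the difference is divided by the pooled standard deviation, studies that used different versions of a rating scale can be combined on a common scale.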
Consequences of the weak findings about rTMS
Repetitive TMS is a relatively affordable method of applying magnetic fields non-invasively to the human brain. If safety precautions are followed, it also appears to be safe, at least when given within the parameters studied so far: between 1 and 4 weeks of treatment. The non-invasive nature of the intervention has been among the factors that have led to the impetus to research possible therapeutic effects in the treatment of depression and especially in refractory depression, because few other (and certainly no non-invasive) treatment options are currently available. Our systematic review shows that the results to date are not very encouraging.
But this should not be a reason to abandon rTMS in affective disorders altogether. Many of the clinical treatments now used successfully in psychiatry have developed slowly, going through a process of initial enthusiastic approval followed by almost total demise and then back to sensible, widespread clinical use. Electroconvulsive therapy–another method of brain stimulation in affective disorders–underwent this very process. Evidence shows that rTMS has effects on the brain and it therefore has great potential as a research tool (Hallett, 2000; Lisanby et al, 2000). Data from animal studies demonstrate effects on expression of immediate early genes (Ji et al, 1998) and on neuroendocrinology (Keck et al, 2000), and rTMS either alone or combined with procedures such as functional neuroimaging (Speer et al, 2000) may be useful for testing functional connectivity, neuroplasticity and information processing. Repetitive TMS therefore can be used to test either general hypotheses concerning brain function at different levels or hypotheses concerning the underlying pathology of affective and other neuropsychiatric disorders. It is worthy of note that one of the only two studies included in the meta-analysis in which all the patients were free of medication before and during the rTMS trial was also the only individual study that showed a statistically significant positive effect on the HRSD for the intervention group.
Today, the total number of patients included in studies of the efficacy of rTMS in the treatment of depression falls far short of the numbers registered in trials for new drug treatments. In addition, many technical details, such as where to stimulate, at what frequency, the total number of stimuli and the duration of the treatment, have yet to be resolved. There is an urgent need for thorough, randomised, controlled, multi-centre studies involving large numbers of patients. Another problem is the lack of consensus about the possible explanatory mechanisms for any anti-depressant effects of TMS, but this is also the case for many other treatments in psychiatry. Repetitive TMS research is basically empirical: many variables play a role and a large number of parameters has to be explored carefully to find the most efficacious treatment.
Repetitive TMS clearly has effects on the brain, an observation that is remarkable in itself and it may well be that it is a treatment modality in search of a suitable application in psychiatry. It is of utmost importance, therefore, that the long and difficult path of research for potential clinical applications of rTMS in affective disorders should continue.
Clinical Implications
- The findings from this systematic review and meta-analysis provide insufficient evidence to suggest that repetitive transcranial magnetic stimulation (rTMS) is effective in the treatment of depression.
- There are several confounding factors in the included studies that should be kept in mind before considering rTMS for clinical use.
- The rTMS technique needs more high-quality trials to show its effectiveness for therapeutic use.
Limitations
- Lack of pragmatic variables in the studies, such as time to further treatment, time off work, readmission to hospital or hospital discharge.
- Individual patient data from all the studies were not available and an appropriate adjustment according to baseline severity therefore was not possible.
- Poor follow-up evaluation in the studies.
The reviewers are grateful for the advice and support of: the Instituto de Salud Carlos III, Madrid, Spain (grant number 00/10099); Xavier Bonfill and Montse Sacristán from the Iberoamerican Cochrane Centre and the Service of Clinical Epidemiology and Public Health, Hospital de la Santa Creu i Sant Pau (Barcelona, Spain); Natalie Khin and Hugh McGuire from the Cochrane Collaboration Depression Neurosis and Anxiety Review Group; Alfonso Casado from 3D Health Research (Barcelona, Spain); Julio Sánchez Meca from the Universidad de Murcia (Spain); and all the trialists who shared their data.
- Received August 2, 2002.
- Revision received November 11, 2002.
- Accepted December 3, 2002.
- © 2003 Royal College of Psychiatrists