Background Persistent impairments in neurocognitive function have been described in patients with bipolar disorder whose disease is in remission. However, methodological issues such as the effect of residual mood symptoms and hypercortisolaemia may confound such studies.

Aims To assess neurocognitive functioning in prospectively verified euthymic patients with bipolar disorder.

Method Sixty-three patients with bipolar disorder and a matched control group completed a comprehensive neurocognitive test battery. Euthymia was confirmed in the patient group by prospective clinical ratings over 1 month prior to testing. Saliva samples were collected to profile basal cortisol secretion.

Results Patients were significantly impaired across a broad range of cognitive domains. Across the domains tested, clinically significant impairment was observed in 3% to 42% of patients. Deficits were not causally associated with residual mood symptoms or hypercortisolaemia.

Conclusions Neurocognitive impairment persists in patients whose bipolar disorder is in remission. This may represent a trait abnormality and be a marker of underlying neurobiological dysfunction.

Converging evidence suggests that people with bipolar disorder exhibit persistent cognitive impairment across a range of tasks of attention, memory and executive function during remission (van Gorp et al, 1998; Ferrier et al, 1999; Rubinsztein et al, 2000; Clark et al, 2002; Tavares et al, 2003; Martinez-Aran et al, 2004). However, small sample sizes, the effects of residual mood symptoms and different rates of biological abnormalities such as hypercortisolaemia may confound such studies. Importantly, few studies have examined the magnitude of impairment in a meaningful way, i.e. as effect sizes or in terms of the proportion of patients with ‘clinically significant’ impairment. This study sought to address previous limitations by testing a large sample of well-characterised, prospectively verified euthymic patients with bipolar disorder on a comprehensive neuropsychological test battery. We predicted that patients would demonstrate clear neurocognitive impairment compared with healthy controls.



Sixty-three people with a DSM–IV diagnosis of bipolar affective disorder (American Psychiatric Association, 1994) were recruited from out-patient clinics in secondary and tertiary care in the north-east of England; 54 had bipolar type I disorder, 9 had type II and 5 were rapid-cycling. Diagnoses were confirmed using the Structured Clinical Interview for DSM–IV (SCID; First et al, 1997). Illness characteristics were derived from retrospective life charts constructed from patient interview and hospital medical records (Leverich & Post, 1996). Patients were excluded if they were taking corticosteroids or antihypertensive medication, had any other current Axis I diagnosis or had a neurological or medical condition. A history of substance or alcohol misuse in the past 6 months (6 patients met DSM–IV criteria for a previous history of alcohol dependence, 4 met criteria for a previous history of substance dependence and 1 patient met criteria for previous substance and alcohol dependence) or electro-convulsive therapy (ECT) in the past year also led to exclusion.

Euthymia was prospectively defined as scores of 7 or below on both the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1960) and the Young Mania Rating Scale (YMRS; Young et al, 1978) at initial assessment and after 1 month. Patients also completed the Beck Depression Inventory (BDI; Beck et al, 1961) and the Altman Mania Rating Scale (AMRS; Altman et al, 1997) each week during the euthymia verification month. Saliva samples collected at 08.00 h, 12.00 h, 16.00 h and 20.00 h on the day before testing confirmed that patients were eucortisolaemic (i.e. exhibited normal rhythm and secretion of cortisol), as measured by directed disequilibrium radioimmunoassay. With the exception of 3 patients who were taking no medication, all patients were stabilised on prophylactic medication at test; 40 were receiving combination treatment. Demographic and clinical characteristics of the sample are presented in Table 1.
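The euthymia criterion described above can be expressed as a simple check. This is a minimal sketch, not the authors' procedure: the `MoodRating` container and the `is_euthymic` function are hypothetical names introduced for illustration, and only the cutoff of 7 on both clinician-rated scales at both assessments is taken from the text.

```python
from dataclasses import dataclass

# Cutoff from the text: scores of 7 or below on both scales.
EUTHYMIA_CUTOFF = 7


@dataclass
class MoodRating:
    """Clinician ratings at one assessment (hypothetical container)."""
    hrsd: int  # Hamilton Rating Scale for Depression total score
    ymrs: int  # Young Mania Rating Scale total score


def is_euthymic(initial: MoodRating, one_month: MoodRating) -> bool:
    """Return True only if both scales are at or below the cutoff
    at the initial assessment and again after 1 month."""
    return all(
        rating.hrsd <= EUTHYMIA_CUTOFF and rating.ymrs <= EUTHYMIA_CUTOFF
        for rating in (initial, one_month)
    )
```

For example, `is_euthymic(MoodRating(3, 2), MoodRating(8, 1))` is false, because the second HRSD score exceeds the cutoff.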

Table 1. Demographic and clinical characteristics

For the control group, 63 healthy volunteers were recruited from the community by local advertisement. Controls were matched on an individual basis with patients for age (±5 years), gender, race, handedness (Briggs & Nebes, 1975), years of education (±3 years), and premorbid IQ (±5 IQ points; Nelson, 1982). Controls were screened for significant medical conditions and were excluded if they had a current or past psychiatric illness (confirmed by SCID) or a family history of affective disorders in a first-degree relative, or were taking any medication other than the oral contraceptive pill. Control participants completed the same clinical ratings as patients on the study day, 1 week after completing a pre-screen AMRS and BDI.
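The individual matching rule can be sketched as a pairwise check. The tolerances (age ±5 years, education ±3 years, premorbid IQ ±5 points) come from the text; the `Participant` container, the field names, the treatment of handedness as a category and the assumption that the tolerances are inclusive are all illustrative choices, not details reported by the authors.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Matching variables for one person (hypothetical container)."""
    age: float
    gender: str
    race: str
    handedness: str        # assumed categorical, e.g. "right" / "left"
    education_years: float
    premorbid_iq: float


def is_matched(patient: Participant, control: Participant) -> bool:
    """Check the matching tolerances stated in the text.
    Tolerances are assumed to be inclusive."""
    return (
        abs(patient.age - control.age) <= 5
        and patient.gender == control.gender
        and patient.race == control.race
        and patient.handedness == control.handedness
        and abs(patient.education_years - control.education_years) <= 3
        and abs(patient.premorbid_iq - control.premorbid_iq) <= 5
    )
```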

For all participants, historic and current substance use was assessed using DSM–IV criteria and a detailed inventory was derived from the major DSM–IV substance classifications. To exclude people with current alcohol misuse, participants had to have a current alcohol intake of less than 28 units per week for men and 21 units per week for women. The Modified Mini-Mental State Examination (Teng & Chui, 1987) was administered on the study day to screen for dementia. The local ethics committee approved the investigation.

Neuropsychological measures

Participants completed a comprehensive battery of neurocognitive tests spanning four broad cognitive domains. To control for the possible effects of diurnal variation on performance, cognitive testing commenced at 14.00 h. Tests were administered according to standard instructions and took about 2 h to complete. The tasks were given in the same order to the whole sample. The instruments administered for each domain were as follows:

  1. Psychomotor performance: the Vigil test (response latency) (Cegalis & Bowlin, 1991), Digit Symbol Substitution Test from the Wechsler Adult Intelligence Scale – Revised (WAIS–R; Wechsler, 1981) and the Trail Making Test part A (Reitan, 1958).

  2. Attention and executive function: Trail Making Test part B, Vigil (errors of omission and commission), Stroop Neuropsychological Screening Test (Trenerry et al, 1989), the Tower of London task from the Cambridge Neuropsychological Test Automated Battery (CANTAB, Cambridge Cognition Ltd, Cambridge, UK), the Controlled Oral Word Association Test (Benton & Hamsher, 1976), the Digits Backward sub-test from the WAIS–R, a computerised version of the Abstract Designs Self-Ordered Pointing Task (SOPT; Petrides & Milner, 1982) and the CANTAB Spatial Working Memory test.

  3. Immediate memory: the Digits Forward sub-test from the WAIS–R and the CANTAB Spatial Span.

  4. Declarative memory (visual and verbal): the CANTAB Pattern Recognition Memory and Spatial Recognition Memory tasks, CANTAB Simultaneous and Delayed Matching to Sample, CANTAB Paired Associates Learning test and the Rey Auditory Verbal Learning Test (RAVLT; Rey, 1964).

The SOPT and CANTAB tasks were presented on a 486 microcomputer fitted with a high-resolution 38 cm touch-screen monitor. Detailed descriptions of the CANTAB tasks are provided in Robbins et al (1997) and further details regarding the pen-and-paper measures in Lezak (1995).

Statistical analyses

Analyses were conducted using the Statistical Package for the Social Sciences, version 9 (SPSS, 1998). Data were first examined to see whether they fulfilled the assumptions for parametric analyses. Variables fulfilling these assumptions were analysed by independent samples t-test or analysis of variance (ANOVA), with group (patient or control) as the between-subject factor. For tests with more than one level and the cortisol data, an additional within-subject factor of ‘time’ or ‘problem level’ was added. Where sphericity was violated, within-subject degrees of freedom were adjusted using Greenhouse–Geisser or Huynh–Feldt corrections as appropriate. Adjusted P values are reported, although the original degrees of freedom are also reported for clarity. Data not fulfilling the assumptions of parametric analyses were either subjected to an appropriate transformation or analysed non-parametrically (Howell, 1997).
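As an illustration of the sphericity correction mentioned above, the following sketch estimates the Greenhouse–Geisser epsilon from a participants-by-levels score matrix and scales the degrees of freedom by it. It is a simplified single-group version written in plain Python; for a mixed design such as the one used here, the pooled within-group covariance matrix would normally be used instead, and the analyses in the paper were run in SPSS. The function names are hypothetical.

```python
def greenhouse_geisser_epsilon(data):
    """Estimate the Greenhouse-Geisser epsilon.

    data: list of rows, one per participant, one column per
    within-subject level (at least 2 levels, at least 2 participants).
    Raises ZeroDivisionError if the scores have no usable variance.
    """
    n = len(data)
    k = len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Sample covariance matrix of the k repeated measures.
    cov = [
        [
            sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in data)
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]

    # Double-centre the covariance matrix (it is symmetric, so row
    # means and column means coincide).
    row_means = [sum(cov[i]) / k for i in range(k)]
    grand_mean = sum(row_means) / k
    centred = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]

    trace = sum(centred[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centred for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def adjusted_df(epsilon, df_effect, df_error):
    """Scale both degrees of freedom by epsilon before looking up P."""
    return epsilon * df_effect, epsilon * df_error
```

Epsilon lies between 1/(k−1) and 1, where 1 indicates no violation of sphericity. The adjusted degrees of freedom are used only to obtain the P value, which is why the original degrees of freedom can still be reported alongside it.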

To calculate clinically significant performance impairments, the proportion of patients scoring at or below the fifth percentile of the control sample was determined (i.e. −1.64 standard deviations from the control mean). Estimates of effect size were calculated for untransformed data using the formula $(\mu_{\text{patients}} - \mu_{\text{controls}})/\sigma_{\text{pooled}}$ (Howell, 1999); the numerator of this equation was reversed for tasks where a high score indicates poorer performance (i.e. $\mu_{\text{controls}} - \mu_{\text{patients}}$) to standardise the scoring schemes across tasks. All reported P values are two-tailed. To examine the impact of illness severity on neurocognitive performance, correlations between illness characteristics and neurocognitive test variables were calculated using Spearman’s method.
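The two magnitude measures described above can be sketched as follows. The text does not state how the pooled standard deviation was computed, so the usual sample-size-weighted pooled variance is assumed here. The function names and the `higher_is_worse` flag are hypothetical. With this sign convention a negative effect size indicates poorer performance by patients on every task.

```python
import math
from statistics import mean, stdev


def cohens_d(patients, controls, higher_is_worse=False):
    """Effect size (patients - controls) / pooled SD, with the numerator
    reversed for tasks where a high score means poorer performance.
    The weighted pooled variance is an assumption, not from the text."""
    n_p, n_c = len(patients), len(controls)
    pooled_var = (
        (n_p - 1) * stdev(patients) ** 2 + (n_c - 1) * stdev(controls) ** 2
    ) / (n_p + n_c - 2)
    diff = mean(patients) - mean(controls)
    if higher_is_worse:
        diff = -diff
    return diff / math.sqrt(pooled_var)


def proportion_impaired(patients, controls, z_cutoff=-1.64, higher_is_worse=False):
    """Proportion of patients at or below the control fifth percentile,
    approximated as z <= -1.64 relative to the control mean and SD."""
    control_mean, control_sd = mean(controls), stdev(controls)
    impaired = 0
    for score in patients:
        z = (score - control_mean) / control_sd
        if higher_is_worse:
            z = -z  # flip so that lower z always means worse performance
        if z <= z_cutoff:
            impaired += 1
    return impaired / len(patients)
```

Defining the cutoff from the control distribution means that about 5% of a normally distributed control sample would be expected to fall in the impaired range.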


Demographic and mood data

There was no significant between-group difference across the demographic variables. On the clinical rating scales patients exhibited few symptoms during the euthymia verification period, although their scores were still significantly higher than the controls on most of the measures completed by both groups (Table 1).

Basal salivary cortisol measures

Basal salivary cortisol samples were collected from 54 people in the control group and 56 patients. Comparison between patients and controls illustrated the expected main effect of time (F(3,324)=99.18, P<0.0001) but no main effect of group (F(1,108)=1.13, P=0.29) or group×time interaction (F(3,324)=0.54, P=0.56). Overall cortisol output did not differ between the groups (P>0.2).

Cognitive measures

Group mean performance and statistical comparisons for all cognitive measures are summarised in Table 2. In tests comprising a delay or difficulty level variable, only main effects and interactions involving group variables are reported below: main effects of delay or difficulty level were significant in all cases (excluding Vigil latency and omissions) but are not presented here. In Table 3, outcome measures from each test are sorted by effect size, with Cohen’s conventions used to indicate small, medium and large effects (Cohen, 1988). The proportion of patients scoring at or below the fifth percentile of the control group is also presented.
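Cohen's conventions place the thresholds for small, medium and large effects at 0.2, 0.5 and 0.8. A minimal sketch of how outcome measures might be labelled and sorted by effect size is given below. The function names and the "negligible" label for values below 0.2 are illustrative choices, and the example values in the usage note are invented, not taken from Table 3.

```python
def cohen_label(d):
    """Label an effect size by its magnitude using Cohen's conventions."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label chosen for illustration


def sort_by_effect_size(effects):
    """Sort (measure, d) pairs from largest to smallest magnitude."""
    return sorted(effects, key=lambda item: abs(item[1]), reverse=True)
```

For instance, `sort_by_effect_size([("task A", -0.3), ("task B", -0.9)])` places task B first, and `cohen_label(-0.9)` returns `"large"`.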

Table 2. Cognitive performance comparisons (untransformed means are reported for clarity)

Table 3. Cognitive performance: effect sizes and percentage of patients scoring below the fifth percentile of the control group

Psychomotor performance

Patients’ response times were significantly slower than the control group on the Vigil task. There was no group×time interaction (F(3,369)=0.438, P=0.67), suggesting that patients were impaired throughout the task. Patients were also significantly slower than controls to complete part A of the Trail Making Test and produced significantly fewer correct responses on the Digit Symbol Substitution Test.

Attention and executive function

In contrast to the performance deficit on part A of the Trail Making Test, patients and controls did not significantly differ on part B of this task. On the Vigil task, patients made significantly more errors of omission than controls; however, commission errors did not differ between groups. Analysis of omission errors across time revealed that patients were impaired throughout this task, indicated by the absence of a significant group×time interaction (F(3,369)=0.833, P=0.471). Patients’ performance was also significantly poorer than that of the control group on the Stroop task and their response accuracy on the Tower of London task was impaired. On the latter task’s latency measures, patients’ motor initiation and motor execution times were significantly greater than those of the controls, as were their overall initial and subsequent thinking times. However, when the motor times were subtracted from the thinking times to provide indices of planning times per se, patients’ initial and subsequent response times did not differ significantly from controls. No group×level interaction was present across any Tower of London index (P≥0.169). On the Controlled Oral Word Association Test, patients generated significantly fewer correct responses than controls, but made no more perseverative responses, and they recalled significantly fewer digits on the Digits Backward test. On the SOPT, there was a significant main effect of group, as patients made significantly more errors than controls, but no group×set-size interaction (F(3,309)=0.632, P=0.586). For Spatial Working Memory between-search errors there was a significant main effect of group, with patients making significantly more errors than controls, but again no group×level interaction (F(2,248)=0.931, P=0.387); however, on this test’s strategy index, patients’ scores were no different from controls.

Immediate memory

Unlike their performance on the Digits Forward task, patients’ CANTAB Spatial Span scores were significantly below those of the controls.

Declarative memory

Patients’ performance was no different from that of controls on the CANTAB Pattern Recognition Memory task, but they showed significant impairment on the Spatial Recognition Memory task. On the CANTAB Matching to Sample tasks there was no between-group difference when the stimuli were presented simultaneously. On the delayed trials, however, patients’ performance was significantly poorer than controls’. The absence of a group×delay interaction (F(2,240)=0.867, P=0.422) suggests that the deficit is not delay-dependent. On the Paired Associates Learning test patients required significantly more trials than controls to complete the task. Patients also completed fewer sets successfully and located fewer patterns correctly after a single presentation.

Patients’ verbal learning was significantly poorer than that of controls (RAVLT trials A1–A5). Patients’ performance was also impaired on the RAVLT distracter word list recall trial (list B), post-interference recall (trial A6), number of targets correctly identified on the recognition trial, and on the standard index of delayed recall (A7). However, when the delayed recall index was modified to control for the confounding effects of patients’ reduced encoding (see footnote 3 in Table 2), patients’ performance did not differ from controls, suggesting that retention in long-term memory per se is intact. Also, patients committed no more errors of commission than did controls on the RAVLT recognition trial and their immediate span (trial A1) was intact.

Post hoc multivariate analysis by neurocognitive domain

Studies of neurocognitive function are frequently at risk of type I error because of the number of comparisons conducted in the analysis. Multiple comparisons are a product of the need to use several tasks to profile the range of different neurocognitive processes. One approach that has been suggested to overcome this problem is to group together tests and outcome measures that have some degree of theoretical overlap (Stevens, 2002), i.e. apply to a specific neurocognitive domain. This method was therefore adopted and it confirmed that a significant multivariate statistic (Hotelling’s trace) was present in each of the four domains (from Table 2): psychomotor performance (MANOVA=0.244, F=9.86, P<0.001), attention and executive function (MANOVA=0.492, F=3.33, P<0.001), immediate memory (MANOVA=0.061, F=3.63, P=0.029) and declarative memory (MANOVA=0.2389, F=2.17, P=0.014).

Effects of residual mood symptoms

Residual mood symptoms have been found to impair cognition in affective disorders (Ferrier et al, 1999; Clark et al, 2002). As patients’ mood scores on many of the clinical rating scales used in this study were significantly higher than those of controls, a series of analyses was performed to rule out this potential confound on the observed deficits. Correlations between the clinical rating scales and neurocognitive tests illustrating between-group differences were first calculated, to establish which deficits might have been influenced by mood. Partial correlations were subsequently performed on any cognitive index that significantly correlated with the mood ratings, to examine whether the previously observed between-group differences on these indices remained when the effect of mood on performance was partialled out. These analyses showed that 11 cognitive indices correlated significantly with the rating scales. However, when the effects of mood on these variables were controlled, all between-group effects remained significant, with one exception: subsequent thinking time on the Tower of London task was no longer significant when the BDI scores on the day of test and at week 4 were partialled out (P=0.066).
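The logic of partialling mood out of a between-group difference can be illustrated with the first-order partial correlation formula. This is a sketch under stated assumptions: group membership is coded 0/1, a single mood covariate is controlled, and Pearson correlations are used. The text does not report these details, and the function names are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return num / den


def partial_r(x, y, z):
    """Correlation between x and y with z partialled out.

    Here x might be group (0 = control, 1 = patient), y a cognitive
    score and z a mood rating such as the BDI (illustrative roles).
    Undefined if z is perfectly correlated with x or with y.
    """
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If the covariate is unrelated to both group and score, the partial correlation equals the ordinary correlation; the more the covariate overlaps with both, the further the group effect shrinks.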

Relationship between illness characteristics and neurocognitive function

To restrict the number of correlations computed between the illness characteristics and the neurocognitive test variables, only those indices illustrating between-group differences were examined. In addition, only the most representative variable from each of the cognitive tasks was included. The results of these analyses are presented in Table 4.
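Spearman's coefficient, used for these correlations, is the Pearson correlation computed on ranks. The sketch below implements it in plain Python, assigning tied values the average of their ranks. The function names are hypothetical and no data from Table 4 are reproduced.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    )
    return num / den
```

A rank-based coefficient is a natural choice for illness characteristics such as number of episodes or hospitalisations, which are typically skewed.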

Table 4. Correlations between illness characteristics and impaired neurocognitive performance in patients with bipolar disorder

Relationship between basal cortisol levels and neurocognitive function

Correlations between cortisol area under the curve and neurocognitive indices were also examined for patients and controls separately. In patients, the only significant correlations observed were for the Stroop colour–word latency (rs=-0.330, P=0.015), Tower of London excess moves (rs=-0.312, P=0.022) and verbal fluency on the Controlled Oral Word Association Test (rs=0.303, P=0.025). In controls, no significant correlation was observed (data not shown).
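The cortisol area under the curve can be approximated from the four daily samples with the trapezoidal rule. The method of calculation is not stated in the text, so the trapezoidal rule is an assumption here, and the function name and example concentrations are invented for illustration.

```python
def cortisol_auc(times_h, levels):
    """Area under the cortisol curve by the trapezoidal rule.

    times_h: sampling times in hours, in ascending order
             (e.g. 8, 12, 16, 20 for the schedule in the text).
    levels:  cortisol concentration at each time (any consistent unit).
    Returns the area in concentration-units x hours.
    """
    points = list(zip(times_h, levels))
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )
```

With samples at 08.00, 12.00, 16.00 and 20.00 h and illustrative levels of 10, 6, 4 and 2, the three trapezoids contribute 32, 20 and 12, giving an area of 64.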


Our study demonstrates significant neurocognitive impairment in a prospectively verified sample of euthymic patients with bipolar disorder, compared with a well-matched control group. Patients were impaired across a range of cognitive domains, including attention and executive function, immediate (spatial) memory and verbal and visuospatial declarative memory. Significant psychomotor retardation was also evident. These impairments did not appear to be attributable to hypercortisolaemia, as basal salivary cortisol profiling revealed no difference between patients and controls. Also, dysfunction was still evident after controlling for the effects of residual mood symptoms post hoc using a partial correlational analysis.

Previous neuropsychological findings in bipolar disorder

Our findings are consistent with a growing body of evidence that people with bipolar disorder experience a range of cognitive deficits during disease remission (van Gorp et al, 1998; Ferrier et al, 1999; Rubinsztein et al, 2000; Clark et al, 2002; Martinez-Aran et al, 2004). Many studies have used structured interviews and standardised rating scales to demonstrate the euthymic status of patients, but generally their sample sizes were smaller than ours and prospective follow-up or cortisol measurements were not recorded. The majority of studies to date have typically defined patients as impaired on the basis of a between-group difference (from controls) on an arbitrarily selected significance level. However, although a result might be statistically significant, this says nothing about the size of the effect, nor does it guarantee that it is clinically important. Few studies have provided data on the number of patients falling within the clinically impaired range on particular tasks, despite normative data being readily available to do so (but see Rubinsztein et al, 2000). We sought to address this by providing effect sizes alongside statistical significance tests and by calculating the percentage of patients falling below the fifth percentile on the measures employed.

Magnitude of the impairment: statistical v. clinical significance

Previous studies have reported deficits in up to 32% of people with bipolar disorder (Astrup et al, 1959; Bratfos & Haug, 1968; Dhingra & Rabins, 1991; Martinez-Aran et al, 2000). Our results demonstrate that the proportion affected is extremely variable and is dependent upon the particular task employed. For example, for tasks within a medium to large effect size, on average 25% of patients scored at or below the fifth percentile, although on some tasks (such as the Digit Symbol Substitution Test) the proportion was over 33%. This figure averaged almost 15% for tasks falling in the small to medium effect size range. Even for tasks with little or no between-group difference (i.e. d<0.2), clinically significant deficits were still evident in some individuals. This is of particular clinical importance because these deficits were observed in a cohort of patients who had been euthymic for an average of 27.3 months (median 14 months), suggesting that neurocognitive impairment persists long beyond the point of symptomatic recovery. The enduring nature of this impairment is also highlighted by the absence of association between the length of time patients had been in remission and the extent of neurocognitive impairment.

Factors affecting neurocognitive impairment in bipolar disorder

All but three of the patients in this study were receiving medication at the time of testing; the effects of psychotropic drugs on neurocognitive functioning therefore cannot be excluded. Lithium use, for example, has been shown to have subtle but definite effects on several domains, including psychomotor speed and possibly verbal memory. Similarly, antidepressants have been shown to have cognitive effects, particularly those with anticholinergic properties (Amado-Boccara et al, 1995). However, in their review, Bearden et al (2001) suggest that the cognitive impairments in bipolar illness are unlikely to be a primary effect of medication. In a comparison study of euthymic patients with bipolar disorder and controls, neurocognitive impairment was observed not only in patients receiving mood-stabiliser monotherapy but also in those who were drug-free (Goswami et al, 2002). None the less, many patients with this disorder take several psychotropic medications at varying doses, and it is unknown what the effects of combined therapy might be, particularly over time. Similarly, although ECT may affect neurocognitive function in some patients, only half the patients in our study had ever received ECT and a negative effect of the number of previous treatments on performance was observed on only a small number of tests.

Different rates of neurobiological abnormalities among patients with bipolar disorder may also affect the pattern and magnitude of neurocognitive impairment. Elevated cortisol levels have been shown to impair specific domains of neurocognitive functioning, both in studies in which synthetic glucocorticoids were administered exogenously and in patient groups with chronically elevated endogenous cortisol levels (e.g. Cushing’s disease). As hypothalamic–pituitary–adrenal axis dysfunction in bipolar disorder may also persist in a proportion of euthymic patients (Watson et al, 2004), in our study saliva samples were collected on the day prior to testing to provide a simple, non-invasive assessment of basal cortisol secretion. No difference between patients and controls was observed, potentially excluding this confound. However, basal cortisol profiling is relatively insensitive compared with ‘activating’ challenges such as the dexamethasone/corticotrophin releasing hormone test, which might be more informative in future studies (Watson et al, 2004).

Several studies have reported that residual mood symptoms may affect the degree of neurocognitive dysfunction observed in euthymic patients with bipolar disorder (Ferrier et al, 1999; Clark et al, 2002). When residual mood symptoms were statistically controlled in our study, all between-group differences remained with the exception of one. As we recruited a larger cohort than the earlier studies, this suggests that residual symptoms exert only subtle effects on performance and become less problematic when the statistical power of the study is increased. The observed impairment is therefore unlikely to be an epiphenomenon of mood.

Clinico-cognitive correlations: disease process or trait deficit?

Several studies have reported that patients with a more severe course of prior illness and greater number of episodes suffer greater neurocognitive decline (Kessing, 1998; van Gorp et al, 1998; Denicoff et al, 1999). In our study, examination of the correlation between illness history characteristics and neurocognitive functioning revealed an effect of several factors, particularly lifetime duration of illness and number of hospitalisations, consistent with several previous reports (Tham et al, 1997; Denicoff et al, 1999; Rubinsztein et al, 2000). Such associations have typically been interpreted as indicating a progressive disease process. However, the direction of causality cannot be determined from correlational analyses. These results may equally indicate that patients with neurocognitive impairments are more vulnerable to developing a severe and recurrent bipolar disorder. Preliminary evidence indicates subtle neurocognitive impairments in ‘high-risk’ groups, i.e. first-degree relatives of patients with bipolar disorder (Keri et al, 2001; Chowdhury et al, 2002; Sobczak et al, 2002). Therefore, although some deficits might be the result of disease progression, the evidence that impairments occur both in euthymic patients with bipolar disorder and in their healthy first-degree relatives suggests that these impairments may represent an endophenotypic marker of genetic vulnerability.

Implications and future research

Neurocognitive deficits are evident in euthymic patients with bipolar disorder. These deficits are often a cause of considerable distress and can lead to impairment of psychosocial and occupational functioning (e.g. Martinez-Aran et al, 2004). Our study confirms that these deficits are of both statistical and clinical significance, and persist independently of mood symptoms. Early intervention may be particularly important in order to ameliorate such impairments, as several studies – including this one – indicate that the degree of dysfunction may increase with disease progression. One of the most important aims of future research should therefore be the identification of the underlying neurobiology of neurocognitive impairment in euthymic patients, thereby providing a target for therapeutic intervention. Cognitive and psychoeducational rehabilitation programmes may be warranted to improve the long-term outcome for some patients.

Clinical Implications and Limitations


Clinical implications

  1. Patients with bipolar disorder are impaired across a range of neurocognitive domains. This is evident in terms of both statistical and clinical significance.

  2. These deficits persist in the euthymic state and suggest that neurocognitive impairment persists long beyond the point of symptomatic recovery.

  3. These deficits are not simply related to basal hypercortisolaemia.


Limitations

  1. The patients in the study were stable on medication, but were not drug-free.

  2. The study design was cross-sectional and the longitudinal course of these deficits remains to be fully demonstrated.

  3. Basal cortisol levels do not fully characterise hypothalamic–pituitary–adrenal dysfunction.


This project was made possible by the generous support of the Stanley Medical Research Institute and the Mental Health Foundation. We thank Mr M. Leitch, Ms S. Dodds, Mrs M. Smith and Ms M. Watson for technical assistance and Dr B. S. Wilkinson and Mrs C. L. Harrison for assistance with data collection.

  • Received February 25, 2004.
  • Revision received September 9, 2004.
  • Accepted September 10, 2004.

