The British Journal of Psychiatry


Background Neurocognitive impairments in euthymic patients with bipolar disorder may represent trait rather than state variables.

Aims To test the hypothesis that euthymic patients with bipolar disorder would exhibit impairment in verbal learning and memory and executive function compared with healthy controls matched for age, gender and premorbid IQ.

Method Twenty euthymic patients with bipolar disorder were matched, on a case-by-case basis, to twenty healthy community controls. Cases and controls were tested with a battery of neuropsychological tests.

Results Impairments were found in cases compared with controls in tests of verbal learning and memory. Verbal learning and memory correlated negatively with the number of manic episodes.

Conclusions Impaired verbal learning and memory may be a trait variable in bipolar disease. There are implications for adherence to medication and relapse and for the role of early treatment interventions. Prospective designs and targeting first-episode groups may help to differentiate trait v. disease process effects.

There is increasing evidence that neurocognitive impairment in some patients with bipolar disorder is enduring and may represent a trait rather than a state variable. The main abnormalities lie within mnemonic and executive domains (e.g. Dupont et al, 1990; Morice, 1990; Goldberg et al, 1993). Structural abnormalities in the brains of patients with bipolar disorder (Dupont et al, 1995; Elkis et al, 1995; Soares & Mann, 1997; Videbech, 1997) point to enduring brain changes, and some positive associations have been made between these structural abnormalities and neurocognitive impairments (Lesser et al, 1996). There is evidence also of a positive association between the number of affective episodes and neurocognitive impairment. The few studies that have addressed the trait-state distinction by examining patients with bipolar disorder in the euthymic state (Kessing, 1998; van Gorp et al, 1998; Ferrier et al, 1999) support the trait rather than the state hypothesis.

To test the hypothesis that euthymic patients with bipolar disorder exhibit impairment in verbal learning and memory and in executive function compared with healthy controls, we aimed to recruit 20 euthymic patients with bipolar disorder and to match these on a case-by-case basis, for age, gender and premorbid IQ with 20 healthy controls.


Patients were matched on a case-by-case basis with healthy controls. Matching was on the basis of age, gender and premorbid IQ. A battery of neuropsychological tests was administered to both the cases and their matched healthy controls. Ethical approval for the study was sought and granted by the local area ethics committee. Patients and controls gave informed consent. All tests on an individual were carried out on the same day.


The patient sample was identified via the computerised psychiatric database at the Royal Edinburgh Hospital. Inclusion criteria required that patients conformed to DSM-IV (American Psychiatric Association, 1994) criteria for bipolar I disorder, were aged 18-70 years and currently were euthymic. Exclusion criteria were: significant physical or neurological illness; history of stroke or head trauma; evidence from the Schedule for Affective Disorders and Schizophrenia – Lifetime version (SADS-L; Spitzer & Endicott, 1978) of a significant history of (or ongoing) alcohol and/or drug misuse; electroconvulsive therapy (ECT) in the preceding 6 months; a score above 7 on the 21-item Hamilton Rating Scale for Depression (HRSD; Hamilton, 1960) and a score above 2 on the Modified Manic State rating scale (MMS; Blackburn et al, 1977); evidence of comorbid psychiatric disorder; neurodegenerative disorders; learning disability; endocrine abnormalities.

Thirty-seven patients were identified who met DSM-IV criteria for bipolar disorder by case note review: 20 were recruited successfully for the study, 12 did not fulfil entry criteria and 5 refused to participate. Information in the case notes indicated that those who refused did not appear to differ in terms of demographic variables from those who participated. Entry criteria were not met because the patients were unwell at the time or were over 70 years of age.


The controls were recruited from the community and, as far as possible, were from the same community and occupational background as the patients. The controls were subject to the same exclusion criteria as the patients were. In addition, any current or past psychiatric history resulted in exclusion. Twenty controls were approached, all of whom agreed to participate. Controls included friends of other controls, non-research nursing staff, hospital administrative staff, computing staff and ancillary staff. No inducement was offered for participation in the study. Participation was voluntary for both patients and controls.

Initial assessment

In addition to the SADS-L, HRSD and MMS, all cases and controls were screened using the National Adult Reading Test (NART; Nelson & Willison, 1991) as a measure of premorbid IQ.

Neuropsychological assessment

Patients and controls were assessed using the same tests administered in the same order by J.T.O.C. and M.V.B. in the Department of Psychiatry. The whole testing session lasted approximately 90 min.

Tests of verbal learning and memory

California Verbal Learning Test. The California Verbal Learning Test (CVLT; Delis et al, 1987) is a memory test that allows the differentiation of various aspects of verbal learning, recall and recognition. It comprises a list-learning task of 16 items presented orally to the participant by the tester. The list is composed of four category groupings: fruit; tools; clothing; herbs.

There are five immediate free recall trials of list A followed by an interference list B that is presented once. Immediately after list B, short-delay recall of list A is tested in both free and cued forms. Cues are given in the form of names of the four categories. Following 20 min of non-verbal tests, long-delay free and cued recall from list A are recorded. Immediately following this, recognition is measured by asking for a yes/no response to a list containing the list A items, with an additional 28 distractors made up of list B words and novel words both related and unrelated semantically to list A words.

The total score of immediate free recall trials 1-5 provides a global measure of learning performance. The number of words remembered in trial 1 is a reflection of immediate recall. The change in the number of words recalled from trial 1 to trial 5 shows the rate of learning and reflects little or no learning if the number of words recalled on later trials is not much more than the number remembered during the first trial (Lezak, 1995). Recognition is an estimate of ability to recognise target items correctly and to reject distractor items.
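The summary measures just described can be sketched in a few lines of Python. This is an illustration only: the function and field names are hypothetical, the example counts are invented rather than taken from the study, and "recognition minus false positives" is assumed here to mean correct recognitions minus false-positive responses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CvltSummary:
    trial1_recall: int          # words recalled on trial 1 (immediate recall)
    total_recall_1_to_5: int    # sum over trials 1-5 (global learning measure)
    learning_gain: int          # trial 5 minus trial 1 (rate of learning)
    recognition_corrected: int  # recognition hits minus false positives (assumed definition)


def summarise_cvlt(trial_recalls: List[int], hits: int, false_positives: int) -> CvltSummary:
    """Derive the summary measures described in the text from raw CVLT counts."""
    if len(trial_recalls) != 5:
        raise ValueError("expected recall counts for exactly five list A trials")
    if any(not 0 <= n <= 16 for n in trial_recalls):
        raise ValueError("list A has 16 items, so each recall count must be 0-16")
    if not 0 <= hits <= 16 or not 0 <= false_positives <= 28:
        raise ValueError("hits must be 0-16 and false positives 0-28")
    return CvltSummary(
        trial1_recall=trial_recalls[0],
        total_recall_1_to_5=sum(trial_recalls),
        learning_gain=trial_recalls[4] - trial_recalls[0],
        recognition_corrected=hits - false_positives,
    )


# Hypothetical participant, not data from the study.
example = summarise_cvlt([6, 9, 11, 12, 13], hits=15, false_positives=2)
```

A learning gain close to zero would correspond to the "little or no learning" pattern noted by Lezak (1995).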

Tests of executive function

Hayling Sentence Completion Test. The Hayling Sentence Completion Test (HSCT; Burgess & Shallice, 1996) assesses the ability to inhibit and override a dominant response tendency. It comprises two conditions, in both of which the sentence must be completed as quickly as possible with a one-word answer.

Condition 1 requires participants to finish a sentence with a word that allows the sentence to make sense. Condition 2 requires participants to finish the sentence with an absurd word that renders the sentence nonsensical (the incongruous condition). The errors are scored according to the degree of sense made by the sentence completion. Category A errors are scored if a sentence in the incongruous condition is completed correctly. Category B errors are scored if the sentence makes some sense (e.g. The whole town came to hear the Mayor ...; answer: sing).

Raw scores then are converted to scaled scores. The overall scores were analysed in this study.

Verbal Fluency Test (FAS version). In the Verbal Fluency Test (Benton & Hamsher, 1978) participants were asked to recall within 1 min as many words as possible beginning with each of the letters F, A and S.

Stroop Colour Word Test. The Stroop Colour Word Test (Trenerry et al, 1988) is a test of executive function that involves an initial trial (‘priming trial’) reading aloud the words in a list of colour names written in incongruous coloured ink. This is followed by a second trial (‘colour-word trial’) reading down a list of colour names, again written in incongruous coloured ink but this time reading aloud the colour of the ink in which the words are printed.

Behavioural Assessment of the Dysexecutive System. The Behavioural Assessment of the Dysexecutive System (BADS; Wilson et al, 1996) is used to predict everyday problems arising from executive disturbances.

The BADS presents a collection of six tests similar to real-life activities, which could present difficulties for some people with dysexecutive syndrome. The test chosen from the six was the Modified Six Elements Test. This test was based on a task developed by Shallice & Burgess (1991). The reasons for choosing this test were as follows: the entire BADS did not fit within the time constraints of testing; the modified test makes demands in the three areas of planning, organising and monitoring behaviour; in addition, it taps ‘prospective memory’ quite highly (i.e. the ability to carry out an intention at a future time) (Burgess, 1997).

The participant is given instructions to do three tasks comprising dictation, arithmetic and picture naming. Each of these tasks is subdivided into two parts called A and B. The participant is asked to attempt at least something from each of the six sub-tasks within a 10-min period. There is one rule that may not be broken, namely, that it is forbidden to do two parts of the same task consecutively.

The point of the test is how well participants organise themselves, not how well they perform on individual tests. The profile score is based on the number of tasks completed, the number of tasks where rule breaks were made and the time spent on any one task.
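As a rough illustration of the rule described above, the sketch below counts two of the ingredients of the profile score: how many of the six sub-tasks were attempted and how often the one rule was broken. The data layout and function names are hypothetical, and the conversion to the published BADS profile score (including the time-on-task component) is deliberately not reproduced.

```python
from typing import List, Tuple

# Each attempted sub-task is recorded as (task, part), e.g. ("dictation", "A").
SubTask = Tuple[str, str]

TASKS = ("dictation", "arithmetic", "picture naming")
PARTS = ("A", "B")


def validate(sequence: List[SubTask]) -> None:
    """Reject entries that are not one of the six defined sub-tasks."""
    for task, part in sequence:
        if task not in TASKS or part not in PARTS:
            raise ValueError(f"unknown sub-task: {(task, part)}")


def count_subtasks_attempted(sequence: List[SubTask]) -> int:
    """Number of distinct sub-tasks (out of six) attempted at least once."""
    validate(sequence)
    return len(set(sequence))


def count_rule_breaks(sequence: List[SubTask]) -> int:
    """Count moves from one part of a task straight to the other part of the same task."""
    validate(sequence)
    breaks = 0
    for (prev_task, prev_part), (next_task, next_part) in zip(sequence, sequence[1:]):
        if prev_task == next_task and prev_part != next_part:
            breaks += 1
    return breaks


# Hypothetical order of attempts, not data from the study.
order = [
    ("dictation", "A"),
    ("arithmetic", "A"),
    ("arithmetic", "B"),   # rule break: arithmetic A followed directly by arithmetic B
    ("picture naming", "A"),
    ("dictation", "B"),
]
attempted = count_subtasks_attempted(order)
rule_breaks = count_rule_breaks(order)
```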

Statistical analysis

Parametric statistical tests were used for all measures except the HSCT, whose scores are known to be non-normally distributed and were therefore analysed with non-parametric tests. Homogeneity of variance was checked using Levene's test. Independent t-tests then were used to test the null hypothesis that the mean test scores between cases and controls did not differ. The Mann-Whitney U test was used in the case of the HSCT. Spearman's correlational analysis was carried out between clinical variables and performance on the neuropsychological measures. Partial correlational analyses also were conducted, controlling for age, gender, IQ, HRSD score and duration of illness.
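For readers unfamiliar with the rank-based procedures named above, the following minimal sketch computes a Mann-Whitney U statistic and a Spearman rank correlation using only the Python standard library. It is not the software used in the study; all function names are hypothetical, the numbers are invented, and only the test statistics are computed (P values would normally be obtained from a statistical package).

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U statistic for group_a, from its rank sum in the pooled sample."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Invented scores for illustration only.
u_patients = mann_whitney_u([3, 5, 4, 6], [6, 7, 5, 8])
rho = spearman_rho([1, 2, 3, 4, 5], [15, 14, 12, 13, 10])
```

In the second example, a higher value of the first variable goes with a lower value of the second, so the coefficient is negative, which is the direction of association reported in this study between number of manic episodes and CVLT performance.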


Table 1 shows the demographic profiles, premorbid IQ, medications and clinical scores of depression and mania in the bipolar group and controls. Patients and controls did not differ with respect to age, gender or premorbid IQ.

Table 1

Demographic, medication and baseline measures in patients and controls

In the bipolar group, the 17-item HRSD, 21-item HRSD and MMS had mean scores of 1.0, 1.1 and 0.5, respectively. The corresponding scores for the controls were all zero.

Those patients currently taking medication were divided as follows: antidepressants, namely, selective serotonin reuptake inhibitors (SSRIs), n=7; antipsychotics, n=8; ‘atypical’ antipsychotics, n=0; lithium, n=8; anticonvulsant (carbamazepine), n=5. All doses were within British National Formulary (BNF) guidelines. No controls were receiving medication at the time of testing.

Table 2 describes the clinical indices of the patients.

Table 2

Clinical indices of the patients

Neuropsychological tests

Table 3 illustrates the raw scores and paired differences between patients and controls. Significant differences in the neuropsychological tests between the bipolar group and controls were found within the tests for verbal learning and memory (i.e. CVLT). Deficits were found in immediate recall on trial 1 of the CVLT. The total immediate recall (sum of trials 1-5) of the CVLT was worse in patients than in controls. The other impairments were in short- and long-delay recall and in delayed recognition with false positives removed. The latter is regarded as a more accurate reflection of delayed recognition. Impairment in delayed recognition itself almost reached conventional statistical significance.

Table 3

Raw scores and paired differences between patients and controls

With respect to the other tests, there were no significant differences between patients and controls in verbal fluency (FAS), the profile score of the Modified Six Elements Test or in the converted error score of the HSCT. However, differences in the Stroop Colour Word Test almost reached statistical significance.

Correlational analysis was performed between those neuropsychological tests that showed statistically significant differences on univariate testing and clinical indices. Significant negative correlations were found between performance on neuropsychological tests and number of manic episodes (Table 4). Specifically, there was a negative correlation between the number of manic episodes and performance on delayed recognition (r=-0.6, P=0.01) and delayed recognition minus false positives (r=-0.7, P=0.002). There was also a negative correlation between delayed recognition and duration of illness in years (r=-0.5, P=0.03). However, it should be noted that delayed recognition (including false positives) did not reach conventional statistical significance on univariate testing (P=0.09).

Table 4

Spearman's rank correlational analysis

Table 5 illustrates the results of the partial correlational analyses between CVLT, number of manic episodes and number of depressed episodes, controlling for age, gender, IQ, HRSD and duration of illness. On controlling for age, gender and IQ, delayed recall, delayed recognition and delayed recognition minus false positives remain significantly correlated with the number of manic episodes. The same results occurred on controlling for duration of illness. On controlling for the HRSD score, trial 1 of the CVLT remained significantly correlated with number of manic episodes, in addition to delayed recall, delayed recognition and delayed recognition minus false positives.

Table 5

Partial correlational analyses: correlation coefficient r (P value)
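The idea of "controlling for" a covariate can be illustrated with the standard first-order partial correlation formula, which adjusts the correlation between two variables for their shared correlation with a third. The sketch below is an assumption-laden illustration: it handles a single covariate at a time, the function name is hypothetical, and the input coefficients are invented rather than taken from Table 5.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z.

    r_xy, r_xz and r_yz are the pairwise correlation coefficients.
    """
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("x or y is perfectly correlated with the covariate")
    return (r_xy - r_xz * r_yz) / denominator


# Invented coefficients: x = number of manic episodes, y = a memory score,
# z = a covariate such as duration of illness.
adjusted = partial_correlation(r_xy=-0.6, r_xz=0.5, r_yz=-0.4)
```

If the covariate is uncorrelated with both variables, the adjusted coefficient equals the unadjusted one; the more the covariate overlaps with both, the more the coefficient changes.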


This study demonstrates significant neurocognitive deficits in euthymic patients with bipolar disorder compared with a healthy control group matched for age, gender and IQ. The impairment was specifically in the domains of verbal learning and memory. In particular, the relationship between recall and recognition measures highlights difficulties within encoding and retrieval. Importantly, performance on the CVLT (especially delayed recognition) showed a negative correlation with duration of illness and number of manic episodes, although not with depressive episodes or number of hospital admissions. The results from the partial correlational analyses imply that the negative correlations between delayed recall and recognition measures and the number of manic episodes were not due to confounds of age, gender, IQ, duration of illness or unresolved depressive symptoms.

These findings support the proposed association between the number of affective episodes and neurocognitive impairment. These results also confirm that cognitive impairment persists during the euthymic phase of illness and does not appear to be due to the effects of active illness. Therefore, these deficits may represent trait variables.


The relatively small sample size gives rise to power limitations. Such power limitations, rather than an absence of any such deficit, may explain the lack of evidence for deficits in the executive domain. For example, the trend towards significance in the Stroop Colour Word Test might have reached statistical significance with a larger sample.

Given the mean number of episodes and admissions, this patient group may represent those at the more severe end of the illness spectrum and it is not known whether these results can be generalised to milder forms of bipolar disorder.

Details of illness history were obtained from the subjects and from hospital records. There is the risk of recall bias, in that patients may forget episodes of illness, especially less severe episodes. However, mania is more likely to be recorded in case notes and to result in hospital admission, and there was a significant correlation between number of manic episodes and extent of cognitive impairment. A greater number of episodes was associated with poorer test performance.

The lack of any score on the HRSD in the controls may indicate an unusually healthy group. However, as far as possible, controls were recruited from similar social and occupational backgrounds and not from an exceptionally healthy group.

A potential confound is that of subclinical pathology. Both cases and controls were interviewed using the SADS-L, HRSD (17- and 21-item) and MMS. Cases were, on average, free from illness for 118.3 weeks (i.e. over 2 years), which indicates a period of stability. However, it must be recognised that more detailed tests of subclinical pathology were not employed (e.g. the Cambridge Cognitive Examination; Roth et al, 1986). Also, the absence of psychopathology was a prerequisite for inclusion in the study. Given that some cases (n=4) had relapsed more recently than others, the analyses were repeated but excluding those who had relapsed within the previous 8 weeks. The results remained robust.

The effects of medication on cognitive function must be considered, but there is some debate in the literature as to the strength of any effect. With respect to lithium, some studies have demonstrated cognitive impairment among those on the drug (e.g. Kocsis et al, 1993) whereas others have identified no significant deficit (e.g. Engelsmann et al, 1988). Specifically, some studies have found that lithium, antidepressants and antipsychotics can have a deleterious effect on motor speed and memory (Shaw et al, 1987; Cassens et al, 1990). Other studies have found no differences in lithium- or neuroleptic-treated patients in tests of attention, visuomotor function and memory (Joffe et al, 1988; King, 1990).

Among the best-designed studies in this area is that by Engelsmann et al (1988), who followed up a cohort of patients treated with lithium and observed them over a 6-year period. There was remarkably stable cognitive performance among the sample, with only 10% of memory sub-tests demonstrating a statistically (but not clinically) significant decline. With respect to anticonvulsants, research to date has found little or no cognitive impairment with these (Devinsky, 1995). Similarly, the majority of evidence indicates that antidepressant treatment is not associated with cognitive impairment (Thompson, 1991). It must be recognised that the most methodologically effective technique is to examine drug-free patients. However, in contemporary clinical practice it is extremely rare to find patients with established bipolar disorder who are medication-free. In addition, to answer the question of trait v. state, patients must be euthymic. In order to reach stability and euthymia, patients are required to be adherent to an effective medication regime. This further militates against the likelihood of recruiting a drug-free sample.


Possible explanations for the findings of this study include abnormalities in brain structure and function (neuroanatomical) and a deleterious effect of the illness itself (disease process), which are not necessarily mutually exclusive.


Although this study included no neuroimaging techniques, areas found to be abnormal in neuroimaging studies subserve the impairments in verbal learning and memory identified. For example, volumetric studies in patients with bipolar disorder have shown smaller temporal lobes (e.g. Swayze et al, 1992). Patchy white matter hyperintensities have been found in diencephalic structures thought to subserve aspects of memory. However, impairments have been observed in a variety of neuropsychological tests thought to be subserved by several different neural regions, and structural imaging studies of patients with bipolar disorder have revealed a wide variety of abnormalities: for example, increased ventricular/brain ratio and patchy white matter hyperintensities in frontal lobe and periventricular regions as well as basal ganglia (Videbech, 1997). The role of ventromedial prefrontal cortex has been emphasised further by the influential findings of Drevets et al (1997) of differential activation in mania and depression of subgenual prefrontal cortex accompanied by decreased cortical volume.

As yet, the relationship between these neuroanatomical findings and the neurocognitive functions that they subserve is unclear. Few studies have examined directly the correlations between neuroanatomy and neurocognitive impairment in patients with bipolar disorder. In those that have done so, the hippocampus has been implicated (Sax et al, 1999; Ali et al, 2000). More recent findings point to an association between poor outcome and subcortical white matter lesions in patients with bipolar disorder (Moore et al, 2001), and Dupont et al (1990) reported that patients with white matter lesions performed poorly on neuropsychological testing.

Disease process

The finding of negative correlations between neurocognitive function and the number of manic episodes lends itself to the explanation that the greater the number of affective episodes, the greater the neurocognitive impairment. Kessing (1998) found that patients with recurrent affective episodes were more impaired than those with a single episode and more impaired than controls. There is a degree of biological plausibility in this hypothesis. Hypercortisolaemia and dexamethasone suppression test non-suppression are well-recognised findings in some patients with affective disorders. Furthermore, significant correlations have been found between mean urinary free cortisol (Rubinow et al, 1984), dexamethasone suppression test non-suppression (Brown et al, 1999) and cognitive impairment in depressed patients. Although less well studied, Cassidy et al (1998) reported abnormal dexamethasone suppression test and plasma cortisol responses in mania and mixed states. Animal data have indicated that hypercortisolaemia may be toxic to the hippocampus, reduce the number of glucocorticoid receptors and cause cell death in the temporal lobe region (e.g. Sapolsky & McEwen, 1988). Perhaps the enduring neurocognitive impairment seen in these patients with bipolar disorder is due to hypothalamic-pituitary-adrenal axis-mediated damage. If this occurs with each episode of affective illness, then it is reasonable to assume that the greater the number of episodes, the greater the potential risk to neurocognitive function.


Gallo et al (2000) found that active life expectancy decreased when measures of syndromal depression and cognitive impairment were included along with the usual measures of activities of daily living. Similarly, impairments in learning and memory might affect a patient's ability to remember directions for medications, why medication is important and the recollection of what symptoms are side-effects and the reporting of these at the clinic, all of which have an impact on adherence to treatment.

In addition, psychological treatments are, by their very nature, verbal and impairments in these domains may be detrimental to progress. This is especially relevant to cognitive-behavioural techniques designed to maintain adherence to medication. The negative correlation with the number of manic episodes illustrates the need for further research on early warning signs in bipolar disorder and on the efficacy of early intervention. An earlier and more robust intervention with mood stabilisers may have a role in reducing the number of episodes. Techniques for ensuring adherence to effective treatment regimes require testing and implementation if effective. These techniques should account for the possibility of neurocognitive impairment as a cause of non-adherence. Further studies of neurocognitive function within prospective study designs are required in order to explain the existence of cognitive impairment as a trait variable in patients with bipolar disorder. The targeting of first-episode groups may help in attempts to differentiate trait v. disease process effects.

Clinical Implications and Limitations


  • Impairments in verbal learning and memory might have a detrimental impact upon adherence to therapeutic interventions and thus lead to relapse.

  • The negative correlation with the number of manic episodes implies that earlier and more robust therapeutic intervention might be protective.

  • Interventions designed to improve adherence should account for neurocognitive impairment as a cause of non-adherence. Such interventions require testing and implementation if effective.


  • The relatively small sample size gives rise to power limitations. Differences in executive function may have been observed with a larger sample.

  • The effects of medication on neurocognitive function cannot be discounted.

  • The presence of subclinical psychopathology cannot be discounted.


We are grateful to Professor Ronan O'Carroll of the University of St Andrews, Scotland for his expert advice on the neuropsychological assessments.


  • Received September 21, 2000.
  • Revision received May 10, 2001.
  • Accepted May 17, 2001.

