Background The neurocognitive basis of auditory verbal hallucinations is unclear.
Aims To investigate whether people with a history of such hallucinations would misattribute their own speech as external and show differential activation in brain areas implicated in hallucinations compared with people without such hallucinations.
Method Participants underwent functional magnetic resonance imaging (fMRI) while listening to pre-recorded words. The source (self/non-self) and acoustic quality (undistorted/distorted) were varied across trials. Participants indicated whether the speech they heard was their own or that of another person. Twenty people with schizophrenia (auditory verbal hallucinations n=10, no hallucinations n=10) and healthy controls (n=11) were tested.
Results The hallucinator group made more external misattributions and showed altered activation in the superior temporal gyrus and anterior cingulate compared with both other groups.
Conclusions The misidentification of self-generated speech in patients with auditory verbal hallucinations is associated with functional abnormalities in the anterior cingulate and left temporal cortex. This may be related to impairment in the explicit evaluation of ambiguous auditory verbal stimuli.
Auditory verbal hallucinations are a cardinal feature of schizophrenia but their neurocognitive basis is unclear. Theoretical accounts proposed that such hallucinations result from a breakdown in the monitoring of the intention to generate inner speech, through a loss of the ‘efference copy’ associated with the generation of verbal material. This efference copy serves to inform an internal monitor of forthcoming action and may thus help to distinguish self-generated from externally generated verbal material (Blakemore et al, 2002). In the absence of this signal, inner speech may thus be misidentified as ‘alien’ and perceived as externally generated voices (Feinberg, 1978; Frith & Done, 1988). Hallucinations have therefore been conceptualised as resulting from a breakdown in the systems monitoring the current intention to make actions (Frith & Done, 1988).
However, monitoring can also occur at the level of the conscious evaluation of the verbal output (Levelt, 1983) when speakers hear their own voice. Impairment at this level may also lead to the erroneous misattribution of self-generated speech. When patients with schizophrenia who are prone to auditory verbal hallucinations speak and hear an acoustically distorted version of their own voice they tend to misidentify their own speech as being that of somebody else (Johns & McGuire, 1999; Fu et al, 2001; Johns et al, 2001). Although this impairment is consistent with a loss of efference copy, it could equally result from a problem with the conscious evaluation of auditory verbal feedback (Allen et al, 2004).
The purpose of our study was to use functional magnetic resonance imaging (fMRI) to examine the brain regions involved in the conscious appraisal of speech in people with schizophrenia who were and were not prone to auditory verbal hallucinations. The subjective experience of these hallucinations in schizophrenia is associated with activation in the inferior frontal, anterior cingulate and temporal cortex (McGuire et al, 1993; Shergill et al, 2000b). Furthermore, the processing of verbal material in people who are prone to such hallucinations has been associated with differential engagement of these regions relative to people with schizophrenia who do not experience hallucinations and controls (McGuire et al, 1995; Shergill et al, 2003), particularly in the temporal cortex (Fu et al, 2001). We tested the hypothesis that in people with auditory verbal hallucinations the appraisal of speech would be associated with the differential engagement of temporal, prefrontal and anterior cingulate cortices. More specifically, we tested the prediction that external misattributions in people with these hallucinations would be associated with altered activation of the temporal cortices.
All participants were right-handed men who spoke English as their first language and had no history of hearing problems. The study had local research ethics committee approval and all participants gave informed consent.
A control group of 11 healthy volunteers was recruited from the local community through advertisements. Applicants with a history of medical or psychiatric disorder, a drug or alcohol use problem, a family history of psychiatric disorder, or who were receiving medication were excluded. Their mean age was 28 years and their mean IQ, estimated with the National Adult Reading Test (NART; Nelson & O’Connell, 1978), was 115 (see Table 1).
All patients met DSM–IV criteria for schizophrenia (American Psychiatric Association, 1994) and were recruited through the South London and Maudsley National Health Service Trust. Clinical teams were systematically contacted with a request to identify patients with schizophrenia who either had prominent and current auditory verbal hallucinations, or had no current or previous history of such hallucinations. This information was corroborated by careful review of the patients’ clinical records. Potentially eligible patients were then approached by the investigators and assessed using the Scale for the Assessment of Positive Symptoms (SAPS; Andreasen, 1984a), the Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1984b), the Calgary Depression Scale (Addington et al, 1990) and the NART.
The hallucinator group (n=10) comprised patients who scored ≥3 on the SAPS auditory hallucination item (clear evidence of voices and that they had occurred in the past week). All of these patients had a documented history of auditory verbal hallucinations. Patients in this group were also experiencing other positive symptoms, particularly delusions, and had low levels of negative symptoms (see Table 1). Nine of this group were in hospital at the time of testing and one was receiving out-patient treatment. None reported hallucinations during the fMRI scanning procedure.
The non-hallucinator group (n=10) was composed of patients who were not experiencing auditory verbal hallucinations at the time of testing and had no previous history of such hallucinations. This was assessed by detailed inspection of the patients’ notes, and consultation with clinical staff. Patients with any history of such hallucinations were excluded. Patients in this group had positive symptoms other than hallucinations – particularly delusions (see Table 1). Eight of these patients were in hospital at the time of testing and two were receiving out-patient treatment.
Exclusion criteria for both patient groups included the presence of an Axis II DSM–IV diagnosis or another Axis I diagnosis, a neurological disorder or a history of substance or alcohol misuse. Patients with an IQ below 80 were also excluded. All patients had been receiving regular doses of antipsychotic medication for at least 1 month prior to testing. Potential participants who reported a history of hearing problems were excluded. The healthy volunteers had a higher premorbid IQ than either patient group; the IQ score was therefore included as a covariate in the between-group analyses.
Eighty adjectives applicable to people were used (e.g. ‘perfect’, ‘tall’). All the words were monosyllabic or bisyllabic with a Thorndike–Lorge frequency greater than 50 (Gilhooly & Logie, 1980), and were selected from lists used in a previous study (McGuire et al, 1996). The emotional valence of these words had previously been rated by 40 healthy volunteers as either negative, positive or neutral (Johns et al, 2001). Thus the 80 words used consisted of 27 positive, 27 negative and 26 neutral words. The sets of words presented in each condition were balanced for the number of syllables (i.e. equal numbers of one- and two-syllable words), word frequency and valence (equal numbers of positive, negative and neutral words).
The participants’ speech was recorded on Cool Edit 2000 for Windows, which allowed the recordings to be normalised, pitch-shifted and edited into 80 individual wave files. A pitch shift of –4 semitones was used because it made the speaker’s voice more difficult to recognise without making the speech incomprehensible. A male researcher who was unknown to the participants recorded the words for the non-self condition (40 words in total). A researcher was chosen who used English received pronunciation.
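As a rough illustration of what a –4 semitone shift means acoustically, the sketch below converts a shift in semitones into a frequency scaling factor. It assumes the standard equal-tempered definition of a semitone (a factor of 2 per 12 semitones); whether the pitch-shifting tool used in the study applies exactly this scaling is an assumption, and the 120 Hz fundamental frequency is a hypothetical value chosen only for the example.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency scaling factor for a pitch shift given in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


# A shift of -4 semitones scales every frequency by about 0.794.
ratio = semitone_ratio(-4)

# Hypothetical example: a 120 Hz fundamental would be lowered to roughly 95 Hz.
hypothetical_f0_hz = 120.0
shifted_f0_hz = hypothetical_f0_hz * ratio
```

Under this assumption the distorted voice is lowered by about one-fifth in frequency, which is consistent with a voice that is harder to recognise but still intelligible.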
A factorial design was used, with two levels of speech source (self, alien) and two levels of distortion (0, –4 semitones). There were 20 words in each of four speech conditions presented in the fMRI experiment (20 self undistorted, 20 self distorted, 20 alien undistorted, 20 alien distorted). Words were presented in a non-self (alien) voice as well as in the participant’s voice, to test whether any response bias was specific to self-generated words.
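The 2 × 2 structure of the design can be sketched as follows. This is a minimal illustration only: the function name, the dictionary keys and the simple slicing of the word list are hypothetical, and the sketch does not reproduce the matching of word subsets for length, frequency and valence.

```python
from itertools import product

SOURCES = ("self", "alien")
PITCH_SHIFTS = (0, -4)  # semitones
WORDS_PER_CELL = 20


def build_design(words):
    """Assign consecutive blocks of 20 words to each source x distortion cell."""
    cells = list(product(SOURCES, PITCH_SHIFTS))
    if len(words) != WORDS_PER_CELL * len(cells):
        raise ValueError("expected exactly 80 words for a 2 x 2 design with 20 per cell")
    trials = []
    for cell_index, (source, shift) in enumerate(cells):
        start = cell_index * WORDS_PER_CELL
        for word in words[start:start + WORDS_PER_CELL]:
            trials.append({"word": word, "source": source, "shift": shift})
    return trials
```

With 80 words this yields 40 self and 40 alien trials, half of each pitch-shifted, matching the cell counts given above.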
Patients underwent symptom assessment using the SAPS and SANS either the day before or on the day of the fMRI scan. Approximately 1 hour before scanning all participants were presented with a list of 80 words on a piece of paper and asked to read them aloud in a clear voice at a rate of approximately one word per second. Participants read all 80 words, even though half would subsequently be presented to them in another person’s voice; this was to ensure that participants could not make judgements based on source information during the task. They were not asked to remember the words. Their speech was recorded by a computer. The experimenter then edited the recordings so that 40 of the words were replaced by a recording of the same word spoken in another person’s voice, and 40 were pitch-shifted. The subsets of words that were replaced and pitch-shifted respectively were pre-designated (allocated so that the subsets were matched for word length, frequency and valence). The same subsets of words were used for all participants. Once participants had been placed in the scanner a standardised instruction script was read out to them. Participants were told to listen carefully to each word and make a decision regarding the source of the speech; they were able to register a response of either ‘self’, ‘unsure’ or ‘other’ by means of a button box. The option to register an unsure response was included to avoid participants having to make a forced choice between a self or alien source even when they were unsure.
Images were acquired in a 1.5 T Magnet (Signa LX; GE, Milwaukee, Wisconsin, USA) using a compressed gradient echo (Edmister et al, 1999), echoplanar image acquisition (Hall et al, 1999), with a time to repetition (TR) of 1.2 s (0.8 s of silence), flip angle 80°, time to echo (TE) 40 ms, 64 × 64 pixels, field of view 200 mm, slice thickness 7 mm and interslice gap 0.7 mm (voxel size 3.125 mm × 3.125 mm × 7 mm); 482 image volumes were acquired in two runs of 6 min each. Of the 482 images 80 were experimental events (20 in each speech condition) and the remainder were rest (i.e. no auditory stimulus was presented). Each whole-brain volume consisted of 14 axial slices parallel to the anterior–posterior intercommissural line.
Stimuli were presented in random order in an event-related design, with a variable interstimulus interval (4–12 s) following a non-gaussian random distribution (Poisson function peaking at 7 s) individually set for each condition (Dale, 1999). Image acquisition and stimulus presentation were synchronised by a transistor–transistor logic (TTL) pulse from the scanner to the computer used to present the stimuli and record the behaviour. The compressed acquisition permitted presentation of each word in the absence of acoustic scanner noise. Each response time was locked to the beginning of the word presentation.
Data were analysed with software developed at the Institute of Psychiatry, using a non-parametric approach. Data were first processed (Bullmore et al, 1999a) to minimise motion-related artefacts. Responses to the experimental paradigms were then detected by first convolving each component of the experimental design with each of two gamma variate functions (peak responses at 4 s and 8 s respectively). The best fit between the weighted sum of these convolutions and the time series at each voxel was computed using the constrained blood oxygen level dependent (BOLD) effect model suggested by Friman et al (2003). Following computation of the model fit, a goodness-of-fit statistic was computed. This consisted of the ratio of the sum of squares of deviations from the mean image intensity (over the whole time series) due to the model to the sum of squares of deviations due to the residuals (SSQ ratio). Following computation of the observed SSQ ratio at each voxel, the data are permuted by the wavelet-based method described and extensively characterised by Bullmore et al (2001). Using this distribution it is possible to calculate the critical value of SSQ ratio needed to threshold the maps at any desired type I error rate. The detection of activated voxels is extended from voxel to cluster level using the method described in detail by Bullmore et al (1999b). Events in the four experimental conditions (self, self distorted, alien and alien distorted speech) were contrasted against rest volumes for all participants.
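The model-fitting step for a single voxel can be sketched as below. This is an illustration under stated assumptions, not the software used in the study: the kernel shape (t^peak · e^(−t), scaled to 1 at its peak) is one simple gamma-variate form chosen for the example, and ordinary least squares stands in for the constrained BOLD model of Friman et al (2003). All function names are hypothetical.

```python
import math


def gamma_variate(t, peak):
    """Gamma-variate kernel t**peak * exp(-t), scaled to equal 1 at t == peak."""
    if t <= 0:
        return 0.0
    return (t / peak) ** peak * math.exp(peak - t)


def convolve_events(onsets, n_scans, tr, peak):
    """Predicted response at each scan: sum of kernels started at each event onset (s)."""
    return [sum(gamma_variate(i * tr - onset, peak) for onset in onsets)
            for i in range(n_scans)]


def demean(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ssq_ratio(series, reg_a, reg_b):
    """Model sum of squares divided by residual sum of squares for a two-regressor fit."""
    y, a, b = demean(series), demean(reg_a), demean(reg_b)
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * w for u, w in zip(a, y))
    sby = sum(v * w for v, w in zip(b, y))
    det = saa * sbb - sab * sab
    # Least-squares weights from the 2 x 2 normal equations (unconstrained stand-in).
    weight_a = (say * sbb - sby * sab) / det
    weight_b = (sby * saa - say * sab) / det
    fitted = [weight_a * u + weight_b * v for u, v in zip(a, b)]
    model_ss = sum(f * f for f in fitted)
    residual_ss = sum((w - f) ** 2 for w, f in zip(y, fitted))
    return model_ss / residual_ss
```

A voxel whose time series follows the predicted responses gives a large SSQ ratio, whereas a voxel containing only noise gives a value close to zero; the permutation step then supplies the null distribution against which the observed ratio is judged.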
The observed and permuted SSQ ratio maps for each individual, as well as the BOLD effect size maps, were transformed into the standard space of Talairach & Tournoux (1988) using the two-stage warping procedure described in detail by Brammer et al (1997). Group activation maps were computed by determining the median SSQ ratio at each voxel (over all individuals) in the observed and permuted data maps (medians are used to minimise outlier effects). Cluster-level maps were thresholded at less than one expected type I error cluster per brain. The computation of a standardised measure of effect SSQ ratio at the individual level, followed by analysis of the median SSQ ratio maps over all individuals, treats intra- and inter-individual variations in effect separately, constituting a mixed-effect approach to analysis which is deemed desirable in fMRI.
The analysis was performed using the brain activation data from each participant under each condition. The permutation-based analysis was performed by first determining the median change across all participants and between participant treatments. The treatment labels were then permuted and the median change computed. The use of median statistics renders this analysis robust to outlier data in individual cases. The data were then analysed using a non-parametric repeated-measures analysis of covariance (Bullmore et al, 1999b). The experimental conditions were defined according to the source of the speech (self or alien) and the level of distortion (undistorted or distorted). The data were analysed using a series of non-parametric factorial analyses of variance (ANOVAs). We examined the main effects of speech source and distortion, and their interactions with group. The effect of the emotional valence of the words on the fMRI data was not examined because it had no significant effect on behavioural results. To test for the interaction between the source of speech, level of distortion and group we examined the main effect of distortion on self speech and its interaction with group, and the main effect of distortion on alien speech and its interaction with group. To examine the neural correlates of the misattribution of speech, we analysed the main effect of the accuracy of attribution (correct responses or misattribution errors). Events were categorised as correct or misattributions according to each participant’s behavioural response. Trials associated with unsure responses were excluded from this analysis. Maps of the difference in the effect size of the BOLD response associated with correct and incorrect attributions were generated. In this particular analysis the effect size statistic was used because the numbers of trials associated with correct and incorrect responses were not equal across conditions.
The effect size statistic is relatively insensitive to differences in the number of responses per condition. Use of the effect size statistic also avoids the possibility that differences in BOLD response could reflect changes in the denominator of the statistic (noise) rather than signal, as can occur when using standardised statistics such as F, t or SSQ ratio. All between-group contrasts were covaried for NART premorbid IQ scores (using XBAM version 3.4; http://www.brainmap.co.uk/xbam.htm).
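The label-permutation logic described above can be illustrated for a single measure and two groups. This is a simplified sketch, assuming a plain two-group comparison of medians; it omits the repeated-measures structure and the NART covariate, and the function names, the number of permutations and the seed are hypothetical choices.

```python
import random
from statistics import median


def median_difference(group_a, group_b):
    """Difference between the two group medians."""
    return median(group_a) - median(group_b)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference between group medians."""
    rng = random.Random(seed)
    observed = abs(median_difference(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        permuted = abs(median_difference(pooled[:n_a], pooled[n_a:]))
        if permuted >= observed:
            hits += 1
    # Count the observed labelling itself so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Because the statistic is a median rather than a mean, a single extreme participant has little influence on either the observed difference or the permutation distribution.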
The demographic and clinical characteristics of the participants are shown in Table 1.
Analysis of variance was conducted for misattribution errors, defined as misidentifications of the source of the speech (i.e. an ‘other’ response when hearing their own speech or a ‘self’ response when hearing alien speech), excluding ‘unsure’ responses (Fig. 1). The data were analysed using an ANOVA for repeated measures.
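The scoring rule for behavioural responses can be written out explicitly. This is a minimal sketch of the definition given above; the function names and string labels are hypothetical, and the responses in the usage note are invented solely to show the rule.

```python
from collections import Counter


def classify_response(source, response):
    """Label one trial as 'correct', 'misattribution' or 'unsure'.

    source is 'self' or 'alien'; response is 'self', 'other' or 'unsure'.
    """
    if response == "unsure":
        return "unsure"
    judged_source = "self" if response == "self" else "alien"
    return "correct" if judged_source == source else "misattribution"


def score_trials(trials):
    """Count outcome labels over (source, response) pairs."""
    return Counter(classify_response(source, response) for source, response in trials)
```

For instance, `score_trials([("self", "other"), ("self", "self"), ("alien", "unsure")])` counts one external misattribution, one correct response and one unsure response; only the first two would enter the error analysis.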
Analysis of variance
For misattribution errors the main effects for source (F=6.00, d.f.=1,28, P=0.02), distortion (F=12.36, d.f.=1,28, P=0.002) and group (F=6.18, d.f.=2,28, P=0.006) were all significant. As there was a significant between-group variance in NART scores this variable was used as a covariate. After the inclusion of this covariate the between-subjects effect for group remained significant (F=4.67, d.f.=2,28, P=0.02). There was a significant interaction between the effects of source of speech and group (F=3.50, d.f.=2,28, P=0.04). A post hoc one-way ANOVA revealed a significant group difference in the self speech condition (F=11.24, d.f.=2,30, P<0.001). A Bonferroni t-test showed that those in the hallucinator group made significantly more misattribution errors than the participants in both the non-hallucinator (P=0.001) and control groups (P=0.001). There was no significant group difference in either of the alien speech conditions (for alien undistorted speech, F=0.09, d.f.=2,29, P=0.91; for alien distorted speech, F=0.21, d.f.=2,29, P=0.13). The interaction between source, distortion and group was non-significant (F=1.16, d.f.=2,28, P=0.32). All main effects and interactions involving valence were also non-significant.
Imaging data: task-related activation independent of condition
Performance of the task across all conditions and all groups (independent of performance) was associated with bilateral activation in the inferior frontal, anterior cingulate and superior temporal gyri, the brain-stem and the cerebellum.
Source of speech and group interaction
The main effect of source of speech is presented in Table 2. There was a significant interaction between the source of speech and group in the left superior temporal gyrus (Fig. 2a,b). Examination of the SSQ ratios from this region revealed that both the control group and the non-hallucinator group showed greater activation when processing alien speech compared with self speech. However, in the hallucinator group the response in this area was similar for alien and for self speech.
Distortion and group interaction
The main effect of distortion is shown in Table 2. There was an interaction between the effects of distortion and group (Fig. 2a,c). In both the control group and the non-hallucinator group processing distorted relative to undistorted speech was associated with activation in the cingulate gyrus. In the hallucinator group the response in this region was unaffected by acoustic distortion (Table 2).
Effects of distortion on self and alien speech and group interactions
There were significant interactions between the effect of distortion on self speech and group in the left anterior cingulate and the right superior temporal gyrus (Fig. 3a,b; Table 3). In the cingulate gyrus both the control group and the non-hallucinator group showed greater activation when processing distorted v. undistorted self speech, whereas the opposite was true in the hallucinator group. In the right superior temporal gyrus the hallucinator group showed greater activation for distorted v. undistorted self speech, the converse was evident in the non-hallucinator group, and distortion had little effect on activation in the control group. The group interaction for the effect of distortion on alien speech was restricted to the right anterior cingulate gyrus (Table 3). In this region both the control group and the non-hallucinator group showed greater activation when processing alien speech that was distorted as opposed to undistorted. However, in the hallucinator group distortion had no effect on the level of activation in this region.
Main effect and group interaction for correct v. misattributed responses
For all participants correct responses (regardless of speech source or the level of distortion) were associated with greater activation in the middle temporal gyrus bilaterally relative to misattributions. No area was more activated in association with misattributions than with correct responses. There was an interaction between response accuracy (correct/misattribution) and group in the left middle temporal gyrus. In both the control and non-hallucinator groups there was greater activation for correct responses (correct identification of either self or alien speech) than for misattributions, whereas there was no difference in the hallucinator group. In order to test our specific hypothesis about activation being associated with external (self to alien) misattributions, the analysis was then restricted to the self speech condition (i.e. the correct identification of self speech v. its misattribution to an external source). Again there was an interaction with group in the left middle temporal gyrus, with the same patterns of activation as described above (Fig. 3c, Table 3). When the effect of response accuracy was examined in the alien speech condition alone there was no significant interaction with group.
Our study used fMRI to study the neural correlates of making self/non-self judgements about the source of pre-recorded speech in the presence and absence of acoustic distortion. We examined the effects of speech source and of distortion in patients with auditory verbal hallucinations, patients without such hallucinations and controls. In addition, by using event-related fMRI we were able to categorise the neural response to each word according to the accuracy of the self/non-self attribution and thus examine the correlates of external misattributions.
A tendency for patients with hallucinations to misattribute their own distorted speech to an alien source was first demonstrated using a paradigm in which participants overtly articulated single words and heard what they said in real time (Johns & McGuire, 1999). We used the same paradigm, except that participants heard the words but did not speak. As in a recent study using this modified version of the task, we found that patients with auditory verbal hallucinations also made more external misattributions than both the non-hallucinator group and the control group (Allen et al, 2004), particularly when their speech was distorted (although this did not achieve statistical significance in our study). This may reflect a lack of power, as the number of trials per condition was limited by the practicalities of the fMRI experiment.
Overall, the task activated a network of inferior frontal, temporal and cingulate regions as well as areas in the brain-stem and cerebellum. This is consistent with data from previous studies of voice processing (Binder et al, 2000) and a study of the same task in healthy volunteers (Allen et al, 2005). Within this network, across all three groups there were regions that were more activated when participants processed self-generated speech compared with alien speech and vice versa. However, the hallucinator group differed from both controls and the non-hallucinator group in the effect of the source of the speech on activation in the left superior temporal gyrus. In this region both the reference groups showed increased activation when listening to alien speech compared with self speech, whereas the activation in the hallucinator group was relatively unaffected by the source of the speech. Activation during the task was also influenced by the acoustic distortion of the stimuli. Again, there were significant differences in the effects of distortion between the hallucinators and the other two groups. In the control and non-hallucinator groups distortion was associated with the engagement of the anterior cingulate gyrus, but this effect was absent in the hallucinator group.
The above data suggest that when patients who were prone to hallucinations evaluated speech, the left temporal cortex and the anterior cingulate were differentially responsive to its source and its acoustic quality respectively relative to the reference groups. These findings are consistent with our hypothesis and with data from previous studies that have implicated these regions in schizophrenia (Shapleske et al, 1999; Carter et al, 2001) and the pathophysiology of auditory verbal hallucinations (Suzuki et al, 1993; Shergill et al, 2000a).
The group differences in the effects of source on the left superior temporal activation suggest that this region is normally sensitive to whether speech has been self or externally generated, but that this sensitivity might be impaired in patients who are prone to auditory verbal hallucinations. Interestingly, a difference in BOLD signal for the perception of one’s own actions, compared with the perception of the actions of another, has been reported in pre-motor areas (Grezes et al, 2004). This may be due to a closer match between simulated and perceived action for self-generated actions. Although our study involved the auditory modality it is possible that a similar mechanism applies to the perception of self speech and the speech of another. Functional differences in processing in the secondary auditory cortex are of particular interest, because an impairment in the ability to distinguish self-generated from external speech is fundamental to most cognitive models of auditory hallucinations (Frith & Done, 1988; Seal et al, 2004).
The group differences in the effects of distortion on activation in the dorsal part of the anterior cingulate cortex occurred regardless of the source of speech. The caudal portion of the anterior cingulate is implicated in directed attention, response monitoring and selection (Corbetta et al, 1991; Carter et al, 1998). Its activation in association with distortion may thus have reflected increased engagement of these processes in response to stimuli that become more difficult to perceive as a result of the pitch shift. The failure of patients with hallucinations to activate the anterior cingulate in the presence of distortion may thus reflect impairments in these cognitive processes. However, when the effect of distortion was restricted to self-generated speech an interaction with group was observed in the right superior temporal gyrus. In this region patients with hallucinations showed increased activation to distorted self-generated speech. The basis of the increased activation is unclear, but it could reflect altered modulation from other regions that are themselves differentially engaged in this group during this condition, such as the anterior cingulate. Furthermore, several studies have reported that patients with schizophrenia demonstrated relatively greater activation of the right temporal cortex (compared with the left) when listening to normal speech, and this may reflect a disruption in the left lateralisation of language function seen in right-handed individuals (Woodruff et al, 1997).
Information on the neural correlates of misattributions themselves was obtained by comparing activity associated with misattributions and correct responses. When participants in the hallucinator group made external misattributions (when processing their own speech) these were associated with activation in the left middle temporal gyrus, whereas in the control and non-hallucinator groups there was a greater left temporal response when participants correctly identified their own speech. This distinction between the groups was specific to external misattributions, as there was no group difference in activation when participants misidentified alien speech as their own (internal misattributions).
Both the behavioural and neuroimaging results of our study are similar to those reported using a version of the task that involved participants articulating the words aloud (McGuire et al, 1996; Fu et al, 2001). Thus, in both cases, patients with hallucinations tended to make external misattributions when processing their own distorted speech, and this misattribution was associated with activation of the temporal cortex relative to the correct recognition of self-generated speech. The overall similarity of the results despite the absence of an efference copy component in this study suggests that the differences between the hallucinator group and the other groups might be related to impairment in the evaluation of auditory verbal material, rather than defective corollary discharge. For example, patients with auditory verbal hallucinations usually have delusions, and delusions are associated with abnormalities of reasoning manifested as a tendency to ‘jump to conclusions’ (Garety et al, 1991). Indeed, recent behavioural work suggests that misattribution errors on verbal self-monitoring tasks may be related to delusions rather than to hallucinations (Johns et al, 2006). However, this finding was not replicated in our study.
The study has some limitations. First, although it focused on how biased judgements might contribute to the experience of externality, it does not explain how the events that are being judged occur in the first place. Contemporary models of hallucinations propose that they arise through the combination of the generation of anomalous experiences and problems in the appraisal of these experiences (Seal et al, 2004; Ditman & Kuperberg, 2005). The biased judgement of sensory material could also contribute to other symptoms, such as delusions: in this case faulty judgements might lead to the misinterpretation of external events such as other people’s behaviour. The coincidence of auditory hallucinations and delusions in schizophrenia is consistent with these symptoms sharing cognitive mechanisms. Second, it is possible that attentional problems may contribute to the tendency to make misattribution errors. The patient groups did not differ on a measure of SANS attentional problems; however, a more rigorous assessment of attentional impairments would have helped to exclude this possibility. The attenuated anterior cingulate response observed in the hallucinator group may reflect problems in these domains. Furthermore, there are strong reciprocal connections between the anterior cingulate and temporal cortex (Petrides & Pandya, 1988). It is possible that the superior temporal gyrus response seen in the hallucinator group is associated with altered ‘top down’ modulation of this region by the anterior cingulate (Fletcher et al, 1999). Although the causation is speculative, it is possible that impaired anterior cingulate modulation of the temporal cortices is associated with making faulty source judgements about perceived speech. The functional integration between the cingulate and temporal cortices could be tested in future work examining the effective connectivity between regions and how this is altered in patients with hallucinations.
In summary, external misattributions of speech in patients with hallucinations can occur independently of any self-monitoring deficit, suggesting that hallucinations may be related to problems with the conscious evaluation of verbal material rather than the breakdown of an ‘efference copy’. This impairment was associated with the abnormal engagement of the temporal cortex along with the anterior cingulate. Although the study involved the evaluation of external rather than inner speech (which is more relevant to verbal hallucinations), it is possible that the same mechanisms are used to appraise internal and external speech.
- Received April 25, 2006.
- Revision received June 27, 2006.
- Accepted September 25, 2006.
- © 2007 Royal College of Psychiatrists