Temporal course of auditory hallucinations
Sukhwinder S. Shergill, Mick J. Brammer, Edson Amaro, Steve C. R. Williams, Robin M. Murray, Phillip K. McGuire


Summary

We used functional magnetic resonance imaging to examine how brain activity associated with auditory verbal hallucinations in schizophrenia changed during hallucinatory events. Activation in the left inferior frontal and right middle temporal gyri was evident 6–9 s before the person signalled the onset of the hallucination, whereas activation in the bilateral temporal gyri and the left insula coincided with the perception of the hallucination. This supports the hypothesis that during hallucinations activation in cortical regions mediating the generation of inner speech may precede the engagement of areas implicated in the perception of auditory verbal material.

Auditory verbal hallucinations are a cardinal feature of schizophrenia. Their pathophysiology is unclear; one model proposes that they occur because self-generated inner speech is misperceived as externally generated speech (Frith & Done, 1989) as a result of a failure to recognise the internal nature of the former. Another theory suggests that a primary generator of activity within the auditory cortex (similar to an epileptiform focus) gives rise to these hallucinations (David, 1994). Recent neuroimaging studies suggest that both speech generation and perception areas are activated during auditory verbal hallucinations (Dierks et al, 1999; Lennox et al, 1999; Shergill et al, 2000), but the sequence in which these areas are activated remains unclear. One case study suggested that activation of the temporal cortex was evident 3 s before the reporting of auditory verbal hallucinations (Lennox et al, 1999), and patients with Charles Bonnet syndrome demonstrated visual cortical activation preceding the perception of visual hallucinations by 12 s (Ffytche et al, 1998).


We successfully studied two male dextral patients with DSM–IV schizophrenia. Both were experiencing frequent and intermittent auditory verbal hallucinations. We screened six other patients, but three failed to hallucinate in the scanner, and the pattern of the reported hallucinations did not permit examination of the time course in the other three (as their epochs of hallucinations were not separated by the required minimum of 9 s). The first participant was 47 years old with a 22-year history of illness, and was being treated with clozapine, amisulpride and sodium valproate. The second was 26 years old, had a 6-year history of illness and was being treated with olanzapine. In both cases the hallucinations involved people making derogatory remarks to the patient, the majority expressed in the second person. All eight patients gave informed consent to participate in the study, which was approved by the local ethics committee.

Image acquisition and analysis

Participants were scanned at rest (while they were intermittently hallucinating). They were asked to press a button with their left index finger at the onset of a hallucination and to release the button when it stopped. This was repeated for every hallucination they experienced during the 5 min session. Gradient-echo echoplanar magnetic resonance (MR) images were acquired using a 1.5 tesla GE Signa System (General Electric, Milwaukee, WI, USA) fitted with Advanced NMR hardware and software (ANMR, Woburn, MA, USA) at the Maudsley Hospital, London. In each of 14 non-contiguous planes parallel to the intercommissural (anterior–posterior) plane, 100 T2*-weighted MR images depicting blood oxygen level-dependent (BOLD) contrast were acquired, with time to echo 40 ms, time to repetition 3000 ms, in-plane resolution 3.1 mm, slice thickness 7 mm and slice skip 0.7 mm in a 5 min run. At the same session a 43-slice, high-resolution inversion recovery echoplanar image of the whole brain was acquired in the intercommissural plane (time to echo 73 ms, inversion time 180 ms, time to repetition 16 000 ms, in-plane resolution 1.5 mm, slice thickness 3 mm).
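
The acquisition figures above can be cross-checked with a little arithmetic. The following is a minimal sketch rather than part of the original analysis: it assumes that "slice skip" means the gap between adjacent slices, and the function names are purely illustrative.

```python
# Acquisition parameters quoted in the text.
N_VOLUMES = 100            # T2*-weighted images acquired per plane
TR_S = 3.0                 # time to repetition, in seconds
N_SLICES = 14              # non-contiguous planes
SLICE_THICKNESS_MM = 7.0
SLICE_SKIP_MM = 0.7        # assumed here to be the gap between adjacent slices


def run_duration_s(n_volumes, tr_s):
    """Total run time: one volume is acquired every repetition."""
    return n_volumes * tr_s


def slab_extent_mm(n_slices, thickness_mm, skip_mm):
    """Extent covered by n slices separated by (n - 1) gaps."""
    return n_slices * thickness_mm + (n_slices - 1) * skip_mm


duration_s = run_duration_s(N_VOLUMES, TR_S)
extent_mm = slab_extent_mm(N_SLICES, SLICE_THICKNESS_MM, SLICE_SKIP_MM)

print("run duration (min):", duration_s / 60.0)
print("slab extent (mm):", round(extent_mm, 1))
```

Under these assumptions, 100 volumes at one volume every 3 s account for the stated 5 min run.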

The data were first realigned to minimise motion-related artefacts (Bullmore et al, 1999), corrected for slice timing and smoothed using a Gaussian filter (full-width half-maximum 7.2 mm). Responses to the experimental paradigms were then detected by time-series analysis using gamma variate functions (peak responses at 4 s and 8 s) to model the BOLD response. The analysis was implemented as follows (Brammer et al, 1997). First, each experimental condition was convolved separately with the 4 s and 8 s Poisson functions to yield two models of the expected haemodynamic response to that condition. The weighted sum of these two convolutions that gave the best (least-squares) fit to the time series at each voxel was then computed. Following this fitting operation, a goodness-of-fit statistic was computed at each voxel: the ratio of the sum of squares of deviations from the mean intensity value due to the model (fitted time series) to the sum of squares due to the residuals (original time series minus model time series). This statistic is called the sum of squares (SSQ) ratio. To sample the distribution of the SSQ ratio under the null hypothesis that observed values were not determined by the experimental design (with minimal assumptions), the time series at each voxel was permuted using a wavelet-based resampling method described in detail by Bullmore et al (2001). This process was repeated ten times at each voxel, resulting in ten permuted parametric maps of SSQ ratio at each plane for each participant. Combining the randomised data over all voxels yields the distribution of SSQ ratio under the null hypothesis. Voxels activated at any desired level of type I error can then be determined by obtaining the appropriate critical value of the SSQ ratio from the null distribution. The observed and randomised SSQ ratio statistic maps were then transformed into standard space.
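
The fitting and thresholding steps described above can be illustrated for a single synthetic voxel. This is a hedged sketch, not the authors' software: the kernel shape (a Poisson-type curve with mean 4 s or 8 s, sampled once per scan), the epoch timings, the noise level and all function names are assumptions made for illustration. In particular, the null distribution here is built by plain random shuffling of time points, a simpler stand-in for the wavelet-based resampling cited in the text, and it ignores temporal autocorrelation.

```python
import math
import random

TR_S = 3.0  # time to repetition from the text


def poisson_kernel(peak_s, n_samples=8, tr_s=TR_S):
    """Poisson-shaped response with mean `peak_s`, sampled once per scan (assumed form)."""
    values = []
    for k in range(n_samples):
        t = k * tr_s
        values.append(math.exp(t * math.log(peak_s) - peak_s - math.lgamma(t + 1.0)))
    total = sum(values)
    return [v / total for v in values]


def convolve(design, kernel):
    """Causal convolution, truncated to the length of the design."""
    return [sum(design[i - k] * kernel[k] for k in range(len(kernel)) if i - k >= 0)
            for i in range(len(design))]


def centre(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ssq_ratio(series, reg_a, reg_b):
    """Model sum of squares divided by residual sum of squares for a two-regressor fit."""
    y, a, b = centre(series), centre(reg_a), centre(reg_b)
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    ay, by = dot(a, y), dot(b, y)
    det = aa * bb - ab * ab
    # Least-squares weights from the 2 x 2 normal equations.
    w_a = (ay * bb - by * ab) / det
    w_b = (by * aa - ay * ab) / det
    fitted = [w_a * u + w_b * v for u, v in zip(a, b)]
    residual = [yi - fi for yi, fi in zip(y, fitted)]
    return dot(fitted, fitted) / dot(residual, residual)


def null_critical_value(series, reg_a, reg_b, n_perm, alpha, rng):
    """Critical SSQ ratio from shuffled series (stand-in for wavelet resampling)."""
    shuffled = list(series)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(ssq_ratio(shuffled, reg_a, reg_b))
    null.sort()
    index = min(len(null) - 1, int(math.ceil((1.0 - alpha) * len(null))))
    return null[index]


rng = random.Random(0)
n_scans = 100
design = [0.0] * n_scans
# Hypothetical hallucination epochs as (onset scan, length in scans).
for onset, length in [(10, 5), (25, 4), (40, 6), (58, 5), (72, 4), (88, 5)]:
    for i in range(onset, min(onset + length, n_scans)):
        design[i] = 1.0

reg_4s = convolve(design, poisson_kernel(4.0))
reg_8s = convolve(design, poisson_kernel(8.0))
# Synthetic "active" voxel: baseline plus both responses plus Gaussian noise.
voxel = [100.0 + 2.0 * a + 1.0 * b + rng.gauss(0.0, 0.5)
         for a, b in zip(reg_4s, reg_8s)]

observed = ssq_ratio(voxel, reg_4s, reg_8s)
critical = null_critical_value(voxel, reg_4s, reg_8s, n_perm=200, alpha=0.05, rng=rng)
print("observed SSQ ratio exceeds critical value:", observed > critical)
```

In the study itself the null distribution was pooled over all voxels rather than estimated voxel by voxel; the sketch uses one voxel only to keep the example short.
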
Median SSQ ratio maps for the two participants were constructed at the P<0.005 level of significance. The early and late phases of auditory verbal hallucinations were examined by repeating the above analysis after shifting the hallucination log (indicated by the button-press) with respect to the functional MR time series in steps of one scan (shifts of -9 s, -6 s, -3 s, +6 s and +9 s), following the method described by Ffytche et al (1998).
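
The shifting of the hallucination log relative to the scan series can be sketched as follows. This is an illustrative example only: the function name, the 20-scan log and the single epoch are hypothetical, and a negative shift is taken to mean that the log is moved earlier in time so that the model probes activity preceding the button-press.

```python
TR_S = 3.0  # time to repetition from the text; one scan per step


def shift_log(log, shift_s, tr_s=TR_S):
    """Shift a per-scan on/off log by a whole number of scans, padding with zeros."""
    steps = shift_s / tr_s
    if steps != int(steps):
        raise ValueError("shift must be a whole number of scans")
    steps = int(steps)
    shifted = [0] * len(log)
    for i, value in enumerate(log):
        j = i + steps
        if 0 <= j < len(log):
            shifted[j] = value
    return shifted


# Hypothetical log: one hallucination reported during scans 10 to 14.
log = [0] * 20
for i in range(10, 15):
    log[i] = 1

shifts_s = [-9, -6, -3, 6, 9]  # the shifts listed in the text
shifted = {s: shift_log(log, s) for s in shifts_s}

for s in shifts_s:
    on_scans = [i for i, v in enumerate(shifted[s]) if v]
    print(s, on_scans)
```

Each shifted log would then be analysed in the same way as the unshifted one, giving one activation map per time offset.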


Each auditory verbal hallucination lasted an average of 16 s (range 3–42 s) with an average silent (inter-hallucination) interval of 34 s (range 9–75 s); each participant made six button-presses during the 5 min investigation. The main areas activated before the reporting of a hallucination (relative to non-hallucinating events) were the left inferior frontal gyrus and the right middle temporal gyrus (Fig. 1). As the individual became aware of the hallucination, this activation extended to the left insula as well as the left inferior frontal gyrus, and to the middle and superior temporal gyri bilaterally. There was also activation in the right middle frontal gyrus and the right sensorimotor cortex (probably related to the action of button-pressing). After the hallucination had subsided, the activation in the insula persisted and there was additional involvement of the orbitofrontal cortex (Fig. 1). Activation within most of the above regions was evident in both individual activation maps.

Fig. 1

Ascending transverse sections through the brain from left to right, with the right side of the brain shown on the left side of each image, at -18, -7, -2, +26 and +42 mm relative to the intercommissural plane, illustrating the areas activated before (top row), during (middle row) and after (bottom row) hallucinations. (a) Regions of significant activation occurring 9 s before the button-press signalling the onset of the hallucination are shown in yellow and include the left inferior frontal gyrus (Talairach coordinates [x, y, z]: -23, 22, -7), the right middle temporal gyrus (55, -28, -7) and posterior cingulate gyrus (6, -50, 26); (b) activation coinciding with the button-press involves the middle temporal gyri bilaterally (-58, -17, -7; 52, -36, -2), the retrosplenial cortex (9, -28, -2), the anterior cingulate gyrus (0, 33, 26) and right sensorimotor cortex (40, -17, 42); (c) activation 9 s after the button-press in the bilateral orbitofrontal cortex (32, 39, -18; -17, 44, -18).


These results demonstrate activation of the left inferior frontal gyrus prior to the perception of auditory verbal hallucinations, with activation in the temporal cortex mainly occurring when the participant subsequently perceived auditory speech. As the left inferior frontal region is normally activated during the generation of inner speech (Shergill et al, 2001), this is consistent with the notion that these hallucinations result from the misidentification of self-generated verbal material (Frith & Done, 1989). The timing of the activation in the temporal cortex suggests that these regions are more involved in the actual perception of auditory hallucinations. The hallucination may thus begin with the generation of auditory verbal material in the left inferior frontal cortex, followed by conscious awareness of external speech coincident with the subsequent engagement of temporal cortical areas (Dierks et al, 1999; Lennox et al, 1999; Shergill et al, 2000), perhaps reflecting direct communication through frontotemporal connections (Shergill et al, 2002).

  • Received January 12, 2004.
  • Revision received July 2, 2004.
  • Accepted July 9, 2004.

