REVIEW ARTICLES
Department of Psychiatry, University of Hull, Hertford Building, Cottingham Road, Hull HU6 7RX. Email: a.m.mortimer{at}hull.ac.uk
Declaration of interest A.M.M. has received funding from several pharmaceutical companies.
Aims To appraise the usefulness of symptom rating scales in evaluating the outcome of people with schizophrenia.
Method Literature on the use of the Brief Psychiatric Rating Scale (BPRS), the Positive and Negative Syndrome Scale (PANSS) and the Clinical Global Impression (CGI) in schizophrenia research was studied.
Results Scales were designed to make diagnoses, to categorise patients, syndromes or both, and to demonstrate antipsychotic efficacy, as well as to measure outcome. There is much redundancy both between and within scales. Early work suggests limited concurrent validity with external outcome variables. Data are at best ordinal and there are particular difficulties in equating outcome with percentage changes in scores. The concept of remission, which uses absolute item score thresholds with a duration criterion, is a promising outcome measure.
Conclusions Symptom rating scale scores can only comprise a limited part of outcome measurement. Standardised remission criteria may present advantages in outcome research.
Outcome is not a unitary construct defined simply by lack of symptoms: personal and social function, cognition and quality of life must be of substantial relevance. Other aspects such as economic outcome, although important to commissioners and providers of services, might be of limited consequence to clinicians and patients, who naturally focus on professional and consumer (satisfaction) standpoints respectively. Hence, outcome evaluation applied to services differs from that applied to patients.
Symptom rating scales in schizophrenia were not initially designed to assess the efficacy of antipsychotic drug treatments. Nevertheless, they have been used in this role more than any other. This is not surprising as antipsychotic drugs are used primarily to control patients' symptoms; the underlying neuroscience is consistent with this, and not with any direct therapeutic effects on cognition, personal and social function, or quality of life (unless mediated by symptom control). Although such distal effects have been proposed, there are numerous independent variables which influence these aspects of outcome (e.g. upbringing, premorbid personality and adjustment, intellect and mood, social circumstances and availability of a support network). Furthermore, it has been proposed that antipsychotic drugs, particularly conventional antipsychotics, have little effect on negative symptoms of schizophrenia. Negative symptoms are one of the most clinically important targets, and overlap with cognition and function (Mortimer & Spence, 2001).
Limitations
Symptom rating scale data can never be anything more than ordinal; the
overall total of symptom item scores will often lump together categorical
data, containing symptoms associated in clusters, such as the positive,
negative and disorganisation syndromes. Specific syndrome scores derived from
scales may have more utility than the total score regarding an overall
perspective. Current thinking holds that schizophrenia syndromes may
comprise positive (disorganisation and reality distortion) and negative
categories, with non-negative affective symptoms (mostly depressive) in a
significant minority of patients. Consequently three or four syndrome scores
in the context of a defined range may give a reasonable `snapshot' of a
patient's current clinical status. Such quantification may inform judgement
regarding aetiology, treatment and prognosis
(Van Os et al, 1996).
For example, negative symptoms are known to have adverse consequences for
personal and social function and cognition
(Rocca et al, 2005).
By contrast, even extensive, but isolated, reality distortion may generate
minimal functional consequence, whereas disorganisation syndrome is usually
very disruptive (Schuldberg et
al, 1999). Depression may arise from several sources, with
varying outcome (Emsley et al,
1999). Such data have implications for treatment interventions.
The Clinical Global Impression–Schizophrenia scale (CGI–SCH;
Haro et al, 2003)
represents, conceivably, a step in this direction although its positive,
negative, depression and cognitive scores are rated according to judgement of
severity rather than from items comprising these syndromes.
The value of symptom item or even syndrome score totals per se is increasingly questioned in the determination of outcome status. A more patient-centred definition of outcome, stressing personal and social function, is often viewed as more practical than the presence or absence of esoteric phenomena (symptoms), which may have little bearing on subjective experience or uptake of healthcare. Influential work has attempted to explore the meaning and consequences of delusions and hallucinations for patients (Chadwick & Birchwood, 1995), but scales derived from this work are not in widespread use outside the research setting. Self-administered symptom scales have been developed (Hamera et al, 1996) but again these have not found wide usage, in contrast to the emphasis on patient-rated quality of life as an outcome. Clinicians increasingly seek treatment outcomes such as degree of independent living, time to discontinuation of medication, and time to relapse and rehospitalisation rather than changes in symptom rating scale scores (Tiihonen et al, 2006).
Concurrent validity
The question remains whether any rating scale (or factorial components of
it) demonstrates sufficient concurrent validity to predict these external
outcome variables. Operational definitions of remission may achieve this.
These consist of multiple item threshold rather than factorial scores, with
the addition of a duration condition. In the absence of concurrent validity
with other outcome measures, symptom rating scales can only constitute a small
part of the appraisal of overall outcome. Symptom rating scales will answer
the question `Did the antipsychotic drug work on this patient's symptoms?' as
opposed to `What is this patient's outcome?'
Marshall et al (2000)
emphasise that the use of unpublished rating scales in controlled trials is
associated with consistent claims of superiority of new treatments and that
familiar, well-validated scales may give a more accurate answer.
Brief Psychiatric Rating Scale
The Brief Psychiatric Rating Scale (BPRS;
Overall & Gorham, 1962) is
a one-page, 16- or 18-item rating scale which was developed more than 40 years
ago. It assesses a range of psychotic and affective symptoms rated from both
observation of the patient and the patient's own report. The original purpose
of the BPRS was the rapid evaluation of clinical change irrespective of origin
(e.g. natural remission or treatment response) in the broad range of
psychiatric patients, not just those with schizophrenia. It was not,
therefore, specifically designed as an outcome measure; the authors hoped that
the scale would develop into a diagnostic instrument, which they considered of
greater long-term value than detecting change. Standard definitions of outcome
were developed later, e.g. `consumer outcome is the effect on a patient's
health status attributable to an intervention by a health professional or
health service' (Andrews et al,
1994). Even so, the authors later stated that the BPRS was
designed to fill a special need in clinical psychopharmacology research, at
the inception of the Early Clinical Drug Evaluation Units of the National
Institute of Mental Health in the USA
(Overall & Gorham,
1988).
Extent of use and adaptation
The BPRS has perhaps been used more extensively than any other symptom
rating scale, in many diagnostic groups and for a wide range of purposes. It
is highly sensitive to change, and excellent interrater reliability can be
achieved with training and a standard interview procedure
(Overall & Rhoades, 1982).
As well as the evaluation of efficacy of several classes of psychotropic
medication (Hedlund & Vieweg,
1980; Overall & Rhoades,
1982; Perry et al,
1997; Hamilton et al,
1998), the BPRS has been used extensively to compare diagnostic
concepts internationally and in epidemiological studies
(Delmonte et al,
1970; Engelsmann &
Formankova, 1967; Engelsmann
et al, 1970; Overall
& Beller, 1984). It has been translated into many languages
and frequently modified for specific purposes, including for use with children
(Overall & Pfefferbaum,
1982; Emslie et al,
1997). It has been expanded to 24 items to make it more
comprehensive in the area of psychotic and affective symptoms, with items on
bizarre behaviour, suicidality, self-neglect, elevated mood, distractability
and motor hyperactivity (Ventura et
al, 2000). The BPRS has been demonstrated as reliable for use
by nursing staff, increasing its utility
(McGorry et al,
1988). Most adaptations of the BPRS use one of two scoring
versions for each item (either a 0- to 3-point or a 0- to 7-point scale).
Limitations
The factor structure of BPRS responses depends upon the characteristics of
the patient group under study, and the version being used. The BPRS was, until
the advent of the Positive and Negative Syndrome Scale (PANSS;
Kay et al, 1987),
which itself is partially derived from the BPRS, the most widely used scale in
schizophrenia research. This reflected its broad coverage of typical
schizophrenia phenomena in the positive, negative and disorganisation
categories. However, its coverage of the negative syndrome has been
criticised; there are only three negative syndrome items, and it has been
suggested that a more extensive scale is necessary for sensitivity to change
(Eckert et al,
1996).
The authors themselves were dismissive of the use of their scale to determine differences between specific symptoms or syndromes during treatment, stating that `Although psychiatric symptomatology is multidimensional, the difference between pre-treatment pathology and post-treatment pathology (or lack of it) can be represented by a single dimension spanning the multivariate space' (Overall & Gorham, 1988). Despite this, with the assistance of 20 psychiatrists, they gave 13 different weights to each item according to diagnosis, in order to increase or reduce the relevance of treatment effects to the total score. For instance, the score on item 8, `grandiosity', would be multiplied by 0 in a patient with depression and by 3 in a patient with paranoia. This complex and somewhat arbitrary scoring system appears never to have been taken up.
Clinical Global Impression
The CGI is not strictly a symptom rating scale but is included because of
its wide use, influence and the recent development of forms specific to the
schizophrenia syndromes (CGI–SCH). The original version is a simple
instrument which rates the overall severity of any mental disorder
(Guy, 1976). This is rated
entirely according to clinical judgement in routine professional practice, on
a scale for the overall current severity of symptoms from 1 (healthy, not ill)
to 7 (among the most severely ill). There is also a 7-point scale for global
improvement (usually from baseline to the current condition), rating from 1
(very much improved) to 7 (very much worse). The CGI has been used in several
efficacy and effectiveness studies in schizophrenia, is sensitive to change
and correlates well with changes assessed with more complex scales
(Haro et al, 2003;
Leucht & Engel, 2006;
Leucht et al, 2006;
Rabinowitz et al,
2006).
The main criticism levelled at the CGI, that it lacks standard definitions (Beneke & Rasmus, 1992), reflects what many consider its main strength – the use of an adequate level of clinical judgement. Its brevity, utility and appeal to clinical common sense have ensured its continued use over many more complex rating scales. The CGI has been adapted for the assessment of bipolar affective disorder (CGI–BP) and schizophrenia (Spearing et al, 1997; Haro et al, 2003). The CGI–SCH has demonstrated good reliability and validity in the evaluation of severity of positive, negative, depressive and cognitive symptoms, and is recommended for both research and clinical practice.
Positive and Negative Syndrome Scale
The PANSS (Kay et al, 1987, 1988, 1989) originated from a
growing need to reduce the heterogeneity of what was known about
schizophrenia. Crow's (1980) positive–negative dichotomy presented a promising
theoretical model for explaining and understanding variability in the
aetiology of schizophrenia, treatment and prognosis. However, attempts to
utilise the model in practice met with inconsistent results
(Andreasen, 1982;
Andreasen & Olsen, 1982;
Pogue-Geile & Harrow,
1984; Lindenmayer et
al, 1986), and it was suggested that this might be because of
the lack of a comprehensive rating scale for positive and negative symptoms
that was feasible, accurate, well validated, reliable, sensitive and
standardised. The PANSS, therefore, was not developed to assess outcome
per se, or even the results of treatment interventions.
Nature and scoring
The PANSS is a 30-item 7-point (1–7) rating scale which amalgamated
the 18-item BPRS and 12 items from the Psychopathology Rating Schedule
(Singh & Kay, 1975). The
items were precisely defined, as were anchor points for the numerical rating
of each item. The PANSS was divided into positive, negative and general
psychopathology sub-scales (a `manic' sub-scale was later derived;
Lindenmayer et al,
2004) and trialled on over 100 well-characterised patients with
chronic illness. Sub-scale scores were shown to be normally distributed and
independent of each other; they were robust to the effects of mood,
chronicity, medication side-effects and cognition. The PANSS was furthermore
sensitive and specific regarding pharmacological manipulation of the levels of
both positive and negative symptoms in patients with schizophrenia. The
validity of its sub-scales was confirmed in an exploration of a classification
of patients by predominant symptom class. Sub-scale scores were associated
with a number of clinical, treatment and cognitive variables, including
premorbid adjustment (Krauss et
al, 1998), but not outcome. One of the strengths claimed for
the PANSS is consistency in scoring individual patients over time and illness
course. A potentially confusing feature of the PANSS, however, is that because
each of its 30 items is scored from 1, even those without any mental ill health
will score 30. In effect, this means that 30 must be subtracted from the
patient's total score in order to gain a meaningful understanding.
Correlations and factors
Several studies have sought correlations between PANSS total and sub-scale
scores, and other aspects of the illness, to demonstrate concurrent validity.
Other aspects have included ventricular enlargement and cortical atrophy
(d'Amato et al,
1992), work performance (Bell
et al, 1992), neuropsychological impairment
(Bell et al, 1994;
Liu et al, 1997;
Mass et al, 2000;
Bozikas et al, 2004;
Good et al, 2004;
Ritsner et al, 2006)
and violent behaviour (Steinert et
al, 2000). Overall these findings appear not to be
sufficiently convincing as to be of clinical use, and PANSS scores have
generally not been used as proxy variables. For example, when PANSS
`cognitive' items were used to predict global cognitive function, 66% of the
variance was unexplained, suggesting that the PANSS lacked sensitivity and
specificity in this regard (Good et
al, 2004). This approach appears not to have generated
further research hypotheses.
Factorial validity (the nature and purity of the syndromal components of the scale) is essential to the success of investigations utilising sub-scale scores. There are many reports on the factor (syndrome) structure of PANSS items, with much controversy over whether data best fit a three-, four-, five- or even six-factor solution (Peralta & Cuesta, 1994; Lindenmayer et al, 1994; Wolthaus et al, 2000; Fresan et al, 2005; White, 2005; Van den Oord et al, 2006). The simplest factor solutions comprise a syndrome made up of negative symptom items (psychomotor poverty syndrome), a syndrome made up of delusions and hallucinations (reality distortion syndrome) and a syndrome made up of thought disorder and inappropriate affect symptom items (disorganisation syndrome). Although several five-factor models have been proposed, none has been validated by confirmatory factor analysis (van der Gaag et al, 2006a). This might reflect the ambiguous definitions of some symptom items, such as lack of judgement and insight, which have more than one cause in schizophrenia.
Another complication is that the depression sub-scale (unlike the Calgary Depression Scale; Addington et al, 1992) is unable to distinguish between depression, negative symptoms and extrapyramidal side-effects (Collins et al, 1996). Negative factor scores have been found to correlate with an independent depression rating instrument (Montgomery–Åsberg Depression Rating Scale), although depression factor scores did as well (Wolthaus et al, 2000). The loading of single items by multiple causes, which was suggested in another study (Van den Oord et al, 2006), was confirmed in a statistically novel analysis (van der Gaag et al, 2006b).
Only if syndromes possess concurrent validity with other aspects of schizophrenia such as cognitive impairment and poor social function, and furthermore fit explanatory data, can they represent clinical reality. The implication for the rating scale is that items which load on more than one factor must be replaced by two or more items, each of which loads on a single factor, which results in lengthier scales. The alternative is losing data through deletion of such items. Poor fit suggests that correlations between syndrome scores and other illness variables under investigation, including outcome, might be unreliable.
Even in randomised placebo-controlled trials for licensing purposes, the use of changes in rating scale scores may lack good face validity. Many trials evaluate clinical response as a percentage change in scores over the treatment period. Equating a 20% improvement in symptoms with response follows the study of Kane et al (1988), which compared clozapine and chlorpromazine in treatment-resistant patients with severe illness. This relatively low percentage reflects the fact that in patients with severe illness even a fairly small attenuation of symptoms might be clinically valuable. The 20% definition of response might not, however, be generalisable to the majority of acute trials with non-resistant patients. Relying on percentage change to indicate recovery ignores the importance of baseline levels. In absolute terms, a 20% reduction of a PANSS score of 100 (20 points) is double a 20% reduction of a PANSS score of 50 (10 points), yet both might be recorded as a `clinical response'. The patient with a baseline PANSS score of 100 would, although fulfilling criteria for response with a score of 80, remain severely ill (albeit noticeably less so), whereas the patient with a baseline score of 50 would remain mildly ill with a score of 40 and perhaps not even be noticeably different.
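The arithmetic behind this point can be made concrete with a minimal sketch. It computes the percentage reduction for the two patients described above, first on the raw PANSS total and then after subtracting the 30-point floor that a symptom-free person scores. The function name `percent_reduction` and the `floor` parameter are illustrative assumptions, and the floor-corrected figure is offered only as one possible way of viewing the same data, not as a method taken from the trials discussed.

```python
# Illustrative sketch only: names and the floor-corrected variant are assumptions.
PANSS_FLOOR = 30  # 30 items, each scored from 1, so the lowest possible total is 30


def percent_reduction(baseline, endpoint, floor=0):
    """Percentage reduction from baseline, optionally after removing the scale floor."""
    if baseline <= floor:
        raise ValueError("baseline must exceed the scale floor")
    return 100.0 * (baseline - endpoint) / (baseline - floor)


# The two patients from the text: both show a 20% fall in raw PANSS total.
for baseline, endpoint in [(100, 80), (50, 40)]:
    raw = percent_reduction(baseline, endpoint)
    corrected = percent_reduction(baseline, endpoint, floor=PANSS_FLOOR)
    print(f"{baseline} -> {endpoint}: absolute drop {baseline - endpoint} points, "
          f"raw {raw:.1f}%, floor-corrected {corrected:.1f}%")
```

On the raw totals both patients show an identical 20% change, although the absolute drops are 20 and 10 points. Once the floor is removed the two percentages diverge, which illustrates why a single percentage threshold can conceal quite different clinical pictures.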
Concurrent validity
Leucht et al (2005a) addressed the issue of what rating scale scores
mean in clinical terms. They used an equating procedure to anchor BPRS scores
to CGI categories (both severity and improvement) across seven drug trials
which used both scales in patients with acute schizophrenia. Clinician-rated
`minimal improvement' on the CGI equated to a 30% improvement on the BPRS
(substantially greater than the generally accepted standard for response).
`Much improvement' after 4 weeks of treatment equated to a fall in the BPRS
score of almost 58% (Table 1).
In addition they found that clinicians used only a small part of the BPRS
score range of 18–126: patients with minimum illness on the CGI scored
31, those with moderate illness scored 41 and those with severe illness 53.
This is probably because patients are only assessed on a minority of the items
and on most they receive the minimum score.
Table 1 Clinical implications of BPRS scores
Using the same approach with the PANSS (Leucht et al, 2005b) they found that `mildly ill', `moderately ill', `markedly ill' and `severely ill' according to the CGI equated to total PANSS scores of 58, 75, 95 and 116 respectively (Table 2). At 6 weeks, to achieve CGI ratings of `minimally improved' and `much improved' the PANSS decrements were 28% and 53%. The authors suggested that response ought to be defined as a 50% improvement in PANSS score, although in treatment-resistant groups a decrement of 25% might suffice.
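As a rough illustration of how these anchor points might be used, the sketch below looks up the CGI severity category whose reported PANSS anchor lies closest to a given total. Only the four anchor values come from the text. The nearest-anchor rule, the function name `nearest_cgi_category` and the range check are assumptions made for the example, and the equating procedure used in the original study is not reproduced here.

```python
# Anchor points quoted in the text (Leucht et al, 2005b): PANSS total -> CGI category.
PANSS_ANCHORS = [
    (58, "mildly ill"),
    (75, "moderately ill"),
    (95, "markedly ill"),
    (116, "severely ill"),
]


def nearest_cgi_category(panss_total):
    """Return the CGI category whose anchor is closest to the PANSS total.

    Nearest-anchor matching is an illustrative assumption, not the published
    equating method. Totals below the lowest anchor are still labelled with the
    mildest listed category, because milder categories are not given in the text.
    """
    if not 30 <= panss_total <= 210:
        raise ValueError("PANSS totals range from 30 to 210")
    return min(PANSS_ANCHORS, key=lambda anchor: abs(anchor[0] - panss_total))[1]


print(nearest_cgi_category(60))
print(nearest_cgi_category(110))
```

A lookup of this kind only restates the published anchors. It should not be read as a validated conversion between the two scales.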
Table 2 Clinical implications of PANSS scores
A later study (Leucht et al, 2006) compared the PANSS and BPRS with each other and with the CGI and replicated the findings overall, emphasising that smaller absolute score reductions equated to perception of improvement in patients with severe illness compared with those with mild illness (Table 3). For a reduction of 1 point on the CGI Severity of Illness scale there were decreases of 15 and 10 on the PANSS and BPRS respectively.
Table 3 CGI Global Improvement in relation to absolute reductions in PANSS and BPRS scores
A similar study (Cramer et al, 2001) found that clinician-rated `improved' and `much better' patients had PANSS scores lowered by 21% and 45% respectively. Quality of life scores were also increased by similar degrees (26% and 50%). This is consistent with the Leucht et al (2006) study, and perhaps demonstrates some concurrent validity of the PANSS with subjective quality of life as an outcome. A further report indicated that a decrement of 20% on the PANSS equated to a 1-point severity decrease on the CGI–SCH (Rabinowitz et al, 2006).
The Remission in Schizophrenia Working Group was convened in April 2003 to develop a consensus definition of remission in schizophrenia (Andreasen et al, 2005). Following precedents in physical medicine and affective disorder, the Working Group proposed that remission be defined as low or mild symptom levels (which by definition do not influence behaviour) lasting for a minimum, defined duration. Such a standardised definition, unlike several previous published definitions, could be applied across treatment studies and would permit immediate, transparent comparison. This approach does, however, require attention to levels of baseline severity across studies.
The Working Group aimed to map the chosen remission symptoms, which had to be rated mild or less, onto the three best validated syndromes of schizophrenia (reality distortion, disorganisation and negative symptoms) and the five DSM–IV criteria for schizophrenia (delusions, hallucinations, disorganised speech, disorganised or catatonic behaviour, negative symptoms; American Psychiatric Association, 1994). They picked appropriate items from the BPRS, the PANSS, the Scale for the Assessment of Positive Symptoms (SAPS) and the Scale for the Assessment of Negative Symptoms (SANS) (Table 4).
Table 4 Proposed items for remission criteria with cross-scale correspondence and relationship to historical constructs of psychopathology dimensions and DSM–IV criteria for schizophrenia
The BPRS, with limited coverage of negative symptoms, was perhaps less useful in determining remission. The Working Group set 6 months as the minimum duration of symptoms remaining mild for the patient to qualify for remitted status.
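The structure of such a definition, absolute item thresholds combined with a duration criterion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Working Group's algorithm. The item names in `ITEMS` are placeholders rather than the Table 4 list, the threshold of 3 assumes a 1–7 item scale on which 3 denotes `mild', and 182 days is used as an approximation of 6 months.

```python
from datetime import date

# Placeholder item names for illustration; the actual items are those in Table 4.
ITEMS = ("delusions", "hallucinations", "blunted_affect")


def meets_threshold(scores, items, threshold):
    """True if every selected item is rated at or below the threshold."""
    return all(scores[item] <= threshold for item in items)


def in_remission(assessments, items, threshold=3, min_days=182):
    """Check whether the most recent unbroken run of at-or-below-threshold
    assessments spans at least min_days.

    assessments: list of (date, {item: score}) pairs, in any order.
    threshold=3 and min_days=182 are assumptions made for this sketch.
    """
    run_start = None
    for when, scores in sorted(assessments, key=lambda a: a[0]):
        if meets_threshold(scores, items, threshold):
            if run_start is None:
                run_start = when  # a new run of mild-or-less ratings begins
        else:
            run_start = None      # any item above threshold resets the clock
    if run_start is None:
        return False
    last = max(a[0] for a in assessments)
    return (last - run_start).days >= min_days


visits = [
    (date(2005, 1, 10), {"delusions": 4, "hallucinations": 2, "blunted_affect": 3}),
    (date(2005, 3, 1), {"delusions": 3, "hallucinations": 2, "blunted_affect": 3}),
    (date(2005, 6, 1), {"delusions": 3, "hallucinations": 2, "blunted_affect": 2}),
    (date(2005, 9, 15), {"delusions": 2, "hallucinations": 1, "blunted_affect": 2}),
]

print(in_remission(visits, ITEMS))      # run from 1 March to 15 September
print(in_remission(visits[:3], ITEMS))  # run from 1 March to 1 June only
```

Unlike a percentage-change definition of response, nothing in this check depends on the baseline score: a patient qualifies only by staying at or below the item thresholds for the whole period.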
Use of remission criteria
Remission is already being used in attempts to test efficacy of drugs in
`head to head' comparisons by re-analysing existing data
(Sethuraman et al,
2005). A study of stable patients using PANSS-based remission
criteria demonstrated that nearly 70% were not in remission; 20% achieved
remission when switched to depot treatment and 85% of those already in
remission remained so a year later on depot
(Lasser et al, 2005).
Application of the criteria to data from other published studies produced
similar findings (Gharabawi et
al, 2005; Kissling et
al, 2005). In all studies remission was associated with PANSS
total and subtotal scores, CGI–SCH scores, functioning and quality of
life. Moreover, an analysis of six clinical trials comparing two definitions,
one PANSS based and the other BPRS/CGI based, found that achievement of
remission using either definition was associated with better quality of life
(Dunayevich et al,
2006). This was particularly so if remission was sustained.
Nevertheless, total BPRS change score still contributed the greatest part of
the variance in quality of life.
Two reviews of the Working Group remission criteria (Nasrallah, 2006; Van Os et al, 2006) proposed that the definition was conceptually viable and feasible in both clinical trials and clinical practice. Both reviews considered that the use of remission criteria would raise clinical expectations and drive clinical services to achieve and document better outcomes. In clinical trials, the concept should improve the quality of methodology and data reporting, while extending its relevance to cognition and functional outcomes in patients. The advantages of remission derive from adding duration to absolute symptom score thresholds, and avoiding percentage change scores (a hitherto dubious benchmark).
These limitations have led to interest in another perspective on outcome, remission. This is based on accepted practice in medicine and other psychiatric disorders, such as affective disorders, and goes beyond rating scale scores alone. Its utility, however, remains to be seen. There are already indications that remission may be short lived in many patients (Dunayevich et al, 2006). Until recovery can be defined accurately in schizophrenia (Leucht & Lasser, 2006) symptom control, remission and quantified cognitive, personal and social functioning should be used together as measures of treatment outcome. This accepts that outcome has multiple facets, which vary in importance between patients. Symptom rating scales play an important role in overall appraisal of outcome, but should not dominate the picture, which still requires meaningful appraisals of cognition, personal and social functioning.