Routine use of mental health outcome assessments: choosing the measure


Background There is little consensus about which outcome measures to use in mental healthcare.

Aims To investigate the relationship between the items in four staff-rated measures recommended for routine use.

Method Correlation analysis of total scores and factor analysis were performed using combined data from the Health of the Nation Outcome Scales (HoNOS), the Camberwell Assessment of Need Short Appraisal Schedule (CANSAS), the Threshold Assessment Grid (TAG) and the Global Assessment of Functioning (GAF). Procrustes analysis on factors and scales, and Ward's cluster analysis to group the items, were applied.

Results The total scores of the measures were moderately correlated. The Procrustes analysis, factor analysis and cluster analysis all agreed on better coverage of the patients' problems by HoNOS and CANSAS than by TAG.

Conclusions A global severity factor accounts for 16% of the variance, and is best measured with TAG or GAF. The CANSAS and HoNOS each provide a detailed characterisation of the patient; only CANSAS provides information about met needs.

Pressure to use outcome measures in routine clinical practice is increasing (Department of Health and Aged Care, 1999). However, the majority of psychiatrists in the UK do not routinely measure patients' care needs and outcomes in a standardised way (Gilbody et al, 2002). Concern about the psychometric properties of available outcome measures has been one reason; however, in recent years outcome measures subjected to adequate psychometric evaluation and explicitly intended for routine use have emerged (Stedman et al, 1997; Thornicroft et al, 2005). This study compared the results from four staff-rated measures recommended for routine clinical use. We had two goals: to identify the extent to which there is overlap in the information provided by these outcome measures; and to make recommendations about which outcome measures provide the most clinically relevant information for adult mental health services.



Ten mental health teams (eight community mental health teams, one day service team and one older adults team) throughout London participated in the study between 1999 and 2000 (Slade et al, 2002). The teams' catchment areas were chosen to maximise generalisability and consisted of three inner-city, five outer-city and two suburban sites. These areas had levels of deprivation measured by the Mental Illness Needs Index (mean 100; higher scores indicate greater deprivation) varying from 98 to 124 (Glover et al, 1998).


The Health of the Nation Outcome Scales (HoNOS; Wing et al, 1998) assess social disability in 12 domains (see Table 3); each is scored from 0 (no problem) to 4 (severe to very severe problem), and the HoNOS total score is the sum of the 12 domains (Wing et al, 1998).

Table 3

Factor analysis of TAG, HoNOS and CANSAS unmet need items (weight > 0.35 shown)

The Camberwell Assessment of Need Short Appraisal Schedule (CANSAS) assesses health and social needs across 22 domains (see Table 3), scored 0 (no need), 1 (met need), 2 (unmet need) or 9 (not known) (Phelan et al, 1995). The CANSAS produces two subtotal scores: ‘total unmet needs’ is the number of domains rated as an unmet need, and ‘total met needs’ is the number of domains rated as a met need (Andreasen et al, 2001). The sum of met and unmet needs is the total need (maximum 22).
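The CANSAS subtotals described above can be sketched in a few lines. This is an illustrative sketch only: the function name `cansas_totals` and the example ratings are hypothetical, and treating a rating of 9 (not known) as neither a met nor an unmet need is an assumption, not a rule stated here.

```python
from typing import Sequence

NO_NEED, MET_NEED, UNMET_NEED, NOT_KNOWN = 0, 1, 2, 9


def cansas_totals(ratings: Sequence[int]) -> dict:
    """Summarise 22 CANSAS domain ratings into met, unmet and total need counts."""
    if len(ratings) != 22:
        raise ValueError("CANSAS has 22 domains")
    if any(r not in (NO_NEED, MET_NEED, UNMET_NEED, NOT_KNOWN) for r in ratings):
        raise ValueError("ratings must be 0, 1, 2 or 9")
    met = sum(1 for r in ratings if r == MET_NEED)
    unmet = sum(1 for r in ratings if r == UNMET_NEED)
    # Assumption: 'not known' (9) counts towards neither subtotal.
    return {"met": met, "unmet": unmet, "total": met + unmet}


# Hypothetical ratings for one patient (22 domains).
example = [2, 1, 0, 0, 1, 9, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]
print(cansas_totals(example))
```

Keeping the two subtotals separate, rather than reporting only the total need, matters for the later analyses, where met and unmet needs are treated as distinct scores.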

Global Assessment of Functioning (GAF; Jones et al, 1995) rates symptoms and social functioning on a scale ranging from 10 to 100, with anchor points for each 10-point band. In the version used in this study the two dimensions are disaggregated and the mean score is used for the GAF total (Jones et al, 1995).

The Threshold Assessment Grid (TAG; Slade et al, 2000) assesses the severity of a person's mental health problems across seven domains (see Table 3): items 2, 3, 6 and 7 are scored from 0 (none) to 3 (severe), and the remaining three items can also be scored as 4 (very severe), when immediate action is needed.

All four measures used in this study are staff-rated, and have been recommended for routine clinical use (Jones et al, 1995; Wing et al, 1998; Slade et al, 2000; Andreasen et al, 2001). The GAF, CANSAS and HoNOS have been translated into many foreign languages and are widely used internationally (Thornicroft et al, 2002).


Recent referrals to each mental health team were retrospectively audited to identify the most frequent referrers. Letters were sent to these referrers and other local non-statutory sector organisations describing the study and asking for their participation. The sample comprised 60 consecutive referrals from professionals for each service, plus self-referrals or informal carers' referrals. The total number of referred patients was 605, of whom 483 patients were offered an assessment by the mental health teams and 350 patients were actually seen by them.

Socio-demographic and clinical information was recorded for each referral. Training in the use of all four standardised measures (CANSAS, GAF, HoNOS and TAG) was provided for mental health service staff; this comprised one session, lasting 60-90 min, during which the four measures were described and their use demonstrated with two vignettes (Slade et al, 2002). When each patient was seen by the service, the assessing clinicians completed CANSAS, GAF, HoNOS and TAG at or immediately after their first clinical contact.


Representativeness of the sample for whom full data were available was tested using Mann-Whitney and chi-squared statistics. Correlations between total scores were analysed using graphical modelling, Procrustes analysis was used to compare multidimensional structures, and the overlap between individual items was investigated using factor and cluster analyses. A ‘graphical model’ is a particular type of graph based on a model of conditional independence (Edwards, 2000). For multivariate normal data, conditional independence between a pair of variables implies a zero partial correlation, and is indicated by the lack of a link between variables in the diagram. A link with an intermediate variable implies an indirect association. In this study a backwards, stepwise procedure for model selection, with a stringent P value (0.0001, equivalent to partial correlations above about 0.24), was used in order to focus on clinically significant levels of association.
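As a rough check on how a P value threshold translates into a minimum partial correlation, the calculation can be sketched as follows. It assumes that the test for removing one link is the deviance difference -n ln(1 - r²), referred to a chi-squared distribution with one degree of freedom, which is the usual form for Gaussian graphical models; whether the software used here applied exactly this test is an assumption. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def min_partial_correlation(p_value: float, n_cases: int) -> float:
    """Smallest |partial correlation| whose link-removal test reaches p_value.

    Assumes the deviance for dropping one link is -n * ln(1 - r**2),
    compared with a chi-squared distribution on 1 degree of freedom.
    """
    # A chi-squared(1) critical value is the square of a two-sided normal quantile.
    z = NormalDist().inv_cdf(1 - p_value / 2)
    critical_deviance = z * z
    return math.sqrt(1 - math.exp(-critical_deviance / n_cases))


# 264 is the number of patients with a complete assessment in this study.
threshold = min_partial_correlation(0.0001, 264)
print(round(threshold, 2))
```

Under these assumptions the threshold for 264 cases comes out a little under 0.25, so links retained in the model correspond to associations of at least small-to-moderate size.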

A preliminary factor analysis of the correlation matrix based on principal components (Munro & Page, 1993) was performed on all items. A subsequent varimax rotation was performed (excluding the single-item GAF score, since the focus was on the overlap of individual items of the TAG, HoNOS and CANSAS). The number of factors chosen was based on a scree plot, the requirement for a minimum number of items per factor and interpretability.
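To illustrate what a varimax rotation does, the sketch below rotates a two-factor loading matrix so that the variance of the squared loadings is maximised. It is a toy version under stated assumptions: the loadings are hypothetical, only two factors are handled, and the best angle is found by a simple grid search rather than the analytical or iterative solution used by statistical packages.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance of squared loadings (raw varimax)."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        mean_sq = sum(squares) / p
        total += sum(s * s for s in squares) / p - mean_sq ** 2
    return total


def rotate(loadings, angle):
    """Apply an orthogonal rotation of `angle` radians to two-column loadings."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in loadings]


def varimax_two_factors(loadings, steps=900):
    """Grid-search angles in [0, 90) degrees for the largest varimax criterion."""
    angles = [math.radians(90.0 * k / steps) for k in range(steps)]
    best_angle = max(angles, key=lambda a: varimax_criterion(rotate(loadings, a)))
    return rotate(loadings, best_angle)


# Hypothetical loadings: a clean two-factor pattern turned by 30 degrees,
# standing in for an unrotated principal component solution.
simple = [(0.8, 0.0), (0.7, 0.0), (0.0, 0.6), (0.0, 0.9)]
unrotated = rotate(simple, math.radians(30))
rotated = varimax_two_factors(unrotated)
for row in rotated:
    print([round(v, 2) for v in row])
```

After rotation each hypothetical item loads mainly on one factor, which is what makes rotated factors easier to interpret than unrotated components.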

Procrustes analysis (Gower, 1975) was then used to compare the multidimensional structures represented by the factor scores with those represented by each of the three scales. This technique rotates, translates and reflects a pair of multidimensional representations so as to optimise fit between them. The lack of fit (the percentage residual error) is a measure of the dissimilarity of the two multidimensional representations under consideration. The analysis was aimed at indicating how far any one scale can replicate the information in all the scales combined.
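The idea of a percentage residual after Procrustes matching can be sketched for two-dimensional configurations, where the optimal rotation has a closed form. This is a simplified illustration only: the analysis in this study compared seven-dimensional structures, and the function names and example coordinates below are hypothetical.

```python
import math


def _centre_and_scale(points):
    """Translate to the centroid and scale to unit total sum of squares."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centred))
    return [(x / norm, y / norm) for x, y in centred]


def procrustes_residual_2d(config_a, config_b):
    """Percentage residual after optimally matching config_a to config_b."""
    a = _centre_and_scale(config_a)
    b = _centre_and_scale(config_b)
    best_fit = 0.0
    for flip in (1.0, -1.0):  # the second pass reflects config_a in the x axis
        dot = sum(xa * xb + flip * ya * yb for (xa, ya), (xb, yb) in zip(a, b))
        cross = sum(xa * yb - flip * ya * xb for (xa, ya), (xb, yb) in zip(a, b))
        # hypot(dot, cross) is the agreement reached at the best rotation angle.
        best_fit = max(best_fit, math.hypot(dot, cross))
    # With optimal scaling, the residual sum of squares is 1 - best_fit**2.
    return 100.0 * (1.0 - best_fit ** 2)


# Hypothetical configuration, then a rotated, enlarged and shifted copy of it.
shape = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
angle = math.radians(40)
moved = [(2.5 * (x * math.cos(angle) - y * math.sin(angle)) + 3.0,
          2.5 * (x * math.sin(angle) + y * math.cos(angle)) - 1.0) for x, y in shape]
mirrored = [(-x, y) for x, y in shape]
distorted = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (3.0, 3.0)]

print(procrustes_residual_2d(shape, moved))
print(procrustes_residual_2d(shape, distorted))
```

A copy that differs only by rotation, translation, scaling or reflection leaves essentially no residual, whereas a genuinely different configuration leaves a positive percentage that cannot be removed by any such transformation.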

Cluster analysis (Everitt et al, 2001) was used to group together items having similar values across cases. Ward's method was used for the primary analysis, based on Euclidean distance after z-scoring the data to mean 0 and standard deviation 1. A dendrogram (a diagram of the levels at which clusters join during clustering) was used to decide on the number of clusters in addition to considerations of interpretability. Checks for robustness were made by rerunning the analyses on random halves of the data, on data standardised to have a range 0-1, and by using average and complete linkage methods.
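The clustering of items described above can be sketched as follows. This is a minimal pure-Python illustration, not the procedure run in the statistical packages: item names and scores are hypothetical, the standard deviation used for z-scoring is the sample (n-1) version by assumption, and the merge cost is the standard Ward increase in within-cluster sum of squares.

```python
from statistics import mean, stdev


def z_score(values):
    """Standardise one item's scores to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ward_clusters(items, n_clusters):
    """Group items by Ward's method on squared Euclidean distance.

    `items` maps an item name to its scores across cases (one score per case).
    """
    clusters = [([name], z_score(scores)) for name, scores in items.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (names_i, centre_i), (names_j, centre_j) = clusters[i], clusters[j]
                n_i, n_j = len(names_i), len(names_j)
                sq_dist = sum((a - b) ** 2 for a, b in zip(centre_i, centre_j))
                # Increase in within-cluster sum of squares if i and j merge.
                cost = n_i * n_j / (n_i + n_j) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (names_i, centre_i), (names_j, centre_j) = clusters[i], clusters[j]
        n_i, n_j = len(names_i), len(names_j)
        merged = [(n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(centre_i, centre_j)]
        clusters[i] = (names_i + names_j, merged)
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(names) for names, _ in clusters]


# Hypothetical scores for four made-up items across six cases.
toy_items = {
    "item_a": [0, 1, 2, 3, 4, 4],
    "item_b": [0, 1, 2, 3, 3, 4],
    "item_c": [4, 3, 0, 1, 0, 2],
    "item_d": [4, 4, 0, 1, 0, 1],
}
print(ward_clusters(toy_items, 2))
```

Because the items are z-scored first, squared Euclidean distance between two items is a decreasing function of their correlation, so items that rise and fall together across cases are merged early.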

For other examples of the factor and cluster analysis used in similar applications see Shiori et al (1996) and Cordingley et al (2001). Krzanowski (1987) gives an application of Procrustes analysis for identifying subsets of variables preserving multivariate structure. All analyses were carried out using the Statistical Package for the Social Sciences version 11.0, MIM 3.1 (Edwards, 2000) and Genstat 5.


The mental health teams saw 350 newly referred patients between June 1999 and September 2000. Three-quarters of the patients (n=264) had a complete assessment and their socio-demographic and clinical characteristics are shown in Table 1. Over half of these patients had a neurotic disorder, including depression, and 14% had schizophrenia. Eighty-six patients did not have a full assessment; their mean age was 44.3 years (s.d.=18.4), 47% were female and 42% had a clinical diagnosis of depression. There was no significant difference on these variables between those with complete and incomplete assessments.

Table 1

Socio-demographic and clinical characteristics of the sample (n=264)

Assessments that were incorrectly completed or blank were excluded, comprising 34 (11%) HoNOS, 25 (8%) GAF, 23 (7%) CANSAS and 4 (1%) TAG assessments. Missing TAG data were either pro-rated (where five or six domains were completed) or assumed to be 0 for missing domains.
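One way to read the handling of missing TAG domains is sketched below. The pro-rating formula (scaling the observed sum up to seven domains) is an assumption, since the exact rule is not given here, and the function name and example scores are hypothetical.

```python
from typing import Optional, Sequence

TAG_DOMAINS = 7


def tag_total(scores: Sequence[Optional[int]]) -> float:
    """Total TAG score, with missing domains given as None."""
    if len(scores) != TAG_DOMAINS:
        raise ValueError("TAG has seven domains")
    rated = [s for s in scores if s is not None]
    if len(rated) in (5, 6):
        # Assumed pro-rating: scale the observed sum up to the full seven domains.
        return sum(rated) * TAG_DOMAINS / len(rated)
    # Fully rated, or too sparse to pro-rate: missing domains count as 0.
    return float(sum(rated))


# Hypothetical assessments with two and three missing domains respectively.
print(tag_total([1, 2, None, 0, 3, 1, None]))
print(tag_total([1, None, None, None, 2, 0, 0]))
```

Pro-rating keeps a nearly complete assessment on the same scale as a complete one, whereas scoring missing domains as 0 will tend to understate severity when several domains are blank.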

Bivariate and partial correlations between the total scores (all at best moderate) are given in Table 2; Figure 1 shows the strongest partial correlations remaining after the stepwise elimination and refitting procedure of graphical modelling. Both bivariate and partial correlations indicate that all variables are associated in the expected direction and that the CANSAS ‘total met needs’ score is relatively independent of the other measures, except for ‘unmet needs’. The CANSAS ‘total met needs’ score was therefore omitted from subsequent item-level analysis.

Table 2

Correlations between total scores for the four measures

Fig. 1

Graphical model showing strongest partial correlations between total scores for the four measures after stepwise elimination of least significant links (CANSAS, Camberwell Assessment of Need Short Appraisal Schedule; GAF, Global Assessment of Functioning; HoNOS, Health of the Nation Outcome Scales; TAG, Threshold Assessment Grid).

A preliminary principal component analysis (not shown) showed a first component (accounting for 16% of the variance) with loadings on most items, including all the TAG items. Since all the items are scored in the same direction, and since there tend to be small to moderate correlations between the items, this is as expected. The strongest item loading for this general ‘severity’ factor, as it is interpreted, was for GAF total score with which it was correlated at -0.37. The correlation between this factor and total score of TAG was 0.40, with HoNOS it was 0.35 and with CANSAS ‘total unmet needs’ it was 0.28.

Unrotated and rotated principal component analyses were performed using TAG, HoNOS and CANSAS items. Twelve unrotated components had eigenvalues greater than 1.0 and a scree plot suggested an ‘elbow’ between four and eight components. Seven components, interpreted as factors, were chosen since this solution retained a reasonable degree of detail while ensuring that at least three items were present in each factor. The Procrustes fit of the structure based on each individual scale to the structure based on these seven factors was 38% for TAG, 48% for HoNOS and 43% for CANSAS.

The rotated seven-factor solution, which accounted for 50% of the variance, is shown in Table 3. All HoNOS items load (at the level of 0.35) on at least one factor with overlap in three items. Similarly, all CANSAS items (except ‘childcare’) load on at least one factor, and there is overlap on two factors for three items. Most importantly, both CANSAS and HoNOS have at least one item in every factor. No TAG item appears in one of the factors (five), and all TAG items appear in at least two factors, except for the items ‘intentional self-harm’ and ‘risk to others’, which are associated with only one factor each.

Two solutions from Ward's method of cluster analysis are presented in Table 4, with interpretations for the clusters. A large jump in the dendrogram was evident at four clusters (termed the ‘broad’ solution). A ‘narrow’ solution is also tabulated, since this has a strong resemblance to the factors shown in Table 3, at least in terms of overall interpretation. The membership of each narrow or broad cluster is listed under each heading. At least two items from the HoNOS and two items from the CANSAS contributed to each broad cluster, and to all but one of the factors. Both HoNOS and CANSAS had items appearing in all eight narrow clusters, but TAG did not add any information to four of these clusters (‘psychotic symptoms’, ‘substance misuse’, ‘company and activities’ and ‘accommodation’). Even in the broad cluster solution, TAG missed information for one of the four clusters (‘company and activities’/‘accommodation’).

Table 4

Cluster membership of HoNOS, TAG and CANSAS unmet needs items in broad (four-cluster) and narrow (eight-cluster) solutions


Four measures intended for routine clinical use were tested on a sample of patients from mental health services. The relationship between the total scores of the four measures was examined first and this indicated that the CANSAS ‘total met needs’ score showed low association with the other measures, apart from the CANSAS ‘total unmet needs’ score with which it was moderately correlated. However, there was some degree of dependence between GAF, TAG, HoNOS and CANSAS ‘total unmet needs’ score. Factor and cluster analyses were then applied to the individual items in the item-based measures. The goal was to investigate whether one measure could adequately describe patients (at some level) or whether, conversely, meaningful and comprehensive clinical information could only be provided by a combination of measures. Before considering this, it is worth commenting on the measurement of overall severity.

Overall severity factor

A weak first factor, which can be interpreted as ‘severity’, was found in the preliminary factor analysis. The proportion of variance accounted for (16%) was low compared with the 50-69% found using patient-rated measures (Fakhoury et al, 2002). This may reflect the fact that there are many variables (and hence sources of measurement error) or that there are underlying factors that do not relate directly to severity, or both. Many items from each of the four measures loaded on this factor and any of the separate scale totals could be used as a proxy for it. Strongest correlations were with TAG total (0.40) and GAF (-0.37). The GAF would be the briefest proxy measure for this severity factor, but TAG had all seven items loading above the threshold on this factor and so provides the more meaningful measure.

Choice of scale

Turning to the subsequent analyses of the items, the rotated factor analysis found seven interpretable factors, whereas the narrow cluster analysis revealed eight interpretable clusters; these two groupings of items were similar. The Procrustes analyses comparing the overall structure represented by the factors with the individual scales indicated that HoNOS and CANSAS matched the factor structure better than TAG. This finding indicates that differences between patients (as reflected in the factors) are best replicated by HoNOS or CANSAS. However, the percentages of variation explained suggest that no single scale is entirely adequate for this.

As Table 4 shows, at least two items from the HoNOS and two items from the CANSAS contributed to each broad cluster, and to all but one of the factors. Even at the more detailed eight-cluster level, both HoNOS and CANSAS contributed at least one item to each cluster. In an epidemiological study one could thus use either HoNOS or CANSAS to represent discrete categories of patients' problems. In a clinical situation this might also be the case, depending on the particular focus of the evaluation; for example, one could decide whether the particular item or pair of items could be considered a reasonable proxy for the domain or area under consideration or - in the case of the TAG - whether the missing information was relevant. The information in Table 4 can be used to make choices between the scales if this is required.

The CANSAS has the advantage of also providing information about met needs. Needs can be met through the efforts of the mental health team, through the patient's efforts, or through help from informal sources such as friends or family. Therefore the interpretation of met needs is complex. Nevertheless, it may be important to consider met needs when evaluating case-loads (Phelan et al, 1995). Thus CANSAS might be the single measure of preference, if only one were to be chosen. The TAG did not have any item in four narrow clusters out of eight, and when a broader solution with four clusters only was considered, TAG missed information in one out of the four broad clusters. The results of the factor and cluster analyses at both broad and detailed levels agree therefore on a higher meaningfulness for HoNOS and CANSAS than for TAG in this sample.


Several methodological limitations can be identified. For the purpose of this study, the reliability of each of the four measures was assumed to be adequate on the basis of their published psychometric properties. However, no study has yet compared their relative reliability when used in the same setting. Furthermore, there is some evidence that HoNOS ratings are less reliable when completed by clinical staff (as in this study) rather than by research staff (Bebbington et al, 1999). Similarly, the interrater reliability for staff-rated CANSAS ‘total unmet needs’ score (0.80) has been found to be higher than that for ‘total met needs’ (0.53) (Andreasen et al, 2001). However, the results for the individual scales are similar to those of other studies involving equivalent mental health service populations (e.g. Slade et al, 1999; Ruggeri et al, 2000).

Data were collected in routine clinical settings, so only clinical diagnosis and easily available socio-demographic characteristics were recorded. The strength of this approach is that the study sample is representative of patients referred to adult and elderly mental health teams, but the study sample is not comprehensively characterised (Harrison & Eaton, 1999). Also, the data concerned new referrals, and these patients are unlikely to be representative of patients receiving continuing care from community mental health teams.

This study used exploratory techniques to investigate the relationship between the four measures. The factor analysis was at the limit of acceptability in terms of the number of cases per variable (about six). The use of methods based on the correlation matrix may be questionable when the data are binary or ordinal, although according to Jolliffe & Morgan (1992) this is a relatively minor problem when the aim is exploratory, as it is here. The cluster analysis entailed subjective choices of standardisation and method. Nevertheless, these two sets of results, although not necessarily definitive summaries of the data, were consistent with each other and interpretable.

Future work

Future work will need to confirm the existence of a global severity factor, the independence of the CANSAS ‘total met needs’ score, and the comprehensiveness of CANSAS and HoNOS using confirmatory analysis. This could involve systematic comparison of the four routine outcome measures used in this study with psychometrically validated research measures (such as the Needs for Care Assessment Schedule; Brewin et al, 1987) or triangulation using qualitative approaches to investigate whether both CANSAS and HoNOS span the full range of domains relevant to providing and evaluating mental health care. Overall, a more analytical approach to investigating the data could usefully include consideration of the extent to which the psychometric properties of these measures are preserved in routine use.

Rather than choosing a specific scale, a possible approach would be to choose items from all three scales that would span these domains, thus effectively designing a new scale. The Procrustes analysis suggests that this could be worthwhile, and the methods described by Krzanowski (1987) could be employed. These would entail finding the best subset from the complete pool of items from all three scales, rather than accepting pre-existing sets of items.

Despite the limitations noted above, several conclusions can be drawn. In relation to the first goal of the study, a global severity factor was identified which accounted for some of the variance in each staff-rated measure, but there was no evidence of substantial overlap between the four measures. They do not all measure the same underlying construct. For the second goal, this study allows some recommendations to be made regarding which outcome measures to use routinely. When a detailed characterisation of clinical and social needs of the patient and outcomes is required, HoNOS and CANSAS should be used. When a meaningful but more limited characterisation of the patient is required, either CANSAS or HoNOS could be used, but CANSAS has the advantage of providing extra information about met needs. Finally, when the goal is to evaluate severity only, this can be measured using either TAG or GAF: TAG provides the most meaningful assessment and GAF provides the briefest assessment.

Clinical implications


  1. A global severity measure accounts for only a small amount of the variance in ratings, and can be assessed using either the Threshold Assessment Grid or the Global Assessment of Functioning.

  2. Either the Health of the Nation Outcome Scales (HoNOS) or the Camberwell Assessment of Need Short Appraisal Schedule (CANSAS) can be used to obtain a detailed characterisation of clinical and social needs of the patient.

  3. Compared with HoNOS, the CANSAS provides extra information about met needs.

Limitations


  1. The study used exploratory techniques that entailed subjective choices of standardisation and method.

  2. Patients were described by clinical diagnosis and easily available socio-demographic characteristics only.

  3. Previous evidence suggests that the reliability of HoNOS is reduced when it is completed by clinical staff.


The other lead investigators of the Threshold Assessment Grid study were Drs Sharon Cahill, Wendy Kelsey, Robin Powell and Geraldine Strathdee. We thank Professor Graham Thornicroft of the Institute of Psychiatry for his helpful comments and Professor Mike Baxter of Nottingham Trent University and an anonymous referee for their valuable statistical advice. The study was funded by North Thames Responsive Funding Programme (RFG549). The views in this publication are those of the authors and not necessarily those of the National Health Service Executive or the Department of Health.

  • Received January 29, 2004.
  • Revision received August 11, 2004.
  • Accepted August 26, 2004.

