The British Journal of Psychiatry
Diagnostic stability of psychiatric disorders in clinical practice
Enrique Baca-Garcia, Maria M. Perez-Rodriguez, Ignacio Basurte-Villamor, Antonio L. Fernandez Del Moral, Miguel A. Jimenez-Arriero, Jose L. Gonzalez De Rivera, Jeronimo Saiz-Ruiz, Maria A. Oquendo

Abstract

Background Psychiatric disorders are among the leading causes of disease burden and disability worldwide. A major criterion for validating diagnoses is their stability over time.

Aims To evaluate the long-term stability of the most prevalent psychiatric diagnoses in a variety of clinical settings.

Method A total of 34 368 patients received psychiatric care in the catchment area of one Spanish hospital (1992–2004). This study is based on 10 025 adult patients who were assessed on at least ten occasions (360 899 psychiatric consultations) in three settings: in-patient unit, 2000–2004 (n=546); psychiatric emergency room, 2000–2004 (n=1408); and out-patient psychiatric facilities, 1992–2004 (n=10 016). Prospective consistency, retrospective consistency and the proportion of patients who received each diagnosis in at least 75% of the evaluations were calculated for each diagnosis in each setting and across settings.

Results The temporal consistency of mental disorders was poor, ranging from 29% for other specific personality disorders to 70% for schizophrenia, with stability greatest for in-patient diagnoses and least for out-patient diagnoses.

Conclusions The findings are an indictment of our current psychiatric diagnostic practice.

Diagnosis is essential in clinical practice, research, training and public health. Definitions of psychiatric diagnoses are derived from expert opinion rather than from the biological basis of the disorders. The modest knowledge base regarding the causes of mental disorders has hindered the use of aetiological factors in psychiatric classification systems. The current classifications (World Health Organization, 1992; American Psychiatric Association, 2000) were designed to achieve high interrater reliability of diagnostic assessment. It is widely believed that if future editions of the DSM and the ICD are to be a significant improvement on their predecessors, the validity of the diagnostic concepts they include will have to be enhanced (Kendell & Jablensky, 2003). Follow-up studies providing evidence of diagnostic stability and consistency over time have traditionally been proposed as a test of the validity of psychiatric diagnoses (Robins & Guze, 1970; Kendler, 1980; Andreasen, 1995). However, several authors have noted that, as longitudinal data become available, substantial fluctuations in diagnostic stability and changes in clinical presentation are seen (Krishnan, 2005).

The aim of our study was to evaluate the long-term stability of the most prevalent chronic psychiatric diagnoses according to ICD–10 in a range of clinical settings.

METHODS

Participants

In total, 34 368 patients received psychiatric care in the catchment area of Fundacion Jimenez Diaz General Hospital, Madrid, between 1 January 1992 and 31 December 2004. This hospital is part of the Spanish national health service and provides free medical coverage to a catchment area of 280 000 people. There were 449 317 psychiatric consultations across clinical settings: 438 622 visits to out-patient psychiatric facilities, 9101 emergency visits and 1594 admissions to the psychiatric brief hospitalisation unit. The current study is based on the 10 025 patients aged 18 years and over who were assessed on at least ten occasions during the study period. These patients had 360 899 psychiatric consultations: 355 166 visits to out-patient psychiatric facilities, 4628 psychiatric emergency visits and 1105 admissions to the psychiatric brief hospitalisation unit.

Individual service users are reliably identified in the database used for our analyses because each patient is given an identifying number (a numeric code that preserves patient anonymity), which remains the same throughout all contacts with psychiatric services within the study area. To ensure that no patient had been assigned more than one identifier, we reviewed all cases in the database and removed any duplicates we found. We defined duplicates as ‘patients with identical first name, family name, gender and year of birth’, ‘patients with identical first name, family name, gender and street address’ or ‘patients with identical first name, family name, gender and hospital/ambulatory record number’. We deleted any case in which duplication was strongly suspected.
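The three duplicate definitions above amount to matching records on any one of three composite keys. The following Python sketch is illustrative only and is not the authors' procedure: the field names (`first_name`, `birth_year`, `record_number` and so on), the case- and whitespace-insensitive normalisation, and skipping records with a missing field are all assumptions. It flags pairs of identifiers for review rather than deleting anything, since the text does not say which record of a suspected pair was kept.

```python
from collections import defaultdict
from itertools import combinations

# The three duplicate definitions quoted in the text, as composite keys.
# Field names are hypothetical; the registry schema is not described.
DUPLICATE_KEYS = [
    ("first_name", "family_name", "gender", "birth_year"),
    ("first_name", "family_name", "gender", "street_address"),
    ("first_name", "family_name", "gender", "record_number"),
]


def normalise(value):
    """Assumed normalisation: case-insensitive, collapsed whitespace."""
    return " ".join(str(value).split()).lower()


def suspected_duplicate_pairs(patients):
    """Sorted (id, id) pairs that match on at least one duplicate definition."""
    pairs = set()
    for fields in DUPLICATE_KEYS:
        groups = defaultdict(set)
        for p in patients:
            # Assumption: a record missing any field of a key cannot match on it.
            if any(p.get(f) in (None, "") for f in fields):
                continue
            key = tuple(normalise(p[f]) for f in fields)
            groups[key].add(p["patient_id"])
        for ids in groups.values():
            pairs.update(combinations(sorted(ids), 2))
    return sorted(pairs)


# Toy records (invented); 1 and 2 share name, gender and street address.
patients = [
    {"patient_id": 1, "first_name": "Ana", "family_name": "Lopez", "gender": "F",
     "birth_year": 1960, "street_address": "Calle A 1", "record_number": "H100"},
    {"patient_id": 2, "first_name": "ana ", "family_name": "LOPEZ", "gender": "F",
     "birth_year": 1961, "street_address": "Calle A 1", "record_number": "H200"},
    {"patient_id": 3, "first_name": "Juan", "family_name": "Ruiz", "gender": "M",
     "birth_year": 1970, "street_address": "Calle B 2", "record_number": "H300"},
]
print(suspected_duplicate_pairs(patients))  # [(1, 2)]
```

Keying each definition separately means a single matching key is enough to flag a pair, which mirrors the ‘or’ between the three definitions in the text.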

Settings

Participants (n=10 025) were assessed in three different clinical settings: in-patient unit (psychiatric brief hospitalisation unit), 2000–2004 (n=546); psychiatric emergency room, 2000–2004 (n=1408); and out-patient psychiatric facilities (mental health care centres) within the catchment area of the Fundacion Jimenez Diaz General Hospital, 1992–2004 (n=10 016).

Diagnostic procedures

Procedure during ambulatory visits

Since 1986, public mental health centres within the province of Madrid have been required to record all ambulatory visits in a regional registry, the Registro Acumulativo de Casos de la Comunidad de Madrid (Cumulative Case Register of the Community of Madrid). All diagnoses in this registry must be coded according to ICD–9 (World Health Organization, 1978). Since 1992, diagnoses have been assigned according to ICD–10 (World Health Organization, 1992) criteria and recorded with the corresponding ICD–9 code numbers; ICD–10 codes were converted to ICD–9 codes using the guidelines published by the World Health Organization (Organizacion Mundial de la Salud, 1993). The psychiatrists at each mental health centre recorded one or two diagnoses per patient at each ambulatory visit. Diagnoses were assigned after reviewing all available information, including data from medical records and clinical interviews with the patient and relatives.

Procedure during emergency visits

The emergency diagnoses were taken from the emergency medical records. Emergency diagnoses were assigned by clinical psychiatrists after reviewing all available information, including data from clinical interviews with the patient and relatives.

Procedure during admissions to the in-patient unit

Clinical diagnoses during admissions result from an intensive diagnostic and treatment process carried out by physicians with specialty training in psychiatry, drawing on medical records, other research assessments and clinical interviews. The psychiatrists who assigned the clinical diagnoses were unaware of the study.

Diagnostic groups included in analysis

From all chronic psychiatric diagnoses, we selected the disorders assigned to more than 500 patients in our sample (prevalence higher than 5%). Naturalistic studies like ours show that the frequency of use of the ICD–10 two-digit, three-digit and four-digit diagnostic categories varies considerably: some categories are not used at all, and others account for less than 0.1% of the samples studied (Mussigbrodt et al, 2000). In that study of 33 857 treated cases from 19 departments of psychiatry in ten different countries, ‘on a four-character level (Fxx.x), the ten most often used diagnostic categories represented 40% of all main diagnoses, and 70% on a three-character level (Fxx.-)’ (Mussigbrodt et al, 2000). The diagnoses analysed here (with ICD–10 codes) are:

  1. schizophrenia, schizotypal and delusional disorders (F20–29), including individual diagnoses of schizophrenia (F20), paranoid schizophrenia (F20.0), residual schizophrenia (F20.5) and persistent delusional disorders (F22);

  2. mood (affective) disorders (F30–39), including individual diagnoses of bipolar affective disorder (F31), bipolar affective disorder, current episode mild or moderate depression (F31.3), recurrent depressive disorder (F33), persistent mood (affective) disorders (F34), and dysthymia (F34.1);

  3. obsessive–compulsive disorder (F42);

  4. eating disorders (F50);

  5. disorders of adult personality and behaviour (F60–69), including the individual diagnoses of specific personality disorders (F60) and other specific personality disorders (F60.8).
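The inclusion rule at the start of this section (disorders assigned to more than 500 patients) can be sketched as a count of distinct patients per code. This hypothetical Python helper is not the authors' code: it counts exact codes only, so grouped categories such as F20–29 would first need each code mapped to its block, and each patient is counted once per code regardless of how many visits recorded it.

```python
from collections import Counter
from typing import Dict, Iterable, List


def frequent_diagnoses(patient_codes: Dict[str, Iterable[str]],
                       min_patients: int = 500) -> List[str]:
    """Codes recorded for more than `min_patients` distinct patients, sorted."""
    counts = Counter()
    for codes in patient_codes.values():
        counts.update(set(codes))  # each patient counted once per code
    return sorted(code for code, n in counts.items() if n > min_patients)


# Toy data (invented): all codes ever recorded for each patient.
patient_codes = {
    "p1": ["F20.0", "F20.0", "F34.1"],
    "p2": ["F20.0"],
    "p3": ["F34.1", "F42"],
}
print(frequent_diagnoses(patient_codes, min_patients=1))  # ['F20.0', 'F34.1']
```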

Data extraction and analysis

Diagnostic stability across all evaluations was calculated following Schwartz et al (2000). Three measures of stability are presented for each diagnosis. The first, ‘prospective consistency’, is the proportion of individuals given a diagnosis at the first evaluation who retain the same diagnosis at their last evaluation; this would correspond to the positive predictive value if the last diagnosis were the gold standard. The second, ‘retrospective consistency’, is the proportion of individuals given a diagnosis at the last evaluation who had received the same diagnosis at the first evaluation; this is conceptually similar to sensitivity. The third measure is the proportion of patients who received the same diagnosis in at least 75% of their evaluations. Agreement between diagnoses at the first and last evaluations was assessed with the kappa coefficient, which corrects observed agreement for the agreement expected by chance.
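The three stability measures and the first-versus-last kappa can be sketched in Python as below. This is a toy illustration, not the SPSS procedure used in the study, and several details are assumptions: each evaluation is a set of ICD–10 codes (the text notes one or two per visit); a diagnosis counts as present when a code starts with a given prefix (so `F20` also matches `F20.0`); the 75% measure uses as its denominator the patients who received the diagnosis at least once; and kappa is computed per diagnosis from a 2×2 table (present or absent at the first and last evaluations). The names `has_dx`, `ratio` and `stability` are hypothetical.

```python
from math import nan
from typing import Dict, List, Optional, Set

# A patient's evaluations in chronological order; each is a set of ICD-10 codes.
History = List[Set[str]]


def has_dx(evaluation: Set[str], prefix: str) -> bool:
    """Assumed matching rule: 'F20' matches 'F20.0'; 'F2' matches F20-F29."""
    return any(code.startswith(prefix) for code in evaluation)


def ratio(num: int, den: int) -> Optional[float]:
    """Proportion, or None when no patient falls in the denominator."""
    return num / den if den else None


def stability(histories: Dict[str, History], prefix: str, threshold: float = 0.75):
    a = b = c = d = 0  # 2x2 table: first (D / not D) against last (D / not D)
    ever = mostly = 0  # patients ever given D; those given D in >= threshold of visits
    for evals in histories.values():
        first, last = has_dx(evals[0], prefix), has_dx(evals[-1], prefix)
        if first and last:
            a += 1
        elif first:
            b += 1
        elif last:
            c += 1
        else:
            d += 1
        hits = sum(has_dx(e, prefix) for e in evals)
        if hits:
            ever += 1
            if hits / len(evals) >= threshold:
                mostly += 1
    n = a + b + c + d
    p_obs = (a + d) / n  # observed first-last agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else nan
    return {
        "prospective": ratio(a, a + b),    # D at first -> still D at last
        "retrospective": ratio(a, a + c),  # D at last -> already D at first
        "at_least_75": ratio(mostly, ever),
        "kappa": kappa,
    }


histories = {  # toy data, not from the study
    "p1": [{"F20.0"}, {"F20.0"}, {"F20.0"}, {"F20.5"}],
    "p2": [{"F20.0"}, {"F31.3"}, {"F31.3"}, {"F31.3"}],
    "p3": [{"F34.1"}, {"F34.1", "F60.8"}, {"F20.0"}, {"F20.0"}],
    "p4": [{"F34.1"}, {"F34.1"}, {"F34.1"}, {"F34.1"}],
}
print(stability(histories, "F34"))
# {'prospective': 0.5, 'retrospective': 1.0, 'at_least_75': 0.5, 'kappa': 0.5}
```

Because a patient enters the prospective denominator only through the first evaluation and the retrospective denominator only through the last, the two measures can diverge sharply, as they do for F34 in the toy data.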

Using the Statistical Package for the Social Sciences (SPSS), version 13.0 for Windows, we performed four analyses: three separate analyses, one for each clinical setting (psychiatric emergencies, out-patient visits and hospitalisations), to control for the influence of setting on diagnostic stability; and a fourth analysis of the combined data from all three settings to reflect the evolution of diagnoses through the clinical process.
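The split into three setting-specific analyses plus one pooled analysis can be sketched as building each patient's chronological history either within a single setting or across all settings, then computing the same measure on each. The record layout, the setting labels and the function names below are hypothetical, the prefix matching of codes is an assumption, and only prospective consistency is shown for brevity.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, FrozenSet, List, Optional, Tuple

# One row per contact: (patient_id, setting, contact_date, ICD-10 codes).
# The layout and the setting labels are hypothetical.
Record = Tuple[str, str, date, FrozenSet[str]]
SETTINGS = ("inpatient", "emergency", "outpatient")


def build_histories(records: List[Record],
                    setting: Optional[str] = None) -> Dict[str, List[FrozenSet[str]]]:
    """Chronological evaluations per patient, restricted to one setting,
    or pooled across all settings when `setting` is None."""
    by_patient = defaultdict(list)
    for pid, where, when, codes in records:
        if setting is None or where == setting:
            by_patient[pid].append((when, codes))
    return {pid: [codes for _, codes in sorted(rows, key=lambda row: row[0])]
            for pid, rows in by_patient.items()}


def prospective_consistency(histories, prefix: str) -> Optional[float]:
    """Share of patients with the diagnosis at their first evaluation who
    still have it at their last (prefix matching is an assumption)."""
    def hit(evaluation):
        return any(code.startswith(prefix) for code in evaluation)
    starters = [evals for evals in histories.values() if hit(evals[0])]
    if not starters:
        return None
    return sum(hit(evals[-1]) for evals in starters) / len(starters)


def analyse_by_setting(records: List[Record], prefix: str) -> Dict[str, Optional[float]]:
    """Three setting-specific analyses plus one analysis of the pooled data."""
    results = {s: prospective_consistency(build_histories(records, s), prefix)
               for s in SETTINGS}
    results["combined"] = prospective_consistency(build_histories(records), prefix)
    return results


records = [  # toy data, not from the study
    ("p1", "outpatient", date(2001, 3, 1), frozenset({"F34.1"})),
    ("p1", "emergency", date(2001, 6, 1), frozenset({"F31.3"})),
    ("p1", "inpatient", date(2001, 6, 2), frozenset({"F31.1"})),
    ("p1", "outpatient", date(2002, 1, 1), frozenset({"F31.1"})),
    ("p2", "outpatient", date(2000, 5, 1), frozenset({"F31.1"})),
    ("p2", "outpatient", date(2003, 5, 1), frozenset({"F34.1"})),
]
print(analyse_by_setting(records, "F31"))
# {'inpatient': 1.0, 'emergency': 1.0, 'outpatient': 0.0, 'combined': 0.0}
```

In the toy records, patient p1 counts as a stable F31 case in the emergency and in-patient analyses but drops out of the pooled analysis, whose first evaluation for p1 is an out-patient F34.1 code; the choice of history determines which patients enter each denominator.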

RESULTS

The socio-demographic characteristics of the sample are presented in Table 1.

Table 1

Socio-demographic characteristics of the sample (n=10 025)

Stability of diagnoses

Prospective and retrospective consistency of the diagnoses across settings and in the out-patient, emergency and in-patient settings are presented in Tables 2, 3, 4 and 5, respectively, and graphically in a data supplement to the online version of this paper. The percentages of patients who received the same diagnosis in at least 75% of their evaluations, across settings and in the out-patient, emergency and in-patient settings, are presented in Table 6.

Table 2

Prospective and retrospective consistency of ICD–10 diagnoses across settings (n=10 025)

Table 3

Prospective and retrospective consistency of ICD–10 diagnoses in the out-patient setting (n=10 016)

Table 4

Prospective and retrospective consistency of ICD–10 diagnoses in the emergency setting (n=1408)

Table 5

Prospective and retrospective consistency of ICD–10 diagnoses in the in-patient setting (n=546)

Table 6

Percentage of patients who received a diagnosis in at least 75% of the evaluations across settings, in the out-patient setting, in the in-patient setting and in the emergency setting

Across clinical settings

Prospective consistency ranged from 28.7% for other specific personality disorders to 69.6% for schizophrenia (Table 2). The prospective consistency of the three most prevalent specific diagnoses at the first evaluation was 44.7% for dysthymia, 69.6% for schizophrenia and 49.4% for bipolar affective disorder (see Table 2). Retrospective consistency at the last evaluation ranged from 23.4% for bipolar affective disorder, current episode mild or moderate depression, to 58.0% for eating disorders; it was 43.7% for dysthymia, 45.9% for schizophrenia and 38.1% for bipolar affective disorder (see Table 2). The proportion of patients who received the same diagnosis during at least 75% of their evaluations ranged from 9.8% for other specific personality disorders to 47.1% for schizophrenia, schizotypal and delusional disorders (see Table 6).

Out-patient setting

Prospective consistency ranged from 29.4% for other specific personality disorders to 69.1% for schizophrenia. The prospective consistency of the three most prevalent specific diagnoses at the first evaluation was 45.7% for dysthymia, 69.1% for schizophrenia and 50.6% for bipolar affective disorder (see Table 3). Retrospective consistency at the last evaluation ranged from 23.2% for bipolar affective disorder, current episode mild or moderate depression, to 57.7% for eating disorders; it was 43.6% for dysthymia, 46.0% for schizophrenia and 39.3% for bipolar affective disorder (see Table 3). The proportion of patients who received the same diagnosis during at least 75% of the evaluations ranged from 10.7% for other specific personality disorders to 49.6% for schizophrenia, schizotypal and delusional disorders (see Table 6).

Emergency department setting

Prospective consistency ranged from 44.4% for other specific personality disorders to 81.1% for bipolar affective disorder. The prospective consistency of the three most prevalent specific diagnoses at the first evaluation was 79.2% for schizophrenia, 81.1% for bipolar affective disorder and 62.5% for dysthymia (see Table 4). Retrospective consistency at the last evaluation ranged from 41.7% for obsessive–compulsive disorder to 80.0% for recurrent depressive disorder; it was 67.0% for schizophrenia, 70.6% for bipolar affective disorder and 69.0% for dysthymia (see Table 4).

The proportion of patients who received the same diagnosis during at least 75% of the evaluations ranged from 19.5% for residual schizophrenia to 54.6% for schizophrenia, schizotypal and delusional disorders (see Table 6).

In-patient setting

Prospective consistency ranged from 66.7% for recurrent depressive disorder to 100.0% for obsessive–compulsive disorder and eating disorders. The prospective consistency of the three most prevalent specific diagnoses at the first evaluation was 90.9% for schizophrenia, 91.5% for bipolar affective disorder and 81.8% for dysthymia (see Table 5). Retrospective consistency at the last evaluation ranged from 63.1% for specific personality disorders to 100.0% for recurrent depressive disorder and obsessive–compulsive disorder; it was 91.5% for schizophrenia, 89.3% for bipolar affective disorder and 75.0% for dysthymia (see Table 5).

The proportion of patients who received the same diagnosis during at least 75% of the evaluations ranged from 37.5% for bipolar affective disorder, current episode mild or moderate depression, to 100.0% for obsessive–compulsive disorder and other specific personality disorders (see Table 6).

DISCUSSION

The main variable influencing diagnostic stability for the most prevalent chronic psychiatric diagnoses was the clinical setting in which patients were assessed. The in-patient setting showed the highest diagnostic stability, followed by the emergency and out-patient settings. The temporal consistency of psychiatric disorders was lower than that found in other studies.

Strengths and weaknesses of the study

The main strengths of this study are the large, representative sample, the length of follow-up (up to 12 years) and the large number of evaluations per patient (at least ten). Moreover, whereas most previous studies focused on one psychiatric diagnosis assessed in a single clinical setting, we assessed the stability of the most prevalent psychiatric diagnoses as they naturally present in clinical practice. Psychiatric diagnoses were evaluated in three different clinical settings, using the same diagnostic procedures as in regular clinical practice, and the clinicians who assigned the diagnoses were unaware of the study. By contrast, other work has used semi-structured interviews and other diagnostic instruments that are not ordinarily used in clinical practice. Our results may therefore more accurately reflect the real use of diagnostic classifications in psychiatric practice and may be more useful in estimating the clinical utility of current psychiatric classification systems.

Diagnostic changes over time may reflect the evolution of an illness, the emergence of new information or unreliability of measurement (Schwartz et al, 2000). Spitzer et al (1978) divided the sources of unreliability that lead to diagnostic disagreement among clinicians into five sources of variance: subject variance, occasion variance (e.g. different episodes of bipolar disorder), information variance (e.g. differences across settings and informants), observation variance (e.g. differences among clinicians) and criterion variance. Our study has limitations that may reflect the influence of these sources of unreliability. The stability of bipolar disorder may be affected by occasion variance, particularly for the diagnostic category bipolar affective disorder, current episode mild or moderate depression (ICD–10 F31.3). Information and observation variance can be substantially reduced by training clinicians in interviewing techniques and observational skills, and by using structured or semi-structured clinical interviews. Because our research was naturalistic, structured or semi-structured clinical interviews were not used, which might have increased criterion variance. The clinicians who assigned the diagnoses were not specifically trained to improve interrater reliability, which might have influenced the consistency of the analysed diagnoses. Finally, psychiatrists used different diagnostic classifications to code the diagnoses throughout the study period.

Other research

The stability of chronic psychiatric diagnoses has been evaluated in a number of studies (Tsuang et al, 1981; Schwartz et al, 2000; Lieb et al, 2002; Shea et al, 2002; Mojtabai et al, 2003; Barkow et al, 2004; Grilo et al, 2004; Veen et al, 2004; Culverhouse et al, 2005; Kessing, 2005a,b; McGlashan et al, 2005; Rufino et al, 2005; Schimmelmann et al, 2005). Most of these studies have focused on one diagnostic cluster, mainly psychoses (schizophrenia spectrum and mood psychoses; Schwartz et al, 2000; Mojtabai et al, 2003; Veen et al, 2004; Kessing, 2005b; Rufino et al, 2005; Schimmelmann et al, 2005) and personality disorders (Shea et al, 2002; Grilo et al, 2004; McGlashan et al, 2005). These studies usually have a small number of evaluations – two or three in most of them (Schwartz et al, 2000; Lieb et al, 2002; Barkow et al, 2004; Grilo et al, 2004; Schimmelmann et al, 2005) – and the follow-up period is usually under 3 years (Schwartz et al, 2000; Shea et al, 2002; Barkow et al, 2004; Grilo et al, 2004; Veen et al, 2004; McGlashan et al, 2005; Rufino et al, 2005; Schimmelmann et al, 2005) with a few exceptions (Tsuang et al, 1981; Lieb et al, 2002; Mojtabai et al, 2003; Culverhouse et al, 2005; Kessing, 2005a,b). Kessing (2005b) recently pointed out that no study has investigated the diagnostic stability of the most common ICD–10 psychiatric diagnoses given under ecological clinical conditions.

Other authors have reported rates of consistency much higher than those found in our study (Tsuang et al, 1981; Schwartz et al, 2000; Veen et al, 2004; Kessing, 2005b; Schimmelmann et al, 2005). However, most studies that have evaluated the stability of chronic psychiatric diagnoses had shorter follow-up periods than ours and focused on a single clinical setting (mainly the in-patient setting). Schwartz et al (2000) reported that the consistency of some diagnoses decreased as the follow-up period increased. For example, the retrospective consistency of schizophrenia was 73.1% when 6-month and 24-month diagnoses were compared, but fell to 55% (closer to the figure of 45.9% obtained across clinical settings in our study) when baseline and 24-month diagnoses were compared. However, the retrospective consistency of bipolar disorder remained high: 84.8% (6-month and 24-month diagnoses) and 73% (baseline and 24-month diagnoses). Compared with these data, the retrospective consistency of bipolar disorder across clinical settings in our study (38.1%) is strikingly low. The third measure of stability that we calculated (the percentage of patients who received the same diagnosis in at least 75% of the evaluations) may more accurately reflect the diagnostic process across successive evaluations, and it was also strikingly low in our study: for example, 23.1% for bipolar affective disorder and 12.7% for specific personality disorders, whereas schizophrenia (42.4%) and eating disorders (43.9%) were among the most stable diagnoses.

The very low consistency for the category ‘bipolar affective disorder, current episode mild or moderate depression’ may be explained by the fact that this diagnosis is inherently expected to change, since it describes an episode rather than a disorder. The use of semi-structured interviews might have enhanced reliability and therefore stability; Schwartz et al (2000), for example, used a structured interview, the Structured Clinical Interview for DSM–III–R, to make DSM–III–R psychiatric diagnoses.

Explanations and implications for clinicians and policy makers

There may be several explanations for the differences in diagnostic stability across clinical settings. First, it may be easier to diagnose a disorder correctly when symptom severity is at its highest, as in hospital admissions and emergency visits. We did not have data on illness severity; a similar study controlling for symptom severity would be of interest. Second, during hospitalisations, round-the-clock surveillance and symptom observation may increase the accuracy of diagnoses. In addition, during hospitalisations clinicians can more easily interview the patient's family, and there is more time for a thorough diagnostic assessment and for questioning about areas of functioning and symptoms. In the terms of Spitzer et al (1978), these differences in available information constitute information variance, which may partially explain the differences in diagnostic stability across clinical settings. Third, the follow-up period was much longer in the out-patient setting (1992–2004) than in the emergency and hospitalisation settings (2000–2004). Finally, the number of psychiatric contacts differed between settings (data not shown). Some authors have suggested that the causal relationship between diagnostic stability and the number of psychiatric contacts is unknown:

‘Patients who have many psychiatric contacts may present with more unstable psychiatric illness leading to more diagnostic variation. On the other hand, it may be that clinicians have problems with diagnosing some patients accurately and that this may lead to less effective treatment and more psychiatric contacts for these patients.’ (Kessing, 2005b).

It is surprising that diagnostic stability was higher in the emergency department setting than in the out-patient setting, because other authors (Segal et al, 1995; Rufino et al, 2005) have noted that psychiatric diagnoses assigned in an emergency department may be less accurate than diagnoses assigned in other settings. In emergency departments, time is usually limited, additional information from relatives is frequently unavailable and, in most cases, immediate intervention is needed (Segal et al, 1995; Rufino et al, 2005).

The temporal consistency of mental disorders in our study is lower than that found in other longitudinal studies. This relative lack of diagnostic stability is striking, given that there is likely to be a bias towards maintaining the same diagnosis over time: psychiatrists treating the patients in this study often had access to past records and diagnoses, and may have been inclined to keep the previous diagnosis rather than assign a different one. It should be noted that the view that mental disorders may not be discrete ‘disease entities’ but rather dimensions of continuous variation has gained currency (Kendell & Jablensky, 2003). The categorical approach to psychiatric diagnostic classification has been criticised in favour of other classification systems, such as symptom-cluster dimensions (Kendell & Jablensky, 2003). The possibility of alternative approaches to diagnosis also raises questions about the value of diagnostic stability as an indicator of diagnostic validity. Krishnan (2005) has recently stated that ‘the limits of the nominalist tradition have been reached’ and has suggested four criteria for defining disease: clinical symptoms; course and outcome; familial pattern; and treatment response.

The results of our investigation raise serious concerns about the validity of the results of epidemiological, clinical and pharmacological psychiatric research, particularly studies of chronic disorders whose short follow-up periods may not allow enough time to reach an accurate diagnosis, and studies that do not take the clinical setting into account. These findings underscore inherent weaknesses in our diagnostic system: the resulting instability of diagnoses may reflect limitations of the nosology and may lead to inappropriate treatment recommendations or interventions.

Future research

It is likely that psychiatric diagnostic categories require revision. This can only be determined definitively with a large-scale study using structured or semi-structured interviews. Such a project may be feasible, but we believe that it might not accurately reflect the conditions of psychiatric practice in the real world.

  • Received March 23, 2006.
  • Revision received August 18, 2006.
  • Accepted September 29, 2006.

References
