Medical School, University of Tampere and Psychiatric Clinic, Tampere University Hospital
Tampere School of Public Health, University of Tampere and Tampere University Hospital, Research Unit
Tampere School of Public Health, University of Tampere and Psychiatric Clinic, Tampere University Hospital, Tampere
Department of Psychiatry, University of Turku, Turku University Central Hospital and Turku Psychiatric Clinic, Turku, Finland
Correspondence: Dr Outi Poutanen, Department of Psychiatry, Medical School, FIN-33014 University of Tampere, Finland. Email: outi.poutanen{at}uta.fi
Funding from the Medical Research Fund of Tampere University Hospital.
Aims To study the ability of the Depression Scale and its items to recognise and predict a depressive episode.
Method A sample of patients attending primary care was examined in 1991–1992 and again 7 years later. The accuracy of the Depression Scale at baseline and at follow-up was tested against the Short Form of the Composite International Diagnostic Interview (CIDI-SF) diagnosis of depression at follow-up. The sensitivity and specificity of the Depression Scale and its items were assessed.
Results Both baseline and follow-up Depression Scale scores were consistent with the CIDI-SF diagnoses. It was possible to find single items efficient at both recognising and predicting depression.
Conclusions The Depression Scale is a useful screening instrument for depression, with both diagnostic and predictive validity.
Seven years later a follow-up study was conducted. The follow-up questionnaire could be posted to 413 participants (11 people had died, no address could be found for 6, and 6 others had attended psychiatric out-patient care and were excluded from subsequent analysis in the present primary care study). Of these, 299 returned the questionnaire, and 250 (57.3% of the baseline sample) were willing to take part in the telephone interview. Men (P=0.050) and married individuals (P=0.018) participated more frequently than women or unmarried individuals. The study protocol was approved by the Tampere University Hospital ethics committee and written informed consent was obtained from the participants.
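The 57.3% figure can be checked against the counts given above. The short Python sketch below assumes that the baseline sample consisted of the 413 people who could be posted the questionnaire plus the 23 people excluded (11 died, 6 without an address, 6 psychiatric out-patients); under that assumption it reproduces the reported percentage. The constant names are illustrative only.

```python
# Reconstructs the participant flow from the counts reported in the text.
# Assumption: baseline sample = people posted the follow-up questionnaire
# plus the three excluded groups (died, no address, psychiatric out-patients).

POSTED = 413
DIED = 11
NO_ADDRESS = 6
PSYCHIATRIC_OUTPATIENTS = 6
INTERVIEWED = 250


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


baseline_n = POSTED + DIED + NO_ADDRESS + PSYCHIATRIC_OUTPATIENTS
interview_rate = percent(INTERVIEWED, baseline_n)

print(f"baseline n = {baseline_n}, interviewed = {interview_rate}%")
# prints: baseline n = 436, interviewed = 57.3%
```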
Study procedure
The Depression Scale includes ten items, each with four response alternatives
scored 0–3: `not at all', `a little', `quite a lot' and `extremely'
(see Table 2). In the baseline
study the cut-off point for the screening sum score was >8.
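A minimal Python sketch of this scoring rule follows. It assumes that the sum score is the simple total of the ten item scores (so it ranges from 0 to 30) and that a respondent screens positive when the total strictly exceeds the cut-off; the function and constant names are hypothetical and not part of the published instrument.

```python
from typing import Sequence

# Response alternatives and their scores, as listed in the text.
RESPONSE_SCORES = {
    "not at all": 0,
    "a little": 1,
    "quite a lot": 2,
    "extremely": 3,
}
N_ITEMS = 10
BASELINE_CUTOFF = 8  # baseline screening rule: sum score > 8


def depression_scale_sum(responses: Sequence[str]) -> int:
    """Sum the ten item scores (assumed range 0-30).

    Raises ValueError if the number of responses is not ten and
    KeyError if a response is not one of the four alternatives.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(RESPONSE_SCORES[r] for r in responses)


def screen_positive(sum_score: int, cutoff: int = BASELINE_CUTOFF) -> bool:
    """Screen positive when the sum score strictly exceeds the cut-off."""
    return sum_score > cutoff


answers = ["a little"] * 9 + ["quite a lot"]
score = depression_scale_sum(answers)
print(score, screen_positive(score))  # 11 True
```

Because the rule is a strict inequality, a respondent scoring exactly 8 is screen negative at baseline.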
Table 2 Sensitivity and specificity of Depression Scale items at baseline and at follow-up compared with depression assessment with the Composite International Diagnostic Interview.
In the follow-up study participants again filled in the Depression Scale, the Michigan Alcoholism Screening Test (Selzer, 1971), parts of the Hopkins Symptom Checklist (Derogatis et al, 1974) and structured questions. To assess major depressive episode, 38 items from the Short Form of the Composite International Diagnostic Interview (CIDI-SF; World Health Organization, 1989; Kessler et al, 1998) were used in a telephone interview. The CIDI-SF questions concerning the occurrence of symptoms of a major depressive episode referred to the previous month. Three trained psychiatrists (A.M. and Drs Liisa Groth and Niko Seppälä), each with at least … years' experience in psychiatry, conducted the interviews, masked to the baseline PSE diagnoses.
Statistical methods
The accuracy of the Depression Scale as a screening instrument for
depression was assessed by receiver operating characteristic (ROC) curve
analyses. The follow-up Depression Scale score (DEPSF) was compared
with the CIDI-SF diagnosis of depression. The ability of the baseline
Depression Scale score (DEPSB) to predict the CIDI-SF diagnosis
at follow-up was also evaluated. In the ROC analyses, sensitivity, specificity and
areas under the curve were calculated. Sensitivity and specificity were
calculated for each reasonable cut-off point of the Depression Scale.
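The ROC logic can be illustrated with a short Python sketch. It computes sensitivity and specificity of the rule `score > cut-off' for every cut-off, and the area under the curve using the equivalent nonparametric (Mann–Whitney) formula: the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The data are made up for illustration, and this is not a reproduction of the SPSS procedure used in the study.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data, not from the study: sum scores and whether the
# criterion interview (e.g. the CIDI-SF) classified each person as depressed.
scores = [12, 15, 9, 20, 3, 6, 10, 2, 5, 8]
depressed = [True, True, True, True, False, False, False, False, False, False]


def sens_spec(scores: Sequence[int], depressed: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity of the screening rule 'score > cutoff'."""
    tp = sum(1 for s, d in zip(scores, depressed) if d and s > cutoff)
    fn = sum(1 for s, d in zip(scores, depressed) if d and s <= cutoff)
    tn = sum(1 for s, d in zip(scores, depressed) if not d and s <= cutoff)
    fp = sum(1 for s, d in zip(scores, depressed) if not d and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores: Sequence[int], depressed: Sequence[bool],
               max_score: int = 30) -> List[Tuple[int, float, float]]:
    """(cutoff, sensitivity, specificity) for every cut-off 0..max_score."""
    return [(c, *sens_spec(scores, depressed, c)) for c in range(max_score + 1)]


def auc(scores: Sequence[int], depressed: Sequence[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney form: the probability
    that a random case scores higher than a random non-case (ties = 1/2)."""
    cases = [s for s, d in zip(scores, depressed) if d]
    non_cases = [s for s, d in zip(scores, depressed) if not d]
    wins = 0.0
    for a in cases:
        for b in non_cases:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


for cutoff, sens, spec in roc_points(scores, depressed)[7:12]:
    print(f">{cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(f"AUC = {auc(scores, depressed):.3f}")
```

Raising the cut-off moves the operating point along the curve, trading sensitivity for specificity; the area under the curve summarises discrimination over all cut-offs at once.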
Fig. 1 Receiver operating characteristic curves: (a) Depression Scale score at follow-up v. Composite International Diagnostic Interview-Short Form (CIDI-SF) depression at follow-up; (b) Depression Scale score at baseline v. CIDI-SF depression at follow-up.
To identify an ideal pair of Depression Scale items for composing a short version of both DEPSB and DEPSF, sensitivity and specificity were calculated for every possible DEPSB and DEPSF item pair. A pair was considered positive when both of its items scored above 1. Only pairs with a sensitivity of at least 50% were regarded as relevant and reported.
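One reading of the pair procedure is sketched below in Python: a pair classifies a respondent as positive only when both of its items score above 1 (`quite a lot' or `extremely'), every one of the 45 possible pairs of the ten items is evaluated, and only pairs with sensitivity of at least 50% are kept. The function name, the dictionary output and the toy responses are assumptions for illustration; the function also assumes that both the depressed and non-depressed groups are non-empty.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

ITEM_THRESHOLD = 1  # an item counts as endorsed when it scores above 1


def pair_results(item_scores: Sequence[Sequence[int]],
                 depressed: Sequence[bool],
                 min_sensitivity: float = 0.5
                 ) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Sensitivity and specificity of every two-item rule.

    A pair (i, j) classifies a respondent as positive when both items
    score above ITEM_THRESHOLD. Items are numbered from 1 as in the scale.
    Only pairs with sensitivity >= min_sensitivity are returned.
    """
    n_items = len(item_scores[0])
    results = {}
    for i, j in combinations(range(n_items), 2):
        tp = fn = tn = fp = 0
        for row, d in zip(item_scores, depressed):
            positive = row[i] > ITEM_THRESHOLD and row[j] > ITEM_THRESHOLD
            if d and positive:
                tp += 1
            elif d:
                fn += 1
            elif positive:
                fp += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if sensitivity >= min_sensitivity:
            results[(i + 1, j + 1)] = (sensitivity, specificity)
    return results


# Hypothetical responses to the ten items (0-3), not study data.
items = [
    [0, 2, 3, 2, 0, 2, 0, 0, 1, 0],  # depressed
    [1, 3, 2, 1, 0, 3, 0, 0, 2, 0],  # depressed
    [0, 1, 2, 0, 0, 0, 0, 0, 0, 0],  # not depressed
    [0, 0, 1, 0, 0, 2, 0, 0, 0, 0],  # not depressed
]
depressed = [True, True, False, False]

for pair, (sens, spec) in sorted(pair_results(items, depressed).items(),
                                 key=lambda kv: kv[1], reverse=True):
    print(pair, f"sensitivity {sens:.2f}", f"specificity {spec:.2f}")
```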
Analyses were performed using the Statistical Package for the Social Sciences version 11.5 for Windows; P<0.05 was considered statistically significant.
Table 1 Sensitivity and specificity of different Depression Scale cut-off points
Depression Scale items v. CIDI-SF
The three most sensitive DEPSF items were 3 (`I have felt everything
was an effort'), 6 (`I have felt hopeless about the future') and 4 (`I have
felt low energy or slowed down'), and the most specific items were 8 (`I have
had feelings of worthlessness'), 5 (`I have felt lonely') and 9 (`I have felt
all pleasure and joy has gone from life')
(Table 2). In the case of
DEPSB, item 3 had a high sensitivity whereas items 9, 5, 8 and 10 (`I
felt that I cannot shake off the blues even with help from family and
friends') had reasonably high specificity. Item 3 was fairly sensitive in
both analyses, that is, for both recognising and predicting CIDI-SF
depression.
In logistic regression analyses, DEPSF items 3 and 6 were significantly associated with CIDI-SF depression, whereas DEPSB items 1 (`I have suffered from insomnia'), 3 and 9 significantly predicted the occurrence of subsequent CIDI-SF depression (Table 3).
Table 3 Depression Scale items at baseline and at follow-up from logistic regression analyses significantly associated with depression at follow-up assessment.
Best Depression Scale item pairs v. CIDI-SF
Sensitivity and specificity were calculated for every possible pair of
Depression Scale items to ascertain which two items had the best balance of
recognition and prediction. Only the pairs with sensitivity of at least 50%
are reported (Table 4). The
three best pairs for recognition were items 3 and 6, items 3 and 4, and items
4 and 6, whereas the best pairs for prediction were items 2 (`I have felt
blue') and 3, items 3 and 4, and items 3 and 9.
Table 4 Sensitivity and specificity of Depression Scale item pairs at baseline and at follow-up compared with depression at follow-up assessment
Sensitivity and specificity
The first validation of the Depression Scale was reported in an earlier
study, in which the cut-off point for depression was >8
(Salokangas et al,
1995). In the baseline validation study, using the PSE as the
criterion, the sensitivity of the Depression Scale for clinical depression was
74% and the specificity for non-depression 85%. For severe depression the
figures were 84% and 93%. In the present study both sensitivity and
specificity were higher than in the earlier validation study. The baseline
validation analyses took the sampling ratio into account, but the present
study, which was mainly intended to ascertain the ability of the scale to
predict an episode of depression and to evaluate its individual items, did
not. This may partly explain the differences in sensitivity and specificity
between the baseline validation analyses and these follow-up analyses. There
are also differences
in the validity criterion between the two diagnostic instruments. The PSE is
based on symptoms, and the CIDI is based on syndromes
(Lowe et al, 2004).
With the CIDI-SF the definition of depression was clearer because there
were only two categories: depressive and non-depressive. It should also be
kept in mind that the PSE interviews at baseline were held face-to-face,
whereas the CIDI-SF interviews at follow-up were conducted by telephone.
A telephone interview relies more on the interviewee's own assessment, and is
therefore closer to a self-rating instrument such as the Depression Scale. The
same CIDI-SF items were used as in a previous Finnish depression study
(Isometsa et al, 1997;
Lindeman et al, 2000),
which used the computer-assisted telephone interview method.
According to Lowe et al (2004), the sensitivity of a screening questionnaire should exceed its specificity and be as high as possible, and the specificity should be at least 75%. In this study the cut-off point >11, with a sensitivity of 90.5% and a specificity of 86.8%, meets these criteria and could be ideal.
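The criteria attributed to Lowe et al (2004) can be written as a simple selection rule. The Python sketch below is an assumption-laden illustration: among candidate cut-offs it keeps those whose sensitivity exceeds specificity and whose specificity is at least 75%, then returns the most sensitive. The only numbers used are the two cut-off >11 figures quoted in this Discussion; the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

MIN_SPECIFICITY = 0.75

Row = Tuple[int, float, float]  # (cutoff, sensitivity, specificity)


def meets_lowe_criteria(sensitivity: float, specificity: float,
                        min_specificity: float = MIN_SPECIFICITY) -> bool:
    """Sensitivity above specificity, and specificity at least the minimum."""
    return sensitivity > specificity and specificity >= min_specificity


def best_cutoff(rows: Iterable[Row]) -> Optional[Row]:
    """Among rows meeting the criteria, return the one with the highest
    sensitivity, or None if no row qualifies."""
    qualifying = [r for r in rows if meets_lowe_criteria(r[1], r[2])]
    return max(qualifying, key=lambda r: r[1], default=None)


# Figures quoted in the text for the cut-off >11.
recognition = (11, 0.905, 0.868)  # follow-up score v. follow-up CIDI-SF
prediction = (11, 0.864, 0.625)   # baseline score v. follow-up CIDI-SF

print(meets_lowe_criteria(*recognition[1:]))   # True
print(meets_lowe_criteria(*prediction[1:]))    # False
print(best_cutoff([recognition, prediction]))  # (11, 0.905, 0.868)
```

In practice the rows would come from a full table of cut-offs such as Table 1; the predictive figures fail only because their specificity is below 75%.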
When the ability of the Depression Scale to predict an episode of depression was analysed, the area under the curve was 0.803. An earlier study of primary care patients (Salokangas et al, 1994) showed that the rate of clinical depression in people with a Depression Scale score above 12 was about 47%, and in those with a score above 15 about 57%. These percentages are high enough to have some clinical value. In this study, with a cut-off point of >11, sensitivity was 86.4% but specificity only 62.5%. When an instrument is used as a predictor it is perhaps more important to avoid false positives, so as not to stigmatise patients; this justifies a higher cut-off point.
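The rates of clinical depression among people above a given score are, in effect, positive predictive values, which depend on prevalence as well as on sensitivity and specificity. The Python sketch below applies Bayes' theorem to show why low specificity produces many false positives. The 10% prevalence and the 0.90 specificity are hypothetical values chosen only for illustration; the 86.4% and 62.5% figures are the predictive ones quoted above.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """P(depressed | screen positive) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.10  # hypothetical, chosen only for illustration

# Sensitivity and specificity at cut-off >11 for prediction (from the text).
low_spec = positive_predictive_value(0.864, 0.625, PREVALENCE)
# Hypothetical: same sensitivity with a specificity of 0.90.
high_spec = positive_predictive_value(0.864, 0.90, PREVALENCE)

print(f"PPV with specificity 0.625: {low_spec:.2f}")
print(f"PPV with specificity 0.90:  {high_spec:.2f}")
```

Holding sensitivity fixed is a simplification: in practice a higher cut-off raises specificity at the cost of some sensitivity, but the sketch shows why fewer false positives make a positive screen more informative.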
What did the Depression Scale actually assess?
In a study of general practice patients
(Williamson et al,
2005), four mental health self-report scales and a composite of
those four were assessed to determine their accuracy in predicting psychiatric
caseness for depression, dysthymia, generalised anxiety disorder, social
phobia, agoraphobia and panic attack. One scale measuring neuroticism,
the Neuroticism Scale of the Eysenck Personality Questionnaire (EPQ-N;
Eysenck et al, 1985),
and a composite of all four scales were found to be very strong and
accurate predictors of psychiatric caseness, but they were unable to
differentiate between specific disorders. In our study only episodes of
depression, not other psychiatric diagnoses, were assessed.
In an extensive follow-up study (Tyrer et al, 2004) the quick-to-use HADS was good at recognising both depression and anxiety, and was better than any other single measure at predicting the outcome of both anxiety and depressive disorders after an interval of 12 years. The Montgomery–Åsberg Depression Rating Scale did not show comparable predictive ability.
When the Depression Scale and two common self-rating instruments (the BDI and the HADS) are compared, they differ in many ways. The Depression Scale concentrates on the previous month, whereas the BDI concentrates on the previous week (the BDI-II on the past 2 weeks; Beck et al, 1996) and the HADS on current feelings. Of the criterion standards used in this study, both the PSE and the CIDI-SF refer to the previous month. However, it is difficult to say how much these differences in time frame matter in practice.
The Depression Scale is the shortest of the three instruments, and the BDI is the longest. The formulation of the items is different: the most evident difference is that the Depression Scale gives exactly the same short-answer alternatives for all ten items, whereas there are several different sets of alternative answers in both the BDI and the HADS. This makes the Depression Scale very quick and easy to use, and increases adherence.
The BDI includes most of the Depression Scale topics; only the topics of items 5 (loneliness), 7 (no fun) and 10 (not helped even with family and friends) are missing from the BDI. Depression Scale item 5 was specific in recognising depression and item 10 specific in predicting it. However, the BDI covers the symptoms of depression more comprehensively than the Depression Scale. The HADS covers both depression and anxiety but lacks most of the Depression Scale topics (items 1, 2, 3, 5, 8 and 10), and the symptoms it covers are less severe than those in the BDI or the Depression Scale. The topics common to all three self-rating instruments are Depression Scale items 4 (low energy), 6 (hopelessness) and 9 (lost pleasure and joy). These topics probably relate to the core of depressive symptomatology; the other topics may be consequences of the core symptoms and less specific to depression.
Depression Scale items 3 and 4 were good at both recognising and predicting depression. Item 3 (`I have felt everything was an effort') suggests reduced energy, which is one of the main symptoms of depression according to the ICD-10. Item 6 was good for recognition even though its wording refers to the future (`I have felt hopeless about the future'); hopelessness is also a symptom of depression in the ICD-10. Item 9 was good at predicting depression. The wording of item 9 (`I have felt all pleasure and joy has gone from life') refers to something that has already happened, possibly something endured as beyond help. Item pair 2 and 3 was the best at predicting depression. The wording of item 2 (`I have felt blue') may be experienced as persistent low mood, referring to a more chronic state; it is almost the same as lowering of mood, one of the main symptoms of depression in the ICD-10. The best two-item combination, and a possible quick version of the scale, was items 3 and 6 for recognising depression and items 2 and 3 for predicting it.
The use of psychometric scales is problematic in general. Among people who appear healthy according to standard mental health scales, it is possible to identify a subgroup who may not be psychologically healthy at all: such scales may assess not mental health but defensive denial (Shedler et al, 1993). Moreover, any scale that is valid for assessing current depression will have some long-term predictive ability because depression is recurrent. Nevertheless, a scale with predictive ability is able to capture not just reactive, short-term symptoms but also more chronic or recurrent core features of the disorder.
Limitations and strengths of the study
It is a limitation of the study that the interviews were held by telephone.
However, the CIDI-SF telephone interviews were conducted with care and
by experienced psychiatrists. Some information about the mental state of these
patients during the follow-up period was gathered, but it was self-reported
and possibly less reliable, so we decided not to use it in this
study. This was not a follow-up study in the strictest sense: assessments
were made only twice, at baseline and 7 years later. Thus, the mental
state of the participants during the intervening period is unknown, which
slightly decreases the credibility of the study. A strength of the study is
that the sample was fairly large and that it followed up a wide range
of primary care patients.
Implications
The Depression Scale is not only an easy-to-use screening instrument, it
also appears to be a reasonably good predictor for a depressive episode years
ahead. It seems to work well with patients who have vague psychiatric
symptoms, as is often the case in primary healthcare. Some of its items have a
better ability to recognise or to predict depression than others; this
suggests the possibility of creating an even shorter version of this
scale.