Background Most studies of the recognition of depression in primary care have used a categorical definition of depression. This may overstate the extent of the problem.

Aims Our objective was to investigate the relationship between severity and recognition of depression, and its modification by patient and practitioner characteristics.

Method An association study in multiple consecutive adult cohorts of 18 414 primary care consultations drawn from a representative sample of 156 general practitioners in Hampshire, UK.

Results There was a curvilinear relationship between the severity of depression and practitioners' ratings of depression. One case of probable depression was missed in every 28.6 consultations. Anxiety and unemployment altered the chances of recognition, but age, gender and deprivation scores did not.

Conclusions A dimensional approach to severity of depression shows that general practitioners may be better able to recognise depression than previous categorical studies have suggested. Efforts to improve the care of depression should therefore focus on doctors who have been shown to have difficulty making the diagnosis and on improving the treatment of identified patients.

Depression, with or without anxiety, is the most prevalent form of mental disorder in primary care (Goldberg & Lecrubier, 1995) but it is often unrecognised by general practitioners (GPs). Most studies report detection rates between 30 and 40% (range 7-70%: Docherty, 1997). Higher rates have been reported in women, the middle aged, the unemployed, and those with more severe disorders or comorbid anxiety, while physical symptoms impede recognition (Marks et al, 1979; Bridges & Goldberg, 1985; Boardman, 1987; Von Korff et al, 1987; Ormel et al, 1990; Kirmayer et al, 1993; Coyne et al, 1995; Dowrick, 1995; Simon & von Korff, 1995; Sartorius et al, 1996; Tiemens et al, 1996; Odell et al, 1997; Ronalds et al, 1997). Almost all of these studies have adopted a categorical definition of depression such as major depressive disorder (DSM-IV; American Psychiatric Association, 1994), or depressive episode (ICD-10; World Health Organization, 1992) or have dichotomised relatively small samples into those above or below a threshold on a severity rating scale. Although this has the advantage of simplicity it may also introduce bias since, in primary care, minor psychiatric morbidity seldom separates out into discrete diagnostic entities (Goldberg & Huxley, 1992). It may be better conceptualised as a continuum in which anxiety and depression behave as highly correlated aspects of the same disorder. We have therefore re-examined the evidence for poor recognition of depression by GPs, assessing severity as a continuum, and have explored the effects of age, gender, concomitant anxiety and socio-economic status on the relationship between recognition and severity.
This approach requires large sample sizes such as that accumulated during the Hampshire Depression Project (Thompson et al, 2000), in which an educational intervention failed to produce a significant effect on recognition, allowing us to aggregate the subjects into a single large multiple consecutive sample of general practice attenders.



The Hampshire Depression Project was a randomised controlled trial to investigate the effects of a clinical practice guideline and practice-based education on the recognition and outcome of depression. Full details of the methods have been published elsewhere (Thompson et al, 2000). Since no effect of the intervention on detection was seen at any stage of the study, patient contacts from all four screening phases have been combined for this study.

All general practices in Hampshire (n=224) were invited to take part. One hundred and fifty-two GPs in 55 practices completed the study and, although they were to some degree self-selected, were shown to be representative of Hampshire in their list size, the number of principals, patients per principal, the proportion of women and the proportion of part-time partners. These 152 GPs and their attending patients formed the sample for this study.


Self-ratings of depression and anxiety

In each of four screening phases over 2 years researchers distributed the Hospital Anxiety and Depression (HAD) scale to consecutive attenders aged 16 years and above in the practice waiting room during routine surgeries. This continued until at least 30 patients had been screened per GP in multi-partner practices and 40 for sole partner GPs. The HAD scale is a self-administered rating scale with 14 questions yielding separate scores for anxiety and depression (Zigmond & Snaith, 1983). It has been validated as a screening tool in general practice (Wilkinson & Barczak, 1988), and the sub-scales appear to provide a valid measure of the severity of mood disorders in primary care (Upadhyaya & Stanley, 1993). A score ≥8 on the HAD depression sub-scale (HAD-D) is the conventional threshold for identifying ‘possible depression’. GPs' age, gender, qualifications and working time were ascertained at recruitment to the study.

Patients were also asked to record their gender, date of birth and employment status.

GP ratings

Blind to the result of the HAD scale, practitioners completed a four-point rating of depression for each patient. Ratings were: 0, no depression detected; 1, sub-clinical emotional disturbance; 2, clinically significant depressive illness — mild; 3, clinically significant depressive illness — moderate or severe.

In the original study the recognition of depression was defined as the proportion of patients with a score ≥8 on the HAD-D sub-scale who were scored ≥2 on the GP scale. In this study an analysis of recognition rates for each HAD-D score was carried out and the effects on recognition and false positive rates of varying the threshold were explored.
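As an illustration of these definitions, the following minimal Python sketch computes the recognition rate and the false positive rate from paired scores at any chosen thresholds. The function names and the example pairs are hypothetical and are not study data.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (HAD-D score 0-21, GP rating 0-3)

def recognition_rate(pairs: List[Pair], had_threshold: int = 8,
                     gp_threshold: int = 2) -> float:
    """Share of screen-positive patients whom the GP rated at or above gp_threshold."""
    cases = [(h, g) for h, g in pairs if h >= had_threshold]
    if not cases:
        return float("nan")
    return sum(1 for _, g in cases if g >= gp_threshold) / len(cases)

def false_positive_rate(pairs: List[Pair], had_threshold: int = 8,
                        gp_threshold: int = 2) -> float:
    """Share of screen-negative patients whom the GP rated at or above gp_threshold."""
    non_cases = [(h, g) for h, g in pairs if h < had_threshold]
    if not non_cases:
        return float("nan")
    return sum(1 for _, g in non_cases if g >= gp_threshold) / len(non_cases)

# Hypothetical pairs for illustration only.
example = [(3, 0), (9, 1), (8, 2), (12, 3), (15, 2), (5, 2), (10, 0), (2, 0)]

print(recognition_rate(example))                    # threshold >=8
print(recognition_rate(example, had_threshold=11))  # threshold >=11
print(false_positive_rate(example))
```

Raising `had_threshold` in this sketch mirrors the sensitivity analysis described above: the same GP ratings are scored against a stricter definition of a case.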


Each attender was eligible to take part once during each of four phases of the trial. Acceptance by those approached was 89%, and 20 832 attenders were screened. Patients who attended in more than one phase had their second and subsequent attendances removed. Analysis was carried out on 18 414 consultations by unique patients (85.4%).

Socio-economic status

Underprivileged area (Jarman, 1983) scores were allocated to practices according to the electoral ward of the surgery address. The score has been shown to account for almost half the variance in the prevalence of depressive symptoms between practices in this study population (Ostler et al, 2001).


Practitioner characteristics

The average GP case recognition rate at a standard HAD scale threshold of 7/8 will be around 30-40%, consistent with previous literature. GP characteristics such as gender and length of time in practice will influence the recognition of depressive symptoms.

Patient characteristics

More severe depressive symptoms will be recognised more frequently. A sensitivity analysis will identify the effect on recognition rates of changing the threshold for case definition. Low anxiety scores, male gender and increasing age will reduce recognition rates independently of depression scores.

Socio-economic setting

Underprivileged area score might be expected to influence recognition rate in two contrasting ways. Taking the GP's interview with the patient as a diagnostic test, their performance might be expected to be more sensitive, but less specific, in high deprivation areas where prevalence of depression is higher (Kraemer, 1988). Alternatively, GPs in high deprivation areas might more often attribute depressive symptoms to social conditions rather than illness — thus reducing recognition rates. The relative effect of these two contrasting influences is unknown.

Statistical analysis

Diagnostic sensitivity (Boardman, 1987; Goldberg & Huxley, 1992) was defined for each GP as the proportion of patients with an HAD-D ≥8 who were rated as scoring 2 or 3 by the GP. For each value of the HAD-D score from 0 to 21, the proportion of patients with a GP rating of 2 or 3 was calculated. Logistic regression was used to model the data, generating equations in the form

logit(p) = c + (b × HAD-D)

where p is the probability of a positive GP score (≥2), c is the constant and b is the coefficient for the HAD-D score.
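For readers less familiar with the logit scale, the sketch below converts an equation of this form back into a probability. The coefficient values are placeholders chosen for illustration only; the fitted values from this study are not reproduced here.

```python
import math

def recognition_probability(had_d: int, c: float, b: float) -> float:
    """Invert logit(p) = c + b * HAD-D to obtain p."""
    return 1.0 / (1.0 + math.exp(-(c + b * had_d)))

# Hypothetical coefficients, NOT estimates from the study.
C_HYP, B_HYP = -3.5, 0.3

for score in (0, 8, 11, 21):
    print(score, round(recognition_probability(score, C_HYP, B_HYP), 3))
```

With a positive b, the modelled probability of a positive GP rating rises with each additional HAD-D point and stays strictly between 0 and 1.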

Recognition (or sensitivity) curves were plotted for patients grouped by age (16-64 v. 65+ years), gender, HAD-A sub-scale score (0-10 v. 11-21) and underprivileged area score (<-10, -10 to +10, >+10). These thresholds were adopted prior to analysis. Finally, multiple logistic regression was used to examine the effect of controlling for the severity of depression and anxiety on the recognition of screened cases by gender, age and occupational group of patients.


Sample characteristics

Characteristics of the aggregated patient sample are given in Table 1. The prevalence of possible depression was 19.9% in this sample, ranging from 12.4% in students to 44% in those permanently unable to work. The median number of patients screened per GP over the four phases was 123 (interquartile range 32.75), with only four GPs contributing fewer than 50 patients.

Table 1

Characteristics of the patient sample

Practitioner characteristics and recognition rates

The mean recognition rate (sensitivity) across all practitioners was 36.1% (95% CI 33.8-38.4) with specificity 91.5% (95% CI 90.6-92.5) and κ=0.31 (0.28-0.33), consistent with previous studies.
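The three summary figures can be checked against one another. The sketch below rebuilds an approximate two-by-two table from the reported sensitivity, specificity and the sample prevalence of possible depression (19.9%), then computes Cohen's κ. It assumes that pooled proportions behave like the per-practitioner means, which is only approximately true, so it is a consistency check and not a re-analysis.

```python
def kappa_from_summary(prevalence: float, sensitivity: float,
                       specificity: float) -> float:
    """Cohen's kappa for a 2x2 table implied by prevalence, sensitivity and specificity."""
    tp = prevalence * sensitivity
    fp = (1 - prevalence) * (1 - specificity)
    tn = (1 - prevalence) * specificity
    observed = tp + tn                      # observed agreement
    gp_positive = tp + fp                   # share of patients rated depressed
    expected = (prevalence * gp_positive
                + (1 - prevalence) * (1 - gp_positive))  # chance agreement
    return (observed - expected) / (1 - expected)

kappa = kappa_from_summary(0.199, 0.361, 0.915)
print(round(kappa, 2))  # 0.31
```

Under these assumptions the implied proportion of patients rated as depressed is about 14%, close to the mean of 13.6% reported in the next paragraph.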

Practitioners rated a mean of 13.6% (s.d.=6.9%) patients as being depressed with extremes of 0/95 (0%) and 28/73 (38.4%). Median diagnostic sensitivity was 0.67 (interquartile range 0.38), only slightly lower than the figure of 0.78 quoted by Goldberg et al (1982) for the recognition of psychological morbidity by British GPs.

Part-time practitioners were less likely to diagnose depression (median recognition rates 0.57) than full-time GPs (median 0.70, Mann-Whitney U-test, P=0.033) and they also tended to work in less deprived areas (mean underprivileged area score difference -9.00, t-test P=0.004). There was no significant effect on recognition of GPs' gender (Mann-Whitney U-test P=0.67), length of time working in general practice (rs=0.111, P=0.172), prevalence rate in the patient sample (rs=-0.101, P=0.216) or underprivileged area score (rs=-0.099, P=0.221).

Depression severity and recognition rate

Figure 1 shows the relationship between severity of depressive symptoms and recognition rates. Apart from the very high scores at 19-21, where there are few cases, there is a strong relationship between HAD-D sub-scale score and recognition.

Fig. 1

Distribution of Hospital Anxiety and Depression (HAD) scale depression scores (Zigmond & Snaith, 1983) and probability of being rated as depressed.

Using the conventional analysis of the number of ‘cases’ that are ‘missed’, the performance of the GPs in this study was similar to other reports (Docherty, 1997). At the threshold of eight or above 64.7% of cases were missed. However, the dimensional approach to the data demonstrates that at progressively higher scores the lower prevalence and the increasing recognition rate make this simple analysis misleading. This is because the proportion of missed cases drops markedly with small increments of the threshold score and the total number of cases also falls. Thus, 72.6% of all ‘missed cases’ scored 8-10 (‘mild’ or ‘doubtful’ depression).

This dimensional approach shows the critical effect of the choice of threshold for defining the ‘case’ of depression. It can be illustrated by examining the effect of a progressive rise, by a single point at a time, in the threshold for case definition. Table 2 shows that the proportion of missed cases diminishes as the threshold increases, which is not surprising, although the rate at which it diminishes may be. In addition (bearing in mind the difficulty of identifying a ‘gold standard’ diagnosis for depression), wherever the threshold is set, 30-50% of all missed cases lie only one point above that threshold.

Table 2

Effect of threshold on the proportion of missed cases, and the missed cases as a proportion of the total screened group
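The kind of threshold analysis summarised in Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the output fields and the example pairs are hypothetical, and "one point above the threshold" is interpreted here as the lowest score that still counts as a case (for example, a score of 8 at a 7/8 threshold).

```python
from typing import Dict, Iterable, List, Tuple

def threshold_sweep(pairs: List[Tuple[int, int]],
                    thresholds: Iterable[int] = range(8, 12),
                    gp_threshold: int = 2) -> List[Dict[str, float]]:
    """For each case threshold, summarise how many cases the GP rating missed."""
    nan = float("nan")
    total = len(pairs)
    rows = []
    for t in thresholds:
        cases = [(h, g) for h, g in pairs if h >= t]
        missed = [(h, g) for h, g in cases if g < gp_threshold]
        rows.append({
            "threshold": t,
            "cases": len(cases),
            # missed cases as a share of all cases at this threshold
            "missed_share_of_cases": len(missed) / len(cases) if cases else nan,
            # missed cases as a share of everyone screened
            "missed_share_of_screened": len(missed) / total if total else nan,
            # share of missed cases sitting at the lowest qualifying score
            "missed_at_lowest_score": (sum(1 for h, _ in missed if h == t) / len(missed)
                                       if missed else nan),
        })
    return rows

# Hypothetical (HAD-D score, GP rating) pairs, not study data.
example = [(8, 0), (8, 1), (9, 0), (9, 2), (10, 2),
           (11, 1), (12, 3), (14, 3), (3, 0), (5, 0)]

for row in threshold_sweep(example, thresholds=(8, 11)):
    print(row)
```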

The results of recognition rate studies are usually presented as the proportion of ‘true’ cases that are missed. However, a better indicator of the acceptability of practice would set the denominator as the total consultations, that is, 18 414, rather than the number of ‘cases’. On this analysis 12.9% of all consultations contain a failure to identify a ‘possible or doubtful case’ (33.6% of which are one point above that threshold at the time of interview). Some 3.5% of consultations contain a failure to identify a ‘probable case’ (of which 34.7% are one point above that threshold at the time of interview). Thus at this more robust, higher threshold one patient with a probable depression is missed every 28.6 consultations without allowing for error in the questionnaire.
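The arithmetic behind these figures can be reproduced from numbers given in the text: the prevalence of possible depression (19.9%), the proportion of such cases missed (64.7%) and the share of consultations containing a missed probable case (3.5%). The helper name below is hypothetical.

```python
def consultations_per_missed_case(missed_share: float) -> float:
    """Convert a share of consultations with a missed case into 'one in N'."""
    return 1.0 / missed_share

# Possible or doubtful cases: prevalence multiplied by the proportion missed.
possible_missed = 0.199 * 0.647
print(round(100 * possible_missed, 1))                  # 12.9 (% of consultations)

# Probable cases: 3.5% of consultations contain a missed case.
print(round(consultations_per_missed_case(0.035), 1))   # 28.6
```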

GPs may recognise depressive symptoms but categorise them as sub-clinical emotional symptoms (score one on their questionnaire), a clinical judgement that may be appropriate at borderline levels of severity. Of those patients with probable depression 75.9% were rated by the GP as having some emotional disturbance (score one or above). Figure 2 shows the effect of different recognition thresholds on the relationship between recognition and severity. Using a score of one as the criterion, the frequency with which a case of probable depression is missed falls from one in every 28.6 consultations to one in every 58.

Fig. 2

Recognition and severity of depression. HAD-D, Hospital Anxiety and Depression scale depression sub-scale score (Zigmond & Snaith, 1983).

Factors affecting the relationship between recognition and severity

Figure 3 shows that patients with higher anxiety scores were more likely to be recognised as depressed at all levels of depression severity. There was a moderate correlation between HAD scale anxiety and depression scores (r=0.599, P<0.0005). The effect of adjusting for severity of depression and anxiety on the recognition of screened cases is shown in Table 3. Before adjusting for severity, women, the unemployed and those who were permanently unable to work were significantly more likely to be recognised, while the elderly and retired patients were more likely to be missed. Adjusting for depression severity eliminated the significance of being permanently unable to work so their higher rate of recognition is explained by more severe symptoms. Adjustment for both anxiety and depression scores eliminated the significance of gender, age and retirement status. Thus, after adjusting for the severity of depression and anxiety symptoms the only remaining bias was an increased sensitivity to depression among the unemployed and those temporarily away from work, possibly mediated by prior knowledge of treated depression.

Fig. 3

Recognition of depression: effect of anxiety. HAD-D, Hospital Anxiety and Depression scale depression sub-scale (Zigmond & Snaith, 1983).

Table 3

Effect of controlling for severity of depression and anxiety on recognition of screened cases by gender, age and occupational group. Cases defined as HAD depression score ≥8; recognition defined as general practitioner (GP) rating of depression ≥2 (clinically significant depressive illness: mild, moderate or severe).


Methodological considerations

The practices in this sample were 24% of all those available in Hampshire and were representative of the whole group in terms of organisation and personnel. Although the participants may be assumed to have a greater interest in depression than the non-participants this effect must also have operated in previous studies and indeed these GPs fared no better than their colleagues who were studied in previous reports when we used a standard dichotomous analysis.

The study might be criticised for not employing a diagnosis of depression based on a research interview against which to judge practitioners' skills rather than a self-rating questionnaire. Such an approach would have appropriately eliminated some patients with depressive symptoms whose primary diagnosis was not depression and would have set a longer duration of symptoms for identification than the HAD scale response period. We discuss below the value of the dimensional approach, but in addition the size of this study rendered full diagnostic procedures impractical. They would also have introduced observer bias in the interpretation of the depressive symptoms that is not present when dealing with self-ratings. The patients included only those who were ambulatory and able to attend the doctors' practice premises, excluding potentially depressed patients among the chronically ill and disabled group, who were therefore under-represented.

Findings in the context of the previous literature

The recognition of depression by GPs has been a subject of investigation in many studies, all of which have suggested low true positive rates of identification. Some of these have previously shown that recognition is dependent on severity (Coyne et al, 1995; Ormel & Tiemens, 1995; Dowrick, 1995), a conclusion with which we concur. We disagree, however, with previous research suggesting that there is better recognition of depression in women and the middle-aged, and poorer recognition in the elderly (Boardman, 1987; Katona et al, 1995). These effects may have been significantly confounded by severity of illness and after allowing for this we have shown that diagnosis is based on symptoms rather than stereotypes. It is also reassuring that practitioners working in deprived areas were not biased against a diagnosis of depression owing to reduced expectations of patients' quality of life. Indeed they appear to be somewhat over-sensitive to the possibility of depression in patients who were currently unemployed.

Our findings, however, go further than the prior literature in two ways. First, we calculate non-recognition rates by reference to the consulting population, rather than as a proportion of the number of patients with depression. Second, in dimensional conditions such as depression, apparently low rates of diagnosis can be produced by adopting a low score as a case threshold. We have shown that the choice of threshold critically affects the recognition rate because of the diminishing prevalence of higher scores combined with increasing recognition, thus explaining the wide variations of previous estimates. Furthermore, some of the one-third of missed cases that lie only just above any given threshold on the HAD-D sub-scale may be true negatives since all questionnaires and diagnostic procedures have rating errors. In addition, these recognition rates are obtained from a single 9-minute consultation, and the GP's prior knowledge of the patient. They must be taken together with evidence that many ‘missed’ patients are diagnosed correctly at a subsequent visit (Ormel & Tiemens, 1995).

Implications for practice and education

One criticism of our study might be the absence of a ‘gold standard’ diagnostic criterion for depression. We believe this criticism would be hard to sustain because psychiatric diagnostic categories are rarely used routinely by GPs despite their need to make some dichotomous decisions, for example whether or not to treat with anti-depressants. This may be because of the absence of clear validity data for DSM and ICD syndromes in primary care, a category such as major depressive disorder having no dichotomous relationship to clinical disability, need for treatment or level of risk. In such a situation a dimensional approach has greater epidemiological validity than a categorical one since it makes fewer assumptions. The importance of this dimensional view of depression is strengthened by the evidence that milder, so-called sub-syndromal symptoms are very common and are associated with considerable health and social problems (Judd et al, 1996).

An appropriate criterion for a gold standard definition would be validated by reference to evidence of treatment benefit and this may vary from one treatment to another depending on the definitions of treatment and of acceptable benefit. If one were to become available in the future it would make a dichotomous approach more tenable since it would demonstrate a tangibly impaired access to effective treatment as a result of missed diagnoses. Even in the absence of this evidence, however, it is reasonably safe to assume from our findings that increasing the sensitivity of GPs to depression through educational interventions will also increase the false positive diagnostic rate of some hypothetically valid depressive entity — with the consequent dangers of unnecessary treatment. Since the size of the non-depressed population is larger than that of the depressed group any shift of the recognition point to the left will lead to a greater increase in the numbers of non-depressed unnecessarily treated than in the numbers of patients with depression correctly treated. In this regard, the difference in the recognition curve between ‘sub-clinical emotional disturbance’ and ‘ clinically significant depression’ demonstrates that GPs are using their clinical judgement in recognising emotional disturbance that they believe does not require medical intervention. These results should also be placed in the broader practice perspective. Patients often present multiple ill-defined complaints and GPs rarely address mental health in isolation from other problems. Indeed, focusing on mild depressive symptoms has an opportunity cost, leaving less time for possibly more pressing demands in the relatively short time of the consultation (Klinkman, 1997). There is little evidence for the efficacy of intervention in milder depressive syndromes, and many resolve spontaneously so GPs may reasonably judge that diagnosis is not critical in these borderline states (Paykel & Priest, 1992).
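The base-rate argument in this paragraph can be made concrete with a rough worked example per 1000 consultations. The starting values are the prevalence, sensitivity and specificity reported in the Results; the size of the shift (sensitivity up by 10 percentage points, specificity down by 10) is purely an assumption for illustration and is not an estimate from this study.

```python
def counts_per_1000(prevalence: float, sensitivity: float,
                    specificity: float, n: int = 1000):
    """Return (true positives, false positives) expected among n consultations."""
    depressed = prevalence * n
    non_depressed = n - depressed
    true_pos = sensitivity * depressed
    false_pos = (1 - specificity) * non_depressed
    return true_pos, false_pos

# Reported values: prevalence 19.9%, sensitivity 36.1%, specificity 91.5%.
tp_before, fp_before = counts_per_1000(0.199, 0.361, 0.915)

# Hypothetical shift of the recognition point (assumed, not observed).
tp_after, fp_after = counts_per_1000(0.199, 0.461, 0.815)

extra_tp = tp_after - tp_before
extra_fp = fp_after - fp_before
print(round(extra_tp, 1), round(extra_fp, 1))
```

Because non-depressed attenders outnumber depressed attenders by roughly four to one in this sample, an equal-sized change in the two rates adds about four false positives for every additional true positive in this sketch.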

Taking these factors into account it seems likely that the recognition rates of depression in general practice are not so poor as has been claimed in the past. Interventions that aim to improve GPs' recognition of depression face a difficult task if they are not also to reduce specificity and lead to potentially unnecessary treatment. Educational and research programmes should therefore concentrate primarily on targeting under-performing practitioners and enabling the better treatment of diagnosed patients (Thompson, 1999).

Clinical Implications and Limitations


  • Contrary to the prevailing consensus, general practitioners only miss one ‘probable case’ of depression in every 28.6 consultations. Recognition is directly related to severity of depression, moderated appropriately by the severity of anxiety symptoms.

  • The relationship between severity and recognition is modified only by unemployment, which increases sensitivity, and there is no evidence of bias due to age or gender.

  • Increasing general sensitivity of general practitioners to depression will increase the false positive rates of ‘diagnosis’, thus increasing unnecessary treatment, and is unlikely to improve the care of depression as a whole. For this, more complex clinical and organisational solutions, based on better research evidence, will be needed.


  • The study had large numbers, but limited information about the consultation and the patient.

  • A self-rating of depression was used rather than an interview-based definition of depression, giving dimensional rather than categorical descriptions.

  • In the absence of a clear relationship between severity of depression and response to medical treatment we cannot be sure that the thresholds adopted were the most clinically beneficial for the patient population.


This study was funded by the Medical Research Council, the Research and Development Committee of the South West Region of the NHS Executive and the Southampton Community Health Services Trust. We are grateful to Professor Andrew Stevens who was a grant-holder for the original Hampshire Depression Project and to the general practitioners and patients who took part.

  • Received August 7, 2000.
  • Revision received January 18, 2001.
  • Accepted January 23, 2001.

