Mathematical models as an aid for improving the validity of descriptive psychiatry
Ian M. Goodyer



Despite available therapies, mental disorders are the predominant chronic diseases of young people. Increasing the validity of descriptive psychiatry is now essential. Mathematical approaches can help characterise clinical phenotypes and aid both causal research and therapeutics in the community and the clinic.

The DSM-5 retains the existing rules-based method for clinicians to reliably determine the presence or absence of a particular diagnosis. Using this method we cannot easily determine the relative importance of each symptom to the underlying hypothesised illness. This is in part because the rules-based system invariably assumes each symptom to be equally important and to contribute fully and independently to the disorder. Additionally, most rules-based diagnoses are formulated from cross-sectional clinical information. Diagnostic validity can be improved by applying formal mathematical models to longitudinal as well as cross-sectional data.
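To make the equal-weighting point concrete, here is a minimal sketch. The count rule loosely follows the DSM symptom criterion for a major depressive episode (at least five of nine symptoms, including depressed mood or loss of interest or pleasure) and ignores the duration, impairment and exclusion criteria. The symptom labels and the functions `meets_rule` and `weighted_score` are hypothetical, and the `WEIGHTS` values are invented for illustration; they are not estimates from any study.

```python
# Nine symptom criteria of a DSM major depressive episode (labels abbreviated).
SYMPTOMS = [
    "depressed_mood", "anhedonia", "weight_or_appetite_change", "sleep_disturbance",
    "psychomotor_change", "fatigue", "worthlessness_or_guilt",
    "poor_concentration", "suicidal_thoughts",
]
CORE = {"depressed_mood", "anhedonia"}


def meets_rule(present: set) -> bool:
    """Count rule: at least 5 of the 9 symptoms, at least one of them a core symptom.
    Every symptom contributes exactly 1, so all are implicitly equally weighted."""
    return len(present & set(SYMPTOMS)) >= 5 and bool(present & CORE)


# Hypothetical weights standing in for item strengths from a latent trait model.
# These values are invented for illustration and are not published estimates.
WEIGHTS = dict(zip(SYMPTOMS, [2.0, 1.8, 0.3, 0.9, 0.8, 1.0, 1.5, 1.1, 1.6]))


def weighted_score(present: set) -> float:
    """Sum of the hypothetical weights of the symptoms present."""
    return sum(WEIGHTS[s] for s in present)


patient_a = {"depressed_mood", "anhedonia", "worthlessness_or_guilt",
             "poor_concentration", "suicidal_thoughts"}
patient_b = {"depressed_mood", "weight_or_appetite_change", "sleep_disturbance",
             "psychomotor_change", "fatigue"}

for name, present in [("A", patient_a), ("B", patient_b)]:
    print(name, meets_rule(present), round(weighted_score(present), 2))
```

Both hypothetical patients satisfy the count rule with five symptoms each, but under the illustrative weights patient A scores 8.0 and patient B 5.0, a difference the rules-based decision cannot express.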

This editorial outlines a statistical approach for retaining reliability while building greater validity into our classification of complex behaviours and mental states. Formal mathematical models are now in widespread use in many fields of biology but have yet to influence the classification of mental illness and behavioural syndromes. A brief introduction is given using studies of unipolar depression across the life course. The objective is to illustrate how psychiatric research can contribute to building a better clinical taxonomy for future aetiological and therapeutic purposes.

Modelling signs and symptoms

A formal mathematical model uses statistical analysis to establish whether the clinical information collectively provides the best representation of the inferred illness. The aim is to relate the observed signs and symptoms, or self-reported items, to a set of latent (unobserved) variables. When clinical signs and symptoms are used to determine whether a patient meets diagnostic criteria, the mental illness is not actually measured but is inferred to ‘exist’ from this information. The validity question is how ‘good’ these symptoms are at representing the hypothesised underlying illness. Mathematically this is done in two parts. The first uses factor analytic methods to summarise the variance shared between symptoms. This reflects how clinical features ‘move together’; they do so with varying degrees of proximity to each other, and this variation creates a quantitative latent variable. The variance now left for each symptom is independent of the other symptoms (termed local independence). The second part of the model reveals the importance of each ‘locally independent’ symptom for the underlying latent variable. This is achieved by locating the non-shared or unique variance of each symptom on the latent variable, thereby indicating how strongly or weakly each item is related to the underlying construct. It is this two-stage procedure that distinguishes a quantitative latent variable from previous factor analytic and descriptive psychiatry models.
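The first stage, and the local independence it implies, can be sketched on simulated data. This is an illustration under stated assumptions rather than the method of any cited study: four standardised symptom scores are generated from a single latent variable with hypothetical loadings, and a one-factor model is fitted by iterated principal-axis factoring (one of several factor extraction methods) using NumPy only. Once the common factor is removed, the residual correlations between symptoms should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_loadings = np.array([0.8, 0.7, 0.5, 0.3])  # hypothetical loadings

# Each standardised symptom = loading * latent variable + unique noise (unit total variance).
theta = rng.standard_normal(n)
unique_sd = np.sqrt(1 - true_loadings**2)
X = theta[:, None] * true_loadings + rng.standard_normal((n, 4)) * unique_sd

R = np.corrcoef(X, rowvar=False)  # observed symptom correlations


def one_factor_paf(R, n_iter=200):
    """Iterated principal-axis factoring for a single common factor."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))  # initial communalities: squared multiple correlations
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # keep only shared variance on the diagonal
        vals, vecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(vals[-1])  # loadings on the largest factor
        h2 = lam**2
    return lam if lam.sum() >= 0 else -lam  # fix the arbitrary sign of the eigenvector


lam = one_factor_paf(R)

# Local independence check: correlations left over after removing the common factor.
residual = R - np.outer(lam, lam)
off_diag = residual[~np.eye(4, dtype=bool)]
print("estimated loadings:", np.round(lam, 2))
print("largest residual correlation:", np.abs(off_diag).max())
```

The raw symptom correlations are substantial because every item shares the latent variable; the residual correlations are near zero, which is what local independence asserts.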

Here two relatively straightforward modelling methods are described and illustrated with findings from cross-sectional and longitudinal studies on unipolar depression. Unipolar depression is a good ‘illness’ to study in this way given the limited progress in refining the validity of the clinical phenotype over the past 30 years.1 A ‘true’ classification based on pathophysiology currently remains beyond our grasp. Improving the validity of the clinical phenotype will contribute to aligning diagnoses with biomarkers, intermediate phenotypes and perhaps genetics.

Item response theory: a variable-centred approach to depression

Item response theory (IRT), or latent trait analysis, determines how ‘good’ a symptom is by locating all items on the quantitative latent trait. For example, an IRT analysis has shown that dysphoria is less likely to be endorsed by patients with depression over 65 years of age than by younger patients, and it is therefore a weak first-line detector of affective disorders in the elderly.2 Among adults with depression, at least six distinct latent traits have been identified, and IRT suggests multiple and distinct aetiologies and treatment responses for the symptoms located on each of these traits.3 A longitudinal study using five waves of self-report data from a birth cohort repeatedly sampled over 40 years showed six distinct trajectories of anxiety and/or depressive symptoms emerging in adolescence or adulthood.4 Finally, in adolescents, locating depression symptoms on the latent depression trait revealed markedly different strengths, and therefore importance, for symptoms that descriptive psychiatry treats as equally important for diagnosis.5 Interestingly, neither weight gain nor appetite increase in the teenage years is located on the latent depression trait at all. Their current inclusion as positive symptoms of depression is likely to inflate prevalence and may contribute to some of the known treatment non-response. None of these clinical distinctions would be revealed by existing standard diagnostic methods.
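The editorial does not state which IRT model the cited studies used. A common choice for binary (present/absent) symptoms is the two-parameter logistic (2PL) model, in which each item has a discrimination `a` (how sharply endorsement rises with the latent trait) and a location `b` (the trait level at which endorsement reaches 50%). The sketch below uses hypothetical parameter values; an item with `a = 0` behaves like a symptom that is not located on the trait at all.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of endorsing a symptom at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical (discrimination a, location b) pairs, chosen for illustration only.
ITEMS = {
    "strongly_related": (2.0, 0.0),
    "weakly_related": (0.5, 1.0),
    "not_on_trait": (0.0, 0.0),  # a = 0: endorsement does not change with the trait
}

for name, (a, b) in ITEMS.items():
    curve = [round(p_endorse(t, a, b), 2) for t in (-2.0, 0.0, 2.0)]
    info = round(item_information(b, a, b), 3)
    print(f"{name}: P(endorse) at theta=-2,0,2 -> {curve}; information at b -> {info}")
```

The information of the `a = 0` item is zero at every trait level, so it adds nothing to measuring the trait while still counting towards a rules-based diagnosis.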

Latent class models: a person-centred approach to depression

A recent meta-analysis of 754 clinical research papers suggested as many as 15 putative clinical subtypes of unipolar depression.6 How can mathematical models help determine the validity of a hypothetical set of subgroups of people with depression? One method is to use a person- rather than a variable-centred approach, known as latent class analysis (LCA).

Initially, applying LCA to a data-set is exploratory and hypothesis-generating because in general it is not known a priori how many subgroups there are within a population. Nor is it entirely clear what weight should be placed on each symptom in determining which individuals belong in each group. The aim is to determine how individuals (not variables) ‘move together’ within a discrete latent class and to ensure that all the classes generated are independent of each other. The assumption is that the latent class is the ‘disease’ that causes individuals to be associated. Furthermore, the symptoms of class members will be related to, but different from, the symptoms of those in other discrete classes. Thus an individual cannot be assigned to more than one class, and the items within each class are independent of each other and of the items in other classes. In longitudinal studies the form of LCA used is termed latent class growth analysis, which can be applied to groups of individuals with two or more assessment points over time.
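A minimal latent class analysis for binary symptoms can be fitted by expectation-maximisation (EM), which makes the local independence assumption explicit: within a class, each item is an independent Bernoulli variable. The sketch below is illustrative only; the data are simulated from two hypothetical classes, the function names are invented, and choosing the number of classes by the Bayesian information criterion (BIC) is one common convention rather than something specified here. Real analyses typically use dedicated software and many random starts.

```python
import numpy as np


def _log_joint(X, pi, theta):
    # log P(x_i, class c) = log pi_c + sum_j [x_ij log theta_cj + (1 - x_ij) log(1 - theta_cj)]
    return X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)


def _logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))


def fit_lca(X, k, n_iter=500, seed=0):
    """Latent class analysis for binary items, fitted by EM.

    Returns class sizes pi (k,), item probabilities theta (k, j),
    posterior class probabilities (n, k) and the log-likelihood."""
    rng = np.random.default_rng(seed)
    n, j = X.shape
    pi = np.full(k, 1.0 / k)
    theta = rng.uniform(0.25, 0.75, size=(k, j))  # random start breaks class symmetry
    for _ in range(n_iter):
        lj = _log_joint(X, pi, theta)
        post = np.exp(lj - _logsumexp_rows(lj))  # E-step: posterior class probabilities
        nk = post.sum(axis=0)  # M-step: expected class counts
        pi = nk / n
        theta = np.clip(post.T @ X / nk[:, None], 1e-6, 1 - 1e-6)
    lj = _log_joint(X, pi, theta)
    log_norm = _logsumexp_rows(lj)
    return pi, theta, np.exp(lj - log_norm), log_norm.sum()


def bic(loglik, k, j, n):
    n_params = (k - 1) + k * j  # free class sizes + item probabilities
    return -2.0 * loglik + n_params * np.log(n)


# Simulated data: two hypothetical classes with contrasting symptom profiles.
rng = np.random.default_rng(42)
n = 2000
true_pi = np.array([0.6, 0.4])
true_theta = np.array([
    [0.9, 0.85, 0.8, 0.7, 0.2, 0.1],
    [0.2, 0.15, 0.1, 0.2, 0.8, 0.9],
])
z = rng.choice(2, size=n, p=true_pi)
X = (rng.random((n, 6)) < true_theta[z]).astype(float)

fits = {k: fit_lca(X, k) for k in (1, 2)}
bics = {k: bic(f[3], k, X.shape[1], n) for k, f in fits.items()}
pi2, theta2, post2, _ = fits[2]
assigned = post2.argmax(axis=1)  # modal assignment: one class per person
print({k: round(v, 1) for k, v in bics.items()})
print("estimated class sizes:", np.round(np.sort(pi2), 2))
```

Each person receives a posterior probability for every class; the modal assignment (`argmax`) then places each individual in exactly one class.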

For example, among elderly patients with depression an LCA revealed that the symptom of despondency is a poor indicator of any clinical subtype.7 A longitudinal study of adults with depression revealed multiple latent classes with five trajectories rather than the three proposed in the DSM classification.8 Here, 50% of adults with ‘double depression’ (dysthymia of at least 2 years plus a current episode of unipolar depression), which current diagnostic systems consider to have a poor prognosis, were in fact allocated to longitudinal classes with favourable course trajectories. This suggests different underlying mechanisms for individuals with the same observed clinical phenotype at first assessment.
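Latent class growth analysis can be sketched in the same way. The following assumes linear trajectories, a residual variance shared by all classes and no within-class variation in growth, and fits the mixture by EM from several random starts on simulated data. The three trajectories and the function names are hypothetical; two of the trajectories start at the same level, mirroring the point that people who look alike at first assessment may follow different courses. The number of classes is fixed at three here; in practice it would be chosen by fit criteria.

```python
import numpy as np


def _class_loglik(Y, means, sigma2):
    """log N(y_it; mean_ct, sigma2) summed over time points, shape (n, k)."""
    sq = ((Y[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return -0.5 * (sq / sigma2 + Y.shape[1] * np.log(2 * np.pi * sigma2))


def fit_lcga(Y, k, n_iter=200, n_starts=30, seed=0):
    """Latent class growth analysis with linear class trajectories and a common
    residual variance, fitted by EM; keeps the best of several random starts."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    D = np.column_stack([np.ones(T), np.arange(T)])  # design: intercept, time
    best = None
    for _ in range(n_starts):
        # Start each class at the straight-line fit to one randomly chosen person.
        start = Y[rng.choice(n, size=k, replace=False)]
        beta = np.linalg.lstsq(D, start.T, rcond=None)[0].T  # (k, 2): intercept, slope
        pi, sigma2 = np.full(k, 1.0 / k), Y.var()
        for _ in range(n_iter):
            lj = _class_loglik(Y, beta @ D.T, sigma2) + np.log(pi)
            post = np.exp(lj - lj.max(axis=1, keepdims=True))
            post /= post.sum(axis=1, keepdims=True)  # E-step: posterior class probabilities
            nk = np.maximum(post.sum(axis=0), 1e-12)  # guard against an empty class
            pi = nk / nk.sum()
            ybar = post.T @ Y / nk[:, None]  # weighted mean trajectory of each class
            beta = np.linalg.lstsq(D, ybar.T, rcond=None)[0].T  # M-step for trajectories
            resid2 = ((Y[:, None, :] - (beta @ D.T)[None]) ** 2).sum(axis=2)
            sigma2 = (post * resid2).sum() / (n * T)  # M-step for the common variance
        lj = _class_loglik(Y, beta @ D.T, sigma2) + np.log(pi)
        m = lj.max(axis=1)
        loglik = (m + np.log(np.exp(lj - m[:, None]).sum(axis=1))).sum()
        if np.isfinite(loglik) and (best is None or loglik > best[0]):
            best = (loglik, pi, beta, sigma2)
    return best


# Simulated symptom scores at five assessments for three hypothetical classes.
rng = np.random.default_rng(1)
T, per_class = 5, 300
t = np.arange(T)
true_beta = np.array([
    [2.0, 0.0],    # stable low symptom level
    [10.0, -2.0],  # high at baseline, improving
    [10.0, 0.0],   # high at baseline, persistent
])
Y = np.vstack([b0 + b1 * t + rng.normal(0.0, 1.5, size=(per_class, T)) for b0, b1 in true_beta])

loglik, pi, beta, sigma2 = fit_lcga(Y, k=3)
order = np.argsort(beta[:, 0] + beta[:, 1] * (T - 1) / 2)  # sort classes by mean level
print("intercept, slope per class:\n", np.round(beta[order], 2))
print("class sizes:", np.round(pi[order], 2), "residual SD:", round(float(np.sqrt(sigma2)), 2))
```

Because the improving and persistent classes share the same baseline, a single cross-sectional assessment cannot separate them; only the repeated measurements do.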

These techniques can also be applied to analysing risks for unipolar depression. For example, a recent LCA of 19 family adversities occurring over childhood (birth to 14 years of age), recorded retrospectively from interviews with parents of 1143 community-ascertained adolescents, revealed four discrete subgroups of individuals.10 Here the LCA reduces a complex pattern of family-related variables, occurring differentially over time, to a small number of distinct populations of adolescents. This person-centred level of description is a hypothesis-generating opportunity for further study of causal and prognostic differences between these subgroups, which may reveal different psychosocially mediated mechanisms for onset and/or treatment response. A ‘vertical’ mathematical approach to risk factors at differing levels of explanation could be used in a similar manner and may bring us closer to a biologically driven taxonomy of psychopathologies. This would enhance the validity of descriptive psychiatry without abandoning what is a reliable method of detecting clinical signs and symptoms in individuals across the life course.


Utilising mathematical models, particularly with information obtained from longitudinal data, will reveal more valid diagnostic categories while retaining reliability. The preliminary work falls to research groups, such as those with access to existing large population-ascertained databases. Further testing of validity within randomised controlled trials and in new longitudinal data, to determine causal, prognostic and therapeutic mechanisms, would provide a firm evidence base for use in routine clinical practice. This could be achieved within the next decade and would contribute to reformulating clinical taxonomy for clinicians, leading to improved therapeutic decision-making for patients.


  • Funding

    I.M.G. is funded by grants from the Wellcome Trust, Medical Research Council and National Institute for Health Research (NIHR). This article was completed within the NIHR Collaboration for Leadership in Applied Health Research and Care (CLAHRC) and the Neuroscience in Psychiatry Network (NSPN), funded by the Wellcome Trust.

  • Received May 18, 2012.
  • Revision received July 19, 2012.
  • Accepted August 13, 2012.

