From inventory to benchmark: quality of psychiatric case registers in research

Vera A. Morgan, Assen V. Jablensky


In recent years, there has been a marked increase in the use of psychiatric case registers for research purposes. Registers are a valuable data asset but there are no standard guidelines for evaluating their use in research. It is becoming increasingly important for researchers who use register data, and journal editors who publish their work, to set benchmarks to assess the quality of a register and its research application. Several criteria that could form the basis of such an evaluative framework are discussed. The discussion is illustrated using a Western Australian e-cohort of half a million children for whom we have assembled comprehensive data cross-linked across a number of administrative registers.

Two articles on the use of psychiatric case registers in research were published recently: an editorial in the September issue of the British Journal of Psychiatry1 closely followed by a paper on the Nordic registers in Acta Psychiatrica Scandinavica.2 Their publication is timely and welcome. The longitudinal nature of registers, their size and coverage of defined populations make them an important research asset. Recent years have seen an increase in the use of the psychiatric case register for research purposes, including linkage across diverse health and other population databases such as criminological databases. It is over 20 years since the publication of Ten Horn’s comprehensive inventory of the psychiatric case register and its use in research.3 An update on the current status of psychiatric registers worldwide is well overdue. It is also an opportune time to move from the inventory to the benchmark, and discuss guidelines for the appropriate use of registers in research, including a re-examination of their limitations and a dialogue about suitable methods for dealing with these challenges.

A framework for evaluating registers

Mortensen4 provides an excellent framework for evaluating the use of registers for epidemiological research by utilising four objectives of epidemiological research against which a study’s methodology can be assessed:

  1. maximisation of the precision of the disease estimate

  2. minimisation of selection bias

  3. minimisation of information bias

  4. control of confounders.

The aim of this paper is to expand on these objectives. Where appropriate, we illustrate our points using a Western Australian e-cohort of half a million children for whom we have assembled comprehensive data cross-linked across a number of administrative registers including the Western Australian Mental Health Information System, which is the psychiatric case register for a State of 2.5 million square kilometres and a population of 2.2 million. This provides a concrete demonstration of both the advantages and limitations of the use of psychiatric case registers and record linkage in psychiatric research.

The first objective, precision, refers to the reliability of a parameter estimate and is reflected in the width of its confidence interval. Sample size is an important determinant of precision. The larger the sample, the narrower the confidence interval, the greater the precision and the lower the likelihood of a type II error (false negative). An advantage of register-based studies is that they are a cost-effective means of collecting data on large samples, especially for the study of rare outcomes. However, this advantage is offset by the time needed to clean and validate data, and to build meaningful constructs using a set of variables collected primarily for administrative purposes.
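The relationship between sample size and precision can be illustrated with a minimal sketch. It assumes a simple proportion (for example, the prevalence of a rare outcome) and the normal-approximation (Wald) interval; the function name and the 1% prevalence are hypothetical and chosen only for illustration.

```python
import math


def wald_ci_width(p: float, n: int, z: float = 1.96) -> float:
    """Width of an approximate 95% Wald confidence interval for a proportion.

    p is the assumed proportion, n the sample size, and z the normal
    quantile (1.96 for a 95% interval).
    """
    return 2 * z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Hypothetical outcome with 1% prevalence in samples of increasing size.
    for n in (1_000, 10_000, 500_000):
        print(n, wald_ci_width(0.01, n))
```

Under this approximation the interval width shrinks in proportion to the square root of the sample size, so a fourfold increase in the sample halves the width. For very rare outcomes or small samples, an exact or score-based interval would be more appropriate than the Wald interval used here.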

Mortensen’s second objective relates to selection bias affecting a register-based study when the sample it covers is not representative of the total population of interest. Different forms of selection bias include: geographical bias (e.g. a sample may only include persons living in the metropolitan area); treatment bias (e.g. a sample drawn solely from an in-patient hospital register may include more severe cases only); social bias (e.g. a sample may cover public facilities only and exclude private facilities); and selective survival bias (with differential survival skewing the sample profile). For each research undertaking, the coverage of the psychiatric case register must be examined and evaluated against the study’s aim, to determine its adequacy for the purpose. For example, the Western Australian Mental Health Information System is a whole-of-population register of long standing (since 1966) with no periods of discontinuity. It covers all public and private in-patient admissions, as well as public out-patient and ambulatory care contacts with mental health services across the State.5 As such, it is an excellent resource for the study of low-prevalence disorders such as the psychoses where the vast majority of affected individuals have had contact with either in-patient or out-patient services. However, as it misses contacts with general practitioners and psychologists/psychiatrists working in private practice, its use in the study of high-prevalence disorders more commonly seen outside of in-patient and out-patient services is limited. Such use calls for justification and explicit statement of the limitations.

The third objective, minimising information bias, is concerned with reducing bias associated with: loss to follow-up due to death or movement out of the area of the register; lack of information prior to first entry on the register due to migration into the region covered by the register (in-migration); failed linkages across registers; and periods of discontinuity of the collection of data on the register. In Western Australia, linkage failures due to name changes and spelling variants are reduced through the use of aliases and phonetic spelling in probabilistic linkage protocols. Cross-linkage with a range of other state-wide registers, updated regularly, further aids the identification of identity errors. Loss to follow-up due to death is reduced by linkage to mortality data which include date and cause of death. Loss to follow-up through migration out of the region covered by the register (out-migration) is minimised owing to the State’s geographical isolation and its economic advantages that give it one of the highest rates of Australian interstate in-migration in the past decade, and one of the lowest levels of out-migration. Where in-migration is an issue, we have limited case-finding to people born in Western Australia, a variable recorded on the psychiatric case register (e.g. Morgan et al6). One advantage of registers is that, as a result of prospective recording of information, they avoid recall bias, a form of information bias that may affect the retrospective collection of data.
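The role of aliases and phonetic spelling in reducing linkage failures can be sketched as follows. This is a simplified, deterministic illustration and not the probabilistic protocol used in Western Australia: it assumes the standard American Soundex code as the phonetic key, and the record layout (`dob`, `surnames`) and the function `candidate_match` are hypothetical.

```python
# Soundex digit for each consonant group; vowels, H, W and Y carry no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        # H and W do not separate two letters that share a code; vowels do.
        if ch not in "HW":
            prev = code
    return (encoded + "000")[:4]


def candidate_match(rec_a: dict, rec_b: dict) -> bool:
    """Flag two records as a candidate pair when the dates of birth agree and
    any recorded surname or alias shares a phonetic key across the records."""
    if rec_a["dob"] != rec_b["dob"]:
        return False
    keys_a = {soundex(n) for n in rec_a["surnames"]}
    keys_b = {soundex(n) for n in rec_b["surnames"]}
    return bool(keys_a & keys_b)


if __name__ == "__main__":
    # Hypothetical records: one carries a spelling variant and an alias.
    a = {"dob": "1975-03-02", "surnames": ["Smyth", "Jones"]}
    b = {"dob": "1975-03-02", "surnames": ["Smith"]}
    print(candidate_match(a, b))
```

A probabilistic protocol would go further by weighting agreement and disagreement on each field rather than requiring exact agreement on any one of them; the sketch shows only why storing aliases and phonetic keys allows spelling variants and changed names to be brought together.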

Mortensen’s final objective, control of confounders, is critical. A well-designed prospective or retrospective study collects data on variables that previous research or theory suggest may confound or modify the relationship between the exposure and outcome of interest. In register-based studies, the researcher is restricted to variables available on the register. Since many of these are collected for administrative rather than research purposes, important confounders and modifiers may be missing. There are a number of ways to deal with their absence. Linking several different registers covering the same population increases the chance that variables not available on one database may be available on another. For example, we linked women with psychosis on the psychiatric case register to their obstetric data on the midwives register to examine familial and environmental (obstetric) risk factors for adverse neuropsychiatric outcomes in their children.5,7 We also included hospital morbidity, mortality, birth defects and intellectual disability registers, among others, to increase the data sources available for just under half a million offspring (Fig. 1). For some missing data, there may be adequate proxies. For instance, census-based area-level data derived from address fields can be proxies for missing socioeconomic data.8 Where important fields are not available, the statistical impact of the missing variables may be modelled using data from other sources. Alternatively, it may be possible to nest a smaller clinical case–control study into the design of the larger register-based study or to link, with consent, register data for an individual to their survey data. Regardless of the approach taken, discussion of the potential impact on the study of missing confounders and effect modifiers, and justification of the strategy employed, are essential.

Fig. 1

Western Australian registers linked for the population-based study of obstetric, developmental and neuropsychiatric outcomes in the offspring of women with severe mental disorders.

Building constructs using register data

A fifth key objective can be added to Mortensen’s list: development of proxies and constructs from register data fields so that they closely approximate their primary form in the clinical setting. In psychiatric case register research there are three fields in particular that require careful scrutiny: the allocation of a diagnosis; the timing of onset of illness; and the measurement of illness severity. Assigning a diagnosis on the basis of longitudinal register data is particularly complex and will be used here to illustrate the difficulties.

First, the reliability of the diagnosis as recorded by a clinician and coded by the data clerk for administrative purposes will vary by register and time period. For each register, some evidence of the concurrent validity of the register diagnosis against a clinical interview-based diagnosis is essential. However, the review of validity studies by Byrne et al suggests that the number of validation studies is low and their quality variable.9 Second, a register is dynamic and captures clinical and other data at key administrative time points during each episode of illness for an individual who may have had many contacts with services. The diagnoses recorded at these time points may change owing to the acquisition of new information by the diagnostician or changes in a patient’s symptomatology over time. Researchers need to be explicit about how they handle multiple diagnoses. Some of the strategies that researchers employ include: application of a diagnostic hierarchy,10–12 at least one discharge with the diagnosis of interest,5 the last diagnosis recorded on the register,13 or a calculation of the most frequent diagnosis, so-called diagnostic dominance.14 Researchers using the Swedish psychiatric case register (which until recently only had access to in-patient data) have set a criterion of at least two in-patient admissions with the discharge diagnosis of interest.15 In Western Australia, the register has long included out-patient contacts, and such information should not be lost. To assign a diagnosis of schizophrenia or affective psychosis, we have developed and validated an iterative algorithm based on the last diagnosis recorded at specific time points.5 Third, changes in the official classification and diagnostic criteria over time also influence the recording of diagnoses. Researchers relying on longitudinal data affected by multiple revisions within a diagnostic classification need adequate correspondence between revisions.
Deficiencies in correspondence may be subtle but critical. For example, most concordances assume that major differences between ICD–916 and ICD–9–CM17 occur at the five-digit level only; in fact, coding at the four-digit level has a major impact on differentiating unipolar from bipolar diagnoses with categories reversed for some codes. Moreover, with each new revision, one generally finds a finer level of detail of diagnostic categorisation and a decision needs to be made whether to code up to the new classification (despite the splitting of categories) or down.
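The strategies for handling multiple diagnoses listed above can give different answers for the same person, which is why they need to be stated explicitly. The following minimal sketch applies five of them to one hypothetical contact history. The diagnosis labels, the hierarchy ordering and all function names are assumptions made for illustration; none reproduces a published algorithm, including the validated Western Australian one.

```python
from collections import Counter
from datetime import date

# Hypothetical contact history for one person: (date, setting, diagnosis).
contacts = [
    (date(1998, 5, 1), "inpatient", "depression"),
    (date(2001, 2, 10), "outpatient", "schizophrenia"),
    (date(2003, 7, 4), "inpatient", "depression"),
    (date(2005, 9, 30), "inpatient", "schizophrenia"),
    (date(2006, 1, 15), "outpatient", "depression"),
]

# Illustrative hierarchy only, highest-ranked diagnosis first.
HIERARCHY = ["schizophrenia", "bipolar disorder", "depression"]


def by_hierarchy(contacts, hierarchy=HIERARCHY):
    """Highest-ranked diagnosis ever recorded, or None if none is ranked."""
    recorded = {dx for _, _, dx in contacts}
    return next((dx for dx in hierarchy if dx in recorded), None)


def at_least_one_record(contacts, target):
    """True if the target diagnosis was recorded at any contact."""
    return any(dx == target for _, _, dx in contacts)


def last_diagnosis(contacts):
    """Diagnosis recorded at the most recent contact."""
    return max(contacts, key=lambda c: c[0])[2]


def dominant_diagnosis(contacts):
    """Most frequently recorded diagnosis (diagnostic dominance)."""
    return Counter(dx for _, _, dx in contacts).most_common(1)[0][0]


def min_inpatient_records(contacts, target, n=2):
    """True if the target diagnosis appears on at least n in-patient records."""
    count = sum(1 for _, setting, dx in contacts
                if setting == "inpatient" and dx == target)
    return count >= n


if __name__ == "__main__":
    print(by_hierarchy(contacts))
    print(at_least_one_record(contacts, "schizophrenia"))
    print(last_diagnosis(contacts))
    print(dominant_diagnosis(contacts))
    print(min_inpatient_records(contacts, "schizophrenia"))
```

For this hypothetical history, the hierarchy and at-least-one rules classify the person as having schizophrenia, whereas the last-diagnosis and dominance rules return depression and the two-admission criterion is not met. The same individual would therefore be counted as a case under some definitions and not others.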

Of note, particularly in relation to the last two objectives, is the capacity to link records between psychiatric and other registers. Record linkage does not, of itself, validate the psychiatric case register but it does provide important additional material for analysis, subject to the validation of the data on the linked source. This includes data on confounders, risk modifiers, other exposures and other outcomes. The development of an integrated system of linkages across a jurisdiction to capture multiple sources of longitudinal data is a long-term project. The strategy employed in Western Australia has been described in some detail.18 Of recent interest is the capacity to link to biosamples such as archived neonatal dried blood spot samples19 and DNA samples,18 thereby bridging an important gap between epidemiological and biological research. This is of particular importance in the area of genetics research.

Privacy and scientific accountability

Finally, although psychiatric case registers are an efficient and cost-effective data source, their use in large-scale epidemiological studies raises complex issues related to consent and privacy. Sibthorpe et al20 make an important contribution to the discussion by differentiating the role of individual identity depending on the use of a register. For although the identity of the individual is critical to the use of the register as an administrative management and care planning tool, in research the identity of specific individuals is of no intrinsic consequence, except in the interim to enable cross-linkage. Notwithstanding that, access to registers is a privilege that comes with a large onus on researchers. The onus is not only to preserve confidentiality but also to be accountable to the people whose data they utilise. They need to ensure that the investigations undertaken are well-designed and of scientific merit, so that the findings are of benefit to the broader community, be it directly or indirectly, in the short-term or the long-term. Part of that scientific accountability includes ensuring that the use of register data meets appropriate standards.

Future directions

In other research modalities, we have standard psychometric measures to assess the reliability and validity of scales developed for use in research, while consortia and reference groups have proposed benchmarks for assessing the quality of: clinical trials,21 systematic reviews and meta-analyses,22 reviews of health care interventions23 and qualitative research.24 As more and more psychiatric case registers are employed for research purposes, it is becoming increasingly important for the community of researchers who use register data to set agreed benchmarks to assess the quality of a register and its research application. Some refinement of Mortensen’s criteria (precision, selection bias, information bias, confounders),4 integrating Allebeck’s overlapping characteristics (coverage, attrition, representativeness and validity)2 is a starting point. In turn, publications relying on register data need to provide sufficient details for reviewers and readers to assess how well the study meets these standards.

  • Received February 15, 2010.
  • Revision received February 18, 2010.
  • Accepted March 10, 2010.

