Relationship between service ecology, special observation and self-harm during acute in-patient care: City-128 study

Len Bowers, Richard Whittington, Peter Nolan, David Parkin, Sarah Curtis, Kamaldeep Bhui, Diane Hackney, Teresa Allan, Alan Simpson



Special observation (the allocation of nurses to watch over nominated patients) is one means by which psychiatric services endeavour to keep in-patients safe from harm. The practice is both contentious and of unknown efficacy.


To assess the relationship between special observation and self-harm rates, by ward, while controlling for potential confounding variables.


A multivariate cross-sectional study collecting data on self-harm, special observation, other conflict and containment, physical environment, patient and staff factors for a 6-month period on 136 acute-admission psychiatric wards.


Constant special observation was not associated with self-harm rates, but intermittent observation was associated with reduced self-harm, as were levels of qualified nursing staff and more intense programmes of patient activities.


Certain features of nursing deployment and activity may serve to protect patients. The efficacy of constant special observation remains open to question.

Maintaining the safety of acutely disturbed in-patients during periods of psychiatric crisis is difficult. Some may be suicidal or want to harm themselves, and others may be vulnerable, prone to abscond or may pose a danger to other people. One way to keep a patient safe is to allocate an identified person to care for them, called special observation. It can take two forms: the constant presence of the observer with the patient or intermittent checks at short time intervals.

There is no evidence on the efficacy of special observation.1 Deaths during special observation have been reported,2 and the practice may only shift the risk to the time when special observation is terminated, or into the post-discharge period. Some have argued that it is inherently depersonalising and that nursing care should focus more on giving support and developing relationships with patients,3 whereas others see special observation as having an important preventive role.4 Intermittent observation has also been criticised as being by definition inefficacious.5,6

The purpose of this study was to assess the relationship between special observation and self-harm rates, by ward, while controlling for potential confounding variables (patient characteristics, service environment, physical environment, patient routines, other patient behaviours, use of other containment methods and staff characteristics).


Data were collected from acute wards on rates of self-harm, special observation, other conflict and containment methods, the patients admitted, the staff team and the environment of the ward. Multilevel modelling was then used to assess relationships between the main items of interest (special observation and self-harm) while controlling for the effects of other variables.


The target sample size was 128 acute National Health Service (NHS) psychiatric wards, their patients and staff, geographically situated proximate to three centres (London, Central England, Northern England). In the north, the sample included Blackpool and Preston; to the west, Shropshire; to the east, Leicester; and to the south, London. Acute psychiatric wards were defined as those that primarily serve adults who are acutely mentally disordered, taking admissions in the main directly from the community, and not offering long-term care or accommodation. Wards that were organised on a specialty basis, or that planned to change population served, location or function, or which were scheduled for refurbishment during the course of the study were excluded. Each centre identified all eligible wards within reasonable travelling distance of their research base. The initial intention was to randomly sample wards, with replacement for refusals to participate. However, the geographical dispersion of wards meant that to achieve the requisite sample size, the Northern and Central England centres had to recruit all available wards within practical reach for data collection. In London, it was possible to randomly sample from a list of 112 wards. Data were collected over a period of 6 months on each ward. Commencement of data collection by selected wards was staggered over an 18-month period, for logistical reasons. In essence, at each research centre, groups of wards started the study in four or five cohorts during 2004–2005.

Data collection and instrumentation

Information on the ward physical environment and the policies in operation was collected on a site visit by a researcher and a form completed by the ward manager; data on the main outcome measures were collected by end-of-shift reports by the nurses in charge; the ward multidisciplinary team were required to complete a selection of standardised questionnaires, parcelled into several batches to reduce demand on busy practitioners; and smaller samples of patients and staff were asked to complete questionnaires.

The shift report version of the Patient–staff Conflict Checklist – Shift Report7 (PCC–SR) was used to log the frequency of patient conflict behaviours (e.g. self-harm, absconding, violence, medication refusal) either attempted or successful, and the staff containment measures used to maintain safety (e.g. intermittent special observation, constant special observation, seclusion and physical restraint) and was compiled using strict definitions at the end of every nursing shift. On entry to the study, ward nursing staff received training in the use of the PCC–SR, and each ward was provided with a handbook giving definitions of items. For all incidents of self-harm or attempted suicide, a Bongar Lethality Scale8 was completed as part of the PCC–SR to assess the severity of the incident. The PCC–SR was supplemented with additional items to include age, gender, diagnosis, ethnicity and the postcode of patient's place of residence for those patients admitted during the shift. In recent tests based on use with case-note material, the PCC has demonstrated an interrater reliability of 0.69,9 and has shown a significant association with rates of officially reported incidents.10

Basic ward data were collected on two forms: one completed by the researcher visiting the ward in conjunction with the ward manager and the second completed by the ward manager alone. Staff attitude to difficult patients was assessed using the Attitude to Personality Disorder Questionnaire.11 Ward structure was assessed using the Order and Organisation, Programme Clarity and Staff Control sub-scales of the Ward Atmosphere Scale.12 The quality of ward leadership was assessed by taking the score for the ward manager, as rated by ward staff, using the Multifactor Leadership Questionnaire.13 Multidisciplinary team cohesion was assessed using the Team Climate Inventory.14 Burnout was assessed using the Maslach Burnout Inventory.15 Some staff and patients (ten per ward) were asked to complete the Attitude to Containment Measures Questionnaire.16 This scale provides relative measures of views on acceptability, efficacy, dignity, safety for patients and safety for staff of different forms of containment for disturbed behaviour.


Initial management approval for wards to participate in the study (named City-128) was sought in advance from trust chief executives. Ethical approval for the study was obtained from the North West Multicentre Research Ethics Committee. Following sample identification and research governance approval, letters were sent inviting each selected ward manager and their teams to participate in the City-128 study, detailing the purposes and advantages of participation, and the nature of the commitment required. Expression of interest resulted in a site visit to the ward and its team by a researcher, who made a presentation about the study and collected ward assessment data. At this point staff were instructed on how to collect shift reports using the PCC. A project liaison person was appointed from the ward personnel, and contacts were also made with directors of nursing and with senior managers to ensure that everything went smoothly. Data collection commenced immediately and continued for 6 months on each participating ward. Wards were recruited to the study in several separate cohorts at each research centre. Batches of questionnaires for staff were issued to the wards at roughly monthly intervals, with instructions for their completion. Completed questionnaires were posted in a sealed box on each ward and collected at regular intervals by the research assistant.

Response rates

In London, one trust declined to participate, and of wards randomly sampled in participating trusts, 2 declined and 1 was excluded owing to a scheduled refurbishment. In the North West, 16 wards refused to participate, most on the grounds of commitment to other projects, with 3 hospitals (accounting for 8 of the 16 ward refusals) declining to participate at higher management levels than the ward managers. An additional 4 wards were excluded because of scheduled refurbishments, and 3 because of extremely poor response rates (no more than two or three PCC–SRs per week). In Central England, no trust or ward refused to participate and no ward had plans for refurbishment necessitating its exclusion. Because of over-sampling for anticipated drop-outs, which did not occur, a total of 136 wards completed data collection for this study. Over 45 000 PCC–SRs (67% of the total potential returns) and 9000 other questionnaires were collected for this study (mean response rate of 49% per questionnaire). A full analysis of the response rates and other variables in relation to data validity and reliability can be found in the full report.17

Preparation of the data for analysis

The large number of variables available meant that some consolidation was advisable prior to the analysis. Compound scores for the observability and physical environment quality, banned items, restriction on patients, and so on, were therefore created. The separate scores produced by most of the questionnaires were also highly intercorrelated (r=0.7 or greater); where this was the case, scores were combined prior to analysis by taking means at the ward level.

Conflict and containment event counts were standardised to wards of 20 beds (i.e. (count/bed numbers)×20), so that variation due to the size of wards was removed. All continuous variables (conflict and containment rates, compound scores, questionnaire scores and other items) were converted to z-scores prior to analysis to allow for appropriate comparisons of effect, as items were on very different scales.
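The two transformations described above can be sketched in a few lines of Python. This is an illustration only: the ward counts are made up, the function names are hypothetical, and the use of the population standard deviation for the z-scores is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def standardise_to_20_beds(count, beds):
    """Scale an event count to a notional 20-bed ward: (count / beds) * 20."""
    if beds <= 0:
        raise ValueError("beds must be positive")
    return count / beds * 20


def z_scores(values):
    """Convert values to z-scores (mean 0, s.d. 1).

    Assumption: the population s.d. is used; the source does not specify.
    """
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / sd for v in values]


# Hypothetical wards as (event count, number of beds); not study data.
wards = [(30, 15), (40, 20), (75, 25)]
rates = [standardise_to_20_beds(c, b) for c, b in wards]
z = z_scores(rates)
```

After bed standardisation the first two hypothetical wards have the same rate even though their raw counts differ, which is the point of removing variation due to ward size.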

Information was collected on 16 240 admissions, of which 4112 had valid postcodes that could be matched to local area geographical data, allowing the calculation by ward of a mean Index of Multiple Deprivation18 and Social Fragmentation Score.19,20 Descriptive data on all modelled variables are provided in online Table DS1, together with univariate associations with rates of self-harm incidents.

Analytic method

Multilevel random-effects modelling was carried out using MLwiN 2.02 for Windows on total Bongar Lethality Scale score for the shift, which was dichotomised into no incidents and incidents, owing to distributional problems of the original score (very few incidents). The model was tested to ensure that a binomial distribution was appropriate and that there was no extra binomial variation that needed to be accounted for. Random-effects modelling allows for the fact that the wards were only a sample of all possible wards and, similarly, trusts were only a sample of all possible trusts. A three-level model was explored, with shifts at the lowest level (level 1), wards at level 2 and trusts at level 3. That is, shifts were nested within wards, which were nested within trusts. Shifts were chosen as a level because of clustering effects within morning, afternoon/evening and night shifts; wards for similar reasons, and trusts because they represent organisational units with single local policies and operational procedures. The penalised quasi-likelihood method of estimation was used with second order linearisation, since this method does not tend to underestimate variance.21
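One conventional way of writing a three-level random-intercept logistic model of this kind is given below. The notation is an assumption introduced for illustration and does not appear in the original: i indexes shifts, j wards and k trusts; y is the dichotomised self-harm indicator; x is the vector of explanatory variables; and u and v are the ward and trust random effects.

```latex
\begin{aligned}
y_{ijk} &\sim \mathrm{Bernoulli}(\pi_{ijk}) \\
\operatorname{logit}(\pi_{ijk}) &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ijk} + v_k + u_{jk} \\
v_k &\sim N(0, \sigma_v^2), \qquad u_{jk} \sim N(0, \sigma_u^2)
\end{aligned}
```

Under this formulation, exponentiating a coefficient gives the odds ratio for a self-harm incident on a shift per unit (here, per standard deviation) change in the corresponding variable, conditional on ward and trust.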

The model was produced through a staged process of backward selection, deselecting the least significant at each stage, leaving only variables significant at P=0.05. Each group of variables (domain) was used to build a separate initial model, then the significant variables were used to construct a final comprehensive model using the same process of backward selection. A small number of the study wards operated on a two 12 h shift pattern, so a categorical variable indicating this was incorporated as a constant at every stage of the analysis, without being removed because of failure to reach statistical significance. Although there were significant associations between some of the independent variables in our study, sometimes to the extent of multicollinearity (see further below), there was no logical reason why any particular variables should be considered to be intervening, rather than potentially causal in their own right; neither is there any evidence in the existing research literature that this is the case.22 However, it is possible that some variables might play that role, perhaps particularly conflict behaviours other than self-harm. We therefore present the results of the separate domain analyses as well as the final complete models.
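The backward selection procedure can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' MLwiN workflow: `fit` is a hypothetical callable that refits the model and returns a P value per variable, the toy P values are invented and held fixed for simplicity, and the `forced` argument stands in for the shift-pattern indicator that was retained regardless of significance.

```python
def backward_select(candidates, fit, alpha=0.05, forced=()):
    """Drop the least significant variable one at a time until every
    remaining non-forced variable has a P value below alpha.

    fit(variables) must return a dict mapping each variable to its P value.
    Variables in `forced` always stay in the model.
    """
    selected = list(candidates)
    while selected:
        p_values = fit(list(forced) + selected)
        worst = max(selected, key=lambda v: p_values[v])
        if p_values[worst] < alpha:
            break  # everything left is significant
        selected.remove(worst)
    return list(forced) + selected


# Invented P values for illustration only; not results from the study.
toy_p = {
    "two_12h_shifts": 0.40,
    "intermittent_obs": 0.001,
    "qualified_staff": 0.02,
    "banned_items": 0.61,
    "searches": 0.30,
}


def toy_fit(variables):
    # A real fit would re-estimate the model; here P values are fixed.
    return {v: toy_p[v] for v in variables}


final = backward_select(
    ["intermittent_obs", "qualified_staff", "banned_items", "searches"],
    toy_fit,
    forced=("two_12h_shifts",),
)
```

In a real analysis the P values change each time a variable is removed, which is why the model is refitted inside the loop rather than ranked once.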

Following the construction of this overarching model, another model was constructed using the same methods, with more major self-harm (termed 'moderate', Bongar raw score of 2 or above) as the dichotomous dependent variable. Analyses using higher cut-off points were not possible, owing to the rarity of incidents at increasing levels of severity.
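The two dichotomous outcomes can be illustrated with a short sketch. It assumes, hypothetically, that each shift is stored as a list of per-incident Bongar raw scores and that the cut-off of 2 applies to individual incidents rather than to the shift total; the text does not settle either point, and the example shifts are invented.

```python
def any_self_harm(incident_scores):
    """1 if the shift recorded at least one self-harm incident, else 0.

    An incident with a Bongar score of 0 still counts as an incident.
    """
    return int(len(incident_scores) > 0)


def moderate_self_harm(incident_scores, cutoff=2):
    """1 if any incident on the shift has a raw score at or above cutoff."""
    return int(any(score >= cutoff for score in incident_scores))


# Invented shifts: each inner list holds the scores of that shift's incidents.
shifts = [[], [0], [1, 1], [3], [0, 2]]
all_flags = [any_self_harm(s) for s in shifts]
moderate_flags = [moderate_self_harm(s) for s in shifts]
```

Under a shift-total interpretation, the third invented shift (two incidents scored 1) would instead be classed as moderate, so the choice of rule matters when incidents are rare.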


Study wards

The 136 wards of the sample were situated within 67 hospitals within 26 NHS trusts. The mean number of beds per ward was 21, with a range of 11–30, with an average of 51% of these beds in single rooms. The largest group of wards (48%) had been built in the 1980s and 1990s, with 17% in 2000 or later, 19% in the 1960s and 1970s, and only 16% prior to this. The mean number of nursing staff in post per bed was 0.99 whole-time equivalent (s.d.=0.22); the mean proportion of these staff who were qualified nurses was 0.61 (s.d.=0.12), and the mean vacancy rate was high at 15%. Male-only and female-only wards were in the minority (13% and 14% respectively), with most (73%) being for both genders. A significant proportion of wards (41%) had no established occupational therapists allocated to them, and the vast majority (87%) had no dedicated clinical psychologist time at all.

Multilevel models

There were 4062 shifts during which a self-harm incident occurred, representing 8.7% of the total. The vast majority of these (3510, or 7.5% of all shifts) were minor, with Bongar scores of 0 or 1 (death impossible or very highly improbable).
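The figures above imply how few shifts involved more severe self-harm. The arithmetic below is derived only from the numbers quoted in the text; the implied totals are approximate because the published percentages are rounded, and the variable names are illustrative.

```python
shifts_with_incident = 4062   # reported: 8.7% of all shifts
minor_incident_shifts = 3510  # reported: 7.5% of all shifts

# Approximate number of shift reports implied by the rounded 8.7% figure.
total_estimate = shifts_with_incident / 0.087

# Shifts with an incident scored 2 or above on the Bongar scale.
moderate_shifts = shifts_with_incident - minor_incident_shifts
moderate_share = moderate_shifts / total_estimate

minor_share_pct = round(minor_incident_shifts / total_estimate * 100, 1)
```

On these figures only a few hundred shifts, roughly 1% of the total, contribute events to the moderate self-harm model.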

Tables 1 and 2 depict the results of multilevel modelling with self-harm as the dependent variable. The first results column of each table shows the models resulting from within-domain analyses (i.e. just the patient variables, or just the service environment variables); the second results column shows the final combined model; and the third, the results of variance partitioning (using method D of Goldstein),23 identifying at which level associations occur.

Table 1. Multilevel models for all self-harm

Table 2. Multilevel models of moderate self-harm

For all self-harm (Table 1, final combined model), the proportion of patients admitted with a diagnosis of schizophrenia was associated with decreases in rates, along with the Index of Multiple Deprivation, intermittent observation and having qualified staff on duty. For qualified nursing staff, the main level of association was trust level, perhaps reflecting organisation-wide nurse staffing policies. The presence of student nurses showed the opposite pattern in the all self-harm model, with the association located at the shift level, perhaps indicating a more direct influence. For intermittent observation, the association was at shift level, indicating a within-shift correlation between greater intermittent observation and lower risk of a self-harm incident. Locking the ward door for less than 3 h had no significant association with self-harm, but longer periods of locking were associated with more self-harm at both ward and shift level. Rates of use of constant special observation were not significantly associated with self-harm.

For moderate to serious self-harm (Table 2, final combined model), the variables that were associated with reduced moderate self-harm were having planned patient activities and intermittent observation, the latter again showing an association at the shift level. All other variables in the model were associated with a significantly increased chance of a moderate self-harm incident. The proportion of patients admitted of Caribbean ethnicity showed the greatest odds of a moderate self-harm incident.

In both models, throughput of patients shows associations at both trust and shift levels. This indicates that not only were shifts in which an admission occurred at a higher risk of a self-harm incident, but that trusts with high patient throughput also had higher risks. Associations at the trust level are, however, difficult to interpret, as they may reflect the impacts of a number of overall policies in relation to practice, service structure or resource allocation.


Several elements of the data-set were consolidated prior to analysis (ward observability, physical environment quality, banned items, restrictions, etc.) in order to provide for meaningful results and to reduce the total number of variables to a manageable level. Where the separate scores produced by a questionnaire were highly correlated with one another (0.7 or larger), compound measures were created. Multicollinearity did not influence our resulting models, as all pair-wise correlations were less than 0.4 and the highest variance inflation factor was 1.4.
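For readers unfamiliar with the variance inflation factor, the standard definition is VIF = 1/(1 − R²), where R² comes from regressing one predictor on all the others. The sketch below uses that general formula to put the reported figures in context; it is not a reconstruction of the authors' calculation, and the function names are illustrative.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has the given R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1 / (1 - r_squared)


def r_squared_from_vif(vif):
    """Invert the formula: the R-squared implied by a given VIF."""
    if vif < 1:
        raise ValueError("a VIF cannot be below 1")
    return 1 - 1 / vif


# R-squared implied by the highest reported VIF of 1.4.
implied_r2 = r_squared_from_vif(1.4)

# With only two predictors, R-squared is the squared pairwise correlation,
# so a correlation of 0.4 would give this VIF.
vif_two_predictors = vif_from_r_squared(0.4 ** 2)
```

A VIF of 1.4 therefore means that less than 30% of the variance of the most affected predictor is explained by the other predictors, well below commonly used warning thresholds.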

Sensitivity analyses

Three analyses were conducted to assess the sensitivity of the above results to different ways of dealing with missing data. In the first of these exercises, the ten lowest responding wards (returning fewer than 196 PCC–SRs) were excluded and the multilevel model of all self-harm was conducted again. In the second exercise, the ten wards that declined most sharply in their response rates over the course of the study (correlation response rate/week with time × week of less than –0.67) were excluded and the modelling exercise conducted again. Finally, the effect of excluding admissions where three or more data items were missing (excluding postcodes) was assessed.
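The second exclusion rule can be illustrated as a correlation between weekly returns and week number. The reading of the criterion as a Pearson correlation of weekly response with study week is an assumption, the weekly figures are invented, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def declined_sharply(weekly_returns, cutoff=-0.67):
    """Flag a ward whose weekly returns correlate with week below cutoff."""
    weeks = list(range(1, len(weekly_returns) + 1))
    return pearson(weeks, weekly_returns) < cutoff


# Invented 26-week series of shift reports returned per week.
declining_ward = [21 - 0.6 * w for w in range(1, 27)]
steady_ward = [19 if w % 2 else 20 for w in range(1, 27)]

flags = [declined_sharply(declining_ward), declined_sharply(steady_ward)]
```

A rule of this kind picks out wards whose returns fell away steadily over the 6 months, as opposed to wards that were simply low responders throughout, which the first exclusion already covers.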

Excluding the ten lowest responding wards had no effect on the domain models or the full model, producing an identical result. Excluding those wards with the steepest declines in response rates also had little effect, with few changes to the domain models. The full model was only slightly different from that produced by including all the data: the proportion of admissions considered to pose a risk of harm to others became significant and entered the model, and 'aggression towards objects' replaced 'aggression to others'. The use of a more conservative criterion for the inclusion of admission data had an impact on findings related to ethnicity, as well as removing the variable 'admissions per day' from both the domain and full models. In relation to patient characteristics, this analysis led to the replacement of 'proportion of patients Caribbean' by 'proportion of patients White'. However, the proportion of patients White was highly correlated with the proportion of staff White (r=0.79), introducing a problem with collinearity, and possibly indicating that staff and patient ethnicity may have interactive effects.


No relationship was found between constant special observation and rates of self-harm. However, intermittent observation was inversely correlated with self-harm rates. That inverse correlation persisted in the model of moderate to severe self-harm, and in all analyses assessing sensitivity to missing data. The absence of a positive correlation between self-harm and constant special observation is surprising, as risk of self-harm or suicide is the most commonly cited reason for its use.24 The relationship between constant special observation and self-harm may be bidirectional, with self-harm leading to constant observation, which in turn reduces self-harm. Such bidirectional effects would obscure relationships in this cross-sectional study.

Little has been written about the use of intermittent observation. One source25 reports its successful use to reduce absconding rates, and another describes how constant special observation can be reduced by instituting documented intermittent checks on all patients.26 In a study of student psychiatric nurses27 an association is reported between approval of intermittent observation as a containment method and positive attitudes to patients. However, nurses interviewed in one study criticised it as being ineffective,6 and the National Confidential Inquiry into Homicides and Suicides has recommended that alternatives be developed.28 Our findings suggest that the use of intermittent observation may be an effective way to reduce self-harm. It ensures the regular presence of nurses all over the ward, and might provide opportunities for patient-initiated interaction at moments of distress or dysphoria. It could be that there is some intervening variable accounting for this link, although a wide range of potential candidate variables have been accounted for in our modelling exercise. As the study design is correlational, no firm causal conclusion can be drawn.

Our findings do not support the idea that staff attitudes or group factors have any impact upon self-harm rates on acute wards. Previous evidence had suggested that positive attitudes towards patients and the provision of an effective structure of rules and routines acted to reduce self-harm and other patient conflict behaviours.29,30 In our study, no relationship was found between staff attitudes and self-harm rates. The influence of staff functioning over rates of self-harm was supported by the finding that the availability of qualified nurses was associated with reduced self-harm rates (and the presence of student nurses or unqualified nurses with the reverse), but the variance partitioning exercise showed different levels of impact for different staffing variables, possibly indicating that other latent unmeasured variables may underlie these effects. The provision of patient activity sessions was strongly associated with lower levels of more severe self-harm, suggesting that an effective structure of routine for patients has a preventive effect.

The features of admissions that are associated with the rates of self-harm on wards include youth and non-schizophrenia diagnoses. This does not necessarily mean that it was the patients with these features, singularly or collectively, that self-harmed. It could equally well have been the impact of higher numbers of such patients on others and the ward atmosphere that triggered others to self-harm. Having larger numbers of people without schizophrenia probably indicates higher numbers with affective disorders of various types, which are also known to be associated with suicide and self-harm. The lack of an association of self-harm rates with numbers admitted for risk of harm to self is initially curious. However, 61% of all admissions were indicated as coming into hospital because of this risk, and it would appear that (a) the level of identified risk is so much higher than the frequency of the actual event that there is little association, and (b) staff also identified those who were a risk to themselves through cognitive disorganisation and self-neglect, thus reducing the predictive value of this variable.

The association of high proportions of admissions of people of Caribbean ethnicity with rates of self-harm is interesting, especially given the strength of the association. However, our sensitivity analysis of missing data on admissions indicates that some caution is called for with regard to the specific association with Caribbean ethnicity and self-harm, as this may simply represent a wider association between minority ethnic status and self-harm. In the univariate ward-level analysis, higher proportions of admissions of all minority ethnic categories were associated with raised rates of self-harm. Further complications were the association between patient and staff ethnicity, and the geographically localised presence of minority ethnic communities. There is an association between higher numbers of minority ethnic staff/patients and more self-harm; however, the precise nature of this link is difficult to determine from our data. This association has been found before in an ecological analysis of self-harm in the community, where raised rates were found among White people living in areas with large minority ethnic populations.31,32 This finding calls for more detailed research.

The Index of Multiple Deprivation for the localities from which patients were drawn was found to be inversely associated with self-harm, indicating that wards serving localities with lower levels of deprivation experience higher rates of self-harm. Previous research demonstrates positive associations between suicide and deprivation20,33,34 and between self-harm and deprivation.34,35 However, all these studies are of community populations rather than patients admitted with a mental illness. One study in Denmark showed that for admitted patients, there was a direct positive relationship between income and suicide.36 The similar finding in this study may be due to service organisation factors; for example, it is known that different districts vary tenfold in the numbers of people who are admitted to psychiatric care following a self-harm incident.37

A high volume of admissions to a ward (a high throughput) seems to have a negative impact, stimulating increased incidents of self-harm. This effect has been previously reported38 in a longitudinal analysis of admissions and adverse incidents. Some of this impact is likely to be due to new admissions arriving on the ward in a highly disturbed and acutely ill condition, and self-harming within the same shift. An alternative or additional interpretation is that new admissions might make the ward less predictable for existing patients, heighten anxiety and precipitate self-harm by others.

The associations found between self-harm and other conflict behaviours are not all easily explicable. The link with absconding might be indicative of patients leaving the ward and self-harming, and the link of more severe self-harm with aggression to objects might reflect the utilisation of objects in the act, for example a patient putting a fist through a window. The association with aggression to others may reflect a tie between inwardly and outwardly directed aggression by the same patients,39 or it may mean that aggressive behaviour within the ward heightens anxiety and other emotions within the ward community, stimulating self-harm. The link with aggression to objects has been reported by others.38,40 The association with refusal to see workers may suggest that patients withdraw from interaction, activities and staff prior to self-harming.

In this correlational study, the direction of causality cannot be established. This applies to the locking of the ward door, which may have been a consequence or an antecedent of self-harm. If locking the ward door did lead to increases in self-harm, this appears to be limited to more minor self-harm, as the association was not present in the moderate self-harm model. Strikingly, many of the other common security practices of acute psychiatry, such as the banning of harmful items, searches of patient property, and restrictions on patient activities or access to kitchen or bathing facilities, appeared to have no association with self-harm rates.

Strengths and limitations

The basic design of the multilevel modelling element of this study is correlational. Therefore, although associations between variables have been reported, the direction of causality cannot be determined. However, many potential additional underlying or intervening variables were incorporated in the analysis.

The large number of variables entered in the modelling exercise means that some reported associations may be due to chance. This weakness is counterbalanced by the overall size of the data-set collected. In addition, the random selection of wards strengthens the external validity of the findings, and the use of multilevel modelling provides more accurate estimates of effects than other methods.

The ideal form of data for this study would have been comprehensive data on patients admitted and occupying the study wards, including rigorous diagnostic information and past patient history, coupled with end-of-shift reports indicating which patients had engaged in which conflict behaviours, or been subject to which containment measures. However, this was not practicable given the size of the study and other commitments of staff.

Despite the size of the data-set collected, there were few incidents of more severe self-harm. Moreover, even to conduct this subsidiary analysis, the criteria for more severe self-harm had to be set at an undesirably low level. As a consequence, the analysis conducted on this was less statistically powerful and less specific. The failure of some variables to show an association might be due to that diminished power, rather than there being no connection with severe self-harm.


The multilevel models suggest that the use of intermittent observation may act to reduce rates of self-harm. A positive association was found between self-harm and locking of the ward door; however, the direction of causality cannot be finally determined using this study design. The potential for positive effect on self-harm rates indicates the need for further research into the effects on patients and staff of door locking.

A large proportion of the variance between wards and trusts in self-harm rates is accounted for by the types of patient admitted, the localities served, and the throughput of patients. Of these patient features, the most striking is minority ethnic status, an association not previously reported. The findings do not support a strong role for staff factors in the determination of self-harm rates on wards, and no association was found with leadership, team functioning, attitude to patients, burnout or ward atmosphere. However, the presence of qualified nursing staff and the provision of patient activity sessions were both associated with lower rates of self-harm.

Wards and trusts providing few planned patient activity sessions, or using low rates of intermittent observation, should reconsider both their policies and their provision of resources to wards so that these may be increased.

The current policy drift towards smaller bed numbers and greater patient throughput seems likely to lead to greater levels of self-harm on wards, and may need to be reconsidered. There is a known problem in the interaction between the psychiatric services and minority ethnic communities in the UK,41 and it is now clear that this extends to rates of self-harm. Further research in these areas is a priority.


The research was funded by the National Institute for Health Research Service Delivery and Organisation Research Programme.

  • Received March 9, 2007.
  • Revision received November 8, 2007.
  • Accepted November 27, 2007.

