Department of Psychiatry, Bioethics Program, and Center for Behavioral and Decision Sciences in Medicine, University of Michigan, Ann Arbor, Michigan
Department of Psychiatry, Columbia University, New York
Department of Psychiatry, University of Rochester, Rochester, New York
Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina
Department of Psychiatry, Duke University, Durham, North Carolina
Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
Department of Psychiatry, University of California San Diego and VA San Diego Healthcare System, San Diego, California
Department of Psychiatry, University of Rochester, Rochester, New York, USA
Correspondence: Dr Scott Y. H. Kim, 300 North Ingalls, Ann Arbor, MI 48109-0429, USA. Email: scottkim{at}umich.edu
Declaration of interest D.C.G. is on the advisory boards of several pharmaceutical companies. Funding detailed in Acknowledgements.
Aims To assess a method for determining the categorical capacity status of potential participants in schizophrenia research.
Method Expert-judgement validation of capacity thresholds on the sub-scales of the MacArthur Competence Assessment Tool-Clinical Research (MacCAT-CR) was evaluated using receiver operating characteristic (ROC) analysis in 91 people with severe mental illness and 40 controls.
Results The ROC areas under the curve for the understanding, appreciation and reasoning sub-scales of the MacCAT-CR were 0.94 (95% CI 0.88-0.99), 0.85 (95% CI 0.76-0.94) and 0.80 (95% CI 0.70-0.90). These findings yielded negative and positive predictive values of incapacity that can guide the practice of investigators and research ethics committees.
Conclusions By performing such validation studies for a few categories of research with varying risks and benefits, it might be possible to create evidence-based capacity determination guidelines for most schizophrenia research.
In this study, we used the judgements of independent clinicians experienced in capacity assessments to address the following question: given that people with schizophrenia exhibit a range of decisional abilities, how can we use a standardised instrument to distinguish those who are capable from those who are incapable of informed consent? We asked the question in the context of a unique opportunity presented by a multisite clinical trial, funded by the National Institute of Mental Health, the Clinical Antipsychotic Trials of Intervention Effectiveness-Schizophrenia (CATIE; Stroup et al, 2003), which used as part of its research protocol the most widely tested measure of decisional ability, the MacArthur Competence Assessment Tool-Clinical Research (MacCAT-CR; Appelbaum & Grisso, 2001).
This study was approved by the research ethics committees (institutional review boards) of all participating institutions, and all participants provided written informed consent after full disclosure of study elements. The CATIE participants provided separate informed consent for this ancillary study. For the group with severe mental illness, as has been done in other studies of this kind (Moser et al, 2002; Stroup et al, 2006), given the low risk of this interview study, a relatively undemanding standard for capacity to consent was used.
Measures
Participants were videotaped during their assessment with the MacCAT-CR (Appelbaum & Grisso, 2001). The MacCAT-CR has been extensively used in people with schizophrenia (Carpenter et al, 2000; Dunn et al, 2002; Moser et al, 2002; Stroup et al, 2005) and people with major depression (Appelbaum et al, 1999) and dementia (Kim et al, 2001), and is a companion instrument to the MacArthur Competence Assessment Tool for Treatment (MacCAT-T) (Cairns et al, 2005a,b).
The MacCAT-CR contains pertinent disclosure elements of informed consent and is designed to be adapted to specific research protocols, to reflect the task-specific nature of decisional capacity (Appelbaum & Grisso, 2001). The version used in the CATIE-Schizophrenia study was used for all participants in this study; thus, the non-CATIE and control participants were asked to imagine being invited to participate in the CATIE study as their decisional abilities were assessed. This procedure is commonly employed in capacity research (Carpenter et al, 2000; Moser et al, 2002).
The MacCAT-CR is structured according to the four-abilities model of decision-making capacity (Grisso & Appelbaum, 1998). These include `understanding [emphasis added] of disclosed information about the nature of the research project and its procedures (13 items for a possible total score of 26; each item in the MacCAT-CR has a score range of 0-2, with objective scoring criteria); appreciation of the effects of research participation (or failure to participate) on subjects' own situations (3 items for a possible total score of 6); reasoning about participation (4 items for a possible total score of 8); and ability to communicate a choice (one item for a possible total score of 2)' (Appelbaum et al, 1999). Data on the ability to communicate a choice will not be discussed here as almost everyone received a full score. The MacCAT-CR does not provide a global score because requirements for each ability related to capacity can vary by jurisdiction and according to the decisional demands of a given study (Grisso & Appelbaum, 1995a). However, it is important to note that the four-abilities model is based on an extensive review of laws, court decisions and ethics literature, such that it provides a reasonable approximation of the standards for capacity broadly laid out in statutes. Thus, researchers have been able to use the MacCAT instruments to approximate, for example, the criteria of the Mental Capacity Act 2005 (Cairns et al, 2005a,b).
The final MacCAT-CR sub-scale ratings for all 131 participants used for analysis were made by J.S. During the course of the project, the principal investigator (S.K.) independently scored 36 out of the 131 interviews. This was the basis for calculations of interrater reliability. For those 36 participants, discrepancies arising after independent scoring of MacCAT-CR items by the two raters were resolved through discussion between the two raters. The intraclass correlation coefficients for total scores of MacCAT-CR sub-scales were 0.93 for understanding (F=29.3, d.f.=35.0, 35, P<0.0001), 0.89 for appreciation (F=16.7, d.f.=35.0, 35, P<0.0001), and 0.84 for reasoning (F=11.3, d.f.=35.0, 35, P<0.0001).
Psychiatric diagnoses were made using medical records and the Structured Clinical Interview for DSM-IV (SCID-IV; First et al, 1997). Severity of psychiatric symptoms was measured using the Positive and Negative Syndrome Scale (PANSS; Kay et al, 1987), which includes positive, negative and general psychopathology sub-scales. Control participants were administered the SCID only.
Expert judgements
Three psychiatrists with experience in assessing decisional capacity (two
consultation psychiatrists and one board-certified geriatric psychiatrist)
were recruited to serve as expert judges and a fourth judge was added as a
back-up. The judges were prepared for their task by informing them of the
basic outlines of the CATIE-Schizophrenia study (the rationale, the
medications to be tested, the total number of participants to be enrolled, and
the fact that treatment failures would lead to rerandomisation with a new
study drug). They were told that their job was to render a categorical
judgement based on viewing an interview of a semi-structured capacity
assessment (but they were unaware of the actual MacCAT-CR scores). The
ultimate goal of deriving a final judgement was explained as: `Your task is to
review the tapes carefully and make a categorical judgement (definitely
capable, probably capable, probably not capable, and definitely not capable).
In the real world, decisions need to be made even if things aren't clear, as
we reduce complex clinical data into a yes/no judgement'. They also rated the
statement: `The videotaped interview gave me sufficient basis to make my
decision in this case' on a 5-point Likert scale ranging from `strongly
agree=1' to `strongly disagree=5.'
The final categorical status of each participant was determined by collapsing the `definite' and `probable' categories of the experts' responses to create a dichotomous variable and then using a majority (2 out of 3) or better (3 out of 3) agreement among the three expert judges to determine the final status (Kim et al, 2001). Owing to unavoidable circumstances, two of the three original raters were not able to complete all of the interviews. However, we had a back-up expert judge (a psychiatrist trained in both internal medicine and psychiatry, who primarily works with people with schizophrenia) whose scores were used whenever there was a missing judgement among the first three judges. The experts rendered their categorical judgements independently of one another.
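The collapsing and majority rule described above can be illustrated with a minimal Python sketch. The function names, the use of `None` for a missing judgement, and the example ratings are hypothetical and are not taken from the study's own procedures or software.

```python
# Illustrative sketch only: names and data layout are assumptions.
CAPABLE = {"definitely capable", "probably capable"}
INCAPABLE = {"probably not capable", "definitely not capable"}


def dichotomise(rating):
    """Collapse a four-level rating into True (capable) or False (incapable)."""
    if rating in CAPABLE:
        return True
    if rating in INCAPABLE:
        return False
    raise ValueError(f"unknown rating: {rating!r}")


def final_status(primary, backup=None):
    """Return the majority status from three judges.

    primary: list of three ratings, with None marking a missing judgement.
    backup:  rating from the back-up judge, used only to fill one gap.
    """
    ratings = [r for r in primary if r is not None]
    if len(ratings) < len(primary) and backup is not None:
        ratings.append(backup)
    if len(ratings) != 3:
        raise ValueError("exactly three judgements are required")
    votes = [dichotomise(r) for r in ratings]
    # A majority means at least 2 of the 3 dichotomised votes agree.
    return "capable" if sum(votes) >= 2 else "incapable"


if __name__ == "__main__":
    print(final_status(["definitely capable", "probably not capable", "probably capable"]))
    print(final_status(["probably not capable", None, "definitely not capable"],
                       backup="probably capable"))
```

With three dichotomous votes a tie is impossible, which is one practical reason for using an odd number of judges.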
Categorical capacity judgements were rendered for 101 participants: 90 with severe mental illness and 11 controls. The videotape for one participant with mental illness was not used because the sound quality was poor. Moreover, because of lower variance with higher performance (ceiling effect) in the comparison group, we only used 11 of the 40 tapes, including the two lowest scoring control participants. Expert judge 1 reviewed all 101 interviews, judge 2 reviewed 79 interviews and judge 3 reviewed 91 interviews. There were no participants who had a missing judgement from more than one judge. The back-up expert judge rendered judgements for 72 interviews and, of those, 32 judgements in which there was a missing judgement from either judge 2 or judge 3 were used in the final determination of capacity status (the back-up judge rated more than the 32 participants with missing ratings to assess reliability among all four judges).
The rationale and methodology for the expert judgement criterion method have been described elsewhere (Kim, 2006), including its advantages over an a priori cut-off criterion (Wirshing et al, 1998; Moser et al, 2002) and a psychometric criterion (Marson et al, 1995; Grisso et al, 1997; Schmand et al, 1999; Kovnick et al, 2003). Given that most societies look to clinicians' judgement about such decisions, expert judgement offers an arguably more valid standard against which to measure participant performance. Methodologically, expert judgement provides an independent assessment criterion, since the experts are not affiliated with the schizophrenia research studies.
Statistical analyses
Group comparisons of demographic, symptom severity and MacCAT-CR
summary data were conducted using parametric or non-parametric tests. Pairwise
and group kappa coefficients were calculated to assess categorical agreement
among expert judges. Receiver operating characteristic (ROC) analysis was
conducted to assess the test characteristics of each of the three subscales
(understanding, appreciation and reasoning) of the MacCAT-CR against the
final categorical judgements made by the expert judges. To demonstrate how the
sensitivity and specificity data generated from the ROC analysis can be
applied to potential research scenarios, we calculated the positive and
negative predictive values (PPVs and NPVs) for a range of hypothetical prior
probabilities for three cut-off points on the understanding sub-scale.
Data were analysed using SPSS version 12.0 and Stata version 8.0 (both for Windows).
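As a rough illustration of the ROC quantities named above, the following Python sketch computes sensitivity and specificity at a cut-off and the area under the ROC curve. It assumes that a score at or below the cut-off counts as a positive test for incapacity, and it uses the pairwise (Mann-Whitney) formulation of the area under the curve. The scores and statuses are invented toy data, not study data, and this is not the SPSS or Stata procedure used by the authors.

```python
def sensitivity_specificity(scores, incapable, cutoff):
    """Sensitivity and specificity for detecting incapacity.

    A score at or below `cutoff` is treated as a positive test (flagged
    as incapable); `incapable` holds the expert-judgement criterion.
    """
    tp = fn = tn = fp = 0
    for score, is_incapable in zip(scores, incapable):
        flagged = score <= cutoff
        if is_incapable and flagged:
            tp += 1
        elif is_incapable:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, incapable):
    """Area under the ROC curve as the probability that a randomly chosen
    capable participant outscores a randomly chosen incapable one
    (ties count as one half)."""
    low = [s for s, inc in zip(scores, incapable) if inc]
    high = [s for s, inc in zip(scores, incapable) if not inc]
    wins = sum(1.0 if h > l else 0.5 if h == l else 0.0
               for h in high for l in low)
    return wins / (len(high) * len(low))


if __name__ == "__main__":
    # Hypothetical understanding scores (0-26) and criterion statuses.
    toy_scores = [10, 14, 16, 18, 20, 22, 24, 26]
    toy_incapable = [True, True, True, False, True, False, False, False]
    print(sensitivity_specificity(toy_scores, toy_incapable, cutoff=16))
    print(roc_auc(toy_scores, toy_incapable))
```

Repeating the first function over every possible cut-off gives the points of the ROC curve.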
Table 1 Participants' characteristics and performance on the MacArthur Competence Assessment Tool-Clinical Research
Performance on MacCAT-CR
Those with severe mental illness performed significantly worse than the
comparison group on the MacCAT-CR sub-scales (except for choice). Within
this group, the 55 participants from the CATIE study performed better than the
other participants on all sub-scales of the MacCAT-CR: understanding,
mean score (s.d.) 21.3 (3.7) v. 19.1 (5.6), t=2.1,
d.f.=54.7, P=0.04; appreciation, 4.1 (1.5) v. 3.4 (1.8),
t=2.0, d.f.=66.0, P=0.05; reasoning, 5.4 (1.5) v.
4.8 (2.3), t=1.4, d.f.=54.2, P=0.17. This is consistent with
our goal of avoiding spectrum bias by expanding the range of scores in the
group with severe mental illness.
Expert judgements
Of the 101 people reviewed, 25 (including 7 of the 55 CATIE participants)
were deemed probably or definitely incapable of consent. The pairwise kappa
coefficients among the four judges ranged from 0.56 to 0.90; the group kappa
coefficient for the four expert judges was 0.69 (Z=14.1,
P<0.001). When asked whether or not the videos provided a
sufficient basis for them to make their capacity determinations, the mean
rating ranged between strongly agree=1 and agree=2 for three of the
experts, with mean (s.d.) ratings of 1.4 (0.9), 1.4 (0.8) and 1.9 (0.9), and
between agree=2 and neutral=3 for the remaining expert judge, whose mean
rating was 2.5 (1.1).
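For readers unfamiliar with chance-corrected agreement, a pairwise kappa for two judges can be sketched as follows. The sketch assumes Cohen's kappa on the dichotomised (capable or incapable) judgements; the text does not specify which kappa statistic was computed, the group coefficient for four judges is not shown, and the ratings below are invented for illustration.

```python
def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa for two raters making yes/no judgements.

    judge_a, judge_b: equal-length lists of booleans (True = capable).
    """
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(judge_a)
    # Observed proportion of cases on which the two judges agree.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Agreement expected by chance from each judge's marginal rates.
    rate_a = sum(judge_a) / n
    rate_b = sum(judge_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical dichotomised judgements for eight interviews.
    a = [True, True, True, False, False, False, True, False]
    b = [True, True, False, False, False, False, True, True]
    print(cohen_kappa(a, b))
```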
Predictive values of MacCAT-CR scores
Table 2 summarises the
sensitivity and specificity using various cut-off points on the three
sub-scales of the MacCAT-CR. The area under the ROC curve was higher for
the understanding sub-scale at 0.94 (95% CI 0.88-0.99) than for the
appreciation sub-scale (0.85, 0.76-0.94) and the reasoning sub-scale
(0.80, 0.70-0.90), indicating that MacCAT-CR scores, especially
for understanding, were significant predictors of categorical capacity status.
However, none of the sub-scales had a single cut-off score with a very high
sensitivity and specificity.
Table 2 Sensitivity and specificity of cut-off scores on sub-scales of the MacArthur Competence Assessment Tool-Clinical Research
Sensitivity and specificity are features of tests, not populations, and cannot guide decisions without information about prevalence. For the purpose of determining the acceptable capacity scores that might be recommended (for instance to a research ethics committee reviewing a research protocol to be used to screen people with impaired capacity), the results of the ROC analysis were used to generate positive predictive values (PPV, the probability that a person found to perform at or below a MacCAT-CR sub-scale cut-off score will in fact be incapable) and negative predictive values (NPV, the probability that a person performing above the cut-point will in fact be capable), as shown in Table 3.
Table 3 Positive and negative predictive values for three potential cut-off scores on the MacArthur Competence Assessment Tool-Clinical Research understanding sub-scale, for a range of prevalence values
A high PPV implies a low false-positive rate (i.e. low likelihood of mistakenly excluding a capable person); a high NPV implies a low false-negative rate (i.e. low likelihood of mistakenly enrolling an incapable person). In determining what degree of decisional capacity to require of research participants, it would be undesirable to use a high cut-off score when prevalence is low (e.g. understanding score of 21 at 10% prevalence of incapacity) because 76% of persons excluded as too impaired will in fact be capable (given the PPV of 24%). Such a practice would not only be inefficient but also would unfairly exclude willing and capable persons from participating in research. It would also be ethically undesirable to use a low cut-off when the prevalence of incapacity is high (e.g. at 50% incapacity, almost a third of those who test capable will in fact be incapable given the NPV of 69%).
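The dependence of predictive values on prevalence follows from Bayes' rule and can be sketched in a few lines of Python. The sensitivity and specificity values below are hypothetical placeholders chosen only to show the pattern; they are not the values in Table 2 and will not reproduce the entries of Table 3.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test of incapacity.

    PPV: probability that a person scoring at or below the cut-off is
         in fact incapable.
    NPV: probability that a person scoring above the cut-off is in
         fact capable.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test characteristics for one cut-off score.
    sens, spec = 0.90, 0.80
    for prevalence in (0.10, 0.30, 0.50):
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With the same test characteristics, the PPV rises and the NPV falls as the prevalence of incapacity increases, which is why a single cut-off cannot serve every study population.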
In the research context, informed consent disclosures are relatively consistent across participants, since the relevant information, including the risk-benefit ratio, is determined by the characteristics of a research protocol which is applicable to all potential participants. This is in contrast to the treatment context in which the procedures, risks, benefits and hence disclosures might be unique to each individual's treatment situation, and for whom the assessment of decision-making capacity requires individualised patient information (Cairns et al, 2005a). Further, whereas in the treatment context the welfare of the patient is the physician's paramount concern, in the research context, the investigator's priority is the advancement of science, thus increasing the need for a more transparent and objective process for determination of capacity. Therefore, the research context provides an opportunity as well as an imperative to create a standardised capacity assessment by using an assessment instrument that can be benchmarked against ethically appropriate, methodologically rigorous independent validation provided by experienced clinicians.
Objective determination of capacity for research
Our study establishes the feasibility of an objective assessment of
capacity for the research context. By validating the MacCAT-CR
sub-scales against an expert judgement standard, we can go beyond mere
descriptions of participants' performance on a scale. For example, given that
the prior probability of incapacity among those screened for the
CATIE-Schizophrenia study was probably quite low
(Stroup et al, 2005),
we can surmise that even a low cut-off score on the understanding sub-scale
such as 15 (which was in fact used by the CATIE study) would rarely include
people lacking capacity (e.g. at an estimate of 10% prevalence, there would
only be a 5% false-negative rate), with virtually no chance of mistakenly
identifying those with capacity as incapable.
Since most research studies could probably be categorised into a handful of categories in terms of their risk-benefit ratio (Maryland Attorney General's Research Working Group, 1998; National Bioethics Advisory Commission, 1998; New York Department of Health Advisory Work Group on Human Subject Research Involving the Protected Classes, 1999), a limited number of targeted validation studies would likely provide a sufficient evidence base for ethically appropriate yet efficient practice for a variety of research studies involving participants with impairment in decision-making. In effect, a series of tables such as Table 3 could provide a systematic and objective guide to research ethics committees and investigators.
How important is the fact that the sub-scales of the MacCAT-CR do not seem to have a single cut-off point that has both high sensitivity and specificity? First, it might simply be unrealistic to expect extreme precision and predictability from a standardised instrument when it is applied to making complex, value-laden judgements about a person's decision-making capacity. Second, this limitation might not be a problem as long as the purpose of assessing capacity is clear; for instance, one might focus more heavily on the PPV or the NPV, depending on the situation. So for an early-phase study of an invasive intervention likely to yield no benefit to the participants but which may pose some risk, the thresholds could be set high enough to exclude anyone who lacks capacity (that is, to minimise false-negatives). Alternatively, if such a method eliminates too many potential participants as false-positives, a two-step approach could be used: a lower MacCAT-CR threshold to decrease false-positives and then individual in-depth capacity assessments to ensure that only competent persons are enrolled. As this last example illustrates, we believe that our method should be used flexibly to meet the ethical requirements of the situation, rather than rigidly adhering to a formula.
Strengths
To our knowledge, this is the most thorough validation against expert
judgement of capacity thresholds on a widely used capacity assessment tool for
clinical research, and the first such validation study involving people with
schizophrenia. This study has several strengths. The majority of participants
with severe mental illness were in an actual clinical trial. This group
exhibited a wide spectrum of impairment, allowing us to conduct a meaningful
ROC analysis. The expert judges achieved relatively high levels of non-chance
agreement, and they felt that in general they had sufficient information to
make the capacity determinations. The expert judges based their judgements on
video recordings of interviews, which provided more information than written
transcripts.
Limitations
There are, however, some limitations and caveats to the study. First, our
sample consisted of both CATIE participants and people with severe mental
illness who were not in the CATIE study. The latter were included to ensure a
sufficient spectrum of performance for the ROC analysis. Thus, no
generalisations regarding the relative performance of the subgroups should be
drawn from our data. The CATIE participants might have performed better on the
MacCAT-CR because they had a less severe illness but also because, being
involved in the study, the study protocol had been previously explained to
them in more depth.
Second, the experts' judgements were based on their viewing the taped MacCAT-CR interviews rather than performing their own independent assessments (which was not feasible in this multisite study involving multiple expert judges). Thus, our method is susceptible to incorporation bias that can falsely increase the accuracy of the test (Zhou et al, 2002). However, this limitation must be weighed against the following countervailing considerations. Currently, there is a lack of standardised procedures for capacity determinations. The MacCAT-CR covers the essential elements of a capacity assessment (Appelbaum & Grisso, 2001) and its standardised nature mitigates the variability of capacity assessments. In the absence of agreed procedures for capacity assessments, a criterion standard based on various experts' evaluations (even if it were feasible in a multisite study such as this) would involve a variety of methods, creating uncertainties regarding the nature of the standards used. Thus, although we cannot rule out the possibility of incorporation bias, we believe our results represent a reasonable balance between feasibility and validity.
Third, before the results of our study are generalised to other contexts, one must take into account the potential adverse effects of focusing on `cut-off scores' of capacity assessment instruments (Grisso & Appelbaum, 1996), especially the danger that the cut-off scores will be seen as inherent features of the assessment instrument (i.e. anyone scoring above a certain level has adequate capacity for any decision), rather than needing context-by-context validation and context-sensitive application. To avoid such misuse, any generalisation of our validation method must take into account two points.
First, the prevalence of incapacity in schizophrenia studies other than the CATIE study might be different for a variety of reasons. For instance, in studies that target people with refractory illness or those who are long-term in-patients (Kovnick et al, 2003), the prevalence rates will be higher and the estimation of PPV and NPV will need to take that into account. Second, any attempt to generalise our validation method to other schizophrenia studies must take into account the risk-sensitive nature of capacity thresholds (Brock, 1991; Grisso & Appelbaum, 1998; National Bioethics Advisory Commission, 1998). In our study, the expert raters made judgements regarding the level of capacity that was adequate for a relatively low-risk clinical trial. However, the risk-benefit ratio might be different for studies involving placebos, symptom provocation, or phase I tests of invasive interventions. The ROC curves for such studies might look quite different from those in this study.
Finally, the fact that 7 of 55 CATIE study participants were deemed to lack capacity by our experts needs to be interpreted with caution. Our CATIE sample was not intended to be representative of the overall CATIE study. The ratings of the expert judges were not available to the CATIE investigators at the time that they made their judgements regarding admission to the CATIE study. Further, a number of unique safeguards (Stroup et al, 2005), including independent participant advocates (Stroup & Appelbaum, 2003), were built into the CATIE project. Finally, in the absence of a true `gold standard' for determining categorical status, we are proposing the expert judgement-based method as a provisional criterion standard that needs to be further studied and improved (Kim, 2006).
Future directions
The results of our study provide an evidence-based decision framework for
how to use instruments for measuring decisional abilities to guide valid
categorical judgements about a potential participant's capacity to give
informed consent. We believe that as long as its limitations and caveats are
kept in mind, future research employing the framework provided in our study
could have important practical implications. By performing validation studies
for a few categories of risk-benefit situations, it might even be
possible to interpolate reasonable guidelines for most schizophrenia research
studies. Such an approach would make the crucial task of determining a
potential participant's capacity status much more transparent, objective and
evidence-based than it is today.