The British Journal of Psychiatry
Translation and cross-cultural adaptation of outcome measurements for schizophrenia


Background Research on the comparison of mental health services has identified the need for internationally standardised and reliable measurements.

Aims To describe the strategies adopted in the European Psychiatric Services: Inputs Linked to Outcome Domains and Needs (EPSILON) Study for the translation and cross-cultural adaptation of five European versions of the instruments.

Method A protocol was developed for translation of the outcome scales, describing each step in the translation procedure. Disputed items were discussed in focus groups, which faced seven tasks: establishing a list of topics to be discussed; choosing where the group should meet; composition of participants; conducting the group; data collection; data completion afterwards; reporting results.

Results Modifications made to instruments were: changes in the instrument structure, contents and concepts; adjustments to the instrument structure; and modifications to the instrument manual.

Conclusion Use of focus groups is an adequate method to apply if concepts, constructs and translation issues are to be addressed; otherwise, less time-consuming methods should be considered.

Research on the comparison of mental health services has identified the need for internationally standardised and reliable measures which can describe and compare patients, services, costs and outcomes across language and cultural boundaries. Equivalent language versions of an instrument will make it possible to carry out multi-centre research and the meaningful comparison of results obtained in different countries (Sartorius & Helmchen, 1981).

Often mental health measurements and psychological tests have been developed for content, validity and reliability in one country or language exclusively. Some of these instruments are then used in different languages and cultural settings, but often without detailed attention to the cross-national and cross-cultural adaptation that is necessary. Few instruments have been produced in equivalent versions in different languages, thus ensuring, in addition to their validity and reliability, their cross-cultural applicability in the new setting (Sartorius & Kuyken, 1994). The methods involved and their difficulties have been well described by different authors (Simonsen & Mortensen, 1990; Sartorius & Kuyken, 1994; Gaite et al, 1997; Hutchinson et al, 1997).

The aims of the present paper are: (a) to describe the process of transferring instruments from one language or culture to another; (b) to describe the strategies adopted in the European Psychiatric Services: Inputs Linked to Outcome Domains and Needs (EPSILON) Study for translation and cross-cultural adaptation, in order to develop European Versions (EU) of five key instruments: Involvement Evaluation Questionnaire (IEQ) (Schene & van Wijngaarden, 1992), Camberwell Assessment of Need (CAN) (Phelan et al, 1995), Verona Service Satisfaction Schedule (VSSS) (Ruggeri & Dall'Agnola, 1993), Lancashire Quality of Life Profile (LQoLP) (Oliver, 1991) and Client Service Receipt Inventory (CSRI) (Beecham & Knapp, 1992); and (c) to summarise the impact that the methods applied had on the development of the instruments.

The process included the following steps: (a) a proper translation process; (b) cross-cultural verification and adaptation; (c) verifying the psychometric properties of the instrument in the target language. The first two elements of this process will be considered in this paper, which will describe in detail the use of focus groups as part of the cross-cultural instrument adaptation process in the EPSILON Study. The third step - verifying the psychometric properties of the instruments - is described in van Wijngaarden et al (2000) elsewhere in this supplement.


Sartorius & Kuyken (1994) describe four approaches to translation of instruments from a source to a target language, depending on the degree of conceptual overlap between the source and the target culture: (a) the ethnocentric approach (100% conceptual overlap); (b) the pragmatic approach (considerable conceptual overlap); (c) the emic plus etic approach (less conceptual overlap); and (d) translation not possible (when there is no conceptual overlap). ‘Conceptual overlap’ means the extent to which a concept has the same meaning in both cultures - for example, the concept of ‘corruption’ has intuitively very different meanings in different parts of the world, as have the concepts of health and illness.

The translation approach selected depends on the degree of conceptual overlap between the source and the target culture for the concepts in question. In a European context, there is considerable conceptual overlap in most measures regarding patients, services, costs, or outcomes in mental health services; therefore the pragmatic approach is suitable when an instrument developed in Europe is translated into other European languages and cultures.
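The relationship between conceptual overlap and translation approach described above can be written down as a simple lookup. The following is only an illustrative sketch: the names `Overlap`, `APPROACH` and `select_approach` are hypothetical, and the four qualitative overlap levels are taken from the wording above rather than from any numerical scale defined by Sartorius & Kuyken (1994).

```python
from enum import Enum


class Overlap(Enum):
    """Qualitative degree of conceptual overlap between source and target culture."""
    COMPLETE = "100% conceptual overlap"
    CONSIDERABLE = "considerable conceptual overlap"
    PARTIAL = "less conceptual overlap"
    NONE = "no conceptual overlap"


# Mapping of overlap level to approach, following the four cases in the text.
APPROACH = {
    Overlap.COMPLETE: "ethnocentric",
    Overlap.CONSIDERABLE: "pragmatic",
    Overlap.PARTIAL: "emic plus etic",
    Overlap.NONE: None,  # translation not possible
}


def select_approach(overlap: Overlap) -> str:
    """Return the translation approach for a given overlap level.

    Raises ValueError when there is no conceptual overlap, since the text
    treats translation as not possible in that case.
    """
    approach = APPROACH[overlap]
    if approach is None:
        raise ValueError("no conceptual overlap: translation not possible")
    return approach
```

In the European context discussed here, the overlap would typically be judged considerable, so `select_approach(Overlap.CONSIDERABLE)` gives the pragmatic approach. Judging the overlap level itself remains a matter for the researchers and is not something the sketch attempts.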

Within each of these approaches, the aim of translation is to maintain, as far as possible, the semantic (or linguistic), the conceptual, and the technical equivalence between the versions of the instruments in the source and target languages (Sartorius & Kuyken, 1994; Hutchinson et al, 1997). ‘Semantic equivalence’ means to retain a similar meaning of a measure in the source and in the target version, while ‘conceptual equivalence’ refers to the need to obtain an identical meaning of concepts which may have different cultural understandings (for example the concept of ‘good mental health’) (Sartorius & Kuyken, 1994; Hutchinson et al, 1997). Finally, ‘technical equivalence’ refers to both the technical features of the languages (i.e. language complexity, question length, acceptable level of abstraction) and their relationship to the sociocultural context (the feasibility of the nature and mode of questioning used in the instrument in the source and target versions; for example: whether the questionnaire is applied as a self-rating questionnaire or as a structured interview).

The three language- and culture-related equivalences are key issues in a proper translation of instruments. They require the translators involved in the translation procedures to be highly qualified: they need to have good technical knowledge of both the source and the target languages, and full emotional understanding of the source and target languages; to be deeply involved in the cultures in question; to know about the cultural problems related to the concepts and terms used in the questionnaire (so as, for example, to avoid the use of stigmatising concepts); and to have integrated knowledge of the area and domains explored in the questionnaire.

Translators will meet these rigorous criteria to varying degrees. When a translator does not have all these characteristics, strategies can be devised to compensate for this, including interactive discussions with experts, monolingual panels not involved in the translation process, or other strategies specifically designed for the translation of health assessment instruments (Gaite et al, 1998: details available from the corresponding author on request).

However accurate the translation process, it will not necessarily guarantee that the instrument has been fully adapted to the target language, in the sense that the concepts and constructs incorporated in the instrument are fully applicable. To achieve this, specific quantitative and qualitative strategies need to be adopted: for example, concept mapping (Russell, 1988), pile sorting (Trotter & Potter, 1993), key-informant consensus meeting (Johnson, 1990) and focus groups. Here we will summarise the characteristics of focus groups, which was the method selected in the EPSILON Study.



The focus group (FG) interview is a procedure first described by Bogardus (1926). It was initially used by commercial companies for market research, to develop and evaluate an extremely diverse range of products; to analyse target populations' wishes, views, problems, fears, beliefs and vocabulary; and to shape communication in advertising campaigns. More recently, it has also come into use in areas such as political campaigns and health education programmes; and it has also proved to be useful in mental health research, by providing qualitative information for both qualitative and quantitative research designs (Room et al, 1996).

Focus group process

The FG interview is a qualitative research method. The interview derives from the formal group interview, which in its structure and method derives from group psychotherapy. It is a focused group interview or an arranged talk/communication among a selected group of people. Its aim is to uncover important dimensions of a given problem, experience, service or other phenomenon (Basch, 1987; Krueger, 1994; Bojlén & Lunde, 1995). The advantage of the FG is that by properly selecting the participants, and developing the outline of the FG session in accordance with the aims of the FG interview (exploratory, judgement, phenomenological) (Basch, 1987, p. 418), we can produce a wide range of information and potentially uncover important understanding of the problem to be addressed. It is possible to address instrument problems like the readability of a measure, the construct of the concept, or the understanding of the mental health care system, and at the same time address the issue of acceptability of the content of the questions (such as financial, religious or sexual issues).

Basch (1987) has outlined the key features of the FG interview: the role of the moderator, the physical setting, the psychological climate conducive to a successful FG session, proper selection of participants in accordance with the aim of the FG interview, instrumentation (development of discussion outline and questions to be asked), data collection and analysis, including a summary report on the findings.

It is recognised in group psychotherapy that the optimum size for a good working group is from six to ten participants, and the size of the group affects the nature of the data collected as well as the group structure. In general, it is thought that focus groups should be highly structured, with six to ten members, and with moderators controlling both the questions to be asked and the group dynamics. This approach is appropriate when the moderator knows what the key questions are.

Developing the discussion outline and the questions to be used by the moderator requires careful thought and a considerable amount of effort in planning. As in all questionnaire design, each item of the outline has a specific purpose. The data obtained in the FG interview can be analysed in different ways, depending on the method used for data collection (tape recorder, videotape, written notes); but in any case, in the final version of the report it is important to have a summary outlining the most important ideas and conclusions. Potential problems and technical issues are related to unbiased data reduction and the inferences to be drawn from qualitative data.



The five instruments selected for the translation and cultural adaptation process were originally developed in one of the countries participating in the research project, meaning that each instrument had to be translated into the four other languages.


A protocol was developed for the translation of instruments, describing the procedure for each step in the translation process.

  1. All five instruments were translated from their original language into the four target languages by professional translators who received information on the content of the instrument being translated. The translations from the original language into the target languages were made by translators whose native languages were the target languages and whose second languages were those of the original instruments.

  2. The translator and the research group discussed the first translation. This led to a revision of the translation and a list of disputed translation items.

  3. The translated instruments were then back-translated into the original languages by different translators, whose native languages were those of the original instruments and whose second languages were those of the target language. These also gave their reactions to the first translation and to the list of disputed translations.

  4. The back-translation was compared with the original version. Differences were discussed by the first translator and the researchers. This led to another revision of the translation and a list of disputed items to be considered in FGs.

  5. The focus groups' remarks were discussed between the researchers and one of the translators. Inappropriate and impossible items and sentences were revised. This led to the final version.

With some modification, this procedure has been followed in the translation of all five instruments.
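The bookkeeping implied by this protocol (successive revisions, each carrying forward a list of disputed items) can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a tool used in the EPSILON Study: the names `Translator`, `TranslationRecord` and `review`, the placeholder languages "A" and "B", and the example item labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Translator:
    native: str   # native language of the translator
    second: str   # second language of the translator


def is_forward_translator(t: Translator, source: str, target: str) -> bool:
    """Step 1: native language is the target, second language is the source."""
    return t.native == target and t.second == source


def is_back_translator(t: Translator, source: str, target: str) -> bool:
    """Step 3: native language is the source, second language is the target."""
    return t.native == source and t.second == target


@dataclass
class TranslationRecord:
    instrument: str
    source: str
    target: str
    revision: int = 0
    disputed: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def review(self, step: str, new_disputed=(), resolved=()) -> None:
        """Record one review round: a new revision plus an updated disputed list."""
        self.revision += 1
        self.disputed = [d for d in self.disputed if d not in resolved]
        for item in new_disputed:
            if item not in self.disputed:
                self.disputed.append(item)
        self.log.append((step, self.revision, tuple(self.disputed)))


# Hypothetical walk-through for one instrument and one target language.
rec = TranslationRecord("EXAMPLE", source="A", target="B")
rec.review("step 2: translator and research group", new_disputed=["item 12"])
rec.review("step 4: back-translation compared", new_disputed=["item 30"])
rec.review("step 5: focus group remarks", resolved=["item 12", "item 30"])
```

The point of the sketch is that the disputed-items list is the thread connecting the steps: it is opened in step 2, extended in step 4, and emptied only after the focus group remarks have been discussed in step 5.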

Focus groups

The FG activities involved seven main tasks: (a) establishing the list of topics to be discussed for each instrument; (b) deciding where to hold the FG; (c) the composition of FG participants; (d) conducting the FG session; (e) data collection during the FG session; (f) post-FG data completion; (g) reporting the results of the FG.

  1. Establishing the list of topics to be discussed for each instrument. The subject of the FGs was the translated version of each of the instruments. The topics discussed were translation adequacy, instrument applicability and the concepts of the constructs. The centre responsible for each instrument prepared a list of important issues to be discussed in the FG, based on problems raised during the translation process. Irrespective of this, FG participants could discuss other themes they considered relevant.

  2. Selection of where to hold the FG. The FGs were conducted in places where the sessions could be guaranteed not to be interrupted by external activities.

  3. Participants in FGs. The FG was composed of a moderator (often the group leader), a co-leader and the participants. The moderator had skills in facilitating effective group functioning and was either a psychiatrist or a psychologist. The co-leader assisted the leader in taking notes of relevant issues during the session, collaborated in the analysis of the group material and provided other perspectives. Six to ten participants were selected for each FG, and at each site separate FGs were held for each questionnaire. Participants represented the different categories of people involved in the delivery and receipt of care: both men and women; doctors, nurses, social workers, patients, relatives, and also (depending on the instrument in question) local administrators, social workers, general practitioners and psychologists. The composition of the group was decided taking into consideration each of the topics selected for analysis: for example, administrators and social workers were considered important participants in the FG for the CSSRI-EU.

  4. Conducting the FG session. A specific guideline was developed for each of the instruments, guiding both the questions and the issues to be raised during the FG, and the composition of the FG; FGs were highly structured, and lasted for about 1 1/2 hours.

  5. Data collection during the FG session. The moderator and co-leader took notes at each FG; they noted, among other things, the participants, late arrivals, relationships to other participants, the start time, the list of questions asked, the time when each major issue or question was asked; and recorded the major probe questions used and the actual arrangement (seating) of participants in the session. Comments made by the moderators about each person's participation and notes on session processes included, for example, notes on who was speaking, notes on the tone of the session and any problem areas that occurred and how (if at all) they were resolved. The FG process was recorded differently in each country. In Italy, for example, sessions were videotaped; while in Denmark two reporters were included in the group to take the notes.

  6. Post-FG data completion. At the end of the session the moderator and the co-leader (including the reporters) completed the notes taken during the FG. The notes included an overall assessment of the session, its strengths and weaknesses, notes on key issues that were raised and notes about individual participants and their contributions, as well as any other useful or relevant information.

  7. Reporting the results of the FG. Immediately after the FG process, we made a report in the native language, and later the key issues of the report were conveyed in a report in English for discussion in the international research group (Table 1).

Table 1

Reporting on the focus groups

Structure of the report

Under ‘Items’, the report deals with linguistic problems uncovered during the discussion, opinions about the applicability and relevance of the items, topics arising due to overlapping of items present in different areas of the instrument, and suggestions about items that the participants consider should be included.

‘Topics’: during the FG, a number of general topics in connection with the instrument were discussed by the participants, illustrating differences between participants in the interpretation of different subjects.

Reports were designed to:

  1. Isolate the major themes of each FG in relation to the problems being explored. It is important to recognise categories, themes, issues, and explanations from the descriptions made by the participants in the session.

  2. Ensure that, after the information from the interview had been sorted into categories, it could be compared. This way we could identify possible categories or sets of ideas emerging from the data.

  3. Draw conclusions; once the general patterns were described, a picture of prevailing beliefs, opinions, attitudes and explanations regarding each particular instrument or area of the instrument was taken into account.

Impact of the focus group on instrument development

The results of these strategies in converting the instruments to the different languages/cultures influenced the development of the instrument concepts and constructs. The modifications to the different instruments can be categorised as: (a) changes in the instrument structure, contents and concepts; (b) adjustments to the instrument structure; (c) modifications to the instrument manual. In this paper we shall summarise the most relevant overall decisions adopted in each of these categories, while details are reported in the papers for each instrument.

Changes in the instrument structure, contents and concepts (IEQ-EU and CSSRI-EU)

Involvement Evaluation Questionnaire (IEQ-EU) (van Wijngaarden et al, 2000, this supplement)

The instrument was subjected to focus groups in Denmark, England, Italy and Spain. The conclusion was that the instrument covers the main domains of family burden well. There were some problems with items regarding education, type of professional help, income categories, and drug use; and the response categories were discussed. The instrument has been adjusted in accordance with comments received; the response categories, however, remained unchanged, because otherwise comparison with earlier research would become difficult. The items on psychological distress were taken out and the General Health Questionnaire, 12-item version (Goldberg & Hillier, 1979) was included to describe general well-being.

Client Socio-Demographic and Service Receipt Inventory (CSSRI-EU) (Chisholm et al, 2000, this supplement)

The instrument underwent major changes as a consequence of the FGs in order to enable comparisons to be made between different countries' health care and social welfare systems. The diversity in organisation of the welfare systems in the participating countries made it especially difficult to find a common language in this area of the questionnaire. The many comments from the FG were an important input in solving these problems. In addition, internationally comparable concepts for describing individual socio-demographic variables were added to the instrument.

Adjustments in the instrument structure (VSSS-EU)

Verona Service Satisfaction Schedule (VSSS-EU) (Ruggeri et al, 2000, this supplement)

The instrument was discussed in FGs in Denmark, England, Holland and Spain. The instrument was considered acceptable in all countries and underwent relatively minor modifications. There were some problems regarding the grouping of professional staff, such as psychiatrists and psychologists, and nurses and social workers, caused by the different structures of the mental health care systems. These problems were solved by asking questions regarding each professional group separately. Issues related to the organisation of health care and social welfare were clarified, and translation issues were taken into consideration.

Modification incorporated in the instrument manuals (LQoLP-EU and CAN-EU)

Lancashire Quality of Life Profile (LQoLP-EU) (Gaite et al, 2000, this supplement)

The instrument was used in a focus group process in Denmark, Italy, Holland and Spain, which gave rise to a lengthy discussion regarding its suitability, arising from earlier psychometric analyses of the instrument. All countries agreed that the instrument is the most comprehensive quality-of-life instrument available in the field of mental health services research.

Camberwell Assessment of Need (CAN-EU) (McCrone et al, 2000, this supplement)

The instrument was discussed in a focus group process in Denmark, Italy, Holland and Spain. The overall views were mixed, and there were many suggestions for additional items, although there was consensus on only a small number of such additional items. However, it was decided not to change the instrument as it has already been used in many countries. Instead, the ‘missing items’ were addressed in a revised manual for the instrument. Translation issues raised through the FGs have been included in each new language version.


International comparison of mental health services has made the need for internationally standardised instruments clear. In recent years interest in the problems of translation and cross-cultural adaptation of health and service outcome measures has grown considerably (Simonsen & Mortensen, 1990; Sartorius & Kuyken, 1994; Gaite et al, 1997; Hutchinson et al, 1997). The main concern in this process is to ensure semantic, conceptual and technical equivalence between the versions of the instrument in different languages (Sartorius & Kuyken, 1994). Meadows et al (1997) have suggested that this set of criteria be considered as a minimum in the adaptation and use of instruments in cross-cultural studies, much in line with Sartorius and Kuyken (1994). Meadows et al (1997) also add (as an intermediate set of key issues) criterion equivalence and content equivalence (when each item of the questionnaire describes a phenomenon relevant to both cultures); and they suggest the use of FG interviews to evaluate the semantic equivalence of the adapted instrument.

This paper has presented the methods used in the EPSILON Study to adapt nationally developed instruments into internationally applicable measurements. We chose five instruments which have already been used in research in several European countries. These five instruments were translated in accordance with strictly defined rules of the translation process.

In the outline of the FG interviews for each instrument, we included questions and probes related to an identified list of disputed translation issues (semantic equivalence), the readability of the instrument (technical equivalence) and the construct of the concepts (conceptual equivalence), in accordance with the minimum criteria described. We included questions to the FGs regarding the content of instruments, i.e. did the items of the questionnaire describe a phenomenon relevant to the culture and should other items be added to describe relevant phenomena? All instruments were modified to improve semantic and technical equivalence, while the modifications made to improve conceptual equivalence varied, depending on the extent to which the instrument had already been accepted for use in international research.

To our knowledge, this is the first cross-national study reporting on the use of FGs as a method in the process of converting instruments into internationally comparable measurements, to assess the semantic, conceptual and technical issues in existing, pre-selected instruments. However, FGs of this type are not in general different from other FGs used to identify thoughts, beliefs and feelings; and this method shares, in general, the same advantages and disadvantages as other qualitative research methods (Trotter, 1991; Room et al, 1996).

One of our main concerns in using FG interviews in instrument development has been the question of the reliability and generalisability of the information gathered. The careful selection of participants, representing different positions in mental health care systems, and also representing different gender and socio-demographic groups, helps to generalise the results. However, only people living in cities participated in the FGs in this study; we have no information on the extent to which this population is also representative of people living in the rural areas in Europe.

In the structured FG sessions, the moderator plays an important role in ensuring that the information gathered is representative of the participants present. The FGs require a setting that will encourage a trusting, comfortable and secure atmosphere, so that potentially vulnerable contributors (for example, patients and relatives) do not withdraw themselves from the process; also, it is important to prevent dominant members of the group from determining the content of the discussion. The moderators in the EPSILON FGs were trained in group sessions, and were either psychologists or qualified psychiatrists, which we found enabled us to run the groups so as to establish the necessary atmosphere.

The instruments in the EPSILON Study were at different stages of development for international use. One instrument's main domains were defined (CSSRI-EU), while the internationally comparable concepts and constructs were to be developed (Chisholm et al, 2000, this supplement); another instrument (IEQ) was used in some European studies; reconstruction of the sequence of some questions and constructs (the construct of general well-being) of the instrument was, however, considered as an important improvement to the instrument and accepted by the instrument developers. Finally, three instruments were already in frequent use in European research (CAN, LQoLP, VSSS); changes to their concepts and constructs were considered less appropriate by the developers, because international use of the instrument was already widely accepted.

Organising and running the FGs is a time-consuming process, with many participants and with many parties involved, both in the FG process and in the discussions of the FG results and reports. It is our experience that this time is well spent if the instruments are still in their developmental phase, or have not been used internationally very often. In these situations, FGs contribute important information regarding concepts, constructs and language, which is crucial to the development of the instruments for international use (CSSRI-EU, IEQ-EU). For instruments already used extensively in international research, and where major changes in the instruments are considered less appropriate, FGs bring less benefit. Most of the improvements in these instruments were related to semantic and technical equivalence; less time-consuming methods targeting these problems - like monolingual panels and expert groups - should be considered.



During the FG sessions we obtained valid information about the problems we wanted to be discussed. The FG interview was a structured and creative process, producing information on the applicability of the instruments in different cultures and different health care systems. The extent to which instruments were adjusted in accordance with FG results varied; as an alternative, sometimes comprehensive manuals were developed, to clarify problems which might arise in the use of instruments.

Based on our experience using FGs, we suggest that researchers involved in the process of developing instruments for international use first consider:

  1. to what extent the instrument in question is already used in international research;

  2. to what extent changes in the concepts and constructs of the instrument are accepted by the instrument's developers;

  3. the choice of appropriate methods which specifically target the issues in question.

The FG process is an adequate method to apply if concepts, constructs and translation issues are to be addressed. Otherwise, less time-consuming methods should be considered.


Approaches to the development of cross-culturally applicable questionnaires (Hutchinson et al, 1997)

  1. Sequential approach: translation and performance evaluation of an existing instrument

  2. Parallel approach: international conceptualisation of measurement and selection of content

  3. Simultaneous approach: international conceptualisation of construct around which the cross-national core item set is developed, where each nation or culture develops its own specific content.

Approaches to translation of questionnaires already developed (Sartorius & Kuyken, 1994)

  1. Ethnocentric approach: 100% conceptual overlap between source and target culture

  2. Pragmatic approach: considerable conceptual overlap (for example: European context)

  3. Emic plus etic approach: some degree of conceptual overlap

  4. Translation impossible: no conceptual overlap.

Translation and types of equivalence (Sartorius & Kuyken, 1994; Hutchinson et al, 1997)

  1. Conceptual equivalence: refers to same concepts underlying the questions in both source and target languages (review of cross-cultural literature or factor analysis)

  2. Semantic equivalence: denotative (what the word indicates or is a sign for) and connotative (what is the primary meaning of the word) sameness; use synonyms to identify the semantic space

  3. Linguistic equivalence: when the item in the target version has a similar meaning to that in the source version; retaining the functional equivalence. The same as semantic equivalence

  4. Technical equivalence: equivalence of technical features of the languages and their relationship to the socio-cultural context (language complexity, question length, acceptable level of abstraction); and feasibility of the nature and mode of questioning of the instrument in the source and target versions (for example, the method of applying the questionnaire: paper and pencil, or semi-structured or structured interview).

Cross-cultural equivalence

  1. Content equivalence: when each item of the questionnaire describes a phenomenon relevant to both cultures

  2. Semantic equivalence: concerned with retaining the meaning of each item

  3. Conceptual equivalence: the validity of the concepts in both cultures

  4. Technical equivalence: concern as to whether the mode in which the data collection is carried out affects the result differently in different cultures

  5. Criterion equivalence: akin to criterion validity, which is an instrument's relationship to independent criteria of the same phenomena.


The following colleagues contributed to the EPSILON Study. Amsterdam: Dr Maarten Koeter, Karin Meijer, Dr Marcel Monden, Professor Aart Schene, Madelon Sijsenaar, Bob van Wijngaarden; Copenhagen: Dr Helle Charlotte Knudsen, Dr Anni Larsen, Dr Klaus Martiny, Dr Carsten Schou, Dr Birgitte Welcher; London: Professor Thomas Becker, Dr Jennifer Beecham, Liz Brooks, Daniel Chisholm, Gwyn Griffiths, Julie Grove, Professor Martin Knapp, Dr Morven Leese, Paul McCrone, Sarah Padfield, Professor Graham Thornicroft, Ian R. White; Santander: Andrés Arriaga Arrizabalaga, Sara Herrera Castanedo, Dr Luis Gaite, Andrés Herran, Modesto Perez Retuerto, Professor José Luis Vázquez-Barquero, Elena Vázquez-Bourgon; Verona: Dr Francesco Amaddeo, Dr Giulia Bisoffi, Dr Doriana Cristofalo, Dr Rosa Dall'Agnola, Dr Antonio Lasalvia, Dr Mirella Ruggeri, Professor Michele Tansella.

This study was supported by the European Commission BIOMED-2 Programme (Contract BMH-14-CT95-1151). We would also like to acknowledge the sustained and valuable assistance of the users, carers and the clinical staff of the services in the five study sites. In Amsterdam, the EPSILON Study was partly supported by a grant from the Nationaal Fonds Geestelijke Volksgezondheid and a grant from the Netherlands Organization for Scientific Research (940-32-007). In Santander the EPSILON Study was partly supported by the Spanish Institute of Health (FIS) (FIS Exp. No. 97/1240). In Verona, additional funding for studying patterns of care and costs of a cohort of patients with schizophrenia was provided by the Regione del Veneto, Giunta Regionale, Ricerca Sanitaria Finalizzata, Venezia, Italia (Grant No. 723/01/96 to Professor M. Tansella).

