The British Journal of Psychiatry
Understanding the roles of genome and envirome: methods in genetic epidemiology


Background In order to understand studies of psychiatric epidemiology focusing on the ‘genome’ and ‘envirome’, basic knowledge of the logic and methods is necessary.

Aims To provide a review of typical methods used in genetic epidemiology.

Method Reviews of the research designs usually employed in quantitative and molecular genetic studies. Genotype—environment correlation and interaction are also discussed.

Results Quantitative genetic studies indicate that genetic influences are important for both psychiatric disorders and behavioural traits. Specific gene loci can be tested for associations with both psychiatric risk and behavioural traits by means of molecular genetic techniques. There has been little examination of genotype—environment correlation and interaction, although the few reports that have appeared suggest that these complex relationships are important.

Conclusions Advances in quantitative and molecular genetics now permit more careful examination of genotype—environment interaction and correlation. Studies combining molecular genetic strategies with measurement of the environment are still at an early stage, however, and their results must be awaited.

In order to understand studies of psychiatric epidemiology that focus on the roles of the ‘genome’ and ‘envirome’, one must first have a basic knowledge of the logic and methods employed in such studies. The term genome refers to the totality of a species' genes, or DNA sequences. It is becoming common practice to use this term as shorthand for genetic propensity, or genetic influence, in this context, as molecular genetic techniques are employed by epidemiologists to understand the role of genes in causation of psychiatric disorders. The term envirome was first coined by Anthony, Eaton & Henderson (1995) to refer to the totality of equivalent environmental influences; it includes predisposing factors as various as type of neighbourhood, family income, intra-uterine exposure to teratogens such as maternal cocaine misuse, and exposure to radiation, and provoking environmental factors that can act as triggers of psychiatric disorders, such as crises in personal relationships and social stressors. A more detailed discussion is provided by J. C. Anthony (2001, this supplement).

This review is intended as an introduction to methods that can be used in psychiatric epidemiology to examine genetic influences and the roles of genome and envirome in combination. Genetic epidemiological study designs include behavioural genetic strategies that focus on ‘anonymous’ genetic and environmental influences, and molecular genetic designs that seek to locate and identify genes associated with specific psychiatric disorders. In the context of describing these research designs, the methods employed in disentangling genetic and environmental effects are discussed, followed by an explanation of the complex patterns of genetic and environmental influences, specifically genotype—environment correlation and interaction. More comprehensive reviews of genetic methodology, theory and epidemiology can be found elsewhere (e.g. Sham, 1996; Plomin et al, 1997).


Quantitative genetic designs

Three approaches are typically employed in quantitative genetic research: family, twin, and adoption studies (Table 1). Each has distinct advantages, and corresponding disadvantages that may be partially addressed by combining designs.

Table 1. Summary of designs employed in genetic epidemiology

Family studies

The family study first identifies individuals with a given psychiatric disorder (the probands), then assesses their relatives for evidence of that and also other psychiatric disorders (cf. Kendler, 1997; Merikangas & Swendsen, 1997). One of the main advantages of using the family history is the ease of sample collection and assessment. Strictly speaking, family designs can be used for determining whether a trait or disorder is familial in nature, but not for specifying whether any apparent familiality is genetic or environmental in origin. First-degree relatives (parents, siblings and offspring) are most commonly assessed, and those not so closely related (uncles, aunts, cousins, grandparents) less often. Because first-degree relatives share both genes and environment, it is impossible to disentangle genetic from environmental risk factors in designs of this nature. When family members other than first-degree relatives are also assessed, it may be possible to gain clues to the relative contributions of the different factors, although the results will still not be conclusive. For example, if parents, siblings and offspring all manifest an increased risk for the same psychiatric disorder as the probands, but second- and third-degree relatives do not, environmental factors may be more important than genetic ones.

Twin studies

Twin studies take advantage of the natural experimental design of identical and fraternal twins (see Kendler, 1993, for a review of twin studies in psychiatric epidemiology). Identical or monozygotic (MZ) twins result from the splitting of a fertilised egg into two genetically identical individuals. Fraternal or dizygotic (DZ) twins are the result of two eggs fertilised by two different sperm and are no more similar, genetically, than ‘regular’ siblings, who share 50% of their segregating genes, on average. Because MZ twins are twice as similar genetically as DZ twins (100% v. 50%), MZ twin pairs will be twice as similar phenotypically as DZ twin pairs, if genetic influences are paramount.

Two types of environmental influence can also be identified in twin designs: shared environmental factors (i.e. all non-genetic factors that cause family members to be similar) and non-shared environmental factors (i.e. all non-genetic factors that cause family members to differ from one another, including measurement error). If both MZ and DZ twin correlations are substantial and do not differ by zygosity, then shared environmental influences are indicated. Non-shared environmental influences are indicated by MZ twin correlations of less than 1.0. Any phenotypic dissimilarity in MZ twin pairs reared in the same home must be owing to non-shared environmental factors.
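The comparisons of MZ and DZ correlations described above are often summarised with Falconer's approximations. These formulas are not given in this review and are shown here only as an illustrative sketch; they assume additive genetic effects and equal environments for MZ and DZ pairs. The function name and the example correlations are hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Rough variance components from MZ and DZ twin correlations.

    Falconer's approximation (an assumption added for illustration):
    additive genetic effects only, equal environments for both zygosities.
    """
    h2 = 2 * (r_mz - r_dz)  # genetic influence: MZ pairs are twice as similar genetically
    c2 = 2 * r_dz - r_mz    # shared environment: similarity not explained by genes
    e2 = 1 - r_mz           # non-shared environment, including measurement error
    return h2, c2, e2


# Hypothetical correlations, chosen only to show the arithmetic.
h2, c2, e2 = falconer_estimates(r_mz=0.60, r_dz=0.40)
print(f"h2={h2:.2f}, c2={c2:.2f}, e2={e2:.2f}")
```

If the MZ and DZ correlations were equal, the genetic term would be zero and all twin similarity would be attributed to shared environment, which is the pattern the text describes.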

In genetic research, degree of twin similarity is usually reported in terms of concordance rates, calculated separately for MZ and DZ subgroups. Two types of concordance rates are commonly computed: the pairwise and the probandwise. In calculating the pairwise concordance rate, each twin pair counts as one unit, and the rate is the proportion (percentage) of pairs in the study sample in which both twins have manifested the disorder in question. In calculating the probandwise concordance rate, each affected twin counts as a unit, and the rate is the proportion of affected twins whose co-twins have also manifested the disorder (see, for example, Merikangas & Swendsen, 1997). In other words, each concordant pair counts as two units, both in the numerator and the denominator. Concordance rates can be interpreted in much the same way as intraclass correlations, but can also be used to examine degree of twin similarity for dichotomous categories such as absence or presence of a psychiatric diagnosis (Plomin et al, 1997). If MZ and DZ twin concordance rates are similarly high, genetic influences are not indicated but shared environmental influences are. If, on the other hand, concordance rates vary according to the degree of genetic relatedness, then genetic influences are likely to be important. For example, an MZ twin concordance rate of 46% and a DZ twin concordance rate of 14% have been taken to indicate that genetic factors are of primary importance in explaining twin similarity for schizophrenia, at least for one sample (Moldin & Gottesman, 1997).
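The difference between the two concordance rates is easiest to see with counts. The sketch below uses hypothetical numbers of concordant and discordant pairs, and it assumes that the study sample consists of pairs with at least one affected twin and that both members of every concordant pair were ascertained as probands. The function names are illustrative only.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Proportion of pairs in which both twins are affected (each pair is one unit)."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Proportion of affected twins whose co-twin is also affected.

    Each concordant pair contributes two affected twins, so it is counted
    twice in both the numerator and the denominator.
    """
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# Hypothetical sample: 20 concordant pairs and 60 discordant pairs.
pairwise = pairwise_concordance(20, 60)        # 20 / 80
probandwise = probandwise_concordance(20, 60)  # 40 / 100
print(f"pairwise={pairwise:.2f}, probandwise={probandwise:.2f}")
```

For the same data the probandwise rate is never lower than the pairwise rate, so rates from different studies should be compared only when they were calculated in the same way.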

Twin samples are relatively easy to recruit because many parents of twins, and indeed many twins themselves, believe that there is something special about the twin status and are usually willing or even eager to participate in research studies. This is an important advantage but can also be considered a weakness, because if probands and parents are convinced that there is something special about their status, their behaviour patterns and the degree of similarity between them may, as a result, be influenced in ways that make them untypical of the background population — see Kendler et al (1995a) for an examination of this issue in regard to psychiatric illness. One method of ascertaining the representativeness of twin samples is to compare the means and standard deviations on established standardised measures with those of non-twins drawn from the same population. Generally, twin samples have been found to be comparable on such measures with non-twin samples. A related assumption in twin studies is that environmental influences on twins are no different from those for non-twins and, moreover, that they do not differ for MZ and DZ twin pairs — the equal environments assumption. Numerous studies have addressed this concern (see Hettema et al, 1995, for a review), usually with results that support the validity of the assumption; none the less, the concern remains and should be examined empirically in twin studies when possible.

Adoption studies

Adoption studies provide one of the best methods of understanding the impact of the environment on psychiatric disorders. There are several different types of adoption study design: in the most usual, adoptees are compared with respect to morbid risk both with their adoptive parents and with their biological parents. A high relative risk manifested by both adoptees and adoptive parents is evidence in favour of shared environmental influences, since most adoptees share no genetic inheritance with their adoptive parents. Association of risk between adoptees and their biological parents, on the other hand, points to genetic influences, since the adoptees and their birth parents share 50% of their genes, but not their environment. The estimates of genetic and environmental influences in adoption designs rely on an absence of selective placement. In other words, it is assumed that adoptees are placed with adoptive families who are only randomly similar in relevant characteristics to their families of birth (e.g. in intellectual ability). The impact, if any, of selective placement of the adoptee can be examined if both adoptive and biological parent data are available. Studies that have examined samples of individuals adopted at birth have found little evidence for effects of selective placement (DeFries et al, 1994).

Combination designs

Twin and family studies can be combined in a single research design. For example, when twins and their siblings are included in the same sample, it is possible to begin to disentangle special twin effects from genetic and environmental contributions to the behaviour of the twins. One shortcoming of such a combination, however, stems from the fact that members of a twin pair are the same age, whereas siblings of twins are at different ages. This difficulty can be partly resolved by restricting the size of age differences between twins and siblings included in the study. Twin samples can also be extended by including siblings from other families (e.g. full siblings, half siblings) in the same design. A few studies have attempted to extend twin designs through the inclusion of additional sibling types (see, for example, Reiss et al, 1995; Losoya et al, 1997; Jacobson & Rowe, 1999). Generally, findings from such studies yield heritability and environmental effects that are more similar to those from twin studies than to those from adoption studies. This supports the validity of generalising findings from twin samples.

Twin and adoption designs can also be combined (e.g. Bouchard et al, 1981; Langinvainio et al, 1984; Pedersen et al, 1991), although new samples of twins adopted apart at birth are becoming increasingly difficult to obtain. In general, the findings from studies that have taken advantage of the natural experimental design of twins adopted apart at birth are similar to those from samples of twins reared together. Adoption and family studies can also be combined by collecting data on the adoptive siblings of adoptees (see, for example, Plomin et al, 1988). Whether the sibling is genetically related to the adoptive parents or not, he or she will usually be genetically unrelated to the adoptee. If a matched, non-adoptive, control sample of siblings is also assessed, similarity for genetically unrelated sibling pairs can be compared with that for full sibling pairs, who share 50% of their segregating genes on average. The logic is the same as for the twin design: if genetically related sibling pairs are more similar than genetically unrelated adoptive sibling pairs, then genetic influences are suggested. Any similarity between adoptive siblings, however, is evidence of shared environmental influences. Given the difficulty of obtaining adoptive and twin samples, adding siblings as they occur naturally in such families is a relatively low-cost method that adds substantially to the power of the studies and the conclusions that can be drawn from them.

Molecular genetic designs

Two primary strategies are employed in conducting molecular genetic studies in genetic epidemiology and psychiatric genetics: allelic association and linkage analysis (Table 1). These strategies are interrelated, both being dependent on the use of DNA markers involving polymorphisms (variations) in DNA. Because thousands of these DNA markers are now available, it is possible to use them to locate a gene causally connected with a given disorder, without any prior clues to the gene's mode of operation; that is, ‘anonymously’. A brief overview is given below, with particular reference to two issues of concern in this field: continuously distributed v. categorical measures, and single genes with large effects v. multiple genes with small or modest effects. Comprehensive reviews of the molecular genetic techniques typically used in psychiatric genetics can be found elsewhere (e.g. Lander & Schork, 1994; Plomin et al, 1994a).

Allelic association

Allelic association rests on the assumption that if a gene influences a trait, individuals who share a particular allelic variant of that gene should be more similar in respect of the trait in question than individuals with different alleles. Correlation between a DNA marker and a measured characteristic is computed for a population of unrelated individuals; that is, association studies, unlike linkage studies, are based on comparison of subgroups within unselected populations (Lander & Schork, 1994). Association studies are most appropriate when a trait is quantitatively distributed and is likely to be influenced by multiple genes of varying effect size as well as by environmental factors. Indeed, the main advantage of allelic association is its power to detect quantitative trait loci that have small effect sizes (Owen & McGuffin, 1993; Risch & Merikangas, 1996; Plomin et al, 1997).

If a DNA marker is located close to a functional quantitative trait locus (QTL) on the same chromosome, alleles for the two loci will only rarely be separated by recombination even after many generations, resulting in so-called ‘linkage disequilibrium’. For example, with a recombination fraction of 0.01 (about 1 cM or 1 million base pairs' distance) the ‘half-life’ of an association can be estimated as about 70 generations or 2000 years (Morton, 1998). This could result in finding a positive association with a gene allele that does not itself cause the trait, but is in linkage disequilibrium with the actual cause.
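The half-life quoted above can be checked with a short calculation. The sketch assumes the standard population-genetic model, not spelled out in this review, in which the association between two loci shrinks by a factor of (1 - recombination fraction) in each generation of random mating. The function name is hypothetical.

```python
import math


def ld_half_life(recombination_fraction):
    """Generations until an allelic association falls to half its starting value.

    Assumes the association decays by a factor of (1 - recombination_fraction)
    per generation, so the half-life t solves (1 - theta) ** t = 0.5.
    """
    return math.log(0.5) / math.log(1.0 - recombination_fraction)


generations = ld_half_life(0.01)
print(f"half-life: about {generations:.0f} generations")
```

With a recombination fraction of 0.01 this gives just under 69 generations, in line with the figure of about 70 cited from Morton (1998); converting to years depends on the generation time assumed.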

One way to maximise the power of association is to focus on candidate genes or gene markers with known functional polymorphisms, rather than on anonymous DNA markers. Candidate gene allelic association studies are most straightforward when the DNA marker is itself the functional polymorphism that affects the trait. A classic example is the association between a functional polymorphism for the apolipoprotein genes and normal variation in serum cholesterol levels (Sing & Boerwinkle, 1987; Kessling et al, 1988). Although most DNA markers, especially short sequence repeat markers, tend to reside in non-coding regions of the genome, functional polymorphisms are increasingly becoming available and have yielded several associations with quantitative behavioural traits, for example the dopamine D4 receptor gene with novelty-seeking (Benjamin et al, 1996; Ebstein et al, 1996). However, a valid criticism of such candidate gene association studies is that they have only a small chance of detecting most QTLs because any of the many thousands of genes expressed in the brain could be considered as a candidate gene for common forms of human behaviours.

Allelic association studies have a number of advantages (Risch & Merikangas, 1996). Because they are conducted on unrelated individuals, the problem of collecting large samples of genetically related individuals does not arise. These studies allow the use of existing data-sets, and replication is much more straightforward. There are, however, two corresponding limitations:

  1. False positives can arise owing to population stratification. Because association relies on group differences in allele frequencies rather than the inheritance of DNA markers by relatives, it may be biased by variations in the genetic make-up of the population that have nothing to do with the target disorder (e.g. ethnic groups may differ in gene frequency). Problems associated with false positives owing to population stratification can, however, be overcome by means of within-family comparisons (Falk & Rubinstein, 1987; Spielman et al, 1993; Ewens & Spielman, 1995; Schaid, 1996; Allison, 1997).

  2. Association is limited by the ability to identify promising candidate genes. While this limitation is more difficult to overcome, new genes are being identified at a rapid pace and the functions of these new genes, as well as previously identified ‘anonymous’ genes, are also being uncovered rapidly.
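One widely used within-family comparison of the kind mentioned in point 1 is the transmission/disequilibrium test of Spielman et al (1993). The sketch below is a simplified illustration with hypothetical counts: among parents heterozygous for the marker, it compares how often the candidate allele is transmitted to an affected child with how often it is not. Because the comparison is made within families, differences in allele frequency between population subgroups do not bias it. The function name is illustrative.

```python
import math


def tdt(transmitted, not_transmitted):
    """Transmission/disequilibrium test from heterozygous-parent transmissions.

    Returns the test statistic (b - c)^2 / (b + c) and its approximate
    p-value from a chi-squared distribution with 1 degree of freedom.
    """
    b, c = transmitted, not_transmitted
    statistic = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: the allele is transmitted 60 times and not transmitted 40 times.
statistic, p_value = tdt(60, 40)
print(f"TDT statistic={statistic:.2f}, p={p_value:.3f}")
```

Under the null hypothesis each heterozygous parent transmits either allele with equal probability, so the two counts are expected to be about equal.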

Linkage analysis

Linkage strategies rely on the fact that genes located close to one another on the same chromosome tend to be inherited together (i.e. are rarely separated by recombination during meiosis); hence a DNA marker that is close on the chromosome to a specific allele will usually be inherited with that allele within a family. The basic principle of linkage analysis is that if a gene influences a given characteristic, family members who share the same allele for the DNA marker will be more similar for that characteristic than others who do not share the allele. This is tested by correlating the extent to which family members share alleles for a particular DNA marker and their similarity for a particular characteristic. Significant correlations imply that there is linkage between the characteristic and the DNA marker.
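One concrete form of this test for sibling pairs is Haseman-Elston regression, which is named here as an illustration and is not described in this review. The squared difference between two siblings' trait scores is regressed on the proportion of marker alleles they share identical by descent; a negative slope means that pairs who share more alleles are more alike, which is the pattern expected under linkage. The data and function name below are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical sibling pairs: proportion of marker alleles shared identical
# by descent (0, 0.5 or 1) and squared difference in trait scores.
ibd_sharing = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
squared_difference = [4.0, 3.0, 2.5, 2.0, 1.5, 2.0, 1.0, 0.5]

slope = ols_slope(ibd_sharing, squared_difference)
print(f"slope={slope:.2f}")  # a negative slope is consistent with linkage
```

A real analysis would also need a significance test for the slope and a much larger sample; the sketch shows only the direction of the relationship.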

It is also possible to consider genetic contributions quantitatively, using the model of quantitative trait loci cited above (Plomin et al, 1997). When QTL designs are employed the power of detecting a linkage is increased, although the gene must still account for at least about 10% of the total phenotypic variance of a characteristic in order for linkage to be detected. Although the effect size of the genes must be fairly large to be detected using linkage techniques, they need be only within a few million base pairs of the marker.

Comparison and evaluation of association and linkage studies

Because linkage analysis looks within families rather than within populations, it can detect a DNA marker 10 million — or even 20 million — base pairs distant from the functional polymorphism. As a result, linkage requires only a few hundred markers for a systematic screen of the genome, whereas thousands of markers spaced at 1 cM or less would be needed to screen the genome systematically for allelic association. Linkage analysis is thus systematic but not powerful, while association analysis is powerful but not systematic. Linkage analysis cannot be made much more powerful, at least with realistic sample sizes, but association can be made more systematic by using appropriate candidate genes. The two techniques are complementary to one another, although association is likely to prove the more useful for identifying QTLs of small effect size. Because most psychiatric disorders are complex in nature, it is to be expected that the genetic contribution will prove to be through many genes of small effect rather than through a single major gene effect.
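The contrast in marker numbers follows from simple arithmetic. The sketch assumes a human genome of roughly 3000 million base pairs and uses the rule of thumb, mentioned earlier, that 1 cM corresponds to about 1 million base pairs; both figures are approximations, and the names are hypothetical.

```python
import math

GENOME_LENGTH_MB = 3000  # assumed approximate length of the human genome, in million base pairs
CM_PER_MB = 1.0          # rule of thumb: 1 cM is about 1 million base pairs


def markers_needed(spacing_cm):
    """Number of evenly spaced markers needed to cover the genome at a given spacing."""
    genome_length_cm = GENOME_LENGTH_MB * CM_PER_MB
    return math.ceil(genome_length_cm / spacing_cm)


linkage_markers = markers_needed(10)     # markers about 10 cM apart
association_markers = markers_needed(1)  # markers about 1 cM apart
print(linkage_markers, association_markers)
```

Under these assumptions a linkage screen needs a few hundred markers and an association screen needs thousands, matching the orders of magnitude given in the text.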


The emphasis of the methods described above is primarily on genetic and environmental contributions to psychiatric disorders where these are thought to operate more or less independently from one another. It is, however, likely — given the complexity of the development and expression of these disorders — that the functions of genome and envirome are intertwined and interrelated. There are two principal ways of understanding how genetic and environmental factors may operate together: genotype-environment correlation and interaction.

Genotype—environment correlation

Three types of genotype—environment correlation are usually distinguished: passive, evocative and active (Plomin et al, 1977; Scarr et al, 1981; Scarr, 1992). Although there is some slippage in the measurement of these three different forms, each is defined in a conceptually precise manner. Passive genotype—environment correlation results simply from the fact that a child receives 50% of his or her genes from each parent, and that the environment in which the child develops is also to some degree determined by the parents. Thus both the child's genotype (entirely) and its early environment (partially) are determined by the parental genotypes. This type of correlation is likely to be most important in infancy and early childhood, when the parent is the primary source of environmental influences. As the child's character and behaviour begin to evoke responses from the wider social environment, evocative genotype—environment correlation is said to occur. For example, a child who has a difficult temperament may be more likely to elicit aggressive responses from family members, teachers and peers. Finally, active genotype—environment correlation occurs when individuals actively seek out environmental situations that correspond to their genetically influenced propensities. Using the same example of difficult temperament, children who are difficult and aggressive may be more likely to select peers who are also aggressive, increasing their likelihood of fitting in, but also increasing problem behaviours.

A growing body of research has identified genetic influences on measures typically defined as environmental (e.g. Plomin & Neiderhiser, 1992; Plomin et al, 1994b). This is especially relevant for psychiatric epidemiology because of the need to distinguish genetic and environmental influences on complex behavioural patterns. Thus genetic factors may play a part in determining associations between environmental measures and the emergence of psychological disorder (Reiss et al, 1995). At least two studies have explicitly examined evocative genotype—environment correlations in samples of adoptees (Ge et al, 1996; O'Connor et al, 1998). In both of these, characteristics of the biological parents (antisocial behaviour and alcohol misuse) were correlated with adoptees' adjustment (externalising behaviour and/or antisocial behaviour) during middle childhood and adolescence. These results, and the findings of genetic influences on environmental measures in general, emphasise the need to also consider genetic influences when environmental factors are examined in psychiatric epidemiology.

Genotype—environment interaction

When considering the impact of genetic and environmental factors on behaviour, including psychopathology, it is important to consider their possible interaction. In general, studies of twins, siblings and adoptees have not included genotype—environment interaction in their estimates of the different effects. In part, this has been a function of the overall difficulty of detecting interactions, compounded by the use of ‘anonymous’ estimates of genetic and environmental factors. In a handful of studies, however, evidence has been found of such interaction in determining the manifestation of antisocial behaviour (Cadoret et al, 1995), alcohol abuse (Bohman et al, 1981), schizophrenia (Tienari, 1991) and onset of depression (Kendler et al, 1995b). Typically, genotype—environment interaction appears to be expressed as increased vulnerability to environmental stress (e.g. family difficulties, stressful life events) in individuals at high genetic risk of manifestation of the target disorder.
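A simple way to see what such an interaction means is to compare the risk of disorder in four groups defined by the presence or absence of genetic risk and of environmental stress. The sketch below uses hypothetical risks and two conventional epidemiological yardsticks that are not taken from the studies cited: departure from additive risks and departure from multiplicative risks. The function and argument names are illustrative.

```python
def interaction_contrasts(r00, r10, r01, r11):
    """Departure from additive and multiplicative joint effects.

    r00: risk with neither factor      r10: genetic risk only
    r01: environmental stress only     r11: both factors present
    """
    # Additive scale: zero means the two risk differences simply add up.
    additive = r11 - r10 - r01 + r00
    # Multiplicative scale: 1.0 means the two relative risks simply multiply.
    multiplicative = (r11 / r00) / ((r10 / r00) * (r01 / r00))
    return additive, multiplicative


# Hypothetical risks: stress raises risk far more in those at high genetic risk.
additive, multiplicative = interaction_contrasts(r00=0.02, r10=0.04, r01=0.03, r11=0.12)
print(f"additive contrast={additive:.2f}, multiplicative ratio={multiplicative:.1f}")
```

In this example the joint risk exceeds what either scale would predict from the separate effects, which corresponds to the increased vulnerability to stress among genetically high-risk individuals described above.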

The specific mechanisms and processes involved in genotype—environment interaction are only just beginning to be explored. Two reviews propose six models of the ways in which genetic and environmental factors may operate to influence a phenotype (Ottman, 1996; Yang & Khoury, 1997). In type one, increased risk occurs only when both genetic and environmental risk factors are involved in the pathogenic process; in other words, neither the genotype nor the environment alone is sufficient to cause the disorder. This is the type of interaction typically tested for in quantitative genetic designs, as described above (e.g. Cadoret et al, 1995; Kendler et al, 1995b; Tienari, 1991). Type two effects occur when environmental influences are sufficient to increase phenotypic risk without the presence of any corresponding genetic risk, and conversely type three occurs when genetic loading is adequate to increase the risk of disorder without the presence of environmental factors. In type four, genotype and environmental exposure each play a part independently in increasing risk. Finally, type five and six interactions occur when a particular genotype acts as either a protective or a risk factor, depending on the environment. In type five the genotype is protective (independently of known environmental exposures), whereas in type six the same genotype increases risk, but only in the presence of such an exposure. It should be noted that type two and three effects do not describe genotype—environment interaction, as this term is generally understood. It would be more accurate to call type two effects responses to environmental risk factors, and type three effects responses to genetic risk factors. Nonetheless, the typology is helpful for understanding how genotype and environment operate together, and is a useful tool for studying their respective contributions.


A variety of methods can now be employed in estimating genetic and environmental contributions to behaviour in general and psychiatric disorders in particular. Use of several methods to triangulate on the same problem is the most satisfactory approach, although different methods often result in somewhat different findings. Regardless of these differences, there is growing evidence of both genetic and environmental contributions to behaviour in general and psychiatric disorders in particular. Molecular genetic techniques, in combination with family designs, allow for the identification of specific genes that predispose to different psychiatric syndromes, thereby providing the first step in understanding some of the most basic mechanisms involved. Studies of genotype-environment correlation and interaction permit the construction of more complex models of association, which are likely to prove closer to the way the systems operate, that is, by interrelation. Finally, the combination of molecular genetic techniques with genotype-environment correlation and interaction allows specific genes to be associated with specific environments, thereby providing a glimpse of these mechanisms at a specific gene-by-specific environment level.

