In the second class of this course we will see the most general methodological approaches to detect genetic variation.
Organization of the presentation
In this second class we will focus on three main topics: the technical approach to detect genetic variation, the different types of molecular markers and their properties, and the most important conceptual principles and external factors that mould the genetic variation in biological species.
We will see three basic techniques in this course: the complete process of Southern Blott, and spin-offs and derivatives of it; the principles of the Polymerase Chain Reaction PCR) and DNA sequencing.
This technique was developed by Sir Edwin Southern in 1977, to detect single genes in complex genomes. This technique was developed further by Sir Alec Jeffreys, who utilized the variation at minisatellite loci to identify individuals.
The method consist of the following steps:
Fragmentation of genomic DNA in a reproducible way (by means of restriction enzymes)
Separation of the fragments in an electric field
Transfer of the fragments to a membrane
Probing of the membrane with known DNA
Detection of the probe
The first step consists of a controlled fragmentation of genomic DNA. For this we use RESTRICTION ENZYMES. These are proteins whose enzymatic activity consists of recognition of a 4 - 6 base pairs palindrome (repetitive DNA?) and cutting the DNA double chain.
The products of the digestion are loaded onto an agarose gel, which is subjected to an electric field. The "mesh" formed by the agarose gel will inhibit the mobility of the DNA fragments according to their size: small fragments will migrate faster to the positive electrode, and the bigger fragments will migrate slower.
When repetitive DNA is recognised by restriction enzymes, the product of the digestion are distinctive bands. The interspersed repetitive human sequences Alu were named after the restriction enzyme (Alu I) that recognised a site within these repeat units.
Genomic single copy DNA is as a semar in the gels.
After the DNA has been separated in the gel according to size, this DNA is transferred to a nylon membrane by capillarity. The membrane is placed on top of the gel and a pile of dry paper towels on top then absorbs the solution, leaving the DNA attached to the membrane.
This membrane is then incubated along with a probe: a fragment of DNA of known origin that is labelled with a radioactive nucleotide (with P32). The radioactivity is detected by exposure to an X- film, indicating the position in the gel where the probe found homologous DNA.
Although this method was developed for locating single genes within a genome, its great potential allowed the application of "sub-techniques " and the development of derivatives of it. Following, we will see a few examples.
Southern blot image: http://www.mun.ca/biology/scarr/Gr12-18.html
(See Slide 19 Class I)
DNA fingerprinting is a technique derived from the general Southern blot method. By contrast to the original Southern Blot technique, instead of using a probe that is homologous to a single copy gene, DNA fingerprinting uses probes that come from the repetitive regions of the genome. Sometimes these repeat motifs are scattered throughout the genome and are located in the same or in different chromosomes. These are called multilocus fingerprinting. Unilocus fingerprinting is produced by probes that detect repeat units that occur at ONLY ONE locus. These are more important in the studies of individual identification.
In the graph on top of the slide, we see a diagram of a single locus minisatellite in an heterozygote individual: the two chromosomes present different number of repeat units, two different alleles. The arrows are indicating the recognition site for a restriction enzyme.
The diagram on the left shows the radio-autography of the DNA fingerprinting profile of 5 individuals of rainbow trout, digested with the restriction enzyme Hinf I. Two probes were used to detect multilocus minisatellites with the repeats motifs GATA and GGAT. Multilocus DNA fingerprinting was extensively used in the '80s to detect genetic variation in natural populations.
The radio-autography on the right shows a single locus DNA fingerprinting profile. An homozygote and an heterozygote individuals are indicated.
DNA fingerprinting of trout image: Genet. Mol. Biol. vol. 21 n. 2 São Paulo June 1998
Unilocus DNA fingerprinting image: http://www.chm.bris.ac.uk/motm/dna/dna.htm
For PCR see Slide 9, this presentation
Restriction Fragments Length Polymorphism is a technique that allows detecting changes (point mutations) at a very low cost, with the use of restriction enzymes.
The main objective is to subject a fragment of DNA, usually obtained by PCR, to digestion with multiple restriction enzymes to detect as many polymorphic sites as possible. This technique was also applied in the initial studies of polymorphism in mtDNA. The whole mtDNA was extracted from tissues, digested with restriction enzymes and the fragments were separated by electrophoresis in agarose gels.
Although relatively cheap and fast, some information is lost in the process as it is impossible to know which of the bases at the recognition site has changed, unless we obtain the sequence information for the fragment of interest.
The RFLP in the slide shows a restriction analysis of a 500 bp of the mitochondrial gene COI in the oyster Crassostrea ariakensis. Each lane is representing DNA of one individual digested with a restriction enzyme. We can say then, that this population or species shows two (three) haplotypes with, e.g., the restriction enzyme Hae III. More accurate information about the level of genetic diversity in this species would be obtained with the use of multiple restriction enzymes.
Image : http://www.vims.edu/vsc/images/gel.jpg
PCR Kary Mullis
Polymerase Chain Reaction I a technique that revolutionized the study of DNA, and replaced the utilization of the Southern Blot in studies of single genes. The technique was visualized by Kary Mullis in 1981. He received the Nobel Price in 1993.
Conceptually, PCR is the replication of DNA in a tube, in a controlled manner. Unlike the DNA replication before cell division, this replication only copies a fragment of DNA framed by small complementary fragments of DNA called primers. Primers are usually 20-25 base pairs in length.
The separation of the strands for DNA replication is performed by the action of enzymes within the cell. However, in the reaction tube we use temperature (93 to 95? C) to disrupt the hydrogen bond between complementary bases. In high temperature conditions, the DNA polymerases or organisms living in standard conditions of temperature would be denatured. Therefore, the DNA polymerase utilized in PCR was isolated from thermofilic bacteria that live in temperatures above 100? C). The DNA polymerisation is carried out from the starting point of the primers.
This reaction is repeated for about 30-40 cycles. The increase in the number of chains present in the reaction tube can be expressed as 2n. Thus, at the end of the 30th cycle, the reaction tube would contain 107 x106 copies of the original template DNA.
Applications of PCR: microsatellite genotyping.
See Slide 20, Class I.
In this slide we show an example of the application of the PCR. The upper scheme shows an heterozygote individual for a microsatellite locus, and the flanking regions that are used as a priming site for the PCR.
Microsatellites are co-dominant multi-allelic markers: both alleles can be detected and multiple alleles per loci are frequent. For instance, an average number of alleles per loci in fish species ranges from 10-25; some herring loci have 60 alleles.
The PCR products are resolved in a gel. Each individual is expected to show 1 or 2 bands for each microsatellite locus. In the example below, there is a controlled crossing between two heterozygote fish and the offspring show all the possible expected combinations of genotypes.
See Slide 20, Class I.
Applications of PCR: microsatellites for study of reproductive strategies.
See Slide 20, Class I.
In this slide we show an example of the application of microsatellite markers to elucidate mating strategies in bryozoan organisms.
The marine bryozoans from the order Cyclostomata, have incubating chambers for their embryos, that are occupied by many embryos at different stages of development before they are released into the environment. These embryos were proposed to have originated from a single fertilization event, followed by subdivision and multiplication of embryos within the chamber. This reproductive strategy is called polyembryony. However, the hypothesis of polyembryony was only based on observation of embryos in the chamber; multiple fertilization events or parthenogenesis (the development of an egg without contribution from father) could not therefore be excluded till confirming the genotypic composition of the offspring and the mother.
Many embryos within one chamber were genotyped along with their mother in several colonies. Some results are shown in the table on the slide. The numbers indicate the allele sizes in base pairs, just as they are read after resolving the PCR products in an automatic sequencer.
The table in the slide shows the genotyping of two colonies (A and BA) at 5 microsatellite loci. For the first, 8 chambers were genotyped, and 3 chambers were genotyped for the second colony. N is indicating the number of embryos that were genotyped per chamber.
As these animals have external fertilization, it was not possible to genotype the father.
The most important results can be summarized as follows
There is a father. Paternal alleles are indicated in red. This observation discards parthenogenesis
all embryos within the same chamber show the same genotype, therefore confirming the hypothesis of polyembryony.
The genotypes for loci Cd17-3 at the first chamber show unexpected genotypes: some of the chambers shows alleles that are not present in the mother. The mother's and the embryo's genotypes are all apparently homozygotes.
However, these are not the true genotypes and this example is reflecting one problem not rarely encountered in microsatellite loci and other PCR-based genotyping techniques: the problem of null alleles. Sometimes, mutations occur at the priming sites, preventing the primer from binding to the template DNA, and, therefore, the PCR reaction from proceeding. The result is that in heterozygote individuals only one allele will be amplified.
In our case, the genotype of the mother is 233/ null allele. The genotypes 229/229 and 237/237 in the embryos are showing inheritance of the mother's null alleles and the paternal (either 229 or 237 allele).
Null alleles lead to an underestimation of genetic diversity in natural populations.
Roger N. Hughes1, M. Eugenia D'Amato, John D. D. Bishop, Gary R. Carvalho, Sean F. Craig, Lars J. Hansson, Margaret A. Harley and Andrew J. Pemberton (2005).Paradoxical polyembryony? Embryonic cloning in an ancient order of marine bryozoans. Biol. Lett. 1, 178-180.
Sometimes, the generation of genetic profiles is difficult for some organisms of relatively unknown genomes. This is because no "universal primers" are available or if they are available they may not match well enough as to allow the PCR reaction to proceed and "universal probes" for multilocus DNA fingerprinting do not detect variation.
In these cases, a PCR-based technique may solve the problem. In relaxed PCR conditions, using very short primers, only one primer per reaction (oligomers 10 bases long) and low annealing temperatures (e.g. 37 ?C) to allow these primers to match the genome, a relatively large amount of PCR products are obtained. These products, however, are "anonymous" as their content is unknown. However, the structure of the amplified region, an intervening stretch of DNA situated between inverted repeats, resembles the structure of transposon-like elements. Therefore, the RAPD fragments are more likely to contain repetitive elements.
Although it is a very straightforward, easy and cheap technique, the RAPD profiles are sometimes difficult to reproduce. The scoring of the diversity in these RAPD loci is as follows: each band is considered a biallelic locus, one allele being "band present" and another allele being "band absent". The associated problem with this, is that a heterozygote individual BAND PRESENT/ BAND ABSENT cannot be distinguished from an homozygote individual BAND PRESENT/ BAND PRESENT. For this reason RAPD's are called DOMINANT MARKERS.
In contrast: microsatellites are co-dominant multiallelic markers.
AFLPs (Amplified Fragment Length Polymorphism) are conceptually similar to RAPDs: dominant biallelic systems. However, they are more reproducible across labs after a "molecular trick" that adds adaptors, fragments of DNA of known sequence, that are used as priming sites for the PCR.
On the left: RAPD profiles of different strains of Bradyrhizobium japonicum, a nitrogen-fixing microorganism, frequently used for seed inoculation of soybean to increase the crop yield.
There are several methods available to determine the actual sequence of nucleotides in a segment of DNA. One procedure uses specially altered nucleotides called dideoxynucloetides that have been made either radioactive or fluorescent.
When enough of the targeted DNA fragment has been amplified by PCR, the mix is put through a series of DNA sequencing reactions that are a variation of PCR. The amplified product from the PCR is added to a reaction tube containing the same Taq (thermofilic bacteria) DNA polymerase used in the PCR, one primer that can hybridize at the desired location (as opposed to both strands in PCR), and all four of the nucleotide bases (A, T, C, G.)
In addition, small amounts of fluorescence labelled dideoxynucleotides (A, T, C, G) are added to the mixture. Dideoxynucleotides are human-made nucleotides whose sugar component is slightly different from that of the nucleotides that make up DNA. The sugar lacks of the hydroxyl group that covalently links to the phosphate of the next nucleotide in the growing chain. Therefore, the DNA polymerisation process stops at a dideoxynucleotide as no more nucleotides can be added to that chain. Each dideoxynucleotide is labelled with a different fluorescent compound so that it will give off an identifying colour in a laser beam. Prior to the automatic sequencing, the ddNTPS were labelled with radioactivity and these products were resolved in gels (see Figure on the right) and exposed to X-ray films. We can read the DNA sequence from bottom to top.
After 20 - 30 cycles of the PCR heating and cooling, the resulting mixture will contain a series of fragments of different lengths depending on how many bases had been added to the chain before one of the dideoxynucleotides sneaked in and blocked further growth.
In evolutionary studies we often want to see the pattern of genetic variation at a certain loci. For this, we obtain DNA sequence information for many individuals and we compare the variation through aligning the different sequences. An example of a DNA alignment is given in Slide 25, in the previous class.
See graphic explanation at http://www.contexo.info/DNA_Basics/dna_sequencing.htm
Molecular Marker: is an identifiable physical location or LOCUS in the genome (e.g., restriction enzyme cutting site, gene) whose inheritance can be monitored. Markers can be expressed regions of DNA (genes) or some segment of DNA with no known coding function but whose pattern of inheritance can be determined. In general, we use markers of biparental, Mendelian inheritance, or markers of only-maternal or only-paternal inheritance. The molecular markers should detect genetic variation or polymorphism.
The development of the PCR, automated sequencing and the advent of microsatellites made a profound impact in Molecular Ecology and population genetics. Also important are the latest developments of statistical methods and user-friendly computer packages.
In general, different molecular markers suit different types of questions we want to answer. In terms of molecular change we can distinguish three different levels of population biology at which we can assess information:
1. individual identification
2. genic variation
3. gene genealogies
As for individual identification and analysis of relatedness you can see examples in Slides 10 and 11. The changes in allele frequencies in populations can have many origins. First, we will see how Mendelian principles apply to genetics in populations.
Population genetics 1
The genetic information contained in a group of organisms is called the gene pool. More practically, we can think of it as the set of all copies of all alleles that could potentially contribute to the next generation.
One important and central question in the understanding of how the gene pool is transmitted to the next generation is, if we can predict the genetic composition of the next generation. The answer is YES, applying the principles of Mendelian inheritance (Slide 5, Class I). Here follows an example of a bi-allelic system:
If there are N GAMETES carrying allele A and N gametes carrying the allele a, we can call their respective frequencies in the population p and q. In a graphical example similar to that of the punnet square (Slide 5, Class I), in which the frequencies are represented by a fraction of the sides of the square, we can predict the genotypic and allelic frequencies of the next generation. The sum of all alleles' frequencies for a given locus in a population should sum up to 1, therefore the sides of the square are 1, and the area of the square is also 1.
In the present example, the frequency of ovules with allele A is 0.6, and the frequency of ovules with allele a is 0.4; the frequency for spermatozoid carrying allele A is 0.6 and the frequency of spermatozoids carrying the allele a is 0.4. In this example the allele frequencies in sperm and eggs are thus equal.
According to the Mendelian laws, we can predict all the possible genotypic combinations that can result from a fertilization event. Now, imagine what happens to a population with several hundreds of individuals: we can still predict all the possible combinations and their proportions, the frequency in which they will appear in the next generation, and the allele frequencies in the next generation.
We can obtain the frequency of a specific combination simply by multiplying the frequency in which these gametes occurred in the population. For instance, the frequency of individuals of genotype AA is the frequency of ovules carrying allele A multiplied by the frequency of spermatozoids carrying allele A. In our case, this frequency is 0.36.
Again, the genotypic proportions are 1:2:1 (homozygotes for the dominant allele: heterozygotes: homozygote for recessive allele) and their respective frequencies in the population are p2, 2pq and q2 respectively. These frequencies also sum up to 1.
Hardy Weinberg Equilibrium (HWE)
We can summarize the combination of ovules and spermatozoids by the expression (p + q) 2 and the resulting combinations as the expansion of the above expression, which results in p2 + 2pq + q2. In other words, the chance of two gametes with allele A coming together is, p x p = p2 , the chance of gametes A and a coming together (or the occurrence of heterozygotes has a probability of) 2pq, and finally, the chance of two gametes a coming together is q2.
This is the fundamental concept of the equilibrium of Hardy-Weinberg.
This is no more than a model that can help us make predictions about the fate of alleles in populations. As all models, it is based on assumptions:
· the organism is diploid
· with sexual reproduction
· generations are non-overlapping
· loci are biallelic
· allele frequencies are identical in males and females
· random mating
· population size is infinite
· no migration
· no mutation
· no selection
Implications of HWE
The Hardy-Weinberg model is only a reference model in which no evolutionary forces are acting. The main basic implications of this model are that
(1) Allele frequencies in a population will not change, generation after generation
(2) If allele frequencies in a population are given by p and q, then genotype frequencies can be predicted from the allele frequencies by p2 ,2pq and q2. A similar principle can be applied to multi-allelic loci.
If populations are out of equilibrium, and allele frequencies are the same in males and females, and generations do not overlap, the equilibrium of allele frequencies are attained after only one generation. If generations are overlapping the HWE is attained gradually after a few generations of random mating.
In the examples in the table, we show the genotypic observed frequencies of 4 hypothetical populations. Although all show the same allelic frequencies, the genotypic frequencies differ. For the first population, we estimated the expected genotypic frequencies from the expansion of the expression (p + q)2 using the allele frequencies. After obtaining the expected genotypic frequencies we are interested in knowing how different they are from the observed frequencies. For that we use a Chi-square non-parametric test, at the 0.05 level of significance and two degrees of freedom. The degrees of freedom are calculated using the formula (k-1), where k=number of genotypes (or categories). As one does not use proportions for a Chi-square test, the values are multiplied by a hundred. (AA=20, Aa=80, aa=0 and expected AA= 36, expected Aa=48 and expected aa=16). Using the formula above chi-square has a value of 44.4
The chi-square table for 0.05 significance level and 2 degrees of freedom shows a value of 5.99, which is much lower than our obtained chi-square value of 44.4. Therefore, we reject the null hypothesis of no difference between observed and expected values. In biological terms, the population is out of HW equilibrium.
Following the same approach, we will see that populations 3 and 4 are also out of HWE. If generations are non overlapping and female and male allelic frequencies are the same, all these populations will attain equilibrium after one generation of random mating.
In the following slides we will show several examples of factors that determine departures from equilibrium. This happens when the assumptions of the model are violated. We should start with "mutation", an assumption that is clearly violated in diploid organisms as new combinations can arise by recombination (see e.g. unequal crossing over in microsatellites an other repeated elements). Point mutations and other types of mutations along with mutation rates are reviewed in Class 1, Slides 25, 26, 27, 28.
Selection is differential survival and reproductive success of genotypes. Evolution by natural selection was the central idea in Darwin's "On the Origin of Species by Means of Natural Selection", published in 1859. A famous slogan "survival of the fittest" transcended the barriers of biology and was unfortunately taken to justify political agendas during the 19th and early 20th centuries.
Selection can change allele frequencies generation after generation, and therefore it is not possible to calculate genotype frequencies from allelic frequencies, thus violating the consequences of HWE.
Sometimes the adaptive value of heterozygotes is higher than in homozygotes. This type of heterozygote superiority is called heterosis.
In human ß-hemoglobin, a point mutation changes the glutamic acid (allele A) for valine (allele S), which is obtained by a transversion at second codon of amino acid 6. This mutation in the haemoglobin causes the erythrocytes to change their shape dramatically: they are more fragile than normal cells and they can block blood flow in capillaries, depriving the tissues from oxygen and causing anaemia. This condition is called sickle cell anaemia. However, the individuals that carry a normal and a mutated allele are more resistant to malaria, as parasite cells are selectively destroyed. The selective pressure of malaria parasites, then, favours the occurrence of allele S at a higher frequency in populations exposed to malaria.
The situation in which selection maintains two or more alleles at a locus is called balancing selection. Sickle anaemia is a case of balancing selection in which heterozygotes have a higher selective advantage than homozygotes.
Another type of selection favours only one allele instead of many, and therefore the frequency of this allele shifts continuously in one direction. This principle also applies to phenotypes. This type of selection is called directional selection and favours individuals with extreme traits in a population. We present several examples:
If the mosquitoes Culex pipiens are exposed to the insecticide chlorpyrifos, it blocks the ACE gene (acetyl choline esterase) and they die due to a malfunctioning nervous system. However, some mosquitoes carry a resistant gene ACER. The study of the allele frequencies of several genes along a transect of 54 km in the Mediterranean coast in France, taking in a few sites where the insecticide was previously used, showed higher frequency of the resistance gene only in these sites. All other studied genes showed allele frequencies equally distributed among populations (Chevillon et al., 1995). This shows the effect of selection for the resistant allele in areas where it conferred higher adaptive value. In the figure, the first 4 sites were affected by utilization of insecticide.
Other examples of directional selection are breeding programs of domestic animals and plants, where extreme phenotypes are artificially selected. Resistance to antibiotics and pesticides function as selective agents for the resistant phenotypes.
Other type of selection depends on the frequency of certain variants or phenotypes in the population. The central American butterflies Heliconius erato have many morphs, all poisonous to birds. When a morph is common, birds learn to avoid them, therefore favouring the rare morphs to become more frequent. The same process starts again for the formerly rare but at present frequent morph. This is frequency-dependent selection.
Nowadays we know other mechanisms than selection that are responsible for changes generation after generation. Selection is a non-random process. There are other non-selective processes of evolution. For example, the random fluctuation of allelic or genotypic frequencies when they do not confer any adaptive advantage. This is genetic drift.
Sickle cells image: www.studentbmj.com
Mosquito image: http://www.bio.pu.ru/win/entomol/Kipyatkov/educ/images/c_pipiens.gif
Recommended read: Interview to Douglas Futuyma
C Chevillon, N Pasteur, M Marquine, D Heyse, M Raymond (1995) Population Structure and Dynamics of Selected Genes in the Mosquito Culex pipiens. Evolution, 49, 997-1007.
Drift results from the vagaries of sampling of the gametes that will unite to form the next generation. This process by which allele frequencies do not change in a predetermined way in called random genetic drift.
The change in allelic frequencies from one generation to another can be expressed as
dq = q1 - q0, where q1 is the allele frequency in generation 1 and q 0 the allele frequency in generation 1.
The magnitude of variation of allelic frequencies because of the random sampling is given by
s 2 pq = (p0 q0 ) / 2N
where N is the effective population size (number of individuals that are actually reproducing in the population). It is obvious then, that smaller populations will display a higher variance.
In a simulation process in which several bi-allelic genes with an initial frequency 0.5 are plotted over a few generations, we see the effect of genetic drift in a population of size 10 (above) and a population of size 100. We can see that the frequencies fluctuate longer (for more generations) in the population of bigger size than in the other. Another interesting observation is that the alleles have either of two final fates: they become fixed (frequency = 1) or they are lost (frequency = 0). The direct consequence of fixation is the decrease in diversity: loci loose heterozygosity and become monomorphic. If alleles are lost in multi-allelic loci, the loss in diversity is seen in the reduction of allelic number (Na). It is important to emphasize that the effect of genetic drift is more pronounced in small populations.
A special case of random sampling is the bottleneck effect. Bottlenecks are a extraordinary reduction in the population size due to outbreak of diseases, natural disasters, habitat reduction or predation. The direct consequence of bottlenecks is that the natural diversity is lost: rare alleles tend to disappear, heterozygosity is reduced and inbreeding increases.
An example of a severe bottleneck is that of the cheetah. This species suffered a very significant reduction in population size by the end of the last glacial age, 10 000 years ago (Menotti-Raymond and O'Brien 1993), when massive extinctions of large mammals took place on several continents.
Another example of the bottleneck effect is that of over hunted species, such as the American bison, with estimated population sizes of over 60 million before 1492, that were reduced to only 750 animals in 1890. The present population size recovered to 350 000 animals.
Menotti-Raymond M and O'Brien S J (1993). Dating the genetic bottleneck of the African cheetah. Proc Natl Acad Sci U S A. 15; 90: 3172-3176.
Cheetah image: www.worldalmanacforkids.com
Bison image: www.ownbyphotography.com
Founder effect is another special case of random sampling in populations. It is a special case of bottleneck in which a new population is established by a small number of individuals from the original population that are not representative of the total genetic variation.
The occurrence of porphyria in South Africa at a higher frequency than in other populations is a case of founder effect. Variegate porphyria is an autosomal dominant trait that consists of a mutated (and therefore deficient) protoporphyrinogen oxidase, the last enzyme in the chain to synthesize the hoem group. This causes foto sensibility in the skin and neurovisceral crisis. The origin of all south Africans suffering from prophyria can be traced back to Ariaantje and Gerrit Jansz, who emigrated in the 1680s from Holland to South Africa, one of them bringing along an allele for the mild metabolic disease porphyria. They had 8 children, 4 of them suffered from porphyria. Today more than 30000 South Africans carry this allele.
Hift RJ, Meissner PN, Corrigall AV, Ziman MR, Petersen LA, Meissner DM, Davidson BP, Sutherland J, Dailey HA, Kirsch RE.(1997) Variegate porphyria in South Africa, 1688-1996--new developments in an old disease. S Afr Med J. 87:722-31.
The founder effect is another special case of random sampling in populations. It is a special case of the bottleneck effect in which a new population is established by a small number of individuals from the original population that are not representative of the total genetic variation.
The occurrence of porphyria in South Africa at a higher frequency than in other populations is a case of such a founder effect. Variegate porphyria is an autosomal dominant trait that consists of mutated (and therefore deficient) protoporphyrinogen oxidase, the last enzyme in the chain to synthesize the hoem group. This causes photo sensibility in the skin and a neurovisceral crisis. The origin of all South Africans suffering from prophyria can be traced back to Ariaantje and Gerrit Jansz, who emigrated in the 1680s from Holland to South Africa, one of them bringing along an allele for the mild metabolic disease porphyria. They had 8 children, 4 of them suffered from porphyria. Today more than 30 000 South Africans carry this allele.
Hift RJ, Meissner PN, Corrigall AV, Ziman MR, Petersen LA, Meissner DM, Davidson BP, Sutherland J, Dailey HA, Kirsch RE.(1997) Variegate porphyria in South Africa, 1688-1996--new developments in an old disease. S Afr Med J. 87:722-31.
Migration in the evolutionary sense implies movement of alleles from one population to another population. In other words, there is gene flow between populations.
Mechanisms of gene flow are dispersal of juveniles or larvae, transport of pollen, spores, etc. by wind, occasional long distance dispersal, etc.
One of the most widely utilized models of migration is the continent-island model, in which the direction of the migration in one direction is more important than the almost negligible migration from the island to the continent.
In the slide we see the picture of the continent-island migration model. Imagine that allel A1 is fixed on the island and allele A2 is fixed on the continent (in other words genotypic frequencies for the locus A are, on the island: A1A1=1; A1A2 = 0 and A2A2 = 0; and on the continent A2A2=1; A1A2 = 0 and A1A1 = 0). After a major migration event from the continent, 80 % of the island population is A2A2 (q2 = 0.8) and 20% A1A1 (p2 = 0.8). These genotypes are out of HWE, but the equilibrium is attained after 1 generation of random mating. Migration can thus act as an homogenizing factor.
The population structure is almost universal among organisms. Many occur in flocks, herds, colonies, and display patchy distributions. Besides, the natural habitat can become fragmented and subpopulations consequently undergo isolation.
In this scenario, genetic differentiation occurs because different subpopulations would acquire different allelic frequencies through the processes of selection and drift in the presence of non-random mating (inbreeding) and restricted migration (gene flow). The opposite of this situation is panmixia or random mating.
The differences in allele frequencies among subpopulations caused by local inbreeding and fixation of alleles by random drift, determine that the overall heterozygosity is lower than the expected under HWE in panmixia. This effect was described by Swedish statistician Sten Wahlund, and therefore was so called the "Wahlund effect".
The breeding structure of a population can be described by a hierarchical grouping of inclusive levels: within a subpopulation, between subpopulations, overall subpopulations. The inbreeding effect at each of these levels is described by a specific fixation index F for multiple loci (Nei 1977):
FIS = (Hs - Ho) / Ho
Where Hs is the average expected H within a certain subpopulation.
Ho is the average expected H within subpopulations overall loci
Fis measures the correlation of alleles with individuals, the correlation between 2 uniting gametes within the subpopulation. It indicates the degree of inbreeding of a subpopulation; it is a measure of deviation of HWE within a subpopulation. A positive value indicates deficiency of heterozygotes.
FIT = (HT - H0) / HT
Where HT is the average expected H in the total population overall loci.
FIT is the correlation between 2 uniting gametes relative to the total population. It is the overall inbreeding coefficient (F) of an individual relative to the total population (Individual within the Total population). Fit is rarely used
FST = (HS - HT) / HT
Fst is the correlation between two gametes drawn at random from each subpopulation and measures the degree of differentiation between subpopulations. It measures the reduction in heterozygosity in a subpopulation due to genetic drift. In other words, it is the proportion of the total heterozygosity that is explained by the subpopulation heterozygosity. When Fst = 0 subpopulations are not differentiated.
See example in Slide 11, Class III.
Population genetics webpages and links
Examples of population subdivision
In this slide we see two numerical examples of a study of genetic differentiation between 2 populations. Imagine we want to investigate the level of gene flow between two sites on the geographical distribution of a species.
In the first example we have two population with same allele frequencies, but one of them is out of HWE, with deficiency of heterozygotes, which is indicated by the deviation of Fis from 0. The deficit of heterozygote in the second population deviates Fit from 0. However, because the allelic frequencies between these populations are similar, the Fst value is 0, indicating gene flow and no genetic differentiation.
The second example shows two population in HWE, but because of the variation in allele frequencies between them, Fit and Fst are different from 0. Fst ? 0 indicates no gene flow, population structure, population subdivision.
The examples we saw in the previous slides show a snapshot of the present situation of a population or a group of populations. If we add history to the distribution of genetic diversity we have a historical perspective: lineages are represented by phylogenetic trees or networks, that imply time.
An evolutionary lineage (also called a clade) is composed of species, taxa, or individuals that are related by descent from a common ancestor. Lineages are often determined by the techniques of molecular systematics. A lineage can be distinguished from a mere collection of species by the fact that it contains only and all individuals that share a common ancestor.
In phylogenetic reconstructions above the species level, the lineages are represented by dicotomic trees, where the nodes represent an extinct common ancestor and the length of the branches represent the amount of change accumulated since the divergence.
Below the species level, the lineages can also be represented by phylogenetic trees, but are better represented by networks, as the "ancestors" (represented by individuals at the nodes) are alive, and networks allow polytomies.
In the slide we present an example of a phylogenetic tree of several Diptera, and a network of all detected mtDNA haplotypes for the microfrog Microbatrachella capensis.
Within species, the most common genetic markers to reconstruct genealogies are those of uniparental inheritance: DNA sequencies in mtDNA and markers (DNA sequencies or microsatellites) in the Y chromosome in mammals.
When the lineages are superimposed to their biogeographic distribution, the analysis of population structure takes a biogeographic and historical perspective.
Fly phylogeny image : http://www.inhs.uiuc.edu/cee/FLYTREE/images/flyphylogeny.jpg
Definition in: http://en.wikipedia.org/wiki/Lineage_(evolution)
Diversity in uniparental markers
The most utilized measure of diversity for diploid markers is H (heterozygosity).
Equivalent measures of diversity were also designed for uniparental markers like mtDNA sequences.
The simplest one is the "haplotype diversity" h, which is simply the relative number of haplotypes present in a population. It is obtained by dividing the different haplotype counts by the total number of individuals.
The other well-extended measure is the nucleotide diversity, which is equivalent to the heterozygosity at the nucleotide level. It measure the average number of nucleotide differences between haplotypes at the population. It is given by the equation
p = n / (n-1) ? xixjp ij
where p ij is the proportion of different nucleotide between sequence i and sequence j.
n is the total number of sequences in the sample. Xi and xj are the frequencies of sequences xi and xj in the population.
For instance, a sample with high haplotype diversity means that the proportion of different haplotypes in the population is high. A value of h = 0.9 indicates that 90% of the individuals carry a different haplotype.
A value of p 0.001 indicates that, on average, the sequences in this sample differ in 0.1 %, or that, in pair wise comparisons, any two sequences taken from this sample would differ in 1 out of 1000 nucleotides. This is a very low nucleotide diversity value.
Now, assuming that these two measure were taken in the same population we can say that this sample is composed by a high number of haplotypes, and that all are very similar to each other. A possible evolutionary scenario for this case is that of a recent population expansion, where a considerable number of new haplotypes have recently originated from one or a few haplotypes. When this is portrayed as a network, it is represented as a star-shaped phylogeny.
Phylogeography (See example in Slides 1-3, Class III)
Phylogeography is the study of the processes controlling the geographic distributions of lineages by constructing the genealogies of populations and genes (Avise 2000).
Phylogeographical inferences are usually made by studying the reconstructed genealogical histories of individual genes (gene trees) sampled from different populations. The processes that can be inferred from a tree or network are historical demographical factors like population bottlenecks, population expansions, and levels of gene flow (e.g. vicariance).
The network in the slide was reconstructed from variation at the mtDNA control region of 300 haplochromine cichlids from Lake Victoria region. Cichlids underwent an explosive radiation (rapid diversification and speciation) during the Pleistocene.
The size of the circles represent the frequency in the sample, the colours indicate the geographical origin of the sample. The star-shaped phylogenies represent a population expansion: the diversification process starts from a frequent haplotype. The presence of the same haplotype in different regions indicates gene flow (e.g. the biggest circle on the left hand side represent a haplotype that is present in Lake Kivu and Uganda lakes.
Avise, J. C. (2000). Phylogeography: the history and formation of species, Harvard University Press.
The figure was taken from
Verheyen E, Salzburger W, Snoeks J, Meyer A (2003). Origin of the Superflock of Cichlid Fishes from Lake Victoria, East Africa. Science 300:325-329.
Several factors have to be taken into account to decide whether and/or which populations are of value to maintain the evolutionary potential of the species. The principles of population genetics and the utilization of a variety of molecular markers in the last 2 decades has facilitated the identification of populations or species that can be particularly vulnerable to extinction due to, e.g., reduction in diversity, or populations that represent unique or rare lineages. Most of the phylogenetic reconstruction and identification of lineages utilized neutral variation. Thus, the concept of ESU gave major importance to identification of lineages and phylogenetic reconstruction.
Nowadays, the adaptive value of a population, although sometimes difficult to measure, is proposed to be taken into account. Following, we review the concepts of ESU that have been utilized in the last 15 years:
Ryder 1986: populations that actually represent significant adaptive variation based on concordance between sets of data derived by different techniques.
Waples 1991: populations that are reproductively separate from other populations and have unique or different adaptations.
Moritz 1994: populations that are reciprocally monophyletic for mtDNA alleles and show significant divergence of allele frequencies at nuclear loci.
More recently, Crandall (Crandall 2000) incorporated the concepts of "genetic" and "ecological exchangeability" to define ESUs
Ecological exchangeability implies that individuals can be moved to the area occupied by other individuals of the same species and occupy the same niche or selective regime. Characters that confer ecological exchangeability should be heritable.
Ecological exchangeability is rejected when populations underwent differentiation due to drift or selection, if they display different loci under selection or different life history traits.
Genetic exchangeability is directly related to the level of gene flow. Unique alleles, low gene flow estimates and phylogenetic divergence concordant with geographic barriers are criteria for rejecting genetic exchangeability.
Crandall KA, Bininda-Emonds ORP, Mace GM, Wayne RK (2000) Considering evolutionary processes in conservation biology. Trends Ecol. Evol., 15, 290-295.
Moritz, C. (1994) Defining 'evolutionary significant units' for conservation. Trends Ecol. Evol. 9, 373-375.
Ryder, O.A. (1986) Species conservation and systematics: the dilemma of subspecies. Trends Ecol. Evol. 1, 9-10
Waples, R.S. (1991) Pacific salmon, Oncorhynchus spp., and the definition of 'species' under the endangered species act. Mar. Fish. Rev. 53, 11-22