1. DNA variation in Ecology and Evolution I- Organization of the genome

2. Slide 2

Slide 2

Aim of the course

The concept of diversity is easy to understand. We all know that all biological forms present variation at many different levels of organization: there are different types of cells within one organism, individuals of the same species present different appearances (or phenotypes), and the diversity of a community or ecosystem is given by the number of species inhabiting it.

In the present work we will focus on the study of the principles and processes that mold variation at the species level. All observed external characters, or external appearances of organisms, are known as "phenotypes". These phenotypes are, in turn, governed to a greater or lesser extent by the interaction of the genetic material with the environment. The genetic material, or genetic composition, of an organism is known as its "genotype".

In the first part of this course, we will present the organization of the genetic information.

Then, we will present methodological approaches to study variation at the DNA level and, finally, the application of molecular markers in the study and inference of ecological, demographical and evolutionary processes.

3. Slide 3

Slide 3

The idea of the transmission of characters or traits from one generation to the next has always been intuitively present in our minds, long before the principles and material of inheritance were discovered. This general and vague understanding allowed the first settlers to domesticate plants and animals from Neolithic times.




4. Slide 4

Slide 4

However, it was not until the middle of the 19th century that the rules of inheritance were discovered. Gregor Mendel, an Austrian monk and a teacher at a local high school, studied botany and experimented with pea crosses. He selected 22 varieties of peas, experimented with crosses for 8 years and published his seminal work in 1866. However, his work remained unnoticed by the scientific community for the next 35 years, as most attention was drawn to a book published a few years earlier: The Origin of Species, by Charles Darwin.

5. Slide 5

Slide 5

Mendel experimented by crossing pea plants that differed in these characters: tall or short plants, purple or white flowers, smooth or wrinkled peas, round or shriveled peas, green or yellow peas. After crossing purple-flowered plants with white-flowered plants, he cross-pollinated the offspring (F1). He then discovered that three-quarters of the next generation (F2) were purple-flowered when they bloomed, and one-quarter were white-flowered. Some characters, then, were "masked" in the first generation and reappeared in the second generation; he called these "recessive", whereas he called the characters that "overshadowed" them "dominant". Mendel's laws of inheritance can be summarized as:

Law of independent assortment: characters (factors) segregate independently of one another. That is, the character "height" ("tall" or "short") is inherited independently of the character "colour" ("purple" or "white" flowers).

Law of segregation: characters occur in alternative forms (today we call them alleles). They occur in pairs within individuals, with one member of each pair inherited from each parent. These pairs separate (or segregate) during gamete production in the parents, and are combined again at fertilization. Each parent contributes one allele to the offspring.

Law of dominance: for each character, one factor is dominant and the other recessive. Combinations of alleles that include the dominant form will show only that one, so in the F2 generation the dominant and recessive forms appear in a ratio of approximately 3:1.

Segregation and the resulting crosses can be represented in a diagram called a "Punnett square". Here, we represent all possible combinations of alleles in the gametes of the parents, and all possible results of the combination between their gametes. The proportion of offspring with a given genotype can then be predicted in terms of probabilities. Individuals carrying two different variants of a character (two different alleles) are called heterozygotes, whereas individuals that carry the same variant twice are called homozygotes. This is a fundamental concept that we will return to later on. As a general rule, we denote dominant alleles with capital letters and recessive alleles with lowercase letters.
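The Punnett square can be sketched in a few lines of Python. This is only an illustration: the function names and the allele symbols "P" (purple, dominant) and "p" (white, recessive) are our own choices, not part of the slide.

```python
from collections import Counter
from itertools import product

def punnett_square(parent1, parent2):
    """Count offspring genotypes for a single-gene cross.

    Each parent is a two-letter genotype such as "Pp"; each letter is one allele.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting writes the capital (dominant) allele first: "pP" becomes "Pp".
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring

def phenotype_counts(offspring):
    """Return (dominant, recessive) phenotype counts from genotype counts."""
    dominant = sum(n for g, n in offspring.items() if any(a.isupper() for a in g))
    recessive = sum(offspring.values()) - dominant
    return dominant, recessive

f2 = punnett_square("Pp", "Pp")   # cross between two F1 heterozygotes
print(dict(f2))
print(phenotype_counts(f2))
```

Crossing two heterozygotes gives one PP, two Pp and one pp, which is the 3:1 ratio of purple to white flowers that Mendel observed in the F2.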

Now we will see how the material of inheritance is organized, and how alleles are segregated in the germline.


6. Slide 6

Slide 6

Nowadays we know that the inheritance material is carried in DNA (deoxyribonucleic acid), molecules that are organized in discrete units called chromosomes. Chromosomes occur in pairs, one from each parent. The process of meiosis is fundamental to understanding how characters are segregated. Every body cell of the organism has two copies of each chromosome. However, to pass on the information to the next generation, the information has to be "halved", as the other half is provided by the other parent. This reduction of the genetic information during the formation of the gametes is called meiosis. In this process, one diploid cell gives rise to 4 haploid cells.

Prior to the meiotic division, during a period called "interphase", each chromosome is composed of a centromere and one chromatid. The genetic information then gets duplicated, resulting in two chromatids attached at the centromere. Thus, at the end of interphase, the cell contains 2N chromosomes with duplicated genetic information.

In the first stage of division, homologous chromosomes pair with each other and exchange genetic material. This process is called "crossing over". In other words, chromatids of homologous chromosomes shuffle fragments to arrive at a new combination of maternal and paternal genetic information. This is the major advantage of "sex" in evolution, as it provides a mechanism for generating genetic variation. At the end of the 1st cell division, two cells are obtained, each containing only one chromosome from each of the initial pairs of homologous chromosomes, and each chromosome still carries its information in duplicate. This first division reduces the number of chromosomes in the cells.

The 2nd cell division separates the sister chromatids of each chromosome and results in 4 haploid cells, each carrying only one set of chromosomes. Ploidy is the number of chromosome sets present in the nucleus of the cells (see also Slide 32 of this presentation). Meiosis thus starts with one diploid cell and ends with haploid cells.

If we consider the change in DNA content during the cell division, and we call 2c the amount of DNA present at the start of interphase, we see a change to 4c after duplication of the chromatids. At the end of meiosis I, each of the two resulting cells contains 2c DNA again. Unlike at the starting point, however, each cell contains only one chromosome of each pair (but with two chromatids).

The 2nd division halves the amount of genetic information present in these cells, therefore resulting in 4 cells with 1c content of genetic information.
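The bookkeeping of cells, chromosome sets and DNA content described above can be traced with a small sketch. The function name and the tuple layout are illustrative assumptions; the numbers are those given in the text (2c, 4c, 2c, 1c).

```python
def meiosis_stages():
    """Return (stage, number of cells, chromosome sets per cell, DNA content per cell in c)."""
    stages = []
    cells, sets, dna = 1, 2, 2   # one diploid cell at the start of interphase: 2N, 2c
    stages.append(("interphase, before duplication", cells, sets, dna))

    dna *= 2                     # duplication: every chromosome now has two chromatids
    stages.append(("interphase, after duplication", cells, sets, dna))

    # Meiosis I: homologous chromosomes are sorted into two cells.
    cells, sets, dna = cells * 2, sets // 2, dna // 2
    stages.append(("after meiosis I", cells, sets, dna))

    # Meiosis II: sister chromatids separate; the number of sets per cell stays at one.
    cells, dna = cells * 2, dna // 2
    stages.append(("after meiosis II", cells, sets, dna))
    return stages

for stage, cells, sets, dna in meiosis_stages():
    print(f"{stage}: {cells} cell(s), {sets}N, {dna}c")
```

The trace ends with 4 cells that each hold one chromosome set and 1c of DNA, while the cells after meiosis I already hold one set but still 2c.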

Now we can see the link between this process and Mendel's laws. The sorting of homologous chromosomes into different resulting cells (or gametes) explains the law of segregation. We can also see that "independent assortment" applies to characters that are coded for on different chromosomes. That is why pea plants with purple or white flowers can have peas that are either smooth or wrinkled.


Not used but important:


diploid- having two full sets of homologous chromosomes

haploid- having a single set of chromosomes

homologous- the same or corresponding in structure

7. Slide 7

Slide 7

In all organisms, ordinary cell division also involves the transfer of the genetic information to the 2 resulting "daughter cells". The sequence of events that leads from one cell division to the next is called the "cell cycle", and it shares common features with meiosis. However, meiosis is a "dead end": its products do not enter another cycle of division.

The process by which the chromosomes are copied exactly and passed on to the resulting cells during cell division is called mitosis. This is how the somatic (non-sexual) cells of our bodies divide. As in meiosis, the DNA of each chromosome is duplicated. Here, the sister chromatids are sorted into the resulting cells, but DO NOT INTERCHANGE GENETIC MATERIAL, and the daughter cells contain exactly the same genetic information as the original cell that divided into 2. These daughter cells can now start a new cycle of division.

Now that we understand the principles of transmission of inherited "characters", we will see how the genetic information is organized: the concept of a genome and the different types of "genomes" carried by eukaryotic organisms.


8. Slide 8

Slide 8

Organization of the genetic information

The complete genetic information possessed by an organism is called a GENOME. Eukaryotic organisms not only contain genetic information in the nucleus of their cells, where chromosomes are located, but also carry extra-nuclear DNA. This is contained in organelles such as mitochondria and chloroplasts.



9. Slide 9

Slide 9

Nuclear genetic information

Nuclear DNA is organized in discrete units called chromosomes, which become visible at metaphase of the cell cycle, when the chromatin condenses and is packed around a scaffold of accompanying proteins, the histones.

The genetic information is also compartmentalized in these different units. For example, in mammals and birds, a distinct pair of chromosomes carries the genetic information for sex determination. These are called sex chromosomes. The rest of the chromosomes are called autosomes (sometimes referred to as "somatic" chromosomes). In mammals, an individual that carries a Y chromosome is a male: females carry 2 X chromosomes, and males carry one copy of a Y and one copy of an X chromosome. Mammal males are called the "heterogametic sex" because their 2 sex chromosomes are different, whereas females are the "homogametic sex". In contrast to mammals, the heterogametic sex in birds is the female.

In this slide, we see the chromosomes of the human species in mitotic metaphase, when each chromosome is composed of two sister chromatids, in other words, with duplicated genetic information. This "set of photographed, banded chromosomes arranged in order from largest to smallest" is called a karyotype. The human nuclear genome, then, is composed of 22 pairs of autosomes (non-sex chromosomes) plus a pair of sex chromosomes: XY in the case of males, and XX in the case of females.

The internal structure of these metaphase chromosomes, as seen under the microscope, is a condensed, supercoiled fiber composed of a molecule of DNA and several accompanying proteins. The DNA molecule is wound around units of 8 molecules of basic proteins, the histones H2A, H2B, H3 and H4 (two of each), while histone H1 binds the DNA that links these units. This organization is called chromatin, and it shows different levels of condensation along the cell cycle. The highest level of condensation is reached right before the cell divides.

Next, we are going to look at the molecular structure of DNA.


10. Slide 10

Slide 10

Molecular structure of DNA


DNA is a polymeric molecule composed of a string of component units, the nucleotides. DNA is also known as the "double helix", as two opposing strings of nucleotides twist in a clockwise (right-handed) manner. Each turn of the helix contains about 10 pairs of nucleotides. Each unit, or nucleotide, is composed of a nitrogenous base, deoxyribose (a sugar with 5 carbons, in orange) and a phosphate (in light purple), covalently attached. The carbon atoms of the sugar are numbered from 1' to 5', and the orientation of the DNA strand is given by these sugar carbons. We see in the picture that the 2 strands are organized in pairs of nucleotides and that they run in opposite "directions" from the 5' to the 3' end. This orientation is important for the way the information in DNA is read, as we will see later on.

Nitrogenous bases are cyclic compounds that have carbon and nitrogen atoms in their rings. Nitrogenous bases with 2 rings are called purines, and bases with only 1 ring are pyrimidines. Adenine and Guanine are purines, and Cytosine and Thymine are pyrimidines; they are denoted with the capital letters A, G, C and T.

In the double helix, the two chains or strands run in opposite directions and have nucleotides that match in an ordered fashion: A with T, and C with G. The matching bases are held together by hydrogen bonds, which are disrupted by high temperature.
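The pairing rule (A with T, C with G) and the opposite orientation of the two strands can be illustrated with a minimal sketch. The names PAIRS, complement and reverse_complement are illustrative, and the example sequence is made up.

```python
# Base-pairing rule of the double helix.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the paired bases in the same left-to-right order as the input."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand):
    """Return the opposite strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

top = "ATGCCG"                    # hypothetical strand, written 5' to 3'
print(complement(top))            # opposite strand read 3' to 5'
print(reverse_complement(top))    # opposite strand read 5' to 3'
```

Because the strands are antiparallel, the opposite strand has to be reversed before it can be read in the conventional 5' to 3' direction.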

The order of the nitrogenous bases in the DNA molecule carries the genetic information for the make-up of all organisms. The organization and types of DNA sequences that are the main components of the genome follow next.


11. Slide 11

Slide 11

Watson and Crick

James Watson and Francis Crick were working at Cambridge University and were 24 and 36 years old respectively when they discovered the structure of DNA in 1953. A team from King's College London, Maurice Wilkins and Rosalind Franklin, obtained crystallographic images of DNA. From the images obtained by Rosalind Franklin, Watson and Crick imagined building blocks arranged along a twisted ladder. Watson and Crick won the Nobel Prize in 1962, shared with Wilkins. Rosalind Franklin died of ovarian cancer in 1958, and Nobel Prizes are not awarded posthumously.

12. Slide 12

Slide 12

Nuclear DNA: coding and non-coding sequences

Nuclear DNA contains coding and non-coding stretches. The coding stretches are composed of GENES, basic units of hereditary material that carry the information to give rise to a product. There are other stretches of DNA along the genome that do not code for known products, or are just intervening sequences between coding stretches. Generically speaking, these sequences were originally called "junk DNA". Labelling this DNA as "junk" is somewhat unfair, as some of these sequences have a regulatory function and pick up signals to switch certain genes on or off. Whatever their debated functionality, they can provide invaluable information about evolutionary processes.


13. Slide 13

Slide 13

Genes: the coding DNA

The products of genes are RNAs. RNA is another nucleic acid that, in contrast to DNA, is composed of a single strand, and in which Thymine is replaced by another pyrimidine: Uracil. There are three types of RNA: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).

Messenger RNA is an intermediate between the DNA contained in the nucleus and the machinery that constructs proteins, situated in the cytoplasm of the cells. Therefore, the function of mRNA is to be a "messenger", carrying the information to construct proteins to the "factory" in the cytoplasm. Messenger RNA is copied from DNA in a process called transcription. Once the mRNA is in the cytoplasm, the ribosomes, which are built from ribosomal RNA and proteins, "read" the genetic information and construct proteins with the help of transfer RNA. The function of transfer RNA is to bring the units that compose proteins and place them in position. The final product of this process is a protein, a polymeric molecule composed of a chain of units called AMINO ACIDS. There are 20 amino acids, and each one is recognized by a specific transfer RNA.

The sizes of the different types of RNA products vary: whereas a protein-coding gene can generate mRNA of several hundreds or thousands of nitrogenous bases (we will call them just "bases" from now on), tRNA are only 70-90 bases long.

This chain of events constitutes the central dogma of molecular biology: genetic information is copied from DNA to RNA in the nucleus, and this information is translated into proteins in the cytoplasm.
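The first step of this flow of information, transcription, can be sketched as follows. This is a simplified illustration with a made-up sequence: it pairs each base of the DNA strand that is being copied with its RNA partner (U instead of T) and ignores everything else that happens in the nucleus. The names RNA_PAIRS and transcribe are our own.

```python
# RNA base that pairs with each DNA base of the copied strand (U replaces T).
RNA_PAIRS = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5):
    """Build the mRNA, 5' to 3', from the copied DNA strand read 3' to 5'."""
    return "".join(RNA_PAIRS[base] for base in template_3_to_5)

coding_strand = "ATGGCC"      # hypothetical DNA strand, 5' to 3'
template_strand = "TACCGG"    # its partner strand, read 3' to 5'
mrna = transcribe(template_strand)
print(mrna)
```

The resulting mRNA carries the same sequence as the DNA strand that was not copied, with U in place of T.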


14. Slide 14

Slide 14

Organization of tRNA and rRNA genes

In the human genome, there are approximately 500 genes coding for cytoplasmic tRNA, which are located on all chromosomes except Y and 22.

The ribosomes are largely composed of RNA: the large subunit contains the products of the 28S, 5.8S and 5S coding regions, whereas the RNA of the small subunit is coded by the 18S gene.

The ribosomal genes are organized in two types of clusters, each made of hundreds to thousands of repeated units composed of alternating stretches of transcribed and non-transcribed DNA. One cluster codes for the 5S RNA genes. The second cluster codes for the other 3 ribosomal genes: 18S, 5.8S and 28S. These are separated by transcribed stretches, ITS1 and ITS2, which are excised post-transcriptionally.

The coding regions for ribosomal RNA are highly conserved across species along the evolutionary range, which is why they are frequently applied in the resolution of deep phylogenies. The ITS (Internal Transcribed Spacers) are less constrained than the surrounding coding regions, so they are prone to accumulate mutations and change more rapidly than the ribosomal coding DNA. For this reason, they are frequently used to resolve shallow phylogenies, just above and below the species level.

The chromosomal location of the genes coding for ribosomal RNA is called the NOR, or Nucleolus Organizing Region. In cytological preparations with silver, these regions of the chromosome are intensely stained due to their high transcriptional activity. Cytogenetic evolutionary studies in the '60s and '70s used these regions to determine homology between chromosomes of closely related species, as their position in the chromosomes tends to be conserved across species.


15. Slide 15

Slide 15

Genes: Organization of single copy DNA

The regions coding for these three types of RNA also differ in internal organization and copy number in eukaryotes.

Single copy DNA codes for proteins. Upstream of the DNA sequence that will be transcribed into messenger RNA, there is a group of sequences that regulate transcription and remain untranscribed. The transcribed region itself consists of blocks of coding DNA (exons) interspersed with blocks of non-coding DNA (introns). These are transcribed as a single unit and processed within the nucleus prior to "exportation" of the mRNA to the cytoplasm. The introns are excised; other post-transcriptional processes include, for instance, the addition of a modified G (the "cap") at the 5' end, and of a poly-adenine tail at the 3' end that will be used as a signal for transportation to the cytoplasm.
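The processing steps described above (excision of introns, addition of a cap and of a poly-adenine tail) can be mimicked on a string. This is a toy sketch: the sequences and exon positions are invented, the text "cap-" is only a placeholder for the modified G, and the function names are our own.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a primary transcript, dropping the introns.

    `exons` lists (start, end) positions, 0-based, with `end` exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

def add_cap_and_tail(spliced, tail_length=20):
    """Prefix a placeholder for the 5' cap and append a poly-A tail at the 3' end."""
    return "cap-" + spliced + "A" * tail_length

# Hypothetical transcript: exon, intron, exon.
exon_1, intron, exon_2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon_1 + intron + exon_2
exons = [(0, len(exon_1)), (len(exon_1) + len(intron), len(pre_mrna))]

mature = add_cap_and_tail(splice(pre_mrna, exons), tail_length=10)
print(mature)
```

Only the two exons remain in the processed molecule; the intron between them does not reach the cytoplasm.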

16. Slide 16

Slide 16

Proteins and Gene families

Genes coding for proteins are usually called "structural genes". The human genome contains approximately 25 000 genes, a very small number in comparison to the size of the genome: 3 000 million base pairs. Other genes code for other products of regulatory function that we will not discuss in the present course.

During the course of evolution, genes sometimes become duplicated and tend to appear in tandem. The processes involved in the origin of these gene families range from unequal crossing over that leads to gene duplication to structural rearrangement of chromosome segments. The results are clusters of related genes that keep the same or similar functions, or whose functions may even diverge.

17. Slide 17

Slide 17

Gene families: concepts

As implied in the previous slide, a gene family is a set of genes related by homology.

Homology is a relationship of identity by descent: in other words, two genes share a common ancestor. This concept applies to genes in different species (case b in the slide) or even to duplicated genes in the same genome (case a in the slide). An example of the latter case is given by the α-globin genes in human chromosome 16.

Orthologs are a special case of homologous genes: genes in different species that, through the process of speciation, have accumulated differences but retain the same function. This concept is crucial to predict gene function in newly sequenced genomes.

Paralogs are genes that arose by duplication within the same genome and did not retain the same ancestral function in the course of evolution.

Examples of these cases: the ribosomal 18S gene is an example of orthology, commonly used to resolve deep phylogenies due to its slow evolutionary (mutation) rate. An example of paralogy is the duplication and subsequent divergence in function of the prolactin and somatotrophin genes during vertebrate evolution.

See example in Slide 9, Class III

18. Slide 18

Slide 18

Non-coding DNA: Satellite DNA

Satellite DNA is also known as "highly repetitive DNA" or "junk DNA". It is composed of tandem repeats of the same sequence motif, a stretch of DNA of a few hundred to thousands of base pairs. These repeats are usually located close to the centromeres and the telomeres (the ends of the chromosomes). The repeat motif is usually conserved, and DNA sequence similarity in the repeat unit is also generally conserved among closely related species. In the '80s the study of satellite DNA was very popular in evolutionary studies above the species level. The function of these repeated stretches is unknown, but they may play a functional role in these regions by providing binding sites for proteins. Initially, some function related to aging was attributed to the telomeric repetitive DNA, which is related to its role in protecting the vulnerable ends of the chromosomes.

In the photo, we see metaphase chromosomes of cattle that were probed with repetitive DNA using a technique called FISH (fluorescent in situ hybridization). The two colours represent different types of highly repetitive DNA that were detected with specific probes.

The most widespread molecular technique applied to satellite DNA in comparative studies was the fragmentation of the DNA with restriction enzymes, followed by separation of the fragments in agarose gels, Southern blotting and probing with labeled known fragments of DNA (see Chapter II, slides 4 and 5).



Prashad N, Cutler RG (1976) Percent satellite DNA as a function of tissue and age of mice. Biochim Biophys Acta 418(1):1-23.

Neidle S, Parkinson GN (2003) The structure of telomeric DNA. Current Opinion in Structural Biology 13(3):275-283.

19. Slide 19

Slide 19

Minisatellites and the origin of DNA fingerprinting

Minisatellites, another type of tandemly arranged repeats, were discovered in 1980. The repeat unit, in this case, is smaller than that of satellite DNA, ranging from approximately seven to a hundred base pairs. Minisatellites are not restricted to the telomeric or centromeric regions, but are interspersed among genes. In humans, they are mostly found in the subtelomeric regions and the most common repeat unit is TTAGGG.

In some cases minisatellites have been associated with regulatory functions of gene expression or with the origin of certain diseases of genetic origin, like fragile X in humans.

Minisatellites are characterized by high levels of polymorphism: in other words, a very high number of alleles are usually found within species, and each allele is defined by its NUMBER OF REPEAT UNITS. These loci are also called VNTR, or Variable Number of Tandem Repeats.

The origin of the elevated number of alleles can be attributed to errors during DNA duplication processes and unequal crossing over during meiosis.

The high level of variation at minisatellite loci made them very attractive for forensic cases and for the identification of individuals in pedigree reconstruction during the '80s. The use of several loci made the identification of individuals extremely accurate, as the probability of finding a second identical profile "by chance" could be as low as 1 in 20 billion, depending on the number of loci used, the number of alleles per locus and their relative frequencies in the population. Alec Jeffreys coined a new term for these techniques: DNA fingerprinting. He was the first scientist to apply these loci in a forensic case, a paternity dispute among foreign immigrants in England. Later on, however, other loci with even higher allelic variation replaced the minisatellites in individual identification, ecological studies and the study of micro-evolutionary processes: the microsatellites.
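How the chance of a second identical profile depends on the number of loci and on the allele frequencies can be sketched numerically. The calculation below rests on two assumptions that are not stated in the slide: genotype frequencies follow Hardy-Weinberg proportions (p squared for a homozygote, 2pq for a heterozygote), and the loci are independent, so their frequencies can be multiplied. The allele frequencies in the example are invented.

```python
from math import prod

def genotype_frequency(freq_a, freq_b, homozygous):
    """Expected frequency of one genotype (assumes Hardy-Weinberg proportions)."""
    return freq_a ** 2 if homozygous else 2 * freq_a * freq_b

def random_match_probability(profile):
    """Multiply genotype frequencies across loci (assumes independent loci).

    `profile` is a list of (freq_a, freq_b, homozygous) tuples, one per locus.
    """
    return prod(genotype_frequency(*locus) for locus in profile)

# Hypothetical profile: heterozygous at every locus, both alleles at frequency 0.1.
for n_loci in (1, 3, 6):
    profile = [(0.1, 0.1, False)] * n_loci
    print(n_loci, random_match_probability(profile))
```

Each added locus multiplies the probability by another small number, which is why a handful of highly variable loci is enough to make a chance match extremely unlikely.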

The techniques used to study variation at the minisatellite loci were very similar to those applied to the study of satellites: DNA was fragmented with restriction enzymes, electrophoresed, transferred to a membrane (a process known as Southern blotting) and probed with a labeled fragment of known DNA.

See examples in Slide 6, Class II

20. Slide 20

Slide 20

Non-coding DNA: microsatellites

Microsatellites replaced minisatellites from the '90s onwards. Like the previously described type of loci, microsatellites consist of a motif repeated in tandem, and the number of repeats of the motif also defines the alleles. The repeat motifs are 2-6 base pairs long, so these loci are also called STRs, Short Tandem Repeats. The allelic variation is even higher than for minisatellites, implying a higher mutation rate. The position of these motifs in the genome differs from that of the last 2 markers: microsatellites can be found interspersed among genes, and even within genes in intronic (non-coding) regions.
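Since an allele at these loci is defined by the number of repeats, scoring an allele amounts to counting consecutive copies of the motif. A minimal sketch, with invented flanking sequences and a CA motif; the function name is our own:

```python
import re

def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Two hypothetical alleles of the same locus: same flanks, different repeat number.
allele_1 = "GGTA" + "CA" * 5 + "TTG"
allele_2 = "GGTA" + "CA" * 8 + "TTG"

print(longest_tandem_run(allele_1, "CA"))
print(longest_tandem_run(allele_2, "CA"))
print(len(allele_2) - len(allele_1))   # difference in fragment length, in bases
```

The two alleles differ by three repeats, that is, by six bases in length, and it is this length difference that is resolved by electrophoresis.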

The application of microsatellites is wider than that of minisatellites as, besides accurate individual identification and the inference of evolutionary processes, they are used in genetic mapping, the inference of the relative position of loci along a chromosome.

The generation of microsatellite profiles is simpler than for minisatellites, as it only requires PCR reactions with primers matching the flanking regions, and the resolution of the PCR products by electrophoresis. Modern electrophoresis systems do not use agarose gels but automated capillary electrophoresis.

See examples in Slides 10 and 11, Class II; Slides 11 and 12, Class III

21. Mobile elements. The origin of interspersed repetitive DNA

Slide 21

Mobile elements; jumping genes.

Mobile elements, transposons or "jumping genes" are fragments of DNA that can move around to different positions in the genome of a single cell. In the process, they may cause mutations, or increase (or decrease) the amount of DNA in the genome.

Jumping elements were discovered in maize in 1948 by Barbara McClintock, who noticed deletions, insertions and translocations. She won the Nobel Prize for her discoveries 35 years later.

22. Slide 22

Slide 22

Transposons II

Nowadays we understand the process of self-propagation of these elements. There are two basic types: retrotransposons and transposons. Their molecular structure is an internal coding region flanked by terminal repeats (e.g. LTR, or Long Terminal Repeats).

Transposons propagate themselves in a "cut and paste" fashion. They code for an enzyme that excises these sequences from their site, and they then integrate into the genome somewhere else.

Retrotransposons are elements that are copied into RNA which, in turn, is copied into DNA and inserted in other regions of the genome in a "copy and paste" fashion. The process involves copying RNA into DNA with an enzyme coded by the retrotransposon itself. This is the reverse of the flow of information described by the "central dogma of molecular biology".
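The difference between the two modes of propagation can be shown on a toy genome written as a string. This sketch only tracks where the element ends up and how many copies exist; the enzymes and the RNA intermediate are not modelled, and the genome, the "TE" marker and the function names are invented for illustration.

```python
def cut_and_paste(genome, element, new_position):
    """Transposon: excise the element and reinsert it elsewhere.

    `new_position` is counted on the genome after the element has been excised.
    """
    old = genome.index(element)                        # raises ValueError if absent
    excised = genome[:old] + genome[old + len(element):]
    return excised[:new_position] + element + excised[new_position:]

def copy_and_paste(genome, element, new_position):
    """Retrotransposon: keep the original and insert a new copy elsewhere."""
    if element not in genome:
        raise ValueError("element not found in genome")
    return genome[:new_position] + element + genome[new_position:]

genome = "aaaTEcccggg"    # hypothetical genome with one mobile element, "TE"
moved = cut_and_paste(genome, "TE", 6)
copied = copy_and_paste(genome, "TE", 8)
print(moved, moved.count("TE"), len(moved))
print(copied, copied.count("TE"), len(copied))
```

With "cut and paste" the genome keeps its length and a single copy of the element, whereas every "copy and paste" event adds a copy and makes the genome longer, which is consistent with the large share of retrotransposons in some genomes.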

The process of copying RNA into DNA is shared by retroviruses like HIV or HTLV (human T-cell leukaemia virus). These elements are thought to have originated from viral infections.

About 40 % of the human genome and 50% of the maize genome consist of retrotransposons.

Examples of elements originated by retrotransposition are:

LINEs, or Long Interspersed Nuclear Elements. They consist of repeats of a few hundred to 9000 base pairs, and there are about 850 000 of them in the human genome.

SINEs, or Short Interspersed Nuclear Elements. They consist of repeats of a few hundred base pairs. The most important SINEs in the human genome are the Alu elements (so called after the restriction enzyme that allowed their detection). They consist of 300 bp repeats and make up about 11% of our genome. Many of the Alu elements occur within introns or structural genes.

The LINE and SINE sequences show similarity among closely related species, and they were used in evolutionary studies in the '80s following a methodological procedure similar to that applied to satellite DNA.


23. Slide 23

Slide 23

The Other genome: mitochondrial DNA

Mitochondrial DNA is a covalently closed circle of DNA, which varies in size between approximately 12 and 20 kb in animals. This DNA is located in the mitochondria and contains 37 genes. Thirteen of them code for proteins (enzymes) involved in the respiratory chain that produces energy in the form of ATP (adenosine triphosphate). The mtDNA also codes for 22 tRNAs and 2 ribosomal RNAs, which are used to translate the mitochondria-encoded proteins. The mtDNA also contains a region that controls transcription, which usually displays a large amount of variation among individuals within species. This is called the "control region", D-loop or hyper-variable region.

In contrast to nuclear DNA, the mitochondrial genome is very compact, lacks repetitive elements, and its genes are not structured in coding regions and intervening non-coding regions. The lack of introns and the circular structure of the mitochondrial genome resemble prokaryotic genomic DNA. For this reason, the evolutionary origin of mitochondria is thought to be endosymbiotic: one cell was absorbed into another without being digested.

After fertilization, the mtDNA contained in the sperm is destroyed; therefore this type of DNA is inherited only maternally. There are exceptional cases in the animal kingdom where mtDNA is frequently transmitted by the male parent, for instance in mussels. In addition, mitochondrial DNA is not subject to recombination. The mode of inheritance of mtDNA allows maternal lineages to be reconstructed and traced back in time. Similarly, Y-chromosome information allows the reconstruction of paternal lineages. The allelic variants of uni-parentally inherited DNA present in a population or species are called HAPLOTYPES.

Some mt genes, e.g. those coding for ribosomal 16S RNA and COI, are frequently used in phylogenetic reconstruction. Other genes display higher mutation rates which, combined with the small effective population size of mtDNA, result in genetic drift (see Slide 20, Class II) that is more pronounced than for nuclear genes. For this reason, mtDNA is especially attractive in the study of microevolutionary processes and population genetics. Of particular interest is its application to phylogeography: the study of the processes controlling the geographic distributions of lineages by constructing the genealogies of populations and genes. See examples of lineages in Slide 26, Class II; phylogeographic reconstruction in Slides 1-3, Class III.


24. Slide 24

Slide 24

The genetic code

Previously we saw that the information contained in the coding regions of DNA, either nuclear or mitochondrial, is transported by the processed messenger RNA to the cytoplasm where it will be translated into protein. How is this information coded?

The mRNA is "read" one codon (3 bases) at a time by the ribosome, from its 5' end to its 3' end. The tRNA carrying the matching amino acid binds to the corresponding codon.

These two RNAs match through their complementary triplets, the codon in the mRNA and the anticodon in the tRNA, which pair by non-covalent hydrogen bonds. After one "block" or unit of information (codon) is read, the translation machinery moves one codon along, reads the following codon, and adds the corresponding amino acid to the growing protein chain. Remember that the mRNA is copied from the antisense strand of DNA. Therefore, the codons are read along the mRNA in the same order as they are present in the sense strand of the DNA, from the 5' to the 3' end.

When we calculate all the possible combinations of 4 elements taken 3 at a time (4 x 4 x 4), we obtain a total of 64 different possibilities. However, there are only 20 amino acids to code for. As a result, most amino acids are coded by more than one codon, and for this reason the genetic code is called REDUNDANT.

In addition, some codons give START or STOP signals to the translation process. For instance, AUG is the START codon and defines the reading frame of a string of nucleotides.

The table in the figure shows the code for vertebrate nuclear DNA. We can see that some amino acids are coded by four codons and others by two. A codon is said to be "four-fold degenerate" if any nucleotide at the 3rd position determines the same amino acid. Similarly, a "two-fold degenerate" codon determines the same amino acid when the 3rd position is occupied by either of the two purines or by either of the two pyrimidines. The genetic code is also tolerant to mutations that do not affect the properties of the amino acid. For instance, NUN codons (N stands for any nucleotide) tend to code for hydrophobic amino acids. 0-fold degenerate sites are those where any nucleotide change determines an amino acid change.
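The degeneracy of the 3rd position can be explored with a short sketch. It assumes the standard (nuclear) code, written with DNA letters (T instead of U), and the function name is our own. Note that the function counts how many of the 4 nucleotides give the same amino acid, so a value of 1 corresponds to what we called a 0-fold degenerate site.

```python
from itertools import product

# Standard genetic code; "*" marks a STOP codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def third_position_degeneracy(codon):
    """Count the nucleotides at the 3rd position that keep the same amino acid."""
    target = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == target for base in BASES)

print(third_position_degeneracy("GCT"))  # 4 (Alanine)
print(third_position_degeneracy("CAT"))  # 2 (Histidine)
print(third_position_degeneracy("ATG"))  # 1 (Methionine)
```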

The redundancy makes the genetic code more tolerant to mutations at the third position. Changes in the coding region that do not alter the resulting protein are called SILENT MUTATIONS, and they constitute one of the foundations of the Neutral Theory of Evolution.

A few differences exist between the nuclear code and the mitochondrial code. In vertebrate mtDNA, for instance, UGA codes for tryptophan instead of acting as a STOP signal.

genetic code

25. Slide 25

Slide 25

Examples of the genetic code

This is an example of the alignment of a short fragment of DNA coding for the mitochondrial gene COI (cytochrome oxidase I) in several krill species of the genus Euphausia.

Highlighted in yellow are the codons that present silent mutations: in these codons, none of the observed differences across species determines an amino acid change. The non-synonymous substitutions are shown in red.

In the second amino acid position of the alignment we see an example of amino acid replacement. The mutation that gave rise to the amino acid change from Glycine to Cysteine in Euphausia gibboides is in the 1st position of the codon.

Alanine occupies the 3rd position in the amino acid chain. This is a case of 4-fold degeneracy of the code: we can see the presence of either A, T, C or G in the third position of the corresponding codon in different krill species.

At the end of the alignment we can see several amino acid replacements. Histidine is replaced by Glutamine; both are polar amino acids. The codons for these two amino acids are 2-fold degenerate. We can see that a transition at the 3rd position, changing A to G between Euphausia tenera and Euphausia triacantha, still codes for Glutamine.
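The cases described in this slide can be reproduced with a small sketch. The codons used here are illustrative choices consistent with the text, not the actual krill sequences. For simplicity the sketch uses the standard code; mitochondrial codes differ at a few codons, but not at the ones used in this example.

```python
from itertools import product

# Standard genetic code, DNA letters; "*" marks a STOP codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(codon_a, codon_b):
    """Label the difference between two codons as silent or not."""
    same = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "synonymous" if same else "non-synonymous"

# Hypothetical codons chosen to match the cases in the text.
print(classify_substitution("GGT", "TGT"))  # non-synonymous (Gly -> Cys, 1st position)
print(classify_substitution("GCA", "GCG"))  # synonymous (Ala, 4-fold degenerate)
print(classify_substitution("CAA", "CAG"))  # synonymous (Gln, A -> G at 3rd position)
print(classify_substitution("CAT", "CAA"))  # non-synonymous (His -> Gln)
```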

26. The origin of genetic variation: MUTATIONS

Slide 26


The concept of mutation can be simply expressed as a change in the heritable material. The persistence of these new variants in the population or species will ultimately depend on other factors, like their adaptive value; if they are non-adaptive, it will depend on external factors like selection, genetic drift and other stochastic factors.

The most important changes of evolutionary significance are point mutations, duplications, chromosomal rearrangements and polyploidization.

27. Slide 27

Slide 27

Mutations within loci

The variation within loci gives origin to new alleles. Although this type of mutation is more commonly known as "point mutation", there are other mechanisms that also generate new alleles: insertions and deletions.

We previously saw point mutations in protein-coding regions. Changes in a single nucleotide are called synonymous substitutions when they are silent, and non-synonymous substitutions when they determine an amino acid change. Insertions and deletions in coding regions also occur, but unless the change involves 3 nucleotides (or a multiple of 3), it shifts the reading frame, which most likely makes the allelic variant deleterious (harmful) for the organism.

Deletions and insertions are the mutations that generate allelic variation in microsatellite and minisatellite loci.

Changes in RNA genes can cause modifications in their secondary structure.

28. Slide 28

Slide 28

Point mutations

We briefly explained before the constitution of the DNA and the organization of the complementary bases. Mutations do not occur entirely at random. Some changes are more frequent than others: changes to another nucleotide of the same type (purine to purine, or pyrimidine to pyrimidine) are more frequent than changes of nucleotide type (purine to pyrimidine, or vice versa). The first type of change is called a transition, whereas the second is called a transversion.
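The two classes of point mutation can be told apart with a few lines of code. This is an illustrative sketch; the function name is our own.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_point_mutation(old, new):
    """Return 'transition' or 'transversion' for a single-nucleotide change."""
    if old == new:
        raise ValueError("the two nucleotides are identical")
    pair = {old, new}
    same_type = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_type else "transversion"

print(classify_point_mutation("A", "G"))  # transition
print(classify_point_mutation("A", "C"))  # transversion

# Count the possible changes of each class among the 12 ordered pairs.
counts = {"transition": 0, "transversion": 0}
for old in "ACGT":
    for new in "ACGT":
        if old != new:
            counts[classify_point_mutation(old, new)] += 1
print(counts)  # {'transition': 4, 'transversion': 8}
```

Only 4 of the 12 possible changes are transitions, which makes their higher observed frequency all the more striking.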

29. Slide 29

Slide 29

Insertions - deletions

We previously saw a few examples of point mutations in protein-coding regions. In this slide, the effect of insertions and deletions in a protein coding region is shown:

The example shows an insertion that results in a shift of the reading frame in the corresponding protein. In this case, not only are new amino acids incorporated into the protein, but translation also terminates at an earlier point. Changes of this type can result in a product that is non-functional and therefore harmful to the organism.
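The effect of a frameshift can be sketched as follows. The sequence is invented for illustration (it is not the one shown in the slide), and the translation uses the standard code.

```python
from itertools import product

# Standard genetic code, DNA letters; "*" marks a STOP codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate codon by codon from the first base until a STOP codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

original = "ATGGCACTTGATCCATATTAA"            # hypothetical coding sequence
mutated = original[:3] + "G" + original[3:]   # one G inserted after the START codon

print(translate(original))  # MALDPY
print(translate(mutated))   # MGT
```

After the insertion, the codons downstream of the START are read in a different frame: new amino acids appear and a premature STOP codon (TGA) truncates the protein.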

30. Slide 30

Slide 30

Changes in ITS1

This is an example of an insertion (or deletion) in the ITS-1 region of two individuals of the krill species Euphausia recurva. We see that one single insertion (or deletion) of an adenine favors or prevents, respectively, the formation of a small 2-base-long stem. The absence of this A therefore determines the formation of a bigger loop.

The function of this secondary structure is unknown, and we cannot ascertain if it confers any adaptive value to the organism. Since this polymorphism is frequent in this species, we assume it doesn't.

We have already seen examples of gene duplication in Slide 17. We will now see other major changes in the genome at the chromosomal level.

31. Slide 31

Slide 31

There are changes that also affect the structural organization of the genome at the chromosomal level. Closely related species sometimes show a different number of chromosomes, and a few chromosomes that differ in size and shape. This is the result of major processes of chromosomal rearrangements that produce breakpoints, fusion, inversion of fragments within the same chromosome, and translocation of fragments between non-homologous chromosomes.


32. Changes at the ploidy level

Slide 32


Ploidy is the number of single sets of chromosomes in a cell or organism. For instance, we are diploid organisms, as we carry 2 sets of homologous chromosomes. Our gametes, however, are haploid cells, as they carry only 1 set of chromosomes. Cells within the same organism can have different ploidy: some of our liver cells, for instance, are octaploid.

Cells or organisms that carry more than 2 sets of chromosomes are known as polyploids (triploids when they have 3 sets, tetraploid when they have 4 sets).

Evolutionary changes in the number of chromosome sets are very frequent among plants. They can give rise to new species after cross-species fertilization, simply through the non-disjunction of chromosomes in the first meiotic division. As a result, the second meiotic division separates the sister chromatids of all existing chromosomes. Therefore, the new species automatically carries all the chromosomes, and thus all the genetic information, of the parental species.

Examples of tetraploids are Pelargonium, maize, cotton, cabbage, and leek.

Examples of hexaploids: wheat, oat.

Examples of octaploids: strawberry, sugar cane.

33. Slide 33

Slide 33

Mutation rates

Mutation rate is the number of mutation events per unit of time. The unit of time is chosen depending on the objective of the study: we can consider generations, number of cell divisions, etc. In the initial studies of mutants, the rate used to be expressed as the number of mutations per million gametes, and it was obtained by counting the mutant individuals resulting from controlled crosses.

The rate at which new variants arise in a genome depends on the type of DNA and genome. This rate varies greatly across species, among genes within the same species and between the nuclear and the mitochondrial genome.

In humans, for instance, the incidence (or frequency per gamete) of mutations causing achondroplasia (dwarfism) is 4-12 x 10^-5, and that of mutations causing hemophilia A is 2-4 x 10^-5.

In an evolutionary scenario we are interested in expressing the mutation rates in other units, like mutations per generation.

For coding regions, the average rates per generation are approximately 10^-9 to 10^-8 per base pair (or site), 10^-6 to 10^-5 per gene, and 0.02 to 1 per genome.

Microsatellites show a mutation rate that is 10 to 100 times higher.

The mutation rate of the mitochondrial DNA control region in humans is approximately 0.011.

From the figures in this slide we can see that the different mutation rates in different regions of the genome allow us to infer evolutionary processes at different levels of organization. For instance, the mitochondrial control region and microsatellites are very effective in the inference of historical and demographical scenarios, whereas the more conserved nuclear protein-coding genes would be uninformative at the population level and are used, instead, in the inference of phylogenetic relationships between species.
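The per-site and per-gene rates quoted for coding regions are related by the length of the gene. As a rough sketch, assuming a gene of about 1000 base pairs (our assumption, not a figure from the slide) and the same rate at every site:

```python
def per_gene_rate(per_site_rate, gene_length_bp):
    """Approximate per-gene rate, assuming the same rate at every site."""
    return per_site_rate * gene_length_bp

GENE_LENGTH_BP = 1000  # assumed gene length, for illustration only

low = per_gene_rate(1e-9, GENE_LENGTH_BP)
high = per_gene_rate(1e-8, GENE_LENGTH_BP)

print(f"{low:.0e} to {high:.0e} mutations per gene per generation")
# 1e-06 to 1e-05 mutations per gene per generation
```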

mt mutation rate


Am J Hum Genet. 2000 May;66(5):1599-609. Epub 2000 Apr 7.

34. Slide 34

Slide 34

Molecular clocks

Tightly linked to the concept of substitution or mutation rate is the idea of "molecular clocks". This concept implies that the rate at which mutations occur and are fixed is constant through evolutionary time; in other words, the mutations "tick" at regularly spaced intervals. Although this is not strictly true, as mutation rates vary along and between lineages, it can give an approximate idea of speciation times.

For instance, it is widely accepted that the general rate for mtDNA genes is a change of 2% per million years.
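Under a strict clock, a rough divergence time follows directly from the observed sequence divergence. The sketch below assumes that the 2% figure refers to the divergence accumulated between two lineages, and the 5% value is a made-up example.

```python
def divergence_time_myr(percent_divergence, rate_percent_per_myr=2.0):
    """Estimate the time since two lineages split, in millions of years.

    Assumes a strict molecular clock, i.e. a constant rate through time.
    """
    return percent_divergence / rate_percent_per_myr

# Hypothetical pair of species whose mtDNA sequences differ by 5%.
print(divergence_time_myr(5.0))  # 2.5
```

Since rates vary along and between lineages, such a figure is only an approximation.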