# Exercise 2.1

## Molecular Clocks

### INTRODUCTION

During the 1960s, Emil Zuckerkandl and Linus Pauling hypothesized that comparisons of genetic sequences from different organisms could be used to date their divergence. This idea, known as the molecular clock hypothesis, has since been used extensively to date the splitting of lineages. The molecular clock has even been used to date disease outbreaks, such as those caused by HIV (human immunodeficiency virus) and the H1N1 (“swine” flu) virus.

To say that a DNA sequence behaves like a molecular clock does not mean that its changes keep time like a standard clock; they are not that regular. Instead, the changes (substitutions) occur more or less at random. As a result, DNA sequence changes can help us estimate the dates of divergences, but those estimates carry an associated variance.

Here you will simulate the evolution of DNA sequences. Unlike in nearly all cases of real evolution, you will know the ancestral sequence. This ancestral DNA sequence is shown at the bottom of the simulation window. The DNA sequence is 100 nucleotides long (only one strand is shown); it is displayed in 4 rows of 25 nucleotides each. Above and to the left of the ancestor is one descendant sequence; above and to the right of the ancestor is the other.
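This exercise does not describe the simulation's own substitution algorithm. The code below is therefore only a minimal sketch of the idea under assumed rules. Time advances in 1-million-year steps. In each step, every site independently substitutes with a hypothetical probability `rate_per_site_per_myr`, and a substituted site switches to one of the other three bases chosen at random. The sketch also records how many times each site changed, which mirrors the color key. A site counts once no matter how many times it changed, following the counting rule in Question 1. The names `evolve` and `sites_changed` are illustrative and are not part of the simulation.

```python
import random

BASES = "ACGT"

def evolve(ancestor, time_myr, rate_per_site_per_myr, rng):
    """Evolve one lineage from the ancestor; return (descendant, per-site change counts).

    Assumed (hypothetical) model: in each 1-Myr step, every site substitutes with
    probability rate_per_site_per_myr, switching to one of the three other bases.
    """
    seq = list(ancestor)
    changes = [0] * len(seq)
    for _ in range(time_myr):
        for i, base in enumerate(seq):
            if rng.random() < rate_per_site_per_myr:
                seq[i] = rng.choice([b for b in BASES if b != base])
                changes[i] += 1
    return "".join(seq), changes

def sites_changed(changes):
    """Count sites that changed at least once (multiple changes at a site count once)."""
    return sum(1 for c in changes if c > 0)

rng = random.Random(1)
ancestor = "".join(rng.choice(BASES) for _ in range(100))  # 100-nt ancestral sequence
left, left_changes = evolve(ancestor, 10, 0.01, rng)       # left descendant lineage
right, right_changes = evolve(ancestor, 10, 0.01, rng)     # right descendant lineage
print(sites_changed(left_changes), sites_changed(right_changes))
```

The two lineages share the ancestor but draw independent random substitutions. Rerunning the code with different seeds therefore yields different counts, which is the variance the introduction describes.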

### QUESTIONS

Set time to 10 (for 10 million years) and run the simulation.

Notice that nucleotides in each of the diverging lineages will randomly change due to allelic substitutions over the course of the 10 million years. Nucleotides that have not changed are displayed in black; those that have changed once are in purple; those that have changed twice are in light blue, and so forth according to the key provided.

Question 1. How many nucleotides changed during the evolution of the descendant lineage on the left? If a nucleotide changed more than once, count that as one nucleotide change.

Question 2. How many nucleotides changed during the evolution of the descendant lineage on the right?

Question 3. How many nucleotides are different between the descendant on the left and the descendant on the right? How does this answer compare with the sum of the changes in Questions 1 and 2?
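Suppose the two descendant sequences are transcribed as strings (an assumption; the simulation only displays them). The Question 3 count is then the number of aligned positions at which they differ. The helper name `count_differences` is hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

print(count_differences("ACGTACGT", "ACGAACGC"))  # → 2
```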

Question 4. What circumstance would cause the number of differences in Question 3 to be less than the sum of the changes in Questions 1 and 2?

Repeat the simulation 9 more times (for a total of 10 replicates). For each replicate, count and record the number of substitutions accumulated in each lineage. Calculate the mean and variance of the number of substitutions in each lineage over the 10 replicates. (Page A-3 of the textbook appendix provides the formula [Equation A.1] for calculating variances.)
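If you prefer to check your hand calculations, Python's standard `statistics` module can compute the mean and variance. This is a sketch with an illustrative helper name. `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n; which one matches Equation A.1 has to be checked against the appendix. The counts below are made up and are not simulation results.

```python
from statistics import mean, variance, pvariance

def summarize(counts, sample=True):
    """Return (mean, variance) of per-replicate substitution counts.

    sample=True divides by n - 1 (statistics.variance);
    sample=False divides by n (statistics.pvariance).
    """
    spread = variance(counts) if sample else pvariance(counts)
    return mean(counts), spread

# Hypothetical counts, used only to illustrate the call
m, v = summarize([8, 10, 12])
print(f"mean = {m}, variance = {v}")
```

Call `summarize` separately on the left-lineage counts and the right-lineage counts.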

Question 5. What were the mean and the variance for the number of substitutions for the left lineage? For the right lineage? Were the means of the left and right lineages approximately equal?

Set time to 5 (for 5 million years) and run the simulation 10 times. For each replicate, record the number of substitutions in each lineage. Calculate the mean and the variance for each lineage.

Question 6. What are your results? How do they compare with the results obtained in Question 5?

Set time to 15 (for 15 million years) and run the simulation 10 times. For each replicate, record the number of substitutions in each lineage. Calculate the mean and the variance for each lineage.

Question 7. What are your results? How do they compare with the results obtained in Question 5?

Graph the mean number of substitutions against time (in millions of years), using separate data points for the left and right lineages.

Question 8. What does your graph look like? Based on your graph, estimate the average substitution rate per nucleotide per million years. Remember that the sequence is 100 nucleotides long.
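One way to turn the graph into a rate is sketched below. It fits a least-squares line forced through the origin, on the assumption that no substitutions have accumulated at time zero. The slope is then Σ(t·y) / Σ(t²) substitutions per million years for the whole sequence, and dividing by the 100 nucleotides gives a per-nucleotide rate. The function name is hypothetical, and the numbers in the call are invented only to make the arithmetic easy to follow.

```python
def rate_per_site(times_myr, mean_subs, seq_length=100):
    """Least-squares slope through the origin, divided by sequence length.

    Forcing the line through the origin assumes zero substitutions at time zero.
    """
    num = sum(t * y for t, y in zip(times_myr, mean_subs))
    den = sum(t * t for t in times_myr)
    slope = num / den            # substitutions per Myr for the whole sequence
    return slope / seq_length    # substitutions per nucleotide per Myr

# Invented values: mean substitutions at 5, 10, and 15 Myr
print(rate_per_site([5, 10, 15], [5, 10, 15]))
```

To use both lineages, list each time twice, once with the left mean and once with the right mean.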

Question 9. Based on your estimate of the substitution rate, how many substitutions would you predict to accumulate between the ancestral sequence and each descendant lineage in 50 million years?
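If substitutions accumulate linearly with time (the clock assumption behind Question 8), the prediction is simple arithmetic. Here $\hat{\mu}$ is your per-nucleotide rate estimate, $L = 100$ is the sequence length, and $t$ is the elapsed time in millions of years. The value $\hat{\mu} = 0.01$ is a hypothetical placeholder, not a result.

$$
\hat{S}(t) = \hat{\mu}\,L\,t, \qquad \text{e.g. } \hat{S}(50) = 0.01 \times 100 \times 50 = 50 \text{ substitutions per lineage}
$$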

Test your prediction by running 10 replicates with time set to 50 (for 50 million years). For each replicate, record the number of nucleotides that changed along the left and the right lineages. Compute the mean for each lineage.

Question 10. What are the means for the number of nucleotides that changed along the left and the right lineages? How do these figures compare with your prediction? Give a reason why the observed figures do not match the prediction.
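One way to explore this comparison is to contrast two expectations under a simplifying assumption: substitutions at each site arrive as a Poisson process with a constant per-site rate μ. The linear prediction μLt treats every substitution as landing on a fresh site. By contrast, the expected number of sites changed at least once is L(1 - e^(-μt)), which never exceeds μLt. The rate 0.01 is a hypothetical placeholder, and both function names are illustrative.

```python
import math

def linear_prediction(rate, length, t):
    """Clock-style expectation that treats every substitution as hitting a new site."""
    return rate * length * t

def expected_sites_changed(rate, length, t):
    """Expected number of sites hit at least once, assuming each site
    receives substitutions as a Poisson process with a constant rate."""
    return length * (1 - math.exp(-rate * t))

for t in (10, 50):
    print(t, linear_prediction(0.01, 100, t), round(expected_sites_changed(0.01, 100, t), 1))
```

Under these assumptions the gap between the two numbers is small at 10 million years and grows noticeably by 50 million years.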