Pop. Gen. VI: Signatures of Selection

The question

The advent of whole-genome sequencing technology has turned the field of genetics on its head. All of the population genetics theory we have covered so far in this course was developed long before there was such a thing as “DNA sequence data”. In fact, most key developments in population genetics took place long before we even knew that DNA was the inherited molecule carrying “Mendelian factors” or “genes”1. Moreover, it wasn’t until the advent of PCR amplification that we became able to study more than a handful of genes at a time. For most of the history of population genetics, we relied heavily on theory because we could not study DNA directly!

1 R.A. Fisher published his seminal paper unifying Mendelian inheritance and the evolution of quantitative traits in 1918! Most of the seminal papers by Sewall Wright, J.B.S. Haldane, and R.A. Fisher were published in the late 1920s and 1930s. Yet the helical structure of DNA was not deciphered until 1953, through the joint efforts of Rosalind Franklin (1920-1958), James Watson, Francis Crick (1916-2004), and Maurice Wilkins (1916-2004)!

Fast-forward to the present, where whole-genome resequencing costs around $100 per sample (!). We are drowning in genomic data, and the current challenge facing geneticists is how to leverage the deep body of existing theory to study more effectively the vast amounts of genomic data we are now able to generate.

This leads us to one of the central questions that modern population geneticists think about:

How do we detect signals of past or contemporary selection from DNA sequence data?

Put another way, we ask:

What are the genomic consequences of selection at linked sites?

The simplest case

Consider the simplest case of positive selection for a beneficial mutation at a single locus. What happens? How can we detect evidence of this selection? If we could look back in time at the frequency trajectory of the new beneficial mutation, it might look like an S-shaped curve: rare at first, then rising rapidly, and finally approaching fixation.
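To make the idea of a frequency trajectory concrete, here is a minimal sketch in Python. It assumes a haploid (genic) selection model in which the beneficial allele has relative fitness 1 + s. This model and the names `deterministic_trajectory`, `wright_fisher_trajectory`, `p0`, `s`, and `n_copies` are illustrative assumptions, not something defined elsewhere in these notes. The first function ignores drift; the second adds drift by resampling a fixed number of gene copies each generation.

```python
import random


def deterministic_trajectory(p0, s, generations):
    """Allele frequency under haploid (genic) selection, ignoring drift.

    Assumed model: the beneficial allele has relative fitness 1 + s,
    so mean fitness is 1 + s*p and p' = p(1 + s) / (1 + s*p).
    """
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


def wright_fisher_trajectory(n_copies, s, generations, seed=None):
    """Same selection model, plus drift from sampling n_copies gene copies.

    Starts from a single new mutant copy and stops early if the allele
    is lost or fixed.
    """
    rng = random.Random(seed)
    count = 1
    trajectory = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        p_after_selection = p * (1 + s) / (1 + s * p)
        # Binomial sampling of the next generation, one gene copy at a time.
        count = sum(rng.random() < p_after_selection for _ in range(n_copies))
        trajectory.append(count / n_copies)
        if count == 0 or count == n_copies:
            break
    return trajectory


if __name__ == "__main__":
    det = deterministic_trajectory(p0=0.001, s=0.05, generations=200)
    print(f"deterministic frequency after 200 generations: {det[-1]:.3f}")

    wf = wright_fisher_trajectory(n_copies=1000, s=0.05, generations=200, seed=1)
    print(f"stochastic run ended after {len(wf) - 1} generations at {wf[-1]:.3f}")
```

Running the stochastic version with several different seeds is a useful reminder that most new beneficial mutations are lost to drift while they are still rare, even though the deterministic curve always sweeps.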

But this would be a very exceptional data set indeed if we were studying a natural population. Usually, we don’t have the luxury of time-series data like this2. So, how else might we detect a signal of selection? If we can compare contemporary DNA sequence data with a historical sample, we might be able to detect the change in frequency. Alternatively, if we compare with the DNA sequence of a close relative of our species/population, we might be able to detect this selection event as a non-synonymous substitution. But again, we need historical data or a reference population.

2 Unless, for example, one is doing experimental evolution in the lab, as is often done with model organisms like D. melanogaster or C. elegans.

But what if we want to study contemporary populations? Or do not have access to historical data?

  • If we don’t have comparative data, we can’t find substitutions.
  • We could go out and measure fitness directly and try to correlate with sequence variants, but this is DIFFICULT.

So what can we do?

Insight!

Think about LD! What does selection do to neutral variation at linked sites? Selection generates statistical correlations between allelic states at selected sites and nearby neutral sites, because neutral alleles that happen to sit on the same chromosome as a selected allele are dragged along with it. This allows us to use ‘peaks’ of LD as signals of past or ongoing selection!
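As a rough illustration of what a ‘peak’ of LD means in practice, the sketch below computes \(r^2\) between pairs of biallelic sites from phased haplotypes coded as 0/1. The function name `r_squared` and the toy haplotype data are hypothetical, made up purely for illustration; real scans would use many more haplotypes and sites.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, given 0/1 alleles on the same haplotypes.

    D = p_AB - p_A * p_B, and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    Returns NaN if either site is monomorphic (r^2 is undefined there).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and the same length")
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return float("nan")
    return d * d / denominator


if __name__ == "__main__":
    # Toy data: eight haplotypes, one entry per haplotype at each site.
    selected = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = carries the beneficial allele
    nearby = [1, 1, 1, 1, 0, 0, 0, 1]    # neutral site close to the selected site
    distant = [1, 0, 1, 0, 1, 0, 1, 0]   # neutral site far away (recombined)

    print("r^2 with nearby site: ", r_squared(selected, nearby))
    print("r^2 with distant site:", r_squared(selected, distant))
```

In this toy example the nearby neutral site is strongly correlated with the selected site, while the distant one is not. That contrast, repeated along a chromosome, is what shows up as a local peak of LD.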

Under construction!

Again, the remaining material is currently presented in a .ppt lecture. I hope to update this webpage with a series of animations and figures covering signatures of selection for four different selection scenarios.

Please bear with me.

Footprints of different forms of selection

  • Positive
  • Purifying
  • Balancing
  • Diversifying

Different scales of study

Different signatures are useful at different scales of study.

  • Within population (\(\pi\), \(r^2\))
  • Between population (\(F_{ST}\))
  • Between species (\(dN/dS\))
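Two of the statistics listed above are simple enough to sketch directly. The Python below is a minimal illustration under stated assumptions: `nucleotide_diversity` computes \(\pi\) as the average number of pairwise differences per site among aligned sequences of equal length, and `fst_biallelic` computes \(F_{ST}\) for one biallelic site as \((H_T - H_S)/H_T\), assuming two populations of equal size. Both function names and the example inputs are hypothetical, and published estimators (which correct for sample size) differ in detail.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """pi: mean pairwise differences per site among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be non-empty and the same length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in pairs
    )
    return total_differences / (len(pairs) * length)


def fst_biallelic(p1, p2):
    """F_ST at one biallelic site for two equally sized populations.

    H_S is the mean within-population expected heterozygosity and H_T is
    the expected heterozygosity at the pooled allele frequency.
    Returns NaN when the site is monomorphic overall (H_T = 0).
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return float("nan")
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]  # toy alignment of three sequences
    print("pi per site:", nucleotide_diversity(sample))
    print("F_ST for allele frequencies 0.2 and 0.8:", fst_biallelic(0.2, 0.8))
```

A local dip in \(\pi\) or a local spike in \(F_{ST}\) relative to the rest of the genome is the kind of pattern these statistics are used to flag.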

Whole-genome patterns

  • Recurrent sweeps.
  • Background selection.

Pop. Gen. Wrap-up

This is the end of the Population Genetics Lectures for this course. We finish up with two in-class exercises:

  1. Assignment 1 on Linkage Disequilibrium.
  2. Computer exercise on population genetics.

Next week, we shift from one locus to many and begin our exploration of quantitative genetics.