Pop. Gen. II: Inbreeding & Population Subdivision

Generalized Hardy-Weinberg

The key assumption underlying the Hardy-Weinberg (H-W) proportions we covered in the last lecture is that fertilization involves randomly sampled haploid gametes. That is, mating is random with respect to genotypes at the focal locus. But what happens when some process¹ causes non-random mating?

1 Processes causing deviation from H-W:

  1. Assortative mating (e.g., positive assortative mating by height)
  2. Mating between relatives (inbreeding)
  3. Population subdivision (individuals can mate only within their subpopulation)

We can easily generalize H-W proportions by introducing a new term, \(F\), which represents the probability of homozygosity due to some biological or stochastic process causing non-random mating. The resulting expected genotypic frequencies are:

Table 1: Single diploid locus, \(\mbb{A}\)
\[ \begin{array}{lccc} & A_1A_1 & A_1A_2 & A_2A_2 \\ \hline \text{Frequency} & p^2(1 - F) + pF & 2pq(1 - F) & q^2(1 - F) + qF \end{array} \]

We will encounter several incarnations of \(F\), but the most common is known as “Wright’s inbreeding coefficient”, which we will explain below.

Wright’s Inbreeding Coefficient

To build a concept of “inbreeding”, we first need the idea of identity by descent (IBD)². IBD offers a natural basis for quantifying relatedness between individuals: the more closely related two individuals are, the more IBD alleles they ought to share. Sewall Wright defined a metric, \(f\), that quantifies inbreeding as the total probability that a given individual inherits two alleles that are IBD.

2 Identity by descent (IBD): Two alleles at the same locus that are descended from the same ancestral allele somewhere in their recent pasts are said to be identical by descent.

Consider the pedigree in Figure 1, which shows a focal offspring (diamond) produced by a full-sibling mating and traces its ancestry back to the grandparental generation. We don’t care what the specific alleles are, so we arbitrarily label the two alleles of each grandparent \(ab\) and \(cd\). Now we can quantify the overall probability that the focal individual inherits two alleles from its grandparents that are IBD, as follows:

For the focal offspring, only four genotypes can result from inheriting two IBD alleles: \(aa\), \(bb\), \(cc\), or \(dd\). The key is to trace each of the ways the focal offspring could inherit two copies of the same grandparental allele, and to quantify the overall probability of this occurring. Let’s start with the \(a\) allele:

Figure 1: Pedigree illustrating a full-sib mating and inheritance of an IBD allele (indicated in red).
  • For the focal offspring to inherit two copies of the \(a\) allele, both of their parents must have carried it, which means they must have inherited it from the grandfather.
    • The probability of each parent inheriting the \(a\) is \(1/2\).
    • The probability of each parent passing it to the focal offspring is \(1/2\).
  • This means that four independent events, each with probability \(1/2\), must all occur for the focal offspring to be \(aa\). We can write this probability as follows:

\[ \Pr(aa) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \left( \frac{1}{2}\right)^4 \]

  • Likewise, we can show that the same calculation holds for each of the three other possible genotypes resulting from inheriting two IBD alleles:

\[ \begin{aligned} \Pr(aa) &= \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \left( \frac{1}{2}\right)^4 \\ \Pr(bb) &= \left( \frac{1}{2}\right)^4 \\ \Pr(cc) &= \left( \frac{1}{2}\right)^4 \\ \Pr(dd) &= \left( \frac{1}{2}\right)^4 \\ \end{aligned} \]

  • Finally, we need to sum these four probabilities to give the total probability of being IBD:

\[ f_{\text{full-sib}} = 4 \times \left( \frac{1}{2}\right)^4 = \left( \frac{1}{2}\right)^2 = \frac{1}{4} \]

That is, the inbreeding coefficient for an individual resulting from a full-sib mating is \(f_{\text{full-sib}} = 1/4\).
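As a sanity check on the path-counting argument, the sketch below enumerates every equally likely transmission outcome in the Figure 1 pedigree and counts those in which the focal offspring carries two copies of the same grandparental allele. It is a minimal Python illustration, assuming (as the calculation above does implicitly) that the grandparents are unrelated and not themselves inbred, so \(a\), \(b\), \(c\), \(d\) are all distinct; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def full_sib_inbreeding_coefficient() -> Fraction:
    """Exact f for the offspring of a full-sib mating, by brute-force enumeration.

    Assumes the two grandparents are unrelated and non-inbred, so the four
    grandparental alleles a, b, c, d are all distinct (none are IBD).
    """
    grandfather = ("a", "b")
    grandmother = ("c", "d")

    # Each full-sib parent receives one allele from each grandparent,
    # giving 4 equally likely parental genotypes.
    parental_genotypes = list(product(grandfather, grandmother))

    ibd_outcomes = 0
    total_outcomes = 0
    for parent1, parent2 in product(parental_genotypes, repeat=2):
        # The focal offspring draws one allele from each parent (2 x 2 ways).
        for allele1, allele2 in product(parent1, parent2):
            total_outcomes += 1
            if allele1 == allele2:  # same grandparental allele => IBD
                ibd_outcomes += 1

    return Fraction(ibd_outcomes, total_outcomes)


print(full_sib_inbreeding_coefficient())  # 1/4
```

Of the \(4 \times 4 \times 2 \times 2 = 64\) equally likely outcomes, 16 are IBD homozygotes, which reproduces \(f_{\text{full-sib}} = 1/4\). Exact `Fraction` arithmetic avoids any floating-point rounding in the comparison.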

The calculation of \(f\) we did in the above example was for a focal individual with a known pedigree. But we are far more often interested in quantifying the extent of inbreeding at the population level. To do this, we introduce the population-level inbreeding coefficient, \(F_I\), which is equal to the mean of the individual inbreeding coefficients for all individuals in the population, \(F_I = \overline{f}\), though we often just write \(F\).

Often, we need to make some simplifying assumptions about the rates of different forms of inbreeding in a population in order to calculate \(F_I\). But the following example is instructive:

The Kel Kummer were a south Saharan tribe founded in the seventeenth century; by the 1970s they numbered approximately \(300\) people. The Kel Kummer maintained strict tribal endogamy: marriage between a man and his mother’s brother’s daughter (i.e., a maternal cousin) was regarded as obligatory. This resulted in the following characteristic ‘fishnet’ pedigree:

Figure 2: The Kel Kummer pedigree going back 6 generations (image credit unkn.).

After many generations of consistent \(1^{st}\) cousin marriages, the population-level inbreeding coefficient for the Kel Kummer was \(F_I \approx 0.10\).

A final example is helpful to put the effects of inbreeding at the population level into perspective.

Some Western jurisdictions (for example, several U.S. states) prohibit marriage between \(1^{st}\) cousins. How much do such marriages actually contribute to the population-level inbreeding coefficient? Suppose that \(1\%\) of marriages are between \(1^{st}\) cousins and the remaining \(99\%\) are between unrelated individuals (\(f = 0\)). For offspring of a \(1^{st}\) cousin marriage, \(f = 1/16\). What is the population-level \(F_I\)?

\[ F_I = \left( 0.01 \times \frac{1}{16} \right) + (0.99 \times 0.0) = 0.000625 \]

Hence, rare \(1^{st}\) cousin marriages have a negligible effect on population-level inbreeding.
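Because \(F_I\) is just the frequency-weighted mean of individual \(f\) values, the calculation above generalizes to any mix of mating types. Here is a minimal Python sketch, assuming each mating class is described by a hypothetical `(frequency, f)` pair and that the frequencies sum to 1:

```python
def population_inbreeding_coefficient(mating_classes):
    """Population-level F_I as the frequency-weighted mean of individual f values.

    mating_classes: iterable of (frequency, f) pairs whose frequencies sum to 1.
    """
    classes = list(mating_classes)  # allow generators; we iterate twice
    total_freq = sum(freq for freq, _ in classes)
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("mating-class frequencies must sum to 1")
    return sum(freq * f for freq, f in classes)


# 1% first-cousin matings (f = 1/16); 99% matings between unrelated parents (f = 0).
F_I = population_inbreeding_coefficient([(0.01, 1 / 16), (0.99, 0.0)])
print(F_I)
```

Adding further classes (e.g., a small fraction of full-sib matings with \(f = 1/4\)) only requires another pair in the list.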

Insight:

Notice that \(F\), being a probability, is never negative and is bounded: \(F \in [0, 1]\). Have another look at our generalized H-W proportions:

\[ \begin{array}{lccc} & A_1A_1 & A_1A_2 & A_2A_2 \\ \hline \text{Frequency} & \underbrace{p^2(1 - F) + pF}_{\text{increases}\,~\uparrow} & \underbrace{2pq(1 - F)}_{\text{decreases}\,~\downarrow} & \underbrace{q^2(1 - F) + qF}_{\text{increases}\,~\uparrow} \end{array} \]
  1. Inbreeding DECREASES frequency of heterozygotes
  2. Inbreeding INCREASES frequency of homozygotes
  3. Inbreeding DOES NOT ALTER ALLELE FREQUENCIES.
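The three points above can be checked numerically. The sketch below (Python; function names are hypothetical) evaluates the generalized H-W frequencies for a fixed \(p\) at several values of \(F\), then recovers the allele frequency as “homozygote frequency plus half the heterozygote frequency”:

```python
def generalized_hw(p: float, F: float) -> tuple[float, float, float]:
    """Expected (A1A1, A1A2, A2A2) frequencies under generalized H-W proportions."""
    if not (0.0 <= p <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("p and F must lie in [0, 1]")
    q = 1.0 - p
    return (
        p**2 * (1 - F) + p * F,
        2 * p * q * (1 - F),
        q**2 * (1 - F) + q * F,
    )


def allele_freq_A1(freqs: tuple[float, float, float]) -> float:
    """Frequency of A1 = A1A1 frequency + half the A1A2 frequency."""
    f11, f12, _ = freqs
    return f11 + f12 / 2


for F in (0.0, 0.25, 0.5, 1.0):
    freqs = generalized_hw(0.3, F)
    print(F, [round(x, 4) for x in freqs], round(allele_freq_A1(freqs), 4))
```

As \(F\) rises, the heterozygote column shrinks and both homozygote columns grow, yet the recovered allele frequency stays at \(0.3\): algebraically, \(p^2(1 - F) + pF + pq(1 - F) = p\).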

Estimating \(F\)

Now that we have both standard and generalized H-W proportions, it is natural to ask how we can estimate deviations from standard H-W. That is, how do we estimate \(F\) from genotype frequency data?

We can do this by comparing the observed heterozygosity under inbreeding with the expected heterozygosity under standard H-W proportions:

\[ \frac{\text{heterozygotes observed (w/ inbreeding)}}{\text{heterozygotes expected (H-W)}} = \frac{2pq(1 - F)}{2pq} = 1 - F. \]

Rearranging, and computing the H-W expectation \(2pq\) from the observed allele frequencies, we see that \(F\) can be estimated by:

\[ \hat{F} = 1 - \frac{\text{het}_{\text{obs.}}}{\text{het}_{\text{HW}}} \]
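Here is a minimal Python sketch of this estimator from raw genotype counts. It assumes a biallelic locus and uses plain sample proportions (no small-sample bias correction); the function name and the example counts are illustrative only. Note that with real data \(\hat{F}\) can come out negative when heterozygotes are more common than H-W predicts.

```python
def estimate_F(n11: int, n12: int, n22: int) -> float:
    """F-hat = 1 - (observed heterozygosity / H-W expected heterozygosity).

    Uses simple sample proportions; no small-sample bias correction is applied.
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)  # frequency of A1 among the 2n sampled alleles
    q = 1 - p
    het_obs = n12 / n
    het_hw = 2 * p * q
    if het_hw == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    return 1 - het_obs / het_hw


# Counts matching generalized H-W with p = 0.6 and F = 0.25 in a sample of 1000.
print(estimate_F(420, 360, 220))
```

The example counts were built from the generalized proportions (\(0.42\), \(0.36\), \(0.22\)), so the estimate should return \(F \approx 0.25\).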

Self-fertilization

One extreme form of inbreeding that is especially relevant to natural populations is self-fertilization (selfing). For example, many hermaphroditic plant species are self-compatible and can therefore fertilize their ovules with their own pollen. Many hermaphroditic animals are also self-compatible (e.g., Caenorhabditis elegans). Let’s walk through a simple model of selfing that offers additional insights into the population genetic consequences of inbreeding:

  1. Let the probability of self-fertilization and random outcrossing be \(S\) and \((1−S)\), respectively.
  2. As we saw before with full-sib and \(1^{st}\) cousin matings, selfing will alter \(F_I\). We can write the expected value of \(F_I\) after one generation of partial selfing as follows:

\[ F_I^{\prime} = S \left( F_I + \frac{1 - F_I}{2} \right) \]

Let’s walk through the logic:

  • In this model, an individual can inherit two IBD alleles only if it was produced by selfing (random outcrossing in a large population is assumed to unite alleles that are not IBD).
  • Given that an individual is produced by selfing, it can have two IBD alleles for one of two reasons, either:
    • Its parent had two IBD alleles (probability \(F_I\)), so any offspring it produces by selfing also carries two IBD alleles, OR…
    • Its parent did not have two IBD alleles (probability \(1 - F_I\)), but the two gametes that formed the offspring carried copies of the same parental allele (probability \(1/2\)), which happens with overall probability \((1 - F_I)/2\).

We can write out the per-generation change in \(F_I\) as

\[ \Delta_S F_I = F_I^{\prime} - F_I \]

At equilibrium (i.e., setting \(\Delta_S F_I = 0\) and solving for \(F_I\)), we have³:

3 Derivation:

\[\begin{aligned} 0 &= F^{\prime}_I - F_I \\ 0 &= S\left(F_I + \frac{(1 - F_I)}{2} \right) - F_I \\ F_I &= S F_I + \frac{S - S F_I}{2} \\ 2 F_I &= 2 S F_I + S - S F_I \\ 2 F_I &= S F_I + S \\ 2 F_I - S F_I &= S \\ F_I (2 - S) &= S \\ F_I &= \frac{S}{2 - S} \end{aligned}\]

\[ F_I = \frac{S}{2 - S} \]
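To see the approach to equilibrium, we can simply iterate the recursion \(F_I^{\prime} = S\left(F_I + \frac{1 - F_I}{2}\right)\) and compare against \(S/(2 - S)\). The sketch below is a minimal Python illustration (function names are hypothetical); notice that no allele frequency appears anywhere in it.

```python
def next_F(F: float, S: float) -> float:
    """One generation of partial selfing: F' = S * (F + (1 - F) / 2)."""
    return S * (F + (1 - F) / 2)


def selfing_equilibrium(S: float) -> float:
    """Equilibrium inbreeding coefficient under partial selfing: S / (2 - S)."""
    return S / (2 - S)


def iterate_F(S: float, F0: float = 0.0, generations: int = 50) -> list[float]:
    """Trajectory of F_I over successive generations, starting from F0."""
    trajectory = [F0]
    for _ in range(generations):
        trajectory.append(next_F(trajectory[-1], S))
    return trajectory


for S in (0.1, 0.5, 0.9, 1.0):
    traj = iterate_F(S)
    print(f"S={S}: F after 50 gens = {traj[-1]:.6f}, S/(2-S) = {selfing_equilibrium(S):.6f}")
```

Subtracting the equilibrium from both sides of the recursion shows that the distance to equilibrium shrinks by a factor of \(S/2\) each generation, so convergence is fast even under complete selfing (\(S = 1\), where \(F_I \to 1\)).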

Insight: No allele frequencies needed!
  1. We are able to immediately calculate \(F_I\) from \(S\) without knowing any allele frequencies!
  2. Selfing does not change allele frequencies!!!
  3. Allele frequencies DO NOT change under ANY form of inbreeding!!!

Population subdivision

Many populations are geographically widespread and/or inhabit patchy habitats with effective barriers to migration such that they do not behave as a single panmictic⁴ population.

4 Panmictic: a single randomly mating population.

When a population is subdivided into several smaller subpopulations, genetic differentiation can arise between them, leading to deviations from H-W proportions for the population as a whole.

Can you think of some reasons why subpopulations might diverge from one another?

5 The \(S\) stands for ‘subpopulation’, the \(T\) stands for ‘total’.

But how do we quantify a deviation from H-W caused by subpopulation differentiation? Here, we introduce another \(F\)-statistic, called \(F_{ST}\)⁵. The approach to calculating \(F_{ST}\) is as follows:

  • Using genotype frequencies in each subpopulation, we calculate the frequency of heterozygotes in the Total population (a weighted average across subpopulations).
  • We then compare this observed heterozygosity with the heterozygosity expected under H-W proportions if the Total population were a single panmictic population with the same overall allele frequencies.
Figure 3: How do we measure subpopulation differentiation?
Note that the \(c_i\) terms are weights. They are calculated as the proportion of the total population that the \(i^{th}\) patch represents. That is: \(c_i = n_i/N_{tot}\).
\[ \begin{array}{lccc} \text{Frequency} & A_1A_1 & A_1A_2 & A_2A_2 \\ \hline \text{In } i^{th} \text{ patch} & p_i^2 & 2 p_i q_i & q_i^2 \\ \text{Avg. across sub. pops.} & \sum_i c_i p_i^2 & \sum_i c_i 2 p_i q_i & \sum_i c_i q_i^2 \\ \text{Avg. across tot. pop.} & p^2(1 - F) + pF & 2pq(1 - F) & q^2(1 - F) + qF \end{array} \]

Notice that in the above table we have two different expressions for the heterozygote frequency in the total population. Equating them, \(2pq(1 - F) = \sum_i c_i 2 p_i q_i\), and solving for \(F\) (which in this context we call \(F_{ST}\)), we get:

\[ F_{ST} = \frac{2pq - \sum_i c_i 2p_i q_i}{2p q} \]

Insight: \(F_{ST}\) can be expressed simply using heterozygosities
  • We estimate \(p\) and \(q\) using the mean allele frequencies across the total population, \(\overline{p} = \sum_i c_i p_i\) and \(\overline{q} = 1 - \overline{p}\).
  • \(H_T = 2 \overline{p}\,\overline{q}\) is the heterozygosity expected under H-W proportions in the total population.
  • \(\overline{H}_S = \sum_i c_i 2 p_i q_i\) is the average (expected) heterozygosity within subpopulations.

Substituting these into the expression above gives the most common form of \(F_{ST}\), written simply in terms of heterozygosities:

\[ \begin{aligned} F_{ST} &= \frac{\textcolor{DarkRed}{2pq} - \textcolor{RoyalBlue}{\sum_i c_i 2p_i q_i}}{\textcolor{DarkRed}{2pq}} \\ F_{ST} &= \frac{\textcolor{DarkRed}{H_T} - \textcolor{RoyalBlue}{\overline{H}_S}}{\textcolor{DarkRed}{H_T}} \end{aligned} \]

Or equivalently:

\[ F_{ST} = 1 - \frac{\textcolor{RoyalBlue}{\overline{H}_S}}{\textcolor{DarkRed}{H_T}} \]
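The heterozygosity form translates directly into code. Below is a minimal Python sketch for a biallelic locus, assuming the subpopulation allele frequencies \(p_i\) and weights \(c_i = n_i/N_{tot}\) are given as lists (the function name is hypothetical):

```python
def fst_from_heterozygosity(p_sub: list[float], weights: list[float]) -> float:
    """F_ST = (H_T - mean H_S) / H_T for a biallelic locus.

    p_sub:   A1 allele frequency in each subpopulation (p_i)
    weights: c_i = n_i / N_tot, the share of the total population in each patch
    """
    if len(p_sub) != len(weights):
        raise ValueError("need one weight per subpopulation")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights c_i must sum to 1")
    p_bar = sum(c * p for c, p in zip(weights, p_sub))  # mean allele frequency
    H_T = 2 * p_bar * (1 - p_bar)  # H-W heterozygosity from pooled frequencies
    H_S = sum(c * 2 * p * (1 - p) for c, p in zip(weights, p_sub))  # mean within-patch
    if H_T == 0:
        raise ValueError("allele is fixed in the total population; F_ST is undefined")
    return (H_T - H_S) / H_T


# Two equal-sized patches with p_1 = 0.2 and p_2 = 0.8.
print(fst_from_heterozygosity([0.2, 0.8], [0.5, 0.5]))
```

For this example, \(\overline{p} = 0.5\), \(H_T = 0.5\) and \(\overline{H}_S = 0.32\), so \(F_{ST} = 0.18/0.5 = 0.36\). Identical patches give \(F_{ST} = 0\); patches fixed for alternative alleles give \(F_{ST} = 1\).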

Wahlund’s variance

We can also express \(F_{ST}\) in terms of the variance in allele frequencies. Using \((p + q)^2 = 1\) and \(\sum_i c_i = 1\), we substitute:

\[ \begin{aligned} 2 p q &= 1 - p^2 - q^2 \\ \sum_i c_i 2 p_i q_i &= 1 - \sum_i c_i (p_i^2 + q_i^2), \end{aligned} \]

we get:

\[ F_{ST} = \frac{\sum_i c_i (p_i^2 + q_i^2) - p^2 - q^2}{1 - p^2 - q^2}. \]

Note that, because \(p = \sum_i c_i p_i\) and \(q = \sum_i c_i q_i\):

\[ \begin{aligned} \sum_i c_i p_i^2 - p^2 &= \Var (p_i) \\ \sum_i c_i q_i^2 - q^2 &= \Var (q_i) \end{aligned} \]

Because \(q_i = 1 - p_i\), \(\Var(q_i) = \Var(p_i)\), so the numerator equals \(2 \Var(p_i)\). Thus, we can express \(F_{ST}\) in terms of the variance of allele frequencies:

\[ \begin{aligned} F_{ST} &= \frac{2 \Var (p_i)}{1 - p^2 - q^2} \\ &= \frac{2 \Var (p_i)}{2 p q} \\ &= \frac{\Var (p_i)}{p q} \\ &= \frac{\Var (p_i)}{\overline{p}(1 - \overline{p})} \end{aligned} \]

This is known as Wahlund’s variance⁶.

6 Sten Wahlund was a statistician at Uppsala studying the Saami people (the Indigenous people of northern Norway, Sweden, Finland, and Russia’s Kola Peninsula). He is best known for identifying the “Wahlund effect” (population subdivision reduces heterozygosity below H-W expectations whenever allele frequencies differ among subpopulations). These are probably the only things of scientific interest to come from his research, which was, most unfortunately, focused on race biology and eugenics!
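The heterozygosity and variance forms of \(F_{ST}\) should agree exactly. The sketch below computes \(F_{ST}\) both ways and also shows the Wahlund effect: if each patch is in H-W proportions, the pooled population’s heterozygosity \(\overline{H}_S\) falls short of its H-W expectation \(H_T\) by exactly \(2\Var(p_i)\). It is a minimal Python illustration; the function names and the three-patch allele frequencies are hypothetical, and the weights \(c_i\) are assumed to sum to 1.

```python
def weighted_mean(values, weights):
    """Sum of c_i * x_i; equals a weighted mean when the weights sum to 1."""
    return sum(c * v for c, v in zip(weights, values))


def heterozygosities(p_sub, weights):
    """Return (H_T, mean H_S) for a biallelic locus."""
    p_bar = weighted_mean(p_sub, weights)
    H_T = 2 * p_bar * (1 - p_bar)
    H_S = weighted_mean([2 * p * (1 - p) for p in p_sub], weights)
    return H_T, H_S


def fst_from_heterozygosity(p_sub, weights):
    """Heterozygosity form: F_ST = (H_T - mean H_S) / H_T."""
    H_T, H_S = heterozygosities(p_sub, weights)
    return (H_T - H_S) / H_T


def fst_from_variance(p_sub, weights):
    """Wahlund form: F_ST = Var(p_i) / (p_bar * (1 - p_bar)).

    Var(p_i) = sum_i c_i p_i^2 - p_bar^2, computed in the equivalent centred
    form sum_i c_i (p_i - p_bar)^2 (the two agree because the c_i sum to 1).
    """
    p_bar = weighted_mean(p_sub, weights)
    var_p = weighted_mean([(p - p_bar) ** 2 for p in p_sub], weights)
    return var_p / (p_bar * (1 - p_bar))


def wahlund_deficit(p_sub, weights):
    """Heterozygote deficit of the pooled population relative to H-W: H_T - mean H_S."""
    H_T, H_S = heterozygosities(p_sub, weights)
    return H_T - H_S


# Hypothetical example: three patches of unequal size.
p_sub = [0.1, 0.4, 0.9]
weights = [0.2, 0.5, 0.3]
print(fst_from_variance(p_sub, weights), fst_from_heterozygosity(p_sub, weights))
print(wahlund_deficit(p_sub, weights))
```

Here \(\overline{p} = 0.49\) and \(\Var(p_i) = 0.0849\), so the deficit is \(H_T - \overline{H}_S = 0.4998 - 0.33 = 0.1698 = 2\Var(p_i)\), and both forms give the same \(F_{ST}\).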
