The package GenomeAdmixR was designed to simulate the generation of iso-female lines, and simulate genomic changes during, and after, formation of iso-female lines. Furthermore, it allows for simulation of effects after crossing individuals from iso-female lines.
library(GenomeAdmixR)
library(ggplot2)
packageVersion("GenomeAdmixR")
## [1] '2.1.7'
In order to create an iso-female line, we first have to create a ‘wild’ population, from which we will draw individuals. To create such a population, we make use of the function ‘simulate_admixture’, using the ancestry module:
<- simulate_admixture(
wildpop module = ancestry_module(number_of_founders = 4,
morgan = 1),
pop_size = 100,
total_runtime = 1000)
This creates a population of 100 diploid individuals (e.g. 2N = 200), based upon 10 founders (e.g. 10 individuals that can be genetically distinguished from each other). Then, for 1000 generations, this population is inbred and genomes of the 20 founders are allowed to mix. All individuals have 2 chromosomes (for simplicity and computational speed, we only model one chromosome pair), which are 1 Morgan long. The result can be written to file (optional), which can be usefull when generating for instance an extremely large wild population, which can be used later (e.g. read from file).
The result of the function is a structure containing 1000 individuals:
wildpop
## $population
## [1] "Population with 100 individuals"
##
## attr(,"class")
## [1] "genomadmixr_simulation"
Each individual has two chromosomes, with a given number of junctions (e.g. delineations between two contiguous genomic stretches from different ancestors):
1]] wildpop[[
## [1] "Population with 100 individuals"
In order to create an iso-female line, two random individuals are drawn from the wild population, and selected for inbreeding. We can simulate this process with:
<- create_iso_female(
isofemale module = ancestry_module(input_population = wildpop,
morgan = 1),
n = 1,
inbreeding_pop_size = 1000,
run_time = 200)
Here, n indicates the number of isofemales to be created from the same source population, the inbreeding population size is set to be small, in order to speed up computation. Maximum run time is best set hight, to assure that all individuals become genetically identical.
Now, we can plot the isofemales, this plots the two chromosomes next to each other, where colors indicate different ancestors:
plot(isofemale[[1]])
Because we have chosen 4 ancestors, these plots are not terribly informative, let’s try a toy example:
<- simulate_admixture(
wildpop module = ancestry_module(number_of_founders = 4,
morgan = 1),
pop_size = 100,
total_runtime = 100)
<- create_iso_female(
isofemale module = ancestry_module(input_population = wildpop,
morgan = 1),
n = 1,
inbreeding_pop_size = 100,
run_time = 10000)
plot(wildpop$population[[1]])
plot(isofemale[[1]])
We can also plot a (section of) a single chromosome:
plot_chromosome(isofemale[[1]]$chromosome1, xmin = 0.0, xmax = 0.5)
It gets more interesting if we create two populations, and draw iso-females from them, and cross these. First, we have to create two populations.
<-
both_populations simulate_admixture(
module = ancestry_module(morgan = 1),
migration = migration_settings(population_size = c(100, 100),
migration_rate = 0.0,
initial_frequencies =
list(c(rep(1, 20), rep(0, 20)),
c(rep(0, 20), rep(1, 20)))
),total_runtime = 1000)
## Warning in check_initial_frequencies(initial_frequencies): starting frequencies were normalized to 1
## Warning in check_initial_frequencies(initial_frequencies): starting frequencies were normalized to 1
<- both_populations$population_1
population_1
<- both_populations$population_2 population_2
To make sure that the two populations dont share ancestor indices, we use the function ‘increase_ancestor’.
Now that we have two populations, we can generate two different iso-females from them:
<- create_iso_female(
isofemales module = ancestry_module(input_population = population_1,
morgan = 1),
n = 2,
inbreeding_pop_size = 100,
run_time = 10000)
plot_chromosome(isofemales[[1]]$chromosome1, 0, 1)
plot_chromosome(isofemales[[2]]$chromosome1, 0, 1)
We can now use these two isofemales to seed a new inbreeding population:
<-
mixed_population simulate_admixture(
module = ancestry_module(input_population = list(isofemales[[1]],
2]]),
isofemales[[morgan = 1),
pop_size = 100,
total_runtime = 100)
And plot some individuals from our new population:
plot(mixed_population$population[[1]])
We can show the effect of overlap by calculating the Fst value:
<- calculate_fst(population_1,
fst
population_2,sampled_individuals = 10,
number_of_markers = 100,
random_markers = TRUE)
fst
## [1] 0.9986667
The FST calculation function uses the library hierfstat to calculate the FST. To do so, it samples 10 random individuals (the parameter sampled individuals) from the population, and then assesses the genetic content at 100 randomly placed markers (number of markers, and random_markers = TRUE). To increase power, a higher number of individuals can be sampled, although this does increase computational load.
We can also calculate LD statistics. LD is only defined for markers, so again we have to simulate artificial markers along the chromosome.
<- calculate_ld(wildpop,
ld_results markers = 10)
plot(ld_results$ld_matrix~ld_results$dist_matrix,
xlab = "Genetic Distance (Morgan)",
ylab = "LD",
pch = 16)
plot(ld_results$rsq_matrix~ld_results$dist_matrix,
xlab = "Genetic Distance (Morgan)",
ylab = "r_sq",
pch = 16)
plot(ld_results$ld_matrix~ld_results$rsq_matrix,
xlab = "r_sq",
ylab = "LD",
pch = 16)
To analyze LD patterns better, it makes more sense to create a population from scratch. We expect for a strongly inbred population, that there is no LD:
<- simulate_admixture(
no_ld_pop module = ancestry_module(number_of_founders = 4,
morgan = 1),
pop_size = 100,
total_runtime = 1000)
<- calculate_ld(no_ld_pop,
ld_results sampled_individuals = 10,
markers = 10)
plot(ld_results$ld_matrix~ld_results$dist_matrix,
pch = 16,
xlab = "Distance",
ylab = "LD",
xlim = c(0, 1),
ylim = c(0, 1))
Alternatively, for a barely inbred population, we expect a negative relationship between LD and distance:
<- simulate_admixture(
strong_ld_pop module = ancestry_module(number_of_founders = 4,
morgan = 1),
pop_size = 100,
total_runtime = 10)
<- calculate_ld(strong_ld_pop,
ld_results sampled_individuals = 10,
markers = 10)
plot(ld_results$ld_matrix~ld_results$dist_matrix,
pch = 16,
xlab = "Distance",
ylab = "LD",
xlim = c(0, 1),
ylim = c(0, 1))
Selection for specific markers (e.g. SNPs) can potentially cause local allelel frequency biases, and influence genetic patterns within the population as a whole.
The user can provide a selection ‘matrix’ that specifies how selection acts upon different genotypes. As genotypes we indicate here ‘aa’ (wildtype), ‘aA’ (heterozygote) and ‘AA’ (homozygote with allele under selection), where wildtype is any allele not under selection. As alleles under selection we take the general genomic content of ancestors, and hence alleles refer to anestors.
A perhaps interesting simulation would be one where the heterozygote has an advantage over the other genotypes
<- 0.1
s <- matrix(nrow = 1, ncol = 5)
selection_matrix 1, ] <- c(0.5,
selection_matrix[1.0, 1.0 + s, 1.0,
0)
<- 0.5
markers
<- simulate_admixture(
selected_pop module = ancestry_module(number_of_founders = 2,
morgan = 1,
markers = markers),
pop_size = 1000,
total_runtime = 100,
select_matrix = selection_matrix)
## Found a selection matrix, performing simulation including selection
plot_over_time(selected_pop$frequencies, markers)
Indeed we observe that the frequencies of both ancestors are in equilibrium around 0.5, as expected.
Another interesting simulation would be where we give only the homozygous mutant a selective benefit. Here we expect that it takes some time before the homozygote takes over.
<- 0.1
s 1, ] <- c(0.5,
selection_matrix[1.0, 1.0, 1.0 + s,
0)
<- simulate_admixture(
selected_pop module = ancestry_module(number_of_founders = 10,
morgan = 1,
markers = markers),
pop_size = 1000,
total_runtime = 300,
select_matrix = selection_matrix)
## Found a selection matrix, performing simulation including selection
plot_over_time(selected_pop$frequencies, markers)