The algebraic.dist package provides an algebra
over probability distributions. You can combine distribution
objects with standard arithmetic operators (+,
-, *, /, ^) and
mathematical functions (exp, log,
sqrt, …), and the package will either
simplify the result to a known closed-form distribution
or fall back to Monte Carlo sampling when no closed
form exists.
This means you can write expressions like
normal(0, 1) + normal(2, 3) and get back a
normal object, without manually deriving the result. When
the package cannot simplify, the result is an edist
(expression distribution) that supports all the usual methods –
mean, vcov, sampler,
cdf, and more – via MC approximation.
Arithmetic operators on dist objects create
edist objects, which are then automatically simplified when a
closed-form rule applies. When no rule matches, the edist is
kept as-is.
Since the sum of a normal and an exponential has no simple closed
form, z remains an edist:
z <- normal(0, 1) + exponential(1)
is_edist(z)
#> [1] TRUE
class(z)
#> [1] "+_normal_exponential" "x_normal_exponential" "y_normal_exponential"
#> [4] "edist" "dist"
The edist still supports all distribution methods.
Moments are computed via Monte Carlo:
The theoretical mean is 0 + 1 = 1 and variance is
1 + 1 = 2, so the MC estimates should be close.
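A quick check (a sketch, assuming z is the normal-plus-exponential
edist constructed above; the exact MC estimates will vary with the seed):
set.seed(1)
mean(z)  # MC estimate; should be close to the theoretical mean 1
vcov(z)  # MC estimate; should be close to the theoretical variance 2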
The real power of the package is automatic simplification. When you compose distributions using operations that have known closed-form results, the package returns the simplified distribution directly – no sampling needed.
The sum and difference of independent normals are normal. Scalar multiplication and location shifts are also handled.
Sum of normals:
z <- normal(1, 4) + normal(2, 5)
is_normal(z)
#> [1] TRUE
z
#> Normal distribution (mu = 3, var = 9)
mean(z)
#> [1] 3
vcov(z)
#> [1] 9
The result is Normal(mu = 3, var = 9), since means add and variances add for independent normals.
Difference of normals:
z <- normal(1, 4) - normal(2, 5)
z
#> Normal distribution (mu = -1, var = 9)
mean(z)
#> [1] -1
vcov(z)
#> [1] 9
Normal(mu = -1, var = 9) – means subtract, but variances still add.
Scalar multiplication:
If X ~ Normal(mu, v), then
c * X ~ Normal(c * mu, c^2 * v).
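For example (a sketch, assuming the normal(mu, var) constructor used
throughout this vignette; the printed form may differ slightly):
z <- 3 * normal(1, 4)
# By the rule above, this should simplify to Normal(mu = 3, var = 36):
# the mean scales by 3 and the variance by 3^2 = 9.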
Location shift:
z <- normal(5, 2) + 10
z
#> Normal distribution (mu = 15, var = 2)
mean(z)
#> [1] 15
vcov(z)
#> [1] 2
Adding a constant shifts the mean but preserves the variance.
Sums of independent exponentials (with the same rate) produce gamma distributions. Gamma distributions with the same rate also combine.
Sum of exponentials:
z <- exponential(2) + exponential(2)
is_gamma_dist(z)
#> [1] TRUE
z
#> Gamma distribution (shape = 2, rate = 2)
params(z)
#> shape rate
#> 2 2
Two independent Exp(2) variables sum to Gamma(shape = 2, rate = 2).
Gamma plus exponential:
z <- gamma_dist(3, 2) + exponential(2)
z
#> Gamma distribution (shape = 4, rate = 2)
params(z)
#> shape rate
#> 4 2
Gamma(3, 2) + Exp(2) = Gamma(4, 2).
Sum of gammas (same rate):
z <- gamma_dist(3, 2) + gamma_dist(4, 2)
z
#> Gamma distribution (shape = 7, rate = 2)
params(z)
#> shape rate
#> 7 2
Gamma(3, 2) + Gamma(4, 2) = Gamma(7, 2) – shapes add when rates match.
Certain mathematical transformations of distributions have known closed-form results.
Exponential of a normal is log-normal:
z <- exp(normal(0, 1))
is_lognormal(z)
#> [1] TRUE
z
#> Log-normal distribution (meanlog = 0, sdlog = 1)
Logarithm of a log-normal is normal:
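A sketch of the inverse rule, assuming the lognormal() constructor and
is_normal() predicate used elsewhere in this vignette:
z <- log(lognormal(0, 1))
# Should simplify to a normal: if log(X) ~ Normal(0, 1),
# then X ~ Lognormal(meanlog = 0, sdlog = 1), and vice versa.
is_normal(z)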
Standard normal squared is chi-squared:
This rule specifically applies when the normal has mean 0 and variance 1.
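A quick check of this rule (a sketch, assuming the constructor and
predicate names used elsewhere in this vignette):
z <- normal(0, 1)^2
# The square of a standard normal should simplify to Chi-squared(df = 1).
is_chi_squared(z)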
Chi-squared addition:
z <- chi_squared(3) + chi_squared(5)
is_chi_squared(z)
#> [1] TRUE
z
#> Chi-squared distribution (df = 8)
params(z)
#> df
#> 8
Degrees of freedom add: Chi-squared(3) + Chi-squared(5) = Chi-squared(8).
Poisson addition:
z <- poisson_dist(2) + poisson_dist(3)
is_poisson_dist(z)
#> [1] TRUE
z
#> Poisson distribution (lambda = 5)
params(z)
#> lambda
#> 5
Rates add: Poisson(2) + Poisson(3) = Poisson(5).
Product of log-normals:
z <- lognormal(0, 1) * lognormal(1, 2)
is_lognormal(z)
#> [1] TRUE
z
#> Log-normal distribution (meanlog = 1, sdlog = 2.23607)
params(z)
#> meanlog sdlog
#> 1.000000 2.236068
The product of independent log-normals is log-normal because
log(X * Y) = log(X) + log(Y), and the sum of independent
normals is normal. So the meanlog values add and the
sdlog values combine as
sqrt(sdlog1^2 + sdlog2^2).
Uniform scaling and shifting:
z <- 2 * uniform_dist(0, 1)
is_uniform_dist(z)
#> [1] TRUE
z
#> Uniform distribution (min = 0, max = 2)
params(z)
#> min max
#> 0 2
Scaling a Uniform(0, 1) by 2 gives Uniform(0, 2).
z <- uniform_dist(0, 1) + 5
is_uniform_dist(z)
#> [1] TRUE
z
#> Uniform distribution (min = 5, max = 6)
params(z)
#> min max
#> 5 6
Shifting Uniform(0, 1) by 5 gives Uniform(5, 6).
When no algebraic rule matches, the result remains an unsimplified
edist. All distribution methods still work – they just use
Monte Carlo under the hood.
The true mean of X * Y where
X ~ Normal(0, 1) and Y ~ Exp(1) is
E[X] * E[Y] = 0 * 1 = 0 (by independence), so the MC
estimate should be near zero.
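A sketch of this unsimplified case (the MC estimate will vary with the
seed, so no exact output is shown):
set.seed(1)
z <- normal(0, 1) * exponential(1)
is_edist(z)  # TRUE: no closed-form rule for this product
mean(z)      # MC estimate; should be near the true mean of 0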
You can explicitly convert any distribution to an empirical
distribution via realize(). This draws n
samples and wraps them in an empirical_dist that supports
exact (discrete) methods for CDF, density, conditional, and more.
set.seed(3)
z <- normal(0, 1) * exponential(1)
rd <- realize(z, n = 10000)
rd
#> Realized distribution (10000 samples from: Expression distribution: x * y)
#> source:
#> Expression distribution: x * y
#> x ~ Normal distribution (mu = 0, var = 1)
#> y ~ Exponential distribution (rate = 1)
mean(rd)
#> [1] -0.003321398
vcov(rd)
#> [,1]
#> [1,] 2.048537
For edist objects, methods like cdf(),
density(), and inv_cdf() automatically
materialize samples behind the scenes (with caching, so repeated calls
share the same sample). You do not need to call realize()
manually for these:
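A sketch, assuming cdf() follows the same function-returning convention
as sampler() shown later in this vignette:
z <- normal(0, 1) * exponential(1)
F <- cdf(z)  # samples are materialized and cached on first use
F(0)         # MC estimate of P(Z <= 0); repeated calls reuse the cache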
The Summary group generic is implemented for
distributions. The sum() and prod() generics
reduce via + and *, so they benefit from all
the simplification rules. The min() and max()
generics create elementwise edist expressions.
Min of exponentials has a special rule – the minimum of independent exponentials is exponential with the sum of rates:
z <- min(exponential(1), exponential(2))
is_exponential(z)
#> [1] TRUE
z
#> Exponential distribution (rate = 3)
params(z)
#> rate
#> 3
This is the well-known result: if X ~ Exp(r1) and
Y ~ Exp(r2), then
min(X, Y) ~ Exp(r1 + r2).
Sum reduces via the + operator, so
simplification rules apply:
z <- sum(normal(0, 1), normal(2, 3), normal(-1, 2))
is_normal(z)
#> [1] TRUE
z
#> Normal distribution (mu = 1, var = 6)
mean(z)
#> [1] 1
vcov(z)
#> [1] 6
Three normals sum to a single normal: mean = 0 + 2 - 1 = 1, var = 1 + 3 + 2 = 6.
Max and other combinations that lack closed forms
remain as edist:
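For instance (a sketch; the max of normals has no closed form, so the
result stays an expression distribution evaluated by MC):
z <- max(normal(0, 1), normal(2, 3))
is_edist(z)  # should be TRUE: no simplification rule for max of normals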
The package provides functions for common asymptotic results from probability theory.
clt(base_dist) returns the limiting distribution of the
standardized sample mean, sqrt(n) * (Xbar_n - mu). For a
univariate distribution with variance sigma^2, this is
Normal(0, sigma^2).
z <- clt(exponential(1))
z
#> Normal distribution (mu = 0, var = 1)
mean(z)
#> [1] 0
vcov(z)
#> [1] 1
For Exp(1), the mean is 1 and the variance is 1, so the CLT limit is Normal(0, 1).
lln(base_dist) returns the degenerate distribution at
the population mean – the limit of Xbar_n as
n -> infinity. It is represented as a normal with zero
variance.
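A sketch, assuming lln() accepts the same base distributions as
clt():
z <- lln(exponential(2))
mean(z)  # the population mean of Exp(2), i.e. 1/2
vcov(z)  # 0: the limiting distribution is degenerate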
delta_clt(base_dist, g, dg) returns the limiting
distribution of sqrt(n) * (g(Xbar_n) - g(mu)) under the
delta method. You supply the function g and its derivative
dg.
z <- delta_clt(exponential(1), g = exp, dg = exp)
z
#> Normal distribution (mu = 0, var = 7.38906)
mean(z)
#> [1] 0
vcov(z)
#> [1] 7.389056
For g = exp applied to Exp(1) (mean = 1, var = 1), the
delta method gives
Normal(0, (exp(1))^2 * 1) = Normal(0, e^2).
normal_approx(x) constructs a normal distribution that
matches the mean and variance of any input distribution:
g <- gamma_dist(5, 2)
n <- normal_approx(g)
n
#> Normal distribution (mu = 2.5, var = 1.25)
mean(n)
#> [1] 2.5
vcov(n)
#> [1] 1.25
The Gamma(5, 2) has mean 2.5 and variance 1.25, so the normal approximation is Normal(2.5, 1.25).
To see the CLT in action, we can draw replicated standardized sample means at increasing sample sizes and watch them converge to the predicted distribution:
set.seed(42)
base <- exponential(rate = 1)
ns <- c(10, 100, 1000)
for (n in ns) {
samp_means <- replicate(5000, {
x <- sampler(base)(n)
sqrt(n) * (mean(x) - mean(base))
})
cat(sprintf("n=%4d: mean=%.3f, var=%.3f\n",
n, mean(samp_means), var(samp_means)))
}
#> n= 10: mean=-0.006, var=1.002
#> n= 100: mean=-0.003, var=1.022
#> n=1000: mean=0.007, var=0.995
Compare with the CLT prediction:
z <- clt(base)
cat(sprintf("CLT: mean=%.3f, var=%.3f\n", mean(z), vcov(z)))
#> CLT: mean=0.000, var=1.000
As n grows, the empirical mean converges to 0 and the
empirical variance converges to 1, matching the CLT limit of Normal(0,
1).