In this vignette, we show how to use the bcf package to fit a model and then use the fitted object to predict outcomes and treatment effects for new data.
We use the same data generating process as in the “Simple Example” vignette:
set.seed(1)
## Training data
p <- 3 # two control variables and one effect moderator
n <- 1000
n_burn <- 2000
n_sim <- 1500
x <- matrix(rnorm(n*(p-1)), nrow=n)
x <- cbind(x, x[,2] + rnorm(n))
weights <- abs(rnorm(n))
# create targeted selection, whereby a practice's likelihood of joining the intervention (pi)
# is related to their expected outcome (mu)
mu <- -1*(x[,1]>(x[,2])) + 1*(x[,1]<(x[,2])) - 0.1
# generate treatment variable
pi <- pnorm(mu)
z <- rbinom(n,1,pi)
# tau is the true treatment effect. It varies across practices as a function of
# X3 and X2
tau <- 1/(1 + exp(-x[,3])) + x[,2]/10
# generate the response using mu, tau and z
y_noiseless <- mu + tau*z
# set the noise level relative to the expected mean function of Y
sigma <- diff(range(mu + tau*pi))/8
# draw the response variable with additive error
y <- y_noiseless + sigma*rnorm(n)/sqrt(weights)
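Because pi = pnorm(mu) is increasing in mu, practices with higher expected outcomes are more likely to be treated. As a quick optional check of this targeted selection (not part of the original simulation code), we can compare the average mu between treated and control practices and confirm that pi and mu move together:
# treated practices (z = 1) should have a higher average mu than controls (z = 0)
tapply(mu, z, mean)
# pi is a deterministic, increasing function of mu, so the correlation is high
cor(mu, pi)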
# Fitting the model
bcf_out <- bcf(y = y,
               z = z,
               x_control = x,
               x_moderate = x,
               pihat = pi,
               nburn = n_burn,
               nsim = n_sim,
               w = weights,
               n_chains = 2,
               update_interval = 1)
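Before predicting, it can help to confirm the shape of the posterior draws. Assuming the fitted object stores treatment effect draws with one row per retained MCMC draw and one column per training observation (as the use of colMeans(bcf_out$tau) later in this vignette suggests), a quick check looks like this:
# one row per retained draw across both chains, one column per training observation
dim(bcf_out$tau)   # roughly n_sim * n_chains rows and n columns
# posterior mean treatment effect for each training practice
summary(colMeans(bcf_out$tau))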
We make predictions at 10 new observations, including some extreme values:
set.seed(1)
n_test = 10
x_test <- matrix(rnorm(n_test*(p-1), 0, 2), nrow=n_test) # sd of 2 makes x_test more dispersed than x
x_test <- cbind(x_test, x_test[,2] + rnorm(n_test))
mu_pred <- -1*(x_test[,1]>(x_test[,2])) + 1*(x_test[,1]<(x_test[,2])) - 0.1
pi_pred <- pnorm(mu_pred)
z_pred <- rbinom(n_test,1, pi_pred)
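Because x_test is drawn with a standard deviation of 2, its covariates tend to be more dispersed than those in the training data. Comparing the spread of the effect moderator in the two samples (a small illustrative check, not in the original vignette) makes the "extreme values" claim concrete:
# spread of the effect moderator in training vs. test data;
# the test draws use sd = 2, so they tend to be more spread out
sd(x[,3]); sd(x_test[,3])
range(x[,3]); range(x_test[,3])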
We now predict \(y\) and estimate treatment effects \(\tau\) for these new observations based on our fitted model.
pred_out <- predict(object = bcf_out,
                    x_predict_control = x_test,
                    x_predict_moderate = x_test,
                    pi_pred = pi_pred,
                    z_pred = z_pred,
                    n_cores = 1,
                    save_tree_directory = '.')
#> Initializing BCF Prediction
#> Starting Prediction
#> Starting to Predict Chain 1
#> Loading...
#> ntrees 50
#> p 3
#> ndraws 1500
#> done
#> Loading...
#> ntrees 200
#> p 4
#> ndraws 1500
#> done
#> Starting to Predict Chain 2
#> Loading...
#> ntrees 50
#> p 3
#> ndraws 1500
#> done
#> Loading...
#> ntrees 200
#> p 4
#> ndraws 1500
#> done
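The returned object contains posterior draws for the new observations. Assuming it mirrors the structure of the fitted object, with draws in rows and test observations in columns (which the colMeans and apply calls below rely on), the predicted treatment effects can be summarized directly:
# posterior draws of tau for the 10 test points: draws in rows, observations in columns
dim(pred_out$tau)
# posterior mean treatment effect for each test observation
round(colMeans(pred_out$tau), 2)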
Let’s compare the results for our training and testing data. We plot the estimated treatment effects for the training and test observations as a function of \(x_3\), which is an effect moderator.
tau_ests_preds <- data.frame(x = c(x[,3], x_test[,3]),
                             Mean = c(colMeans(bcf_out$tau),
                                      colMeans(pred_out$tau)),
                             Low95 = c(apply(bcf_out$tau, 2, quantile, 0.025),
                                       apply(pred_out$tau, 2, quantile, 0.025)),
                             Up95 = c(apply(bcf_out$tau, 2, quantile, 0.975),
                                      apply(pred_out$tau, 2, quantile, 0.975)),
                             group = factor(c(rep("training", n), rep("testing", n_test))),
                             agroup = c(rep(0.2, n), rep(1, n_test)))
ggplot(tau_ests_preds, aes(x, Mean, color = group)) +
  geom_pointrange(aes(ymin = Low95, ymax = Up95), alpha = tau_ests_preds$agroup) +
  xlab(TeX("$x_3$")) +
  ylab(TeX("$\\hat{\\tau}$"))
Note that the credible intervals get wider as the \(x_3\) values move toward the extremes of the range seen in the training data, as we would hope: the model is appropriately less certain where it has little training data.
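To quantify this pattern rather than relying on the plot alone, one option (an extra check, not in the original vignette) is to relate each test observation's 95% interval width to how far its \(x_3\) value sits from the center of the training data:
# interval width for each test point vs. distance of x_3 from the training mean
width <- apply(pred_out$tau, 2, quantile, 0.975) - apply(pred_out$tau, 2, quantile, 0.025)
dist <- abs(x_test[,3] - mean(x[,3]))
cor(width, dist)  # a positive value indicates wider intervals at more extreme x_3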