We present a few examples of using SIHR on simulated datasets. We show how to conduct inference for linear functionals (LF) and quadratic functionals (QF) in both the linear and the logistic regression settings.
Load the library:
library(SIHR)
We consider the setting \(n=200, p=150\) with \[ X_i \sim N(\textbf{0}_p, \textbf{I}_p),\; Y_i = \alpha + X_i^\intercal \beta + \epsilon_i, \; \epsilon_i\sim N(0,1),\; \textrm{where }\; \alpha = -0.5, \; \beta = (0.5, \textbf{1}_4, \textbf{0}_{p-5}). \] Our goal is to construct valid inference for the following objectives:
1. \(\beta_1\)
2. \(\beta_1 + \beta_2\)
3. \(\beta_G^\intercal \Sigma_{G,G} \beta_G\) for an index set \(G \subseteq [p]\)
The 1st and 2nd objectives will be achieved together by LF(), while the 3rd objective will be handled by QF().
Generate Data
set.seed(0)
n <- 200
p <- 150
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- -0.5 + X %*% c(0.5, rep(1, 4), rep(0, p - 5))
Loadings for Linear Functionals
loading1 <- c(1, rep(0, p - 1)) # for 1st objective, true value = 0.5
loading2 <- c(1, 1, rep(0, p - 2)) # for 2nd objective, true value = 1.5
loading.mat <- cbind(loading1, loading2)
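As a quick sanity check, the true values quoted in the comments can be recomputed from the coefficient vector of the simulation. The sketch below uses only base R; `beta` and `true.vals` are names introduced here for illustration and are not part of SIHR.

```r
p <- 150
# True coefficient vector used to generate y above
beta <- c(0.5, rep(1, 4), rep(0, p - 5))
loading1 <- c(1, rep(0, p - 1))
loading2 <- c(1, 1, rep(0, p - 2))
loading.mat <- cbind(loading1, loading2)
# Each column of loading.mat defines one linear functional: loading' beta
true.vals <- drop(crossprod(loading.mat, beta))
print(true.vals)
```

This should give 0.5 for loading1 and 1.5 for loading2, matching the comments.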
Conduct Inference, call LF with model="linear":
Est <- LF(X, y, loading.mat, model = "linear", intercept = TRUE, intercept.loading = FALSE, verbose = TRUE)
#> ---> Computing for loading (1/2)...
#> The projection direction is identified at mu = 0.044356at step =4
#> ---> Computing for loading (2/2)...
#> The projection direction is identified at mu = 0.044356at step =4
The parameter intercept indicates whether the model is fitted with an intercept term. The parameter intercept.loading indicates whether the intercept term is included in the inference objective. In this example, the model is fitted with an intercept, but the intercept is not included in the final objective.
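Reading this description literally, and writing \(x\) for a loading vector, the two choices of intercept.loading would correspond to the targets \(x^\intercal \beta\) (FALSE) and \(\alpha + x^\intercal \beta\) (TRUE). This is a sketch of the interpretation under the simulation's true parameters, not a statement taken from the package documentation. For loading1 in this example:
\[ x^\intercal \beta = \beta_1 = 0.5, \qquad \alpha + x^\intercal \beta = -0.5 + 0.5 = 0. \]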
Methods for LF
ci(Est)
#> loading lower upper
#> 1 1 0.4892515 0.5115561
#> 2 2 1.4886438 1.5184631
summary(Est)
#> Call:
#> Inference for Linear Functional
#>
#> Estimators:
#> loading est.plugin est.debias Std. Error z value Pr(>|z|)
#> 1 0.4764 0.5004 0.005690 87.94 0 ***
#> 2 1.4533 1.5036 0.007607 197.65 0 ***
Notice that the true values are \(0.5\) and \(1.5\) for the 1st and 2nd objectives respectively, and both are included in their corresponding confidence intervals. It is also evident that the bias-corrected estimators (est.debias) are much closer to the true values than the Lasso plug-in estimators (est.plugin).
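The reported intervals appear consistent with the usual two-sided 95% normal interval, est.debias ± z_{0.975} × Std. Error. The base-R sketch below recomputes them from the rounded values printed by summary(Est). The 95% level is an assumption inferred from the numbers, `ci.manual` is an illustrative name, and small discrepancies are due to rounding.

```r
# Values copied from the summary(Est) output above (rounded)
est.debias <- c(0.5004, 1.5036)
std.error <- c(0.005690, 0.007607)
z <- qnorm(0.975)  # two-sided 95% normal quantile (assumed level)
ci.manual <- cbind(lower = est.debias - z * std.error,
                   upper = est.debias + z * std.error)
print(round(ci.manual, 4))
```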
For quadratic functionals, we need to specify the subset \(G \subseteq [p]\). If the argument A is not specified (default = NULL), inference is automatically conducted on \(\beta_G^\intercal \Sigma_{G,G} \beta_G\).
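In the simulations here the covariates have covariance \(\Sigma = \textbf{I}_p\), so under that assumption the default target reduces to a sum of squared coefficients over \(G\):
\[ \beta_G^\intercal \Sigma_{G,G} \beta_G = \beta_G^\intercal \beta_G = \sum_{j \in G} \beta_j^2 . \]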
Conduct Inference, call QF with model="linear". The argument split indicates whether we split samples or not for computing the initial estimator.
Est <- QF(X, y, G, A = NULL, model = "linear", intercept = TRUE, verbose = TRUE)
#> The projection direction is identified at mu = 0.062729at step =3
ci method for QF
ci(Est)
#> tau lower upper
#> 1 0.25 2.239521 3.890422
#> 2 0.50 2.233725 3.896219
#> 3 1.00 2.222250 3.907693
summary method for QF
summary(Est)
#> Call:
#> Inference for Quadratic Functional
#>
#> tau est.plugin est.debias Std. Error z value Pr(>|z|)
#> 0.25 2.9 3.065 0.4212 7.278 3.400e-13 ***
#> 0.50 2.9 3.065 0.4241 7.227 4.947e-13 ***
#> 1.00 2.9 3.065 0.4300 7.128 1.016e-12 ***
In the output, each row reports the result for a different value of \(\tau\), the enlargement factor for the asymptotic variance used to handle super-efficiency. Notice that the true value is \(3.25\) for the 3rd objective, which is included in each of the confidence intervals.
The usage in the logistic regression setting is almost the same as in the linear setting, except that we need to specify the argument model="logistic" or model="logistic_alter" instead of model="linear". We propose two different debiasing methods for logistic regression; both work theoretically and empirically.
We consider the setting \(n=200, p=120\) with \[ X_i \sim N(\textbf{0}_p, \textbf{I}_p),\; P_i = \frac{\exp(\alpha + X_i^\intercal \beta)}{1+\exp(\alpha + X_i^\intercal \beta)},\; Y_i = {\rm Binomial}(P_i),\; \textrm{where }\; \alpha = -1.5, \;\beta = (0.5, 1, \textbf{0}_{p-2}). \] Our goal is to construct valid inference for the following objectives:
1. \(\beta_1 + \beta_2\)
2. \(-0.5\beta_1 - \beta_2\)
3. \(\beta_G^\intercal \Sigma_{G,G} \beta_G\), for \(G=\{1,2,3\}\)
The 1st and 2nd objectives will be achieved together by LF(), while the 3rd objective will be handled by QF().
Generate Data
set.seed(1)
n <- 200
p <- 120
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
val <- -1.5 + X[, 1] * 0.5 + X[, 2] * 1
prob <- exp(val) / (1 + exp(val))
y <- rbinom(n, 1, prob)
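The probability computed above is the inverse-logit of val, and base R provides the same transformation as plogis(). The sketch below checks the equivalence on a few inputs chosen here for illustration; `prob.manual` and `prob.plogis` are illustrative names.

```r
# Illustrative linear-predictor values (not taken from the simulation)
val <- c(-1.5, 0, 2)
prob.manual <- exp(val) / (1 + exp(val))  # formula used in the text
prob.plogis <- plogis(val)                # built-in inverse-logit from stats
print(all.equal(prob.manual, prob.plogis))
```

For very large val, exp(val) overflows to Inf and the manual formula returns NaN, whereas plogis() remains well defined, so it can be a safer choice in other simulations.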
Loadings for Linear Functionals
loading1 <- c(1, 1, rep(0, p - 2)) # for 1st objective, true value = 1.5
loading2 <- c(-0.5, -1, rep(0, p - 2)) # for 2nd objective, true value = -1.25
loading.mat <- cbind(loading1, loading2)
Conduct Inference, call LF with model="logistic" or model="logistic_alter":
Est <- LF(X, y, loading.mat, model = "logistic", verbose = TRUE)
#> ---> Computing for loading (1/2)...
#> The projection direction is identified at mu = 0.028911at step =5
#> ---> Computing for loading (2/2)...
#> The projection direction is identified at mu = 0.028911at step =5
Methods for LF
ci(Est)
#> loading lower upper
#> 1 1 0.6510009 1.866141
#> 2 2 -1.3927844 -0.424308
summary(Est)
#> Call:
#> Inference for Linear Functional
#>
#> Estimators:
#> loading est.plugin est.debias Std. Error z value Pr(>|z|)
#> 1 0.3745 1.2586 0.3100 4.060 4.907e-05 ***
#> 2 -0.2762 -0.9085 0.2471 -3.677 2.357e-04 ***
Notice that the true values are \(1.5\) and \(-1.25\) for the 1st and 2nd objectives respectively, and both are included in their corresponding confidence intervals. It is also evident that the bias-corrected estimators (est.debias) are much closer to the true values than the Lasso plug-in estimators (est.plugin).
For quadratic functionals, we find that a sufficiently large sample size is needed for good empirical results, since we need to split samples to obtain the initial estimators. Thus, we generate another simulated dataset with a larger sample size \(n=400\).
set.seed(0)
n <- 400
p <- 120
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
val <- -1.5 + X[, 1] * 0.5 + X[, 2] * 1
prob <- exp(val) / (1 + exp(val))
y <- rbinom(n, 1, prob)
G <- c(1:3) # 3rd objective, true value = 1.25
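A minimal base-R check of the true value quoted in the comment, assuming \(\Sigma = \textbf{I}_p\) as in the simulation. The names `beta`, `Sigma`, and `true.qf` are introduced here for illustration only.

```r
p <- 120
beta <- c(0.5, 1, rep(0, p - 2))  # true coefficients of the logistic model
G <- c(1:3)
Sigma <- diag(p)                  # population covariance of X_i (identity)
# Quadratic functional beta_G' Sigma_{G,G} beta_G
true.qf <- drop(t(beta[G]) %*% Sigma[G, G] %*% beta[G])
print(true.qf)
```

This evaluates to 0.5^2 + 1^2 + 0^2 = 1.25.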
Conduct Inference, call QF with model="logistic_alter".
Est <- QF(X, y, G, A = NULL, model = "logistic_alter", intercept = TRUE, verbose = TRUE)
#> The projection direction is identified at mu = 0.029056at step =5
ci method for QF
ci(Est)
#> tau lower upper
#> 1 0.25 0.2274503 2.048520
#> 2 0.50 0.1339998 2.141970
#> 3 1.00 0.0000000 2.306665
summary method for QF
summary(Est)
#> Call:
#> Inference for Quadratic Functional
#>
#> tau est.plugin est.debias Std. Error z value Pr(>|z|)
#> 0.25 0.6434 1.138 0.4646 2.450 0.01430 *
#> 0.50 0.6434 1.138 0.5122 2.222 0.02631 *
#> 1.00 0.6434 1.138 0.5963 1.908 0.05633 .
In the output, each row reports the result for a different value of \(\tau\), the enlargement factor for the asymptotic variance used to handle super-efficiency. Notice that the true value is \(1.25\) for the 3rd objective, which is included in each of the confidence intervals.
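The three intervals can be reproduced approximately from the summary table as est.debias ± z_{0.975} × Std. Error, with the lower end apparently truncated at zero, which is natural because a quadratic form cannot be negative. Both the truncation rule and the 95% level are inferred from the printed numbers rather than taken from the package source, and `ci.manual` is an illustrative name.

```r
# Values copied from the summary(Est) output above (rounded)
tau <- c(0.25, 0.50, 1.00)
est.debias <- 1.138
std.error <- c(0.4646, 0.5122, 0.5963)
z <- qnorm(0.975)  # two-sided 95% normal quantile (assumed level)
ci.manual <- data.frame(
  tau = tau,
  lower = pmax(est.debias - z * std.error, 0),  # truncate at zero (assumed rule)
  upper = est.debias + z * std.error
)
print(ci.manual)
```

A larger \(\tau\) inflates the standard error and therefore widens the interval; for \(\tau = 1\) the untruncated lower end would be slightly negative, which is why the reported lower bound is exactly 0.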