This vignette provides an in-depth walk-through of the
MRPWorkflow
and MRPModel
classes in the
shinymrp package. You will find practical information
about the purpose, arguments, and caveats of each method, enabling you
to conduct MRP analyses programmatically.
For an introduction to the package and workflow concepts, begin with the Getting started with shinymrp vignette.
MRP analysis begins by instantiating an MRPWorkflow
object with mrp_workflow()
. The object-oriented design
efficiently manages your data and supports a reproducible workflow
without requiring repeated user input. All workflow methods are accessed
via the R6 $
operator.
MRP requires two key data components:
For details on accepted formats and preprocessing, see the Data Preprocessing vignette.
Sample data should be a tidy data frame with your outcome measure and predictors. Time-varying data are also supported, allowing dates or indices (e.g., week, month, or year) to be incorporated for time-varying summaries and estimates.
The example_sample_data()
function demonstrates accepted
structures and common options:
sample_data <- example_sample_data(
is_timevar = TRUE,
is_aggregated = TRUE,
special_case = NULL,
family = "binomial"
)
#> Warning in value[[3L]](cond): GitHub API check failed: HTTP 403 Forbidden..
#> Proceeding with download.
head(sample_data)
#> # A tibble: 6 × 10
#> zip sex race age time county state date total positive
#> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <date> <dbl> <dbl>
#> 1 71354 female black 0-17 3 22025 22 2020-01-13 1 0
#> 2 71354 female black 0-17 5 22025 22 2020-01-27 1 1
#> 3 71354 female black 0-17 9 22025 22 2020-02-24 1 0
#> 4 71354 female black 18-34 1 22025 22 2019-12-30 2 1
#> 5 71354 female black 18-34 6 22025 22 2020-02-03 1 0
#> 6 71354 female black 18-34 9 22025 22 2020-02-24 1 1
Data preprocessing is performed with the $preprocess()
method, which standardizes formats (e.g., recodes age groups, creates
time indices), removes duplicates, and prepares data for modeling.
Note: Only basic cleaning is performed. Conduct preliminary data checks beforehand.
Use the $link_acs()
method to retrieve
poststratification data from the American Community Survey (ACS),
specifying the desired geographic level and reference year (i.e., the
last year of the five-year ACS data collection). Supported levels
include ZIP code, county, and state. If not specified,
poststratification data are aggregated for the U.S. population.
You can also import your own poststratification data using
$load_pstrat()
. Requirements
mirror those for sample data.
pstrat_data <- example_pstrat_data()
#> Warning in value[[3L]](cond): GitHub API check failed: HTTP 403 Forbidden..
#> Proceeding with download.
workflow$load_pstrat(pstrat_data, is_aggregated = TRUE)
#> Preprocessing poststratification data...
#> Warning in value[[3L]](cond): GitHub API check failed: HTTP 403 Forbidden..
#> Proceeding with download.
The MRPWorkflow
class provides methods for creating
visualizations of demographic distributions, geographic coverage, and
outcome summaries.
Use $demo_bars()
to generate bar plots comparing
demographic distributions in the sample and poststratification data.
For analyzing COVID-19 test data with linked geographic predictors,
the $covar_hist()
method creates frequency histograms of
the geographic predictors. This method will give the following error if
it is called with incompatible data that do not include geographic
predictors.
workflow$covar_hist(covar = "income")
#> Error in workflow$covar_hist(covar = "income"): Covariate data is not available. This method is only available for COVID data.
$sample_size_map()
visualizes sample sizes across
geographic areas in an interactive choropleth map, powered by highcharter.
workflow$sample_size_map()
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
$outcome_plot()
creates plots of average outcome values,
including time variation if present.
$outcome_map()
visualizes outcome summaries by
geography, with options for the maximum or minimum value across
time.
The MRPModel
class is a wrapper around CmdStanR objects,
providing a high-level interface for obtaining posterior summaries and
diagnostics.
Use $create_model()
to specify the model, including the
mean structure (fixed and varying effects, two-way interactions) and
priors. Varying effects assume normal distributions with unknown
standard deviations (with specified priors). Stan code is generated for
compilation via CmdStanR. Supported prior types include:
normal(mu, sigma)
student_t(nu, mu, sigma)
structured
(see Si
et al., 2020)Default priors are used if ""
(empty string) is
supplied:
The $fit()
method runs Stan’s main Markov chain Monte
Carlo algorithm via CmdStanR.
The fitted MRPModel
object extracts draws from the
internal CmdStanMCMC
object for summaries and
diagnostics.
$summary()
for posterior statistics, effective sample
sizes, and R-hat diagnostics.model1$summary()
#> $fixed
#> Estimate Est.Error l-95% CI u-95% CI R-hat Bulk_ESS
#> intercept -0.1926188 0.2250724 -0.59452797 0.3132129 1.007664 134.0609
#> sex.male 0.2936252 0.1328935 0.02452109 0.5443322 1.000301 424.1697
#> race.black NA NA NA NA NA NA
#> race.other 0.7855232 0.1859582 0.41538058 1.1373412 1.006413 415.9744
#> race.white 0.8434822 0.1679745 0.47585995 1.1415483 1.001577 458.3526
#> Tail_ESS
#> intercept 82.97744
#> sex.male 407.90451
#> race.black NA
#> race.other 337.37038
#> race.white 381.81452
#>
#> $varying
#> Estimate Est.Error l-95% CI u-95% CI R-hat Bulk_ESS
#> age (intercept) 0.2883083 0.2692683 0.03661777 1.1676160 1.011352 55.09682
#> time (intercept) 0.2704408 0.1275997 0.04227495 0.5847031 1.003678 199.92277
#> Tail_ESS
#> age (intercept) 103.3675
#> time (intercept) 180.6145
#>
#> $other
#> data frame with 0 columns and 0 rows
$diagnostics()
for detailed
diagnostic information about the MCMC process.model1$diagnostics()
#> Metric
#> 1 Divergence
#> 2 Maximum tree depth
#> 3 E-BFMI
#> Message
#> 1 0 of 500 (0.0%) transitions ended with a divergence
#> 2 0 of 500 transitions hit the maximum tree depth limit of 10
#> 3 0 of 2 chains had an E-BFMI less than 0.3
$pp_check()
to compare posterior predictive samples against
observed data.$compare_models()
.model2 <- workflow$create_model(
intercept_prior = "normal(0, 4)",
fixed = list(
sex = "normal(0, 2)",
race = "normal(0, 2)"
),
varying = list(
age = "",
time = ""
),
interaction = list(
`age:time` = "normal(0, 1)",
`race:time` = "normal(0, 1)",
`sex:time` = "normal(0, 1)"
)
)
model2$fit(
n_iter = 500,
n_chains = 2,
seed = 123,
show_messages = FALSE,
show_exceptions = FALSE
)
#> Warning: 1 of 500 (0.0%) transitions ended with a divergence.
#> See https://mc-stan.org/misc/warnings for details.
Use $estimate_plot()
to visualize the model estimates,
either overall or by group, with a 95% credible interval by default.
Users can specify alternative intervals.
For time-varying data, $estimate_map()
creates snapshots
of geography-based estimates at specific time points.
time_index
is irrelevant for cross-sectional data.