% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pred_stacked_regression.R
\name{pred_stacked_regression}
\alias{pred_stacked_regression}
\title{Perform Stacked Regression on Existing Prediction Models}
\usage{
pred_stacked_regression(
  x,
  positivity_constraint = FALSE,
  new_data,
  binary_outcome = NULL,
  survival_time = NULL,
  event_indicator = NULL
)
}
\arguments{
\item{x}{an object of class "\code{predinfo}" produced by calling
\code{\link{pred_input_info}} containing information on at least two
existing prediction models.}

\item{positivity_constraint}{TRUE/FALSE denoting if the weights within the
stacked regression model should be constrained to be non-negative (TRUE) or
should be allowed to take any value (FALSE). See details.}

\item{new_data}{data.frame upon which the prediction models should be
aggregated.}

\item{binary_outcome}{Character variable giving the name of the column in
\code{new_data} that represents the observed binary outcomes (should be
coded 0 and 1 for non-event and event, respectively). Only relevant for
\code{x$model_type}="logistic"; leave as \code{NULL} otherwise. Leave as
\code{NULL} if \code{new_data} does not contain any outcomes.}

\item{survival_time}{Character variable giving the name of the column in
\code{new_data} that represents the observed survival times. Only relevant
for \code{x$model_type}="survival"; leave as \code{NULL} otherwise.}

\item{event_indicator}{Character variable giving the name of the column in
\code{new_data} that represents the observed survival indicator (1 for
event, 0 for censoring). Only relevant for \code{x$model_type}="survival";
leave as \code{NULL} otherwise.}
}
\value{
An object of class "\code{predSR}". This is the same as that detailed
in \code{\link{pred_input_info}}, with an added element containing the
estimates of the meta-model obtained by stacked regression.
}
\description{
This function takes a set of existing prediction models, and uses the new
dataset to combine/aggregate them into a single 'meta-model', as described in
Debray et al. 2014.
}
\details{
This function takes a set of (previously estimated) prediction
models that were each originally developed for the same prediction task,
and pools/aggregates these into a single prediction model (meta-model) using
stacked regression based on new data (data not used to develop any of the
existing models). The methodological details can be found in Debray et al.
2014.

Given that the existing models are likely to be highly collinear (since
they were each developed for the same prediction task), it has been
suggested to impose a positivity constraint on the weights of the stacked
regression model (Debray et al. 2014). If \code{positivity_constraint} is
set to TRUE, then the stacked regression model will be estimated by
optimising the (log-)likelihood using bound-constrained optimisation
("L-BFGS-B"). This is currently only implemented for logistic regression
models (i.e., if \code{x$model_type}="logistic"). For survival models,
only \code{positivity_constraint} = FALSE is supported.

\code{new_data} should be a data.frame, where each row should be an
observation (e.g. patient) and each variable/column should be a predictor
variable. The predictor variables need to include (as a minimum) all of the
predictor variables that are included in the existing prediction models
(i.e., each of the variable names supplied to
\code{\link{pred_input_info}}, through the \code{model_info} parameter,
must match the name of a variable in \code{new_data}).

Any factor variables within \code{new_data} must be converted to dummy
(0/1) variables before calling this function. \code{\link{dummy_vars}} can
help with this. See \code{\link{pred_predict}} for examples.

\code{binary_outcome}, \code{survival_time} and \code{event_indicator} are
used to specify the outcome variable(s) within \code{new_data} (use
\code{binary_outcome} if \code{x$model_type} = "logistic", or use
\code{survival_time} and \code{event_indicator} if \code{x$model_type} =
"survival").
}
\examples{
LogisticModels <- pred_input_info(model_type = "logistic",
                                  model_info = SYNPM$Existing_logistic_models)
SR <- pred_stacked_regression(x = LogisticModels,
                              new_data = SYNPM$ValidationData,
                              binary_outcome = "Y")
summary(SR)
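
# A sketch of the positivity-constrained fit described in Details (currently
# logistic models only); the object name SR_pos is illustrative:
SR_pos <- pred_stacked_regression(x = LogisticModels,
                                  positivity_constraint = TRUE,
                                  new_data = SYNPM$ValidationData,
                                  binary_outcome = "Y")
summary(SR_pos)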

# Survival model example:
TTModels <- pred_input_info(model_type = "survival",
                            model_info = SYNPM$Existing_TTE_models,
                            cum_hazard = list(SYNPM$TTE_mod1_baseline,
                                                  SYNPM$TTE_mod2_baseline,
                                                  SYNPM$TTE_mod3_baseline))
SR <- pred_stacked_regression(x = TTModels,
                              new_data = SYNPM$ValidationData,
                              survival_time = "ETime",
                              event_indicator = "Status")
summary(SR)
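
# Illustrative sketch only (not the package's internal code): one way a
# positivity-constrained stacked logistic regression could be fitted in base
# R with optim(method = "L-BFGS-B"). All names and data below are
# hypothetical and simulated purely for illustration.
set.seed(1)
n_obs <- 500
lp1 <- rnorm(n_obs)                      # linear predictor of existing model 1
lp2 <- lp1 + rnorm(n_obs, sd = 0.3)      # highly collinear second model
y_sim <- rbinom(n_obs, 1, plogis(-0.5 + 0.6 * lp1 + 0.4 * lp2))
neg_loglik <- function(b) {
  eta <- b[1] + b[2] * lp1 + b[3] * lp2  # intercept plus weighted predictors
  -sum(y_sim * eta - log1p(exp(eta)))    # Bernoulli negative log-likelihood
}
fit_sketch <- optim(par = c(0, 0.5, 0.5), fn = neg_loglik,
                    method = "L-BFGS-B",
                    lower = c(-Inf, 0, 0))  # intercept free, weights >= 0
fit_sketch$par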

}
\references{
Debray, T.P., Koffijberg, H., Nieboer, D., Vergouwe, Y.,
Steyerberg, E.W. and Moons, K.G. (2014), Meta-analysis and aggregation of
multiple published prediction models. \emph{Statistics in Medicine}, 33:
2341-2362.
}
\seealso{
\code{\link{pred_input_info}}
}
