The computational load of function pre can be heavy. This vignette shows some ways to reduce it.
There are two main steps in fitting a rule ensemble: 1) Rule generation and 2) Estimation of the final ensemble. Both can be adjusted to reduce computational load.
By default, pre uses the conditional inference tree algorithm of Hothorn, Hornik, & Zeileis (2006), as implemented in function ctree of R package partykit (Hothorn & Zeileis, 2015), for rule induction. The main reason is that it does not suffer from selection bias towards variables with a greater number of possible cut points. Yet, use of ctree brings a relatively heavy computational load:
airq <- airquality[complete.cases(airquality), ]
airq$Month <- factor(airq$Month)
library("pre")
set.seed(42)
system.time(airq.ens <- pre(Ozone ~ ., data = airq))
## user system elapsed
## 2.74 0.01 2.75
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 3.229132
## number of terms = 13
## mean cv error (se) = 336.2188 (95.24635)
##
## cv error type : Mean-Squared Error
Computational load can be substantially reduced by employing the CART algorithm of Breiman, Friedman, Olshen, & Stone (1984), as implemented in function rpart from the package of the same name (Therneau & Atkinson, 2022). This can be specified through the tree.unbiased argument:
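A call along the following lines does this (the object name airq.ens.cart is chosen here for illustration):

set.seed(42)
system.time(airq.ens.cart <- pre(Ozone ~ ., data = airq, tree.unbiased = FALSE))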
## user system elapsed
## 1.67 0.03 1.70
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.935814
## number of terms = 23
## mean cv error (se) = 287.7377 (86.43107)
##
## cv error type : Mean-Squared Error
Note, however, that the variable selection bias induced by fitting CART trees may also be present in the final ensemble. Furthermore, use of CART trees will tend to result in more complex ensembles, because the default stopping criterion in rpart is considerably less conservative than that of ctree. As a result, many more rules may be created and retained.
Reducing tree (and thereby rule) depth will reduce the computational load of both rule generation and estimation of the final ensemble:
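A sketch of such a call, assuming the maxdepth argument of pre is lowered from its default of 3 to 1, so that each rule consists of a single condition (the object name airq.ens.md is illustrative):

set.seed(42)
system.time(airq.ens.md <- pre(Ozone ~ ., data = airq, maxdepth = 1L))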
## user system elapsed
## 2.26 0.05 2.33
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.676698
## number of terms = 14
## mean cv error (se) = 424.1761 (126.9922)
##
## cv error type : Mean-Squared Error
Reducing the maximum depth will likely improve interpretability, but may decrease predictive accuracy of the final ensemble.
By default, 500 trees are generated. Computation time can be reduced substantially by reducing the number of trees. This may of course negatively impact predictive accuracy. When using a smaller number of trees, it is likely beneficial to increase the learning rate (learnrate = .01, by default) accordingly:
set.seed(42)
system.time(airq.ens.nt <- pre(Ozone ~ ., data = airq, ntrees = 100L, learnrate = .05))
## user system elapsed
## 0.83 0.01 0.85
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.935814
## number of terms = 11
## mean cv error (se) = 276.9282 (69.1872)
##
## cv error type : Mean-Squared Error
Function cv.glmnet from package glmnet is used for fitting the final model. The relevant arguments of function cv.glmnet can be passed directly to function pre. For example, parallel computation can be employed by specifying par.final = TRUE in the call to pre (and registering a parallel backend beforehand, e.g., using package doMC). Parallel computation will not affect performance of the final ensemble. Note, however, that it will only reduce computation time for datasets with a (very) large number of observations; for smaller datasets, parallel computation may even increase computation time. A sketch of such a call appears below.
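A minimal sketch, assuming package doMC is installed (note that doMC is not available on Windows) and two cores are registered; the object name airq.ens.par is illustrative:

library("doMC")
registerDoMC(cores = 2L)
set.seed(42)
airq.ens.par <- pre(Ozone ~ ., data = airq, par.final = TRUE)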
The number of cross-validation folds can also be reduced, but this will likely affect predictive performance of the final ensemble:
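A sketch assuming the nfolds argument (which pre forwards to cv.glmnet) is lowered from its default of 10 to 5; the object name airq.ens.cv is illustrative:

set.seed(42)
system.time(airq.ens.cv <- pre(Ozone ~ ., data = airq, nfolds = 5L))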
## user system elapsed
## 2.67 0.00 2.68
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 2.559032
## number of terms = 13
## mean cv error (se) = 307.2755 (105.2568)
##
## cv error type : Mean-Squared Error