This vignette documents the use of the MANOVA.RM
package
for the analysis of semi-parametric repeated measures designs and
multivariate data. The package consists of three parts - one for
repeated measurements, one for multivariate data and one for the
combination of both - which will be explained in detail below. All
functions calculate the Wald-type statistic (WTS) as well as the
ANOVA-type statistic (ATS) for repeated measures and a modification
thereof (MATS) for multivariate data based on means. These test
statistics can be used for arbitrary semi-parametric designs, even with
unequal covariance matrices among groups and small sample sizes. For a
short overview of the test statistics and corresponding references see
the next section. Furthermore, different resampling approaches are
provided in order to improve the small sample behavior of the test
statistics. The WTS requires non-singular covariance matrices. If the
covariance matrix is singular, a warning is returned.
For detailed explanations and examples, see also Friedrich, Konietschke and Pauly (2019).
We consider the following model \[ X_{ik} = \mu_i + \varepsilon_{ik} \] for observation vectors from group \(i\) and individual \(k\). Here, \(\mu_i\) denotes the group mean which we wish to infer and \(\varepsilon_{ik}\) are the common error terms with expectation 0 and existing (co-)variances. We do not need to assume normally distributed errors for these methods and the covariances may be unequal between groups. Null hypotheses are formulated via contrast matrices, i.e., as \(H \mu = 0\), where the rows of \(H\) sum to zero. Then the Wald-type statistic (WTS) is a quadratic form in the estimated mean vector involving the Moore-Penrose inverse of the (transformed) empirical covariance matrix. Under the null hypothesis the WTS is asymptotically \(\chi^2\) distributed with rank(H) degrees of freedom. However, this approximation is only valid for large sample sizes, see e.g. Brunner (2001). We therefore recommend to use a resampling approach as proposed by Konietschke et al. (2015) or Friedrich et al. (2017).
Since the WTS requires non-singular covariance matrices, two other test statistics have been proposed: the ATS for repeated measures designs (Brunner, 2001) and the MATS for multivariate data (Friedrich and Pauly, 2018). The ATS is a scaled quadratic form in the mean vector, which is usually approximated by an F-distribution with estimated degrees of freedom. This approach is also implemented in the SAS PROC Mixed procedure. However, it is rather conservative for small sample sizes and does not provide an asymptotic level \(\alpha\) test.
The MATS for multivariate data has the advantage of being invariant under scale transformations of the data, an important feature for multivariate data. Since its asymptotic distribution involves unknown parameters, we again propose to use bootstrap approaches instead (Friedrich and Pauly, 2018).
MANOVA
and MANOVA.wide
is the format of the
data: For MANOVA
data need to be in long format (one row
per measurement), whereas MANOVA.wide
is for data in wide
format (one row per individual).Note that a combination of both types of data, i.e., multivariate
longitudinal data, can now also be analyzed with MANOVA.RM
,
see below.
RM
functionThe RM
function calculates the WTS and ATS in a repeated
measures design with an arbitrary number of crossed whole-plot
(between-subject) and sub-plot (within-subject) factors. The resampling
methods provided are a permutation procedure, a parametric bootstrap
approach and a wild bootstrap using Rademacher weights. The permutation
procedure provides no valid approach for the ATS and is thus not
implemented.
For illustration purposes, we consider the data set
o2cons
, which is included in MANOVA.RM
.
library(MANOVA.RM)
data(o2cons)
The data set contains measurements on the oxygen consumption of leukocytes in the presence and absence of inactivated staphylococci at three consecutive time points. More details on the study and the statistical model can be found in Friedrich et al. (2017). Due to the study design, both time and staphylococci are within-subject factors while the treatment (Verum vs. Placebo) is a between-subject factor.
head(o2cons)
## O2 Staphylococci Time Group Subject
## 1 1.48 1 6 P 1
## 2 2.81 1 12 P 1
## 3 3.56 1 18 P 1
## 4 1.04 0 6 P 1
## 5 2.07 0 12 P 1
## 6 2.81 0 18 P 1
We will now analyze this data using the RM
function. The
RM
function takes as arguments:
formula
: A formula consisting of the outcome variable
on the left hand side of a ~ operator and the factor variables of
interest on the right hand side. The within-subject factor(s) must be
specified last in the formula,
e.g. cbind(outcome1, outcome2) ~ between1 * between2 * within1 * within2
.data
: A data.frame containing the variables in
formula
.subject
: The column name of the subjects variable in
the data frame.no.subf
: The number of within-subject factors, default
is 1. Alternatively, the within-subjects factors can be directly
specified using within
.iter
: The number of iterations for the resampling
approach. Default value is 10,000.alpha
: The significance level, default is 0.05.resampling
: The resampling method, one of ‘Perm’,
‘paramBS’ or ‘WildBS’. Default is set to ‘Perm’.para
: A logical indicating whether parallel computing
should be used. Default is FALSE
.CPU
: The number of cores used for parallel computing.
If not specified, cores are detected automatically.seed
: A random seed for the resampling procedure. If
omitted, no reproducible seed is set. Note that the random seeds for the
parallel computing differ from those without parallelisation. Thus, to
reproduce results obtained with MANOVA.RM version <0.5.1, argument
para
must be set to TRUE
.CI.method
: The method for calculating the quantiles
used for the confidence intervals, either ‘t-quantile’ (the default) or
‘resampling’ (based on quantile of the resampled WTS).dec
: The number of decimals the results should be
rounded to. Default is 3.model1 <- RM(O2 ~ Group * Staphylococci * Time, data = o2cons,
subject = "Subject", no.subf = 2, iter = 1000,
resampling = "Perm", seed = 1234)
summary(model1)
## Call:
## O2 ~ Group * Staphylococci * Time
## A repeated measures analysis with 2 within-subject factor(s) ( Staphylococci,Time ) and 1 between-subject factor(s).
##
## Descriptive:
## Group Staphylococci Time n Means Lower 95 % CI Upper 95 % CI
## 1 P 0 6 12 1.322 1.150 1.493
## 5 P 0 12 12 2.430 2.196 2.664
## 9 P 0 18 12 3.425 3.123 3.727
## 3 P 1 6 12 1.618 1.479 1.758
## 7 P 1 12 12 2.434 2.164 2.704
## 11 P 1 18 12 3.527 3.273 3.781
## 2 V 0 6 12 1.394 1.201 1.588
## 6 V 0 12 12 2.570 2.355 2.785
## 10 V 0 18 12 3.677 3.374 3.979
## 4 V 1 6 12 1.656 1.471 1.840
## 8 V 1 12 12 2.799 2.500 3.098
## 12 V 1 18 12 4.029 3.802 4.257
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## Group "11.167" "1" "0.001"
## Staphylococci "20.401" "1" "<0.001"
## Group:Staphylococci "2.554" "1" "0.11"
## Time "4113.057" "2" "<0.001"
## Group:Time "24.105" "2" "<0.001"
## Staphylococci:Time "4.334" "2" "0.115"
## Group:Staphylococci:Time "4.303" "2" "0.116"
##
## ANOVA-Type Statistic (ATS):
## Test statistic df1 df2 p-value
## Group "11.167" "1" "57.959" "0.001"
## Staphylococci "20.401" "1" "Inf" "<0.001"
## Group:Staphylococci "2.554" "1" "Inf" "0.11"
## Time "960.208" "1.524" "Inf" "<0.001"
## Group:Time "5.393" "1.524" "Inf" "0.009"
## Staphylococci:Time "2.366" "1.983" "Inf" "0.094"
## Group:Staphylococci:Time "2.147" "1.983" "Inf" "0.117"
##
## p-values resampling:
## Perm (WTS)
## Group "<0.001"
## Staphylococci "<0.001"
## Group:Staphylococci "0.119"
## Time "<0.001"
## Group:Time "<0.001"
## Staphylococci:Time "0.152"
## Group:Staphylococci:Time "0.146"
The output consists of four parts: model1$Descriptive
gives an overview of the descriptive statistics: The number of
observations, mean and confidence intervals (based on either quantiles
of the t-distribution or the resampling-based quantiles) are displayed
for each factor level combination. model1$WTS
contains the
results for the Wald-type test: The test statistic, degree of freedom
and p-values based on the asymptotic \(\chi^2\) distribution are displayed. Note
that the \(\chi^2\) approximation is
very liberal for small sample sizes. model1$ATS
contains
the corresponding results based on the ATS. This test statistic tends to
rather conservative decisions in case of small sample sizes and is even
asymptotically only an approximation, thus not providing an asymptotic
level \(\alpha\) test. Finally,
model1$resampling
contains the p-values based on the chosen
resampling approach. For the ATS, the permutation procedure cannot be
applied. Due to the above mentioned issues for small sample sizes, the
resampling procedure is recommended.
We consider the data set EEG
from the
MANOVA.RM
package: At the Department of Neurology,
University Clinic of Salzburg, 160 patients were diagnosed with either
Alzheimer’s Disease (AD), mild cognitive impairments (MCI), or
subjective cognitive complaints without clinically significant deficits
(SCC), based on neuropsychological diagnostics (Bathke et al.(2018)).
This data set contains z-scores for brain rate and Hjorth complexity,
each measured at frontal, temporal and central electrode positions and
averaged across hemispheres. In addition to standardization, complexity
values were multiplied by -1 in order to make them more easily
comparable to brain rate values: For brain rate we know that the values
decrease with age and pathology, while Hjorth complexity values are
known to increase with age and pathology. The three between-subject
factors considered were sex (men vs. women), diagnosis (AD vs. MCI
vs. SCC), and age (\(< 70\)
vs. \(>= 70\) years). Additionally,
the within-subject factors region (frontal, temporal, central) and
feature (brain rate, complexity) structure the response vector.
data(EEG)
EEG_model <- RM(resp ~ sex * diagnosis * feature * region,
data = EEG, subject = "id", within = c("feature", "region"),
resampling = "WildBS",
iter = 1000, alpha = 0.01, seed = 987)
summary(EEG_model)
## Call:
## resp ~ sex * diagnosis * feature * region
## A repeated measures analysis with 2 within-subject factor(s) ( feature,region ) and 2 between-subject factor(s).
##
## Descriptive:
## sex diagnosis feature region n Means Lower 99 % CI Upper 99 % CI
## 1 M AD brainrate central 12 -1.010 -4.881 2.861
## 13 M AD brainrate frontal 12 -1.007 -4.991 2.977
## 25 M AD brainrate temporal 12 -0.987 -4.493 2.519
## 7 M AD complexity central 12 -1.488 -10.053 7.077
## 19 M AD complexity frontal 12 -1.086 -6.906 4.735
## 31 M AD complexity temporal 12 -1.320 -7.203 4.562
## 3 M MCI brainrate central 27 -0.447 -1.591 0.696
## 15 M MCI brainrate frontal 27 -0.464 -1.646 0.719
## 27 M MCI brainrate temporal 27 -0.506 -1.584 0.572
## 9 M MCI complexity central 27 -0.257 -1.139 0.625
## 21 M MCI complexity frontal 27 -0.459 -1.997 1.079
## 33 M MCI complexity temporal 27 -0.490 -1.796 0.816
## 5 M SCC brainrate central 20 0.459 -0.414 1.332
## 17 M SCC brainrate frontal 20 0.243 -0.670 1.156
## 29 M SCC brainrate temporal 20 0.409 -1.210 2.028
## 11 M SCC complexity central 20 0.349 -0.070 0.767
## 23 M SCC complexity frontal 20 0.095 -1.037 1.227
## 35 M SCC complexity temporal 20 0.314 -0.598 1.226
## 2 W AD brainrate central 24 -0.294 -1.978 1.391
## 14 W AD brainrate frontal 24 -0.159 -1.813 1.495
## 26 W AD brainrate temporal 24 -0.285 -1.776 1.206
## 8 W AD complexity central 24 -0.128 -1.372 1.116
## 20 W AD complexity frontal 24 0.026 -1.212 1.264
## 32 W AD complexity temporal 24 -0.194 -1.670 1.283
## 4 W MCI brainrate central 30 -0.106 -1.076 0.863
## 16 W MCI brainrate frontal 30 -0.074 -1.032 0.885
## 28 W MCI brainrate temporal 30 -0.069 -1.064 0.925
## 10 W MCI complexity central 30 0.094 -0.464 0.652
## 22 W MCI complexity frontal 30 0.131 -0.768 1.031
## 34 W MCI complexity temporal 30 0.121 -0.652 0.895
## 6 W SCC brainrate central 47 0.537 -0.049 1.124
## 18 W SCC brainrate frontal 47 0.548 -0.062 1.159
## 30 W SCC brainrate temporal 47 0.559 -0.015 1.133
## 12 W SCC complexity central 47 0.384 0.110 0.659
## 24 W SCC complexity frontal 47 0.403 -0.038 0.845
## 36 W SCC complexity temporal 47 0.506 0.132 0.880
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## sex "9.973" "1" "0.002"
## diagnosis "42.383" "2" "<0.001"
## sex:diagnosis "3.777" "2" "0.151"
## feature "0.086" "1" "0.769"
## sex:feature "2.167" "1" "0.141"
## diagnosis:feature "5.317" "2" "0.07"
## sex:diagnosis:feature "1.735" "2" "0.42"
## region "0.07" "2" "0.966"
## sex:region "0.876" "2" "0.645"
## diagnosis:region "6.121" "4" "0.19"
## sex:diagnosis:region "1.532" "4" "0.821"
## feature:region "0.652" "2" "0.722"
## sex:feature:region "0.423" "2" "0.81"
## diagnosis:feature:region "7.142" "4" "0.129"
## sex:diagnosis:feature:region "2.274" "4" "0.686"
##
## ANOVA-Type Statistic (ATS):
## Test statistic df1 df2 p-value
## sex "9.973" "1" "36.497" "0.003"
## diagnosis "13.124" "1.343" "36.497" "<0.001"
## sex:diagnosis "1.904" "1.343" "36.497" "0.174"
## feature "0.086" "1" "Inf" "0.769"
## sex:feature "2.167" "1" "Inf" "0.141"
## diagnosis:feature "1.437" "1.562" "Inf" "0.238"
## sex:diagnosis:feature "1.031" "1.562" "Inf" "0.342"
## region "0.018" "1.611" "Inf" "0.965"
## sex:region "0.371" "1.611" "Inf" "0.644"
## diagnosis:region "1.091" "2.046" "Inf" "0.337"
## sex:diagnosis:region "0.376" "2.046" "Inf" "0.691"
## feature:region "0.126" "1.421" "Inf" "0.81"
## sex:feature:region "0.077" "1.421" "Inf" "0.864"
## diagnosis:feature:region "0.829" "1.624" "Inf" "0.415"
## sex:diagnosis:feature:region "0.611" "1.624" "Inf" "0.51"
##
## p-values resampling:
## WildBS (WTS) WildBS (ATS)
## sex "0.003" "0.003"
## diagnosis "<0.001" "<0.001"
## sex:diagnosis "0.127" "0.131"
## feature "0.769" "0.769"
## sex:feature "0.141" "0.141"
## diagnosis:feature "0.072" "0.251"
## sex:diagnosis:feature "0.467" "0.388"
## region "0.976" "0.986"
## sex:region "0.693" "0.704"
## diagnosis:region "0.154" "0.344"
## sex:diagnosis:region "0.857" "0.817"
## feature:region "0.799" "0.936"
## sex:feature:region "0.888" "0.942"
## diagnosis:feature:region "0.111" "0.507"
## sex:diagnosis:feature:region "0.741" "0.657"
We find significant effects at level \(\alpha = 0.01\) of the between-subject factors sex and diagnosis, while none of the within-subject factors or interactions become significant.
The RM()
function is equipped with a plotting option,
displaying the calculated means along with the \((1-\alpha)\) confidence intervals. The
plot
function takes an RM
object as an
argument. Furthermore, additional graphical parameters can be used to
customize the plots. The optional argument legendpos
specifies the position of the legend in higher-way layouts, while the
argument gap
(default 0.1) specifies the distance between
the error bars.
plot(model1, leg = FALSE)
For illustration purposes, we reduce the EEG-model above to a two-way design:
EEGnew <- EEG[EEG$region == "temporal", ]
EEG_model2 <- RM(resp ~ sex*feature, within = "feature", no.subf = 1, subject = "id", data = EEGnew)
plot(EEG_model2, legendpos = "topleft", col = c(4, 2))
MANOVA
functionThe MANOVA
function calculates the WTS for multivariate
data in a design with crossed or nested factors. Additionally, a
modified ANOVA-type statistic (MATS) is calculated which has the
additional advantage of being applicable to designs involving singular
covariance matrices and is invariant under scale transformations of the
data, see Friedrich and Pauly (2018) for details. The resampling methods
provided are a parametric bootstrap approach and a wild bootstrap using
Rademacher weights. Note that only balanced nested designs (i.e., the
same number of factor levels \(b\) for
each level of the factor \(A\)) with up
to three factors are implemented. Designs involving both crossed and
nested factors are not implemented. Data must be provided in long format
(for wide format, see MANOVA.wide
below).
We again consider the data set EEG
from the
MANOVA.RM
package, but now we ignore the within-subject
factors. Therefore, we are now in a multivariate setting with 6
measurements per patient and three crossed factors sex, age and
diagnosis. Due to the small number of subjects in some groups (e.g.,
only 2 male patients aged \(<\) 70
were diagnosed with AD) we restrict our analyses to two factors at a
time. The analysis of this example is shown below.
The most important arguments of the MANOVA
function
are:
formula
: A formula consisting of the outcome variable
on the left hand side of a ~ operator and the factor variables of
interest on the right hand side.data
: A data.frame containing the variables in
formula
.subject
: The column name of the subjects variable in
the data frame.resampling
: The resampling method, one of ‘paramBS’ and
‘WildBS’. Default is set to ‘paramBS’.nested.levels.unique
: For nested designs only: A
logical specifying whether the levels of the nested factor(s) are
labeled uniquely or not. Default is FALSE, i.e., the levels of the
nested factor are the same for each level of the main factor. For an
example and more explanations see the GFD package and the corresponding
vignette.data(EEG)
EEG_MANOVA <- MANOVA(resp ~ sex * diagnosis,
data = EEG, subject = "id", resampling = "paramBS",
iter = 1000, alpha = 0.01, seed = 987)
summary(EEG_MANOVA)
## Call:
## resp ~ sex * diagnosis
##
## Descriptive:
## sex diagnosis n Mean 1 Mean 2 Mean 3 Mean 4 Mean 5 Mean 6
## 1 M AD 12 -0.987 -1.007 -1.010 -1.320 -1.086 -1.488
## 3 M MCI 27 -0.506 -0.464 -0.447 -0.490 -0.459 -0.257
## 5 M SCC 20 0.409 0.243 0.459 0.314 0.095 0.349
## 2 W AD 24 -0.285 -0.159 -0.294 -0.194 0.026 -0.128
## 4 W MCI 30 -0.069 -0.074 -0.106 0.121 0.131 0.094
## 6 W SCC 47 0.559 0.548 0.537 0.506 0.403 0.384
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## sex "12.604" "6" "0.05"
## diagnosis "55.158" "12" "<0.001"
## sex:diagnosis "9.79" "12" "0.634"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## sex 45.263
## diagnosis 194.165
## sex:diagnosis 18.401
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## sex "0.127" "0.006"
## diagnosis "<0.001" "<0.001"
## sex:diagnosis "0.762" "0.209"
The output consists of several parts: First, some descriptive
statistics of the data set are displayed, namely the sample size and
mean for each factor level combination and each dimension. (Dimensions
occur in the same order as in the original data set. For a labeled
output, use MANOVA.wide()
.) In this example, Mean 1 to Mean
3 correspond to the brainrate (temporal, frontal, central) while Mean
4–6 correspond to complexity. Second, the results based on the WTS are
displayed. For each factor, the test statistic, degree of freedom and
p-value is given. For the MATS, only the value of the test statistic is
given, since inference is here based on the resampling procedure only.
The resampling-based p-values are finally displayed for both test
statistics.
MANOVA.wide
functionThe MANOVA.wide
function is used for data provided in
wide format, i.e., with one row per unit. Input and output are almost
identical to the MANOVA
function, except that no
subject
variable needs to be specified. The formula now
consists of the matrix of outcome variables (bound together via
cbind()
) on the left hand side of the ~ operator and the
factors of interest on the right. For an example we use a data set on
producing plastic film from Krzanowski (1998, p. 381), see also
summary.manova
:
tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
rate <- gl(2,10, labels = c("Low", "High"))
additive <- gl(2, 5, length = 20, labels = c("Low", "High"))
example <- data.frame(tear, gloss, opacity, rate, additive)
fit <- MANOVA.wide(cbind(tear, gloss, opacity) ~ rate * additive, data = example, iter = 1000)
summary(fit)
## Call:
## cbind(tear, gloss, opacity) ~ rate * additive
##
## Descriptive:
## rate additive n tear gloss opacity
## 1 Low Low 5 6.30 9.56 3.74
## 2 Low High 5 6.68 9.58 3.84
## 3 High Low 5 6.88 8.72 3.14
## 4 High High 5 7.28 9.40 5.02
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## rate "25.9" "3" "<0.001"
## additive "14.591" "3" "0.002"
## rate:additive "4.589" "3" "0.204"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## rate 23.808
## additive 11.835
## rate:additive 4.296
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## rate "0.005" "0.001"
## additive "0.026" "0.04"
## rate:additive "0.31" "0.274"
A function for calculating and plotting of confidence regions is
available for MANOVA
objects. Details on the methods can be
found in Friedrich and Pauly (2018).
Confidence regions can be calculated using the conf.reg
function. Note that confidence regions can only be plotted in designs
with 2 dimensions. The conf.reg
function takes as
arguments:
object
: A MANOVA
object calculated via
MANOVA()
or MANOVA.wide()
.nullhypo
: In designs involving more than one factor, it
is necessary to specify the null hypothesis, i.e., the contrast of
interest.As an example, we consider the data set water
from the
package HSAUR3
. The data set contains measurements of
mortality and drinking water hardness for 61 cities in England and
Wales. Suppose we want to analyse whether these measurements differ
between northern and southern towns. Since the data set is in wide
format, we need to use the MANOVA.wide
function.
if (requireNamespace("HSAUR3", quietly = TRUE)) {
library(HSAUR3)
data(water)
test <- MANOVA.wide(cbind(mortality, hardness) ~ location, data = water, iter = 1000, resampling = "paramBS", seed = 123)
summary(test)
cr <- conf.reg(test)
cr
plot(cr, col = 2, lty = 2, xlab = "Difference in mortality", ylab ="Difference in water hardness")
}
## Lade nötiges Paket: tools
## Call:
## cbind(mortality, hardness) ~ location
##
## Descriptive:
## location n mortality hardness
## 1 North 35 1633.600 30.400
## 2 South 26 1376.808 69.769
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## location "51.584" "2" "<0.001"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## location 69.882
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## location "<0.001" "<0.001"
The output consists of the necessary parameters specifying the
ellipsoid: the center, the eigenvectors which determine the axes of the
ellipsoid as well as the scaling factors for the eigenvectors, which are
calculated based on the eigenvalues, the bootstrap quantile and the
total sample size. For more information on the construction of the
confidence ellipses see Friedrich and Pauly (2018). For observations
with two dimensions, the confidence ellipse can be plotted using the
generic plot
function. The usual plotting parameters can be
used to customize the plots.
Calculation of simultaneous confidence intervals and multivariate
p-values for contrasts of the mean vector is based on the sum statistic,
see Friedrich and Pauly (2018) for details. Note that the confidence
intervals and p-values returned are simultaneous, i.e., they maintain
the given alpha-level. Confidence intervals are calculated based on
summary effects, i.e., averaging over all dimensions, whereas the
returned p-values are multivariate. If the original model contains more
than one factor, the corresponding contrasts are calculated for all
combinations of factor levels. If this is not desired, the parameter
interaction
may be set to FALSE and the factor of interest
specified. This has the same effect as fitting the model with only the
respective factor and then calculating the post-hoc tests based on this
model. The function simCI
takes the following
arguments:
object
: A MANOVA
object calculated via
MANOVA()
or MANOVA.wide()
.contrast
: The contrast of interest, can either be
“pairwise” or “user-defined”.contmat
: For user-defined contrasts, the contrast
matrix must be specified here. Note that its rows must sum to zero.type
: Pairwise contrasts are calculated based on the
contrMat function in package multcomp, see also the corresponding help
page. The type of the pairwise comparison must be specified here.base
: an integer specifying which group is considered
the baseline group for Dunnett contrasts, see also the documentation of
contrMat()
from the multcomp-package.interaction
: Logical. If interaction = FALSE in models
with more than one factor, the factor of interest for the post-hoc
analysis must be specified. Default is TRUE, which means post-hoc tests
are performed for all factor level combinations.factor
: Only needed if interaction = FALSE. Specifies
the factor for which post-hoc analysis are requested.silent
: Set to TRUE to suppress output.As an example, we consider the EEG_MANOVA
example from
above:
# pairwise comparison using Tukey contrasts
simCI(EEG_MANOVA, contrast = "pairwise", type = "Tukey")
##
## #------ Call -----#
##
## - Contrast: Tukey
## - Confidence level: 99 %
##
## #------Multivariate post-hoc comparisons: p-values -----#
##
## contrast p.value
## 1 M MCI - M AD 0.973
## 2 M SCC - M AD 0.568
## 3 W AD - M AD 0.885
## 4 W MCI - M AD 0.768
## 5 W SCC - M AD 0.431
## 6 M SCC - M MCI 0.371
## 7 W AD - M MCI 0.995
## 8 W MCI - M MCI 0.837
## 9 W SCC - M MCI 0.112
## 10 W AD - M SCC 0.840
## 11 W MCI - M SCC 0.939
## 12 W SCC - M SCC 0.992
## 13 W MCI - W AD 0.997
## 14 W SCC - W AD 0.541
## 15 W SCC - W MCI 0.586
##
## #-----------Confidence intervals for summary effects-------------#
##
## Estimate Lower Upper
## M MCI - M AD 4.275 -15.170054 23.720054
## M SCC - M AD 8.767 -10.142709 27.676709
## W AD - M AD 5.864 -13.918584 25.646584
## W MCI - M AD 6.995 -11.990624 25.980624
## W SCC - M AD 9.835 -8.828579 28.498579
## M SCC - M MCI 4.492 -3.618270 12.602270
## W AD - M MCI 1.589 -8.388475 11.566475
## W MCI - M MCI 2.720 -5.565728 11.005728
## W SCC - M MCI 5.560 -1.958548 13.078548
## W AD - M SCC -2.903 -11.792148 5.986148
## W MCI - M SCC -1.772 -8.708877 5.164877
## W SCC - M SCC 1.068 -4.931627 7.067627
## W MCI - W AD 1.131 -7.918518 10.180518
## W SCC - W AD 3.971 -4.381787 12.323787
## W SCC - W MCI 2.840 -3.394768 9.074768
A one-way layout using MANOVA.wide():
oneway <- MANOVA.wide(cbind(brainrate_temporal, brainrate_central) ~ diagnosis, data = EEGwide, iter = 1000)
# and a user-defined contrast matrix
H <- as.matrix(cbind(rep(1, 5), -1*Matrix::Diagonal(5)))
# user-specified comparison
simCI(oneway, contrast = "user-defined", contmat = H)
##
## #------ Call -----#
##
## - Contrast: user-defined
## - Confidence level: 95 %
##
## #------Multivariate post-hoc comparisons: p-values -----#
##
## [1] 1.000 0.685 0.665 0.012 0.006
##
## #-----------Confidence intervals for summary effects-------------#
##
## Estimate Lower Upper
## 1 0.013 -0.9462296 0.9722296
## 2 -0.243 -1.0046345 0.5186345
## 3 -0.251 -1.0128958 0.5108958
## 4 -1.033 -1.7686629 -0.2973371
## 5 -1.033 -1.7490081 -0.3169919
If the global null hypothesis is rejected, it may be of interest to infer the univariate outcomes that caused the rejection. To answer this question, one can simply calculate the univariate p-values and adjust them accordingly for multiple testing, e.g., using Bonferroni-correction. An example is given below. We consider a one-way layout of the EEG data with influencing factor sex and all 6 outcome variables:
model_sex <- MANOVA.wide(cbind(brainrate_temporal, brainrate_central, brainrate_frontal,
complexity_temporal, complexity_central, complexity_frontal) ~ sex, data = EEGwide, iter = 1000, seed = 987)
summary(model_sex)
## Call:
## cbind(brainrate_temporal, brainrate_central, brainrate_frontal,
## complexity_temporal, complexity_central, complexity_frontal) ~
## sex
##
## Descriptive:
## sex n brainrate_temporal brainrate_central brainrate_frontal
## 1 M 59 -0.294 -0.254 -0.335
## 2 W 101 0.172 0.149 0.195
## complexity_temporal complexity_central complexity_frontal
## 1 -0.386 -0.302 -0.399
## 2 0.226 0.176 0.233
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## sex "15.382" "6" "0.017"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## sex 55.725
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## sex "0.034" "<0.001"
Since the global hypothesis is rejected at 5% level, we continue with the univariate calculations:
EEG1 <- MANOVA.wide(brainrate_temporal ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG2 <- MANOVA.wide(brainrate_central ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG3 <- MANOVA.wide(brainrate_frontal ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG4 <- MANOVA.wide(complexity_temporal ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG5 <- MANOVA.wide(complexity_central ~ sex, data = EEGwide, iter = 1000, seed = 987)
EEG6 <- MANOVA.wide(complexity_frontal ~ sex, data = EEGwide, iter = 1000, seed = 987)
Adjust for multiple testing using the parametric bootstrap MATS and Bonferroni adjustment:
p.adjust(c(EEG1$resampling[, 2], EEG2$resampling[, 2], EEG3$resampling[, 2],
EEG4$resampling[, 2], EEG5$resampling[, 2], EEG6$resampling[, 2]),
method = "bonferroni")
## [1] 0.024 0.096 0.006 0.006 0.066 0.000
This reveals that the central variables (comparison 2 and 5) do not contribute to the significant difference between male and female patients.
To create a data example for a nested design, we use the
curdies
data set from the GFD
package and
extend it by introducing an artificial second outcome variable. In this
data set, the levels of the nested factor (site) are named uniquely,
i.e., levels 1-3 of factor site belong to “WINTER”, whereas levels 4-6
belong to “SUMMER”. Therefore, nested.levels.unique
must be
set to TRUE. The code for the analysis using both wide and long format
is presented below.
if (requireNamespace("GFD", quietly = TRUE)) {
library(GFD)
data(curdies)
set.seed(123)
curdies$dug2 <- curdies$dugesia + rnorm(36)
# first possibility: MANOVA.wide
fit1 <- MANOVA.wide(cbind(dugesia, dug2) ~ season + season:site, data = curdies, iter = 1000, nested.levels.unique = TRUE, seed = 123)
# second possibility: MANOVA (long format)
dug <- c(curdies$dugesia, curdies$dug2)
season <- rep(curdies$season, 2)
site <- rep(curdies$site, 2)
curd <- data.frame(dug, season, site, subject = rep(1:36, 2))
fit2 <- MANOVA(dug ~ season + season:site, data = curd, subject = "subject", nested.levels.unique = TRUE, seed = 123, iter = 1000)
# comparison of results
summary(fit1)
summary(fit2)
}
## Call:
## cbind(dugesia, dug2) ~ season + season:site
##
## Descriptive:
## season site n dugesia dug2
## 1 SUMMER 4 6 0.419 -0.050
## 2 SUMMER 5 6 0.229 0.028
## 3 SUMMER 6 6 0.194 0.763
## 4 WINTER 1 6 2.049 2.497
## 5 WINTER 2 6 4.182 4.123
## 6 WINTER 3 6 0.678 0.724
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## season "6.999" "2" "0.03"
## season:site "16.621" "8" "0.034"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## season 12.296
## season:site 15.064
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## season "0.064" "0.036"
## season:site "0.305" "0.249"
## Call:
## dug ~ season + season:site
##
## Descriptive:
## season site n Mean 1 Mean 2
## 1 SUMMER 4 6 0.419 -0.050
## 2 SUMMER 5 6 0.229 0.028
## 3 SUMMER 6 6 0.194 0.763
## 4 WINTER 1 6 2.049 2.497
## 5 WINTER 2 6 4.182 4.123
## 6 WINTER 3 6 0.678 0.724
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## season "6.999" "2" "0.03"
## season:site "16.621" "8" "0.034"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## season 12.296
## season:site 15.064
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## season "0.064" "0.036"
## season:site "0.305" "0.249"
The multRM()
function provides a combination of the
approaches described above. It is suitable for repeated measures
designs, in which multiple outcomes have been recorded at each time
point. The multRM()
function takes as arguments:
formula
: A model formula object. The left hand side
contains the matrix of response variables (using cbind()) and the right
hand side contains the factor variables of interest. The within-subject
factor(s) must be specified last in the formula,
e.g. cbind(outcome1, outcome2) ~ between1 * between2 * within1 * within2
.data
: A data.frame, list or environment containing the
variables in formula
. Data must be in long format and must
not contain missing values.subject
: The column name of the subjects in the data.
NOTE: Subjects within different groups of between-subject factors must
have individual labels.within
: Specifies the within-subject factor(s) in the
formula.iter
: The number of iterations used for calculating the
resampled statistic. The default option is 10,000.alpha
: A number specifying the significance level; the
default is 0.05.resampling
: The resampling method to be used, one of
“paramBS” (parametric bootstrap approach) and “WildBS” (wild bootstrap
approach with Rademacher weights).para
: Logical, indicating whether parallel computing
should be used. Default is FALSE.CPU
: The number of cores used for parallel computing.
If omitted, cores are detected automatically.seed
: A random seed for the resampling procedure. If
omitted, no reproducible seed is set.dec
: Number of decimals the results should be rounded
to. Default is 3.As an example, we again use the EEG dataset. This time, imagine we
have two outcomes (brainrate and complexity) measured for each of the
three regions (within-subject factor). We additionally consider the
between-subject factor sex. The tidyr
package can be used
to transform our original data to this format.
if (requireNamespace("tidyr", quietly = TRUE)) {
library(tidyr)
eeg <- spread(EEG, feature, resp)
head(eeg)
fit <- multRM(cbind(brainrate, complexity) ~ sex * region, data = eeg, subject = "id", within = "region", iter = 1000)
summary(fit)
}
## Call:
## cbind(brainrate, complexity) ~ sex * region
## A multivariate repeated measures analysis with 1 within-subject factor(s) ( region )and 1 between-subject factor(s).
##
## Descriptive:
## sex region n brainrate complexity
## 1 M central 59 -0.254 -0.302
## 2 M frontal 59 -0.335 -0.399
## 3 M temporal 59 -0.294 -0.386
## 4 W central 101 0.149 0.176
## 5 W frontal 101 0.195 0.233
## 6 W temporal 101 0.172 0.226
##
## Wald-Type Statistic (WTS):
## Test statistic df p-value
## sex "12.45" "2" "0.002"
## region "0.192" "4" "0.996"
## sex:region "2.79" "4" "0.593"
##
## modified ANOVA-Type Statistic (MATS):
## Test statistic
## sex 54.203
## region 0.048
## sex:region 0.703
##
## p-values resampling:
## paramBS (WTS) paramBS (MATS)
## sex "0.002" "<0.001"
## region "0.997" "0.992"
## sex:region "0.609" "0.431"
The output is similar to that of MANOVA()
described
above.
Bathke, A. et al. (2018). Testing Mean Differences among Groups: Multivariate and Repeated Measures Analysis with Minimal Assumptions. Multivariate Behavioral Research, 53(3), 348-359, DOI: 10.1080/00273171.2018.1446320.
Brunner, E. (2001). Asymptotic and approximate analysis of repeated measures designs under heteroscedasticity. Mathematical Statistics with Applications in Biometry.
Friedrich, S., Brunner, E. and Pauly, M. (2017). Permuting longitudinal data in spite of the dependencies. Journal of Multivariate Analysis, 153, 255-265.
Friedrich, S., and Pauly, M. (2018). MATS: Inference for potentially singular and heteroscedastic MANOVA. Journal of Multivariate Analysis, 165, 166-179, DOI: 10.1016/j.jmva.2017.12.008.
Friedrich, S., Konietschke, F., and Pauly, M. (2019). Resampling-Based Analysis of Multivariate Data and Repeated Measures Designs with the R Package MANOVA.RM. The R Journal, 11(2), 380-400.
Konietschke, F., Bathke, A. C., Harrar, S. W. and Pauly, M. (2015). Parametric and nonparametric bootstrap methods for general MANOVA. Journal of Multivariate Analysis, 140, 291-301.