| Type: | Package | 
| Title: | Permutations Tests and Performance Indicator for Zero-Inflated Proportions Response | 
| Version: | 0.1.1 | 
| Date: | 2021-06-07 | 
| Author: | Melina Ribaud | 
| Maintainer: | Melina Ribaud <melina.ribaud@gmail.com> | 
| Description: | Permutation tests to identify factors correlated with a zero-inflated proportions response. Provides a performance indicator based on Spearman correlation to quantify the part of the correlation explained by the selected set of factors. For details of the method, see the preprint at https://hal.archives-ouvertes.fr/hal-02936779v3. | 
| URL: | https://gitlab.paca.inrae.fr/meribaud/ziprop | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Depends: | R (≥ 3.5.0), rgenoud, purrr, data.table, parallel | 
| Suggests: | markdown, knitr, ggplot2, ggrepel, ggthemes, kableExtra, stringr | 
| RoxygenNote: | 7.1.1 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2021-06-09 12:50:02 UTC; melinaribaud | 
| Repository: | CRAN | 
| Date/Publication: | 2021-06-09 13:20:02 UTC | 
Statistic for non-numeric factor tests
Description
Statistic for non-numeric factor tests (same statistic as H-test).
Usage
T_stat_discr(permu, al)
Arguments
| permu | the response vector. | 
| al | the factor. | 
Value
the statistic.
Examples
permu = runif(100,-10,10)
al = as.factor(sample(1:3,100,replace=TRUE))
T_stat_discr(permu, al)
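For reference, the following base-R sketch computes the Kruskal-Wallis H statistic, assuming that "H-test" above refers to the Kruskal-Wallis test. The helper name h_stat is hypothetical and not part of ZIprop; T_stat_discr may differ in scaling or in how it handles ties.

```r
# Hypothetical helper (not part of ZIprop): Kruskal-Wallis H statistic
# without tie correction.
h_stat <- function(y, g) {
  g <- as.factor(g)
  r <- rank(y)                    # ranks of the pooled response
  n <- length(y)
  n_g <- tapply(r, g, length)     # group sizes
  rbar_g <- tapply(r, g, mean)    # mean rank per group
  12 / (n * (n + 1)) * sum(n_g * (rbar_g - (n + 1) / 2)^2)
}

set.seed(1)
permu <- runif(100, -10, 10)
al <- as.factor(sample(1:3, 100, replace = TRUE))
h <- h_stat(permu, al)

# Without ties, this agrees with the statistic from stats::kruskal.test
h_ref <- unname(kruskal.test(permu, al)$statistic)
```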
Statistic for non-numeric factor multiple tests
Description
Statistic for non-numeric factor multiple tests (difference in mean ranks).
Usage
T_stat_multi(permu, al)
Arguments
| permu | the response vector. | 
| al | the factor. | 
Value
the difference in mean ranks between two levels of a discrete factor.
Examples
permu = runif(100,-10,10)
al = as.factor(sample(1:3,100,replace=TRUE))
T_stat_multi(permu, al)
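A difference in mean ranks between two levels can be sketched in base R as follows. The helper mean_rank_diff is hypothetical and not part of ZIprop, and ranking over the whole response (rather than within the pair of levels) is an assumption.

```r
# Hypothetical helper (not part of ZIprop): difference in mean ranks
# between two levels of a factor, with ranks taken over the whole response.
mean_rank_diff <- function(y, g, lev1, lev2) {
  r <- rank(y)
  mean(r[g == lev1]) - mean(r[g == lev2])
}

set.seed(1)
permu <- runif(100, -10, 10)
al <- as.factor(sample(1:3, 100, replace = TRUE))
d12 <- mean_rank_diff(permu, al, "1", "2")
d21 <- mean_rank_diff(permu, al, "2", "1")   # same magnitude, opposite sign
```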
ZIprop: A package for Zero-Inflated Proportions data (ZIprop)
Description
We propose a block-permutation-based methodology (i) to identify factors (discrete or continuous) that are potentially significant, and (ii) to define a performance indicator that quantifies the percentage of correlation explained by the subset of significant factors for Zero-Inflated Proportions data (ZIprop).
References
Melina Ribaud, Edith Gabriel, Joseph Hughes, Samuel Soubeyrand. Identifying potential significant factors impacting zero-inflated proportions data. 2020. hal-02936779
The scalar delta
Description
Calculate the scalar delta.
This parameter comes from the optimal Spearman's correlation
when the ranks of two vectors X and proba are equal except on a given set of indices.
In our context, this set corresponds to the zero values of the vector proba.
Usage
delta(X, proba)
Arguments
| X | a vector. | 
| proba | a zero-inflated proportions response. | 
Value
Delta, the scalar delta calculated for the vector X
and the vector proba.
Examples
X = rnorm(100)
proba = runif(100)
proba[sample(1:100,80)]=0
Delta = delta(X,proba)
print(Delta)
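To illustrate why a zero-inflated response limits the attainable Spearman correlation, the sketch below builds a vector whose ranks agree with those of proba on the non-zero entries and differ only on the zeros. It uses base R only, does not reproduce the formula behind delta, and the variable names are illustrative.

```r
set.seed(1)
n <- 100
proba <- runif(n)
proba[sample(1:n, 80)] <- 0          # 80 zeros, all tied at the lowest rank

# X_best keeps the ordering of proba on the non-zero entries; the zeros get
# distinct values below every non-zero entry (arbitrary order among them).
X_best <- proba
zero <- proba == 0
X_best[zero] <- -runif(sum(zero))

rho_best <- cor(X_best, proba, method = "spearman")  # below 1 because of the ties
rho_self <- cor(proba, proba, method = "spearman")   # reference value
```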
diffFactors
Description
Data for the comparison of COVID-19 mortality in European and North American geographic entities
Usage
data(diffFactors)
Format
A data frame with 483 rows and 32 variables
Details
- geographic_entity_receptor is the receptor entity 
- geographic_entity_source is the source entity 
- proba is the probability that the receptor follows the mortality dynamics of the source 
- the other columns are the differences between factors 
Author(s)
Melina Ribaud, Davide Martinetti and Samuel Soubeyrand
equineDiffFactors
Description
Equine Influenza dataset
Usage
data(equineDiffFactors)
Format
A data frame with 2256 rows and 8 variables
Details
- ID.source is the ID of the source hosts 
- ID.recep is the ID of the receiver hosts 
- y is the vector of transmission probabilities from source to receiver 
- the other columns are the factors 
Author(s)
Melina Ribaud and Joseph Hughes
Zero-inflated proportions dataset
Description
An example dataset for testing the package functions. The factors X1 to X5 and F1 to F5 are correlated with the response y.
Usage
data(example_data)
Format
A data frame with 440 rows and 23 variables
Details
- ID.source is the ID of the source hosts 
- ID.recep is the ID of the receiver hosts 
- y is the vector of transmission probabilities from source to receiver 
- X1 to X10 are continuous factors 
- F1 to F10 are discrete factors 
Turn a factor into multiple columns
Description
Turns a factor with several levels into a matrix with several columns composed of zeros and ones.
Usage
fact2mat(x)
Arguments
| x | a vector. | 
Value
A matrix whose columns are composed of zeros and ones.
Examples
x = sample(1:3,100,replace = TRUE)
fact2mat(x)
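The expansion described above can be sketched in base R as one indicator column per level. The helper one_hot is hypothetical and not part of ZIprop; fact2mat may name, order, or drop columns differently.

```r
# Hypothetical helper (not part of ZIprop): one 0/1 column per level.
one_hot <- function(x) {
  f <- as.factor(x)
  m <- sapply(levels(f), function(l) as.integer(f == l))
  colnames(m) <- levels(f)
  m
}

set.seed(1)
x <- sample(1:3, 100, replace = TRUE)
m <- one_hot(x)
head(m)
```

Each row contains exactly one 1, in the column of the level observed for that row.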
The performance indicator
Description
Calculate the indicator for a vector X
and a zero-inflated proportions response proba.
Usage
indicator(X, proba)
Arguments
| X | a vector. | 
| proba | a zero-inflated proportions response. | 
Value
a scalar: the performance indicator calculated for the vector X
and the vector proba.
Examples
X = rnorm(100)
proba = runif(100)
proba[sample(1:100,80)]=0
print(indicator(X,proba))
The max performance indicator
Description
Search for the set of parameters that maximizes the indicator (equivalent to Spearman correlation), for a given set of factors scaled between 0 and 1 and a zero-inflated proportions response.
Usage
indicator_max(
  DT,
  ColNameFactor,
  ColNameWeight = "weight",
  bounds = c(-10, 10),
  max_generations = 200,
  hard_limit = TRUE,
  wait_generations = 50,
  other_class = NULL
)
Arguments
| DT | a data table containing the factors and the response. | 
| ColNameFactor | a char vector with the names of the selected factors. | 
| ColNameWeight | a char with the name of the ZI response. | 
| bounds | lower and upper bounds (default c(-10, 10)). | 
| max_generations | default is 200; see genoud for more information. | 
| hard_limit | default is TRUE; see genoud for more information. | 
| wait_generations | default is 50; see genoud for more information. | 
| other_class | a char vector with the names of the factors of classes other than numeric (factor or char). | 
Value
Returns a list of two elements: the value of the indicator and the associated set of parameters (beta).
Examples
library(data.table)
data(example_data)
# For real cases increase max_generations and wait_generations
I_max = indicator_max(example_data,
names(example_data)[c(4:8, 14:18)],
ColNameWeight = "y",
max_generations = 20,
wait_generations = 5)
print(I_max)
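The idea of searching for weights within bounds that maximize a rank correlation can be sketched as follows. According to the argument descriptions, the package relies on genoud; this sketch swaps in a plain random search and uses the raw Spearman correlation as the objective instead of the package's indicator. All names (random_search_max, n_iter, the simulated data) are hypothetical.

```r
# Hypothetical sketch (not the ZIprop implementation): random search for the
# weights beta that maximise the Spearman correlation between X %*% beta and y.
random_search_max <- function(X, y, bounds = c(-10, 10), n_iter = 500) {
  best <- list(value = -Inf, beta = NULL)
  for (i in seq_len(n_iter)) {
    beta <- runif(ncol(X), bounds[1], bounds[2])   # candidate within bounds
    value <- cor(drop(X %*% beta), y, method = "spearman")
    if (value > best$value) best <- list(value = value, beta = beta)
  }
  best
}

set.seed(1)
n <- 200
X <- cbind(x1 = runif(n), x2 = runif(n))           # factors already in [0, 1]
# Simulated response with many exact zeros
y <- pmax(0, X[, "x1"] - 0.5 * X[, "x2"] + rnorm(n, sd = 0.1) - 0.3)
res <- random_search_max(X, y)
print(res)
```

A random search is only meant to show the shape of the problem; it gives no convergence guarantee and scales poorly with the number of factors.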
Construct Design Matrix
Description
Creates a design matrix by expanding factors to a set of dummy variables.
Usage
model_matrix(DT, ColNameFactor, other_class)
Arguments
| DT | a data table containing the factors and the response. | 
| ColNameFactor | a char vector with the names of the selected factors. | 
| other_class | a char vector with the names of the factors of classes other than numeric (factor or char). | 
Value
the design matrix.
Examples
library(data.table)
data(example_data)
m = model_matrix (example_data,
colnames(example_data)[-c(1:3)],
other_class = colnames(example_data)[14:23])
print(m)
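As a point of comparison, base R's stats::model.matrix performs the same kind of expansion. The toy data frame below is illustrative; note that stats::model.matrix adds an intercept column and drops the first level of each factor by default, which model_matrix in ZIprop may or may not do.

```r
# Base-R analogue (not the ZIprop implementation): numeric columns are kept
# as they are, factor columns are expanded into 0/1 dummy columns.
df <- data.frame(
  X1 = c(0.2, 0.5, 0.9, 0.1),
  F1 = factor(c("a", "b", "c", "a"))
)
mm <- model.matrix(~ X1 + F1, data = df)
colnames(mm)   # "(Intercept)" "X1" "F1b" "F1c"
```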
Permutations tests
Description
Permutation tests to identify factors correlated with a zero-inflated proportions response. The statistics are Spearman's correlation for numeric factors and the mean by level for other factors.
Usage
permDT(
  DT,
  ColNameFactor,
  B = 1000,
  nclust = 1,
  ColNameWeight = "weight",
  ColNameRecep = "ID.recep",
  ColNameSource = "ID.source",
  seed = NULL,
  no_const = FALSE,
  num_class = ColNameFactor,
  other_class = NULL,
  multiple_test = FALSE,
  adjust_method = "none",
  alpha = 0.05
)
Arguments
| DT | a data table containing the factors and the response. | 
| ColNameFactor | a char vector with the names of the selected factors. | 
| B | number of permutations (use at least B = 1000 for an accurate p-value). | 
| nclust | number of processes for parallel computation. | 
| ColNameWeight | a char with the name of the ZI response. | 
| ColNameRecep | name of the column containing the target names. | 
| ColNameSource | name of the column containing the contributor names. | 
| seed | vector with the seed for the permutations: size( | 
| no_const | FALSE to permute under the receiver block constraint; TRUE to permute with no constraint. | 
| num_class | a char vector with the names of the numeric factors. | 
| other_class | a char vector with the names of the factors of classes other than numeric (factor or char). | 
| multiple_test | only useful for discrete factors: set to TRUE to perform multiple tests. | 
| adjust_method | p-value adjustment method (default "none"). One of c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). | 
| alpha | significance level (default 0.05). | 
Value
A data frame with two columns: one for the statistic and one for the p-value.
Examples
library(data.table)
data(example_data)
res = permDT (example_data,
colnames(example_data)[c(4,10,14,20)],
B = 10,
nclust = 1,
ColNameWeight = "y",
ColNameRecep = "ID.recep",
ColNameSource = "ID.source",
seed = NULL,
num_class = colnames(example_data)[c(4,10)],
other_class = colnames(example_data)[c(14,20)])
print(res)
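The block constraint can be illustrated with a small base-R sketch that permutes the response within each receiver block and compares the observed Spearman correlation with its permutation distribution. This is not the ZIprop implementation: the helper perm_test_block, the two-sided p-value with a +1 correction, and the simulated data are all assumptions. The last line shows how stats::p.adjust applies one of the adjustment methods listed for adjust_method.

```r
# Hypothetical sketch (not the ZIprop implementation): permutation test for a
# numeric factor, permuting the response within each receiver block.
perm_test_block <- function(y, x, block, B = 200) {
  stat_obs <- cor(x, y, method = "spearman")
  stat_perm <- replicate(B, {
    # shuffle y separately inside every block
    y_perm <- ave(y, block, FUN = function(v) v[sample.int(length(v))])
    cor(x, y_perm, method = "spearman")
  })
  # two-sided p-value with a +1 correction (an assumption of this sketch)
  p <- (1 + sum(abs(stat_perm) >= abs(stat_obs))) / (B + 1)
  c(statistic = stat_obs, p.value = p)
}

set.seed(1)
block <- rep(1:20, each = 10)                 # 20 receivers, 10 sources each
x <- runif(200)
y <- pmax(0, x - 0.6 + rnorm(200, sd = 0.1))  # zero-inflated, linked to x
res_x <- perm_test_block(y, x, block)
res_noise <- perm_test_block(y, runif(200), block)   # unrelated factor

p_raw <- unname(c(res_x["p.value"], res_noise["p.value"]))
p_adj <- p.adjust(p_raw, method = "holm")
```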
Scale vector
Description
Scale a vector between 0 and 1.
Usage
scale_01(x)
Arguments
| x | a vector. | 
Value
the scaled vector of x.
Examples
x = runif(100,-10,10)
x_scale = scale_01(x)
range(x_scale)
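Scaling between 0 and 1 is commonly done with min-max scaling, sketched below. The helper min_max is hypothetical and not part of ZIprop; whether scale_01 uses exactly this formula is an assumption.

```r
# Hypothetical helper (not part of ZIprop): min-max scaling to [0, 1].
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

set.seed(1)
x <- runif(100, -10, 10)
x01 <- min_max(x)
range(x01)   # 0 1
```

A constant vector makes the denominator zero and yields NaN, so such inputs need separate handling.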