Title: | A Toolbox for Public Health and Epidemiology |
Version: | 3.0.0 |
Maintainer: | Josie Athens <josie.athens@gmail.com> |
Description: | A toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. Includes a function to report coefficients and confidence intervals from models using robust standard errors (when available), functions that expand 'ggplot2' plots and functions relevant for introductory papers in Epidemiology or Public Health. Please note that use of the provided data sets is for educational purposes only. |
Depends: | R (≥ 4.5.0), emmeans, ggplot2, gtsummary, huxtable, stats |
Imports: | car, dplyr, Epi, epitools, jtools, lmtest, performance, sandwich, sjlabelled, sjmisc, survival, tibble, tidyselect |
Suggests: | broom, broom.helpers, cardx, easystats, effectsize, ggeffects, ISwR, knitr, MASS, nlme, rstatix |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.3 |
VignetteBuilder: | knitr |
BugReports: | https://github.com/josie-athens/pubh/issues |
NeedsCompilation: | no |
Packaged: | 2025-10-10 06:21:18 UTC; josie |
Author: | Josie Athens [aut, cre], Frank Harell [ctb], John Fox [ctb], R-Core [ctb] |
Repository: | CRAN |
Date/Publication: | 2025-10-21 11:20:02 UTC |
Survival of patients with sepsis.
Description
A randomised, double-blind, placebo-controlled trial of intravenous ibuprofen in 455 patients who had sepsis, defined as fever, tachycardia, tachypnea, and acute failure of at least one organ system.
Usage
Bernard
Format
A labelled tibble with 455 rows and 9 variables:
- id
Patient ID
- treat
Treatment, factor with levels "Placebo" and "Ibuprofen".
- race
Race/ethnicity, factor with levels "White", "African American" and "Other".
- fate
Mortality status at 30 days, factor with levels "Alive" and "Dead".
- apache
Baseline APACHE score.
- o2del
Oxygen delivery at baseline.
- followup
Follow-up time in hours.
- temp0
Baseline temperature in centigrades.
- temp10
Temperature after 36 hr in centigrades.
Source
Bernard, GR, et al. (1997) The effects of ibuprofen on the physiology and survival of patients with sepsis, N Engl J Med 336: 912–918.
Prevalence of Helicobacter pylori infection in preschool children.
Description
A data set containing the prevalence of Helicobacter pylori infection in preschool children according to parental history of duodenal or gastric ulcer.
Usage
Brenner
Format
A labelled tibble with 863 rows and 2 variables:
- ulcer
History of duodenal or gastric ulcer, factor with levels "No" and "Yes".
- infected
Infected with Helicobacter pylori, factor with levels "No" and "Yes".
Source
Brenner H, Rothenbacher D, Bode G, Adler G (1998) Parental history of gastric or duodenal ulcer and prevalence of Helicobacter pylori infection in preschool children: population based study. BMJ 316:665.
Migraine pain reduction.
Description
Randomised control trial on children suffering from frequent and severe migraine. Control group represents untreated children. The active treatments were either relaxation alone or relaxation with biofeedback.
Usage
Fentress
Format
A labelled tibble with 18 rows and 2 variables:
- pain
Reduction in weekly headache activity expressed as percentage of baseline data.
- group
Group, a factor with levels "Untreated", "Relaxation" (alone) and "Biofeedback" (relaxation and biofeedback).
Source
Fentress, DW, et al. (1986) Biofeedback and relaxation-response in the treatment of pediatric migraine. Dev Med Child Neurol 28:1 39-46.
Altman, DA (1991) Practical statistics for medical research. Chapman & Hall/CRC.
Examples
data(Fentress)
Fentress |>
strip_error(x = group, y = pain)
T-cell counts from Hodgkin's disease patients.
Description
Number of CD4+ T-cells and CD8+ T-cells in blood samples from patients in remission from Hodgkin's disease or in remission from disseminated malignancies.
Usage
Hodgkin
Format
A labelled tibble with 40 rows and 3 variables:
- CD4
Concentration of CD4+ T-cells (cells / mm^3).
- CD8
Concentration of CD8+ T-cells (cells / mm^3).
- Group
Group, factor with levels "Non-Hodgkin" and "Hodgkin".
Source
Shapiro, CM, et al (1986) Immunologic status of patients in remission from Hodgkin's disease and disseminated malignancies. Am J Med Sci 293:366-370.
Altman, DA (1991) Practical statistics for medical research. Chapman & Hall/CRC.
Examples
data(Hodgkin)
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
Hodgkin <- Hodgkin |>
mutate(
Ratio = CD4 / CD8
) |>
var_labels(
Ratio = "CD4+ / CD8+ T-cells"
)
estat(~ Ratio | Group, data = Hodgkin)
Hodgkin |>
qq_plot(y = Ratio) +
facet_grid(cols = vars(Group))
Body weight and plasma volume.
Description
Body weight and plasma volume in eight healthy men.
Usage
Kirkwood
Format
A labelled data frame with 8 rows and 3 variables:
- subject
Subject ID.
- weight
Body weight in kg.
- volume
Plasma volume in litres.
Source
Kirkwood, BR and Sterne, JAC (2003) Essential Medical Statistics. Second Edition. Blackwell.
Examples
data(Kirkwood)
Kirkwood |>
ggplot(aes(weight, volume)) +
geom_point()
Breast cancer and age of childbirth.
Description
An international case-control study to test the hypothesis that breast cancer is related to the age that a woman gives childbirth.
Usage
Macmahon
Format
A labelled tibble with 185 rows and 2 variables:
- cancer
Diagnosed with breast cancer, a factor with levels "No" and "Yes".
- age
Age mother gives childbirth, factor with levels "<20", "20-24", "25-29", "30-34" and ">34".
Source
Macmahon, B. et al. (1970). Age at first birth and breast cancer risk. Bull WHO 43, 209-221.
Onchocerciasis in Sierra Leone.
Description
Study of onchocerciasis ("river blindness") in Sierra Leone, in which subjects were classified according to whether they lived in villages in savannah or rainforest area.
Usage
Oncho
Format
A labelled tibble with 1302 rows and 7 variables:
- id
Subject ID.
- mf
Infected with Onchocerciasis volvulus, factor with levels "Not-infected" and "Infected".
- area
Area of residence, factor with levels "Savannah" and "Rainforest".
- agegrp
Age group in years, factor with levels "5-9", "10-19", "20-39" and "40+".
- sex
Subject sex, factor with levels "Male" and "Female".
- mfload
Microfiliariae load.
- lesions
Severe eye lesions, factor with levels "No" and "Yes".
Source
McMahon, JE, Sowa, SIC, Maude, GH and Kirkwood BR (1988) Onchocerciasis in Sierra Leone 2: a comparison of forest and savannah villages. Trans Roy Soc Trop Med Hyg 82: 595-600.
Kirkwood, BR and Sterne, JAC (2003) Essential Medical Statistics. Second Edition. Blackwell.
Extracorporeal membrane oxygenation in neonates.
Description
A clinical trial on the value of extracorporeal membrane oxygenation for term neonates with severe respiratory failure. RCT compares active treatment against conventional management.
Usage
Roberts
Format
A labelled tibble with 185 rows and 2 variables:
- emo
Extracorporeal membrane oxygenation treatment, factor with levels "No" and "Yes".
- survived
One year survival, factor with levels "No" and "Yes".
Source
Roberts, TE (1998) Extracorporeal Membrane Oxygenation Economics Working Group. Economic evaluation and randomised controlled trial of extracorporeal membrane oxygenation: UK collaborative trial. Brit Med J 317:911-16.
Oral contraceptives and stroke.
Description
A case-control study of oral contraceptives and stroke in young women with presence or absence of hypertension. Cases represent thrombotic stroke and controls are hospital controls. The group of no hypertension includes normal blood pressure (<140/90 mm Hg) and borderline hypertension (140-159/90-94 mm Hg). Hypertension group includes moderate hypertension (160-179/95-109 mm Hg) and severe hypertension (180+/110+ mm Hg). This data has been used as an example of join exposure by Rothman for measuring interactions (see examples).
Usage
Rothman
Format
A labelled tibble with 477 rows and 3 variables:
- stroke
Thrombotic stroke, factor with levels "No" and "Yes".
- oc
Current user of oral contraceptives, factor with levels "Non-user" and "User".
- ht
Hypertension, factor with levels "No" (<160/95 mm Hg) and "Yes".
Source
Collaborative Group for the Study of Stroke in Young Women (1975) Oral contraceptives and stroke in young women. JAMA 231:718-722.
Rothman, KJ (2002) Epidemiology. An Introduction. Oxford University Press.
Examples
data(Rothman)
mhor(stroke ~ ht / oc, data = Rothman)
## Model with standard interaction term:
model1 <- glm(stroke ~ ht * oc, data = Rothman, family = binomial)
glm_coef(model1)
## Model considering join exposure:
Rothman$join <- 0
Rothman$join[Rothman$oc == "Non-user" & Rothman$ht == "Yes"] <- 1
Rothman$join[Rothman$oc == "User" & Rothman$ht == "No"] <- 2
Rothman$join[Rothman$oc == "User" & Rothman$ht == "Yes"] <- 3
Rothman$join <- factor(Rothman$join, labels = c(
"Unexposed", "Hypertension", "OC user",
"OC and hypertension"
))
require(sjlabelled, quietly = TRUE)
Rothman$join <- set_label(Rothman$join, label = "Exposure")
model2 <- glm(stroke ~ join, data = Rothman, family = binomial)
glm_coef(model2)
Passive smoking in adulthood and cancer risk.
Description
A case-control study to investigate the effects of passive smoking on cancer. Passive smoking was defined as exposure to the cigarette smoke of a spouse who smoked at least one cigarette per day for at least 6 months.
Usage
Sandler
Format
A labelled tibble with 998 rows and 3 variables:
- passive
Passive smoker, factor with levels "No" and "Yes".
- cancer
Diagnosed with cancer, factor with levels "No" and "Yes".
- smoke
Active smoker, factor with levels "No" and "Yes".
Source
Sandler, DP, Everson, RB, Wilcox, AJ (1985). Passive smoking in adulthood and cancer risk. Amer J Epidem, 121: 37-48.
Examples
data(Sandler)
mhor(cancer ~ smoke / passive, data = Sandler)
Measured and self-reported weight in New Zealand.
Description
Data on measured and self-reported weight from 40–50 year old participants in the 1989/1990 Life In New Zealand Survey.
Usage
Sharples
Format
A tibble with 343 rows and 4 variables:
- srweight
Self-reported weight in kg.
- weight
Measured weight in kg.
- srbmi
Body mass index calculated from self-reported weight and self-reported height in kg/m^2.
- mbmi
Body mass index calculated from measured weight and measured height in kg/m^2.
Source
Sharples, H, et al. (2012) Agreement between measured and self-reported height, weight and BMI in predominantly European middle-aged New Zealanders: findings from a nationwide 1989 survey. New Zealand Med J 125: 60-69.
Examples
Sharples |>
bland_altman(x = weight, y = srweight,
transform = TRUE, size = 0.7) +
xlab("Mean of weights (kg)") +
ylab("Measured / Self-reported weight")
RCT on the treatment of epilepsy.
Description
Randomised control trial of an antiepilectic drug (prograbide), in which the number of seizures of 59 patients at baseline and other four follow-up visits were recorded.
Usage
Thall
Format
A tibble with 59 rows and 8 variables:
- id
Subject ID.
- treat
Treatment, factor with levels "Control" and "Prograbide".
- base
Number of seizures at baseline.
- age
Age in years at baseline.
- y1
Number of seizures at year one follow-up.
- y2
Number of seizures at year two follow-up.
- y3
Number of seizures at year three follow-up.
- y4
Number of seizures at year four follow-up.
Source
Thall, PF and Vail, SC (1990) Some covariance models for longitudinal count data with over-dispersion. Biometrics, 46: 657-671.
Stukel, TA (1993) Comparison of methods for the analysis of longitudinal data. Statistics Med 12: 1339-1351.
Shoukri, MM and Chaudhary, MA (2007) Analysis of correlated data with SAS and R. Third Edition. Chapman & Hall/CRC.
Peak knee velocity in walking at flexion and extension.
Description
Data of peak knee velocity in walking at flexion and extension in studies about functional performance in cerebral palsy.
Usage
Tuzson
Format
A labelled tibble with 18 rows and 2 variables:
- flexion
Peak knee velocity in gait: flexion (degree/s).
- extension
Peak knee velocity in gait: extension (degree/s).
Source
Tuzson, AE, Granata, KP, and Abel, MF (2003) Spastic velocity threshold constrains functional performance in cerebral palsy. Arch Phys Med Rehabil 84: 1363-1368.
Examples
data(Tuzson)
Tuzson |>
ggplot(aes(extension, flexion)) +
geom_point()
cor.test(~ flexion + extension, data = Tuzson)
Smoking and mortality in Whickham, England.
Description
Data represents women participating in a health survey in Whickham, England in 1972-1974.
Usage
Vanderpump
Format
A labelled tibble with 1314 rows and 3 variables:
- vstatus
Vitality status, factor with levels "Alive" and "Death".
- smoker
Smoking status, factor with levels "Non-smoker" and "Smoker".
- agegrp
Age group, factor with levels "18-44", "45-64" and "64+".
Source
Vanderpump, MP, et al (1996) Thyroid, 6:155-160.
Appleton, DR, French, JM and Vanderpump, PJ (1996) Ignoring a covariate: An example of Simpson's paradox. The American Statistician 50:340-341.
Vittinghoff, E, Glidden, DV, Shiboski, SC and McCulloh, CE (2005) Regression methods in Biostatistics. Springer.
Examples
data(Vanderpump)
mhor(vstatus ~ agegrp / smoker, data = Vanderpump)
Bar charts with error bars.
Description
bar_error
constructs bar charts in with error bars showing 95
confidence intervals around mean values. High of bars represent mean values.
Usage
bar_error(data, x, y, alpha = 0.7)
Arguments
data |
A data frame. |
x |
A variable in the data frame. |
y |
A variable in the data frame |
alpha |
Opacity of the colour fill (0 = invisible, 1 = opaque). |
Examples
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
data(birthwt, package = "MASS")
birthwt <- birthwt |>
mutate(
smoke = factor(smoke, labels = c("Non-smoker", "Smoker"))
) |>
var_labels(
bwt = "Birth weight (g)",
smoke = "Smoking status"
)
birthwt |>
bar_error(x = smoke, y = bwt)
Bland-Altman agreement plots.
Description
Bland-Altman agreement plots.
Usage
bland_altman(data, x, y, size = 1, transform = FALSE)
Arguments
data |
A data frame. |
x |
A numerical variable. |
y |
A numerical variable. |
size |
Size of the symbol using to plot data. |
transform |
Logical, should ratios instead of differences be used to construct the plot? |
Details
bland_altman
constructs Bland-Altman agreement plots.
Variables in formula
are continuous paired observations. When the distribution of the outcome
is not normal, but becomes normal with a log-transformation, bland_altman
can plot the ratio between
outcomes (difference in the log scale) by using option transform = TRUE
.
Examples
data(Sharples)
Sharples |>
bland_altman(
x = weight, y = srweight,
transform = TRUE, size = 0.7
) +
xlab("Mean of weights (kg)") +
ylab("Measured / Self-reported weight")
Palette inspired by the British Medical Journal.
Description
Palette inspired by the British Medical Journal.
Usage
bmj
Format
An object of class character
of length 9.
Bootstrap Confidence Intervals.
Description
bst
estimates confidence intervals around the mean
, median
or geo_mean
.
Usage
bst(x, stat = "mean", n = 1000, CI = 95, digits = 2)
Arguments
x |
A numerical variable. Missing observations are removed by default. |
stat |
Statistic, either "mean" (default), "median" or "gmean" (geometric mean). |
n |
Number of replicates for the bootstrap (n=1000 by default). |
CI |
Confidence intervals (CI=95 by default). |
digits |
Number of digits for rounding (default = 2). |
Value
A data frame with the estimate and confidence intervals.
Examples
data(IgM, package = "ISwR")
bst(IgM, "median")
bst(IgM, "gmean")
Coefficient of determination.
Description
coef_det
estimates the coefficient of determination (r-squared) from fitted (predicted) and
observed values. Outcome from the model is assumed to be numerical.
Usage
coef_det(obs, fit)
Arguments
obs |
Vector with observed values (numerical outcome). |
fit |
Vector with fitted (predicted) values. |
Value
A scalar, the coefficient of determination (r-squared).
Examples
## Linear regression:
Riboflavin <- seq(0, 80, 10)
OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03)
titration <- data.frame(Riboflavin, OD)
model1 <- lm(OD ~ Riboflavin, data = titration)
summary(model1)
coef_det(titration$OD, fitted(model1))
## Non-linear regression:
library(nlme, quietly = TRUE)
data(Puromycin)
mm.tx <- gnls(rate ~ SSmicmen(conc, Vm, K),
data = Puromycin,
subset = state == "treated"
)
summary(mm.tx)
coef_det(Puromycin$rate[1:12], mm.tx$fitted)
Cosmetics for tables of regression coefficients.
Converts tables generated by tbl_regression
to huxtable
and adds some cosmetics.
Description
Cosmetics for tables of regression coefficients.
Converts tables generated by tbl_regression
to huxtable
and adds some cosmetics.
Usage
cosm_reg(gt_tbl, pad = 3, type = 3, bold = TRUE, head_label = "**Variable**")
Arguments
gt_tbl |
A table object generated by |
pad |
Numerical, padding above and bellow rows. |
type |
Anova's type to calculate global p-values. |
bold |
Display labels in bold? |
head_label |
Character, label to be used as head for the variable's column. |
Value
A huxtable
.
Examples
require(sjlabelled, quietly = TRUE)
data(diet, package = "Epi")
diet <- diet |>
var_labels(
chd = "Coronary Heart Disease",
fibre = "Fibre intake (g/day)"
)
model_binom <- glm(chd ~ fibre, data = diet, family = binomial)
model_binom |>
tbl_regression(exponentiate = TRUE) |>
cosm_reg(bold = TRUE) |>
theme_pubh(1) |>
add_footnote(get_r2(model_binom), font_size = 9)
data(birthwt, package = "MASS")
birthwt <- birthwt |>
mutate(
smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
race = factor(race, labels = c("White", "African American", "Other"))
) |>
var_labels(
bwt = "Birth weight (g)",
smoke = "Smoking status",
race = "Race"
)
model_norm <- lm(bwt ~ smoke + race, data = birthwt)
model_norm |>
tbl_regression() |>
cosm_reg(bold = TRUE) |>
theme_pubh(1) |>
add_footnote(get_r2(model_norm), font_size = 9)
Cosmetics for summary tables Adds some cosmetics to tables of descriptive statistics generated by tbl_summary.
Description
Cosmetics for summary tables Adds some cosmetics to tables of descriptive statistics generated by tbl_summary.
Usage
cosm_sum(gt_tbl, pad = 3, bold = FALSE, head_label = "**Variable**")
Arguments
gt_tbl |
A table object generated by |
pad |
Numerical, padding above and bellow rows. |
bold |
Display labels in bold? |
head_label |
Character, label to be used as head for the variable's column. |
Details
Function cosm_sum
adds some cosmetics to tables generated by tbl_summary
, then converts the table as a huxtable
and sets proper alignment.
Value
A huxtable
.
Examples
require(dplyr, quietly = TRUE)
data(Oncho)
Oncho |>
select(-id) |>
tbl_summary() |>
cosm_sum(bold = TRUE) |>
theme_pubh(1)
Table of descriptive statistics by categorical variable.
Description
cross_tbl
is a wrapper to function from package tbl_summary
that constructs tables of descriptive statistics stratified by levels of a categorical outcome.
Usage
cross_tbl(
data,
by,
head_label = " ",
bold = TRUE,
show_total = TRUE,
p_val = FALSE,
pad = 3,
method = 2,
...
)
Arguments
data |
A data frame where the variables in the |
by |
The quoted name of the categorical variable (factor) used for the stratification. |
head_label |
Character, label to be used as head for the variable's column. |
bold |
Display labels in bold? |
show_total |
Logical, show column with totals? |
p_val |
Logical, show p-values? |
pad |
Numerical, padding above and bellow rows. |
method |
An integer indicating methods for continuous variables. 1 Reports means and standard deviations. 2 Reports medians and interquartile ranges. |
... |
Additional arguments passed to |
Details
Function cross_tbl
is a relatively simple wrapper to function tbl_summary
. It constructs contingency tables and can also be used to report a table with descriptives for all variables stratified by one of the variables. Please see examples to see how to list variables. If data is labelled, the label of the stratifying variable is used as part of the header.
Value
A huxtable with descriptive statistics stratified by levels of the outcome.
See Also
Examples
require(dplyr, quietly = TRUE)
#' data(Oncho)
## A two by two contingency table:
Oncho |>
select(mf, area) |>
cross_tbl(by = "mf", bold = TRUE) |>
theme_pubh(2)
## Reporting prevalence:
Oncho |>
select(mf, area) |>
cross_tbl(by = "area", bold = TRUE) |>
theme_pubh(2)
## Descriptive statistics for all variables in the \code{Oncho} data set except \code{id}.
Oncho |>
select(-id) |>
cross_tbl(by = "mf", bold = TRUE) |>
theme_pubh(2)
Constructs effect plots on for cases when the response (outcome) is a numerical variable and the explanatory (exposure) is a categorical variable.
Description
effect_plot
constructs plots with error bars on which the upper and lower limits
of the bars are given by the variables CI_high
and CI_low
, respectively.
Usage
effect_plot(data, x, y, orientation = "x")
Arguments
data |
A data frame. |
x |
A variable in the data frame. |
y |
A variable in the data frame. |
orientation |
A string indicating the orientation of the bars. |
Value
An error bars plot, for example, those to show the effects of a categorical variable on a numerical variable.
Examples
data(Fentress)
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
pain_cis <- Fentress |>
gen_bst_df(pain ~ group)
names(pain_cis)[1] = "Mean"
pain_cis = pain_cis |>
var_labels(Mean = "Pain reduction")
pain_cis |>
effect_plot(x = Cohort, y = Mean)
Descriptive statistics for continuous variables.
Description
estat
calculates descriptives of numerical variables.
Usage
estat(object = NULL, formula = NULL, data = NULL, digits = 2, label = NULL)
Arguments
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: ~ x or ~ x|z (for groups). |
data |
A data frame where the variables in the |
digits |
Number of digits for rounding (default = 2). |
label |
Label used to display the name of the variable (see examples). |
Value
A data frame with descriptive statistics.
See Also
Examples
data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
kfm <- kfm |>
var_labels(
dl.milk = "Breast-milk intake (dl/day)",
sex = "Sex",
weight = "Child weight (kg)",
ml.suppl = "Milk substitute (ml/day)",
mat.weight = "Maternal weight (kg)",
mat.height = "Maternal height (cm)"
)
kfm |>
estat(~dl.milk)
estat(~ dl.milk | sex, data = kfm)
kfm |>
estat(~ weight | sex)
Expand a data frame.
Description
expand_df
expands a data frame by a vector of frequencies.
Usage
expand_df(aggregate.data, index.var = "Freq", retain.freq = FALSE)
Arguments
aggregate.data |
A data frame. |
index.var |
A numerical variable with the frequencies (counts). |
retain.freq |
Logical expression indicating if frequencies should be kept. |
Details
This is a generic function that resembles weighted frequencies in other statistical packages
(for example, Stata). expand.df
was adapted from a function developed by deprecated package
epicalc
(now package epiDisplay
).
Value
An expanded data frame with replicates given by the frequencies.
Examples
Freq <- c(5032, 5095, 41, 204)
Mortality <- gl(2, 2, labels = c("No", "Yes"))
Calcium <- gl(2, 1, 4, labels = c("No", "Yes"))
anyca <- data.frame(Freq, Mortality, Calcium)
anyca
anyca.exp <- expand_df(anyca)
with(anyca.exp, table(Calcium, Mortality))
Relative and Cumulative Frequency.
Description
freq_cont
tabulates a continuous variable by given classes.
Usage
freq_cont(x, bks, dg = 2)
Arguments
x |
A numerical (continuous) variable. Ideally, relatively long (greater than 100 observations). |
bks |
Breaks defining the classes (see example). |
dg |
Number of digits for rounding (default = 2). |
Value
A data frame with the classes, the mid-point, the frequencies, the relative and cumulative frequencies.
Examples
data(IgM, package = "ISwR")
Ab <- data.frame(IgM)
estat(~IgM, data = Ab)
freq_cont(IgM, seq(0, 4.5, 0.5))
Generate a data frame with estimate and bootstrap CIs.
Description
gen_bst_df
is a function called that generates a data frame with
confidence intervals of a continuous variable by levels of one or two categorical ones (factors).
Usage
gen_bst_df(object = NULL, formula = NULL, data = NULL, stat = "mean", ...)
Arguments
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: |
data |
A data frame where the variables in the |
stat |
Statistic used for bst. |
... |
Passes optional arguments to bst. |
Value
A data frame with the confidence intervals by level.
Examples
data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
kfm <- kfm |>
var_labels(
dl.milk = "Breast-milk intake (dl/day)",
sex = "Sex",
weight = "Child weight (kg)",
ml.suppl = "Milk substitute (ml/day)",
mat.weight = "Maternal weight (kg)",
mat.height = "Maternal height (cm)"
)
kfm |>
gen_bst_df(dl.milk ~ sex)
data(Fentress)
Fentress |> gen_bst_df(pain ~ group)
Geometric mean.
Description
Geometric mean.
Usage
geo_mean(x)
Arguments
x |
A numeric variable with no negative values. |
Value
A scalar, the calculated geometric mean.
Examples
data(IgM, package = "ISwR")
Ab <- data.frame(IgM)
estat(~IgM, data = Ab)
geo_mean(IgM)
Estimate R2 or Pseudo-R2 from regression models
Description
get_r2
is a is a wrap function that calls r2
from package performance
.
Calculates the R2 or pseudo-R2 value for different regression model objects, returning a character object for easy printing in tables of coefficients.
Usage
get_r2(model, ...)
Arguments
model |
A statistical regression model. |
... |
Additional arguments passed to |
Details
The main purpose of get_r2
is to allow easy printing of R2 value in tables of coefficients (see examples).
See Also
r2
.
Examples
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
data(birthwt, package = "MASS")
birthwt <- birthwt |>
mutate(
smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
race = factor(race, labels = c("White", "African American", "Other"))
) |>
var_labels(
bwt = "Birth weight (g)",
smoke = "Smoking status",
race = "Race"
)
model_norm <- lm(bwt ~ smoke + race, data = birthwt)
get_r2(model_norm)
Annotating a plot to display differences between groups.
Description
gg_star
is a function used to display differences between groups (see details).
Usage
gg_star(fig, x1, y1, x2, y2, y3, legend = "*", ...)
Arguments
fig |
A gformula object. |
x1 |
Position in |
y1 |
Position in |
x2 |
Position in |
y2 |
Position in |
y3 |
Position in |
legend |
Character text used for annotating the plot. |
... |
Additional information passed to |
Details
This function draws an horizontal line from coordinate (x1
, y2
)
to coordinate (x2
, y2
). Draws vertical lines below the horizontal line,
towards data, from (x1
, y1
) to (x1
, y2
) and from
(x2
, y1
) to (x2
, y2
). Finally, adds
text above the horizontal line, at the mid point between x1
and x2
.
See examples.
Examples
data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
data(energy, package = "ISwR")
energy <- energy |>
var_labels(
expend = "Energy expenditure (MJ/day)",
stature = "Stature"
)
p <- energy |>
strip_error(x = stature, y = expend)
gg_star(p, x1 = 1, y1 = 13, x2 = 2, y2 = 13.2, y3 = 13.4, "**") +
ylim(0, 13.6)
Table of coefficients from generalised linear models.
Description
glm_coef
displays estimates with confidence intervals and p-values from generalised linear models (see Details).
Usage
glm_coef(
model,
digits = 2,
alpha = 0.05,
labels = NULL,
se_rob = FALSE,
type = "cond",
exp_norm = FALSE
)
Arguments
model |
A model from any of the classes listed in the details section. |
digits |
A scalar, number of digits for rounding the results (default = 2). |
alpha |
Significant level (default = 0.05) used to calculate confidence intervals. |
labels |
An optional character vector with the names of the coefficients (including intercept). |
se_rob |
Logical, should robust errors be used to calculate confidence intervals? (default = FALSE). |
type |
Character, either "cond" (condensed) or "ext" (extended). See details. |
exp_norm |
Logical, should estimates and confidence intervals should be exponentiated? (for family == "gaussian"). |
Details
glm_coef
recognises objects (models) from the following classes: clm
, clogit
,
coxph
, gee
, glm
, glmerMod
, lm
, lme
, lmerMod
, multinom
, negbin
,
polr
and surveg
For models from logistic regression (including conditional logistic, ordinal and multinomial), Poisson or survival analysis, coefficient estimates and corresponding confidence intervals are automatically exponentiated (back-transformed).
By default, glm_coef
uses naive standard errors for calculating confidence intervals but has the option of using robust standard errors instead.
glm_coef
can display two different data frames depending on the option of type
,
for type type = "cond"
(the default), glm_coef
displays the standard table of coefficients
with confidence intervals and p-values; for type = "ext"
, glm_coef
displays additional statistics including standard errors.
Please read the Vignette on Regression for more details.
Value
A data frame with estimates, confidence intervals and p-values from glm
objects.
Examples
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
## Continuous outcome.
data(birthwt, package = "MASS")
birthwt <- birthwt |>
mutate(
smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
race = factor(race, labels = c("White", "African American", "Other"))
) |>
var_labels(
bwt = "Birth weight (g)",
smoke = "Smoking status",
race = "Race"
)
model_norm <- lm(bwt ~ smoke + race, data = birthwt)
glm_coef(model_norm, labels = model_labels(model_norm))
## Logistic regression.
data(diet, package = "Epi")
model_binom <- glm(chd ~ fibre, data = diet, family = binomial)
model_binom |>
glm_coef(labels = c("Constant", "Fibre intake (g/day)"))
model_binom |>
glm_coef(labels = c("Constant", "Fibre intake (g/day)"), type = "ext")
Harmonic mean.
Description
Harmonic mean.
Usage
harm_mean(x)
Arguments
x |
A numeric variable with no zero values. |
Value
A scalar, the calculated harmonic mean.
Examples
data(IgM, package = "ISwR")
Ab <- data.frame(IgM)
estat(~IgM, data = Ab)
harm_mean(IgM)
Inverse of the logit
Description
inv_logit
Calculates the inverse of the logit (probability in logistic regression)
Usage
inv_logit(x)
Arguments
x |
Numerical value used to compute the inverse of the logit. |
Ranks leverage observations from Jackknife method.
Description
jack_knife
Ranks the squared differences between mean values from Jackknife analysis
(arithmetic mean estimated by removing one observation at a time) and the original mean value.
Usage
jack_knife(x)
Arguments
x |
A numeric variable. Missing values are removed by default. |
Value
Data frame with the ranked squared differences.
See Also
Examples
x <- rnorm(10, 170, 8)
x
mean(x)
jack_knife(x)
x <- rnorm(100, 170, 8)
mean(x)
head(jack_knife(x))
Jackknife for means.
Description
knife_mean
is an internal function. Calculates arithmetic means by removing one observation
at a time.
Usage
knife_mean(x)
Arguments
x |
A numerical variable. Missing values are removed for the mean calculation. |
Value
A vector with the mean calculations.
Examples
x <- rnorm(10, 170, 8)
x
mean(x)
knife_mean(x)
Leverage.
Description
leverage
is an internal function called by rank_leverage
.
Usage
leverage(x)
Arguments
x |
A numeric variable. Missing values are removed by default. |
Details
Estimates the leverage of each observation around the arithmetic mean.
Value
Variable with corresponding leverage estimations
Examples
x <- rnorm(10, 170, 8)
x
mean(x)
leverage(x)
rank_leverage(x)
Goodness of fit for Logistic Regression.
Description
logistic_gof
performs the Hosmer and Lemeshow test to test the goodness of fit of a logistic
regression model. This function is part of residuals.lrm
from package rms
.
Usage
logistic_gof(model)
Arguments
model |
A logistic regression model object. |
Author(s)
Frank Harell, Vanderbilt University <f.harrell@vanderbilt.edu>
References
Hosmer DW, Hosmer T, Lemeshow S, le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat in Med 16:965–980, 1997.
Examples
data(diet, package = "Epi")
model <- glm(chd ~ fibre, data = diet, family = binomial)
glm_coef(model, labels = c("Constant", "Fibre intake (g/day)"))
logistic_gof(model)
Mantel-Haenszel odds ratio.
Description
mhor
computes odds ratios by levels of the stratum variable as well as the Mantel-Haenszel
pooled odds ratio. The test for effect modification (test for interaction) is also displayed.
Usage
mhor(object = NULL, formula = NULL, data = NULL)
Arguments
object |
When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples. |
formula |
A formula with shape: outcome ~ stratum/exposure. |
data |
A data frame containing the variables used in |
Value
Odds ratios with 95
outcome
by levels of stratum
. The Mantel-Haenszel pooled OR and the test
for effect modification is also reported.
Examples
data(oswego, package = "epitools")
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
oswego <- oswego |>
mutate(
ill = factor(ill, labels = c("No", "Yes")),
sex = factor(sex, labels = c("Female", "Male")),
chocolate.ice.cream = factor(chocolate.ice.cream, labels = c("No", "Yes"))
) |>
var_labels(
ill = "Developed illness",
sex = "Sex",
chocolate.ice.cream = "Consumed chocolate ice cream"
)
oswego |>
mhor(ill ~ sex / chocolate.ice.cream)
Using labels as coefficient names in tables of coefficients.
Description
model_labels
replaces row names in glm_coef
with labels from the original data frame.
Usage
model_labels(model, intercept = TRUE)
Arguments
model |
A generalised linear model. |
intercept |
Logical, should the intercept be added to the list of coefficients? |
Details
model_labels
does not handle yet interaction terms, see examples.
Please read the Vignette on Regression for more examples.
Examples
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
data(birthwt, package = "MASS")
birthwt <- birthwt |>
mutate(
smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
race = factor(race, labels = c("White", "African American", "Other"))
) |>
var_labels(
bwt = "Birth weight (g)",
smoke = "Smoking status",
race = "Race"
)
model_norm <- lm(bwt ~ smoke + race, data = birthwt)
glm_coef(model_norm, labels = model_labels(model_norm))
model_int <- lm(formula = bwt ~ smoke * race, data = birthwt)
model_int |>
glm_coef(labels = c(
model_labels(model_int),
"Smoker: African American",
"Smoker: Other"
))
Multiple comparisons with plot.
Description
multiple
displays results from post-doc analysis and constructs corresponding plot.
Usage
multiple(
model,
formula,
adjust = "hochberg",
type = "response",
reverse = TRUE,
level = 0.95,
digits = 2,
...
)
Arguments
model |
A fitted model supported by |
formula |
A formula with shape: |
adjust |
Method to adjust CIs and p-values (see details). |
type |
Type of prediction (matching "linear.predictor", "link", or "response"). |
reverse |
Logical argument. Determines the direction of comparisons. |
level |
Confidence interval significance level. |
digits |
Number of digits for rounding (default = 2). |
... |
Further arguments passed to |
Details
The default adjusting method is "mvt" which uses the multivariate t distribution.
Other options are: "bonferroni", "holm", "hochberg", "tukey" and "none". The default option for argument reverse
is to make reverse comparisons, i.e., against the reference level matching comparisons from lm
and glm
.
Value
A list with objects: df
A data frame with adjusted p-values, fig_ci
a plot with estimates and adjusted confidence intervals and fig_pval
a plot comparing adjusted p-values.
See Also
Examples
data(birthwt, package = "MASS")
birthwt$race <- factor(birthwt$race, labels = c("White", "African American", "Other"))
model_1 <- aov(bwt ~ race, data = birthwt)
multiple(model_1, ~race)
multiple(model_1, ~race) |>
effect_plot(x = estimate, y = contrast, orientation = "y") +
xlab("Difference in birth weights (g)") +
ylab("") + geom_vline(xintercept = 0, col = bmj[2], lty = 2)
Function to calculate OR using Wald CI, and plot trend.
Description
odds_trend
calculates the odds ratio with confidence intervals (Wald) for different levels.
(three or more) of the exposure variable.
Usage
odds_trend(data, x, y, method = "wald", ...)
Arguments
data |
A data frame. |
x |
A variable in the data frame. |
y |
A variable in the data frame. |
method |
Method for calculating confidence interval around odds ratio. |
... |
Passes optional arguments to |
Details
odds_trend
is a wrap function that calls oddsratio
from package epitools
.
Additional methods for confidence intervals include: "midp"
, "fisher"
, and "small"
.
Value
A data frame.
See Also
Examples
## A cross-sectional study looked at the association between obesity and a biopsy resulting
## from mammography screening.
Freq <- c(3441, 34, 39137, 519, 20509, 280, 12149, 196, 11882, 199)
Biopsy <- gl(2, 1, 10, labels = c("No", "Yes"))
Weight <- gl(5, 2, 10, labels = c(
"Underweight", "Normal", "Over (11-24%)",
"Over (25-39%)", "Over (> 39%)"
))
breast <- data.frame(Freq, Biopsy, Weight)
breast
breast <- expand_df(breast)
require(sjlabelled, quietly = TRUE)
breast <- var_labels(breast,
Weight = "Weight group"
)
breast |>
odds_trend(x = Weight, y = Biopsy)
breast |>
odds_trend(x = Weight, y = Biopsy) |>
effect_plot(x = OR, y = Exposure)
Given y solve for x in a simple linear model.
Description
predict_inv
Calculates the value the predictor x that generates value y with a simple linear model.
Usage
predict_inv(model, y)
Arguments
model |
A simple linear model object (class lm). |
y |
A numerical scalar, the value of the outcome for which we want to calculate the predictor x. |
Value
The estimated value of the predictor.
Examples
## Spectrophotometry example. Titration curve for riboflavin (nmol/ml). The sample has an absorbance
## of 1.15. Aim is to estimate the concentration of riboflavin in the sample.
Riboflavin <- seq(0, 80, 10)
OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03)
titration <- data.frame(Riboflavin, OD)
require(sjlabelled, quietly = TRUE)
titration <- titration |>
var_labels(
Riboflavin = "Riboflavin (nmol/ml)",
OD = "Optical density"
)
titration |>
ggplot(aes(Riboflavin, OD)) +
geom_point(col = bmj[1]) +
geom_smooth(col = bmj[2], se = TRUE, lwd = 0.5, method = "loess")
## Model with intercept different from zero:
model <- lm(OD ~ Riboflavin, data = titration)
glm_coef(model)
predict_inv(model, 1.15)
Proportion, p1 from proportion p2 and OR.
Description
prop_or
is a simple function to calculate a proportion, from another proportion and the odds
ratio between them.
Usage
prop_or(p2, or)
Arguments
p2 |
The value of a proportion in the unexposed group (p2). |
or |
The odds ratio of p1/p2. |
Value
p1
, the proportion in the exposed group (p1).
Examples
flu <- matrix(c(20, 80, 220, 140), nrow = 2)
colnames(flu) <- c("Yes", "No")
rownames(flu) <- c("Vaccine", "Placebo")
flu
or <- (20 * 140) / (80 * 220)
p2 <- 80 / 220
prop_or(p2 = p2, or = or)
20 / 240
Pseudo R2 (logistic regression)
pseudo_r2
Calculates R2 analogues (pseudo R2) of logistic regression.
Description
Pseudo R2 (logistic regression)
pseudo_r2
Calculates R2 analogues (pseudo R2) of logistic regression.
Usage
pseudo_r2(model)
Arguments
model |
A logistic regression model. |
Details
pseudo_r2
calculates three pseudo R2 of logistic regression models: 1) Nagelkerke, @0 Cox and Snell, 3) Hosmer and Lemeshow.
Value
A data frame with the calculated pseudo R2 values.
Examples
data(Oncho)
model_oncho <- glm(mf ~ area, data = Oncho, binomial)
glm_coef(model_oncho, labels = c("Constant", "Area (rainforest/savannah)"))
pseudo_r2(model_oncho)
Quantile-quantile plots against the standard Normal distribution.
Description
qq_plot
constructs quantile-quantile plots against the standard normal distribution
(also known as quantile-normal plots).
Usage
qq_plot(data, y, size = 0.7)
Arguments
data |
A data frame. |
y |
A numerical variable in the data frame. |
size |
Size of the marker. |
Examples
data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
kfm <- kfm |>
var_labels(
dl.milk = "Breast-milk intake (dl/day)",
sex = "Sex",
weight = "Child weight (kg)",
ml.suppl = "Milk substitute (ml/day)",
mat.weight = "Maternal weight (kg)",
mat.height = "Maternal height (cm)"
)
kfm |>
qq_plot(dl.milk)
Ranks observations based upon influence measures on models.
Description
rank_influence
calculates influence measures of each data observation on models and then ranks them.
Usage
rank_influence(model)
Arguments
model |
A generalised linear model object. |
Details
rank_influence
is a wrap function that calls influence.measures
, ranks observations on
their significance influence on the model and displays the 10 most influential observations
(if they are significant).
See Also
Examples
data(diet, package = "Epi")
model <- glm(chd ~ fibre, data = diet, family = binomial)
rank_influence(model)
Ranks observations by leverage.
Description
rank_leverage
ranks observations by their leverage (influence) on the arithmetic mean.
Usage
rank_leverage(x)
Arguments
x |
A numeric variable. Missing values are removed by default. |
Value
A data frame ranking observations by their leverage around the mean.
See Also
Examples
x <- rnorm(10, 170, 8)
x
mean(x)
rank_leverage(x)
x <- rnorm(100, 170, 8)
mean(x)
head(rank_leverage(x))
Reference range (reference interval).
Description
reference_range
estimates the reference range (reference interval) of a numerical variable.
Usage
reference_range(avg, std)
Arguments
avg |
The arithmetic mean (a scalar numerical value). |
std |
The standard deviation (a scalar numerical value). |
Details
The reference range assumes normality and represents the limits that would include 95 observations.
Value
A data frame with the reference range limits.
Examples
x <- rnorm(100, 170, 8)
round(mean(x), 2)
round(sd(x), 2)
round(reference_range(mean(x), sd(x)), 2)
Relative Dispersion.
Description
Calculates the coefficient of variation (relative dispersion) of a variable. The relative dispersion is defined as the standard deviation over the arithmetic mean.
Usage
rel_dis(x)
Arguments
x |
A numerical variable. NA's observations are removed by default. |
Value
The coefficient of variation (relative dispersion).
Examples
height <- rnorm(100, 170, 8)
rel_dis(height)
Rounding p-values.
Description
round_pval
is an internal function called by glm_coef
to round p-values from model coefficients.
Usage
round_pval(pval)
Arguments
pval |
vector of p-values, numeric. |
Sum of squares for Jackknife.
Description
ss_jk
is an internal function called by jack_knife
. It calculates the squared
difference of a numerical variable around a given value (for example, the mean).
Usage
ss_jk(obs, stat)
Arguments
obs |
A numerical vector with no missing values (NA's). |
stat |
The value of the statistic that is used as a reference. |
Value
The squared difference between a variable and a given value.
Examples
x <- rnorm(10, 170, 8)
x
mean(x)
ss_jk(x, mean(x))
jack_knife(x)
Internal function to calculate descriptive statistics.
Description
stats_quotes
is an internal function called by estat
.
Usage
stats_quotes(x, data2, digits = 2)
Arguments
x |
a numeric variable |
data2 |
A data frame where |
digits |
Number of digits for rounding. |
Strip plots with error bars.
Description
strip_error
constructs strip plots with error bars showing 95
confidence intervals around mean values.
Usage
strip_error(data, x, y, size = 0.5)
Arguments
data |
A data frame. |
x |
A categorical variable (the exposure). |
y |
A numerical variable (the outcome or response variable). |
size |
Size of the marker. |
Examples
data(energy, package = "ISwR")
require(sjlabelled, quietly = TRUE)
energy <- energy |>
var_labels(
expend = "Energy expenditure (MJ/day)",
stature = "Stature"
)
energy |>
strip_error(x = stature, y = expend)
A theme for huxtables This function quickly set a default style for a huxtable.
Description
A theme for huxtables This function quickly set a default style for a huxtable.
Usage
theme_pubh(ht, rw = 1)
Arguments
ht |
A huxtable object. |
rw |
A numeric vector with the rows on which a bottom border is desired. |
Details
theme_pubh
is a variation of theme_article
with the added flexibility
of adding a bottom border line at desired row numbers.
Examples
require(dplyr, quietly = TRUE)
data(Oncho)
Oncho |>
select(area, mf) |>
cross_tbl(by = "area") |>
theme_pubh(2)
data(Bernard)
t1 <- estat(~ apache | fate, data = Bernard)
t2 <- estat(~ o2del | fate, data = Bernard)
rbind(t1, t2) |>
as_hux() |>
theme_pubh(c(1, 3))