Help for package pubh

Title:

A Toolbox for Public Health and Epidemiology

Version:

3.0.0

Maintainer:

Josie Athens <josie.athens@gmail.com>

Description:

A toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. Includes a function to report coefficients and confidence intervals from models using robust standard errors (when available), functions that expand 'ggplot2' plots and functions relevant for introductory papers in Epidemiology or Public Health. Please note that use of the provided data sets is for educational purposes only.

Depends:

R (≥ 4.5.0), emmeans, ggplot2, gtsummary, huxtable, stats

Imports:

car, dplyr, Epi, epitools, jtools, lmtest, performance, sandwich, sjlabelled, sjmisc, survival, tibble, tidyselect

Suggests:

broom, broom.helpers, cardx, easystats, effectsize, ggeffects, ISwR, knitr, MASS, nlme, rstatix

License:

GPL-2

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.3

VignetteBuilder:

knitr

BugReports:

https://github.com/josie-athens/pubh/issues

NeedsCompilation:

Packaged:

2025-10-10 06:21:18 UTC; josie

Author:

Josie Athens [aut, cre], Frank Harell [ctb], John Fox [ctb], R-Core [ctb]

Repository:

CRAN

Date/Publication:

2025-10-21 11:20:02 UTC

Survival of patients with sepsis.

Description

A randomised, double-blind, placebo-controlled trial of intravenous ibuprofen in 455 patients who had sepsis, defined as fever, tachycardia, tachypnea, and acute failure of at least one organ system.

Usage

Bernard

Format

A labelled tibble with 455 rows and 9 variables:

id: Patient ID
treat: Treatment, factor with levels "Placebo" and "Ibuprofen".
race: Race/ethnicity, factor with levels "White", "African American" and "Other".
fate: Mortality status at 30 days, factor with levels "Alive" and "Dead".
apache: Baseline APACHE score.
o2del: Oxygen delivery at baseline.
followup: Follow-up time in hours.
temp0: Baseline temperature in centigrades.
temp10: Temperature after 36 hr in centigrades.

Source

Bernard, GR, et al. (1997) The effects of ibuprofen on the physiology and survival of patients with sepsis, N Engl J Med 336: 912–918.

Prevalence of Helicobacter pylori infection in preschool children.

Description

A data set containing the prevalence of Helicobacter pylori infection in preschool children according to parental history of duodenal or gastric ulcer.

Usage

Brenner

Format

A labelled tibble with 863 rows and 2 variables:

ulcer: History of duodenal or gastric ulcer, factor with levels "No" and "Yes".
infected: Infected with Helicobacter pylori, factor with levels "No" and "Yes".

Source

Brenner H, Rothenbacher D, Bode G, Adler G (1998) Parental history of gastric or duodenal ulcer and prevalence of Helicobacter pylori infection in preschool children: population based study. BMJ 316:665.

Migraine pain reduction.

Description

Randomised control trial on children suffering from frequent and severe migraine. Control group represents untreated children. The active treatments were either relaxation alone or relaxation with biofeedback.

Usage

Fentress

Format

A labelled tibble with 18 rows and 2 variables:

pain: Reduction in weekly headache activity expressed as percentage of baseline data.
group: Group, a factor with levels "Untreated", "Relaxation" (alone) and "Biofeedback" (relaxation and biofeedback).

Source

Fentress, DW, et al. (1986) Biofeedback and relaxation-response in the treatment of pediatric migraine. Dev Med Child Neurol 28:1 39-46.

Altman, DA (1991) Practical statistics for medical research. Chapman & Hall/CRC.

Examples

data(Fentress)

Fentress |>
  strip_error(x = group, y = pain)

T-cell counts from Hodgkin's disease patients.

Description

Number of CD4+ T-cells and CD8+ T-cells in blood samples from patients in remission from Hodgkin's disease or in remission from disseminated malignancies.

Usage

Hodgkin

Format

A labelled tibble with 40 rows and 3 variables:

CD4: Concentration of CD4+ T-cells (cells / mm^3).
CD8: Concentration of CD8+ T-cells (cells / mm^3).
Group: Group, factor with levels "Non-Hodgkin" and "Hodgkin".

Source

Shapiro, CM, et al (1986) Immunologic status of patients in remission from Hodgkin's disease and disseminated malignancies. Am J Med Sci 293:366-370.

Altman, DA (1991) Practical statistics for medical research. Chapman & Hall/CRC.

Examples

data(Hodgkin)
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)

Hodgkin <- Hodgkin |>
  mutate(
    Ratio = CD4 / CD8
  ) |>
  var_labels(
    Ratio = "CD4+ / CD8+ T-cells"
  )

estat(~ Ratio | Group, data = Hodgkin)

Hodgkin |>
  qq_plot(y = Ratio) +
  facet_grid(cols = vars(Group))

Body weight and plasma volume.

Description

Body weight and plasma volume in eight healthy men.

Usage

Kirkwood

Format

A labelled data frame with 8 rows and 3 variables:

subject: Subject ID.
weight: Body weight in kg.
volume: Plasma volume in litres.

Source

Kirkwood, BR and Sterne, JAC (2003) Essential Medical Statistics. Second Edition. Blackwell.

Examples

data(Kirkwood)

Kirkwood |>
  ggplot(aes(weight, volume)) +
  geom_point()

Breast cancer and age of childbirth.

Description

An international case-control study to test the hypothesis that breast cancer is related to the age that a woman gives childbirth.

Usage

Macmahon

Format

A labelled tibble with 185 rows and 2 variables:

cancer: Diagnosed with breast cancer, a factor with levels "No" and "Yes".
age: Age mother gives childbirth, factor with levels "<20", "20-24", "25-29", "30-34" and ">34".

Source

Macmahon, B. et al. (1970). Age at first birth and breast cancer risk. Bull WHO 43, 209-221.

Onchocerciasis in Sierra Leone.

Description

Study of onchocerciasis ("river blindness") in Sierra Leone, in which subjects were classified according to whether they lived in villages in savannah or rainforest area.

Usage

Oncho

Format

A labelled tibble with 1302 rows and 7 variables:

id: Subject ID.
mf: Infected with Onchocerciasis volvulus, factor with levels "Not-infected" and "Infected".
area: Area of residence, factor with levels "Savannah" and "Rainforest".
agegrp: Age group in years, factor with levels "5-9", "10-19", "20-39" and "40+".
sex: Subject sex, factor with levels "Male" and "Female".
mfload: Microfiliariae load.
lesions: Severe eye lesions, factor with levels "No" and "Yes".

Source

McMahon, JE, Sowa, SIC, Maude, GH and Kirkwood BR (1988) Onchocerciasis in Sierra Leone 2: a comparison of forest and savannah villages. Trans Roy Soc Trop Med Hyg 82: 595-600.

Kirkwood, BR and Sterne, JAC (2003) Essential Medical Statistics. Second Edition. Blackwell.

Extracorporeal membrane oxygenation in neonates.

Description

A clinical trial on the value of extracorporeal membrane oxygenation for term neonates with severe respiratory failure. RCT compares active treatment against conventional management.

Usage

Roberts

Format

A labelled tibble with 185 rows and 2 variables:

emo: Extracorporeal membrane oxygenation treatment, factor with levels "No" and "Yes".
survived: One year survival, factor with levels "No" and "Yes".

Source

Roberts, TE (1998) Extracorporeal Membrane Oxygenation Economics Working Group. Economic evaluation and randomised controlled trial of extracorporeal membrane oxygenation: UK collaborative trial. Brit Med J 317:911-16.

Oral contraceptives and stroke.

Description

A case-control study of oral contraceptives and stroke in young women with presence or absence of hypertension. Cases represent thrombotic stroke and controls are hospital controls. The group of no hypertension includes normal blood pressure (<140/90 mm Hg) and borderline hypertension (140-159/90-94 mm Hg). Hypertension group includes moderate hypertension (160-179/95-109 mm Hg) and severe hypertension (180+/110+ mm Hg). This data has been used as an example of join exposure by Rothman for measuring interactions (see examples).

Usage

Rothman

Format

A labelled tibble with 477 rows and 3 variables:

stroke: Thrombotic stroke, factor with levels "No" and "Yes".
oc: Current user of oral contraceptives, factor with levels "Non-user" and "User".
ht: Hypertension, factor with levels "No" (<160/95 mm Hg) and "Yes".

Source

Collaborative Group for the Study of Stroke in Young Women (1975) Oral contraceptives and stroke in young women. JAMA 231:718-722.

Rothman, KJ (2002) Epidemiology. An Introduction. Oxford University Press.

Examples

data(Rothman)

mhor(stroke ~ ht / oc, data = Rothman)

## Model with standard interaction term:
model1 <- glm(stroke ~ ht * oc, data = Rothman, family = binomial)
glm_coef(model1)

## Model considering join exposure:
Rothman$join <- 0
Rothman$join[Rothman$oc == "Non-user" & Rothman$ht == "Yes"] <- 1
Rothman$join[Rothman$oc == "User" & Rothman$ht == "No"] <- 2
Rothman$join[Rothman$oc == "User" & Rothman$ht == "Yes"] <- 3
Rothman$join <- factor(Rothman$join, labels = c(
  "Unexposed", "Hypertension", "OC user",
  "OC and hypertension"
))

require(sjlabelled, quietly = TRUE)
Rothman$join <- set_label(Rothman$join, label = "Exposure")

model2 <- glm(stroke ~ join, data = Rothman, family = binomial)
glm_coef(model2)

Passive smoking in adulthood and cancer risk.

Description

A case-control study to investigate the effects of passive smoking on cancer. Passive smoking was defined as exposure to the cigarette smoke of a spouse who smoked at least one cigarette per day for at least 6 months.

Usage

Sandler

Format

A labelled tibble with 998 rows and 3 variables:

passive: Passive smoker, factor with levels "No" and "Yes".
cancer: Diagnosed with cancer, factor with levels "No" and "Yes".
smoke: Active smoker, factor with levels "No" and "Yes".

Source

Sandler, DP, Everson, RB, Wilcox, AJ (1985). Passive smoking in adulthood and cancer risk. Amer J Epidem, 121: 37-48.

Examples

data(Sandler)

mhor(cancer ~ smoke / passive, data = Sandler)

Measured and self-reported weight in New Zealand.

Description

Data on measured and self-reported weight from 40–50 year old participants in the 1989/1990 Life In New Zealand Survey.

Usage

Sharples

Format

A tibble with 343 rows and 4 variables:

srweight: Self-reported weight in kg.
weight: Measured weight in kg.
srbmi: Body mass index calculated from self-reported weight and self-reported height in kg/m^2.
mbmi: Body mass index calculated from measured weight and measured height in kg/m^2.

Source

Sharples, H, et al. (2012) Agreement between measured and self-reported height, weight and BMI in predominantly European middle-aged New Zealanders: findings from a nationwide 1989 survey. New Zealand Med J 125: 60-69.

Examples

Sharples |>
  bland_altman(x = weight, y = srweight,
  transform = TRUE, size = 0.7) +
  xlab("Mean of weights (kg)") +
  ylab("Measured / Self-reported weight")

RCT on the treatment of epilepsy.

Description

Randomised control trial of an antiepilectic drug (prograbide), in which the number of seizures of 59 patients at baseline and other four follow-up visits were recorded.

Usage

Thall

Format

A tibble with 59 rows and 8 variables:

id: Subject ID.
treat: Treatment, factor with levels "Control" and "Prograbide".
base: Number of seizures at baseline.
age: Age in years at baseline.
y1: Number of seizures at year one follow-up.
y2: Number of seizures at year two follow-up.
y3: Number of seizures at year three follow-up.
y4: Number of seizures at year four follow-up.

Source

Thall, PF and Vail, SC (1990) Some covariance models for longitudinal count data with over-dispersion. Biometrics, 46: 657-671.

Stukel, TA (1993) Comparison of methods for the analysis of longitudinal data. Statistics Med 12: 1339-1351.

Shoukri, MM and Chaudhary, MA (2007) Analysis of correlated data with SAS and R. Third Edition. Chapman & Hall/CRC.

Peak knee velocity in walking at flexion and extension.

Description

Data of peak knee velocity in walking at flexion and extension in studies about functional performance in cerebral palsy.

Usage

Tuzson

Format

A labelled tibble with 18 rows and 2 variables:

flexion: Peak knee velocity in gait: flexion (degree/s).
extension: Peak knee velocity in gait: extension (degree/s).

Source

Tuzson, AE, Granata, KP, and Abel, MF (2003) Spastic velocity threshold constrains functional performance in cerebral palsy. Arch Phys Med Rehabil 84: 1363-1368.

Examples

data(Tuzson)

Tuzson |>
  ggplot(aes(extension, flexion)) +
  geom_point()

cor.test(~ flexion + extension, data = Tuzson)

Smoking and mortality in Whickham, England.

Description

Data represents women participating in a health survey in Whickham, England in 1972-1974.

Usage

Vanderpump

Format

A labelled tibble with 1314 rows and 3 variables:

vstatus: Vitality status, factor with levels "Alive" and "Death".
smoker: Smoking status, factor with levels "Non-smoker" and "Smoker".
agegrp: Age group, factor with levels "18-44", "45-64" and "64+".

Source

Vanderpump, MP, et al (1996) Thyroid, 6:155-160.

Appleton, DR, French, JM and Vanderpump, PJ (1996) Ignoring a covariate: An example of Simpson's paradox. The American Statistician 50:340-341.

Vittinghoff, E, Glidden, DV, Shiboski, SC and McCulloh, CE (2005) Regression methods in Biostatistics. Springer.

Examples

data(Vanderpump)

mhor(vstatus ~ agegrp / smoker, data = Vanderpump)

Bar charts with error bars.

Description

bar_error constructs bar charts in with error bars showing 95 confidence intervals around mean values. High of bars represent mean values.

Usage

bar_error(data, x, y, alpha = 0.7)

Arguments

data

A data frame.

x

A variable in the data frame.

y

A variable in the data frame

alpha

Opacity of the colour fill (0 = invisible, 1 = opaque).

Examples

require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
data(birthwt, package = "MASS")

birthwt <- birthwt |>
  mutate(
    smoke = factor(smoke, labels = c("Non-smoker", "Smoker"))
  ) |>
  var_labels(
    bwt = "Birth weight (g)",
    smoke = "Smoking status"
  )

birthwt |>
  bar_error(x = smoke, y = bwt)

Bland-Altman agreement plots.

Description

Bland-Altman agreement plots.

Usage

bland_altman(data, x, y, size = 1, transform = FALSE)

Arguments

data

A data frame.

x

A numerical variable.

y

A numerical variable.

size

Size of the symbol using to plot data.

transform

Logical, should ratios instead of differences be used to construct the plot?

Details

bland_altman constructs Bland-Altman agreement plots.

Variables in formula are continuous paired observations. When the distribution of the outcome is not normal, but becomes normal with a log-transformation, bland_altman can plot the ratio between outcomes (difference in the log scale) by using option transform = TRUE.

Examples

data(Sharples)

Sharples |>
  bland_altman(
    x = weight, y = srweight,
    transform = TRUE, size = 0.7
  ) +
  xlab("Mean of weights (kg)") +
  ylab("Measured / Self-reported weight")

Palette inspired by the British Medical Journal.

Description

Palette inspired by the British Medical Journal.

Usage

bmj

Format

An object of class character of length 9.

Bootstrap Confidence Intervals.

Description

bst estimates confidence intervals around the mean, median or geo_mean.

Usage

bst(x, stat = "mean", n = 1000, CI = 95, digits = 2)

Arguments

x

A numerical variable. Missing observations are removed by default.

stat

Statistic, either "mean" (default), "median" or "gmean" (geometric mean).

n

Number of replicates for the bootstrap (n=1000 by default).

CI

Confidence intervals (CI=95 by default).

digits

Number of digits for rounding (default = 2).

Value

A data frame with the estimate and confidence intervals.

Examples

data(IgM, package = "ISwR")
bst(IgM, "median")

bst(IgM, "gmean")

Coefficient of determination.

Description

coef_det estimates the coefficient of determination (r-squared) from fitted (predicted) and observed values. Outcome from the model is assumed to be numerical.

Usage

coef_det(obs, fit)

Arguments

obs

Vector with observed values (numerical outcome).

fit

Vector with fitted (predicted) values.

Value

A scalar, the coefficient of determination (r-squared).

Examples

## Linear regression:
Riboflavin <- seq(0, 80, 10)
OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03)
titration <- data.frame(Riboflavin, OD)
model1 <- lm(OD ~ Riboflavin, data = titration)
summary(model1)
coef_det(titration$OD, fitted(model1))

## Non-linear regression:
library(nlme, quietly = TRUE)
data(Puromycin)
mm.tx <- gnls(rate ~ SSmicmen(conc, Vm, K),
  data = Puromycin,
  subset = state == "treated"
)
summary(mm.tx)
coef_det(Puromycin$rate[1:12], mm.tx$fitted)

Cosmetics for tables of regression coefficients. Converts tables generated by `tbl_regression` to `huxtable` and adds some cosmetics.

Description

Cosmetics for tables of regression coefficients. Converts tables generated by tbl_regression to huxtable and adds some cosmetics.

Usage

cosm_reg(gt_tbl, pad = 3, type = 3, bold = TRUE, head_label = "**Variable**")

Arguments

gt_tbl

A table object generated by tbl_regression.

pad

Numerical, padding above and bellow rows.

type

Anova's type to calculate global p-values.

bold

Display labels in bold?

head_label

Character, label to be used as head for the variable's column.

Value

A huxtable.

Examples

require(sjlabelled, quietly = TRUE)

data(diet, package = "Epi")
diet <- diet |>
  var_labels(
    chd = "Coronary Heart Disease",
    fibre = "Fibre intake (g/day)"
  )

model_binom <- glm(chd ~ fibre, data = diet, family = binomial)

model_binom |>
  tbl_regression(exponentiate = TRUE) |>
  cosm_reg(bold = TRUE) |>
  theme_pubh(1) |>
  add_footnote(get_r2(model_binom), font_size = 9)

data(birthwt, package = "MASS")
birthwt <- birthwt |>
  mutate(
    smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
    race = factor(race, labels = c("White", "African American", "Other"))
  ) |>
  var_labels(
    bwt = "Birth weight (g)",
    smoke = "Smoking status",
    race = "Race"
  )

model_norm <- lm(bwt ~ smoke + race, data = birthwt)

model_norm |>
  tbl_regression() |>
  cosm_reg(bold = TRUE) |>
  theme_pubh(1) |>
  add_footnote(get_r2(model_norm), font_size = 9)

Cosmetics for summary tables Adds some cosmetics to tables of descriptive statistics generated by tbl_summary.

Description

Cosmetics for summary tables Adds some cosmetics to tables of descriptive statistics generated by tbl_summary.

Usage

cosm_sum(gt_tbl, pad = 3, bold = FALSE, head_label = "**Variable**")

Arguments

gt_tbl

A table object generated by tbl_summary.

pad

Numerical, padding above and bellow rows.

bold

Display labels in bold?

head_label

Character, label to be used as head for the variable's column.

Details

Function cosm_sum adds some cosmetics to tables generated by tbl_summary, then converts the table as a huxtable and sets proper alignment.

Value

A huxtable.

Examples

require(dplyr, quietly = TRUE)

data(Oncho)
Oncho |>
  select(-id) |>
  tbl_summary() |>
  cosm_sum(bold = TRUE) |>
  theme_pubh(1)

Table of descriptive statistics by categorical variable.

Description

cross_tbl is a wrapper to function from package tbl_summary that constructs tables of descriptive statistics stratified by levels of a categorical outcome.

Usage

cross_tbl(
  data,
  by,
  head_label = " ",
  bold = TRUE,
  show_total = TRUE,
  p_val = FALSE,
  pad = 3,
  method = 2,
  ...
)

Arguments

data

A data frame where the variables in the formula can be found.

by

The quoted name of the categorical variable (factor) used for the stratification.

head_label

Character, label to be used as head for the variable's column.

bold

Display labels in bold?

show_total

Logical, show column with totals?

p_val

Logical, show p-values?

pad

Numerical, padding above and bellow rows.

method

An integer indicating methods for continuous variables. 1 Reports means and standard deviations. 2 Reports medians and interquartile ranges.

...

Additional arguments passed to tbl_summary.

Details

Function cross_tbl is a relatively simple wrapper to function tbl_summary. It constructs contingency tables and can also be used to report a table with descriptives for all variables stratified by one of the variables. Please see examples to see how to list variables. If data is labelled, the label of the stratifying variable is used as part of the header.

Value

A huxtable with descriptive statistics stratified by levels of the outcome.

Examples

require(dplyr, quietly = TRUE)

#' data(Oncho)

## A two by two contingency table:
Oncho |>
  select(mf, area) |>
  cross_tbl(by = "mf", bold = TRUE) |>
  theme_pubh(2)

## Reporting prevalence:
Oncho |>
  select(mf, area) |>
  cross_tbl(by = "area", bold = TRUE) |>
  theme_pubh(2)

## Descriptive statistics for all variables in the \code{Oncho} data set except \code{id}.
Oncho |>
  select(-id) |>
  cross_tbl(by = "mf", bold = TRUE) |>
  theme_pubh(2)

Constructs effect plots on for cases when the response (outcome) is a numerical variable and the explanatory (exposure) is a categorical variable.

Description

effect_plot constructs plots with error bars on which the upper and lower limits of the bars are given by the variables CI_high and CI_low, respectively.

Usage

effect_plot(data, x, y, orientation = "x")

Arguments

data

A data frame.

x

A variable in the data frame.

y

A variable in the data frame.

orientation

A string indicating the orientation of the bars.

Value

An error bars plot, for example, those to show the effects of a categorical variable on a numerical variable.

Examples

data(Fentress)
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)

pain_cis <- Fentress |>
  gen_bst_df(pain ~ group)
names(pain_cis)[1] = "Mean"
pain_cis = pain_cis |>
  var_labels(Mean = "Pain reduction")

pain_cis |>
  effect_plot(x = Cohort, y = Mean)

Descriptive statistics for continuous variables.

Description

estat calculates descriptives of numerical variables.

Usage

estat(object = NULL, formula = NULL, data = NULL, digits = 2, label = NULL)

Arguments

object

When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples.

formula

A formula with shape: ~ x or ~ x|z (for groups).

data

A data frame where the variables in the formula can be found.

digits

Number of digits for rounding (default = 2).

label

Label used to display the name of the variable (see examples).

Value

A data frame with descriptive statistics.

Examples

data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
kfm <- kfm |>
  var_labels(
    dl.milk = "Breast-milk intake (dl/day)",
    sex = "Sex",
    weight = "Child weight (kg)",
    ml.suppl = "Milk substitute (ml/day)",
    mat.weight = "Maternal weight (kg)",
    mat.height = "Maternal height (cm)"
  )

kfm |>
  estat(~dl.milk)

estat(~ dl.milk | sex, data = kfm)

kfm |>
  estat(~ weight | sex)

Expand a data frame.

Description

expand_df expands a data frame by a vector of frequencies.

Usage

expand_df(aggregate.data, index.var = "Freq", retain.freq = FALSE)

Arguments

aggregate.data

A data frame.

index.var

A numerical variable with the frequencies (counts).

retain.freq

Logical expression indicating if frequencies should be kept.

Details

This is a generic function that resembles weighted frequencies in other statistical packages (for example, Stata). expand.df was adapted from a function developed by deprecated package epicalc (now package epiDisplay).

Value

An expanded data frame with replicates given by the frequencies.

Examples

Freq <- c(5032, 5095, 41, 204)
Mortality <- gl(2, 2, labels = c("No", "Yes"))
Calcium <- gl(2, 1, 4, labels = c("No", "Yes"))
anyca <- data.frame(Freq, Mortality, Calcium)
anyca
anyca.exp <- expand_df(anyca)
with(anyca.exp, table(Calcium, Mortality))

Relative and Cumulative Frequency.

Description

freq_cont tabulates a continuous variable by given classes.

Usage

freq_cont(x, bks, dg = 2)

Arguments

x

A numerical (continuous) variable. Ideally, relatively long (greater than 100 observations).

bks

Breaks defining the classes (see example).

dg

Number of digits for rounding (default = 2).

Value

A data frame with the classes, the mid-point, the frequencies, the relative and cumulative frequencies.

Examples

data(IgM, package = "ISwR")
Ab <- data.frame(IgM)
estat(~IgM, data = Ab)
freq_cont(IgM, seq(0, 4.5, 0.5))

Generate a data frame with estimate and bootstrap CIs.

Description

gen_bst_df is a function called that generates a data frame with confidence intervals of a continuous variable by levels of one or two categorical ones (factors).

Usage

gen_bst_df(object = NULL, formula = NULL, data = NULL, stat = "mean", ...)

Arguments

object

When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples.

formula

A formula with shape: y ~ x or y ~ x|z where y is a numerical variable and both x and z are factors.

data

A data frame where the variables in the formula can be found.

stat

Statistic used for bst.

...

Passes optional arguments to bst.

Value

A data frame with the confidence intervals by level.

Examples

data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
kfm <- kfm |>
  var_labels(
    dl.milk = "Breast-milk intake (dl/day)",
    sex = "Sex",
    weight = "Child weight (kg)",
    ml.suppl = "Milk substitute (ml/day)",
    mat.weight = "Maternal weight (kg)",
    mat.height = "Maternal height (cm)"
  )

kfm |>
  gen_bst_df(dl.milk ~ sex)

data(Fentress)

Fentress |> gen_bst_df(pain ~ group)

Geometric mean.

Description

Geometric mean.

Usage

geo_mean(x)

Arguments

x

A numeric variable with no negative values.

Value

A scalar, the calculated geometric mean.

Examples

data(IgM, package = "ISwR")
Ab <- data.frame(IgM)
estat(~IgM, data = Ab)
geo_mean(IgM)

Estimate R2 or Pseudo-R2 from regression models

Description

get_r2 is a is a wrap function that calls r2 from package performance. Calculates the R2 or pseudo-R2 value for different regression model objects, returning a character object for easy printing in tables of coefficients.

Usage

get_r2(model, ...)

Arguments

model

A statistical regression model.

...

Additional arguments passed to r2.

Details

The main purpose of get_r2 is to allow easy printing of R2 value in tables of coefficients (see examples).

Examples

require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)

data(birthwt, package = "MASS")
birthwt <- birthwt |>
  mutate(
    smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
    race = factor(race, labels = c("White", "African American", "Other"))
  ) |>
  var_labels(
    bwt = "Birth weight (g)",
    smoke = "Smoking status",
    race = "Race"
  )

model_norm <- lm(bwt ~ smoke + race, data = birthwt)

get_r2(model_norm)

Annotating a plot to display differences between groups.

Description

gg_star is a function used to display differences between groups (see details).

Usage

gg_star(fig, x1, y1, x2, y2, y3, legend = "*", ...)

Arguments

fig

A gformula object.

x1

Position in x for the start of the horizontal line.

y1

Position in y for the start of the vertical line, below to the horizontal line.

x2

Position in x for the end of the horizontal line.

y2

Position in y where the horizontal line is drawn.

y3

Position in y where the text is added.

legend

Character text used for annotating the plot.

...

Additional information passed to gf_text.

Details

This function draws an horizontal line from coordinate (x1, y2) to coordinate (x2, y2). Draws vertical lines below the horizontal line, towards data, from (x1, y1) to (x1, y2) and from (x2, y1) to (x2, y2). Finally, adds text above the horizontal line, at the mid point between x1 and x2. See examples.

Examples

data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
data(energy, package = "ISwR")
energy <- energy |>
  var_labels(
    expend = "Energy expenditure (MJ/day)",
    stature = "Stature"
  )

p <- energy |>
  strip_error(x = stature, y = expend)

gg_star(p, x1 = 1, y1 = 13, x2 = 2, y2 = 13.2, y3 = 13.4, "**") +
ylim(0, 13.6)

Table of coefficients from generalised linear models.

Description

glm_coef displays estimates with confidence intervals and p-values from generalised linear models (see Details).

Usage

glm_coef(
  model,
  digits = 2,
  alpha = 0.05,
  labels = NULL,
  se_rob = FALSE,
  type = "cond",
  exp_norm = FALSE
)

Arguments

model

A model from any of the classes listed in the details section.

digits

A scalar, number of digits for rounding the results (default = 2).

alpha

Significant level (default = 0.05) used to calculate confidence intervals.

labels

An optional character vector with the names of the coefficients (including intercept).

se_rob

Logical, should robust errors be used to calculate confidence intervals? (default = FALSE).

type

Character, either "cond" (condensed) or "ext" (extended). See details.

exp_norm

Logical, should estimates and confidence intervals should be exponentiated? (for family == "gaussian").

Details

glm_coef recognises objects (models) from the following classes: clm, clogit, coxph, gee, glm, glmerMod, lm, lme, lmerMod, multinom, negbin, polr and surveg

For models from logistic regression (including conditional logistic, ordinal and multinomial), Poisson or survival analysis, coefficient estimates and corresponding confidence intervals are automatically exponentiated (back-transformed).

By default, glm_coef uses naive standard errors for calculating confidence intervals but has the option of using robust standard errors instead.

glm_coef can display two different data frames depending on the option of type, for type type = "cond" (the default), glm_coef displays the standard table of coefficients with confidence intervals and p-values; for type = "ext", glm_coef displays additional statistics including standard errors.

Please read the Vignette on Regression for more details.

Value

A data frame with estimates, confidence intervals and p-values from glm objects.

Examples

require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)

## Continuous outcome.
data(birthwt, package = "MASS")
birthwt <- birthwt |>
  mutate(
    smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
    race = factor(race, labels = c("White", "African American", "Other"))
  ) |>
  var_labels(
    bwt = "Birth weight (g)",
    smoke = "Smoking status",
    race = "Race"
  )

model_norm <- lm(bwt ~ smoke + race, data = birthwt)

glm_coef(model_norm, labels = model_labels(model_norm))

## Logistic regression.
data(diet, package = "Epi")
model_binom <- glm(chd ~ fibre, data = diet, family = binomial)
model_binom |>
  glm_coef(labels = c("Constant", "Fibre intake (g/day)"))

model_binom |>
  glm_coef(labels = c("Constant", "Fibre intake (g/day)"), type = "ext")

Harmonic mean.

Description

Harmonic mean.

Usage

harm_mean(x)

Arguments

x

A numeric variable with no zero values.

Value

A scalar, the calculated harmonic mean.

Examples

data(IgM, package = "ISwR")
Ab <- data.frame(IgM)
estat(~IgM, data = Ab)
harm_mean(IgM)

Inverse of the logit

Description

inv_logit Calculates the inverse of the logit (probability in logistic regression)

Usage

inv_logit(x)

Arguments

x

Numerical value used to compute the inverse of the logit.

Ranks leverage observations from Jackknife method.

Description

jack_knife Ranks the squared differences between mean values from Jackknife analysis (arithmetic mean estimated by removing one observation at a time) and the original mean value.

Usage

jack_knife(x)

Arguments

x

A numeric variable. Missing values are removed by default.

Value

Data frame with the ranked squared differences.

Examples

x <- rnorm(10, 170, 8)
x
mean(x)
jack_knife(x)

x <- rnorm(100, 170, 8)
mean(x)
head(jack_knife(x))

Jackknife for means.

Description

knife_mean is an internal function. Calculates arithmetic means by removing one observation at a time.

Usage

knife_mean(x)

Arguments

x

A numerical variable. Missing values are removed for the mean calculation.

Value

A vector with the mean calculations.

Examples

x <- rnorm(10, 170, 8)
x
mean(x)
knife_mean(x)

Leverage.

Description

leverage is an internal function called by rank_leverage.

Usage

leverage(x)

Arguments

x

A numeric variable. Missing values are removed by default.

Details

Estimates the leverage of each observation around the arithmetic mean.

Value

Variable with corresponding leverage estimations

Examples

x <- rnorm(10, 170, 8)
x
mean(x)
leverage(x)
rank_leverage(x)

Goodness of fit for Logistic Regression.

Description

logistic_gof performs the Hosmer and Lemeshow test to test the goodness of fit of a logistic regression model. This function is part of residuals.lrm from package rms.

Usage

logistic_gof(model)

Arguments

model

A logistic regression model object.

Author(s)

Frank Harell, Vanderbilt University <f.harrell@vanderbilt.edu>

References

Hosmer DW, Hosmer T, Lemeshow S, le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat in Med 16:965–980, 1997.

Examples

data(diet, package = "Epi")
model <- glm(chd ~ fibre, data = diet, family = binomial)
glm_coef(model, labels = c("Constant", "Fibre intake (g/day)"))
logistic_gof(model)

Mantel-Haenszel odds ratio.

Description

mhor computes odds ratios by levels of the stratum variable as well as the Mantel-Haenszel pooled odds ratio. The test for effect modification (test for interaction) is also displayed.

Usage

mhor(object = NULL, formula = NULL, data = NULL)

Arguments

object

When chaining, this holds an object produced in the earlier portions of the chain. Most users can safely ignore this argument. See details and examples.

formula

A formula with shape: outcome ~ stratum/exposure.

data

A data frame containing the variables used in formula.

Value

Odds ratios with 95 outcome by levels of stratum. The Mantel-Haenszel pooled OR and the test for effect modification is also reported.

Examples

data(oswego, package = "epitools")
require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
oswego <- oswego |>
  mutate(
    ill = factor(ill, labels = c("No", "Yes")),
    sex = factor(sex, labels = c("Female", "Male")),
    chocolate.ice.cream = factor(chocolate.ice.cream, labels = c("No", "Yes"))
  ) |>
  var_labels(
    ill = "Developed illness",
    sex = "Sex",
    chocolate.ice.cream = "Consumed chocolate ice cream"
  )

oswego |>
  mhor(ill ~ sex / chocolate.ice.cream)

Using labels as coefficient names in tables of coefficients.

Description

model_labels replaces row names in glm_coef with labels from the original data frame.

Usage

model_labels(model, intercept = TRUE)

Arguments

model

A generalised linear model.

intercept

Logical, should the intercept be added to the list of coefficients?

Details

model_labels does not handle yet interaction terms, see examples.

Please read the Vignette on Regression for more examples.

Examples

require(dplyr, quietly = TRUE)
require(sjlabelled, quietly = TRUE)

data(birthwt, package = "MASS")
birthwt <- birthwt |>
  mutate(
    smoke = factor(smoke, labels = c("Non-smoker", "Smoker")),
    race = factor(race, labels = c("White", "African American", "Other"))
  ) |>
  var_labels(
    bwt = "Birth weight (g)",
    smoke = "Smoking status",
    race = "Race"
  )

model_norm <- lm(bwt ~ smoke + race, data = birthwt)

glm_coef(model_norm, labels = model_labels(model_norm))

model_int <- lm(formula = bwt ~ smoke * race, data = birthwt)

model_int |>
  glm_coef(labels = c(
    model_labels(model_int),
    "Smoker: African American",
    "Smoker: Other"
  ))

Multiple comparisons with plot.

Description

multiple displays results from post-doc analysis and constructs corresponding plot.

Usage

multiple(
  model,
  formula,
  adjust = "hochberg",
  type = "response",
  reverse = TRUE,
  level = 0.95,
  digits = 2,
  ...
)

Arguments

model

A fitted model supported by emmeans, such as the result of a call to aov, lm, glm, etc.

formula

A formula with shape: ~ y or ~ y|x (for interactions). Where y is the term of the model on which comparisons are made and x is a term interacting with y.

adjust

Method to adjust CIs and p-values (see details).

type

Type of prediction (matching "linear.predictor", "link", or "response").

reverse

Logical argument. Determines the direction of comparisons.

level

Confidence interval significance level.

digits

Number of digits for rounding (default = 2).

...

Further arguments passed to emmeans.

Details

The default adjusting method is "mvt" which uses the multivariate t distribution. Other options are: "bonferroni", "holm", "hochberg", "tukey" and "none". The default option for argument reverse is to make reverse comparisons, i.e., against the reference level matching comparisons from lm and glm.

Value

A list with objects: df A data frame with adjusted p-values, fig_ci a plot with estimates and adjusted confidence intervals and fig_pval a plot comparing adjusted p-values.

Examples

data(birthwt, package = "MASS")
birthwt$race <- factor(birthwt$race, labels = c("White", "African American", "Other"))

model_1 <- aov(bwt ~ race, data = birthwt)
multiple(model_1, ~race)

multiple(model_1, ~race) |>
  effect_plot(x = estimate, y = contrast, orientation = "y") +
  xlab("Difference in birth weights (g)") +
  ylab("") + geom_vline(xintercept = 0, col = bmj[2], lty = 2)

Function to calculate OR using Wald CI, and plot trend.

Description

odds_trend calculates the odds ratio with confidence intervals (Wald) for different levels. (three or more) of the exposure variable.

Usage

odds_trend(data, x, y, method = "wald", ...)

Arguments

data

A data frame.

x

A variable in the data frame.

y

A variable in the data frame.

method

Method for calculating confidence interval around odds ratio.

...

Passes optional arguments to oddsratio.

Details

odds_trend is a wrap function that calls oddsratio from package epitools.

Additional methods for confidence intervals include: "midp", "fisher", and "small".

Value

A data frame.

Examples

## A cross-sectional study looked at the association between obesity and a biopsy resulting
## from mammography screening.

Freq <- c(3441, 34, 39137, 519, 20509, 280, 12149, 196, 11882, 199)
Biopsy <- gl(2, 1, 10, labels = c("No", "Yes"))
Weight <- gl(5, 2, 10, labels = c(
  "Underweight", "Normal", "Over (11-24%)",
  "Over (25-39%)", "Over (> 39%)"
))
breast <- data.frame(Freq, Biopsy, Weight)
breast

breast <- expand_df(breast)
require(sjlabelled, quietly = TRUE)

breast <- var_labels(breast,
  Weight = "Weight group"
)

breast |>
  odds_trend(x = Weight, y = Biopsy)

breast |>
  odds_trend(x = Weight, y = Biopsy) |>
  effect_plot(x = OR, y = Exposure)

Given y solve for x in a simple linear model.

Description

predict_inv Calculates the value the predictor x that generates value y with a simple linear model.

Usage

predict_inv(model, y)

Arguments

model

A simple linear model object (class lm).

y

A numerical scalar, the value of the outcome for which we want to calculate the predictor x.

Value

The estimated value of the predictor.

Examples

## Spectrophotometry example. Titration curve for riboflavin (nmol/ml). The sample has an absorbance
## of 1.15. Aim is to estimate the concentration of riboflavin in the sample.

Riboflavin <- seq(0, 80, 10)
OD <- 0.0125 * Riboflavin + rnorm(9, 0.6, 0.03)
titration <- data.frame(Riboflavin, OD)

require(sjlabelled, quietly = TRUE)
titration <- titration |>
  var_labels(
    Riboflavin = "Riboflavin (nmol/ml)",
    OD = "Optical density"
  )

titration |>
  ggplot(aes(Riboflavin, OD)) +
  geom_point(col = bmj[1]) +
  geom_smooth(col = bmj[2], se = TRUE, lwd = 0.5, method = "loess")

## Model with intercept different from zero:
model <- lm(OD ~ Riboflavin, data = titration)
glm_coef(model)

predict_inv(model, 1.15)

Proportion, p1 from proportion p2 and OR.

Description

prop_or is a simple function to calculate a proportion, from another proportion and the odds ratio between them.

Usage

prop_or(p2, or)

Arguments

p2

The value of a proportion in the unexposed group (p2).

or

The odds ratio of p1/p2.

Value

p1, the proportion in the exposed group (p1).

Examples

flu <- matrix(c(20, 80, 220, 140), nrow = 2)
colnames(flu) <- c("Yes", "No")
rownames(flu) <- c("Vaccine", "Placebo")
flu

or <- (20 * 140) / (80 * 220)
p2 <- 80 / 220
prop_or(p2 = p2, or = or)
20 / 240

Pseudo R2 (logistic regression) `pseudo_r2` Calculates R2 analogues (pseudo R2) of logistic regression.

Description

Pseudo R2 (logistic regression) pseudo_r2 Calculates R2 analogues (pseudo R2) of logistic regression.

Usage

pseudo_r2(model)

Arguments

model

A logistic regression model.

Details

pseudo_r2 calculates three pseudo R2 of logistic regression models: 1) Nagelkerke, @0 Cox and Snell, 3) Hosmer and Lemeshow.

Value

A data frame with the calculated pseudo R2 values.

Examples

data(Oncho)
model_oncho <- glm(mf ~ area, data = Oncho, binomial)
glm_coef(model_oncho, labels = c("Constant", "Area (rainforest/savannah)"))
pseudo_r2(model_oncho)

Quantile-quantile plots against the standard Normal distribution.

Description

qq_plot constructs quantile-quantile plots against the standard normal distribution (also known as quantile-normal plots).

Usage

qq_plot(data, y, size = 0.7)

Arguments

data

A data frame.

y

A numerical variable in the data frame.

size

Size of the marker.

Examples

data(kfm, package = "ISwR")
require(sjlabelled, quietly = TRUE)
kfm <- kfm |>
  var_labels(
    dl.milk = "Breast-milk intake (dl/day)",
    sex = "Sex",
    weight = "Child weight (kg)",
    ml.suppl = "Milk substitute (ml/day)",
    mat.weight = "Maternal weight (kg)",
    mat.height = "Maternal height (cm)"
  )

kfm |>
  qq_plot(dl.milk)

Ranks observations based upon influence measures on models.

Description

rank_influence calculates influence measures of each data observation on models and then ranks them.

Usage

rank_influence(model)

Arguments

model

A generalised linear model object.

Details

rank_influence is a wrap function that calls influence.measures, ranks observations on their significance influence on the model and displays the 10 most influential observations (if they are significant).

Examples

data(diet, package = "Epi")
model <- glm(chd ~ fibre, data = diet, family = binomial)
rank_influence(model)

Ranks observations by leverage.

Description

rank_leverage ranks observations by their leverage (influence) on the arithmetic mean.

Usage

rank_leverage(x)

Arguments

x

A numeric variable. Missing values are removed by default.

Value

A data frame ranking observations by their leverage around the mean.

Examples

x <- rnorm(10, 170, 8)
x
mean(x)
rank_leverage(x)

x <- rnorm(100, 170, 8)
mean(x)
head(rank_leverage(x))

Reference range (reference interval).

Description

reference_range estimates the reference range (reference interval) of a numerical variable.

Usage

reference_range(avg, std)

Arguments

avg

The arithmetic mean (a scalar numerical value).

std

The standard deviation (a scalar numerical value).

Details

The reference range assumes normality and represents the limits that would include 95 observations.

Value

A data frame with the reference range limits.

Examples

x <- rnorm(100, 170, 8)
round(mean(x), 2)
round(sd(x), 2)

round(reference_range(mean(x), sd(x)), 2)

Relative Dispersion.

Description

Calculates the coefficient of variation (relative dispersion) of a variable. The relative dispersion is defined as the standard deviation over the arithmetic mean.

Usage

rel_dis(x)

Arguments

x

A numerical variable. NA's observations are removed by default.

Value

The coefficient of variation (relative dispersion).

Examples

height <- rnorm(100, 170, 8)
rel_dis(height)

Rounding p-values.

Description

round_pval is an internal function called by glm_coef to round p-values from model coefficients.

Usage

round_pval(pval)

Arguments

pval

vector of p-values, numeric.

Sum of squares for Jackknife.

Description

ss_jk is an internal function called by jack_knife. It calculates the squared difference of a numerical variable around a given value (for example, the mean).

Usage

ss_jk(obs, stat)

Arguments

obs

A numerical vector with no missing values (NA's).

stat

The value of the statistic that is used as a reference.

Value

The squared difference between a variable and a given value.

Examples

x <- rnorm(10, 170, 8)
x
mean(x)
ss_jk(x, mean(x))
jack_knife(x)

Internal function to calculate descriptive statistics.

Description

stats_quotes is an internal function called by estat.

Usage

stats_quotes(x, data2, digits = 2)

Arguments

x

a numeric variable

data2

A data frame where x can be found.

digits

Number of digits for rounding.

Strip plots with error bars.

Description

strip_error constructs strip plots with error bars showing 95 confidence intervals around mean values.

Usage

strip_error(data, x, y, size = 0.5)

Arguments

data

A data frame.

x

A categorical variable (the exposure).

y

A numerical variable (the outcome or response variable).

size

Size of the marker.

Examples

data(energy, package = "ISwR")
require(sjlabelled, quietly = TRUE)
energy <- energy |>
  var_labels(
    expend = "Energy expenditure (MJ/day)",
    stature = "Stature"
  )

energy |>
  strip_error(x = stature, y = expend)

A theme for huxtables This function quickly set a default style for a huxtable.

Description

A theme for huxtables This function quickly set a default style for a huxtable.

Usage

theme_pubh(ht, rw = 1)

Arguments

ht

A huxtable object.

rw

A numeric vector with the rows on which a bottom border is desired.

Details

theme_pubh is a variation of theme_article with the added flexibility of adding a bottom border line at desired row numbers.

Examples

require(dplyr, quietly = TRUE)
data(Oncho)

Oncho |>
  select(area, mf) |>
  cross_tbl(by = "area") |>
  theme_pubh(2)

data(Bernard)

t1 <- estat(~ apache | fate, data = Bernard)
t2 <- estat(~ o2del | fate, data = Bernard)
rbind(t1, t2) |>
  as_hux() |>
  theme_pubh(c(1, 3))