Help for package peppwR

Title:

Power Analysis for Phosphopeptide Abundance Hypothesis Test

Version:

0.1.0

Author:

Dan MacLean [aut, cre]

Maintainer:

Dan MacLean <dan.maclean@tsl.ac.uk>

Description:

Estimate best fit distributions and do power analysis for hypothesis tests on phosphopeptide abundance data.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.3.3

URL:

https://teammaclean.github.io/peppwR/, https://github.com/TeamMacLean/peppwR

BugReports:

https://github.com/TeamMacLean/peppwR/issues

Depends:

R (≥ 4.1.0)

Imports:

cowplot, dplyr, fitdistrplus, ggplot2, methods, purrr, RColorBrewer, scales, tibble, tidyr, univariateML

Suggests:

bench, knitr, pkgdown, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-03-25 10:46:38 UTC; macleand

Repository:

CRAN

Date/Publication:

2026-03-30 09:20:08 UTC

Apply missingness to a vector

Description

Apply missingness to a vector

Usage

apply_missingness(x, na_rate, mnar_score = 0)

Arguments

x

Numeric vector

na_rate

Proportion to make NA

mnar_score

MNAR intensity (0 = MCAR)

Value

Vector with some values replaced by NA

JZS Bayes factor approximation

Description

JZS Bayes factor approximation

Usage

bf_jzs(t, n1, n2)

Arguments

t

t-statistic

n1

Sample size group 1

n2

Sample size group 2

Value

Bayes factor

Compute dataset-level MNAR metric

Description

Calculates Spearman correlation between log(mean_abundance) and NA rate across peptides to detect whether low-abundance peptides have more missing values than high-abundance peptides.

Usage

compute_dataset_mnar(missingness)

Arguments

missingness

A tibble with columns na_rate and mean_abundance (as produced by fit_distributions)

Value

A list with:

correlation: Spearman correlation coefficient (negative = MNAR pattern)
p_value: p-value for the correlation
n_peptides: Number of peptides with missing data used in calculation
interpretation: Human-readable interpretation string

MNAR Detection

MNAR (Missing Not At Random) in mass spectrometry typically manifests as low-abundance peptides having higher rates of missing values due to detection limits. This function detects this pattern by correlating mean abundance with NA rate across all peptides.

A negative correlation indicates that low-abundance peptides have more missing values - the hallmark of detection-limit-driven MNAR.

Interpretation

Correlation (r)	Interpretation
r > -0.1	No evidence of MNAR
-0.3 < r < -0.1	Weak evidence
-0.5 < r < -0.3	Moderate evidence
r < -0.5	Strong evidence of MNAR

Compute fitted density values

Description

Compute fitted density values

Usage

compute_fitted_density(raw_vals, dist_name, x_range)

Arguments

raw_vals

Raw observed values

dist_name

Distribution name

x_range

X values to compute density at

Value

Density values or NULL on failure

Compute missingness statistics for a vector of values

Description

Calculates the number and proportion of missing (NA) values.

Usage

compute_missingness(values)

Arguments

values

Numeric vector (may contain NAs)

Value

List with:

n_total: Total number of values
n_missing: Number of NA values
na_rate: Proportion missing (0-1)

Compute per-peptide power (helper for plot_power_vs_effect)

Description

Compute per-peptide power (helper for plot_power_vs_effect)

Usage

compute_perpeptide_power(
  fits_data,
  n_per_group,
  effect_size,
  alpha,
  test,
  n_sim
)

Compute QQ points for a distribution fit

Description

Compute QQ points for a distribution fit

Usage

compute_qq_points(raw_vals, dist_name)

Arguments

raw_vals

Raw observed values

dist_name

Distribution name

Value

List with theoretical and sample quantiles, or NULL

Compute t-statistic for two groups

Description

Compute t-statistic for two groups

Usage

compute_t_stat(x, y)

Arguments

x

First group

y

Second group

Value

t-statistic

Find minimum detectable effect size

Description

Find minimum detectable effect size

Usage

find_effect_size(
  distribution,
  params,
  n_per_group,
  target_power,
  alpha,
  test,
  n_sim
)

Arguments

distribution

Distribution name

params

Distribution parameters

n_per_group

Sample size per group

target_power

Target power level

alpha

Significance level

test

Statistical test

n_sim

Number of simulations per effect size

Value

List with effect_size and effect_curve

Find required sample size for target power

Description

Find required sample size for target power

Usage

find_sample_size(
  distribution,
  params,
  effect_size,
  target_power,
  alpha,
  test,
  n_sim
)

Arguments

distribution

Distribution name

params

Distribution parameters

effect_size

Effect size to detect

target_power

Target power level

alpha

Significance level

test

Statistical test

n_sim

Number of simulations per sample size

Value

List with n and power_curve

Fit distributions to peptide abundance data

Description

Fits candidate distributions to each peptide's abundance values and selects the best fit by AIC. Also computes missingness statistics including dataset-level MNAR detection.

Usage

fit_distributions(data, id, group, value, distributions = "continuous")

Arguments

data

A data frame containing peptide abundance data

id

Column name for peptide identifier

group

Column name for group/condition

value

Column name for abundance values

distributions

Which distributions to fit: "continuous" (default), "counts", "all", or a character vector of distribution names

Value

A peppwr_fits object containing:

⁠$data⁠: Nested tibble with original data and fit results
⁠$best⁠: Best-fitting distribution for each peptide
⁠$missingness⁠: Per-peptide missingness statistics
⁠$dataset_mnar⁠: Dataset-level MNAR correlation and interpretation

Missingness Tracking

The returned object includes:

Per-peptide NA rates (in ⁠$missingness⁠)
Dataset-level MNAR correlation (in ⁠$dataset_mnar⁠)

The dataset-level MNAR metric correlates log(mean_abundance) with NA rate across peptides. A negative correlation indicates low-abundance peptides have more missing values - typical of detection-limit-driven MNAR.

Print the result to see both metrics. For small sample sizes (N < 15), the dataset-level correlation is more reliable than per-peptide scores.

Format answer based on question type

Description

Format answer based on question type

Usage

format_answer(answer, question)

Arguments

answer

The answer value

question

The question type

Value

Formatted string

Get density function for a distribution

Description

Get density function for a distribution

Usage

get_dfunc(distribution)

Arguments

distribution

Distribution name

Value

Density function

Get distribution parameters from fitted results

Description

Get distribution parameters from fitted results

Usage

get_dist_params(dist_name, fit_df, raw_data)

Arguments

dist_name

Distribution name

fit_df

Fit results data frame

raw_data

Raw data values

Value

List of parameters

Get distributions for a preset

Description

Get distributions for a preset

Usage

get_distribution_preset(preset = "continuous")

Arguments

preset

One of "continuous" (default), "counts", or "all"

Value

Character vector of distribution names

Get quantile function for a distribution

Description

Get quantile function for a distribution

Usage

get_qfunc(distribution)

Arguments

distribution

Distribution name

Value

Quantile function

Get the random generation function for a distribution

Description

Get the random generation function for a distribution

Usage

get_rfunc(distribution)

Arguments

distribution

Distribution name

Value

Random generation function

Get the test function for a given test name

Description

Get the test function for a given test name

Usage

get_test_func(test)

Arguments

test

Test name

Value

Test function

Interpret dataset-level MNAR result

Description

Interpret dataset-level MNAR result

Usage

interpret_dataset_mnar(correlation, p_value, n_peptides)

Arguments

correlation

Spearman correlation coefficient

p_value

p-value for the correlation

n_peptides

Number of peptides in calculation

Value

Interpretation string

Check if data appears to be count data (non-negative integers)

Description

Check if data appears to be count data (non-negative integers)

Usage

is_count_data(x)

Arguments

x

Numeric vector

Value

TRUE if data looks like counts, FALSE otherwise

Map distribution name to R function prefix

Description

Map distribution name to R function prefix

Usage

map_dist_name(dist_name)

Arguments

dist_name

Distribution name

Value

R function name for random generation

Create a new peppwr_fits object

Description

Create a new peppwr_fits object

Usage

new_peppwr_fits(
  data,
  fits,
  best,
  call,
  missingness = NULL,
  dataset_mnar = NULL
)

Arguments

data

Original data (nested tibble)

fits

Fit results per peptide (list of tibbles with dist, loglik, aic)

best

Best-fitting distribution per peptide (character vector)

call

Original function call

missingness

Tibble with missingness statistics per peptide (optional)

dataset_mnar

Dataset-level MNAR metric (optional, from compute_dataset_mnar)

Value

A peppwr_fits object

Create a new peppwr_power object

Description

Create a new peppwr_power object

Usage

new_peppwr_power(mode, question, answer, simulations, params, call)

Arguments

mode

Either "aggregate" or "per_peptide"

question

What was solved for: "power", "sample_size", or "effect_size"

answer

The computed answer

simulations

List of simulation details

params

List of input parameters used

call

Original function call

Value

A peppwr_power object

Plot method for peppwr_fits

Description

Creates a bar chart showing the count of best-fit distributions

Usage

## S3 method for class 'peppwr_fits'
plot(x, ...)

Arguments

x

A peppwr_fits object

...

Additional arguments (ignored)

Value

A ggplot object

Plot method for peppwr_power

Description

Creates power curves or % peptides at threshold plots

Usage

## S3 method for class 'peppwr_power'
plot(x, ...)

Arguments

x

A peppwr_power object

...

Additional arguments (ignored)

Value

A ggplot object

Plot density overlay: observed histogram with fitted density curve

Description

Plot density overlay: observed histogram with fitted density curve

Usage

plot_density_overlay(fits, peptide_id = NULL, n_overlay = 6)

Arguments

fits

A peppwr_fits object

peptide_id

Specific peptide to plot (NULL for multiple)

n_overlay

Number of peptides to overlay when peptide_id is NULL

Value

A ggplot object

Plot missingness statistics

Description

Creates a visualization of missing data patterns with two panels:

Usage

plot_missingness(fits)

Arguments

fits

A peppwr_fits object

Details

Panel 1: Distribution of NA rates across peptides
Panel 2: Mean abundance vs NA rate scatter plot showing the dataset-level MNAR correlation. The subtitle displays the Spearman correlation coefficient and p-value. A negative correlation indicates that low-abundance peptides have more missing values - the hallmark of detection-limit-driven MNAR.

Value

A ggplot object or gtable (combined panels)

MNAR Detection

MNAR (Missing Not At Random) in mass spectrometry typically manifests as low-abundance peptides having higher rates of missing values due to detection limits. Panel 2 visualizes this relationship and reports the correlation coefficient.

Plot distribution of fitted parameters across peptidome

Description

Plot distribution of fitted parameters across peptidome

Usage

plot_param_distribution(fits)

Arguments

fits

A peppwr_fits object

Value

A ggplot object

Plot power curve for aggregate mode

Description

Plot power curve for aggregate mode

Usage

plot_power_aggregate(x)

Arguments

x

A peppwr_power object

Value

A ggplot object

Plot power heatmap: N x effect size grid

Description

Plot power heatmap: N x effect size grid

Usage

plot_power_heatmap(
  distribution,
  params,
  n_range,
  effect_range,
  n_steps = 6,
  n_sim = 100,
  test = "wilcoxon",
  alpha = 0.05
)

Arguments

distribution

Distribution name

params

Distribution parameters

n_range

Range of sample sizes (vector of length 2)

effect_range

Range of effect sizes (vector of length 2)

n_steps

Number of grid points per dimension

n_sim

Number of simulations per grid cell

test

Statistical test to use

alpha

Significance level

Value

A ggplot object

Plot % peptides at threshold for per-peptide mode

Description

Plot % peptides at threshold for per-peptide mode

Usage

plot_power_perpeptide(x)

Arguments

x

A peppwr_power object

Value

A ggplot object

Plot power vs effect size at fixed N

Description

Plot power vs effect size at fixed N

Usage

plot_power_vs_effect(power_result, effect_range, n_steps = 10, n_sim = NULL)

Arguments

power_result

A peppwr_power object

effect_range

Range of effect sizes to explore

n_steps

Number of effect size values to compute

n_sim

Number of simulations per point

Value

A ggplot object

Plot QQ plots for goodness-of-fit

Description

Plot QQ plots for goodness-of-fit

Usage

plot_qq(fits, peptide_id = NULL, n_plots = 6)

Arguments

fits

A peppwr_fits object

peptide_id

Specific peptide to plot (NULL for multiple)

n_plots

Number of peptides to plot when peptide_id is NULL

Value

A ggplot object

Power analysis for peptide abundance data

Description

Power analysis for peptide abundance data

Usage

power_analysis(distribution, ...)

Arguments

distribution

Distribution name (character) or peppwr_fits object for per-peptide mode

...

Additional arguments

Value

A peppwr_power object

Power analysis with specified distribution (aggregate mode)

Description

Power analysis with specified distribution (aggregate mode)

Default power analysis method (aggregate mode with defaults)

Usage

## S3 method for class 'character'
power_analysis(
  distribution,
  params,
  effect_size = NULL,
  n_per_group = NULL,
  target_power = NULL,
  alpha = 0.05,
  test = "wilcoxon",
  find = "power",
  n_sim = 1000,
  ...
)

## Default S3 method:
power_analysis(distribution, ...)

Arguments

distribution

Distribution name (e.g., "norm", "gamma", "lnorm")

params

List of distribution parameters

effect_size

Fold change to detect

n_per_group

Sample size per group (required for find="power")

target_power

Target power (required for find="sample_size" or find="effect_size")

alpha

Significance level (default 0.05)

test

Statistical test to use (default "wilcoxon")

find

What to solve for: "power", "sample_size", or "effect_size"

n_sim

Number of simulations (default 1000)

...

Additional arguments (ignored)

Value

A peppwr_power object

Power analysis for per-peptide mode using fitted distributions

Description

Power analysis for per-peptide mode using fitted distributions

Usage

## S3 method for class 'peppwr_fits'
power_analysis(
  distribution,
  effect_size = NULL,
  n_per_group = NULL,
  target_power = NULL,
  alpha = 0.05,
  test = "wilcoxon",
  find = "power",
  n_sim = 1000,
  on_fit_failure = "exclude",
  proportion_threshold = 0.5,
  include_missingness = FALSE,
  apply_fdr = FALSE,
  prop_null = 0.9,
  fdr_threshold = 0.05,
  ...
)

Arguments

distribution

A peppwr_fits object from fit_distributions()

effect_size

Fold change to detect

n_per_group

Sample size per group (required for find="power")

target_power

Target power (required for find="sample_size")

alpha

Significance level (default 0.05)

test

Statistical test to use (default "wilcoxon")

find

What to solve for: "power" or "sample_size"

n_sim

Number of simulations per peptide (default 1000)

on_fit_failure

How to handle failed fits: "exclude", "empirical", or "lognormal"

proportion_threshold

Proportion of peptides that must reach target_power (default 0.5)

include_missingness

If TRUE, incorporate peptide-specific NA rates into simulations

apply_fdr

If TRUE, use FDR-aware simulation with Benjamini-Hochberg correction. Note: not compatible with test = "bayes_t" (Bayes factors cannot be converted to p-values)

prop_null

Proportion of true null peptides (default 0.9 = 90% unchanged)

fdr_threshold

FDR threshold for calling discoveries (default 0.05)

...

Additional arguments (ignored)

Value

A peppwr_power object

Print method for peppwr_fits

Description

Print method for peppwr_fits

Usage

## S3 method for class 'peppwr_fits'
print(x, ...)

Arguments

x

A peppwr_fits object

...

Additional arguments (ignored)

Value

The object invisibly

Print method for peppwr_power

Description

Print method for peppwr_power

Usage

## S3 method for class 'peppwr_power'
print(x, ...)

Arguments

x

A peppwr_power object

...

Additional arguments (ignored)

Value

The object invisibly

Print method for summary.peppwr_fits

Description

Print method for summary.peppwr_fits

Usage

## S3 method for class 'summary.peppwr_fits'
print(x, ...)

Arguments

x

A summary.peppwr_fits object

...

Additional arguments (ignored)

Value

The object invisibly

Print method for summary.peppwr_power

Description

Print method for summary.peppwr_power

Usage

## S3 method for class 'summary.peppwr_power'
print(x, ...)

Arguments

x

A summary.peppwr_power object

...

Additional arguments (ignored)

Value

The object invisibly

Run power simulation

Description

Run power simulation

Usage

run_power_sim(
  distribution,
  params,
  n_per_group,
  effect_size,
  alpha = 0.05,
  test = "wilcoxon",
  n_sim = 1000
)

Arguments

distribution

Distribution name

params

List of distribution parameters

n_per_group

Number of samples per group

effect_size

Fold change for treatment

alpha

Significance level

test

Statistical test to use ("wilcoxon", "bootstrap_t", "bayes_t", "rankprod")

n_sim

Number of simulations

Value

Power estimate (proportion of significant results)

Run power simulation using empirical bootstrap

Description

Estimates power by repeatedly resampling from observed data rather than sampling from fitted distributions.

Usage

run_power_sim_empirical(
  raw_data,
  n_per_group,
  effect_size,
  alpha = 0.05,
  test = "wilcoxon",
  n_sim = 1000
)

Arguments

raw_data

Numeric vector of observed values

n_per_group

Number of samples per group

effect_size

Fold change for treatment

alpha

Significance level

test

Statistical test to use

n_sim

Number of simulations

Value

Power estimate (proportion of significant results)

Run FDR-aware power simulation for whole peptidome

Description

Simulates an entire peptidome experiment with a mix of true nulls and true alternatives, then applies Benjamini-Hochberg correction to compute FDR-adjusted power.

Usage

run_power_sim_fdr(
  fits,
  effect_size,
  n_per_group,
  prop_null = 0.9,
  fdr_threshold = 0.05,
  alpha = 0.05,
  test = "wilcoxon",
  n_sim = 1000
)

Arguments

fits

A peppwr_fits object

effect_size

Fold change for treatment

n_per_group

Number of samples per group

prop_null

Proportion of peptides that are true nulls (no effect). Default 0.9 (90% unchanged).

fdr_threshold

FDR threshold for calling discoveries. Default 0.05.

alpha

Nominal significance level (used for simulation). Default 0.05.

test

Statistical test to use. Default "wilcoxon".

n_sim

Number of simulation iterations. Default 1000.

Value

FDR-adjusted power estimate (proportion of true alternatives detected)

Run power simulation with missingness

Description

Estimates power accounting for realistic missing data patterns.

Usage

run_power_sim_with_missingness(
  distribution,
  params,
  n_per_group,
  effect_size,
  na_rate = 0,
  mnar_score = 0,
  alpha = 0.05,
  test = "wilcoxon",
  n_sim = 1000
)

Arguments

distribution

Distribution name

params

List of distribution parameters

n_per_group

Number of samples per group

effect_size

Fold change for treatment

na_rate

Proportion of values that will be NA

mnar_score

MNAR intensity (0 = MCAR)

alpha

Significance level

test

Statistical test to use

n_sim

Number of simulations

Value

Power estimate (proportion of significant results)

Simulate experiment using empirical bootstrap

Description

Resamples from observed data instead of using parametric distributions. Useful when distribution fitting fails or for non-parametric analysis.

Usage

simulate_empirical(raw_data, n_per_group, effect_size)

Arguments

raw_data

Numeric vector of observed values

n_per_group

Number of samples per group

effect_size

Fold change for treatment (multiplicative effect)

Value

List with control and treatment vectors

Simulate an experiment with control and treatment groups

Description

Simulate an experiment with control and treatment groups

Usage

simulate_experiment(distribution, params, n_per_group, effect_size)

Arguments

distribution

Distribution name (e.g., "norm", "gamma", "lnorm")

params

List of distribution parameters

n_per_group

Number of samples per group

effect_size

Fold change for treatment (multiplicative effect)

Value

List with control and treatment vectors

Simulate experiment with realistic missingness

Description

Generates control and treatment samples from a distribution, then introduces missing values according to the specified rate and MNAR pattern.

Usage

simulate_with_missingness(
  distribution,
  params,
  n_per_group,
  effect_size,
  na_rate = 0,
  mnar_score = 0
)

Arguments

distribution

Distribution name (e.g., "norm", "gamma", "lnorm")

params

List of distribution parameters

n_per_group

Number of samples per group

effect_size

Fold change for treatment (multiplicative effect)

na_rate

Proportion of values to make NA (0-1)

mnar_score

MNAR intensity: 0 = MCAR, positive = low values more likely to be missing. Typical values: 0-3.

Value

List with control and treatment vectors (may contain NAs)

Summary method for peppwr_fits

Description

Summary method for peppwr_fits

Usage

## S3 method for class 'peppwr_fits'
summary(object, ...)

Arguments

object

A peppwr_fits object

...

Additional arguments (ignored)

Value

A list with summary statistics

Summary method for peppwr_power

Description

Summary method for peppwr_power

Usage

## S3 method for class 'peppwr_power'
summary(object, ...)

Arguments

object

A peppwr_power object

...

Additional arguments (ignored)

Value

A list with summary statistics

Bayes factor t-test

Description

Computes Bayes factor for difference between two groups

Usage

test_bayes_t(control, treatment)

Arguments

control

Control group values

treatment

Treatment group values

Value

Bayes factor (BF10: evidence for alternative over null)

Bootstrap-t test

Description

Performs a bootstrap-t test comparing two groups

Usage

test_bootstrap_t(control, treatment, n_boot = 1000)

Arguments

control

Control group values

treatment

Treatment group values

n_boot

Number of bootstrap iterations (default 1000)

Value

p-value from the test

Rank Products test

Description

Performs rank products test comparing two groups Simplified implementation for two-group comparison

Usage

test_rankprod(control, treatment, n_perm = 1000)

Arguments

control

Control group values

treatment

Treatment group values

n_perm

Number of permutations for p-value estimation (default 1000)

Value

p-value from the test

Wilcoxon rank-sum test

Description

Wilcoxon rank-sum test

Usage

test_wilcoxon(control, treatment)

Arguments

control

Control group values

treatment

Treatment group values

Value

p-value from the test

Validate a peppwr_fits object

Description

Validate a peppwr_fits object

Usage

validate_peppwr_fits(x)

Arguments

x

Object to validate

Value

The validated object (invisibly)

Validate a peppwr_power object

Description

Validate a peppwr_power object

Usage

validate_peppwr_power(x)

Arguments

x

Object to validate

Value

The validated object (invisibly)

Package {peppwR}

Apply missingness to a vector

Description

Usage

Arguments

Value

JZS Bayes factor approximation

Description

Usage

Arguments

Value

Compute dataset-level MNAR metric

Description

Usage

Arguments

Value

MNAR Detection

Interpretation

Compute fitted density values

Description

Usage

Arguments

Value

Compute missingness statistics for a vector of values

Description

Usage

Arguments

Value

See Also

Compute per-peptide power (helper for plot_power_vs_effect)

Description

Usage

Compute QQ points for a distribution fit

Description

Usage

Arguments

Value

Compute t-statistic for two groups

Description

Usage

Arguments

Value

Find minimum detectable effect size

Description

Usage

Arguments

Value

Find required sample size for target power

Description

Usage

Arguments

Value

Fit distributions to peptide abundance data

Description

Usage

Arguments

Value

Missingness Tracking

See Also

Format answer based on question type

Description

Usage

Arguments

Value

Get density function for a distribution

Description

Usage

Arguments

Value

Get distribution parameters from fitted results

Description

Usage

Arguments

Value

Get distributions for a preset

Description

Usage

Arguments

Value

Get quantile function for a distribution